Analytics: Lessons Learned from Winston Churchill


I had the pleasure of being invited to lunch by Prof. Baesens earlier this week, and we talked about ‘War and Analytics’ as a possible subject for a future meetup. As you might know, Bart is a WWI fanatic, and he has already written a nice article on the subject called ‘Analytics: Lessons Learned from Winston Churchill’.

Here is the article:


Analytics has been around for quite some time now. Even during World War II, it proved critical for the Allied victory. Famous examples of Allied analytical activities include the breaking of the Enigma code, which effectively removed the danger of submarine warfare, and the 3D reconstruction of 2D images shot by gunless Spitfires, which helped Intelligence at RAF Medmenham eliminate the danger of the V1 and V2 and support Operation Overlord. Many of the analytical lessons learned at that time are now more relevant than ever, in particular those provided by one of the great victors of WWII, then Prime Minister, Sir Winston Churchill.

The phrase “I only believe in statistics that I doctored myself” is often attributed to him. However, while its wit is certainly typical of the Greatest Briton, it was probably a Nazi propaganda invention. Even so, can Churchill still teach us something about statistical analysis and Analytics?

 

A good analytical model should satisfy several requirements, depending on the application area, and follow a certain process. CRISP-DM, a leading methodology for conducting data-driven analysis, proposes a structured approach: understand the business, understand the data, prepare the data, design a model, evaluate it, and deploy the solution. The wisdom of the 1953 Nobel laureate in Literature can help us better understand this process.
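To make the process more concrete, here is a minimal sketch (in Python; the function names and return values are illustrative placeholders of my own, not part of the original article or of CRISP-DM itself) of the six phases chained into a pipeline:

```python
# Illustrative CRISP-DM skeleton; each function body is a placeholder
# for the activities the phase involves.

def business_understanding():
    # Translate the business issue into a concrete analytics question.
    return "Which customers are likely to churn in the next 12 months?"

def data_understanding(question):
    # Inventory the available sources and assess their relevance and quality.
    return {"question": question, "sources": ["crm", "transactions", "complaints"]}

def data_preparation(inventory):
    # Consolidate sources, select variables, clean the data.
    return {"rows": 100_000, "features": ["recency", "frequency", "monetary"]}

def modelling(prepared):
    # Estimate a predictive model on the prepared table.
    return {"model": "logistic regression"}

def evaluation(model):
    # Check statistical performance *and* business actionability.
    return True

def deployment(model):
    # Put the approved model into the operational process.
    print("Model deployed:", model["model"])

question = business_understanding()
inventory = data_understanding(question)
prepared = data_preparation(inventory)
model = modelling(prepared)
if evaluation(model):
    deployment(model)
```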

Have an actionable approach: aim at solving a real business issue

Any analytics project should start with a business problem, and then provide a solution. Indeed, Analytics is not a purely technical, statistical or computational exercise, since any analytical model needs to be actionable. For example, a model can allow us to predict future problems like credit card fraud or customer churn. Because managers are decision-makers, as are politicians, they need “the ability to foretell what is going to happen tomorrow, next week, next month, and next year… And to have the ability afterwards to explain why it didn’t happen.” In other words, even when the model fails to predict what really happened, its ability to explain the process in an intelligible way is still crucial.
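As an illustration (not from the original article; the data is synthetic and the feature names are invented), a simple churn model can be both predictive and explainable. A logistic regression, for instance, exposes coefficients that tell the decision-maker *why* a customer is flagged:

```python
# Sketch: an interpretable churn model on synthetic data.
# Requires numpy and scikit-learn; features and effects are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 1000
X = np.column_stack([
    rng.exponential(3, n),   # recency: months since last purchase
    rng.poisson(1, n),       # complaints filed
    rng.uniform(0, 10, n),   # tenure in years
])
# Synthetic ground truth: churn rises with recency and complaints.
logits = 0.5 * X[:, 0] + 0.8 * X[:, 1] - 0.3 * X[:, 2] - 1.0
y = rng.random(n) < 1 / (1 + np.exp(-logits))

model = LogisticRegression().fit(X, y)

# The coefficients make the model explainable to decision-makers:
for name, coef in zip(["recency", "complaints", "tenure"], model.coef_[0]):
    print(f"{name:>10s}: {coef:+.2f}")  # sign and size show each driver's effect
```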

In order to be relevant for businesses, the parties concerned first need to define and qualify a problem before analysis can effectively find a solution. For example, trying to predict what will happen in 10 years or more makes little sense from a practical, day-to-day business perspective: “It is a mistake to look too far ahead. Only one link in the chain of destiny can be handled at a time.” Understandably, many analytical models in use in industry have prediction horizons spanning no further than 2-3 years.

Understand the data you have at your disposal

There is a fairly large gap between data and comprehension. Churchill went so far as to argue that “true genius resides in the capacity for evaluation of uncertain, hazardous, and conflicting information.”  Indeed, Big Data is complex and is not a quick-fix solution for most business problems. In fact, it takes time to work through and the big picture might even seem less clear at first. It is the role of the Business Analytics expert to really understand the data and know what sources and variables to select.

Prepare the data

Once a complete overview of the available data has been drafted, the analyst will start preparing the tables for modelling by consolidating different sources, selecting the relevant variables and cleaning the data sets. This is usually a very time-consuming and tedious task, but it needs to be done: “If you’re going through hell, keep going.”

Never forget to consider as much historical information as you can. Typically, when trying to predict future events, past transactional data is very relevant, as most of the predictive power comes from this type of information. “The longer you can look back, the farther you can look forward.”
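A hedged sketch of what this preparation step can look like in practice (using pandas; the two sources and their columns are invented for illustration): consolidate the sources, clean the missing values, and derive predictors from the historical transactions:

```python
# Sketch: consolidating two invented sources, cleaning them, and
# deriving "look back" features from past transactions.
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "age": [34, None, 51],          # missing value to clean
})
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "amount": [20.0, 35.0, 12.5, 80.0, 15.0, 22.0],
    "date": pd.to_datetime(["2015-01-05", "2015-03-01", "2015-02-11",
                            "2014-12-20", "2015-01-15", "2015-03-10"]),
})

# Clean: impute the missing age with the median.
customers["age"] = customers["age"].fillna(customers["age"].median())

# Derive features from historical transactions (the "look back" step).
features = transactions.groupby("customer_id").agg(
    n_transactions=("amount", "size"),
    total_spent=("amount", "sum"),
    last_purchase=("date", "max"),
)

# Consolidate: one modelling table, one row per customer.
basetable = customers.merge(features, on="customer_id", how="left")
print(basetable)
```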

Read more here.

Event – IABE – Big Data – December 3rd


Dear Mr. Van Impe,

On behalf of the IABE (Institute of Actuaries in Belgium), we are pleased to inform you that an exciting and relevant seminar on “Big Data” will be held on December 3rd, 2015. Eminent speakers from both the academic world and the business side have been gathered. The seminar is set to bring insight into Big Data and how it could impact the insurance sector, covering both theoretical and practical aspects. We are sure it will provide an enriching experience for all our participants. http://www.iabe.be/nl/iabe-forum-big-data

Would you mind sharing the invitation with anyone to whom it would be of interest?

Kind regards,
Yasmine Nouri
(member of IABE and co-organizer of the event),

Here is the agenda:

  • 13h30: Welcome by Jean-François Hannosset, President of the IA|BE
  • 13h35: Introduction to Big Data, by Mateusz Maj
  • 14h00: Data science: “State of the art”, by Prof. dr. Bart Baesens, associate professor at KU Leuven and guest lecturer at the University of Southampton (United Kingdom)
  • 14h40: Overview of (a) practical application, by Jo Coutuer
  • 15h20: Coffee break
  • 15h40: Overview of (a) practical application, by Jean-Philippe Schepens
  • 16h20: Legal framework around Big Data, by Laurens Naudts, legal researcher at KU Leuven
  • 17h00: Panel discussion with experts from insurance companies:
    Jean-Claude Debussche – Pierre Ars – Dries De Dauw – Marieke Geeraert – Yasmine Nouri
  • 17h55: Conclusion
  • 18h00: Cocktail

Book – Fraud Analytics by Veronique, Bart and Wouter available on Amazon

Fraud Analytics

Using descriptive, predictive and social network techniques.

by Veronique Van Vlasselaer, Bart Baesens and Wouter Verbeke.

We are pleased to announce that the Fraud Analytics book is now available for purchase on Amazon.

Here is the full video of this presentation:


Hiring Data Scientists: What to Look for?


Check out the latest article in Data Science Briefings, the newsletter with updates on the latest news, trends, techniques, tools and our research in data mining and analytics.

Feel free to send this newsletter along to friends or colleagues; they can subscribe for free through our subscribe page. We also keep an online record of our feature articles and Q&A’s at www.dataminingapps.com, in case you want to catch up if you’re just joining us.

Kindest regards,
Prof. dr. Bart Baesens
Dr. Seppe vanden Broucke

Feature Article: Hiring Data Scientists: What to Look for?

Contributed by: Bart Baesens, Richard Weber, Cristián Bravo, Seppe vanden Broucke

In this column, we would like to elaborate on the key characteristics of a good data scientist from the perspective of the hiring manager. Big data and analytics are all around these days. IBM projects that every day we generate 2.5 quintillion bytes of data, which means that 90% of the data in the world has been created in the last two years. Gartner projects that during 2015, 85% of Fortune 500 organizations will be unable to exploit big data for competitive advantage, and that about 4.4 million jobs will be created around big data. Although these estimates should not be interpreted in an absolute sense, they are a strong indication of the ubiquity of big data and the strong need for analytical skills and resources: as the data piles up, managing and analyzing these data resources in the best way becomes a critical success factor in creating competitive advantage and strategic leverage.

To address these challenges, companies are hiring data scientists. However, in the industry, there are strong misconceptions and disagreements about what constitutes a good data scientist.

In this article, we will discuss the key characteristics of a good data scientist. It is based upon the authors’ consulting and research experience, having collaborated with many companies worldwide on the topic of big data and analytics.

A data scientist should be a good programmer

By definition, data scientists work with data. This involves plenty of activities such as sampling and preprocessing of data, model estimation and post-processing (e.g. sensitivity analysis, model deployment, backtesting, model validation). Although many user-friendly software tools are on the market nowadays to automate this, every analytical exercise requires tailored steps to tackle the specificities of a particular business problem. In order to successfully perform these steps, programming is needed. Hence, a good data scientist should possess sound programming skills in e.g. R, Python or SAS. The programming language itself is not that important as such, as long as he/she is familiar with the basic concepts of programming and knows how to use them to automate repetitive tasks or perform specific routines.
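As a small illustration of what automating a repetitive task can mean (a sketch with an invented helper, not a prescribed tool): a reusable audit function that profiles any new data extract before modelling, instead of eyeballing each one by hand:

```python
# Sketch: a small reusable audit that profiles any pandas table.
import pandas as pd

def audit(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: type, missing rate, distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing_pct": df.isna().mean().round(3),
        "n_unique": df.nunique(),
    })

# The same routine runs unchanged on every new extract:
example = pd.DataFrame({"age": [34, None, 51], "segment": ["A", "B", "A"]})
print(audit(example))
```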

A data scientist should have solid quantitative skills

Obviously, a data scientist should have a thorough background in statistics, machine learning and/or data mining. The distinction between these disciplines is getting more and more blurred and is actually not that relevant. They all provide a set of quantitative techniques to analyze data and find business-relevant patterns within a particular context (e.g. risk management, fraud detection, marketing analytics). The data scientist should be aware of which technique can be applied when, and how. He/she should not focus too much on the underlying mathematical details (e.g. optimization) but rather have a good understanding of what analytical problem a technique solves and how its results should be interpreted. In this respect, the education of engineers in computer science and business/industrial engineering should aim at an integrated, multidisciplinary view, with recent graduates trained both in the use of the techniques and in the business acumen necessary to bring new endeavors to fruition. Also important in this context is to spend enough time validating the analytical results obtained, so as to avoid situations often referred to as data massage and/or data torture, whereby data is (intentionally) misrepresented and/or too much focus is spent discussing spurious correlations. When selecting the optimal quantitative technique, the data scientist should take into account the specificities of the business problem. Typical requirements for analytical models are:

  • actionability (to what extent is the analytical model solving the business problem?),
  • performance (what is the statistical performance of the analytical model?),
  • interpretability (can the analytical model be easily explained to decision makers?),
  • operational efficiency (how much effort is needed to set up, evaluate and monitor the analytical model?),
  • regulatory compliance (is the model in line with regulation?),
  • and economic cost (what is the cost of setting up, running and maintaining the model?).

Based upon a combination of these requirements, the data scientist should be capable of selecting the best analytical technique to solve the business problem.
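For instance, here is a sketch of how the trade-off between statistical performance and interpretability might be quantified (synthetic data; the candidate models and the AUC criterion are illustrative choices of mine, not a prescription from the article):

```python
# Sketch: weighing statistical performance against interpretability.
# In practice the choice is judged against all six requirements above,
# not AUC alone.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

candidates = {
    "logistic regression (interpretable)": LogisticRegression(max_iter=1000),
    "random forest (black box)": RandomForestClassifier(random_state=0),
}
for name, model in candidates.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name:>38s}: AUC = {auc:.3f}")

# If the simpler model scores within a small margin of the black box,
# its interpretability and operational efficiency often make it the
# better business choice.
```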

A data scientist should excel in communication and visualization skills

Like it or not, analytics is a technical exercise. At this moment, there is a huge gap between the analytical models and the business users. To bridge this gap, communication and visualization skills are key! Hence, a data scientist should know how to represent analytical models, and their accompanying statistics and reports, in user-friendly ways, using e.g. traffic light approaches, OLAP (on-line analytical processing) facilities or if-then business rules. He/she should be capable of communicating the right amount of information without getting lost in complex (e.g. statistical) details that would inhibit a model’s successful deployment. By doing so, business users will better understand the characteristics and behavior in their (big) data, which will improve their attitude towards, and acceptance of, the resulting analytical models. Educational institutions must learn to strike a balance here, since many academic degrees produce students skewed towards either too much analytical or too much practical knowledge.
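By way of illustration (the thresholds and customer IDs below are invented), a traffic light approach can be as simple as mapping model scores to a colour code that business users read at a glance:

```python
# Sketch: translating raw model scores into a "traffic light" that
# business users act on. Thresholds would be set with the business.
def traffic_light(churn_probability: float) -> str:
    if churn_probability >= 0.7:
        return "RED"     # act now: retention offer
    if churn_probability >= 0.4:
        return "AMBER"   # monitor: include in next campaign
    return "GREEN"       # no action needed

for customer, p in [("A-1001", 0.82), ("A-1002", 0.55), ("A-1003", 0.12)]:
    print(customer, f"p(churn)={p:.2f}", "->", traffic_light(p))
```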

A data scientist should have a solid business understanding

While this might be obvious, we have witnessed (too) many data science projects that failed because the respective analyst did not understand the business problem at hand. By “business” we refer to the respective application area, which could be e.g. churn prediction or credit scoring in a real business context, or astronomy or medicine if the data to be analyzed stem from such areas.

A data scientist should be creative

A data scientist needs creativity on at least two levels. First, on a technical level, it is important to be creative with regard to feature selection, data transformation and cleaning. These steps of the standard knowledge-discovery process have to be adapted to each particular application, and often the “right guess” can make a big difference. Second, big data and analytics is a fast-evolving field! New problems, technologies and corresponding challenges pop up on an ongoing basis. It is important that a data scientist keeps up with these new evolutions and technologies and has enough creativity to see how they can create new business opportunities.

Conclusion

In this article, we provided a brief overview of the characteristics to look for when hiring data scientists. To summarize, given the multidisciplinary nature of big data and analytics, a data scientist should possess a mix of skills: programming, quantitative modelling, communication and visualization, business understanding, and creativity! The figure below shows how such a profile can be represented.

(Figure: the multidisciplinary skill profile of the data scientist)

Do you also wish to contribute to Data Science Briefings? Shoot us an e-mail (just reply to this one) and let’s get in touch!

Meetup – Data Science and Banking – VUB Brussels – Wednesday 20/5


Join over 200 professionals this Wednesday (20/5) for your monthly Meetup – VUB – Aula QD – 18:30.

Data Science in the Banking world

Bart Hamers has composed a splendid agenda for this meetup:

We will start at 18:30 with an activity update:

  • the 10-week applied data science bootcamp starting in October,
  • our visit to Strata London,
  • our new 400m² office space and training centre,
  • our team winning 3rd prize in the epilepsy hackathon organized by UCB,
  • the success of our MOOC coaching activities,
  • our coming trip to California, and much more…
  • a call for candidates to set up a management team for next season,
  • a call for presentations for the next Meetup, about privacy, security, etc.

Followed by presentations from:

  • Introduction by Bart Hamers (Dexia): Data Science in Banking
  • KUL: Bart Baesens / Véronique Van Vlasselaer: Gotch’all! Advanced Network Analysis for Detecting Groups of Fraud
  • Euroclear: more details about the projects presented during the Data Innovation Summit
  • ING: Data Science and (Advanced) Predictive Analytics @ ING
  • Bart Hamers: Data Science Governance, my 6 principles

Open presentation:

  • Belgian startup track: presentation of Data Camp
  • Any member is welcome to present a topic and share valuable insights at the end of each meetup.

Networking:

And our usual networking sessions will be held in the Kulturkaffee.

See you this Wednesday at the VUB!

Registrations:

  • The meetups of the Brussels Data Science Community are free for our members.
  • You can join our LinkedIn Group here.
  • Registration is done on our meetup page here.

Data Science in the Banking world

Wednesday, May 20, 2015, 6:30 PM

VUB – Aula QD
Pleinlaan 2B – 1050 Brussels, BE



Check out this Meetup →

Free webinar on Analytics in a Big Data World by Bart Baesens.


A nice overview of how analytics and data science are used in a big data world.

Professor Baesens will be present at the Data Innovation Summit on March 26th in Brussels.

He will present his latest book about Big Data Science.

Join us: you can get your free full-day access pass by answering the Data Innovation Survey 2015.

Enjoy the webinar …

Join our next event

Please register using this meetup page:

Summit: Data Innovation Summit – Made in Belgium

Thursday, Mar 26, 2015, 8:00 AM

AXA building
Boulevard du Souverain 25, Watermael-Boitsfort, BE


Toon Vanagt – Laurent Fayet – Filip Maertens – Kris Peeters – Vincent Blondel – David Martens – Hans Constandt. The Data Innovation Summit in Brussels is a one-day conference gathering all the Belgian actors facilitating data innovation. It is an action-packed conference where more than 50 speakers will demonstrate what they do that helps us compete…

Check out this Meetup →

Sponsors of the Data Innovation Summit – Brussels

This event is organized by

Brussels Data Science Community

We love doing data for good

Structural Summit Partners

Axa  Agoria  Euroclear

Academic Partners

UCL   KU Leuven   ULB   UGent   ULg   UMONS   UNamur   Universiteit Antwerpen   VUB

Summit Sponsors

SAS   Keyrus

Exhibitors

Business Insight   Business & Decision   Big Industries   Arrow Group   Finaxys   Datalayer   Infofarm   Keyrus   MathWorks   MicroStrategy   Neo4j   Pépite   Sentiance   Deloitte   RIA   Informatica   Dataminded


Webinar – Bart Baesens – State of the Art in Credit Risk Analytics

Bart Baesens, Professor of Big Data & Analytics

https://lnkd.in/d2UQEzU

More about Bart at: http://www.dataminingapps.com/


New e-learning course Credit Risk Analytics by professor Bart Baesens


Dear members of the Brussels Data Science Community,
After 6 months of work on it, the time has come!
My e-learning course is online at:
Feel free to let me know if you have any questions.
Kind regards,
Bart

Prof. Dr. Bart Baesens
Faculty of Economics and Business
KU Leuven
Naamsestraat 69
B-3000 Leuven
Belgium

www.dataminingapps.com

Master of Information Management

————–

New e-learning course Credit Risk Analytics by professor Bart Baesens

The outline of the course is as follows:
Lesson 1: Introduction to Credit Scoring
Lesson 2: The Basel Capital Accords
Lesson 3: Preparing the data for credit scoring
Lesson 4: Classification for credit scoring
Lesson 5: Measuring the Performance of Credit Scoring Classification Models
Lesson 6: Variable Selection for Classification
Lesson 7: Issues in Scorecard Construction
Lesson 8: Defining Default Ratings and Calibrating PD
Lesson 9: LGD modeling
Lesson 10: EAD modeling
Lesson 11: Validation of Credit Risk Models
Lesson 12: Low Default Portfolios
Lesson 13: Stress testing
You are invited to send an email to Bart.Baesens@gmail.com if interested in more information.

Sneak preview – Mooc – Bart Baesens – Credit Risk Analytics

I had a nice lunch today with Prof. Dr. Bart Baesens at the MIM to discuss his recent book ‘Analytics in a Big Data World: The Essential Guide to Data Science and its Applications’.
One topic we discussed was knowledge transfer and certification.
Next to the recorded presentations already available on dataminingapps.com, the professor told me that his new course on Credit Risk Analytics would soon be released. Here, as a sneak preview, is the content of the course he has put together with SAS. The course will be available mid-November 2014.

New e-learning course Credit Risk Analytics by professor Bart Baesens

The outline of the course is the same as listed in the announcement above. You are invited to send an email to Bart.Baesens@gmail.com if interested in more information.