#DISUMMIT – #machinelearning to improve the ranking system of schools, by Fritz Schiltz

 

Presenting Fritz Schiltz, our youngest speaker at #disummit – he will talk about using #machinelearning to improve the ranking system of schools.

Fritz is an applied econometrician at the University of Leuven, where he applies advanced analytics to evaluate policies, mainly in education. He has worked on reports for the European Union, the Ministry of Education and Syntra. Halfway through his PhD in Economics he shifted his interests towards machine learning methods. His presentation is the result of joint work with the Bank of Italy and illustrates how machine learning or AI methods can be used to improve school rankings, using an Italian dataset.

Machine learning methods can be used to improve the assessment of school quality. School rankings based on value-added (VA) estimates are subject to prediction errors, since VA is defined as the difference between predicted and actual scores of students. More accurate predictions result in more informative school rankings, and better policies. We introduce a more flexible random forest (RF), rooted in the machine learning literature, to minimize prediction errors and to improve school rankings. Monte Carlo simulations demonstrate the advantages of this approach. Applying the proposed method to administrative data on Italian middle schools indicates that school rankings are sensitive to prediction errors, even when extensive controls are added.
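As a rough illustration of the idea (not the authors' actual pipeline), the sketch below estimates school value-added with a random forest: it predicts each student's score from observable characteristics, averages the prediction errors (actual minus predicted) per school, and ranks schools on that average. The file name and column names are hypothetical.

```python
# Hedged sketch: school value-added (VA) via a random forest.
# Assumes a student-level table with hypothetical columns:
#   school_id, prior_score, ses_index, gender (coded 0/1), test_score
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, cross_val_predict

students = pd.read_csv("students.csv")          # hypothetical dataset
features = ["prior_score", "ses_index", "gender"]
X, y = students[features], students["test_score"]

# Group the folds by school so a school's own students are not used
# to fit the model that predicts them.
rf = RandomForestRegressor(n_estimators=500, random_state=0)
gkf = GroupKFold(n_splits=5)
students["predicted"] = cross_val_predict(
    rf, X, y, cv=gkf, groups=students["school_id"])

# VA = mean(actual - predicted) per school; higher is better.
va = (students.assign(residual=y - students["predicted"])
              .groupby("school_id")["residual"].mean())
ranking = va.sort_values(ascending=False)
print(ranking.head(10))
```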

 

Analytics: Lessons Learned from Winston Churchill


I had the pleasure of being invited to lunch by Prof. Baesens earlier this week, and we talked about a possible next meetup subject: ‘War and Analytics’. As you might know, Bart is a WWI fanatic and he has already written a nice article on the subject, called ‘Analytics: Lessons Learned from Winston Churchill’.

Here is the article:

Source: Nicolas Glady’s Activities – Online articles – Analytics: Lessons Learned from Winston Churchill

Analytics has been around for quite some time now. Even during World War II, it proved critical for the Allied victory. Some famous examples of Allied analytical activities include the decoding of the Enigma code, which effectively removed the danger of submarine warfare, and the 3D reconstruction of 2D images shot by gunless Spitfires, which helped Intelligence at RAF Medmenham eliminate the danger of the V1 and V2 and support Operation Overlord. Many of the analytical lessons learned at that time are now more relevant than ever, in particular those provided by one of the great victors of WWII, then Prime Minister, Sir Winston Churchill.

The phrase “I only believe in statistics that I doctored myself” is often attributed to him. However, while its wit is certainly typical of the Greatest Briton, it was probably a Nazi propaganda invention. Even so, can Churchill still teach us something about statistical analyses and Analytics?

 

A good analytical model should satisfy several requirements, depending on the application area, and follow a certain process. CRISP-DM, a leading methodology for conducting data-driven analysis, proposes a structured approach: understand the business, understand the data, prepare the data, design a model, evaluate it, and deploy the solution. The wisdom of the winner of the 1953 Nobel Prize in Literature can help us better understand this process.

Have an actionable approach: aim at solving a real business issue

Any analytics project should start with a business problem, and then provide a solution. Indeed, Analytics is not a purely technical, statistical or computational exercise, since any analytical model needs to be actionable. For example, a model can allow us to predict future problems such as credit card fraud or customer churn. Because managers are decision-makers, as are politicians, they need “the ability to foretell what is going to happen tomorrow, next week, next month, and next year… And to have the ability afterwards to explain why it didn’t happen.” In other words, even when the model fails to predict what really happened, its ability to explain the process in an intelligible way is still crucial.
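To make that concrete, here is a minimal, hedged sketch of an "actionable" model in the sense above: a logistic regression for customer churn whose coefficients can be read and explained to a decision-maker even when an individual forecast turns out wrong. The file name and column names are invented for illustration.

```python
# Hedged sketch: an explainable churn model (hypothetical data and columns).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

customers = pd.read_csv("customers.csv")        # hypothetical dataset
features = ["tenure_months", "monthly_spend", "complaints_last_year"]
X, y = customers[features], customers["churned"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The coefficients are the "explanation": sign and size show how each
# driver pushes churn risk up or down.
for name, coef in zip(features, model.coef_[0]):
    print(f"{name:>22}: {coef:+.3f}")
print("hold-out accuracy:", model.score(X_test, y_test))
```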

In order to be relevant for business, the parties concerned first need to define and qualify a problem before analysis can effectively find a solution. For example, trying to predict what will happen in 10 years or more makes little sense from a practical, day-to-day business perspective: “It is a mistake to look too far ahead. Only one link in the chain of destiny can be handled at a time.” Understandably, many analytical models in use in industry have prediction horizons spanning no further than 2-3 years.

Understand the data you have at your disposal

There is a fairly large gap between data and comprehension. Churchill went so far as to argue that “true genius resides in the capacity for evaluation of uncertain, hazardous, and conflicting information.”  Indeed, Big Data is complex and is not a quick-fix solution for most business problems. In fact, it takes time to work through and the big picture might even seem less clear at first. It is the role of the Business Analytics expert to really understand the data and know what sources and variables to select.

Prepare the data

Once a complete overview of the available data has been drafted, the analyst will start preparing the tables for modelling by consolidating different sources, selecting the relevant variables and cleaning the data sets. This is usually a very time-consuming and tedious task, but needs to be done: “If you’re going through hell, keep going.”
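As a hedged, toy-scale illustration of what that consolidation, selection and cleaning step typically looks like in practice (the file names and columns below are made up):

```python
# Hedged sketch: consolidate sources, select variables, clean the data.
import pandas as pd

# Two hypothetical source tables keyed on customer_id.
crm = pd.read_csv("crm_extract.csv")
transactions = pd.read_csv("transactions.csv")

# Consolidate: aggregate transactions, then join onto the CRM view.
spend = (transactions.groupby("customer_id")["amount"]
                     .agg(total_spend="sum", n_purchases="count")
                     .reset_index())
base = crm.merge(spend, on="customer_id", how="left")

# Select the variables relevant for the model at hand.
base = base[["customer_id", "age", "segment", "total_spend", "n_purchases"]]

# Clean: fill gaps, drop duplicates, remove obviously impossible values.
base["total_spend"] = base["total_spend"].fillna(0)
base = base.drop_duplicates("customer_id")
base = base[base["age"].between(18, 100)]
```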

Never forget to consider as much historical information as you can. Typically, when trying to predict future events, using past transactional data is very relevant, as most of the predictive power comes from this type of information. “The longer you can look back, the farther you can look forward.”

Read more here.

Job – KUL – LIIR – 4 open PhD positions and 1 open postdoc position!

LIIR Research Team

The Language Intelligence & Information Retrieval (LIIR) research team of the Katholieke Universiteit Leuven, Belgium has 4 open PhD positions and 1 open postdoc position!

  • PhD – Deep learning for natural language understanding
  • PhD – Multimodal querying for mobile search
  • PhD – Hierarchical text classification with a large number of categories
  • PhD – Spatio-temporal information extraction from non-well-formed texts
  • Postdoc – Text Mining of Biomedical Texts

LIIR is part of the Human Computer Interaction (HCI) unit in the Department of Computer Science. The LIIR group’s research strives to develop a general framework for information processing of texts. The core of the research concerns problems of information retrieval, extraction, linking, summarization and search, focused on the textual medium in a multimedia context and often involving “big data” sets. Fundamental problems with regard to text understanding, mining, indexing, and retrieval models are studied. The developed technologies are applied in the domains of news, business intelligence, bioinformatics, police and intelligence services, legal documents, electronic mail, user-generated content, and the World Wide Web. LIIR encourages research on the interrelation between information retrieval and other disciplines, especially computational linguistics, machine learning, data mining, automated reasoning, and multimedia processing.

Under the supervision of Marie-Francine Moens the team participates in several research projects sponsored by the European Commission, Belgian and Flemish governments and academic research institutions. In some projects the group collaborates with important industrial partners.

Apply:

Make sure that you are a member of the Brussels Data Science Community LinkedIn group before you apply. Join here.

Please note that we also manage other vacancies that are not public; if you want us to put you in contact with those as well, just send your CV to datasciencebe@gmail.com.

More details about the jobs at the KUL LIIR can be found following this link: http://liir.cs.kuleuven.be/jobs.php

Meetup – Data Science and Banking – VUB, Brussels – Wednesday 20/5


Join over 200 professionals this Wednesday (20/5) for your monthly Meetup – VUB – Aula QD – 18:30.

Data Science in the Banking world

Bart Hamers has composed a splendid agenda for this meetup:

We will start at 18:30 with an activity update:

  • the 10-week applied data science bootcamp starting in October,
  • our visit to Strata London,
  • our new 400 m² office space and training centre,
  • our team winning 3rd prize in the epilepsy hackathon organized by UCB,
  • the success of our MOOC coaching activities,
  • our upcoming trip to California, and much more …
  • a call for candidates to set up a management team for next season
  • a call for presentations for the next Meetup, about privacy, security, etc.

Followed by presentations from:

• Introduction by Bart Hamers (Dexia): Data Science in Banking

• KUL: Bart Baesens / Véronique Van Vlasselaer – Gotch’all! Advanced Network Analysis for Detecting Groups of Fraud

• Euroclear: more details about the projects presented during the Data Innovation Summit

• ING: Data Science and (Advanced) Predictive Analytics @ ING

• Bart Hamers: Data Science Governance, my 6 principles

Open presentation:

• Belgian startup track: Presentation of Data Camp

• Any member is welcome to present a topic or share valuable insights at the end of each meetup.

Networking:

And our usual networking sessions will be held in the Kulturkaffee.

See you this Wednesday at the VUB !

Registrations:

  • The meetups of the Brussels Data Science Community are free for our members.
  • You can join our Linkedin Group here.
  • Registration is done on our meetup page here.

Data Science in the Banking world

Wednesday, May 20, 2015, 6:30 PM

VUB – Aula QD
Pleinlaan 2, B-1050 Brussels, BE


Free webinar on Analytics in a Big Data World by Bart Baesens.


A nice overview of how analytics and data science are used in a big data world.

Professor Baesens will be present at the Data Innovation Summit on March 26th in Brussels.

He will present his latest book about Big Data Science.

Join us: you can get your free full-day access pass when you answer the Data Innovation Survey 2015.

Enjoy the webinar …

Join our next event

Please register using this meetup page:

Summit: Data Innovation Summit – Made in Belgium

Thursday, Mar 26, 2015, 8:00 AM

AXA building
Boulevard du Souverain 25, Watermael-Boitsfort, BE


Speakers include Toon Vanagt, Laurent Fayet, Filip Maertens, Kris Peeters, Vincent Blondel, David Martens and Hans Constandt. The Data Innovation Summit in Brussels is a one-day conference gathering all the Belgian actors facilitating data innovation. It is an action-packed conference where more than 50 speakers will demonstrate what they do that helps us compete i…


New Survey and (Big) Data Governance Research by Andra Mertilos


Dear friends of the Brussels Data Science community, I would like to ask for your help with my master’s thesis research, which I launched recently (aside from my consultancy job, I am also an MBA student at KU Leuven).

Data governance research is ambiguous in the scientific community today, mostly due to the differences between the concepts which form the building blocks of a governance program: data and information, governance and management, IT and business labels, … As such, there exists no homogeneous definition, neither in the scientific community nor among practitioners, of what data governance really encompasses. Correctly positioning data governance in today’s landscape will allow for the integration of new technologies, concepts and phenomena. But for this positioning to take place, specifying a common data governance definition is crucial in determining and isolating the different elements which constitute the backbone of such programs. Identifying, defining and explaining the process layers, responsibilities and decision-making structures that come together and interact in governance topics allows for prioritizing and ranking the elements which constitute a data governance program. These layers in turn allow for tailoring to specific needs and requirements, such as the integration of new concepts and phenomena like big data technologies.

The purpose of my research is specifically to build a (big) data governance maturity model that holds for the Belgian financial sector. I chose the banking sector because of its size and complexity: such an intricate environment allows for building a larger, more comprehensive data governance model which could then be tailored to fit other sectors.

For this purpose, based on an extensive literature review, I have built a questionnaire, in the form of a maturity assessment, which evaluates different data governance aspects: MDM, enterprise architecture, technology, applications, big data initiatives, etc.

I would like to draw on your input, experience and expertise for the evaluation of this model in practice by inviting you to participate in my survey. A possible prerequisite would be some former experience in the banking/financial sector (in any form pertaining to data programs), but it is not an exclusion criterion, as I plan to use these answers to develop a standard capability maturity model for data governance. If you choose to participate, please bear in mind that you don’t have to be able to answer all sections (the questionnaire deals with multiple aspects of governance); your feedback on a single aspect is also of great value to this research.

The link to survey can be found here: https://qtrial2014az1.az1.qualtrics.com/jfe/form/SV_1B4wOzh3GVKTnoh

It takes an average of 10-15 minutes to complete the survey, and the link will be available until the beginning of April.

Feel free to contact me should you have any questions or remarks (at mertilosandra@gmail.com); any feedback is great feedback 🙂
Many thanks in advance for your participation.

About Andra Mertilos:

Andra Mertilos is an MBA student at KU Leuven, conducting this master’s thesis research alongside her consultancy work. She can be reached at mertilosandra@gmail.com.


Training – KU Leuven – Data Science in Practice – 5-6 February 2015

Data Science in Practice

INTRODUCTION

Modern information and communication technology is increasingly capable of collecting and generating large amounts of data that need to be analyzed to become useful or profitable. In fact, these amounts quickly become too large for immediate human understanding, leading to a situation in which “we are drowning in data but starved for knowledge”.

Data science represents an essential technology to transform such data into knowledge. It allows the automated discovery of interesting regularities or anomalies in large databases, thereby surpassing standard statistical summarizing. Typical tasks include the construction of predictive and descriptive models for classification, regression, clustering, associations, and probabilistic inference.
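As a small, hedged illustration of two of those task types, the snippet below fits a descriptive model (k-means clustering) and a predictive one (a decision-tree classifier) on a toy dataset bundled with scikit-learn; it is only meant to convey the flavour of the techniques the course covers, not its actual material.

```python
# Hedged sketch: one descriptive and one predictive modelling task.
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Descriptive: cluster the observations without using the labels.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("cluster sizes:", [int((clusters == k).sum()) for k in range(3)])

# Predictive: classify the observations and estimate out-of-sample accuracy.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
print("cross-validated accuracy:", cross_val_score(tree, X, y, cv=5).mean())
```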

The DTAI research group of the Department of Computer Science, KU Leuven, presents a course that provides a gentle introduction to data science for professionals who need to analyze data themselves, interpret results obtained using data science techniques, or give guidance to data analysts. The course introduces the principles, techniques and methodology of data science. It provides attendees with an overview of the wide variety of data science techniques available, insight into which techniques are useful for what kind of tasks, expertise with practical data science tools, and real-life case studies.

The target audience of this course consists of professionals who feel a need for a better understanding of data science: which tasks can be solved, which techniques can be used, and what their strengths and weaknesses are.

IMPORTANT DATES

Registration deadline: 20 January 2015
Course: 5-6 February 2015

Click here to register.

More info: http://dsip.cs.kuleuven.be/

 

 

Sneak preview – MOOC – Bart Baesens – Credit Risk Analytics

I had a nice lunch with Prof. Dr. Bart Baesens today at the MIM to discuss his recent book ‘Analytics in a Big Data World: The Essential Guide to Data Science and its Applications’.
One topic we discussed was knowledge transfer and certification.
Next to the recorded presentations already available on dataminingapps.com, the professor told me that his new course about Credit Risk Analytics will soon be released. Here, as a sneak preview, is the content of this course, which he has put together with SAS. The course will be available in mid-November 2014.

New e-learning course Credit Risk Analytics by professor Bart Baesens

The outline of the course is as follows:
Lesson 1: Introduction to Credit Scoring
Lesson 2: The Basel Capital Accords
Lesson 3: Preparing the Data for Credit Scoring
Lesson 4: Classification for Credit Scoring
Lesson 5: Measuring the Performance of Credit Scoring Classification Models
Lesson 6: Variable Selection for Classification
Lesson 7: Issues in Scorecard Construction
Lesson 8: Defining Default Ratings and Calibrating PD
Lesson 9: LGD Modeling
Lesson 10: EAD Modeling
Lesson 11: Validation of Credit Risk Models
Lesson 12: Low Default Portfolios
Lesson 13: Stress Testing
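As a flavour of how the PD, LGD and EAD pieces from Lessons 8-10 fit together, here is a minimal sketch of the standard expected-loss calculation (EL = PD × LGD × EAD). The numbers are invented for illustration and this is not material from the course itself.

```python
# Hedged sketch: expected loss from PD, LGD and EAD (illustrative numbers).
portfolio = [
    # (loan_id, PD,    LGD,  EAD in EUR)
    ("L001",   0.020, 0.45, 100_000),
    ("L002",   0.005, 0.30, 250_000),
    ("L003",   0.080, 0.60,  40_000),
]

total_el = 0.0
for loan_id, pd_, lgd, ead in portfolio:
    el = pd_ * lgd * ead           # expected loss per exposure
    total_el += el
    print(f"{loan_id}: EL = {el:,.0f} EUR")

print(f"Portfolio expected loss = {total_el:,.0f} EUR")
```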
You are invited to send an email to Bart.Baesens@gmail.com if you are interested in more information.