Launching The Dengue Hackathon

On October 11th, the diHub hosted the launch event for the hackathon, which takes place on November 25th and 26th. Each Tuesday leading up to the hackathon, you’re welcome to join our meetings at the diHub to discuss the data we’ve gathered and to prepare. You can learn more about our upcoming events on our meetup page.

We were lucky enough to have the following speakers present: Serge Masyn from Janssen (the pharmaceutical company of Johnson & Johnson), Dr. Guillermo Herrera-Taracena from Johnson & Johnson, Anne-Mieke Vandamme, a professor at KU Leuven, Daniel Balog, Stefan Pauwels, and Tom Crauwels from Luciad, Jeroen Dries from Vito, Guy Hendrickx from Avia-GIS, and Pierre Marchand from Teradata.

Annelies Baptist, bootcamp participant and project manager for the hackathon, opened the presentations by explaining the importance of our hackathon and of fighting the spread of dengue, and ended by introducing the rest of our speakers.


Serge Masyn, director at Janssen Global Public Health, presented Janssen’s three goals for the hackathon: to raise awareness about global public health, to raise awareness of dengue, and to create new insights into the spread of dengue and predictions of future outbreaks. A year ago, this initiative was only an idea, and Serge was pleased to see how much progress we’ve made toward making it a reality (here is a video from the March 2016 di-Summit, where Serge announced Janssen’s desire to sponsor what would become this very dengue hackathon).


Serge then introduced Dr. Guillermo Herrera-Taracena, the global clinical leader on infectious diseases and global public health for Johnson & Johnson. Guillermo is an engaging and enthusiastic speaker, and he made a point of emphasizing the importance of this work to global health at large. After the Ebola outbreak, Zika took its place in the public perception as the leading global health concern. Though dengue is a serious public health burden in its own right, Zika, Guillermo claimed, is a cousin, if not a brother, of the dengue virus, and both diseases are carried by the same species of mosquito. Whatever you do to understand Zika, you’ve done for dengue, and vice versa. If that isn’t a good enough reason to work on dengue, he said, he wasn’t sure what was.


Anne-Mieke Vandamme, a professor at KU Leuven and head of the Laboratory of Clinical and Epidemiological Virology, called in from Lisbon to give a talk about mapping epidemics. Using phylogenetic trees, scientists can reconstruct the origin and development of a virus outbreak. After her presentation, she introduced Daniel Balog, a senior software engineer at Luciad with whom she had previously collaborated. Daniel gave a demo using Luciad software showing an animation of the Ebola outbreak in Sierra Leone, Liberia, and Guinea.


Then, Stefan Pauwels and Tom Crauwels gave a demo of Luciad’s software products. Though most of their software is geared toward military and aviation use, the technology that makes it possible to visualize position updates every second for millions of points has applications well beyond those industries. For the hackathon, Luciad will offer free use of their software and will also provide a training workshop in preparation for the event.

Tom Crauwels

Stefan Pauwels

Jeroen Dries from Vito then discussed how satellite imagery can be used in the hackathon to fight dengue. Vito operates a Belgian satellite that takes daily images, which are combined into a global time series analysis of how an area has been evolving. They’ve built an application around these time series that includes meteorological data for each country, which is of particular importance for the hackathon. For this event, Vito will provide us with a cloud platform that has access to a Hadoop cluster for processing their satellite data.


Guy Hendrickx from Avia-GIS presented their research on dengue, including their mapping of the tiger mosquito. In the ’90s, Guy was one of the first to use satellite data to model tsetse fly distribution and the diseases they transmit. In 2010, for the European Centre for Disease Prevention and Control, Avia-GIS began developing a database of mosquitoes, ticks, and sandflies across Europe, producing maps of these species every three months. Avia-GIS is also generously providing free use of these databases for the hackathon.


Finally, Pierre Marchand from Teradata presented. Put in the unfortunate position of being the last barrier between a room full of hungry people and their pizza, he kept his presentation quick. Teradata will be providing free use of their Aster platform for storing and modeling the data, and will provide training on the platform in the weeks leading up to the hackathon.


And, at the end, there was pizza, beer, and networking.


Again, we’d like to extend an enormous thank you to the speakers at the event, and to the organizations involved for their previous and ongoing support. You can view pictures of the event on our Facebook page and videos of the presentations on our YouTube channel.

OCT 20 – FREE Meetup about Process Mining @VUBrussel


18:30 Update on the activities of the Data Science Community

Confirmed speakers:

19:00 Jochen De Weerdt (KU Leuven) : Process mining – Making data science actionable

19:30 Mieke Jans (UHasselt): The art of building valuable event logs out of relational databases

20:00 Pieter Dyserinck (ING) & Pieter Van Bouwel (Python Predictions): Process mining, the road to a superior customer experience

20:30 Open discussion and flash presentations. Startups welcome.

20:40 Networking and drinks @ ‘t Complex situated above the swimming pool

Reserve your seat here

Data Science Bootcamp: Week 2

My name’s Alexander Chituc, and I’ll be your foreign correspondent in Brussels, regularly reporting on the diHub and the data science community here in Belgium. I’m an American, I studied philosophy at Yale, and I’m one of the seventeen boot-campers for the di-Academy.

We started the second week of the Data Science bootcamp developing some more practical skills. The first day was devoted to learning about building predictive models using R with Nele Verbiest, a Senior Analyst from Python Predictions. The second day, we worked with Xander Steenbrugge, a data analyst from Datatonic, learning about Data Visualization using Tableau Software.

Day 1: Predictive modeling

Nele told us to think of predictive modeling as the use of all available information to predict future events to optimize decision making. Just making predictions isn’t enough, she said, if there’s no action to take.

The analogy used throughout the training was that developing a predictive model was like cooking. We can think of cooking for a restaurant as having five general steps: take the order, prepare the ingredients, determine the proportion of ingredients to use and how to cook them, taste and approve the dish, and finally, serve the dish and check in with the customer. We can translate this into five analogous steps for preparing a predictive model: project definition, data preparation, model building, model validation, and model usage.


We were given a lab in predictive modeling in R, providing hands-on experience with the methodology and techniques involved. A sample dataset was provided, and the lab walked us step by step through developing a model to identify the predictors of whether a customer will churn (for those outside the biz, a churn rate is the rate at which individuals leave a community over time; in this case, that means canceling a subscription with a telecom provider). The lab took us through all five steps of the process. Along the way we cleaned data, replaced outliers, went over the basics of model building, discussed the danger of over-fitting a model (the analogy here was recording a concert: you want to record the music, not the sound of the audience, the conductor’s baton, or pages turning), and learned how to simplify a model to prevent it. We also covered decision trees, linear regression, logistic regression, variable selection, and how to evaluate a model.
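To give a flavor of what that looked like, here is a minimal sketch of the same workflow in R. The `customers` data frame, its columns, and all the numbers are invented for illustration; the actual lab used a prepared telecom dataset and went into far more depth.

```r
# Minimal churn-modeling sketch; the data here are made up for illustration.
set.seed(42)

# Toy data standing in for the telecom dataset used in the lab
customers <- data.frame(
  tenure_months = rpois(1000, 24),
  monthly_bill  = rnorm(1000, mean = 50, sd = 15),
  churn         = rbinom(1000, 1, 0.2)
)

# Data preparation: cap outliers at the 1st and 99th percentiles
caps <- quantile(customers$monthly_bill, c(0.01, 0.99))
customers$monthly_bill <- pmin(pmax(customers$monthly_bill, caps[1]), caps[2])

# Hold out a test set so the model can be validated on unseen customers
idx   <- sample(nrow(customers), 0.7 * nrow(customers))
train <- customers[idx, ]
test  <- customers[-idx, ]

# Model building: logistic regression of churn on the predictors
model <- glm(churn ~ tenure_months + monthly_bill,
             data = train, family = binomial)

# Model usage: predicted churn probabilities for the holdout customers
test$p_churn <- predict(model, newdata = test, type = "response")

# Model validation: accuracy of a simple 0.5 cutoff against actual churn
mean((test$p_churn > 0.5) == test$churn)
```

A real project would compare several model types (the decision trees we covered, for instance, via the rpart package) and use richer evaluation than raw accuracy, but the five steps are all visible here.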


There’s obviously a lot more detail I could get into here, but if I had to write about all of it, I’d never get the chance to write about day two.

Day 2: Data Visualization using Tableau Software

The second day, we jumped straight into how to use Tableau. Considering just how much the program can do, I was surprised by how intuitive and easy to use it was. Managing data is extremely simple, and to create a graph you simply set the parameters, select the graph type, assign data to the columns and rows, set any filters you might want, and choose which data you want represented visually by color, size, or label.

Xander walked us through how to create a dashboard demonstrating the sales of a sample superstore geographically, showing which quarters and departments had the most sales, as well as the average shipping delay for each category and subcategory.

After lunch, we were given a dataset and an image of a dashboard, and asked to recreate it ourselves in Tableau. After learning the basics with Xander, it was nice to be tossed into the pool to get some real practice swimming:


If you’re interested in seeing more of what Tableau software is capable of, here’s an example of an interactive graph from their website, where you can explore Global Nuclear Energy Use. You can explore the entire gallery here.

Thanks again to Nele Verbiest and Xander Steenbrugge for being such great teachers, and expect a post on week 3 soon.

Bayes in Action

During my coursework in Philosophy, we devoted a lot of time to discussing Bayes’ theorem. Two fields find it particularly important: the philosophy of science, and epistemology, the study of what knowledge is. The theorem is considered a pillar of rational thinking and of increasing our understanding of the world, and it’s fundamental for evaluating claims given the evidence we have. Bayes’ theorem looks like this:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Bayes’ Theorem

To put it simply, Bayes’ theorem describes the probability of a hypothesis or event based on relevant conditions or evidence. The equation might look complex, but it’s actually quite easy to understand after a little translation. ‘P’ stands for ‘the probability that’, ‘|’ is a symbol meaning something like ‘given that’, ‘A’ stands for a hypothesis, and ‘B’ stands for an event or evidence that might affect the likelihood of the hypothesis. Understood this way, the equation reads: the probability of a hypothesis given some evidence is equal to the probability of that evidence given the hypothesis, multiplied by the probability of the hypothesis, all divided by the probability of that evidence.

An example can clear things up. Let’s say you check WebMD because you have a nasty cough. You see that a nasty cough is a symptom of cancer, and that the likelihood of having this cough if you have cancer is very, very high. If you had cancer, this nasty cough is exactly what you would expect to see, so it must be pretty probable that you have cancer, and, like most people who visit WebMD, you walk away convinced that you’re dying. Bayes’ theorem helps us see why thinking this way is a mistake.

Let’s fill in the equation with some numbers we made up. Let’s assume the probability that you have the cough given that you have cancer is very high: 95%. But you’re a young and healthy person, and at your age only one in a hundred thousand people get this kind of cancer. And let’s assume having a nasty cough is pretty common (it’s cold season, after all), so one in a hundred people have a nasty cough. Filling it in, we get this:
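$$P(\text{cancer} \mid \text{cough}) = \frac{P(\text{cough} \mid \text{cancer}) \times P(\text{cancer})}{P(\text{cough})} = \frac{0.95 \times 0.00001}{0.01}$$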


So, if we do the math, we come up with your probability of having cancer given that you have a nasty cough: 0.00095, a pretty small chance.

The application of Bayes’ theorem in the field of medicine is extremely useful, especially when considering the accuracy of tests and the likelihood of false positives or false negatives, and there are countless other practical applications for it.
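To make that concrete, here is a small R sketch that applies Bayes’ theorem to a diagnostic test. The function name and the test characteristics below are invented for illustration, not taken from any particular test:

```r
# Hypothetical helper: probability of disease given a positive test result.
# sensitivity = P(positive | disease), specificity = P(negative | no disease),
# prevalence  = P(disease). All names and numbers are illustrative.
posterior_given_positive <- function(sensitivity, specificity, prevalence) {
  # P(positive) via the law of total probability:
  # true positives plus false positives
  p_positive <- sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
  # Bayes' theorem: P(disease | positive)
  sensitivity * prevalence / p_positive
}

# A test that is 95% sensitive and 99% specific, for a 1-in-100,000 disease
posterior_given_positive(0.95, 0.99, 0.00001)
# ~0.00095: even after a positive result, the disease remains very unlikely
```

The false positives in the denominator are what keep the posterior small: when a disease is rare, almost every positive result comes from the healthy majority.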

Bayes’ theorem is quite simple, but its application to the field of statistics, or Bayesian Statistics, is quite complex, and it’s an important part of how Google can filter search results for you, how your email can detect spam, and how Nate Silver could accurately predict the 2008 presidential election in the United States.
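As an illustration of the spam-filtering idea, here is a toy naive Bayes classifier in R. The two tiny “training corpora” and the example messages are invented, and a real filter would use far more data and features, but the Bayesian core, a prior updated by per-word likelihoods, is the same:

```r
# Toy naive Bayes spam filter; all messages here are invented for illustration.
# Each message is a character vector of words; we score a new message by
# comparing per-word likelihoods under "spam" and "ham" (legitimate mail).

spam <- list(c("win", "money", "now"), c("free", "money", "offer"))
ham  <- list(c("meeting", "tomorrow", "noon"), c("free", "for", "lunch"))

vocab <- unique(unlist(c(spam, ham)))

# Word likelihoods with add-one (Laplace) smoothing, so a word unseen in
# one class doesn't zero out the whole product
word_probs <- function(messages) {
  counts <- table(factor(unlist(messages), levels = vocab))
  (counts + 1) / (sum(counts) + length(vocab))
}
p_word_spam <- word_probs(spam)
p_word_ham  <- word_probs(ham)

# Posterior probability of spam for a new message, with a 50/50 prior
spam_score <- function(words) {
  words <- intersect(words, vocab)  # ignore out-of-vocabulary words
  odds <- prod(p_word_spam[words]) / prod(p_word_ham[words])
  odds / (1 + odds)  # convert odds to a probability
}

spam_score(c("free", "money"))       # 0.75: leans spam
spam_score(c("meeting", "tomorrow")) # 0.20: leans legitimate
```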

Romke Bontekoe, who holds a PhD in astronomy, typically offers his Bayes in Action course in Amsterdam, but on October 20th he’ll be offering the training here at the European Data Innovation Hub. The training is geared toward managers and researchers who want to understand Bayesian Statistics and its application, but the course is open to anybody interested.

If you’d like to learn more about Bayes’ theorem, you can look at this video that I animated for Wireless Philosophy, and if you want to learn more about Bayesian Statistics and its application, register for the training on the di-academy’s website.


Mons OCT 24 – Big Data and Privacy – Vincent Blondel


A prelude to Big Data Week 2016:

A major lecture by Vincent Blondel, rector of UCL

Big Data and Privacy

Monday, October 24th at 7 p.m., at the Mundaneum in Mons

Introduction by Philippe Busquin, Minister of State and European Commissioner for Scientific Research from 1999 to 2004

“The Internet promotes our freedoms and is a source of extraordinary possibilities. At the same time, information and communication technologies create major risks for our freedoms and the protection of our privacy. Surveillance in all its forms has become commonplace, and the major Internet players and governments make full use of it. Snowden’s revelations have opened many eyes. Yet the very technologies that make it possible to spy on us can also serve to protect us. But where is the balance to be found?”

(Académie Royale de Belgique)

Vincent Blondel has been rector of the Université catholique de Louvain since September 1st, 2014. His research lies at the interface of mathematics and information technology. He earned a Master of Science at Imperial College London and completed postdoctoral work in Oxford, Stockholm, and Paris. He has been a visiting professor at MIT as well as a Fulbright Scholar, and has been invited to speak at numerous institutions, including Stanford, Harvard, Princeton, and Cambridge. He has also collaborated on many cross-disciplinary projects at UCL.

The debate following the lecture will be moderated by André Blavier (Agence du Numérique).

A lecture organized by the Mundaneum in collaboration with UCL and Digital Wallonia

The Mundaneum is a partner of UCL’s 2016-2017 academic year, which is placed under the theme of the “Scientific Adventure.”

Address: Mundaneum, rue de Nimy 76, 7000 Mons (Belgium)

Registration requested: or 065/31.53.43



Announcing the Launch Event for the Dengue Hackathon

I’m excited to announce the launch event for the diHack’s Dengue Hackathon, at 6 p.m. on Tuesday, October 11th at the European Data Innovation Hub. We’ll present the dengue challenge, give examples of how data science can help stop the spread of dengue, provide information about upcoming events, and leave time for networking. You can view the event on meetup here.


There are over 390 million cases of Dengue fever every year, and half of the world’s population is currently at risk of contracting the Dengue virus. We believe that if we get enough data and data scientists together, we can make a difference in stopping the disease’s spread.


You can check out our website here, and everyone is invited to the launch event. Don’t forget to share your data or ideas and sign up for the hackathon. 

Data Science Boot Camp: Week One

My name’s Alexander Chituc, and I’ll be your foreign correspondent in Brussels, regularly reporting on the diHub and the data science community here in Belgium. I’m an American, I studied philosophy at Yale, and I’m one of the seventeen boot-campers for the di-Academy.

It might be an unconventional way to start a data science bootcamp, but the first week was devoted to working on our communication skills with Martine George, PhD, professor of Management Practice at the Solvay Brussels School of Economics and Management. After nearly four years as Director and Head of Marketing Analytics and Research at BNP Paribas Fortis, three years as a database analysis manager, and five as a lecturer on business analytics, Martine was now teaching us about our personality types and how to communicate effectively with each other, with upper management, and with coworkers who may have drastically different communication styles.


The main objectives of our training were to become aware of our own communication styles, to learn to adjust presentations of analytics results to different audiences, and to learn how to convince clients of the importance of our results.

We learned our communication styles using the Process Communication Model, a tool that “enables you to understand, motivate, and communicate effectively with others.” On the first day, we received our profiles, determined by the results of a questionnaire we had taken the week before.

An example of a personality profile

The model divides people into six “base” personalities, with one “phase.” My own “Structure of Personality” had a base of Thinker (organized, responsible, logical), followed by Persister (dedicated, observant, conscientious), Rebel (spontaneous, creative, playful), Imaginer (calm, imaginative, reflective), Promoter (adaptable, persuasive, charming), and Harmonizer (compassionate, sensitive, warm), in that order. (I won’t get into too much detail about the different types, except to share the fun fact that in earlier versions of the model, my base personality type, Thinker, was named “Workaholic.” If you’re interested in learning more, you can visit the website.)

The six personality types

The second day, we focused on communicating with managers and on giving presentations, taking into account what we had learned the first day.

One important aspect of this was writing a good one-pager: something a busy executive can quickly read to understand what exactly you learned in your analysis, how you did it, and what to do next. We went over some example one-pagers and discussed where they went wrong and how to improve them: making sure the business question is clear, making the conclusion explicit with an actionable next step, and removing unnecessary information when explaining the method. No matter how exciting or interesting you might find the methodology of your report, executives and upper management typically don’t.

We also spent a good portion of the second day learning about giving presentations, and how to adapt a presentation when your allotted time changes. By focusing on governing thoughts, storyboarding, and logically organizing your ideas, you can turn a thirty-minute presentation into a five-minute one if the need arises, and vice versa. After some work with Martine structuring the key ideas she wanted to express, Annelies gave a great five-minute pitch for an app she wanted to build using data science, and she could just as easily have turned it into a thirty-minute presentation.

The biggest takeaway from our training was to target the right group with the right message, and to cater your message not to your own communication style, but to that of your audience.

It was an unusual way to start a bootcamp, but communication is an often neglected skill for data scientists, and beginning this way put a real emphasis on its importance. Next week we would move on to Predictive Modeling in R, and Data Visualization and Storytelling.

Summer Camps and Leader Boards

My name’s Alexander Chituc, and I’ll be your foreign correspondent in Brussels, regularly reporting on the diHub and the data science community here in Belgium. I’m an American, I studied philosophy at Yale, and I’m one of the seventeen boot-campers for the di-Academy.


Of the hundred or so people who applied for the diHub’s Data Science Boot Camp, only 40 were selected for the five-week Summer Coding Camp. Most of us had little to no experience coding in Python and R (or, in my case, coding at all), and the Summer Coding Camp was to serve two purposes: first, to narrow this pool of applicants down to the twelve who would eventually be selected for the boot camp, and second, to bring us up to speed as quickly as possible on the coding skills we would need for our training to become data scientists.

I had already expected that there would be a lot to catch up on. I have a bachelor’s degree in philosophy, and my elective coursework was in psychology and writing. My coding experience consisted of one semester in college, seven years ago, in a class on object-oriented programming in Java. Suddenly, I found myself in a room with a couple of Master’s graduates in Statistics, several in Business Engineering, a few digital marketers, and a lot of data enthusiasts with backgrounds in computer science, all competing for twelve spots.

The diHub was open to all of us as a place to study during the camp, with coaches on hand to answer any questions we might have. Each week of the camp covered a different topic: the first week SAS, the second Python, the third R, the fourth statistics, and the fifth SQL. When we began the first week, I was relieved to see just about everybody struggle as much as I did. This didn’t surprise me: all training in SAS comes directly from the company, so regardless of your background, it was only natural that none of us knew how to code in it. It was by far the most intensive week of summer camp, and in the following weeks many of us were still working on it, preparing for the certification exam on September 16th, which only half of us passed (I’ll leave it up to you to guess whether I was one of them).

The second, third, and fourth weeks, we learned using Data Camp’s platform. We were assigned 17 courses to complete on their website: three in Python (at the time, their Python content was admittedly lacking, but they’ve since added several more Python courses), seven in R, and seven in statistics. During the fourth week, we were given the option of following the courses in R on Data Camp or instead doing a separate module in SAS. As far as I know, everyone chose to do statistics in R. Work in SAS, after all, didn’t count for the leader board.

I should explain the leader board. It began as a joke: Nele announced it one Friday afternoon on Slack. After finishing the day’s coding, we were going to celebrate the completion of our second week, and before the celebration, Nele would announce our leader board. Suddenly, all of us became aware of a feature available to groups on Data Camp: a leader board that ranks the members of a group by the experience earned completing exercises in their courses.

I noticed that I was, at the time, ranked number twelve, and I was determined to make it into the top ten by the time the day was over. Between exercises, I compulsively checked the status of the leader board, figuring out just how many exercises I had to complete before I would pass number eleven, which I did, and then number ten, which I did. That afternoon, Nele announced the leader board, and on the board were written only six scores. The top six, in descending order, were Liza, Goran, Olivier, Agustina, Ruben, and Victor. You can imagine my disappointment.


The leader board was a source of healthy competition, granting bragging rights and giving us a way to measure ourselves against each other and judge our prospects of being selected. It became a little more serious, however, when it was announced that there would be a job fair on September 9th, where we would all present ourselves to companies looking to hire data scientists, and finding a company to sponsor you would guarantee your seat in the boot camp. The order in which we presented was determined by our leader board rankings.

It was an intense five weeks, and we all learned much more than we could have on our own. I’ll have to devote an entire post to the job fair later on, but I’ll leave off this one by thanking all of the great coaches we had during the summer camp: Elie Jesuran from Keyrus (Python); Dominique De Beul, Eric Lecoutre, and Pieterjan Geens from Business&Decision (R); and Erwin Gurickx from Teradata (SQL).

Job @ Medialaan: Data Quality expert

Are you passionate about Data Crunching and keen on having business impact?

We are looking for a talented Data Quality Expert!

You will be part of our dynamic Research & Marketing team and report to the CRM manager of our central unit.

The Central CRM unit is a department that gives strategic and operational advice on how to build a trustworthy two-way relationship with our customers. Our unit supports a wide range of internal users, helping them make important business decisions and steer the strategy of the company.

Your Challenge:

Set up a Data Governance program, together with all stakeholders (IT, Marketing, Sales…), that establishes and deploys the roles, responsibilities, policies, and procedures governing the acquisition, maintenance, dissemination, and disposition of our data.

Your main tasks are:

  • Identify, assess, fix, document, and communicate potential quality issues in the way data are collected, stored, processed, and used.
  • Understand the data, look for discrepancies, inconsistencies, and redundancy, and take steps to resolve data deficiencies.
  • Take responsibility for enriching our data to create added value.
  • Continuously monitor output data quality through KPI dashboards and reports, and make recommendations based on the outcome.
  • Lead all data quality related projects, including writing business requirements and managing scope, budget, and risks.
  • Help the organization understand the value and impact of good data quality.


Additionally, you will:

  • Set up, together with stakeholders, the business and validation rules that govern our data.
  • Take responsibility for the business processes that create or change data.
  • Automate solutions for data quality issues across all sorts of data (customer data, mobile data, online data, financial data, etc.).
  • Design, implement, and test (automated) DQ processes, such as data cleansing operations or deduplication.
  • Contribute to the continuous improvement of our Data Governance methodology to keep it tuned to technological evolution (unstructured data, cloud, data privacy, …).
  • Work in a cross-functional team, in close collaboration with our IT department.


Your profile: 

  • You have a passion for information management and are challenged by the data struggle many companies are facing at the moment. You have good analytical skills and can communicate in both business and technical language.
  • You have at least 3 to 5 years’ experience as an analyst in a Data Quality and/or Business Intelligence environment.
  • You possess a basic understanding of data models and architecture, and of data governance and data management concepts, approaches, methodologies, and tools.
  • You are good at translating analytical results into business concepts.
  • You have strong programming skills, preferably in SQL.
  • Knowledge of Tableau is a plus.
  • You have excellent communication skills and like to take initiative.
  • You like to work independently, take ownership, and hold yourself accountable.
  • You have a solid understanding of the business and the importance of Data Quality.
  • You are an objective and diplomatic person who likes to work in a team.
  • You are solution-minded.


Intrigued? Send us your motivation letter and resume. Maybe you can join our new CRM unit and build a great data story at MEDIALAAN.