Top 5 presentations of DIS2015 (Data Innovation Summit)

Dear friends,

March 26th will go down as a milestone for our community.

We had 68 speakers at our Data Innovation Summit and over 500 attendees. Check out our new DataScience video channel with all the presentations. Pictures of the event are available on our Facebook page. Over 600 people replied to the Data Science survey. Read Ward’s analysis of the satisfaction survey.

Here are the top 5 presentations:

  • Kris Peeters: The people aspect of Data Science (view)
  • Elena Tsiporkova: Data Innovation Lab (view)
  • Toon Vanagt: How Open Data allows faster innovation (view)
  • Hans Constandt: The disruptive Role of startups in Data Innovation (view)
  • Steven Beeckman: Government and Data (view)

Thank you for participating in the Data Science Survey. The results remain available to all participants. Here are the analyses done by our experts:

  • Data Innovation Summit Dashboard by Dieter and Nicholas (view)
  • Data Innovation Survey results – In Neo4j by Rik (view)
  • The question “Are all Data Scientists nerds?” – By Nele (view)
  • Data Innovation Survey 2015 – preliminary analysis by Ward (view)

The plan is to bundle these in an e-book; if you want to be part of this book, you only need to submit your analysis. We are still looking for a team that wants to link this survey with another existing survey. The winners for the best analysis will be announced during our Banking meetup on May 20th.

Thank you all for making this summit a success.

Philippe Van Impe

Join us on our next meetup:

Brussels Data Science Meetup

Brussels, BE
1,161 Business & Data Science pro’s

The Brussels Data Science Community: Mission: Our mission is to educate, inspire and empower scholars and professionals to apply data sciences to address humanity’s grand cha…

Next Meetup

Data for Good & Kaggle competitions

Thursday, Apr 23, 2015, 6:30 PM
86 Attending


New Links:


Three Personas of a Data Scientist

I wish I were the Artist, but I’m afraid I’ll have to settle for the last option.

Big Data Noir

In light of the recent news of my article being chosen to be published (among other great pieces by subject matter experts) in EMC Corporation’s Proven Professional Knowledge Sharing Competition, I have decided to share a piece of the article now.

Lately I’ve been reading a lot about “what is a Data Scientist” and after much research, I wanted to showcase what qualities an individual must possess to don the white coat!

Data Scientists are unique individuals who push the boundaries of machine and human learning in an effort to discover what others cannot see. A Data Scientist is a balanced role and requires someone with the skills to organize, develop, create and share their work among their colleagues and upper management. This section consists of detailed explanations of three personas that every Data Scientist possesses: the Nerd, the Artist, and the Business Professional (Dutra 2015).


Starting free hands-on MOOC Coaching!

Great initiative by Hendrik and Ward. Over the coming months these volunteers will help you complete the Stanford MOOC on Machine Learning taught by Andrew Ng (Stanford University).

The Data Science Community


Hi everyone!

We’re excited to announce our coaching in the most popular Massive Open Online Course: Machine Learning by Andrew Ng! We’ve got some beautiful new office space thanks to our buddies at AXA Belgium, where we’ll be holding meetups to discuss and work through the course materials. We’ll start on Monday 4th of May around 7 pm, so keep an eye on our various communication channels! Here’s the Meetup.com event with address details, a Calendar file and so on.

Andrew Ng’s ‘Machine Learning’ was one of the first courses on Coursera and has grown enormously in popularity, and rightfully so! This course covers ‘how to make computers act without explicitly programming them’, as Andrew puts it, by explaining concepts like multivariate regression, neural networks, support vector machines and much more. This information is invaluable for many branches of data science and gives a good look at what’s ahead for those willing to get their hands dirty. You don’t have to be an expert programmer for it either. Everything Andrew does is in Octave, but to make our learning experience even more exciting we’ll be repeating the Octave exercises in R, a very common language among data and statistics practitioners. R is a great language to learn if you’re planning to take further (online) courses in data science.

With our group, you’ll be guided through the concepts and assignments of this course, gaining valuable experience in what Machine Learning is and what can be accomplished with it. We’ll also give some more background on the topics Ng talks about, so that everyone can keep their head above water.

And, of course, it’s free! We want to stimulate a learning environment and attract enthusiasts on all levels, so feel free to join in. After our first meetup we’ll hook a camera up with a Google Hangouts group so that you can follow online.

The first in-person meeting will take place on Monday May 4th; from then on we’ll get together every Thursday (except on main meetup days, about once per month). Edward and I will coach, though enthusiasts are always welcome to help out or hang around.

See you there!

Job – Predicube – Junior Data Scientist – Antwerp

predicube

Experience: Entry Level
Job Function: Information Technology
Employment Type: Full-Time
Description:
We are looking for a junior data scientist who is interested in learning about big data analytics in a cloud-based environment and who is eager to collaborate with our CTO in a growing and exciting startup. We offer a dynamic job on the 19th floor of the KBC tower in Antwerp, with the possibility of joining the employee stock option program at a later stage.

If you have good technical skills and knowledge of data science, and you are interested in working on a system that analyzes 1 billion records a day to build dozens of predictive models, you are our man or woman!
Desired Skills:
Technology: Linux, Python, PHP
Affection for data & numbers
Ethical reflex concerning privacy-friendly analytics
Also a plus: expertise in Hadoop, Pig, MongoDB

Apply:

Make sure that you are a member of the Brussels Data Science Community LinkedIn group before you apply. Join here.

Please note that we also manage other vacancies that are not public; if you want us to put you in contact with them too, just send your CV to datasciencebe@gmail.com.

Here is the link to the original job ad.

Please Apply Online

Data Innovation Summit – Satisfaction Survey

During the Data Innovation Summit, we asked you to fill in a short survey. Many among you were kind enough to do so. Below is a brief presentation of the results.

General

There were 85 responses; since it was a short survey, most respondents went through to the end, as shown in the bar graph below. There is something of a ‘dip’ at the question asking which Brussels Data Science Meetup activities you would attend; probably quite a few visitors came specifically for the Summit and don’t attend the regular monthly meetings.

Number of responses to individual questions/question groups


Where were the responses from?

SurveyMonkey reports the IP addresses from which the responses were made. These were used to find an approximate latitude/longitude using FreeGeoIP (http://freegeoip.net/), and these locations were plotted on the Google map below.

Locations from where the surveys were filled


The vast majority of surveys were filled in from Brussels, with some more coming from larger cities in Flanders. Notable is the lack of responses from the south of the country, which is worth further investigation.

Date and time the Survey was filled in

Another piece of information we get from SurveyMonkey is the date and time of day the survey was filled in. Below is the graph showing the time of day.

Time of day the survey was filled


The plot above has the x axis in GMT – one hour earlier than time on the clock in Brussels during the period of the survey. Still, we agree with Nele in her analysis of the main Survey that Data Scientists aren’t really that nocturnal (except for some, of course, as the time of posting this analysis might tell you).
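The one-hour shift matters when reading a time-of-day plot: a response logged at 23:40 GMT was actually made after midnight in Brussels. A minimal Python sketch of that conversion follows (the original analysis was done in R). It assumes naive timestamps expressed in GMT, uses hypothetical sample times rather than survey data, and hard-codes a UTC+1 offset, which only holds before Belgium switched to summer time on 29 March 2015.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: Brussels was on CET (UTC+1) while the responses came in.
# This fixed offset is only valid before the switch to summer time
# (CEST, UTC+2) on 29 March 2015.
BRUSSELS_CET = timezone(timedelta(hours=1), "CET")


def local_hour_histogram(gmt_timestamps):
    """Count responses per hour of day on the Brussels clock.

    gmt_timestamps: naive datetimes expressed in GMT/UTC (assumed input format).
    """
    counts = Counter()
    for ts in gmt_timestamps:
        local = ts.replace(tzinfo=timezone.utc).astimezone(BRUSSELS_CET)
        counts[local.hour] += 1
    return counts


# Hypothetical timestamps, not actual survey data.
sample = [
    datetime(2015, 3, 24, 22, 15),  # 23:15 in Brussels
    datetime(2015, 3, 24, 23, 40),  # 00:40 the next day in Brussels
    datetime(2015, 3, 25, 8, 5),    # 09:05 in Brussels
]
print(sorted(local_hour_histogram(sample).items()))
# [(0, 1), (9, 1), (23, 1)]
```

For timestamps spanning the switch to summer time, a named time zone such as `zoneinfo.ZoneInfo("Europe/Brussels")` (Python 3.9+) would handle the offset change automatically instead of a fixed offset.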

Preferred presentations

The next three questions asked which presentations you liked best. The answers to these three questions were pooled, and the pooled votes counted. Many presenters received at least one vote, so the distribution had a very long tail to the right. Here are the top five, with the number of votes they received:

  • Kris Peeters: The people aspect of Data Science (16)
  • Elena Tsiporkova: Data Innovation Lab (15)
  • Toon Vanagt: How Open Data allows faster innovation (14)
  • Hans Constandt: The disruptive Role of startups in Data Innovation (13)
  • Steven Beeckman: Government and Data (11)
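Pooling amounts to treating a mention in any of the three questions as one vote for that presenter, then counting per presenter. A minimal Python sketch, assuming each question’s answers come as a list of presenter names with empty strings for skipped answers; the speaker labels and answers are hypothetical placeholders, not the actual responses.

```python
from collections import Counter


def pool_votes(*questions):
    """Pool the answers of several questions and count the votes per presenter.

    Every mention counts as one vote, regardless of which question it came from.
    Empty strings (skipped answers) are ignored.
    """
    pooled = Counter()
    for answers in questions:
        pooled.update(name for name in answers if name)
    return pooled


# Hypothetical answers to the three "best presentation" questions.
q1 = ["Speaker A", "Speaker B", "Speaker A", ""]
q2 = ["Speaker C", "Speaker A", "Speaker D", "Speaker B"]
q3 = ["Speaker E", "", "Speaker C", "Speaker A"]

print(pool_votes(q1, q2, q3).most_common(3))
# [('Speaker A', 4), ('Speaker B', 2), ('Speaker C', 2)]
```

This weights all three questions equally, which is what a plain pooled count implies; if the questions were meant as ranked first, second and third choices, a weighted count would be an alternative.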

In the open-answer part of this question, we asked you for suggestions for presentations that were not on the programme, but that you would have liked. We received very useful suggestions. Several people would have liked more technical talks, others asked for more use cases.

Format of the presentations

There was no clear ‘winner’ in the question about your preferred presentation format. The plot below shows a series of box plots, one for the preference scores of each format. The thick line in a box plot marks the median; 12-minute presentations were slightly more popular than the others. The ‘box’ represents the interquartile range: its lower edge is the first quartile (25th percentile), its upper edge the third quartile (75th percentile). Clearly the ‘Ignite’ format has a very wide spread: some people liked it best, others least.

preferences for the different formats of presentations


In the open-ended part of this question, several people commented that there were just too many Ignite presentations, which made for a very fast-paced day in which it was difficult to stay focused. There were several suggestions to include at least some longer, in-depth presentations of 30 minutes. As an alternative or complement to Ignite presentations, some suggested ‘posters’, a very popular format at scientific conferences, in which authors do not present orally but put their ideas on a single A0 poster. These posters are then on display throughout the meeting and can be viewed during the breaks; usually there is at least one extended break in which people have time to look at them.
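The median and interquartile range used in the format box plots can be reproduced with Python’s standard `statistics` module. The sketch below passes `method="inclusive"`, which interpolates like R’s default `quantile()` (type 7) and so, to our understanding, like ggplot2’s box plots. The scores are hypothetical, chosen only to contrast a format with a narrow spread and one with a wide spread such as Ignite.

```python
import statistics


def box_stats(scores):
    """Return the quartiles and interquartile range that a box plot draws.

    method="inclusive" interpolates like R's default quantile() (type 7).
    statistics.quantiles requires Python 3.8+.
    """
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


# Hypothetical preference scores (1 = liked least, 5 = liked best).
narrow = [3, 3, 4, 4, 4, 3, 4, 3]
wide = [1, 5, 1, 5, 2, 5, 1, 4]

print(box_stats(narrow))  # {'q1': 3.0, 'median': 3.5, 'q3': 4.0, 'iqr': 1.0}
print(box_stats(wide))    # {'q1': 1.0, 'median': 3.0, 'q3': 5.0, 'iqr': 4.0}
```

Both sets have similar medians, but the wide one has four times the interquartile range: the box-plot signature of a format that some people like best and others least. Whiskers and outliers are not computed here.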

Logistics

Both venue and location scored very high, as is apparent from the series of box plots below. Unfortunately we failed to include a possible response ‘Didn’t use/Didn’t take part’, so it’s difficult to interpret the lower scores. For example, many people arrived after breakfast; they may well have given a ‘neutral’ score to this part of the event, thus artificially lowering the score.

Scores on appreciation of different logistics aspects


Participation

Out of the 85 respondents, 60 stated that they were passive members and 19 active members, with the rest either not responding or stating explicitly that they were not members of the community. Many did expect to participate in one or several of the planned events, as shown in the bar graph below.

Number of people planning to attend specific events


Again, we see a fairly large proportion of respondents stating that they would not attend any of the monthly activities, as shown in the bar graph below. These people apparently came specifically for the Summit.

Number of people planning to attend 0, 1... 6 (=all) events


This is somewhat in contrast with the next bar graph: most people stay in touch through the Meetup site, though the pages specifically about the Data Innovation Summit came a close second.

Number of people per communication channel


Thanks again!

Once again, thanks to all those who took the time to respond. Your answers will certainly make it easier to organise the next Data Innovation Summit. Especially the suggestions were very valuable, and will certainly be taken into account when planning for a possible future installment.

On a technical note: the analysis was done in R; all graphs were created with the ggplot2 package. There is a Markdown document that creates even more graphs, but unfortunately it can’t be meaningfully published on this blog site. If you are interested, contact me at evberghe@gmail.com.

Edward Vanden Berghe

Data Innovation Survey remains open

We have now collected over 580 responses to the Data Innovation Survey 2015, which was launched just before our Summit of 26 March. Part of the survey collected information on who would be attending the Summit, and which parts of it. But the most important aspect of the Survey was to give us a picture of the Data Science landscape in Belgium. We have now created an abridged version of the Survey, omitting the questions specific to the Summit. Those who did not fill in the earlier version of the Survey can still do so at:

https://nl.surveymonkey.com/s/DiS2015

Taking 10-15 minutes to fill the Survey will allow us to make our picture of the Data Science landscape even sharper, and to make better decisions on future directions and activities for the Brussels Data Science community.

The ‘Satisfaction’ survey, asking your opinion on various aspects of the Summit, is also still open and can be reached here.

Thanks for your collaboration.

Edward

Hackathon – Brussels April 24-26 – Hack Epilepsy


A TWO-CITY HACKATHON WITH A SINGLE PURPOSE: IMPROVING THE LIVES OF PEOPLE WITH EPILEPSY

Dear friends,

It is time to put your data4good shoes on to support this initiative. Join us in our effort to help improve the lives of people with epilepsy.

WHAT IS HACK EPILEPSY

On 24-26 April 2015, developers, designers and epilepsy experts (doctors and patients) will come together at two simultaneous hackathons in Brussels, Belgium and Atlanta, US.

OUR GOAL:

To build innovative new digital tools for people with epilepsy and their caregivers.

WHY AN EPILEPSY HACKATHON?

Epilepsy is a common, serious brain disorder that affects 65 million people worldwide [1, 6] – young and old, rich and poor. It leaves no part of daily life untouched. Study, work, sport, travel, friends and family can all be affected. Epilepsy impairs physical, psychological and social functioning and can be fatal [1].

WHAT IS EPILEPSY?
In practical terms, people living with epilepsy may face challenges at school or college or in getting a job [2]. They may not be allowed to drive, and they may be unable to live independently [2]. Many people living with epilepsy live in fear, owing to the unpredictability of seizures [1].

There is a need for new and better digital tools to support people with epilepsy. Tools that can enable them to connect with others. Tools that show them and their caregivers how to get reliable information about epilepsy. Tools that can raise awareness about epilepsy and reduce discrimination. Tools that can potentially change lives.
Hack Epilepsy is an opportunity for you to use your expertise, creativity and specialist skills to build innovative digital tools to break down barriers, bridge gaps and bring new solutions to the challenges of living with epilepsy.

HOW CAN YOU HELP?


The epilepsy community needs your expertise, creativity and technical know-how. Whether you’re a designer, a developer, a communicator or an entrepreneur, they need your skills.

In each location, teams will have a chance to win one of three cash prizes, to be divided among participating team members:

• First prize: €6,000/$6,900
• Second prize: €3,000/$3,450
• Third prize: €1,000/$1,150

You don’t need to know about epilepsy. At Hack Epilepsy, doctors and patients will explain more about epilepsy and what it means to live with unpredictable seizures. You’ll gain all the insights you need to develop meaningful prototype digital solutions which can make a much-needed difference to the epilepsy community.

Registration?

Registration will be managed on Eventbrite: https://www.eventbrite.com/e/hack-epilepsy-brussels-tickets-15532606444
Use the discount code ‘datascience’ to get a 50% discount.

Hack Epilepsy – Data4Good Hackathon

Friday, Apr 24, 2015, 6:00 PM

No location yet.

2 Business & Data Science pro’s Attending

A TWO-CITY HACKATHON WITH A SINGLE PURPOSE: IMPROVING THE LIVES OF PEOPLE WITH EPILEPSY

More info: http://www.hackepilepsy.com/

Dear friends,

It is time to put your data4good shoes on to support this initiative. This is not a purely data-driven hackathon; it is a global workshop. Join us in our effort to help improve the lives of people with epilepsy…
