Seminar – ULB – Context-sensitive Ordinal Regression Models for Human Facial Behaviour Analysis

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    ULB  Machine Learning Group (MLG)
             S E M I N A R
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Title:
       “Context-sensitive Ordinal Regression Models for Human Facial Behaviour Analysis”
When:
       Wed 8 July 2015 from 11:30
Where:
       Université libre de Bruxelles,
       Campus de la Plaine (http://www.ulb.ac.be/campus/plaine/plan.html)
       Département d’Informatique
       NO Building, Floor 8, local P.2NO8.08 (Rotule) (http://www.ulb.ac.be/campus/plaine/plan-NO.html)
       Boulevard du Triomphe – CP212
       1050 Bruxelles
Abstract:
Enabling computers to understand human facial behaviour has the potential to revolutionize many important areas such as clinical diagnosis, marketing, human-computer interaction and social robotics, to mention but a few. However, achieving this is challenging, as human facial behaviour is a highly non-linear dynamic process driven by many internal and external factors, including ‘who’ the observed subject is, ‘what’ their current task is, and so on. All this makes the target problem highly context-sensitive: the context changes the dynamics of human facial behaviour, and these dynamics, in turn, are critical for the interpretation and classification of target affective states (e.g., intensity levels of emotions or pain). In this talk, I will propose several extensions of the Conditional Ordinal Random Fields (CORF) model that are able to learn spatio-temporal and context-sensitive representations of human facial behaviour, useful in various facial analysis tasks. In particular, I will show how the proposed CORF models can be used for problems such as intensity estimation of facial expressions of emotion, of facial action units and of facial expressions of pain. I will also demonstrate the performance of the models on the classification of facial expressions of persons with autism spectrum condition. Finally, I will discuss other potential applications of the proposed models and further challenges in modelling human facial behaviour.
Speaker:
Ognjen Rudovic received his PhD from Imperial College London, Computing Dept., UK, in 2014, an MSc degree in Computer Vision and Artificial Intelligence from the Computer Vision Center (CVC), Barcelona, Spain, in 2008, and a BSc in Automatic Control Theory from the Electrical Engineering Dept., University of Belgrade, Serbia, in 2006. He is currently working as a Research Fellow at the Computing Dept., Imperial College London, UK. His research interests include computer vision and machine learning, with a particular focus on face analysis, Bayesian learning and inference methods, and their application to human sensing. He is a member of the Intelligent Behaviour Understanding Group (IBUG) at Imperial College London (http://ibug.doc.ic.ac.uk/people/orudovic).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
       MLG     http://www.ulb.ac.be/di/mlg/
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Pr. Gianluca Bontempi
co-Head of the Machine Learning Group
Département d’Informatique
Université Libre de Bruxelles
Boulevard du Triomphe – CP212
1050 Bruxelles, Belgium
email: gbonte@ulb.ac.be
Office Phone: +32-2-650 55 91
Fax: +32 2 650.56.09
mlg.ulb.ac.be
 
Director
Interuniversity Institute of Bioinformatics in Brussels (IB)²
ibsquare.be

Datascience Hackathon – Old School vs New School – Challenge 1: build a recommendation engine


As a final exercise before the summer, two teams will challenge each other to build the best recommendation model during a hackathon held in Brussels over the weekend of July 3 and 4.

This is the first challenge in which old-school data science experts will compete against new-school big data science techniques.

Our first challenge is about building a recommendation engine on subscriber data from 4UCampus.

The challenge from 4UCampus:

This company facilitates subscriptions for students and academic personnel at discounted rates.

Over 100k customers hold various subscriptions, chosen from 130 possible options.

Based on historical data from the past 12 years, the teams will work on building a recommendation engine.
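To make the challenge concrete, here is a minimal sketch of one simple, classic technique: item co-occurrence, which recommends subscriptions that are frequently held together with the ones a customer already has. It assumes the data can be reduced to a mapping from customer to set of subscriptions; the actual 4UCampus data format is not described here, so the customer ids and subscription names below are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(history):
    """Count how often each pair of subscriptions is held by the same customer.

    history: dict mapping a (hypothetical) customer id to a set of subscription ids.
    """
    cooc = defaultdict(Counter)
    for items in history.values():
        for a, b in combinations(sorted(items), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def recommend(customer_items, cooc, k=3):
    """Return up to k subscriptions the customer does not hold yet,
    ranked by summed co-occurrence with the subscriptions they do hold."""
    scores = Counter()
    for item in customer_items:
        # .get avoids inserting unknown items into the defaultdict
        for other, count in cooc.get(item, {}).items():
            if other not in customer_items:
                scores[other] += count
    # Sort by descending score, then by id so that ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


# Toy, made-up subscription history.
history = {
    "c1": {"news", "tech"},
    "c2": {"news", "tech", "sport"},
    "c3": {"news", "sport"},
    "c4": {"tech", "science"},
}
cooc = build_cooccurrence(history)
print(recommend({"news", "tech"}, cooc))  # ['sport', 'science']
```

A big data team could compute the same pair counts in a distributed framework, or replace them with a learned model; the ranking step would stay much the same.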

Registration:

Please use our Eventbrite page to register for this hackathon.

bdsc-hackathon.eventbrite.com

The presentation of the data:

Presentation of the data that will be used in our First Hackathon

Monday, Jun 22, 2015, 7:00 PM

No location yet.

14 Fast learners Attending

We have prepared a nice dataset for you. During this hackathon, classic and big data approaches will be used to analyse the dataset. We expect the teams to build, for example, a recommendation engine in batch and/or real time.


The hackathon:

Hackathon – Old School vs New School – Challenge 1: build recommendation engine

Friday, Jul 3, 2015, 7:00 PM

No location yet.

7 Fast learners Attending


Meetup – June 2015 – Privacy and security


Our last meetup of the academic year was a success.

Thank you to all the presenters for their enthusiastic pitches.

Videos of the presentations will soon be available on our video channels.

Here are the links to the presentations:

Agenda: Data Science and Security & Privacy
VUB 19/06/2015
Slides (pdf) and video per talk:
BDSC:

Introduction by Philippe Van Impe: update on the activities of the community.

Slides: here | Video: here
Collibra:

Ann Wuyts from Collibra presents the governance challenges of privacy and security issues.

Slides: here | Video: here
Inpuls:

Christoph Balduck from Inpuls presents a practical approach to compliance with the EU data privacy and data protection regulations.

Slides: here | Video: here
Big Industries:

Robert Gibbon from Big Industries talks about setting up a secure Hadoop-based environment.

Slides: here | Video: here
IAPP:

Paul Jordan, European Managing Director of the IAPP, presents the International Association of Privacy Professionals: the independent voice for privacy in Europe.

Slides: here | Video: here
European Data Innovation Hub:

Short presentation on the benefits of joining the Hub.

Slides: here | Video: here

Training – Business Analytics in R – Starters Track – 3 days


I’m pleased to announce that we will cooperate with BNOSAC so that you can benefit from R training at the Hub.

The starters track training will start in September:

R for starters:

duration: 2 days

What is covered in this module:

  • What is R, packages available (CRAN, R-Forge, …), R documentation search, finding help, RStudio editor, syntax
  • Data types (numeric/character/factor/logicals/NA/Dates/Times)
  • Data structures (vector/data.frame/matrix/lists and standard operations on these)
  • Saving (RData) & importing data from flat files, csv, Excel, Oracle, MS SQL Server, SAS, SPSS
  • Creating functions, data manipulation (subsetting, adding variables, ifelse, control flow, recoding, rbind, cbind) and aggregating and reshaping
  • Plotting in R using base and lattice functionality (dot plots, barcharts, graphical parameters, legends, devices)
  • Basic statistics in R (mean, variance, crosstabs, quantile, correlation, distributions, densities, histograms, boxplot, t-tests, Wilcoxon test, non-parametric tests)

During the course, you will need to do exercises, so bring your laptop.

Introduction to R programming – 2 days

Monday, Sep 7, 2015, 9:00 AM

European Data Innovation Hub
Boulevard du Souverain 23 Brussels, BE

2 Fast learners Attending


Common data manipulation for R programmers:

duration: 1 day

This module helps you become a better programmer: you will write your own functions and get acquainted with commonly used R functions for basic data manipulation and with the R object-oriented programming environment.

The following is covered in this module:

  • with, within, by, apply family of functions & split-apply-combine strategy
  • vectorisation, parallel execution of code
  • data.table – fast group by, joining and data.table programming tricks
  • basic regular expressions
  • writing your own functions
  • do.call
  • reshaping from wide to long format
  • environments
  • S3 classes, generics and basic S4 methodology
  • handling of errors and exceptions, debugging code

Be prepared for some tough exercises, so bring your laptop.

Common data manipulation for R programmers – 1 day

Monday, Sep 21, 2015, 9:00 AM

European Data Innovation Hub
Boulevard du Souverain 23 Brussels, BE

2 Fast learners Attending


List of other available courses:

  • R programming
  • R for starters
  • Common data manipulation for R programmers
  • Reporting with R
  • Creating R packages and R repositories
  • Managing R processes
  • Using SVN/git with RStudio
  • Data connectivity using R
  • Integration of R into web applications – R analytics
  • Statistical Machine Learning with R
  • Text mining with R
  • Applied Spatial modelling with R – Oracle R Enterprise
  • ROracle and Oracle R Enterprise (ORE) – transparency layer
  • Oracle R Enterprise – advanced data manipulation
  • Data mining models inside Oracle R Enterprise (ORE) and Oracle Data Mining (ODM)

About European Data Innovation Hub:

The European Data Innovation Hub aims at facilitating information sharing.

In addition to hands-on workshops, hackathons and coached MOOCs, we also facilitate training on data-science-related topics. These trainings are always offered in cooperation with our members. If you want us to facilitate your training, please send an email to training@dihub.eu.

For an overview of the trainings we have scheduled, please follow this link.

About BNOSAC:

BNOSAC is a Belgian consultancy network specialized in open source analytical intelligence. We gather a group of dedicated open source software engineers with a focus on data mining, business intelligence, statistical engineering and advanced artificial intelligence.

We are experts in using analytical open source software and provide expertise, consultancy and training for the use of well-established open source tools like R, Python, Pentaho, PostgreSQL, OpenBugs, PostGIS and Mapserver in your organisation.

Annual Data Science BBQ – July 9th 2015 – Hippo Droom Zoniën Woud


I’m happy to announce that our yearly BBQ will be held at the Hippo-Droom on July 9th 2015.

The BBQ will start at 19:00.

Companies are welcome to add an activity to the BBQ, such as:

  • Off site team event before the BBQ
  • Local Beer tasting

We have agreed a discounted B&B room rate of €110 with the owner if you reserve before July 2nd.

For more information, please email pvanimpe@dihub.eu or call me at 0477 23 78 42.

Reserve your table now:

Please reserve your seat using our Eventbrite page.

Companies who want to book a full table at a special rate should contact pvanimpe@dihub.eu / 0477 23 78 42.

https://www.eventbrite.com/e/annual-data-science-bbq-tickets-17556433766#

About the Hippo-Droom

B&B Hippo-Droom, a charmingly renovated Belle Époque villa (1900), is situated in the middle of the Sonian Forest. Our B&B has 8 luxurious rooms, a private garden, an inner courtyard with 6 horse stables and a pasture. The B&B’s decoration is cosy and contemporary, with a little nod to the region and nature.

more info: http://hippo-droom.be/

Internship – PrediCube – text mining, topic modeling and segmentation


Internship at PrediCube

Experience: Intern Level
Job Function: Information Technology
Employment Type: One to three months

Description:
We are looking for an enthusiastic intern who is interested in learning about text mining, topic modeling and segmentation at a growing and exciting startup. We offer a dynamic experience on the 19th floor of the KBC tower in Antwerp.

If you have good technical skills and you are interested in working on a system that analyzes 1 billion records on a daily basis to build dozens of predictive models, you are our man or woman!
Desired Skills:
Technology: Linux, Python, data science
Affection for data & numbers
Ethical reflex concerning privacy-friendly analytics

Apply:

Make sure that you are a member of the Brussels Data Science Community LinkedIn group before you apply. Join here.

Please note that we also manage other vacancies that are not public; if you want us to put you in contact with them too, just send your CV to datasciencebe@gmail.com.

To learn more, please contact careers@predicube.com

 

Follow our courses to boost your data science skills!

The European Data Innovation Hub organizes training sessions and hands-on workshops on big data and data science.

Here is the agenda:

Data Innovation Training Hub

Brussels, BE
103 Fast learners

This group is an initiative from the nonprofit European Data Innovation Hub, the premier networking space where business, startup, academic and political decision makers share…

Next Meetup

Day 3: Coached Mooc – Introduction to Big Data with Apache S…

Tuesday, Jun 16, 2015, 7:00 PM
21 Attending


Job – Humix – VACANCY: EXPERIENCED DATA ANALYST

About Humix

Humix is a company of user experience and digital marketing experts. We analyse and optimise the usability, the user experience and the conversion of websites, webshops and other ICT applications. We gather insights into people’s behaviour and preferences and then use these insights to optimise the application, website or marketing channels and to make sure the business objectives are met. Depending on the project and the client’s question, we select the best mix of methods for analysis, prototyping, testing and measuring.

You are:

Passionate about how people behave online, and always looking for the why behind the things you see. You already have experience with web analytics and now want to specialise further. As a data analyst, you use raw data to support your recommendations. You translate insights for the business in a clear way.

What you can do:

  • You have experience with web analytics tools such as Google Analytics and/or Adobe Analytics.
  • You know there is more to analyse than visits and time on site.
  • You have experience in visualising data and insights in an understandable way.
  • You are completely at home in Excel and know how to turn large data sets into concrete optimisations.
  • Experience with specific dashboarding tools is a plus.
  • You can work independently. You have strong analytical insight.
  • You speak and write Dutch and English well.
  • You are very eager to work hard, to keep learning and to achieve results with your clients. A healthy dose of curiosity is a must.

Your tasks:

  • As an expert, you guide clients on their way to a data-driven business.
  • You define KPIs for the client’s online presence.
  • You help think through the set-up of a measurement framework.
  • You talk with business, marketing and IT alike to achieve objectives.
  • You examine visitor figures and KPIs in analytics packages and translate them into concrete improvements for the website or webshop.

What you can expect from us:

  • Attractive salary including fringe benefits such as a company car, laptop, …
  • Flexible working hours
  • Challenging projects for interesting Belgian and international companies
  • Room to take new and creative initiatives
  • Annual training budget to keep up with the latest developments, which you can allocate yourself.
  • Permanent contract (indefinite duration)

Apply:

Make sure that you are a member of the Brussels Data Science Community LinkedIn group before you apply. Join here.

Please note that we also manage other vacancies that are not public; if you want us to put you in contact with them too, just send your CV to datasciencebe@gmail.com.

Here is the link to the original job ad.

Positions are filled as applications come in. Interested? Please send your motivation letter and curriculum vitae to lonneke.spinhof@humix.be.

 


Datascience Meetup – VUBrussel – Security and Privacy


Agenda:

• 18:30 Update

• 19:00 Ann Wuyts from Collibra presents the governance challenges of privacy and security issues.

• 19:30 Christoph Balduck from Inpuls: a practical approach to compliance with the EU data privacy and data protection regulations.

• 20:00 Robert Gibbon from Big Industries on setting up a secure Hadoop-based environment.

• 20:30 Paul Jordan, European Managing Director of the IAPP, presents the International Association of Privacy Professionals: the independent voice for privacy in Europe.

• 21:00 Open Forum

• 21:30 Networking

Join us this Thursday at the VUB.

Data Science and Security & Privacy issues

Thursday, Jun 18, 2015, 6:30 PM

VUB – Aula QB
Pleinlaan 2B – 1050 Brussels, BE

83 Business & Data Science pros Attending


Job – IÉSEG School – Lille – Ph.D. CANDIDATES IN BIG DATA MARKETING ANALYTICS


JOB OPENING: Ph.D. CANDIDATES IN BIG DATA MARKETING ANALYTICS

The IÉSEG School of Management (Lille, France) is searching for PhD candidates in the field of big data marketing analytics. The PhD topics lie in marketing analytics in a financial services or retail context, in collaboration with leading French institutions. The PhD candidates will get all the means and support to engage in innovative, business-relevant research projects with high potential for publication in international peer-reviewed journals. The Ph.D. students will join the IÉSEG Center for Marketing Analytics team (http://icma.ieseg.fr).

The aim is to obtain the degree of PhD after 3 years.

About the IÉSEG School of Management

  • IÉSEG is AACSB and EQUIS accredited and is an active member of the ‘Conférence des Grandes Écoles’.
  • IÉSEG is one of the leading French business schools in terms of research. The IÉSEG Research Center is accredited by the French CNRS (National Center for Scientific Research). IÉSEG promotes research and provides resources for active scholars.
  • The IÉSEG faculty is highly qualified and very diverse with 32 nationalities represented.
  • IÉSEG offers Bachelor and Master Degrees as well as Executive Education programs.
  • IÉSEG ranks 21st in the most recent Financial Times ranking of Masters in Management.
  • The Lille Campus is in the heart of the Northern French city. More information about IÉSEG School of Management is available online at: http://www.ieseg.fr

Qualifications of the PhD candidates

  • A profound interest in doing high-quality academic research with a clear added value for business in an international environment.
  • A passion for big data and its analytical opportunities to improve customer relationships and company’s marketing strategy.
  • A master’s degree in Computational Linguistics, Engineering, Data Mining, Text Mining, Statistics, Computer Science or similar.
  • A degree in business, marketing or communication is a plus.
  • Programming skills in at least one statistical software language, such as SAS/Base, SAS/Macro, SAS/Stat, SAS/IML, R and/or SQL, are essential.
  • Notions of Matlab, C++, FORTRAN, Java, Spark or Python are a plus.
  • Fluent in English. Ability to write high-standard text in English.
  • Dynamic, pro-active, creative and serious personality.
  • Fluency in French is a plus.

Working conditions of the PhD candidates

The PhD students will work under a research contract for a period of 3 years, starting in December 2015 at the latest. Salary conditions are in line with the French research system.

Supervision of the PhD candidate

The PhD student will be supervised by Prof. dr. Kristof Coussement and Prof. dr. Koen W. De Bock, supported by a multidisciplinary team and surrounded by their international research network.

Apply:

Make sure that you are a member of the Brussels Data Science Community LinkedIn group before you apply. Join here.

Please note that we also manage other vacancies that are not public; if you want us to put you in contact with them too, just send your CV to datasciencebe@gmail.com.

Here is the link to the original job ad.

Positions are filled as applications come in. Interested? Please send your motivation letter and curriculum vitae to icma@ieseg.fr.

 


Hiring Data Scientists: What to Look for?


Check out the latest article in Data Science Briefings, the newsletter with updates on the latest news, trends, techniques, tools and our research in data mining and analytics.

Feel free to forward this newsletter to friends or colleagues; they can subscribe for free as well through our subscribe page. We keep an online record of our feature articles and Q&As at www.dataminingapps.com as well, in case you want to catch up if you’re just joining us.

Kindest regards,
Prof. dr. Bart Baesens
Dr. Seppe vanden Broucke

Feature Article: Hiring Data Scientists: What to Look for?

Contributed by: Bart Baesens, Richard Weber, Cristián Bravo, Seppe vanden Broucke

In this column, we would like to elaborate on the key characteristics of a good data scientist from the perspective of the hiring manager. Big data and analytics are all around these days. IBM estimates that every day we generate 2.5 quintillion bytes of data. In relative terms, this means 90% of the data in the world has been created in the last two years. Gartner projects that during 2015, 85% of Fortune 500 organizations will be unable to exploit big data for competitive advantage and about 4.4 million jobs will be created around big data. Although these estimates should not be interpreted in an absolute sense, they are a strong indication of the ubiquity of big data and the strong need for analytical skills and resources: as the data piles up, managing and analyzing these data resources in the best way become critical success factors in creating competitive advantage and strategic leverage.

To address these challenges, companies are hiring data scientists. However, in the industry, there are strong misconceptions and disagreements about what constitutes a good data scientist.

In this article, we will discuss the key characteristics of what makes up a good data scientist. It is based upon the authors’ consulting and research experience, having collaborated with many companies world-wide on the topic of big data and analytics.

A data scientist should be a good programmer

By definition, data scientists work with data. This involves plenty of activities such as sampling and preprocessing of data, model estimation and post-processing (e.g. sensitivity analysis, model deployment, backtesting, model validation). Although many user-friendly software tools on the market nowadays automate this, every analytical exercise requires tailored steps to tackle the specificities of a particular business problem. Performing these steps successfully requires programming. Hence, a good data scientist should possess sound programming skills in e.g. R, Python, SAS, … The programming language itself is not that important, as long as he/she is familiar with the basic concepts of programming and knows how to use these to automate repetitive tasks or perform specific routines.

A data scientist should have solid quantitative skills

Obviously, a data scientist should have a thorough background in statistics, machine learning and/or data mining. The distinction between these disciplines is getting more and more blurred and is actually not that relevant. They all provide a set of quantitative techniques to analyze data and find business-relevant patterns within a particular context (e.g. risk management, fraud detection, marketing analytics, …). The data scientist should be aware of which technique can be applied when and how. He/she should not focus too much on the underlying mathematical (e.g. optimization) details, but rather have a good understanding of what analytical problem a technique solves and how its results should be interpreted. In this respect, the local education of engineers in computer science and business/industrial engineering should aim at an integrated, multidisciplinary view, with recent graduates trained both in the use of the techniques and in the business acumen necessary to bring new endeavors to fruition. Also important in this context is to spend enough time validating the analytical results obtained, so as to avoid situations often referred to as data massage and/or data torture, whereby data is (intentionally) misrepresented and/or too much time is spent discussing spurious correlations. When selecting the optimal quantitative technique, the data scientist should take into account the specificities of the business problem. Typical requirements for analytical models are:

  • actionability (to what extent is the analytical model solving the business problem?),
  • performance (what is the statistical performance of the analytical model?),
  • interpretability (can the analytical model be easily explained to decision makers?),
  • operational efficiency (how much effort is needed to set up, evaluate and monitor the analytical model?),
  • regulatory compliance (is the model in line with regulation?),
  • and economic cost (what is the cost of setting up, running and maintaining the model?).

Based upon a combination of these requirements, the data scientist should be capable of selecting the best analytical technique to solve the business problem.
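One simple way to make such a combination of requirements explicit is a weighted scoring matrix. The sketch below is only an illustration, not the authors’ method: the weights and the 1-5 scores per technique are hypothetical and would in practice come from the business context (a regulated credit scoring setting, for instance, might weigh interpretability and compliance heavily). All criteria are scored so that higher is better, so a high economic cost score means a cheap model.

```python
CRITERIA = (
    "actionability",
    "performance",
    "interpretability",
    "operational_efficiency",
    "regulatory_compliance",
    "economic_cost",  # scored so that a higher value means a cheaper model
)


def weighted_score(scores, weights):
    """Weighted average of per-criterion scores (higher is better)."""
    total_weight = sum(weights[c] for c in CRITERIA)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[c] * scores[c] for c in CRITERIA) / total_weight


def select_technique(candidates, weights):
    """Return the name of the candidate technique with the highest weighted score."""
    return max(candidates, key=lambda name: weighted_score(candidates[name], weights))


# Hypothetical scores on a 1-5 scale for two candidate techniques.
candidates = {
    "logistic_regression": dict(actionability=4, performance=3, interpretability=5,
                                operational_efficiency=5, regulatory_compliance=5,
                                economic_cost=5),
    "neural_network": dict(actionability=4, performance=5, interpretability=1,
                           operational_efficiency=2, regulatory_compliance=2,
                           economic_cost=2),
}

# Hypothetical weights for a regulated setting such as credit scoring.
regulated = dict(actionability=1, performance=2, interpretability=3,
                 operational_efficiency=1, regulatory_compliance=3, economic_cost=1)
print(select_technique(candidates, regulated))  # logistic_regression
```

Putting all the weight on performance instead would flip the choice to the neural network, which illustrates why the business context, not statistical performance alone, should drive the selection.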

A data scientist should excel in communication and visualization skills

Like it or not, analytics is a technical exercise. At this moment, there is a huge gap between the analytical models and the business users. To bridge this gap, communication and visualization facilities are key! Hence, a data scientist should know how to represent analytical models and their accompanying statistics and reports in user-friendly ways, using e.g. traffic light approaches, OLAP (on-line analytical processing) facilities, if-then business rules, … He/she should be capable of communicating the right amount of information without getting lost in complex (e.g. statistical) details, which would inhibit a model’s successful deployment. By doing so, business users will better understand the characteristics and behavior in their (big) data, which will improve their attitude towards and acceptance of the resulting analytical models. Educational institutions must learn to strike this balance, since it is known that many academic degrees produce students who are skewed towards either too much analytical or too much practical knowledge.
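As an illustration of the traffic light approach, the sketch below maps a model’s predicted probability (say, of churn) to a colour a business user can act on, using plain if-then rules. The thresholds of 0.3 and 0.6 and the customer scores are hypothetical; in practice the cut-offs would be agreed with the business.

```python
def traffic_light(probability, amber=0.3, red=0.6):
    """Map a predicted probability to a traffic-light label via if-then rules.

    The default thresholds are illustrative assumptions, not recommended values.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if not 0.0 <= amber <= red <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= amber <= red <= 1")
    if probability >= red:
        return "red"    # high risk: act now
    if probability >= amber:
        return "amber"  # medium risk: monitor
    return "green"      # low risk: no action needed


# Hypothetical customer scores from some churn model.
scores = {"customer_a": 0.12, "customer_b": 0.45, "customer_c": 0.81}
for customer, p in scores.items():
    print(customer, traffic_light(p))
# customer_a green
# customer_b amber
# customer_c red
```

The business user only sees the colour and the rule behind it, while the underlying model and its statistics stay available to the analyst.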

A data scientist should have a solid business understanding

While this might be obvious, we have witnessed (too) many data science projects that failed because the analyst did not understand the business problem at hand. By “business” we refer to the respective application area, which could be e.g. churn prediction or credit scoring in a real business context, or astronomy or medicine if the data to be analyzed stem from such areas.

A data scientist should be creative

A data scientist needs creativity on at least two levels. First, on a technical level, it is important to be creative with regard to feature selection, data transformation and cleaning. These steps of the standard knowledge discovery process have to be adapted to each particular application, and often the “right guess” can make a big difference. Second, big data and analytics is a fast-evolving field! New problems, technologies and corresponding challenges pop up on an ongoing basis. It is important that a data scientist keeps up with these new evolutions and technologies and has enough creativity to see how they can create new business opportunities.

Conclusion

In this article, we provided a brief overview of characteristics to look for when hiring data scientists. To summarize, given the multidisciplinary nature of big data and analytics, a data scientist should possess a mix of skills: programming, quantitative modelling, communication and visualization, business understanding, and creativity! The figure below shows how such a profile can be represented.

[Figure: profile of a good data scientist]

Do you also wish to contribute to Data Science Briefings? Shoot us an e-mail (just reply to this one) and let’s get in touch!