Hiring Data Scientists: What to Look for?

Check out the latest article in Data Science Briefings, the newsletter with updates on the latest news, trends, techniques, tools and our research in data mining and analytics.

Feel free to send this newsletter along to friends or colleagues; they can subscribe for free as well through our subscribe page. We also keep an online record of our feature articles and Q&As over at www.dataminingapps.com, in case you want to catch up if you’re just joining us.

Kindest regards,
Prof. dr. Bart Baesens
Dr. Seppe vanden Broucke

Feature Article: Hiring Data Scientists: What to Look for?

Contributed by: Bart Baesens, Richard Weber, Cristián Bravo, Seppe vanden Broucke

In this column, we would like to elaborate on the key characteristics of a good data scientist from the perspective of the hiring manager. Big data and analytics are all around these days. IBM estimates that we generate 2.5 quintillion bytes of data every day. In relative terms, this means 90% of the data in the world has been created in the last two years. Gartner projects that during 2015, 85% of Fortune 500 organizations will be unable to exploit big data for competitive advantage and that about 4.4 million jobs will be created around big data. Although these estimates should not be interpreted in an absolute sense, they are a strong indication of the ubiquity of big data and of the strong need for analytical skills and resources. As the data piles up, managing and analyzing these data resources in the best possible way becomes a critical success factor in creating competitive advantage and strategic leverage.

To address these challenges, companies are hiring data scientists. However, in the industry, there are strong misconceptions and disagreements about what constitutes a good data scientist.

In this article, we will discuss the key characteristics of what makes up a good data scientist. It is based upon the authors’ consulting and research experience, having collaborated with many companies world-wide on the topic of big data and analytics.

A data scientist should be a good programmer

By definition, data scientists work with data. This involves plenty of activities such as sampling and preprocessing of data, model estimation and post-processing (e.g. sensitivity analysis, model deployment, backtesting, model validation). Although many user-friendly software tools are on the market nowadays to automate this, every analytical exercise requires tailored steps to tackle the specificities of a particular business problem. Performing these steps successfully requires programming. Hence, a good data scientist should possess sound programming skills in e.g. R, Python, SAS, … The programming language itself is not that important, as long as he/she is familiar with the basic concepts of programming and knows how to use these to automate repetitive tasks or perform specific routines.

A data scientist should have solid quantitative skills

Obviously, a data scientist should have a thorough background in statistics, machine learning and/or data mining. The distinction between these disciplines is getting more and more blurred and is actually not that relevant. They all provide a set of quantitative techniques to analyze data and find business-relevant patterns within a particular context (e.g. risk management, fraud detection, marketing analytics, …). The data scientist should know which technique can be applied when, and how. He/she should not focus too much on the underlying mathematical (e.g. optimization) details but rather have a good understanding of what analytical problem a technique solves and how its results should be interpreted. In this respect, the local education of engineers in computer science and business/industrial engineering should aim at an integrated, multidisciplinary view, with recent graduates trained both in the use of the techniques and in the business acumen necessary to bring new endeavors to fruition.

Also important in this context is to spend enough time validating the analytical results obtained, so as to avoid situations often referred to as data massage and/or data torture, whereby data is (intentionally) misrepresented and/or too much attention is spent on discussing spurious correlations. When selecting the optimal quantitative technique, the data scientist should take into account the specificities of the business problem. Typical requirements for analytical models are:

  • actionability (to what extent is the analytical model solving the business problem?),
  • performance (what is the statistical performance of the analytical model?),
  • interpretability (can the analytical model be easily explained to decision makers?),
  • operational efficiency (how much effort is needed to set up, evaluate and monitor the analytical model?),
  • regulatory compliance (is the model in line with regulation?),
  • and economic cost (what is the cost of setting up, running and maintaining the model?).

Based upon a combination of these requirements, the data scientist should be capable of selecting the best analytical technique to solve the business problem.
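One simple way to make such a trade-off explicit is a weighted scoring table. The Python sketch below is only an illustration of the idea, not a method prescribed by the authors: the function names, the weights, the 1-to-5 scores and the two candidate techniques are all invented for the example, and in practice they would come from the business context.

```python
# Illustrative sketch only: every weight and score below is invented.
REQUIREMENTS = [
    "actionability",
    "performance",
    "interpretability",
    "operational_efficiency",
    "regulatory_compliance",
    "economic_cost",
]


def weighted_score(scores, weights):
    """Weighted average of the per-requirement scores (higher is better)."""
    total_weight = sum(weights[r] for r in REQUIREMENTS)
    return sum(scores[r] * weights[r] for r in REQUIREMENTS) / total_weight


def rank_techniques(candidates, weights):
    """Return candidate names ordered from best to worst weighted score."""
    return sorted(
        candidates,
        key=lambda name: weighted_score(candidates[name], weights),
        reverse=True,
    )


# Hypothetical weights for a setting where interpretability and
# regulatory compliance matter most.
example_weights = {
    "actionability": 2,
    "performance": 2,
    "interpretability": 3,
    "operational_efficiency": 1,
    "regulatory_compliance": 3,
    "economic_cost": 1,
}

# Hypothetical 1-to-5 scores (5 = best) for two candidate techniques.
example_candidates = {
    "logistic_regression": {
        "actionability": 4, "performance": 3, "interpretability": 5,
        "operational_efficiency": 4, "regulatory_compliance": 5,
        "economic_cost": 4,
    },
    "neural_network": {
        "actionability": 4, "performance": 5, "interpretability": 2,
        "operational_efficiency": 2, "regulatory_compliance": 2,
        "economic_cost": 3,
    },
}

for name in rank_techniques(example_candidates, example_weights):
    score = weighted_score(example_candidates[name], example_weights)
    print(f"{name}: {score:.2f}")
```

Changing the weights changes the ranking, which is exactly the point: the "best" technique depends on which requirements the business problem emphasizes.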

A data scientist should excel in communication and visualization skills

Like it or not, analytics is a technical exercise. At this moment, there is a huge gap between the analytical models and the business users. To bridge this gap, communication and visualization facilities are key! Hence, a data scientist should know how to represent analytical models and their accompanying statistics and reports in user-friendly ways using e.g. traffic light approaches, OLAP (on-line analytical processing) facilities, if-then business rules, … He/she should be capable of communicating the right amount of information without getting lost in complex (e.g. statistical) details, which would inhibit a model’s successful deployment. By doing so, business users will better understand the characteristics and behavior of their (big) data, which will improve their attitude towards and acceptance of the resulting analytical models. Educational institutions must learn to strike this balance, since many academic degrees produce graduates whose knowledge is skewed toward either the analytical or the practical side.
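As a small illustration of the traffic light approach mentioned above, the Python sketch below maps a model score to a color that a business user can read at a glance. It is a minimal sketch under stated assumptions: the function name, the customer identifiers, the scores and the two cut-off values (0.3 and 0.7) are hypothetical and would have to be agreed with the business in a real project.

```python
def traffic_light(probability, amber_from=0.3, red_from=0.7):
    """Map a predicted probability to 'green', 'amber' or 'red'.

    The default cut-offs are hypothetical placeholders, not recommendations.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if probability >= red_from:
        return "red"
    if probability >= amber_from:
        return "amber"
    return "green"


# Invented churn scores for three made-up customers.
example_scores = {"customer_a": 0.12, "customer_b": 0.45, "customer_c": 0.91}

for customer, score in example_scores.items():
    print(f"{customer}: {traffic_light(score)}")
```

The same idea carries over to if-then business rules: the model's numeric output is translated into a small number of categories that decision makers can act on directly.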

A data scientist should have a solid business understanding

While this might be obvious, we have witnessed (too) many data science projects that failed because the analyst did not understand the business problem at hand. By “business” we refer to the application area in question. This could be e.g. churn prediction or credit scoring in a real business context, but also astronomy or medicine if the data to be analyzed stem from such areas.

A data scientist should be creative

A data scientist needs creativity on at least two levels. First, on a technical level, it is important to be creative with regard to feature selection, data transformation and cleaning. These steps of the standard knowledge discovery process have to be adapted to each particular application, and often the “right guess” can make a big difference. Second, big data and analytics is a fast-evolving field! New problems, technologies and corresponding challenges pop up on an ongoing basis. It is important that a data scientist keeps up with these new developments and technologies and has enough creativity to see how they can create new business opportunities.

Conclusion

In this article, we provided a brief overview of the characteristics to look for when hiring data scientists. To summarize, given the multidisciplinary nature of big data and analytics, a data scientist should possess a mix of skills: programming, quantitative modelling, communication and visualization, business understanding, and creativity! The figure below shows how such a profile can be represented.

[Figure: profile of a data scientist]

Do you also wish to contribute to Data Science Briefings? Shoot us an e-mail (just reply to this one) and let’s get in touch!

Advertisements

Job – Amplidata – Big Data System Engineer IWT/CAP

Big Data System Engineer IWT/CAP

A unique systems engineering job!

Building the Cloud Storage technology of tomorrow… in Ghent!

Company: Amplidata N.V.
Sector: Big Data Cloud Storage
Products: AmpliStor Exabyte Storage
Location: Ghent/Lochristi, Belgium

Reading this can dramatically change your future… Who is Amplidata?
Amplidata is the sole European company that builds the Big Data cloud storage of tomorrow. Our customers are the largest telcos, datacenter providers and governments in the world. With our technology they manage and unlock petabytes and, very soon, exabytes of data: Hollywood movies, genetic data from billions of species, music archives from the best performers all over the world, surveillance videos and so much more…

A unique job in Big Data!
At Amplidata you combine:

  • Passion and Flexibility
  • Challenge and Collegiality

You are key to the success of Amplidata. That’s why you will get a very attractive remuneration: fixed salary (75th percentile), a decent company car, a complete insurance package, stock options, meal vouchers, a mobile phone and an Internet subscription.

What’s in it for you?

Work together with top-class engineers in ICT and Storage, live in a world of congenial minds and report directly to the CTO. As a team, we eat complex challenges for lunch, like building high-performance Big Data systems.

You will be part of an IWT/CAP research project led by Amplidata in cooperation with an important industry partner and with Sirris, the collective centre of the Belgian technological industry.

You will be responsible for helping to engineer and integrate Hadoop-based systems within our AmpliStor Cloud Storage technology, whilst further researching, testing and implementing new technologies.

You will work with technologies such as Hadoop, Python, the Amazon S3 API and REST/HTTP.

The ideal candidate will be passionate about up-and-coming technologies and love experimenting with them.

Amplidata spares no time or energy to take you and your knowledge to the next level. In return we expect 100% commitment to your job. Oh, by the way, our office hours are flexible. One day a week you can work from home, you have the freedom to plan your own workday and, with our ironing service, that horrendous job is now history for you or your partner.

Do you have what it takes to join our team and “make” history in ICT?
Send us your mobile number, email address or LinkedIn profile. We look forward to meeting you!

More Jobs?

Click here for more job offers

Check out our Twitter account: @datajobsbe