This article was originally published here
Although Data Scientist has been declared the sexiest job of the 21st century by HBR and others, if we are honest, we need to admit that the mainstream population still associates data scientists with nerds. This data innovation survey was the perfect opportunity for me to investigate whether data scientists are really as nerdy as many perceive them to be.
I started this article by looking up some background information on nerds (after all, I do consider myself a data scientist). I found a very appropriate description on Wikipedia:
Nerd (adjective: nerdy) is a descriptive term, often used pejoratively, indicating that a person is overly intellectual, obsessive, or socially impaired. They may spend inordinate amounts of time on unpopular, obscure, or non-mainstream activities, which are generally either highly technical or relating to topics of fiction or fantasy, to the exclusion of more mainstream activities. Additionally, many nerds are described as being shy, quirky, and unattractive, and may have difficulty participating in, or even following, sports. Stereotypical nerds are commonly seen as intelligent but socially and physically awkward. Some interests and activities that are likely to be described as nerdy are: Intellectual, academic, or technical hobbies, activities, and pursuits, especially topics related to science, mathematics, engineering and technology.
Does any of this sound familiar to you?
Let’s dive into the results of the data innovation survey, together with my best friend SAS Visual Analytics, to check if these stereotypes are true in the Belgian Data Science Landscape.
Stereotype n°1: All data scientists are young males
It probably doesn’t come as a surprise to you that 87.2% of the respondents are male, but I’m glad to see that 36 other women took the survey along with me. In terms of age, we do find a lot of youngsters, but the categories above 35 seem to be well represented too.
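As a quick sanity check on the gender split, the arithmetic can be sketched as below. It takes the 289 respondents the article reports for the survey and the 37 women (36 plus the author), and assumes that every respondent answered the gender question, which the article does not state explicitly. The names `share_pct`, `total_respondents`, `women` and `men` are illustrative only.

```python
def share_pct(part, total):
    """Return part as a percentage of total."""
    return 100 * part / total


total_respondents = 289          # respondent count reported in the article
women = 36 + 1                   # 36 other women plus the author
# Assumption: everyone answered the gender question, so the rest are men.
men = total_respondents - women

male_pct = round(share_pct(men, total_respondents), 1)
female_pct = round(share_pct(women, total_respondents), 1)

print(f"men: {men} ({male_pct}%), women: {women} ({female_pct}%)")
```

Under that assumption the result matches the 87.2% quoted above.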
Stereotype n°2: Data scientists are in front of their computer all night
Participants had nine days to respond to the survey. In the bar chart below you can see on which days the 289 respondents submitted the survey. We observe a clear pattern at the beginning of both weeks and, strangely enough, a drop towards Friday the 13th… Maybe data scientists are more superstitious than they would like to admit?
Even more interesting to analyze are the times of day when people took the survey. To my great surprise there’s a peak in the morning, so the Belgian data scientists seem to be early birds!
As we received both the start time and the end time, I also calculated how long the average data scientist took to complete the questionnaire: 12.66 minutes, whereas the median data scientist had the job done in 10 minutes. We all remember our first statistics class: when the median is not equal to the mean, the distribution is not symmetric… Here the mean exceeds the median, which suggests a right-skewed distribution: a few respondents took much longer than the rest.
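To make the mean-versus-median point concrete, here is a minimal sketch in Python. The timestamps are invented for illustration and are not the survey data; the actual analysis was done in SAS Visual Analytics, so the helper names `completion_minutes` and `summarize` are purely hypothetical.

```python
from datetime import datetime
from statistics import mean, median

FMT = "%H:%M"

# Made-up (start, end) times for five respondents; NOT the survey data.
responses = [
    ("08:00", "08:07"),
    ("08:15", "08:23"),
    ("09:00", "09:10"),
    ("09:30", "09:42"),
    ("10:00", "10:35"),  # one slow respondent pulls the mean upwards
]


def completion_minutes(start, end):
    """Minutes elapsed between a start and an end time on the same day."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 60


def summarize(pairs):
    """Return (mean, median) completion time in minutes."""
    durations = [completion_minutes(s, e) for s, e in pairs]
    return mean(durations), median(durations)


avg, med = summarize(responses)
print(f"mean = {avg:.2f} min, median = {med:.2f} min")
if avg > med:
    print("mean > median: the distribution has a long right tail")
```

A single slow respondent is enough to push the mean above the median, which is the same kind of gap as the 12.66 versus 10 minutes seen in the survey.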
Stereotype n°3: Data scientists are disconnected from the real world
If all data scientists are actually nerds, then they should all be quite “unworldly”. According to the Belgian Data Science survey, almost one third work for a business organization or NGO with, on average, 7,777 employees worldwide. That doesn’t sound that nerdy to me…
In total, 42% of the Belgian data scientists who took the survey are employed in the IT and technology industry. OK, what else did you expect?
If data scientists were really as socially inadequate as some would have us believe, they would never make it to a management position in their organization. And look, almost 55% of our respondents have management responsibilities to a certain extent.
Stereotype n°4: All data scientists hold a PhD in science or mathematics
Wrong again! Only 18.3% of the Belgian data scientists hold a PhD. Although the majority graduated in science & math, ICT or engineering, a significant number completed commerce or social studies.
Stereotype n°5: All data scientists are programming geeks and only use non-mainstream techniques
In part 6 of the survey, participants were asked to rate their skills with a score between 1 (don’t know this technique) and 5 (I’m a guru). It turns out that data scientists are not all gurus in newer techniques like big data and machine learning, but are mostly familiar with traditional techniques like data manipulation (regexes, Python, R, SAS, web scraping) and structured data (RDBMS, SQL, JSON, XML, ETL).
Although we observe some quite high correlations (math & optimization 0.73, big data & unstructured data 0.67, …), a high correlation doesn’t necessarily mean that the scores on these topics are high. The heat maps below illustrate this clearly. On the left, math and optimization are highly correlated but have low scores; on the right, data manipulation and structured data have only a moderate correlation of 0.42 but the highest scores.
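The difference between how strongly two skills move together and how high their scores are can be sketched with a few lines of Python. The ratings below are invented for six imaginary respondents on the survey’s 1 to 5 scale; they are not the survey data, and the `pearson` helper is simply a hand-written Pearson correlation, not necessarily the measure used in the original analysis.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up self-ratings (1 = don't know it, 5 = guru); NOT the survey data.
math_skill = [1, 2, 1, 3, 2, 1]
optimization = [1, 2, 2, 3, 1, 1]
data_manipulation = [4, 5, 4, 5, 5, 4]
structured_data = [5, 5, 4, 4, 5, 4]

r_low = pearson(math_skill, optimization)
r_high = pearson(data_manipulation, structured_data)

print(f"math vs optimization: r = {r_low:.2f}, "
      f"mean scores = {mean(math_skill):.2f} / {mean(optimization):.2f}")
print(f"manipulation vs structured: r = {r_high:.2f}, "
      f"mean scores = {mean(data_manipulation):.2f} / {mean(structured_data):.2f}")
```

In this toy data the first pair is strongly correlated yet scores low on average, while the second pair is only moderately correlated but scores high, which mirrors the pattern in the heat maps.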
Stereotype n°6: All data scientists are socially isolated and afraid to appear in public
The Belgian data scientists attend the monthly meetups not only to learn about new developments in data science or to hear what’s happening on the Belgian data science scene; many of them also cite social and networking reasons as motivation to get away from their PC and attend these meetings.
Stereotype n°7: There are clear role models for data scientists, and they all look up to the same people
Not that many respondents seem to be influenced by other data scientists around the world: only a few of them answered this question with the name of a fellow data scientist, and mostly with different names. For Belgium, on the other hand, we do find two names that each appeared eight times among the answers. Congratulations to Bart Baesens and Philippe Van Impe, the Belgian data science gurus!
The conclusion of the analysis of the Data Innovation Survey is as straightforward as it is simple: Data Scientist is the sexiest job of the 21st century! Unfortunately I’ll have to finish off here, as my pole dancing class is about to start…