What does a data scientist REALLY look like?

The cast of HBO's Silicon Valley (Source: HBO)

In this guest post, data scientist Genevieve Hayes analyzes the responses from Stack Overflow's 2018 Annual Developer Survey to build a portrait of data scientists today.

Six years ago, the Harvard Business Review named data scientist the "sexiest job of the 21st century." Since then, data scientist has become one of the US's fastest-growing professions, with graduates achieving six-digit starting salaries and employer demand continuing to outstrip supply.

But who are these people lucky enough to score a gig that Glassdoor has described as the "best job in America"? What does it take to become one of them? And is being a data scientist really as great as the hype would have you believe?

To explore these questions, I used the data Stack Overflow collected in its 2018 Annual Developer Survey. This data set contains almost 100,000 responses from software developers in 183 countries and territories worldwide.

Of the respondents, 7,088 (7.7%) self-identified as data scientists. These respondents were compared to the remaining 85,010 non-data scientist software developers represented by the data.
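A split like this can be sketched in a few lines of Python. This is only an illustration, not the code behind the analysis: the field name `DevType`, the role label, the semicolon-separated answer format, and the decision to skip blank answers are all assumptions, and the rows are toy data rather than real survey responses. The last lines simply check the 7.7% share against the two counts quoted above.

```python
# Hypothetical role label; the exact wording in the survey data may differ.
DS_LABEL = "Data scientist or machine learning specialist"


def is_data_scientist(dev_type):
    """Return True if a semicolon-separated multi-select answer includes DS_LABEL."""
    return DS_LABEL in [role.strip() for role in dev_type.split(";")]


def split_respondents(rows, field="DevType"):
    """Partition rows (dicts) into (data scientists, other developers).

    Rows with a blank or missing answer are skipped (an assumption).
    """
    ds, non_ds = [], []
    for row in rows:
        answer = row.get(field)
        if not answer:
            continue
        (ds if is_data_scientist(answer) else non_ds).append(row)
    return ds, non_ds


# Toy rows for illustration only, not real survey responses.
toy_rows = [
    {"DevType": "Back-end developer;Data scientist or machine learning specialist"},
    {"DevType": "Front-end developer"},
    {"DevType": ""},
]
ds, non_ds = split_respondents(toy_rows)
print(f"{len(ds)} toy data scientist(s), {len(non_ds)} other developer(s)")

# Sanity check of the share quoted in the post.
ds_count, non_ds_count = 7088, 85010
ds_share = 100 * ds_count / (ds_count + non_ds_count)
print(f"Data scientist share: {ds_share:.1f}%")
```

Matching on the whole role label, rather than on a substring, keeps a respondent with several roles in the data scientist group without accidentally matching similarly named roles.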

Part 1: What does a "typical" data scientist look like?

Computer science and software development have historically been portrayed as the domains of nerdy male programmers. Just look at the cast of HBO's Silicon Valley (pictured at the top of this post) to see what I mean.

But with the recent hype around data science, it was my hope that this might have changed. Could the prospect of working in "the sexiest job of the 21st century" be enough to attract a more demographically diverse group of individuals to computing and tech? The answer appears to be no.

Figure 1: Comparison of gender (left) and age (right) distributions for data scientists (DS) vs non-data scientists (Non_DS)

As can be seen from Figure 1, the age and gender distributions of data scientist and non-data scientist respondents are almost identical. The average age of both data scientists and non-data scientists is 30.5 years, and 91% of data scientists are male, compared to 92% of non-data scientists.

This suggests that, rather than attracting individuals from new demographics to computing and technology, the growth of data science jobs has merely created a new career path for those who were likely to become developers anyway.

Yet, comparing the educational backgrounds of data scientists and non-data scientists does reveal one key difference between these two groups.

Figure 2: Comparison of highest degree level distributions for data scientists (DS) vs non-data scientists (Non_DS)

Figure 2 shows that, contrary to popular belief, it is possible to become a data scientist without a master's or Ph.D. Even so, data scientists are much more likely to hold an advanced degree than non-data scientists: 45% of the data scientist respondents hold a master's or a Ph.D., compared to 23% of the non-data scientists.

This suggests a difference in skills required for data science and non-data science developer roles, with data science roles more likely to require skills that are taught as part of advanced degree programs.

Part 2: How do the coding skills differ between data scientists and non-data scientists?

The higher academic requirements employers place on data scientist roles raise the question: Do employers also require more coding experience of their data scientists than of their non-data scientists?

Figure 3 shows that the opposite is, in fact, true.

Figure 3: Comparison of the distribution of professional coding experience for data scientists (DS) vs non-data scientists (Non_DS)

Data scientists typically have fewer years of professional coding experience than non-data scientist developers, with 62% of the data scientist respondents having five or fewer years of professional coding experience, compared to 57% of non-data scientists.

This suggests that employers do not demand more of data scientists in all respects. Instead, developer roles appear to involve a trade-off between coding experience and the sorts of technical skills that are taught in universities.

Yet, not all programming languages are created equal, and the programming languages data scientists and non-data scientists use in their day-to-day jobs are not necessarily the same.

Data scientists are more likely to use languages designed for, or with libraries for, statistical modeling and analysis, such as Python or R, while non-data scientists are more likely to program in languages associated with web development activities, such as HTML, CSS, and JavaScript.

For example, 77% of data scientists report having programmed in Python in the past year, compared to 35% of non-data scientists, while 72% of non-data scientists report having programmed in JavaScript in the past year, compared to 55% of data scientists.
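Percentages like these come from a multi-select question, so each respondent can list several languages. The following is a minimal sketch of how such a share might be computed, assuming a hypothetical field `LanguageWorkedWith` holding semicolon-separated answers. The rows are toy data for illustration and do not reproduce the survey figures.

```python
def share_using(rows, language, field="LanguageWorkedWith"):
    """Percentage of non-blank answers that list `language` exactly."""
    answered = [row[field] for row in rows if row.get(field)]
    if not answered:
        return 0.0
    users = sum(
        1
        for answer in answered
        if language in {item.strip() for item in answer.split(";")}
    )
    return 100 * users / len(answered)


# Toy rows for illustration only, not real survey responses.
toy_ds = [
    {"LanguageWorkedWith": "Python;R;SQL"},
    {"LanguageWorkedWith": "Python;JavaScript"},
    {"LanguageWorkedWith": "Java"},
    {"LanguageWorkedWith": "R;Python"},
]
python_share = share_using(toy_ds, "Python")
js_share = share_using(toy_ds, "JavaScript")
print(f"Python: {python_share:.0f}%, JavaScript: {js_share:.0f}%")
```

Splitting each answer into a set of exact names matters here: a plain substring test would count every JavaScript user as a Java user as well.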

This reflects the differences in the types of tasks commonly performed by data scientists, who typically focus on using statistics and modeling techniques to derive insights from data, versus non-data scientists, who are more likely to be involved in software engineering or web development-type activities.

Part 3: Are data scientists more satisfied with their careers than non-data scientists?

If data scientist really is the best job to be in right now, then we would expect data scientists to be more satisfied than non-data scientists with both their jobs and their careers in general. And this is exactly what we observe from the data.

However, even though data scientists do tend to be more satisfied with both their jobs and their careers than non-data scientists, both groups tend to enjoy high levels of satisfaction in their jobs and careers.

Figure 4 shows that 73% of data scientists and 70% of non-data scientists are at least slightly satisfied with their jobs, while 74% of data scientists and 73% of non-data scientists are at least slightly satisfied with their careers.

Figure 4: Comparison of the job satisfaction (left) and career satisfaction (right) distributions for data scientists (DS) vs non-data scientists (Non_DS)
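The "at least slightly satisfied" figures group the top three points of a satisfaction scale. As a rough sketch, assuming a seven-point scale with the labels below (the survey's exact wording may differ) and ignoring blank answers, the share could be computed like this. The answers are toy data, not survey responses.

```python
# Assumed labels for the positive end of a seven-point satisfaction scale.
SATISFIED = {"Slightly satisfied", "Moderately satisfied", "Extremely satisfied"}


def share_satisfied(answers):
    """Percentage of non-blank answers that are at least slightly satisfied."""
    answered = [answer for answer in answers if answer]
    if not answered:
        return 0.0
    return 100 * sum(answer in SATISFIED for answer in answered) / len(answered)


# Toy answers for illustration only, not real survey responses.
toy_answers = [
    "Extremely satisfied",
    "Slightly satisfied",
    "Neither satisfied nor dissatisfied",
    "Moderately dissatisfied",
    None,
]
toy_share = share_satisfied(toy_answers)
print(f"{toy_share:.0f}% at least slightly satisfied")
```

Applying the same function to each group's job and career answers would give four percentages comparable to those quoted above.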

Therefore, even if a career in data science is not for you, any development-related role is likely to lead to levels of job and career satisfaction similar to those of the "best job in America."

Conclusion

After exploring what it takes to land a job as a data scientist, and how this differs from landing a non-data scientist developer role, as well as comparing the levels of job and career satisfaction of people in these two groups, we found:

  1. Although data scientists and non-data scientists tended to come from similar demographic backgrounds (that is, predominantly, young males), data scientists were more likely to have an advanced degree than non-data scientists but tended to have less professional coding experience.
  2. Data scientists were more likely to make use of statistical and modeling-focused programming languages, such as Python and R, than their non-data scientist counterparts, who tended to favor web development-focused languages, such as HTML, CSS, and JavaScript.
  3. Even though data scientists enjoy higher levels of job and career satisfaction than non-data scientists, both groups tend to be highly satisfied with their jobs and their careers.

Putting this all together, it seems that a typical data scientist is, therefore, the stereotypical nerdy male programmer: a male in his early 30s with an advanced degree and some professional experience programming in languages such as Python or R.

However, what a "typical" data scientist looks like now is not necessarily what one will look like in the future. In fact, for the sake of the global economy, this image will have to change.

As mentioned previously, data science is a fast-growing profession where demand consistently outstrips supply and is expected to do so for many years to come.

The best way to meet this demand is for employers to find ways of attracting to the profession individuals from demographic groups that have traditionally been underrepresented in computer science and technology.

So if you don't see yourself as fitting the "typical" data scientist mold, my advice is: don't be discouraged.

There is plenty of room in the data science profession for people of all backgrounds, and based on the levels of job and career satisfaction enjoyed by data scientists, the effort involved in developing the skills necessary to gain a data science role is well worth it.

After all, who wouldn't want to work in the "sexiest job of the 21st century"?

To learn more about this analysis, visit the GitHub repository for this project.