Why are so many people burning out of the “sexiest job of the 21st Century”?

New research has found that 97% of data professionals are feeling burned out in their day-to-day jobs, and 79% are considering leaving the industry altogether. With a major shortage of skilled data scientists in the market as it is, these numbers should be ringing major alarm bells. So, what's behind the “Data Science Brain Drain”, and how can businesses help the role regain its former “sexiest job of the century” glory?

The role of ‘data scientist’ saw a meteoric rise in the early stages of the specialism at the turn of the millennium. With data becoming one of the key ingredients for business success – and becoming more accessible than ever – data workers updated their CVs with the new job title and saw an instant increase in the salary they could command.

Database specialists, mathematicians, statisticians and physicists were some of the first to gravitate towards the data science industry. They wrote code from scratch and leveraged insights from relatively limited data sources. One of the biggest challenges, however, was piecing together data owned by different departments, controlled by different people, and stored within hard-to-access data silos.

As no official data science training existed to upskill these specialists, they had to rely on their own coding knowledge to deliver data-driven insights. Their work had a high degree of freedom and needed an exceptionally high level of technological expertise to unlock business value – the perfect attributes for a popularity boom. In 2012, Davenport and Patil declared that “Data Scientist” was the “Sexiest Job of the 21st Century” and times were good.  

20 years on: what’s changed? 

Demand for data scientists continues to hit record levels in 2022. Today, LinkedIn has 44,000 listings for data scientist roles within the UK alone. With data-driven insights now a prerequisite for improving efficiency and performance, data science will continue to be one of the most in-demand specialisms. That is unsurprising given that the volume of data created is growing at an exponential rate, projected to reach 180 zettabytes by 2025.

The need to leverage wild unstructured data for insight generation has never been more critical. With global supply chain disruption from COVID, net zero initiatives throughout the Western world, and the current energy crisis sparking a rethink of efficiency standards, businesses are asking more detailed questions than ever before. These are questions that also require a greater degree of data literacy, experience, and domain knowledge to develop insights around. Gartner confirms this, highlighting that data literacy is a critical and necessary driver of business value.  

This demand for data-led competitive advantage has been growing for some time. Brandon Purcell, a Forrester Research analyst, advised in 2019 that “The rise of AI and machine learning may also be a factor in the dramatic increase in demand for data scientists … A lot of this is because of branding. Many companies see data scientists as the key to embracing AI or machine learning, which are the hottest technologies out there.”

The Data Science Exodus 

The ever-increasing demand for data insights, combined with the data science skills gap, the number of open roles, and unsustainable workloads, has a knock-on effect. While enterprises often have access to data as the raw material, they also tend to lack the robust data infrastructure – a data-literate workforce and people with advanced data skills – to actualise the value from data in innovative and effective ways. The end result is one where existing data workers are burning out and are considering an exodus. 

Despite a massive increase in demand for data scientists over the last decade, 97% of data professionals now report they are “experiencing burnout in their day-to-day jobs”. According to a global study from a data science careers platform, data scientists now stay in their roles an average of only 1.7 years – falling far short of the average software developer tenure of 4.2 years. The research highlighted faulty data pipelines, finding and correcting data issues, unrealistic corporate expectations, and a culture of being “shamed and blamed” as key factors behind the burnout – a challenge present across the entire data skills specialism. 

Findings from an Alteryx-commissioned IDC Infobrief supported this conclusion, highlighting that 100,000 human lifetimes' worth of data and analytic work hours are lost annually worldwide through the use of legacy spreadsheet software amongst data native workers. The research noted that 91% of organisations reported “some area of skills gaps in data and analytics”, with a particular shortage of skills in predictive analytics, prescriptive analytics, and machine learning.

It is evident that this skills gap is not limited to any one area of the data science specialism. We are not only seeing hugely increased demand for, and consequently increased pressure on, highly skilled data scientists, but are also seeing this mirrored with data engineers, data workers, and data natives. Of those workers cited as experiencing burnout, the research noted that these data specialists were mostly proficient in Python (85%) and SQL (82%), with 56% holding master’s degrees.

Without this foundational benchmark of data talent, and continuous upskilling across the entire data continuum, from data native to data engineer and data scientist, we’re seeing a whiplash effect: the work that less experienced workers lack the capacity or experience to handle is picked up by those more experienced than them.

Data Scientist retention: the need for foundational skillsets  

The data science and analytics sector is at a similar crossroads today to the one IT teams faced in the mid-1900s: the fulcrum point between ivory-tower specialism and business-wide enabler. Those challenges were solved by integrating greater levels of upskilling across businesses and moving away from the siloed model of working.

IBM introduced its first mass-market personal computer in 1981, helping to democratise the power of computing. Its distant forerunner, the “Mark 1”, entered service in 1944. That machine, however, measured roughly 50 feet long and eight feet tall. Given the cost involved and the expertise needed to operate it, access was necessarily restricted to a small group of specialists. As computing evolved from the Mark 1, becoming more accessible and user friendly, the productivity benefits became more apparent. Further democratisation of the technology and wider upskilling was a clear next step for success.

Looking back, the levels of efficiency and productivity we enjoy today through the use of the personal computer would be impossible if responsibility for the Mark 1 had remained siloed within the IT team. If these teams had held on to that siloed approach, we would have seen a similar burnout, mass staff exodus, and myriad other challenges associated with an unsustainable workload and exponential demand on their time, just as we are currently seeing in data science.

While the skills gap is certainly the best understood gap in business, one of the most pressing challenges in the data science sector today is the understanding gap. Put simply, the solution to resolving the data science exodus is not for businesses to hire more data scientists – the change needs to be foundational. It is simply not an efficient use of resource to have a master mechanic perform oil changes.

The core strategy to mitigate data science burnout and retain your staff is to develop a more robust data team: one built on a strong foundation of domain-expert data analysts and data engineers, and led by data scientists. More importantly, it needs to be facilitated by a strong foundational data skillset among in-department data natives. Data teams in which each experience level is upskilled to the point where it can enable the next step up the ladder will be able to capitalise on what their data is telling them, ensuring that significant volumes of complex, unstructured data are refined into standardised pipelines of relevant, timely, and high-quality data that deliver business value.