The data skills shortage isn’t what you think
The demand for data scientists is higher than ever as companies turn to the power of data and AI for everything from improving their customer experience to increasing operational efficiency.
Indeed, a 2021 survey found that data scientists in the United States earned a median salary of $164,500 in 2020, up about 8% from the median salary of $152,500 in 2019.
The fact that data scientists are increasingly hard to find and expensive to hire threatens to derail the data-driven future that organizations are working toward. But what if someone tells you that the shortage is not as bad as it first appears?
Dissecting the data talent shortage
In an opinion piece on TechRepublic, Guillaume Moutier says the real problem behind the data talent shortage isn’t what many of us might think.
Moutier is the lead data engineering principal architect at Red Hat Cloud, and he says the problem stems not from a lack of data scientists who can do data modeling, but from the difficulty of finding people versed in data management and manipulation.
“Therein lies the shortage: data scientists who are almost as good at software engineering as they are at data modeling. Companies need people who know how to produce their output so it can be used in real use cases, not just people who can build an effective model,” he explained.
In a nutshell, Moutier says we don’t have enough people to do the required work around business analysis and data preparation, the latter ranging from accessing data repositories to manipulating data. (Data manipulation, also known as data wrangling, is the process of transforming and mapping data from one format to another.)
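To make "transforming and mapping data from one format to another" concrete, here is a minimal Python sketch that uses only the standard library. The sample data, the column names and the `wrangle` function are hypothetical and chosen purely for illustration. They do not come from Moutier's piece.

```python
import csv
import io
import json

# Hypothetical raw export with stray whitespace, mixed casing and a missing value.
RAW_CSV = """customer_id, Signup Date ,plan
 101,2021-03-04,PRO
 102,2021-03-09,
 103,2021-04-01,basic
"""


def wrangle(raw_csv: str) -> list:
    """Map messy CSV rows onto clean, consistently typed records."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    records = []
    for row in reader:
        # Normalize header names ("Signup Date" -> "signup_date") and trim values.
        clean = {
            key.strip().lower().replace(" ", "_"): (value or "").strip()
            for key, value in row.items()
        }
        records.append(
            {
                "customer_id": int(clean["customer_id"]),
                "signup_date": clean["signup_date"],
                # An empty plan becomes None instead of an empty string.
                "plan": clean["plan"].lower() or None,
            }
        )
    return records


if __name__ == "__main__":
    # Emit the cleaned records in a different format (JSON).
    print(json.dumps(wrangle(RAW_CSV), indent=2))
```

Real-world preparation work is usually far messier than this, which is part of why people who can do it reliably are in demand.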
So yes, while we might need rocket scientists to build a rocket, they aren’t the only experts needed. And in data science, these other roles within the data ecosystem vastly outnumber those of data scientists.
The Making of a Data Engineer
Training these data professionals takes far fewer resources and less time than training full-fledged data scientists. A data scientist can take years to complete their training, followed by several more years of work experience before they make a difference, whereas organizations can equip the right professionals with data-centric skills fairly quickly.
A recent blog post from online learning platform DataCamp highlighted the wide range of technical skills data engineers need to possess. Skills include database management, programming and cloud computing, as well as knowledge of ETL (Extract, Transform, Load) frameworks to manipulate data and stream processing frameworks to work with real-time data.
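For readers unfamiliar with ETL, the three steps can be sketched in a few lines of Python using only the standard library. This is an illustrative toy, not how any particular ETL framework works. The sample orders, the `orders` table and the function names are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source data; the third row has an unparseable amount.
SOURCE_CSV = """order_id,amount_usd,country
1,19.99,us
2,5.00,sg
3,not_available,us
"""


def extract(raw: str) -> list:
    """Extract: read rows from the source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list) -> list:
    """Transform: convert types, normalize values and drop unusable rows."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount_usd"])
        except ValueError:
            continue  # Skip rows whose amount cannot be parsed.
        cleaned.append((int(row["order_id"]), amount, row["country"].upper()))
    return cleaned


def load(rows: list, conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into the target database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount_usd REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(raw: str, conn: sqlite3.Connection) -> None:
    load(transform(extract(raw)), conn)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    run_pipeline(SOURCE_CSV, connection)
    for record in connection.execute("SELECT * FROM orders ORDER BY order_id"):
        print(record)
```

Much of this is ordinary programming and database work, which is why the skills overlap with those of experienced developers and database administrators.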
Seen in this light, it is obvious that experienced database administrators or programmers already have a significant part of the skills that data engineers need. It’s no wonder organizations are starting to train their existing employees to transition into data science roles.
Bridging the Data Science Gap
Also, while data engineering roles probably make up the lion’s share of data roles, they are by no means the only ones. For example, there is the data analyst responsible for visualizing and transforming the data, the data storyteller for finding the narrative that best expresses the data, and the business intelligence developer for providing the analyses and business insights – along with many other specialized roles.
Suddenly, bridging the data science gap seems a lot less daunting – and organizations run the risk of missing the forest for the trees if they focus only on the data scientist and ignore the larger data ecosystem. Of course, there’s no escaping the requirement to set competitive salaries and benefits to hire the data scientists you need.
To excel in data science, companies need to focus on building a team.
Ultimately, companies are increasingly moving away from relying on instinct or limited personal experience when making major business decisions. They see machine learning and data-driven decisions as the way forward to improve the efficiency of their operations and increase profitability.
And as I observed previously, organizations looking to succeed with data must first establish organization-wide competence with data by developing data-centric skills and fostering the democratization of data. And they have to start today.
Paul Mah is the editor of DSAITrends. A former system administrator, programmer and professor of computer science, he enjoys writing code and prose.
Image credit: iStockphoto/NickR