Data scientist has been among the most in-demand jobs of the last decade. However, expectations of what a data scientist is differ between employees and employers. What employers are usually looking for does not necessarily match the skills of a standalone data scientist, but rather the set of skills that only a group of data engineers, software developers, academic minds, machine learning engineers, and business stakeholders can bring. Hiring such a team is a costly effort that will generate returns in the long term.
But… aren’t data scientists the sexiest people?
Yes, I know. You’ve read this everywhere. Investors invest more in companies with data scientists on board. Data scientists and data science will find correlations in your data that will bring value and strategic advantage to your company… You’ve heard all of this. Well, I am sorry, but I am here to wake you up from a wild dream. Do you really want to succeed in your journey to the #ageofdata? Then please follow our thoughts through this blog and ask yourself some questions first.
What is a data scientist?
If you are going to hire a data scientist, you had better be sure what you are looking for. The definition of a data scientist has changed over the years. Five years ago, data scientists were people who had strong foundations in statistics, good coding skills, and knowledge of a given domain. However, companies then began to look for what could be called data science generalists: data scientists without specific domain knowledge but able to solve any kind of problem at hand. Of course, this came with the expectation of finding correlations in your data that will bring value and strategic advantage to your company.
Usually, these data scientists arrive at a company with skills such as: statistics, machine learning, Python, R, and the ability to explain complex results to management. Data scientists mainly learned their skills online, following online courses that show them how to run fancy machine learning algorithms. They can classify data, cluster it, find correlations hidden to the naked eye, and they can use that information to drive value using code in R or Python. These are the expectations of data scientists towards their jobs.
What do companies expect from data scientists?
Hiring companies place a whole range of expectations on data scientists. Let’s assume that companies want to offer the best services to their customers, whether those customers are internal departments, external businesses, or individuals. Then we need to ask what customers expect from a data scientist’s work.
In the #ageofdata, automation and AI, customers want exactly that: the best products and services, able to make decisions on their own (unless GDPR compliance says otherwise). This has a profound impact on what companies expect from data scientists.
The data scientist in this scenario must be able to:
- Understand the task at hand, and identify and gather the data needed.
- Understand the business processes involved. There is no value added otherwise.
- Clean the data and understand it.
- Generate insights on the data.
- Generate automatically refreshed data dashboards using BI solutions for management.
- Build an application that brings value out of the results obtained.
Let’s see what these points imply about what a data scientist really is.
Figure 1: Venn diagram traditionally used to summarize the skills that a successful data scientist should have
Task understanding, data gathering, business processes and SAP
First and foremost, companies present real-world data scenarios. Unfortunately, data scientists are born in a utopian world. They learn to read flat files, which are basically Excel-like, well-structured tables with perfect data. Companies, however, do not have ideal data files but enterprise software (SAP, Oracle…) filled with poor-quality data.
Data scientists must know how to identify and gather data in SAP. We know that there are entire careers on this topic. And once the data is found, the task is still not over. Data scientists’ tools, R and Python, require the data to leave SAP. They need to know how to do that too. Yet another career path ahead.
Then there are business processes. Let’s explain this one with an example. Imagine that you have built a world-class machine learning algorithm that tells you that the best way to optimize production in your factory is to produce 873.23 cell phones per week. However, the company you work for can produce these items only in batches of 5000 units. What does your result mean in this case? Well, easy. You can throw it in the rubbish bin. The data scientist needs to understand that cell phones can be produced only in batches of 5000. This means that data scientists must have access to business processes across the company, which might be another career path of its own.
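To make the cell phone example concrete, here is a minimal sketch in Python of how a business constraint can be applied to a model result. The function name batch_schedule and the planning rule (start a new batch whenever cumulative production would otherwise fall below the cumulative target) are assumptions made for illustration only. A real factory would also weigh storage costs, lead times, and demand uncertainty.

```python
import math


def batch_schedule(weekly_target, batch_size, weeks):
    """Return the number of batches to start in each week.

    Assumed rule (illustrative only): cumulative production must never
    fall below the cumulative target suggested by the model.
    """
    schedule = []
    produced = 0
    for week in range(1, weeks + 1):
        required = weekly_target * week          # cumulative target so far
        shortfall = required - produced          # units still missing
        batches = max(0, math.ceil(shortfall / batch_size))
        produced += batches * batch_size
        schedule.append(batches)
    return schedule


# Model output from the example: 873.23 phones per week, batches of 5000.
plan = batch_schedule(weekly_target=873.23, batch_size=5000, weeks=12)
print(plan)  # one entry per week, each the number of batches to start
```

Under this assumed rule, the model's "873.23 per week" becomes a plan of whole batches spread over several weeks, which is a result the factory can actually act on.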
Classical data science: Clean the data, understand it, and generate insights
This is indeed the part of the job that most closely resembles the classical definition of the role. Funnily enough, this is where the data scientist will spend the least time in the entire project. And at the same time, this is where the core value and strategic advantage are obtained.
Let that sink in for a while. It means that data scientists spend the least time (less than 20 percent on average) on the activities that generate insights, which in turn generate value for your company.
Delivering results: BI dashboards and applications
There was a time when PowerPoint slides ruled the world, when you could show them to a customer and ask for your well-earned money. Those times are long gone. People and businesses want automation. Would you rather use a solution that tells you how many cell phones you need to build per week and offers you 99 percent accuracy, but that still requires ten different tasks to be updated in your business processes first? Or would you rather have 90 percent accuracy from an application that does everything on its own and keeps you updated without any tasks on your side?
I will guess that, like me, you would go with the second option. Automation. AI. This is the core value of the #ageofdata and data science. For that you need software, which means that, in addition to the career paths outlined so far, companies expect data scientists to be able to develop software. Again, an entire career path of its own.
One person for multiple areas of expertise?
All the tasks outlined so far require data scientists to combine the knowledge of several life-long career paths in a single person. Moreover, each path has an ever-growing landscape of technical tools, resulting in a seemingly endless list of skills that data scientists must master, and keep up to date, to deliver high-quality work:
- Web application frameworks and languages: Bootstrap, HTML, CSS, Angular, Node.js
- Development tools and frameworks: Git, DevOps, Docker, Kubernetes
- Machine learning: scikit-learn, TensorFlow
- Business processes and enterprise software
- Big data frameworks: Hadoop, Spark
- Communication: Soft skills + PowerPoint
- Cloud deployment: Microsoft Azure, AWS
- BI tools: Power BI, Tableau, Qlik, SAP Analytics Cloud
In my opinion, and I dare say in the general opinion of data science practitioners, such a person is a unicorn. It does not exist.
Figure 2: Actual skills demanded from data scientists
Analytics center to pool experiences
With all of this in mind, we arrive at this conclusion: hiring a single data scientist will not solve your problems, but a team of specialists who generate value from data science will. Whether you want to build the team internally or outsource it, you should aim for a team of at least four people that covers the skills shown in figure 3.
Investing in an entire team might be quite an expensive bet for your company in terms of both cost and time to market. If you need to establish a team fast, if you are staking out your data science initiative, or if you need an experienced team to complement your internal resources, you might benefit from working with an external team.
Figure 3: Structure for data science projects as orchestrated by our model of Analytics Center of Corporation
Contact us to build your path to Industry 4.0 together and gain your strategic advantage in the upcoming #ageofdata.