Sai Geetha M N
Data Scientists, Data Engineers, ML Engineers And More - Demystified
As the world of Big Data, Machine Learning and Artificial Intelligence is taking off, there is an overlap of roles and responsibilities of various designations in different companies.
You have various flavours of the same job description, called by different names. There are business analysts, data analysts, data scientists, data engineers, machine learning engineers, software engineers, and management consultants all vying for parts of the new pie.
Even as enterprises create new roles and job descriptions, I thought I should share my take on what these roles most often mean in the industry today, along with the kind of skillset each role needs to be successful.
This can guide you in choosing your career path, and it can help recruiters understand the positions they are trying to fill. The industry, too, will benefit from standardising these roles as we mature.
Roles and Associated Skillsets
I have shown the skillset required by each in a coloured heatmap with the intensity depicting how deeply that skill is required by that role.
How to read this graph? The darker the colour, the more important that skillset is for that role.
You can see that there is a lot of overlap between many roles, which explains the confusion many people have about which skills to acquire for a given role. Key points to note across the roles are:
Almost all roles need to be familiar with the programming tools relevant to them, and that cannot be done away with. The only exception is the management consultant. The actual stack, however, varies from role to role.
As you move from the left to the right, the skills required move downwards, indicating the shift from being client-facing to software development and operations management with the entire gamut of data skills in between.
While this is how the skills are currently split in many companies, a new role in the offing is the Full Stack Data Scientist, which spans many of these roles. That could be quite a coveted role, but it comes with its own stretch and stress as well.
Now to look at the skillsets for each role:
The figure shows that a Management Consultant works closely with clients and needs a complete understanding of the business. From a data perspective, they need a cursory appreciation of, and intuition about, the associated data. They may not have to go much further than this.
This role is no different from typical management consultants and hence continues to remain the same, even in the new data world.
The data analyst, on the other hand, may have somewhat less client interaction but has to understand the business very well in order to derive insights. An analyst has to be able to do data cleaning, data manipulation, transformation and analysis.
Knowing statistics helps in doing better analysis. The most important aspect of the analyst's job is to derive insights through visualisation techniques and to be an extremely good storyteller. While communication is a key skill for any role, the ability to weave a story out of the data is of paramount importance for this one. It is this ability to convey insights in a compelling manner that helps in making data-driven business decisions and can open up hitherto unseen business opportunities.
Moving on to the data scientist, a whole host of skills is required. Since this role sits at the centre of the 'Data World', the majority of the skillsets are needed here. Understanding of the business, data intuition, data wrangling, statistics, calculus and linear algebra are essential. With these as the basis, machine learning models need to be developed. Basic data visualisation techniques are almost a prerequisite for getting a deep understanding of the data on hand.
Unless you plan to be a Full Stack Data Scientist, the best practices of software engineering, API development and DevOps or MLOps are not must-haves, though knowing a bit about them would be empowering.
From a role perspective, it is the data scientist who comes up with the right machine learning algorithms as solutions to specific problems. They can coax the data into giving the required insights, predictions and so on. They typically create the models and define the metrics that measure how well the problem is being solved. They also need to keenly monitor the statistical performance of the model and keep readjusting it, either through retraining or through newer models.
A data engineer would essentially help in ingesting all types of data into a central data lake or some similar location and would help in productionising the ML Models or statistical models that a data scientist has produced.
Since we are in the big data world, knowing how to work with the big data stack is a must. Data wrangling, data ingestion, data transformation, data validation and overall ownership of the data pipelines, and sometimes even the ML pipelines, may be part of this role.
Knowledge of software engineering practices and DevOps is also a must in this role.
Traditionally, data engineers would have built the data warehouses of the organisation. Now, however, they need the skillset to deal with much more, as current data lakes can hold structured, semi-structured and unstructured data. They collate all the data for the enterprise, clean, validate and deduplicate it, apply common transformations and define standard practices for data ingestion into and data egress from the data lake. They understand the nuances of various file formats, the storage and processing demands, and the cataloguing, governance and lineage of data.
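The clean, validate and deduplicate steps mentioned above can be illustrated with a minimal sketch in plain Python. The field names (`customer_id`, `email`) and the function names are hypothetical, chosen only for illustration; a real pipeline would more likely run on a big data stack than on in-memory lists.

```python
# Hypothetical required fields for an incoming record (illustrative only).
REQUIRED_FIELDS = ("customer_id", "email")


def clean(record: dict) -> dict:
    """Strip surrounding whitespace from every string value."""
    return {
        key: value.strip() if isinstance(value, str) else value
        for key, value in record.items()
    }


def is_valid(record: dict) -> bool:
    """A record is valid when every required field is present and non-empty."""
    return all(record.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def ingest(records: list) -> list:
    """Clean each record, drop invalid ones, and keep the first of any duplicates."""
    seen = set()
    accepted = []
    for raw in records:
        record = clean(raw)
        if not is_valid(record):
            continue
        # Deduplicate on a case-insensitive key built from the required fields.
        key = tuple(str(record[field]).lower() for field in REQUIRED_FIELDS)
        if key in seen:
            continue
        seen.add(key)
        accepted.append(record)
    return accepted
```

Calling `ingest` on a list of dictionaries returns only the cleaned, valid and unique records, which is roughly the state in which data would be handed on to analysts and data scientists.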
In some companies they don the hat of an ML Engineer too, where they productionise the machine learning algorithms, set up the data and ML pipelines and manage and monitor the performance of the algorithms and the models.
Machine Learning Engineer
Some companies separate the data engineer role from the ML engineer role, which narrows the portfolio of both. In such a case, the ML engineer takes over from where the data engineer stops, in terms of productionising the models.
Once all the required data is made available, the ML engineer builds the data preparation pipelines specific to the ML model. From there on, productionising the models and making them available for predictions, insights, forecasts and real-time business actions such as fraud prevention are all owned by the ML engineer. The continuous deployment pipelines, the feedback mechanism and the monitoring of model performance over time are all automated here.
So, strong knowledge of software engineering, programming tools, API development and DevOps is a must. In fact, the specialised skill of DevOps for ML called MLOps is something that is needed here.
It is good to know the data skills such as statistics, calculus, linear algebra and data wrangling, as they let you collaborate well with data scientists while you productionise and monitor models. But these are not must-have skills.
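The automated monitoring of model performance described in this section could look something like the following minimal sketch. The class name `AccuracyMonitor`, the window size and the threshold are all assumptions made for illustration; real MLOps setups typically track several metrics and use dedicated tooling.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent predictions (hypothetical helper)."""

    def __init__(self, window: int = 100, threshold: float = 0.8):
        # Only the last `window` outcomes are kept; older ones drop off.
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold

    def record(self, predicted, actual) -> None:
        """Store whether a prediction matched the outcome observed later."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Return accuracy over the current window, or None if it is empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self) -> bool:
        """Flag retraining once the window is full and accuracy falls below the threshold."""
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy() < self.threshold
```

A feedback pipeline would call `record` whenever the true outcome for a past prediction becomes known, and a scheduled job could check `needs_retraining` to decide whether to alert the data scientist or trigger a retraining run.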
The software engineer is a traditional role that the industry is well aware of, and it is not specific to the data world. Hence none of the data skills is mandatory here. However, strong software engineering and DevOps skills, along with a wide range of programming tools and API and app development skills, are needed. This is a well-known domain and hence I am not elaborating much here.
It is easy for a software engineer to transition into an ML engineer role with a few additional skillsets.
Another way of representing the same, for ease of reading, is shown here:
Intersection between these roles
There is a lot of overlap between the roles, but each also has its own unique expectations. A very simple representation of the overlaps between the roles is shown in the figure below:
Intersection between various roles
The amount of overlap is not to scale; it is only indicative of the overlap in skillsets. I have called out DevOps as an independent role here, since it can be a specialisation in itself, but it is very much required by the other roles too, as shown above.
Hope this gives you a bit more insight into the skillsets associated with roles and you are more empowered to choose the right career path or to choose the right candidate for your project.
Any thoughts or comments on how these roles pan out in your organisations would be very welcome.