Data Science requires data, tools, and a desire to solve a problem. To be effective, you need:
- a central environment that allows the collection, cleaning, and storage of data,
- tools to allow the analysis of that data,
- a way to deploy the insights that you develop.
This will require a team of individuals with expertise in different areas. This section describes the makeup of a good analytical team that supports a company's data science projects. Understanding how data science projects are developed will help you describe potential projects and interact with the team.
Overview of a Data Science Team
Now that you understand how data science can improve company performance, you might wonder how these projects get developed. This section describes the inner workings of a data science project at a high level. A data science project is made up of four major functions, and as the name suggests, it starts with data. The diagram below shows this flow and indicates the types of people who make the entire system work. The four functions are:
1. Data Gathering – this is moving data from its source into a form usable for analysis. That may mean getting data out of a Point of Sale (POS) system, Enterprise Resource Planning (ERP) system, or Customer Relationship Management (CRM) system and into a Data Lake (a central repository for storing multiple sources of data in one location) so it can be analyzed. During this move from the source to the data lake, a process called data cleaning is performed: the data is made accurate by filling in missing values, converting fields to the right formats, and otherwise getting the data ready for use. The people who do this work are usually called Data Engineers.
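As a concrete illustration of data cleaning, here is a minimal sketch of what a Data Engineer's transformation step might look like. The field names, date rule, and default value are illustrative assumptions, not part of any real system described above:

```python
# Hypothetical example: cleaning raw POS records before loading them into a
# data lake. Field names and cleaning rules are illustrative assumptions.

RAW_ROWS = [
    {"store": "001", "date": "2023-05-01", "sales": "1250.50"},
    {"store": "002", "date": "05/02/2023", "sales": ""},        # missing sales
    {"store": "001", "date": "2023-05-03", "sales": "980"},
]

def clean_row(row, default_sales=0.0):
    """Normalize one record: consistent date format, numeric sales."""
    date = row["date"]
    if "/" in date:                        # convert MM/DD/YYYY to YYYY-MM-DD
        month, day, year = date.split("/")
        date = f"{year}-{month}-{day}"
    sales = float(row["sales"]) if row["sales"] else default_sales
    return {"store": row["store"], "date": date, "sales": sales}

cleaned = [clean_row(r) for r in RAW_ROWS]
```

In practice this logic usually lives in an ETL pipeline and the rules (how to handle missing values, which formats to normalize) are agreed with the business.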
2. Data Analysis – once the data is gathered and cleaned in a centralized location, it can be used to create dashboards, make predictions with machine learning, or power a deep learning initiative. This analysis step is where the data science techniques we discussed earlier are applied. It can be done by a Data Analyst, a Data Scientist, or an AI Expert; each role goes progressively deeper into the data science tools at hand.
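To make the analysis step concrete, here is a minimal sketch of a prediction: fitting a straight trend line to monthly sales with ordinary least squares and forecasting the next month. The numbers and the one-variable model are toy assumptions; real projects use richer data and libraries:

```python
# Hypothetical example: fitting a simple trend line to monthly sales so the
# business can forecast next month. Data and model are toy assumptions.

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

months = [1, 2, 3, 4, 5]
sales = [100.0, 110.0, 120.0, 130.0, 140.0]   # perfectly linear toy data

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept               # predict month 6 -> 150.0
```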
3. Results Deployment – once the results are known and the models are proven to reach the planned goal, users need access to the information in a way that lets them make better choices. If you have a model that predicts sales, then the person making decisions based on predicted sales needs access to that model. This access needs to be available 24/7, which means the model is now a live production application. The project moves from discovering what works to running a working system that must stay live. This is usually handled by a team in IT or DevOps.
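One common way to deploy results is to wrap the model in a small web service that decision makers (or other systems) can query at any time. The sketch below assumes a trained linear model; the endpoint path, port, and coefficients are illustrative, and a real production service would add logging, authentication, and monitoring:

```python
# Hypothetical sketch of results deployment: serving a trained sales model
# over HTTP using only the Python standard library. Coefficients, route, and
# port are illustrative assumptions.

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

SLOPE, INTERCEPT = 10.0, 90.0   # assumed output of an earlier analysis step

def predict_sales(month: int) -> float:
    """Apply the deployed model to one input."""
    return SLOPE * month + INTERCEPT

class PredictHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Expect requests like /predict?month=6
        if self.path.startswith("/predict") and "month=" in self.path:
            month = int(self.path.split("month=")[1])
            body = json.dumps({"forecast": predict_sales(month)}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

# To run the live service:
#     HTTPServer(("localhost", 8000), PredictHandler).serve_forever()
```

Keeping this service running, versioned, and monitored around the clock is exactly the work that moves from the data science team to IT or DevOps.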
4. Data Architecture – finally, all of the above work is built on a system that must be designed to meet the needs of the first three functions. This role answers questions such as: do we store all this data in a commercial cloud (AWS, Azure, Google, etc.) or on our own machines? The title for this position is usually Data Architect.