Today, let’s get back to basics and discuss the various steps in executing a data science project. One could break these steps down more finely than the following paragraphs do; today’s content offers a ten-thousand-foot view.
Converting the Business Problem Statement into a Data Science Problem Statement:
Before a business problem statement can become a data science problem statement, it must be checked against the data. If no data exists to confirm it, the solution must be a non-data-science one. For example, suppose the sales team claims that a product’s unit sales are falling every month. When you check the database, you may find one of three situations: 1) sales are falling by roughly five per cent a month; 2) no data is stored from which to calculate sales; or 3) sales are not falling and are, in fact, rising month by month. In the first case, we look for a data-based solution to the problem. In the second case, we must explore a solution that does not rely on data. In the third case, we must investigate further whether there is a sales problem at all.
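The three cases above can be sketched as a small check on stored monthly sales. This is a minimal Python sketch, not the author’s method: the function name `classify_sales_trend`, the use of the average month-over-month change, and the `-0.03` decline threshold are hypothetical choices made for illustration.

```python
from typing import Optional, Sequence


def classify_sales_trend(
    monthly_units: Optional[Sequence[float]],
    decline_threshold: float = -0.03,  # hypothetical cut-off for a "real" decline
) -> str:
    """Map stored monthly unit sales to one of the three cases in the text.

    Returns "declining", "no_data", or "not_declining".
    """
    # Case 2: nothing stored, or fewer than two months to compare.
    if not monthly_units or len(monthly_units) < 2:
        return "no_data"

    # Month-over-month relative change; 1000 -> 950 gives -0.05.
    changes = [
        (curr - prev) / prev
        for prev, curr in zip(monthly_units, monthly_units[1:])
        if prev != 0  # skip zero-sales months to avoid division by zero
    ]
    if not changes:
        return "no_data"
    avg_change = sum(changes) / len(changes)

    # Case 1: the data confirms the decline, so frame a data science problem.
    if avg_change <= decline_threshold:
        return "declining"
    # Case 3: the data contradicts the claim; investigate whether a problem exists.
    return "not_declining"
```

For example, monthly sales of 1,000, 950, 902, and 857 units fall by about five per cent a month and return `"declining"`, while an empty history returns `"no_data"`.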
Data Identification and Collection:
The next important activity is data identification and collection. We must determine which variables and data points to explore to gain further insight into the business problem. What time window of records do we need for the analysis? Which types of products, customers, or geographies should we include or exclude? Then we must collect the data. The data engineering team can help obtain the relevant data from various sources and store it in a location accessible to the analyst.
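Scoping decisions such as the time window and the products or geographies to include can be written down as an explicit filter, so they are easy to review and repeat. The following Python sketch assumes a hypothetical `SalesRecord` layout and a hypothetical `select_records` helper; in practice, the fields would come from whatever sources the data engineering team connects.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Set


@dataclass
class SalesRecord:
    # Hypothetical record layout; real fields depend on the source systems.
    sold_on: date
    product_type: str
    region: str
    units: int


def select_records(
    records: Iterable[SalesRecord],
    start: date,
    end: date,
    include_types: Optional[Set[str]] = None,
    exclude_regions: Optional[Set[str]] = None,
) -> List[SalesRecord]:
    """Keep records inside [start, end] that match the scoping decisions."""
    excluded = exclude_regions or set()
    selected = []
    for r in records:
        if not (start <= r.sold_on <= end):
            continue  # outside the chosen time slice
        if include_types is not None and r.product_type not in include_types:
            continue  # product type not in scope
        if r.region in excluded:
            continue  # geography explicitly excluded
        selected.append(r)
    return selected
```

Keeping the scope in one function means a change of scope, such as adding a region, is a one-line edit rather than a hunt through ad hoc queries.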
Data Preparation:
The data scientist must prepare the data to make the datasets usable. This involves treating missing values, outliers, and junk values, and converting variables into the correct data types. For example, we must convert date variables into a standard format across all datasets and decide whether keeping dates as strings is enough or whether they must be converted into a date-type variable.
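A minimal Python sketch of these preparation steps, using only the standard library. The list of date formats, the junk placeholders, median imputation, and capping outliers at a fixed value are all assumptions chosen for illustration; the right treatment depends on the dataset.

```python
from datetime import date, datetime
from statistics import median
from typing import Iterable, List, Optional

# Hypothetical date formats seen across source files. Order matters:
# "05/03/2023" is read as day/month because "%d/%m/%Y" is listed.
# "%b" expects English month abbreviations under the default C locale.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%b-%Y")
# Hypothetical placeholders that count as junk rather than real values.
JUNK_VALUES = {"", "NA", "N/A", "null", "?", "-"}


def parse_date(raw: str) -> Optional[date]:
    """Convert a date string in any known format to a datetime.date, else None."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def clean_units(raw_values: Iterable[object], cap: Optional[float] = None) -> List[float]:
    """Treat junk as missing, impute missing with the median, optionally cap outliers."""
    parsed: List[Optional[float]] = []
    for raw in raw_values:
        text = str(raw).strip()
        if text in JUNK_VALUES:
            parsed.append(None)
            continue
        try:
            parsed.append(float(text))
        except ValueError:
            parsed.append(None)  # unparseable text is treated as junk too
    known = [v for v in parsed if v is not None]
    fill = median(known) if known else 0.0
    cleaned = [fill if v is None else v for v in parsed]
    if cap is not None:
        cleaned = [min(v, float(cap)) for v in cleaned]
    return cleaned
```

Converting dates to `datetime.date` objects, rather than keeping them as strings, makes them sortable and comparable across datasets, which is usually what later steps need.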
Exploratory Data Analysis:
Once the datasets are prepared, we can analyse the variables to understand their behaviour and relationships and extract as many insights as possible. To accomplish this, we plot various charts and graphs.
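Charts are usually backed by a few numeric summaries. The Python sketch below computes summary statistics for one variable and the Pearson correlation between two; it uses only the standard library and leaves plotting out. The example variables `discount_pct` and `units_sold` are hypothetical.

```python
from math import sqrt
from statistics import mean, median, stdev
from typing import Dict, Sequence


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Basic summary statistics for one numeric variable."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length variables."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")  # correlation is undefined for a constant variable
    return cov / (sx * sy)


# Hypothetical example variables.
discount_pct = [0, 5, 10, 15, 20]
units_sold = [100, 110, 120, 130, 140]
```

Here `pearson(discount_pct, units_sold)` returns a value very close to 1.0 because units rise in step with the discount; a scatter plot of the same two variables would show a straight line.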
ML Model Building:
Often, recommendations derived from the insights serve as solutions to business problems. At other times, we prepare the data further and rely on predictive modelling to build a machine learning (ML) model. The ML model can then predict future values of the dependent variable. Such predictions give us the foresight to fix or act on something before it happens. For example, we can prevent a customer from churning, a machine from failing, a fraudulent transaction from passing through, or a product from running out of stock. We should remember that a complete solution to a business problem rarely consists of AI/ML outputs alone; the AI/ML component often makes up only 20 or 30 per cent of the solution.
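To make the churn example concrete, here is a minimal Python sketch of one common predictive model, logistic regression, trained with batch gradient descent. The author does not name a specific algorithm, so this choice is an assumption, as are the two features (months since last purchase and number of support tickets) and the toy training data. In practice, a library implementation such as scikit-learn’s `LogisticRegression` would normally be used.

```python
from math import exp
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted churn probability for one customer's feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.1,
    epochs: int = 5000,
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log loss)/dz for this example
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


# Hypothetical toy data: [months since last purchase, support tickets raised].
X_train = [[1, 0], [2, 1], [3, 0], [8, 3], [9, 4], [10, 2]]
y_train = [0, 0, 0, 1, 1, 1]  # 1 = customer churned

weights, bias = train_logistic(X_train, y_train)
```

The model only outputs a churn probability. Turning it into action, for instance by contacting customers above a chosen probability, is part of the wider, non-ML share of the solution.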
Deployment, Integration, and Monitoring:
Next, the solution must be deployed and integrated into the existing IT systems and business processes. We must monitor how well the business team adopts the solution and measure how positively it affects the business. If performance falls below a threshold, and the main cause is a poorly performing ML model, we must plan to retrain the model with recent data points, additional variables, or both.
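The monitoring logic can be sketched as a simple decision rule that separates a model problem from an adoption or process problem. In this Python sketch, the function names, the use of accuracy as the model metric, and a single business KPI with its own threshold are all hypothetical; real thresholds would be agreed with the business team.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of recent predictions that matched the observed outcome."""
    if not y_true:
        raise ValueError("no labelled outcomes to evaluate")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def next_action(
    kpi_value: float,
    kpi_threshold: float,
    model_metric: float,
    model_threshold: float,
) -> str:
    """Hypothetical rule for what to do after one monitoring cycle."""
    if kpi_value >= kpi_threshold:
        return "continue"  # the solution is meeting its business target
    if model_metric < model_threshold:
        # Business shortfall traced mainly to the model: retrain with recent
        # data points, additional variables, or both.
        return "retrain"
    # The model still performs well, so look at adoption and the business process.
    return "review_adoption"
```

For example, with a KPI target of a 5 per cent sales uplift and an accuracy target of 0.8, an observed 2 per cent uplift with accuracy 0.6 returns `"retrain"`, while the same uplift with accuracy 0.9 returns `"review_adoption"`.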
I have stressed, on multiple occasions, the difference between technical performance metrics and business performance metrics. Sound technical performance from a model is essential for sound business results. However, superb technical performance does not necessarily lead to the best business results. The solution could be poorly adopted because it is hard to use, or the proposed solution might simply not be effective. An AUC (Area Under the Curve) value of 0.89 means nothing unless it translates into a business outcome, such as a 7 per cent increase in sales.
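The gap between the two kinds of metric is easier to see when both are computed side by side. The Python sketch below computes AUC from labels and scores, as the probability that a randomly chosen positive is scored above a randomly chosen negative, and, separately, the percentage change in sales. The function names and inputs are hypothetical; the point is that the first number comes from model predictions while the second comes from business results, so a high AUC does not by itself guarantee any uplift.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC: probability that a random positive outscores a random negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def sales_uplift_pct(baseline_sales: float, sales_with_solution: float) -> float:
    """Percentage change in sales relative to the pre-solution baseline."""
    if baseline_sales == 0:
        raise ValueError("baseline sales must be non-zero")
    return 100.0 * (sales_with_solution - baseline_sales) / baseline_sales
```

For example, `auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, and `sales_uplift_pct(1000, 1070)` returns 7.0. The pairwise loop is quadratic in the number of records; for large datasets, a rank-based formula or a library function such as scikit-learn’s `roc_auc_score` is faster.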
If this inspires you to dive deeper, congratulations on your marriage with data!
Disclaimer
Views expressed above are the author's own.