Customer retention is another common use case that is a good candidate for machine learning. For a company that offers a subscription-based model, for example, we might go back and label all past and current customers as having either cancelled their subscription (“churned”) or not. We could then put together a table of these customers.
Here, each row would represent a unique customer and the columns would represent different features describing that customer. The last column could be our target: a binary column specifying if the customer has cancelled their subscription (Yes or No). We could train a machine learning algorithm on this dataset to predict if any given customer will churn.
Again, this approach is full of problems similar to those we initially discussed in our regression example. First of all, this dataset may include customers with wildly different tenure lengths. We are comparing apples to oranges by comparing new and old customers, and for the customers that have not cancelled, we have no information about whether or not they will cancel down the road. Newly acquired customers may have all the characteristics of a terrible customer (maybe we know that men in their twenties who don’t buy much in their first month tend to cancel their subscription soon after). Because they are fairly new and haven’t cancelled yet, they are labeled as not churned, so we are training our machine learning algorithm to associate those characteristics with a good customer.
As before, the way to avoid these pitfalls is to get precise about how we define churn and about how we prepare a dataset for the problem. Let’s consider incorporating time into the question. We could choose to study which customers are going to cancel their services within their first 6 months. We may, for example, use their behavior during their first month as a customer to predict whether or not they will churn within the first 6 months. Now we have a precise way of defining customer churn, one that incorporates a timeframe. We could aggregate a dataset along these lines.
Here, each row represents a customer, but now we only include customers who signed up at least 6 months ago, so that their full first 6 months can be observed. For all of them we use their number of purchases and total spend during the first month to predict whether or not they churned within their first 6 months. For the purposes of this question it is irrelevant whether or not they churned after their first 6 months; our target column only tells us whether or not they cancelled their subscription in their first 6 months. Now we have a training dataset where all rows are on equal footing and in which we are comparing apples to apples. Once we train a model on this dataset, we can take any new customer that has been around for at least one month and use their behavior during their first month, together with our trained model, to predict whether or not they will churn during their first 6 months.
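To make the time-based definition concrete, here is a minimal sketch of how such a training table might be assembled. The `Customer` record, the field names, the sample data, and the 30-day and 180-day windows are all assumptions made for illustration; a real pipeline would read from your own subscription and purchase records.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple

# Simplifying assumptions: a "month" is 30 days and "6 months" is 180 days.
FEATURE_WINDOW = timedelta(days=30)
CHURN_WINDOW = timedelta(days=180)


@dataclass
class Customer:
    # Hypothetical record layout, not taken from any real schema.
    customer_id: int
    signup: date
    cancelled: Optional[date]            # None if still subscribed
    purchases: List[Tuple[date, float]]  # (purchase date, amount)


def build_training_rows(customers, as_of):
    """Return one row per customer whose first 6 months are fully observed."""
    rows = []
    for c in customers:
        # Skip customers who signed up too recently: their outcome is unknown.
        if c.signup + CHURN_WINDOW > as_of:
            continue
        # Features use only behavior from the first month after signup.
        first_month = [
            amount for (day, amount) in c.purchases
            if c.signup <= day < c.signup + FEATURE_WINDOW
        ]
        # Target: did the customer cancel within the first 6 months?
        churned = (
            c.cancelled is not None
            and c.cancelled < c.signup + CHURN_WINDOW
        )
        rows.append({
            "customer_id": c.customer_id,
            "purchases_first_month": len(first_month),
            "spend_first_month": sum(first_month),
            "churned_within_6_months": churned,
        })
    return rows


customers = [
    Customer(1, date(2023, 1, 10), date(2023, 3, 1),
             [(date(2023, 1, 15), 20.0), (date(2023, 2, 20), 15.0)]),
    Customer(2, date(2023, 1, 20), date(2024, 2, 1),
             [(date(2023, 1, 25), 50.0), (date(2023, 2, 1), 30.0)]),
    Customer(3, date(2024, 5, 1), None, [(date(2024, 5, 3), 10.0)]),
]
rows = build_training_rows(customers, as_of=date(2024, 6, 1))
```

In this sketch, customer 2 eventually cancels but is still labeled as not churned, because the cancellation falls outside the first 6 months. Customer 3 is left out entirely, since not enough time has passed to know the outcome.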
Getting a sense of how to ask business questions in a precise and appropriate way so that they can be tackled by machine learning is something that comes with practice, but seeing both good and bad examples of how to do this is helpful when getting started in machine learning for business applications. If you are unsure about how to frame your business questions for machine learning, consider incorporating a time frame into the definition of your business metrics; this strategy often goes a long way.
A few key points to keep in mind when asking a business question and preparing a data set for that question:
- Remember that a machine learning algorithm finds patterns in the data you feed it and uses those patterns to make predictions on data in the future. For this reason it is important that the training data set statistically resembles the data you will make predictions on. If the market has changed and business now is very different from what your training data set describes, you are probably using an outdated data set that will lead to inaccurate predictions.
- Make sure that all the features you include in the training data set (the feature columns) are data points you will have available at the time of making future predictions. It is a common mistake to use features that you have available for historical data but will not have available in the future at the time you are interested in making a prediction. When a prediction is made, a machine learning algorithm will need to have values for all the features that were available in the training data set.
- Define your question precisely and decide exactly what needs to be aggregated to approach that question. For example, if you want to predict which customers will churn, you need to aggregate a data set where each row represents a customer, each feature column represents a feature that describes that customer, and the target column is a label of whether or not that customer churned in a certain time period. If you want to predict what sales will be for a given month for a given region, you need to aggregate a data set where each row represents a given month for a given region, each feature column represents a feature that describes that month’s business in that region, and the target column is a label of the sales revenue for that region in that month.
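As a rough illustration of the sales example, the sketch below aggregates transactions into one row per region and month. The transaction records, the region names, and the choice of the previous month's revenue as the only feature are assumptions made for this example. That feature was chosen because it would already be known when predicting the coming month.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction records: (date, region, revenue).
transactions = [
    (date(2024, 1, 5), "North", 120.0),
    (date(2024, 1, 20), "North", 80.0),
    (date(2024, 1, 9), "South", 200.0),
    (date(2024, 2, 2), "North", 50.0),
    (date(2024, 2, 14), "South", 90.0),
]


def previous_month(year, month):
    """Return the (year, month) pair that precedes the given month."""
    return (year - 1, 12) if month == 1 else (year, month - 1)


def build_sales_rows(transactions):
    """Return one row per (region, month) with a feature and a target."""
    # Total revenue per region and calendar month.
    monthly = defaultdict(float)
    for day, region, revenue in transactions:
        monthly[(region, day.year, day.month)] += revenue

    rows = []
    for (region, year, month) in sorted(monthly):
        prev_year, prev_month = previous_month(year, month)
        prev_revenue = monthly.get((region, prev_year, prev_month))
        # Drop rows with no history: the feature would be missing.
        if prev_revenue is None:
            continue
        rows.append({
            "region": region,
            "year": year,
            "month": month,
            "prev_month_revenue": prev_revenue,             # feature
            "sales_revenue": monthly[(region, year, month)],  # target
        })
    return rows


sales_rows = build_sales_rows(transactions)
```

Each resulting row describes one region in one month, with the target being that month's revenue. Features measured during the target month itself, such as that month's order count, are deliberately excluded because they would not be available when the prediction is made.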