A common business concern is predicting customer lifetime value (CLV). This is a perfect example of a business problem that machine learning can shed light on. At first glance, we might reason that a machine learning model trained on historical customers could learn to predict CLV from several features that drive it. We might define CLV as the total amount of money a person brings in over their entire history as a customer. We could then put together a table with historical information on all customers (past and present), something like this:
In this dataset, we have one row per customer, and each column is a relevant feature describing that customer: their customer ID, gender, age, the date they became a customer, their zip code, the number of purchases they have made, and their total monetary spend. We could define a customer's total monetary spend (Total_Spend) as CLV, feed this dataset to Kraken, an automated machine learning platform, and have it learn to predict Total_Spend from these examples. As we acquire new customers in the future, we could use the trained model to predict their Total_Spend and get a sense of how much monetary value they will provide over their lifetime as customers.
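To make this naive table concrete, here is a minimal sketch of how it might be built from a raw transaction log. The log format, the customer IDs, and the function name `naive_total_spend` are hypothetical illustrations, not part of Kraken or any particular system. Notice that the purchase date is read but never used: every customer's Total_Spend is summed over whatever history they happen to have so far.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction log: (customer_id, purchase_date, amount).
transactions = [
    ("C1", date(2022, 1, 10), 40.0),
    ("C1", date(2022, 6, 3), 25.0),
    ("C2", date(2023, 2, 14), 15.0),
]

def naive_total_spend(transactions):
    """Return {customer_id: (NBR_Purchases, Total_Spend)} summed over all history to date."""
    counts = defaultdict(int)
    spend = defaultdict(float)
    for customer_id, _purchase_date, amount in transactions:
        # The date is ignored: tenure plays no role in this naive aggregation.
        counts[customer_id] += 1
        spend[customer_id] += amount
    return {cid: (counts[cid], spend[cid]) for cid in counts}

table = naive_total_spend(transactions)
print(table)  # {'C1': (2, 65.0), 'C2': (1, 15.0)}
```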
However, this approach has several problems. Think about it: the dataset we put together for this use case may include people who have been customers for one day, one month, or one year. The Total_Spend column therefore does not reflect the total money a customer will provide during their lifetime; it is the total money they have provided to date. Also, a customer who signed up yesterday might have all the characteristics of a stellar customer (maybe it's winter and they live in a region where people of their age and gender tend to buy a lot of our products when it's cold), but because they are only one day old, they have made just one purchase and have not spent much money yet. By including them in the training dataset, we incorrectly teach our machine learning algorithm that they are the type of customer who does not bring in much money.
Consider a new customer who, in their first month, has been ordering products 3 times a week, for a total of 12 purchases. Another customer who has been around for a year and purchases once per month might have spent the same amount of money. Because their Total_Spend is identical, our machine learning algorithm would treat these two customers as equally valuable, when in reality the one-month-old customer might be significantly more valuable in the long run.
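A little arithmetic shows why equal totals can hide very different customers. This sketch assumes, purely for illustration, that every purchase is worth the same $50; the constant and the helper `monthly_spend_rate` are hypothetical. The projection at the end is a naive extrapolation that assumes the new customer keeps up their current pace, which is itself an assumption.

```python
# Hypothetical assumption (not from the article): every purchase is worth $50.
PURCHASE_VALUE = 50.0

def monthly_spend_rate(n_purchases, months_as_customer, purchase_value=PURCHASE_VALUE):
    """Average spend per month of tenure."""
    return n_purchases * purchase_value / months_as_customer

new_customer_total = 12 * PURCHASE_VALUE  # 12 purchases in their first month
old_customer_total = 12 * PURCHASE_VALUE  # 1 purchase per month for 12 months
print(new_customer_total == old_customer_total)  # True: identical naive target

new_rate = monthly_spend_rate(12, 1)   # 600.0 per month
old_rate = monthly_spend_rate(12, 12)  # 50.0 per month

# Naive extrapolation, assuming the new customer keeps the same pace all year.
projected_first_year = new_rate * 12   # 7200.0
```

Under these assumptions the two customers share the same Total_Spend, yet their spending rates differ by a factor of 12.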
To avoid these pitfalls, we need to be precise about how we define CLV and how we prepare a dataset for the problem. A good way to do this is to build time into our definition. We could, for example, define first year value (FYV) as the total money a customer spends in their first year as a customer. We could then use a customer's behavior during, say, their first 3 months as features to predict their total spend over their first year. FYV is a precisely defined metric of interest that incorporates time (we only look at a one-year timeframe for each customer). The advantage of such a metric is that it puts all examples in our training dataset on equal footing. Note that because we are now looking at the total money people spent during their first year as customers, we must limit our training dataset to customers who have been around for at least one year. Now we can prepare a dataset that looks like this:
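The eligibility rule ("customers who have been around for at least one year") can be sketched as a simple date filter relative to a snapshot date, i.e. the last date for which we have data. This is a minimal sketch assuming one year is approximated as 365 days; the function name `eligible_for_fyv`, the start dates, and the snapshot date are hypothetical.

```python
from datetime import date, timedelta

# Assumption: "one year" is approximated as 365 days; adjust to your business calendar.
FIRST_YEAR = timedelta(days=365)

def eligible_for_fyv(start_dates, snapshot):
    """Return sorted IDs of customers whose entire first year has elapsed by `snapshot`."""
    return sorted(
        cid for cid, start in start_dates.items()
        if start + FIRST_YEAR <= snapshot
    )

# Hypothetical start dates and snapshot date.
start_dates = {"C1": date(2022, 1, 10), "C2": date(2023, 2, 14), "C3": date(2023, 9, 1)}
eligible = eligible_for_fyv(start_dates, snapshot=date(2024, 3, 1))
print(eligible)  # ['C1', 'C2']
```

Customer C3 is excluded not because they are a poor customer, but because their first year has not finished yet, so their FYV is still unknown.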
Here, each row represents a customer who has been around for at least a year. The columns include features that describe the customer at the moment they were acquired (CustomerID, Gender, Age, Start_Date, Zip), as well as features that capture the customer's activity during a chosen timeframe, such as the number of purchases they made in their first 3 months (NBR_Purchases_3mths) and their total monetary spend in their first 3 months (Total_Spend_3mths). The target column (Total_Spend_1yr) is the total money they spent in their first year; this is what we call first year value (FYV), and it is what we will teach our machine learning algorithm to predict.
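The activity features and the target can be computed from the same transaction log by measuring each purchase relative to the customer's own Start_Date, so every customer is observed over identical windows. This is a minimal sketch assuming 3 months is approximated as 90 days and one year as 365 days, with half-open windows that begin on Start_Date; the function name `build_fyv_rows` and the sample data are hypothetical. Acquisition-time columns such as Gender, Age, and Zip would be joined in from a customer table.

```python
from datetime import date, timedelta

# Assumptions: 3 months ~ 90 days and 1 year ~ 365 days, both measured from Start_Date.
FEATURE_WINDOW = timedelta(days=90)
TARGET_WINDOW = timedelta(days=365)

def build_fyv_rows(start_dates, transactions):
    """One row per customer: 3-month activity features plus the FYV target.

    start_dates: {customer_id: Start_Date}; pass only customers with a full first year.
    transactions: iterable of (customer_id, purchase_date, amount).
    """
    rows = {
        cid: {"NBR_Purchases_3mths": 0, "Total_Spend_3mths": 0.0, "Total_Spend_1yr": 0.0}
        for cid in start_dates
    }
    for cid, when, amount in transactions:
        if cid not in rows:
            continue  # customer not eligible (or unknown)
        elapsed = when - start_dates[cid]
        if elapsed < timedelta(0):
            continue  # defensive: ignore purchases dated before Start_Date
        if elapsed < FEATURE_WINDOW:
            rows[cid]["NBR_Purchases_3mths"] += 1
            rows[cid]["Total_Spend_3mths"] += amount
        if elapsed < TARGET_WINDOW:
            rows[cid]["Total_Spend_1yr"] += amount  # first-year spend includes months 1-3
    return rows

start_dates = {"C1": date(2022, 1, 10)}
transactions = [
    ("C1", date(2022, 1, 10), 40.0),  # day 0: counts toward features and target
    ("C1", date(2022, 6, 3), 25.0),   # day 144: target only
    ("C1", date(2023, 3, 1), 30.0),   # after the first year: ignored
]
rows = build_fyv_rows(start_dates, transactions)
```

For C1 this yields 1 purchase and 40.0 in the feature window and a target of 65.0; the purchase after the first year never leaks into either the features or the target.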
Notice how we are now asking a very precise question defined within a timeframe: how much money will a customer bring in during their first year, based on their behavior during their first 3 months? It is up to a business analyst to apply their valuable domain expertise and pick timeframes that make sense for their industry. Knowing which business question to ask, being able to frame it in this precise way, and assembling a training dataset where all examples are on equal footing (where we are, so to speak, comparing apples to apples) is the valuable contribution a citizen data scientist brings to the table.