Although getting to this point involves many challenges, much of the value is lost if the work cannot be summarized into the points that other stakeholders care about. Effort must therefore go into summarizing both the process and the output of Kraken. One thing to be wary of is confusing correlation with causation. A variable may have high feature importance for prediction, but that does not prove that forcing a certain behavior will actually benefit the business. For example, a business predicting customer churn might find that autoship is the most important feature for customer retention. Using scenarios, the business sees that increasing the number of users on autoship is predicted to retain more customers, and it could conclude that it should put all customers on autoship. This may be a poor interpretation. Perhaps the customers who choose autoship are the ones already inclined to stay longer, and the model is simply picking up that pattern. Forcing everyone onto autoship would not change anything for customers who already chose it, while other customers might balk at the requirement or simply cancel at the same rate they always have. In this case, it may be better to ask two separate questions: how can we decrease churn for customers on autoship, and how can we decrease churn for customers not on autoship? Another experiment could ask: how can we make customers not on autoship behave more like those on autoship, or what incentives would encourage them to change their behavior?
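To make the autoship pitfall concrete, here is a small hypothetical simulation in Python. It is not part of Kraken, and every number in it is invented. It assumes a hidden loyalty trait that drives both autoship opt-in and churn, while autoship itself has no causal effect. The observed data still shows much lower churn for autoship customers, yet forcing everyone onto autoship leaves overall churn unchanged. The sketch only models the "nothing changes" case; if forced customers balked, churn would rise instead.

```python
import random


def churn_probability(loyal: bool, autoship: bool) -> float:
    # Hypothetical assumption: churn depends only on hidden loyalty.
    # The autoship argument is deliberately ignored (zero causal effect).
    return 0.1 if loyal else 0.5


def simulate(n: int, force_autoship: bool = False, seed: int = 0):
    """Return (autoship, churned) pairs for n hypothetical customers."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        loyal = rng.random() < 0.4                         # 40% are loyal (invented)
        chooses = rng.random() < (0.7 if loyal else 0.2)   # loyal customers opt in more
        autoship = True if force_autoship else chooses
        churned = rng.random() < churn_probability(loyal, autoship)
        rows.append((autoship, churned))
    return rows


def churn_rate(rows):
    return sum(churned for _, churned in rows) / len(rows)


# Natural behavior: customers choose autoship themselves.
natural = simulate(100_000)
on = [r for r in natural if r[0]]
off = [r for r in natural if not r[0]]
print(f"churn on autoship (observed):   {churn_rate(on):.3f}")
print(f"churn off autoship (observed):  {churn_rate(off):.3f}")

# Intervention: every customer is forced onto autoship.
forced = simulate(100_000, force_autoship=True)
print(f"overall churn, as observed:     {churn_rate(natural):.3f}")
print(f"overall churn, autoship forced: {churn_rate(forced):.3f}")
```

With these invented parameters, observed churn should come out near 0.22 for autoship customers and near 0.42 for the rest, while overall churn is identical (about 0.34) in both runs: the same seed produces the same random draws, and `churn_probability` ignores autoship. A feature importance ranking on the natural data would still flag autoship as a strong predictor.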
Asking distinct, precise questions matters because it also makes the results much easier to explain. A model that predicts customer revenue at 6 months based on a customer's first 30 days is far easier to explain to another stakeholder than one built around a vague goal. This final step ties together the initial precise question, the data that was used, and the outputs, without confusing correlation with causation. The last, and perhaps most important, piece is understanding how to use that entire process to drive business value. How can the model be used? It is up to you, using the expertise you have gained here, to judge whether the model will extrapolate well and help the business learn about the outcomes of scenarios it has not seen before. You should also be able to explain that there is always some error: error distributions have tails, so some individual predictions will be far off. Despite that, the model can still perform well on average and serve a range of use cases. Some of these use cases will be covered in more detail in implementation training. If you cannot yet answer some of these questions, use the resources available to you to get them answered, because without the ability to summarize and simplify, the final output will fall flat and lose much of its potential to affect business outcomes.
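One way to explain "good on average, with tails" to a stakeholder is to report the typical error alongside the tail of the error distribution. The sketch below uses only Python's standard library. The `error_summary` helper and the revenue figures are hypothetical, with roughly 5% of predictions deliberately given much larger errors to mimic a heavy tail.

```python
import random
import statistics


def error_summary(actual, predicted):
    """Summarize absolute prediction errors: typical size and the tail."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return {
        "mean_abs_error": statistics.fmean(errors),
        "median_abs_error": statistics.median(errors),
        "p95_abs_error": statistics.quantiles(errors, n=20)[-1],  # 95th percentile
        "max_abs_error": max(errors),
    }


# Hypothetical 6-month revenue figures: most predictions land close,
# but about 5% of customers behave very differently than predicted.
rng = random.Random(1)
actual = [rng.uniform(50, 500) for _ in range(1_000)]
predicted = [
    a + (rng.gauss(0, 10) if rng.random() > 0.05 else rng.gauss(0, 200))
    for a in actual
]

for name, value in error_summary(actual, predicted).items():
    print(f"{name}: {value:.1f}")
```

The mean and median absolute errors describe typical performance, while the 95th percentile and maximum show how far off the worst predictions can be. Presenting both shows that the model is useful on average without hiding the occasional large miss.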