There are a number of things to weigh when deciding whether a model is a good fit for the use case and ready to be put into production, but it ultimately boils down to a single question: “Is it accurate enough to generate a positive ROI without unacceptable repercussions?” A few questions that can help break that down:
- Is the model informing a human decision or automating it?
- Is there a cost to a false positive or a false negative? Is it quantifiable?
- How much better is it than random?
- Is it better than an ultimatum (a single blanket decision applied to every case)?
To illustrate these questions a little further, here are a few examples:
- If a model is trained to determine how much employees should make, there is a lot of risk in setting salaries or giving raises. It stands to reason that accuracy would need to be higher if the model were automating the decision than if it were informing a human manager who sees that one of her employees is drastically underpaid or overpaid. That manager could then use her own discretion to decide whether the model was in error.
- Under the same example, say the model is only human-informing, but the manager trusts it and doesn’t give an employee a raise because the model indicated the employee would be overpaid if one were given. The employee then leaves for work elsewhere. What was the cost of losing that employee? If the reverse happened, what was the cost of a raise granted in error? Can you quantify these costs? If so, it changes the level of accuracy required to consider it a great model.
- If it’s regression, what would the error be if the model always predicted the average of the target column? How much better is the model than that? If it’s classification, square the rate of the positive class and add it to the square of the rate of the negative class; that is the accuracy of a random guesser that guesses each class at its observed frequency. How much better is the model’s accuracy than that?
- If there is a cost associated with error, is the model better than an ultimatum? One example might be a firm whose free consultations are expensive and time-consuming ($6,000 each) but that makes good money when deals close ($60,000). The firm currently operates under the assumption that 100% of consultations will close, but it would make better profits if it could determine which consultations not to do. How accurate does the model need to be for the firm to follow the model’s output rather than the ultimatum that 100% will close?
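The salary example’s question about quantifying error costs can be sketched as a simple expected-cost calculation. All dollar figures and error rates below are illustrative assumptions, not numbers from any real model:

```python
# Putting assumed dollar values on the salary model's two error types.
COST_DENIED_RAISE = 45_000   # assumed cost of replacing an employee who left
COST_UNDUE_RAISE = 8_000     # assumed cost of a raise granted in error

def expected_error_cost(denied_rate, undue_rate):
    """Expected cost per decision given the model's error rates."""
    return denied_rate * COST_DENIED_RAISE + undue_rate * COST_UNDUE_RAISE

# With a 5% rate of wrongly denied raises and a 10% rate of undue raises:
print(expected_error_cost(0.05, 0.10))  # 3050.0
```

Because the two error costs differ, two models with identical overall accuracy can have very different business impact, which is why the costs need to be quantified per error type.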
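The naive baselines from the regression/classification example can be computed directly. The target values and positive-class rate here are assumed, stand-in numbers:

```python
# Naive baselines a model should beat, using assumed illustrative data.

# Regression: error if we always predicted the mean of the target column.
targets = [100, 120, 80, 150, 110]        # assumed target column
mean_pred = sum(targets) / len(targets)   # 112.0
baseline_mae = sum(abs(t - mean_pred) for t in targets) / len(targets)

# Classification: accuracy of a random guesser that guesses each class
# at its observed rate, i.e. p_pos**2 + p_neg**2.
p_pos = 0.8                               # assumed positive-class rate
p_neg = 1 - p_pos
random_accuracy = p_pos**2 + p_neg**2     # 0.64 + 0.04 = 0.68

print(baseline_mae)      # 18.4
print(random_accuracy)   # 0.68
```

Note that with an imbalanced 80/20 split, a “random” guesser already scores 68%, so a model reporting 70% accuracy is barely adding anything.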
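For the consultation example, a short sketch can find the accuracy at which following the model beats the “consult everyone” ultimatum. It assumes $60,000 revenue on a close, the $6,000 consultation cost, a 50% base close rate, and a model equally accurate on both classes; none of those rates are given in the example itself:

```python
# Expected profit per lead: consult everyone vs. follow the model.
COST, REVENUE = 6_000, 60_000

def profit_all(close_rate):
    """Ultimatum: consult everyone, pay COST for all, earn REVENUE on closers."""
    return close_rate * REVENUE - COST

def profit_model(close_rate, accuracy):
    """Consult only predicted closers (sensitivity == specificity == accuracy)."""
    tp = close_rate * accuracy               # consulted and closed
    fp = (1 - close_rate) * (1 - accuracy)   # consulted but didn't close
    return tp * REVENUE - (tp + fp) * COST

print(profit_all(0.5))          # 24000.0
print(profit_model(0.5, 0.90))  # 24000.0 -- exactly break-even
print(profit_model(0.5, 0.95))  # 25500.0 -- model now beats the ultimatum
```

Under these assumptions the break-even accuracy is 90%: below that, blindly doing every consultation is actually more profitable, which is why “better than random” alone isn’t a sufficient bar.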
With all model types, it’s important to validate that there aren’t issues with the data being fed in; a score that is suspiciously high or low often indicates exactly such an issue and warrants additional investigation of the dataflows.
Finally, it’s always critical to evaluate the scoring metrics against the use case and business domain. What looks like a terrible score for one use case might be a great score, and generate a very high ROI, in another.