What is a ROC Curve and why should I care?
Check out the video for a brief overview of using ROC Curves in Kraken.
How does a ROC Curve work?
A ROC Curve shows how well the model distinguishes the positive class from the negative class, and AUC (Area Under Curve) summarizes that in a single number. The curve is a plot of the false positive rate (x-axis) versus the true positive rate (y-axis) for a number of different candidate threshold values between 0.0 and 1.0. Put another way, it plots the false alarm rate versus the hit rate. The closer the area under the curve is to 1.0 (the maximum possible value), the better the model separates the two classes. The closer the area under the curve is to 0.5, the closer the model is to random guessing.
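To make the threshold sweep concrete, here is a minimal sketch in plain Python. It is not how Kraken computes its curves. The function names `roc_points` and `auc`, the four-row toy dataset, and the choice of eleven evenly spaced thresholds are all assumptions made for illustration. Each threshold produces one (false positive rate, true positive rate) point, and the area is estimated with the trapezoid rule.

```python
def roc_points(y_true, scores, thresholds):
    """Return one (false positive rate, true positive rate) pair per threshold.

    Assumes labels are 0 or 1 and that both classes are present.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    points = []
    for t in thresholds:
        # A row is predicted positive when its score reaches the threshold.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Estimate the area under the curve with the trapezoid rule."""
    pts = sorted(points)  # order by false positive rate, then true positive rate
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical toy data: actual outcomes and the model's predicted scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
thresholds = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0

points = roc_points(y_true, scores, thresholds)
print(auc(points))
```

On this toy data the threshold 0.0 labels every row positive, giving the point (1.0, 1.0), and the threshold 1.0 labels none positive, giving (0.0, 0.0). The remaining thresholds trace the curve between those two corners.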
The true positive rate is calculated as the number of true positives divided by the sum of the number of true positives and the number of false negatives. Plotting the true positive rate on a ROC Curve is useful for understanding whether separation between classes is possible, which in turn indicates whether the data is good enough to accurately distinguish between predicted outcomes.
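The calculation can be sketched as a small Python helper. The function names and the example counts below are hypothetical, chosen only for illustration. The false positive rate, which the ROC Curve puts on the x-axis, follows the same pattern: false positives divided by the sum of false positives and true negatives.

```python
def true_positive_rate(tp, fn):
    """Hit rate: share of actual positives the model predicted as positive."""
    if tp + fn == 0:
        raise ValueError("no actual positives in the data")
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    """False alarm rate: share of actual negatives the model predicted as positive."""
    if fp + tn == 0:
        raise ValueError("no actual negatives in the data")
    return fp / (fp + tn)


# Hypothetical counts taken from a confusion matrix at one threshold.
tpr = true_positive_rate(tp=80, fn=20)    # 80 / (80 + 20)
fpr = false_positive_rate(fp=10, tn=90)   # 10 / (10 + 90)
print(tpr, fpr)
```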
Are there "good curves" and "bad curves"?
Here's an example of a "good" ROC Curve, with a high area under the curve:
Here's an example of a "poor" ROC Curve, with a low area under the curve: