Prediction Influencers are calculated differently, depending on the algorithm you select in Kraken.
As explained in Chapter Five of the SONAR© Guide, Prediction Influencers offer important insight into the predictions created in Kraken. This article details how Prediction Influencers are calculated for the various algorithms in Kraken.
Prediction Influencers are SHAP values
Kraken uses the SHAP package to calculate SHAP (SHapley Additive exPlanations) values for a variety of algorithms. SHAP is based on game-theoretically optimal Shapley values. These SHAP values appear in Kraken prediction datasets as "K_" columns, which we call Prediction Influencers. Prediction Influencers need to be interpreted differently depending on the algorithm that Kraken is using.
Take care to interpret a Shapley value correctly: it is the average contribution of a feature value to the prediction across different coalitions of features. It is NOT the change in the prediction that would result from removing the feature from the model.
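To make the distinction between "average contribution across coalitions" and "difference when a feature is removed" concrete, here is a minimal sketch that computes exact Shapley values by enumerating every coalition. The toy model `f`, the row `x`, and the `baseline` values are all hypothetical and chosen for illustration; this is not Kraken's code, and the SHAP package uses far more efficient algorithms than brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def f(v):
    # Hypothetical toy model with an interaction between features 0 and 1.
    return v[0] * v[1] + v[2]

x = [2.0, 3.0, 1.0]         # row being explained (made up)
baseline = [1.0, 1.0, 0.0]  # reference values standing in for "feature absent" (made up)

def value(coalition):
    # Model output when only the features in `coalition` take their real values;
    # every other feature is held at its baseline value.
    mixed = [x[j] if j in coalition else baseline[j] for j in range(len(x))]
    return f(mixed)

def shapley_values(value, n):
    # Exact Shapley values: weighted average of each feature's marginal
    # contribution over all coalitions of the remaining features.
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

phi = shapley_values(value, len(x))

# Change in the output if feature 0 alone is reset to its baseline.
drop_feature_0 = f(x) - value(frozenset({1, 2}))
```

In this toy setup the Shapley values come out to roughly 2, 3, and 1, and they sum to `f(x) - f(baseline)`. Resetting feature 0 alone changes the output by 3, not 2, because of its interaction with feature 1. That gap is why a Prediction Influencer should not be read as "what the prediction would lose without this feature".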
How Kraken calculates Prediction Influencers
Kraken produces Prediction Influencers on datasets with up to 100,000 rows for various algorithms in both classification and regression models using two distinct methods:
1. Tree SHAP
Tree SHAP is a fast, exact method for computing SHAP values for tree models and ensembles of trees, under several different possible assumptions about feature dependence. More academic information can be found here. Tree SHAP is used to calculate Prediction Influencers with the following models:
- Classification
  - RandomForestClassifier
  - XGBClassifier
- Regression
  - RandomForestRegressor
  - XGBRegressor
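The following sketch illustrates the quantity Tree SHAP computes, under the "path-dependent" assumption about feature dependence: when a feature is unknown, a split on it is averaged over both branches in proportion to how many training rows went each way. The tree, its `cover` counts, and the row `x` are hypothetical. The Shapley values are obtained here by brute-force enumeration rather than by the polynomial-time Tree SHAP algorithm, and the article does not state which feature-dependence assumption Kraken uses.

```python
from itertools import combinations
from math import factorial

# Hypothetical hand-built regression tree. Internal nodes send a row left when
# x[feature] < threshold; "cover" is the number of training rows at the node.
TREE = {
    "feature": 0, "threshold": 0.5, "cover": 10,
    "left": {"value": 1.0, "cover": 4},
    "right": {
        "feature": 1, "threshold": 0.5, "cover": 6,
        "left": {"value": 3.0, "cover": 3},
        "right": {"value": 7.0, "cover": 3},
    },
}

def expected_value(node, x, known):
    # Expected tree output given only the features in `known`.
    if "value" in node:
        return node["value"]
    left, right = node["left"], node["right"]
    if node["feature"] in known:
        child = left if x[node["feature"]] < node["threshold"] else right
        return expected_value(child, x, known)
    # Unknown feature: average both branches, weighted by training cover.
    total = left["cover"] + right["cover"]
    return (left["cover"] * expected_value(left, x, known)
            + right["cover"] * expected_value(right, x, known)) / total

def tree_shap_brute_force(tree, x, n):
    # Shapley values over the conditional expectations defined above.
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                known = set(subset)
                phi[i] += weight * (expected_value(tree, x, known | {i})
                                    - expected_value(tree, x, known))
    return phi

x = [1.0, 1.0]                                   # row being explained (made up)
base = expected_value(TREE, x, set())            # average prediction
prediction = expected_value(TREE, x, {0, 1})     # the tree's output for x
phi = tree_shap_brute_force(TREE, x, 2)
```

Here the values in `phi` add up to `prediction - base`, which is the additive property that lets each SHAP value be read as one feature's share of the gap between this row's prediction and the average prediction.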
2. Linear SHAP
Linear SHAP computes the SHAP values for a linear model and can account for the correlations among the input features. More academic information can be found here. Linear SHAP is used to calculate Prediction Influencers with the following models:
- Classification
  - LogisticRegression
- Regression
  - LinearRegression
  - SGDRegressor
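For a linear model, the simplest case of Linear SHAP has a closed form: if the input features are treated as independent, the SHAP value of feature i is its coefficient times the distance of the feature value from its average. The sketch below shows only that independent-feature case; the coefficients, intercept, background rows, and the function name `linear_shap_independent` are hypothetical, and the correlation-aware variant mentioned above is not covered.

```python
from statistics import fmean

def linear_shap_independent(weights, x, background):
    # SHAP value of feature i for a linear model, assuming independent
    # features: weight_i * (x_i - mean of feature i in the background data).
    means = [fmean(column) for column in zip(*background)]
    return [w * (xi - m) for w, xi, m in zip(weights, x, means)]

def predict(weights, intercept, row):
    return intercept + sum(w * v for w, v in zip(weights, row))

# Hypothetical fitted coefficients and a tiny background dataset.
weights = [2.0, -1.0, 0.5]
intercept = 4.0
background = [
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 4.0],
    [2.0, 4.0, 6.0],
]
x = [4.0, 1.0, 2.0]  # row being explained (made up)

phi = linear_shap_independent(weights, x, background)

# Average prediction over the background rows.
base = fmean(predict(weights, intercept, row) for row in background)
prediction = predict(weights, intercept, x)
```

As with the tree case, the values in `phi` sum to `prediction - base`. For a logistic regression, the same formula would apply to the linear part of the model, so the resulting values would be on the log-odds scale rather than the probability scale; whether Kraken reports them that way is not stated in this article.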