Kraken requires a dataset that is mostly ready for machine learning. However, we do apply some basic pre-processing steps to the data before building models.
- Imputation of nulls
- Encoding categorical features (also known as creating "dummy variables")
- Feature scaling, or normalization
- Handling high correlation between a Driver and the predicted Metric, or between Drivers themselves
- Random sampling of the data, followed by five-fold cross-validation
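
The steps above can be sketched with a standard scikit-learn pipeline. This is an illustrative example only, not Kraken's actual implementation: the toy data, column names, and imputation/scaling choices are assumptions, and the correlation-handling step is omitted for brevity.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy dataset with nulls and a categorical Driver (illustrative only).
df = pd.DataFrame({
    "driver_a": [1.0, 2.0, None, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "region": ["east", "west", "east", None, "west",
               "east", "west", "east", "west", "east"],
    "metric": [1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 7.2, 8.1, 8.9, 10.2],
})

numeric = ["driver_a"]
categorical = ["region"]

# Impute nulls, one-hot encode categoricals ("dummy variables"),
# and scale numeric features.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])

model = Pipeline([("prep", preprocess), ("fit", LinearRegression())])

# Five-fold cross-validation over the full pipeline, so imputation,
# encoding, and scaling are re-fit inside each training fold.
scores = cross_val_score(model, df[numeric + categorical], df["metric"], cv=5)
print(len(scores))
```

Running the pre-processing inside the cross-validation pipeline (rather than before the split) avoids leaking information from the held-out fold into the fitted imputers and scalers.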
Each of these pre-processing steps is governed by thresholds set in our pipeline. We adjust the thresholds as we learn more about the accuracy of the models Kraken creates.
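
A threshold-driven pipeline of this kind might be configured along the following lines. Every name and value here is hypothetical, invented to show the shape of the idea; none of it reflects Kraken's actual configuration.

```python
# Hypothetical thresholds (names and values are illustrative, not Kraken's).
THRESHOLDS = {
    "max_null_fraction": 0.5,        # drop features with >50% nulls
    "max_driver_correlation": 0.95,  # drop one of any Driver pair above this
    "max_cardinality": 20,           # skip encoding very wide categoricals
}

def feature_passes(stats, thresholds=THRESHOLDS):
    """Return True if a feature's summary stats clear every threshold."""
    return (stats["null_fraction"] <= thresholds["max_null_fraction"]
            and stats["max_corr"] <= thresholds["max_driver_correlation"]
            and stats["cardinality"] <= thresholds["max_cardinality"])

print(feature_passes({"null_fraction": 0.1, "max_corr": 0.8, "cardinality": 5}))
print(feature_passes({"null_fraction": 0.7, "max_corr": 0.8, "cardinality": 5}))
```

Centralizing the thresholds in one place like this makes them easy to tune as model accuracy is evaluated over time.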