Univariate Time Series
Kraken currently implements a univariate time series model similar to the best-performing models in the industry, such as Auto-SARIMA. Time series modeling looks for seasonality and general growth trends in historical data, which allows the model to forecast any length of time into the future.
Kraken forecasts 12 months into the future and can handle daily, weekly, or monthly data. It also gracefully handles missing days. The returned dataset contains the date, actual value, predicted value, upper bound, and lower bound (the bounds form an 80% confidence interval). It also marks outliers, which are automatically treated as anomalies and removed from training.
The forecast window for time series modeling is 365 days. The default horizon is 30 days. Kraken attempts to use five-fold cross-validation; if there aren't enough rows of data, it drops to four-fold cross-validation. A minimum of 60 days' worth of data is necessary for a viable prediction, but we strongly recommend a larger dataset for more accurate forecasting.
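To check a dataset against these limits before uploading it, a minimal sketch might look like the following. This is an illustration only, not Kraken's code: the function names are hypothetical, "60 days' worth of data" is assumed to mean the calendar span between the first and last date, and the row count at which Kraken switches from five folds to four is not documented here, so it is left as a required parameter.

```python
from datetime import date

MIN_HISTORY_DAYS = 60  # minimum stated in the text above


def has_enough_history(dates, min_days=MIN_HISTORY_DAYS):
    """Return True when the dates span at least min_days calendar days.

    Assumption: "days' worth of data" means the span from the earliest
    to the latest date, counting both endpoints.
    """
    if not dates:
        return False
    return (max(dates) - min(dates)).days + 1 >= min_days


def choose_folds(n_rows, rows_needed_for_five):
    """Pick five folds when there are enough rows, otherwise four.

    rows_needed_for_five is a hypothetical threshold; the real cutoff
    that Kraken uses is not given in this document.
    """
    return 5 if n_rows >= rows_needed_for_five else 4
```

For example, a dataset running from 2024-01-01 to 2024-02-29 spans exactly 60 days and would pass the check under this reading.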
Important note - Date Field:
Kraken uses the data type defined in the source connection. For example, if you are using a MySQL connector and the schema defines a column as a date type, Kraken reads it as a date. For CSVs, Kraken currently parses dates only in the “yyyy-mm-dd” format.
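If you prepare CSVs yourself, it can help to validate the date column before uploading. The sketch below is one way to do that in Python; it is not Kraken's parser, and the function name parse_csv_date is hypothetical. It accepts only a four-digit year, two-digit month, and two-digit day separated by hyphens, and rejects everything else.

```python
import re
from datetime import date

# Exactly four digits, two digits, two digits, separated by hyphens.
_YYYY_MM_DD = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_csv_date(text):
    """Parse a yyyy-mm-dd string into a date, or raise ValueError."""
    match = _YYYY_MM_DD.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not in yyyy-mm-dd format: {text!r}")
    year, month, day = (int(part) for part in match.groups())
    # date() itself raises ValueError for impossible dates such as 2024-02-30.
    return date(year, month, day)
```

A value such as 03/15/2024 or 2024-3-15 raises ValueError here, so it can be reformatted before the file is loaded.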
Important note - Aggregation:
Kraken first sums data to the daily level, so multiple points on the same day are added together. It then looks at the date frequency and sparsity of the data it is given. If daily data has many missing days, Kraken automatically aggregates it by summing to the weekly level; in the same way, sparse weekly data is aggregated to the monthly level. When this type of aggregation occurs, the dates returned from the forecast may not exactly match the dates in the input, but the sum of the values will match the sum of the values in the input dataset.
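The aggregation described above can be sketched as follows. This is an illustration under stated assumptions, not Kraken's implementation: the 50% missing-data cutoff (MAX_MISSING_SHARE), the choice of Monday as the start of a week, and all function names are hypothetical, since the actual sparsity rule is not documented here.

```python
from collections import defaultdict
from datetime import date, timedelta

MAX_MISSING_SHARE = 0.5  # hypothetical cutoff; the real rule is not documented


def week_start(day):
    """Monday of the week containing day (assumed week boundary)."""
    return day - timedelta(days=day.weekday())


def month_start(day):
    """First day of the month containing day."""
    return day.replace(day=1)


def roll_up(pairs, bucket_start):
    """Sum (date, value) pairs into buckets keyed by bucket_start(date)."""
    totals = defaultdict(float)
    for day, value in pairs:
        totals[bucket_start(day)] += value
    return dict(sorted(totals.items()))


def missing_share(series, step_days):
    """Fraction of expected periods between first and last key with no data."""
    keys = sorted(series)
    expected = (keys[-1] - keys[0]).days // step_days + 1
    return 1 - len(keys) / expected


def aggregate(rows, max_missing=MAX_MISSING_SHARE):
    """Return (level, series), moving to a coarser level while data is sparse."""
    if not rows:
        return "daily", {}
    daily = roll_up(rows, lambda day: day)  # sums duplicates on the same day
    if missing_share(daily, 1) <= max_missing:
        return "daily", daily
    weekly = roll_up(daily.items(), week_start)
    if missing_share(weekly, 7) <= max_missing:
        return "weekly", weekly
    return "monthly", roll_up(weekly.items(), month_start)


rows = [
    (date(2024, 1, 1), 2.0),
    (date(2024, 1, 1), 3.0),   # same day as the row above, so the two are summed
    (date(2024, 1, 10), 5.0),  # a Wednesday
    (date(2024, 1, 22), 1.0),
]
level, series = aggregate(rows)
```

With these sample rows the daily data is too sparse under the assumed cutoff, so the result is weekly. The value recorded on 2024-01-10 is reported under 2024-01-08, the start of its week, which is why returned dates can differ from input dates, while the total of 11.0 is unchanged.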