Time Series models in Qlik AutoML are both simple and powerful. The input dataset needs only two columns: a date and a numeric outcome for that date (examples: unit sales, new subscribers, revenue).
Some basics for getting started with Time Series in Qlik AutoML:
- If the date field is being read as a string instead of a date, ensure that your dates are represented in a supported format.
- AutoML will predict out 365 days from the last date in the training dataset.
- You do not need to include "apply" rows for future dates in the training data. Unlike classification and regression models, you also do not need to apply the model to a new dataset. Rows with future predictions are generated automatically and included in the downloaded/synced data.
- Time series can be run on daily, weekly, monthly, or quarterly data. See below for more information on how the data is aggregated.
- Outliers are automatically detected, excluded, and flagged in the predicted dataset.
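As an illustration of the two-column input described above, here is a minimal Python sketch that rewrites date strings into ISO 8601 (YYYY-MM-DD) and produces a date/value CSV. It assumes ISO 8601 is among the formats AutoML reads as a date, which should be checked against the Qlik documentation. The column names "Date" and "Sales", the helper `to_iso_rows`, and the sample values are hypothetical.

```python
import csv
import io
from datetime import datetime


def to_iso_rows(rows, source_format="%m/%d/%Y"):
    """Convert (date_string, value) pairs into (ISO date string, float) pairs.

    source_format describes how the incoming date strings are written;
    the default is an assumption for this example (month/day/year).
    """
    converted = []
    for date_text, value in rows:
        parsed = datetime.strptime(date_text, source_format).date()
        converted.append((parsed.isoformat(), float(value)))
    return converted


# Hypothetical sample data: one date and one numeric outcome per row.
raw = [("1/29/2021", "120"), ("1/30/2021", "135.5"), ("1/31/2021", "128")]
clean = to_iso_rows(raw)

# Write the two columns to CSV text (an in-memory buffer keeps the sketch self-contained).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Date", "Sales"])
writer.writerows(clean)
csv_text = buffer.getvalue()
```

No rows for future dates are added here, since AutoML generates those itself.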
What is a time series model?
What's going on under the hood: AutoML currently implements a univariate time series model similar to the best-performing models in the industry, such as Auto-SARIMA. Time series modeling looks for seasonality and general growth trends based on historical data.
How do I select a Time Series model in AutoML?
When a dataset includes a date column, the "Target" section of the Pipeline will include both a "Regression" and a "Timeseries" option.
Why doesn't the predicted data match the aggregation of the training data?
Time Series in AutoML can handle daily, weekly, monthly, or quarterly data, and it gracefully handles missing days. AutoML first sums the data to the daily level, so multiple data points on the same day are added together. It then looks at the date frequency and sparsity of the data it was given. If daily data has a lot of missing days, AutoML automatically sums it to the weekly level; in the same way, sparse weekly data is summed to the monthly level. When this type of aggregation occurs, the dates returned from the forecast may not exactly match the dates in the input dataset, but the sum of the values will match the sum of the values in the input dataset.
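The aggregation behavior can be sketched in Python as follows. This is an illustration only, not AutoML's actual implementation: the article does not say how many missing days trigger the weekly roll-up, so the `max_missing_share` threshold, the Monday week start, and the function names are all assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta


def sum_by_day(rows):
    """Add together all values that share the same date."""
    totals = defaultdict(float)
    for day, value in rows:
        totals[day] += value
    return dict(totals)


def roll_up_to_weeks(daily):
    """Sum daily totals into weeks, keyed by the Monday of each week (assumed)."""
    weekly = defaultdict(float)
    for day, value in daily.items():
        week_start = day - timedelta(days=day.weekday())
        weekly[week_start] += value
    return dict(weekly)


def aggregate(rows, max_missing_share=0.5):
    """Return (level, series); falls back to weekly when daily data is sparse.

    max_missing_share is a hypothetical threshold, not a documented AutoML value.
    """
    daily = sum_by_day(rows)
    span_days = (max(daily) - min(daily)).days + 1
    missing_share = 1 - len(daily) / span_days
    if missing_share > max_missing_share:
        return "weekly", roll_up_to_weeks(daily)
    return "daily", daily


# Hypothetical input: two points on 2021-01-04, then large gaps between days.
rows = [
    (date(2021, 1, 4), 10.0),
    (date(2021, 1, 4), 5.0),
    (date(2021, 1, 7), 20.0),
    (date(2021, 1, 15), 8.0),
    (date(2021, 1, 28), 12.0),
]
level, series = aggregate(rows)
```

In this example the series is rolled up to weeks, so 2021-01-15 no longer appears as a date in the result (its value is carried by the week starting 2021-01-11), while the total across all rows is unchanged.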
How much training data do I need in order to use a time series model?
The forecast window for time series modeling is 365 days. The default horizon is 30 days. AutoML will attempt to use five-fold cross validation unless there aren't enough rows of data, in which case AutoML will drop to four-fold cross validation. A minimum of 60 days' worth of data is necessary for a viable prediction, but we strongly recommend a larger dataset for more accurate forecasting.
We recommend using at least a 3:1 training-to-prediction ratio for time series. This means that if you want to predict 4 months out, you should train on at least 12 months of data. More is better, but remember that time series only looks at patterns related to time, so it's important to include only data relevant to current business practices and conditions.
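The sizing guidance in this section comes down to simple arithmetic, sketched below in Python. The constants are the figures stated in this article (60-day minimum, 365-day forecast window, 3:1 ratio); the function name and the returned keys are hypothetical, and the check is a planning aid rather than anything AutoML itself runs.

```python
MIN_TRAINING_DAYS = 60   # minimum stated in this article
MAX_FORECAST_DAYS = 365  # forecast window stated in this article
RECOMMENDED_RATIO = 3    # 3:1 training-to-prediction ratio


def check_training_span(training_days, forecast_days):
    """Compare the available history against the minimum and the 3:1 guideline."""
    if not 0 < forecast_days <= MAX_FORECAST_DAYS:
        raise ValueError("forecast_days must be between 1 and 365")
    recommended = max(MIN_TRAINING_DAYS, RECOMMENDED_RATIO * forecast_days)
    return {
        "meets_minimum": training_days >= MIN_TRAINING_DAYS,
        "recommended_days": recommended,
        "meets_recommendation": training_days >= recommended,
    }


# Predicting about 4 months (120 days) out with one year of history.
one_year = check_training_span(training_days=365, forecast_days=120)

# The same forecast with only 90 days of history.
ninety_days = check_training_span(training_days=90, forecast_days=120)
```

With 120 days to predict, the guideline asks for 360 days of training data, so one year of history satisfies it, while 90 days clears the 60-day minimum but falls short of the recommendation.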
What happens with Outliers?
AutoML will automatically detect outliers and exclude them from the training data. These rows will be flagged in the 'outliers' column in the output dataset.
How are time series models scored?
Time Series models are scored using 1-MAPE (Mean Absolute Percent Error). A perfect model would score 100% in AutoML. A model with 30% MAPE (on average having an error 30% from the actual outcome) would score 70% in AutoML.
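To make the scoring concrete, here is a small Python sketch of the 1-MAPE calculation using the standard MAPE definition. The function names and sample numbers are hypothetical. How AutoML treats actual values of zero is not stated in this article, so this sketch simply rejects them.

```python
def mape(actual, predicted):
    """Mean absolute percent error, as a fraction (0.30 means 30%)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and the same length")
    if any(a == 0 for a in actual):
        # Assumption for this sketch: zero actuals are not handled.
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


def one_minus_mape(actual, predicted):
    """Score in the style described above: 1 minus MAPE."""
    return 1 - mape(actual, predicted)


# Every prediction is off by 30% of the actual value.
actual = [100.0, 200.0, 400.0]
predicted = [130.0, 140.0, 280.0]
score = one_minus_mape(actual, predicted)
```

Here every prediction misses by 30% of the actual value, so the MAPE is about 0.30 and the score is about 0.70, matching the 70% example above.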
What is included in the predicted dataset?
The predicted dataset is generated automatically for time series (no need for an apply dataset!). In addition to the date and target columns, the predicted dataset includes the predicted value, an upper bound and a lower bound (at an 80% confidence interval), and an outlier flag (flagged rows have been automatically treated as anomalies and removed from training). Note: any additional columns from the training data will not be included in the predicted dataset for a time series model.
In the example below "Sales" is the target column and the training data included information through 1/31/2021. 1/27/2021 has been flagged as an outlier and excluded from the training (and prediction) data. Rows for 2/1/2021 onward have automatically been added by AutoML and populated with predicted values.
Why would I choose a time series model over a regression model?
There are several ways to approach a time series, and each comes with its own caveats. A univariate time series uses only time and a single value, for example revenue. Even within univariate time series there are several different algorithms; the more complex ones parse the date to look for cyclical patterns such as daily, weekly, monthly, quarterly, and yearly seasonality. The benefit is that you can forecast out to any date, because the date is always known in advance. The downside is that the model looks only at the cyclical and growth patterns in that single value. If revenue is highly dependent on marketing spend and the business changes strategy by slowing or increasing that spend, the time series model will not pick up the change and will keep forecasting based on the historical trend. AutoML uses a model very similar to ARIMA for time series, which can do a good job, but for some use cases regression may be a better option.