In the first article in this series, we broke down the preprocessing and feature engineering techniques needed to build high-performing time series models. But we didn’t discuss the models themselves. In this article, we will dig into this.

As a quick refresher, time series data has time on the x-axis and the value you are measuring (demand, temperature, etc.) on the y-axis; it can also have other features, but we will keep it simple for now.

All the examples in this article use the UKNonDurables dataset ; it’s part of the Rdatasets package and can be accessed with `statsmodels.datasets.get_rdataset`

function. It shows the total GPB consumption of non-durables in the UK each quarter from 1955 to 1988 measured in 1985 prices.

Let’s load and inspect our data.

```
# Standard imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from statsmodels.datasets import get_rdataset
# Nicer plots than standard matplotlib
sns.set()
# Load and visualise dataset
uk = get_rdataset('UKNonDurables', 'AER')
uk = uk.data
plt.plot(uk.time, uk.value)
plt.show()
```

The dataset has strong yearly seasonality and a clear linear upward trend up until about 1984. After that, the trend becomes exponential. Many of the models we discuss cannot handle data with trend or seasonal components, so this dataset will be a challenge. We’ll mark everything before 1980 for training and everything afterward for validation. We’ll compare performance by visually inspecting the predictions.

Note: throughout this article, we freely use standard time series preprocessing and feature engineering terms. If they are alien to you, read this article to get up to speed.

We can break traditional time series models into two categories: autoregressive (AR) and smoothing. The former contains models such as ARIMA and SARIMA, while the latter includes exponential smoothing and weighted averaging, to name a few.

We start our exploration with the autoregressive models.

ARIMA has been the model of choice for decades and has consistently made reliable predictions. If you are doing forecasting, this is a model you need to have in your toolbox.

ARIMA stands for AutoRegressive Integrated Moving Average. Let’s break that down.

An autoregressive (AR) model assumes that a prediction at time `t`

is a linear combination of `p`

past sequence values `(t-1, t-2, …, t-p)`

. Thus AR models take lag features as input.

An integrated (I) model doesn’t predict the value at time `t`

but rather the difference between time `t-1`

and `t`

. In other words, it predicts on stationary data. To create stationary data, you apply differencing `d`

times (usually `d`

is 1 and sometimes 2).

A moving average (MA) model considers model errors at previous time steps to improve the prediction for this time step. In more technical terms, we model the prediction at time `t`

as a linear combination of `q`

residual errors `(t-1, t-2, …, t-q)`

.

Combining this, the ARIMA model has three hyperparameters to tune: `p`

, `d`

, and `q`

; the function signature is ARIMA(p, d, q).

Enough talking, let’s see this model in action.

```
from statsmodels.tsa.arima.model import ARIMA
train = uk.loc[uk.time < 1980].value.copy()
val = uk.loc[uk.time >= 1980].value.copy()
def plot_arima_model(order=(0, 0, 0),
seasonal_order=(0, 0, 0, 0),
savefig=False):
if seasonal_order == (0, 0, 0, 0):
title = f'ARIMA{order} Predictions'
else:
title = f'SARIMA{order}{seasonal_order} Predictions'
arima = ARIMA(train, order=order, seasonal_order=seasonal_order)
result = arima.fit()
# Predict len(val) timesteps into the future
predictions = result.forecast(steps=len(val))
# Plot
fig, ax = plt.subplots()
ax.plot(uk.value, 'b', label='Actual')
ax.plot(predictions, 'r', label='Preds')
ax.set(xlabel='Date', ylabel='Total Consumption (£)',
title=title)
plt.legend()
plt.tight_layout()
if savefig:
plt.savefig(title)
plt.show()
# Plot ARIMA(1,1,1)
plot_arima_model(order=(1, 1, 1))
```

The ARIMA(1, 1, 1) model is linear and predicts a straight line for all time steps. Since our data has yearly seasonality (every four quarters), let’s try a model that incorporates this information.

We keep most of the code the same and change `order`

to `(4, 1, 4)`

.

```
# Plot ARIMA(4,1,4)
plot_arima_model(order=(4, 1, 4))
```

Much better! Including more lag and moving average variables increases the model’s power and means its predictions can (somewhat) reflect the trend and seasonality of the data. However, we don’t advise using an ARIMA model to model data with seasonal trends. They are prone to overfitting and are not specifically designed to handle such data; thus, your forecasts will be unreliable.

So, let’s look at a model that can handle such data.

Seasonal ARIMA (SARIMA) extends ARIMA so it can handle seasonal data. It includes ARIMA’s hyperparameters (`p`

, `d`

, and `q`

) and also `P`

, `D`

, `Q`

, and `s`

, which are the seasonal lag order, seasonal differencing order, seasonal moving average window size, and the seasonal periodicity, respectively.

Note that there is just one set of `P`

, `D`

, `Q`

, and `s`

. SARIMA only works with data that has *one* seasonal component; it cannot handle multiple seasonality. This is a problem all traditional and many non-traditional models face and is far from solved.

```
# Plot SARIMA
plot_arima_model(order=(1, 0, 1), seasonal_order=(4, 1, 4, 4))
```

Note that, mathematically speaking, SARIMA is a subset of ARIMA, and thus, the ARIMA class also creates SARIMA models.

The SARIMA model’s predictions are much more confident and pronounced than ARIMA(4, 1, 4). They do not fade away and capture the existing trend quite well. Note that they do not predict the exponential trend, but no model should do this since the training data is not exponential.

There are many other models in the ARIMA family, such as AR, MA, and SARIMAX. Check out the link to learn more.

Let’s now move our attention to the second class of traditional models. Smoothing models assume that they can predict future values from aggregated past values. There are numerous aggregation methods. Let’s explore some of the most popular.

For this model, simply take the weighted average of the past n values.

The simplest model is to apply no weight to any value, so they all count equally towards the prediction. If we have a window size of 3, to predict the value at time `t`

, we would sum `t-1`

, `t-2`

, and `t-3`

and divide by 3.

Since this is such a simple model with little predictive power, `statsmodels`

doesn’t implement it. However, you can do it manually with pandas and the `df.rolling()`

method. As you will see shortly, the basic smoothing models in `statsmodels`

only predict a single value regardless of the forecasting timeline. It’s currently impossible to predict on a validation set and use each actual value as part of your next prediction. You will have to implement that manually if you desire it.

In the interests of time and to enable fair comparisons with `statsmodels`

, we predict a single value for our equally weighted (or unweighted) moving average model (the final one calculated).

```
train_df = pd.DataFrame(train)
# Moving avergae window size 5 (chosen arbitrarily)
train_df['MA'] = train_df.rolling(5).mean()
preds = [train_df['MA'].iloc[-1] for _ in range(len(val))]
preds = pd.Series(preds, index=val.index)
preds = train_df.MA.append(preds)
title = 'Unweighted Moving Average Model - Predictions'
plot_preds(preds, title)
```

As you can see from the points before index 100, this can be a helpful model for data processing or feature engineering (especially if your data is very noisy). But giving all points an equal weighting is unlikely to be optimal. Intuitively, we would think that the values closest to time `t`

would significantly impact the predicted value more than those further away.

Let’s look at one popular method to combat this challenge.

Simple Exponential Smoothing (SES) is a weighted average where the weights decay exponentially from the most recent to the oldest historical value. Thus it gives the most recent values more importance and exponentially less to each point behind it. This method reacts more to recent changes than a standard weighted average as the latter applies equal weighting to all past observations.

```
from statsmodels.tsa.holtwinters import SimpleExpSmoothing
exp_smooth = SimpleExpSmoothing(train, initialization_method='estimated')
res_exp_smooth = exp_smooth.fit()
preds = res_exp_smooth.predict(start=0, end=len(train)+len(val))
title = 'Simple Exponential Smoothing Predictions'
plot_preds(preds, title)
```

As mentioned above, all predictions for simple exponential smoothing in `statsmodels`

are single values. It does an excellent job of modeling the training data but is of limited use once you want to make forecasts. So, let’s look at a more robust model.

This is an extension of simple exponential smoothing designed specifically to forecast data with a trend. It will usually make predictions extending infinitely up or down (depending on the direction of the trend). Because of this, it is best to use this method with a damping parameter to prevent these infinite unabated trends, which rarely happen in real life.

```
from statsmodels.tsa.holtwinters import Holt
holt = Holt(train, initialization_method='estimated')
res_holt = holt.fit()
preds = holt.predict(res_holt.params, start=0, end=len(train)+len(val))
title = "Holt's Exponential Smoothing Predictions"
plot_preds(preds, title, savefig=True)
```

Much better! The training data is significantly smoother than with SES, and the predictions (while still being a straight line) are at least a straight line in the same direction as the trend of the data.

With damped trend (this is the only line that changes):

`holt = Holt(train, initialization_method='estimated', damped_trend=True)`

Comparing the predictions between indexes 120-140, we see that with `damped_trend=True`

, the model starts to decrease the gradient of its forecast, whereas, without that parameter, the predictions are an unwavering straight line.

This model extends Holt’s exponential smoothing so that it can handle seasonal data. There are many hyperparameters and ways to formulate this model, which, unfortunately, we do not have time to cover. However, rest assured that this is an improvement over the other methods discussed above and will likely be your go-to since you can model the three main components of a time series quite well.

Since this model is used most often, it’s simply called ExponentialSmoothing in `statsmodels`

. Note that you must specify whether the trend and seasonal components are additive (`add`

) or multiplicative (`mult`

).

```
from statsmodels.tsa.holtwinters import ExponentialSmoothing
hw_exp_smooth = ExponentialSmoothing(train,
initialization_method='estimated',
trend='add',
seasonal='add',
seasonal_periods=4)
res_hw_exp_smooth = hw_exp_smooth.fit()
preds = hw_exp_smooth.predict(res_hw_exp_smooth.params,
start=0, end=len(train)+len(val))
title = "Holt Winter's Exponential Smoothing Predictions"
plot_preds(preds, title, savefig=True)
```

Wow! In contrast to all other smoothing models, this model learns the training data perfectly and makes predictions that display strong seasonal and trend components.

Traditional time series models have been used for almost 100 years with much success. But they are not without their limitations.

These models all require at least some preprocessing to make predictions. The amount of preprocessing varies by model and slows down development. Moreover, since they are statistical models, it is best to check their assumptions before using them in production. For example, with ARIMA and the basic smoothing models, you must ensure the data is stationary and has neither a trend nor seasonal component.

Most of the models can only capture linear trends, and thus any non-linearities are left uncaptured. Models this simplistic cannot accurately describe the vast majority of time series we see in the world today.

All of the models listed above can model at most one seasonality (and several of them cannot even model that, e.g., ARIMA). This problem still exists even with some of the newly created models and is by no means solved. If you want to model data with multiple seasonalities, check out the BATS/TBATS models or Prophet .

When tackling a machine learning problem, the best model for the job is often a combination of multiple models: an ensemble. These are beneficial since you are not left at the mercy of one model’s strengths and weaknesses. Traditional time series models share different strengths and weaknesses, but it is not easy to develop ensemble models in an automated way.

The examples in this article are just for univariate time series, we have just been modeling and making predictions for a single variable: the total consumption of non-durables in the UK. Multivariate time series problems involve more than one time-dependent variable. Each variable depends on its past values, and they also depend on each other. A good example is the weather. A univariate time series problem predicts the temperature tomorrow. A multivariate one predicts the temperature, the amount of rain, the wind speed, and cloud coverage. All of these variables influence each other, and thus we have a multivariate problem.

There are some models you can use to tackle multivariate problems; however, the selection is not great (especially for spatio-temporal time series). The models are challenging to build and, once they are built, they are complex to use.

In this article, we discussed some of the classic time series models, how they work, and their limitations. These models have been used successfully for decades, and your business may still benefit from implementing them. But beware of their limits and make sure you check their assumptions before putting them into production.

If you want to quickly create hundreds of high-performing univariate and multivariate time series forecasting models with minimal effort using state-of-the-art machine learning techniques, check out H2O’s AI Cloud – request a demo today.