Seasonal ARIMA Models

In this final chapter, you'll learn how to use seasonal ARIMA models to fit more complex data. You'll learn how to decompose this data into seasonal and non-seasonal parts, and then you'll get the chance to use all your ARIMA tools on one last global forecast challenge. This is a summary of the lecture "ARIMA Models in Python" from DataCamp.

Seasonal time series

Seasonal decompose

You can think of a time series as being composed of trend, seasonal and residual components. This can be a good way to think about the data when you go about modeling it. If you know the period of the time series you can decompose it into these components.

In this exercise you will decompose a time series showing the monthly milk production per cow in the USA. This will give you a clearer picture of the trend and the seasonal cycle. Since the data is monthly, a seasonal period of 12 time steps is a reasonable first guess, although this won't always be the case.

First, with an updated but still non-stationary dataset:

Then, with an updated dataset that has been made stationary:

Seasonal ACF and PACF

In this exercise you will use the ACF and PACF to test this data for seasonality. You can see from the plot above that the time series isn't stationary, so you should probably detrend it. You will detrend it by subtracting the moving average. Remember that you could use a window size of any value bigger than the likely period.
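The detrending step described above can be sketched with pandas alone. The series below is synthetic (a linear trend plus a 12-step cycle), and the window of 15 is just one possible choice larger than the suspected period. In practice `plot_acf` and `plot_pacf` from `statsmodels.graphics.tsaplots` would draw the same information as figures.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: linear trend + 12-step seasonal cycle + noise.
rng = np.random.default_rng(1)
index = pd.date_range("2000-01-01", periods=240, freq="MS")
t = np.arange(240)
series = pd.Series(
    0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, 240),
    index=index,
)

# Subtract a rolling mean whose window (15) is larger than the suspected period (12).
window = 15
detrended = (series - series.rolling(window).mean()).dropna()

# Autocorrelation at each lag; a seasonal series peaks at multiples of its period.
lags = range(1, 19)
autocorr = pd.Series([detrended.autocorr(lag=k) for k in lags], index=lags)
best_lag = int(autocorr.idxmax())

print(autocorr.round(2))
print("lag with the strongest autocorrelation:", best_lag)
```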

Based on this figure, 12 time steps is the time period of the seasonal component.

Lupa with updated flowrate and rain data

Seasonal ACF and PACF

Based on this figure, 12 time steps is the time period of the seasonal component.

What explains the partial correlations at lags of 28 and 45 weeks?

SARIMA models

Fitting SARIMA models

Fitting SARIMA models is the beginning of the end of this journey into time series modeling.

It is important that you get to know your way around the SARIMA model orders and so that's what you will focus on here.

In this exercise, you will practice fitting different SARIMA models to a set of time series.

Choosing SARIMA order

In this exercise you will find the appropriate model order for a new time series. This is a monthly series of the number of employed persons in Australia (in thousands). The seasonal period of this time series is 12 months.

You will create non-seasonal and seasonal ACF and PACF plots and use the table below to choose the appropriate model orders.

|      | AR(p)                | MA(q)                | ARMA(p, q) |
|------|----------------------|----------------------|------------|
| ACF  | Tails off            | Cuts off after lag q | Tails off  |
| PACF | Cuts off after lag p | Tails off            | Tails off  |

The non-seasonal ACF doesn't show any of the usual patterns of MA, AR or ARMA models, so we choose none of these. The seasonal ACF and PACF look like an MA(1) model. Combining both gives: $\text{SARIMAX}(0,1,0)(0,1,1 / 4)[12]$

SARIMA vs ARIMA forecasts

With weekly data:

With updated data for Arrone:

Automation and saving

pmdarima package for automated model selection

Automated model selection

The pmdarima package is a powerful tool to help you choose the model orders. You can use the information you already have from the identification step to narrow down the model orders which you choose by automation.

Remember, although automation is powerful, it can sometimes make mistakes that you wouldn't. It is hard to guess how the input data could be imperfect and affect the test scores.

In this exercise you will use the pmdarima package to automatically choose model orders for some time series datasets.

Create auto_arima monthly model

Create model: Weekly data

Try again, this time with a log transform

SARIMA and Box-Jenkins

SARIMA model diagnostics

Usually the next step would be to find the order of differencing and other model orders. However, this time it's already been done for you. The time series is best fit by a $\text{SARIMA}(1, 1, 1)(0, 1, 1, 12)$ model with an added constant.

In this exercise you will make sure that this is a good model by first fitting it using the SARIMAX class and going through the normal model diagnostics procedure.

These are warnings that we show as a way of alerting you that you may be in a non-standard situation. Most likely, one of your variance parameters is converging to zero. If you get a model result and the parameter estimates and standard errors are finite, I think you can just use them and ignore all the warnings. Note that inference for variance parameters is not meaningful when the point estimate is at or near zero.

SARIMA forecast

In the previous exercise you confirmed, by diagnostic checking, that a $\text{SARIMA}(1, 1, 1)(0, 1, 1, 12)$ model was a good fit to the time series.

Now it's time to put this model into practice to make future forecasts. Climate scientists tell us that we have until 2030 to drastically reduce our CO2 emissions or we will face major societal challenges.

We had better use some exogenous data; we start with one series...

We pick up the "unseen" data starting July 1, 2020.

The SARIMAX model

S - Seasonal
AR - AutoRegressive
I - Integrated
MA - Moving Average
X - Exogenous
https://www.statsmodels.org/v0.10.0/_modules/statsmodels/tsa/statespace/sarimax.html

A test with infiltration data was not successful...
ARIMA(6,0,0)(0,0,2)[52] : AIC=5412.368, Time=40.97 sec
Best model: ARIMA(6,0,0)(0,0,2)[52]
Total fit time: 962.179 seconds

Lupa_updWavg = Lupaupd.loc["2010-01-01":].Infilt.resample("W").mean()  # Lupa_updWavg = Lupa_updWavg.asfreq('W')
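To illustrate what this resampling line does, here is a small sketch on a made-up daily frame. The DataFrame `daily` and its values are hypothetical; only the column name `Infilt` and the `"W"` rule are taken from the line above.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data with an "Infilt" column (random values, not real measurements).
rng = np.random.default_rng(8)
days = pd.date_range("2009-01-01", "2011-12-31", freq="D")
daily = pd.DataFrame({"Infilt": rng.normal(5, 1, len(days))}, index=days)

# Keep data from 2010 onwards, then average each week.
# "W" means weeks ending on Sunday, labelled by that Sunday.
weekly = daily.loc["2010-01-01":].Infilt.resample("W").mean()

print(weekly.head(3).round(2))
print(weekly.index.freqstr)
```

Because the result already carries a weekly frequency, it can be passed to a seasonal model with a period of 52 without a separate `asfreq` call.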

Auto arima search with a cyclic trend

p: the order (or number of time lags) of the autoregressive ("AR") model
start_Q : int, optional (default=1) The starting value of Q, the order of the moving-average portion of the seasonal model.

No exogenous data

With exogenous data

2 exogenous series

Inserting the average over 11 years before the resampling to weekly data.

Double exogenous: An optional 2-d array of exogenous variables.

It turned out that using data starting at 2009-01-04 didn't give a satisfactory result, so let's start again with data from 2010 onwards.
Indeed: the AIC dropped from over 3000 to 2829.5.