class: center, middle, inverse, title-slide

# Autoregressive moving average models

### Kevin Kotzé

---

<!-- layout: true -->

<!-- background-image: url(image/logo.svg) -->
<!-- background-position: 2% 98% -->
<!-- background-size: 10% -->

---

# Contents

1. Introduction
1. Moving Average Models
1. Autoregressive Models
1. ARMA Models
1. Seasonal ARMA Models
1. Model Specification and Parameter Estimation
1. Structural Breaks
1. Conclusion

---

# Univariate models for persistent data

- A dominant feature of many time series is that today's values are close to tomorrow's values
- Observations are not independent, but autocorrelated
- Need to account for this behaviour in the explained part of the model, otherwise it will be captured by the error, which violates the assumptions of the model
- Example of a stochastic process:

`\begin{equation} y_t = 0.7 y_{t-1} + \varepsilon_t \end{equation}`

- This is an example of a linear stochastic difference equation that is defined at discrete points in time
- The systematic part of the behaviour should be captured by the coefficient, while random noise should be contained in the error

---

# Moving average models

- Linear combination of white noise (i.e. `\(\varepsilon_{t}\)`), such that the `\(MA(1)\)` may take the form,

`\begin{equation} y_{t}=\mu +\varepsilon_{t}+\theta \varepsilon_{t-1} \end{equation}`

- where `\(\mu\)` is a constant, while `\(\varepsilon_{t}\)` and `\(\varepsilon_{t-1}\)` are independent and identically distributed white noise, `\(\varepsilon_{t}\sim \mathsf{i.i.d.} \;\; \mathcal{N}(0,\sigma^{2})\)`
- To determine whether the `\(MA(1)\)` process is stationary, we calculate the different moments

---

# MA models - Expected Mean

- Note that `\(\mathbb{E}[\varepsilon_{t}] =0\)` and `\(\mathbb{E}[\varepsilon_{t}^2] = \sigma^2\)`,

`\begin{eqnarray} \mathbb{E}\left[ y_{t}\right] &=&\mathbb{E}[\mu +\varepsilon_{t}+\theta \varepsilon_{t-1}] \\ &=&\mu +\mathbb{E}[\varepsilon_{t}]+\theta \mathbb{E}\left[ \varepsilon_{t-1}\right] \\ &=&\mu \end{eqnarray}`

- Since the error terms are `\(\mathsf{i.i.d.}\)` with an expected value of zero
- Hence, the mean of this process is `\(\mu\)`, which is constant and does not depend on time

---

# MA models - Variance

`\begin{eqnarray} \mathsf{var}[y_{t}] &=&\mathbb{E}\big[ y_{t}-\mathbb{E}[y_{t}] \big]^2 \\ &=&\mathbb{E}\big[ \left( \mu +\varepsilon_{t}+\theta \varepsilon_{t-1}\right) -\mu \big]^2 \\ &=&\mathbb{E}[\varepsilon_{t}^{2}]+2\theta \mathbb{E}[\varepsilon_{t}\varepsilon_{t-1}]+\theta^{2}\mathbb{E}[\varepsilon_{t-1}^{2}] \\ &=& \sigma^{2} + 0 + \theta^{2} \sigma^2 \\ &=&\left( 1+\theta^{2}\right) \sigma^{2} \end{eqnarray}`

- which is constant and does not depend on time

---

# MA models - Covariance

- For the first lag,

`\begin{eqnarray} \mathsf{cov}[y_{t},y_{t-1}] &=&\mathbb{E}\Big[ \big( y_{t}-\mathbb{E}\left[ y_{t}\right] \big) \big( y_{t-1}-\mathbb{E}\left[ y_{t-1}\right] \big) \Big] \\ &=&\mathbb{E}\big[ \left( \varepsilon_{t}+\theta \varepsilon_{t-1}\right) \left( \varepsilon_{t-1}+\theta \varepsilon_{t-2}\right) \big] \\ &=&\mathbb{E}\left[ \varepsilon_{t}\varepsilon_{t-1}\right]+\theta \mathbb{E}[\varepsilon^{2}_{t-1}]+\theta \mathbb{E}\left[ \varepsilon_{t}\varepsilon_{t-2}\right] +\theta^{2} \mathbb{E}[\varepsilon_{t-1}\varepsilon_{t-2}] \\ &=&0+\theta \sigma^{2}+0+0 \\ &=&\theta \sigma^{2} \end{eqnarray}`

- which is constant and does not depend on time

---

# MA models - Covariance

- For the general case of `\(j\)` lags,

`\begin{eqnarray} \mathsf{cov}[y_{t},y_{t-j}] &=&\mathbb{E}\Big[ \big( y_{t}-\mathbb{E}\left[ y_{t}\right] \big) \big( y_{t-j}-\mathbb{E}\left[ y_{t-j}\right] \big) \Big] \\ &=&\mathbb{E}\big[ \left( \varepsilon_{t}+\theta \varepsilon_{t-1}\right) \left(\varepsilon_{t-j}+\theta \varepsilon_{t-j-1}\right) \big] \\ &=&0 \;\;\;\; \text{for} \;\; j > 1 \end{eqnarray}`

- which is constant and does not depend on time
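---

# MA models - Checking the moments in R

The moments above can be verified by simulation. The following is a minimal sketch (the use of R's `arima.sim` and `acf`, the seed, the parameter values and the sample size are all assumptions for illustration), comparing the sample moments of a simulated `\(MA(1)\)` with their theoretical counterparts.

```r
# Simulate y_t = e_t - 0.5 e_{t-1} (the same process as in Figure 1) and
# compare the sample moments with the values derived on the previous slides
set.seed(42)
theta <- -0.5
sigma <- 1
y <- arima.sim(model = list(ma = theta), n = 1000, sd = sigma)

mean(y)                                                  # theoretical mean: mu = 0
acf(y, lag.max = 3, type = "covariance", plot = FALSE)   # sample autocovariances
c(1 + theta^2, theta, 0, 0) * sigma^2                    # gamma(0), gamma(1), gamma(2), gamma(3)
```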
---

# MA models - Stationarity

- Neither the mean, variance nor covariances depend on time
- Hence the `\(MA(1)\)` process is covariance stationary
- Such a `\(MA(1)\)` process is stationary regardless of the value of `\(\theta\)`

---

# MA models - ACFs

- The ACF for a `\(MA(1)\)` may then be derived from the expression,

`\begin{eqnarray} \rho \left(j\right) \equiv \frac{\gamma \left( j\right) }{\gamma \left( 0\right) } = \frac{\mathsf{cov} [ y_{t},y_{t-j} ] }{\mathsf{var} [ y_{t} ] } \end{eqnarray}`

- Hence,

`\begin{eqnarray} \rho \left( 1\right) &=&\frac{\theta }{\left( 1+\theta^{2}\right) } \\ \rho \left( j\right) &=&0 \;\;\;\; \text{for } \;\; j > 1 \end{eqnarray}`

- for lag orders `\(j > 1\)`, the autocorrelations are zero

---
background-image: url(image/ma.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure 1: Simulated `\(MA(1)\)`: `\(\varepsilon_t - 0.5\varepsilon_{t-1}\)`

---
background-image: url(image/ma_acf.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure 2: Autocorrelation Functions for `\(MA(1)\)`: `\(\varepsilon_t - 0.5\varepsilon_{t-1}\)`

---

# MA models - Higher Order

- A finite-order `\(MA(q)\)` process may be written as,

`\begin{equation} y_{t}=\mu +\varepsilon_{t}+\theta_{1}\varepsilon_{t-1}+\theta_{2} \varepsilon_{t-2}+ \ldots +\theta_{q}\varepsilon_{t-q} \end{equation}`

- The infinite-order moving average process, `\(MA(\infty)\)`, is

`\begin{equation} y_{t}=\mu +\overset{\infty }{\underset{j=0}{\sum }}\theta_{j}\varepsilon_{t-j}=\mu +\theta_{0}\varepsilon_{t}+\theta_{1}\varepsilon_{t-1}+\theta_{2}\varepsilon_{t-2}+ \ldots \end{equation}`

- with `\(\theta _{0} = 1\)`

---

# MA models - Higher Order

- After excluding extreme cases, we require

`\begin{equation} \overset{\infty }{\underset{j=0}{\sum }}|\theta_{j}|<\infty \end{equation}`

- which implies that the coefficients are absolutely summable
- Moreover, the process is covariance-stationary when,

`\begin{equation} \overset{\infty }{\underset{j=0}{\sum }}|\gamma_{j}|<\infty \end{equation}`

---

# MA models - Identifying the order

- With a `\(MA(1)\)` process, only the shock `\(\varepsilon_{t-1}\)` (in addition to `\(\varepsilon_{t}\)`) affects the value of `\(y_t\)`
- Hence, the value of the first autocorrelation, `\(\rho(1)\)`, should differ from zero, while the others should not
- With a `\(MA(2)\)` process, the shocks `\(\varepsilon_{t-1}\)` and `\(\varepsilon_{t-2}\)` affect the value of `\(y_t\)`
- Hence, the values of the first two autocorrelations, `\(\rho(1)\)` and `\(\rho(2)\)`, should differ from zero, while the others should not
- This allows us to use the ACF to identify the order of an `\(MA(q)\)` process

---
background-image: url(image/ma_acf1_3.svg)
background-position: top
background-size: 85% 85%
class: clear, center, bottom

Figure 3: Identifying the order - `\(MA(1)\)`, `\(MA(2)\)` & `\(MA(3)\)` process
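---

# MA models - Identifying the order in R

As an illustration of the cut-off pattern in Figure 3, the theoretical ACFs below are computed with `ARMAacf` from R's `stats` package (an assumed tool choice; the coefficient values are also assumptions for the example).

```r
# Theoretical ACFs: the MA(q) autocorrelations are zero beyond lag q
ARMAacf(ma = c(0.7), lag.max = 6)              # MA(1): cuts off after lag 1
ARMAacf(ma = c(0.7, 0.5), lag.max = 6)         # MA(2): cuts off after lag 2
ARMAacf(ma = c(0.7, 0.5, 0.4), lag.max = 6)    # MA(3): cuts off after lag 3
```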
---

# AR models - Solutions

- Given the `\(AR(1)\)`,

`\begin{equation} y_{t}=\phi y_{t-1}+\varepsilon_{t} \end{equation}`

- Relates the value of a variable `\(y\)` at time `\(t\)` to its previous value at time `\((t-1)\)` and a random disturbance `\(\varepsilon\)`, also at time `\(t\)`
- Assuming that `\(\varepsilon_{t}\)` is independent and identically distributed white noise, `\(\varepsilon_{t}\sim \mathsf{i.i.d.} \; \mathcal{N}(0,\sigma^{2})\)`
- We showed that if `\(|\phi |<1\)`, the `\(AR(1)\)` is covariance-stationary,

`\begin{eqnarray} \mathbb{E}\left[ y_{t}\right] &=&0 \\ \mathsf{var}[y_{t}] &=&\frac{\sigma^{2}}{1-\phi^2 } \\ \mathsf{cov}[y_{t},y_{t-j}] &=&\phi^{j} \mathsf{var}[y_{t}] \end{eqnarray}`

- To prove this we can use recursive substitution, the method of undetermined coefficients, or lag operators

---

# AR models - Recursive Substitution

- Substituting backwards over `\(j\)` periods,

`\begin{eqnarray} y_{t} &=&\phi y_{t-1}+\varepsilon_{t} \\ &=& \phi (\phi y_{t-2}+\varepsilon_{t-1})+\varepsilon_{t} \\ &=& \phi ^{2}(\phi y_{t-3}+\varepsilon_{t-2})+\phi \varepsilon_{t-1}+\varepsilon_{t} \\ &=& \vdots \\ &=& \phi^{j+1}y_{t-(j+1)}+\phi^{j}\varepsilon_{t-j} + \ldots + \phi^{2}\varepsilon_{t-2} + \phi \varepsilon_{t-1} + \varepsilon_{t} \end{eqnarray}`

- Explains `\(y\)` as a linear function of the initial value `\(y_{t-(j+1)}\)` and the historical values of `\(\varepsilon_{t}\)`
- If `\(|\phi | <1\)` and `\(j\)` becomes large, `\(\phi^{j+1}y_{t-(j+1)}\rightarrow 0\)`
- Thus, the `\(AR(1)\)` can be expressed as an `\(MA(\infty)\)`
- Note that if `\(|\phi | >1\)` and `\(j\)` becomes large, `\(\phi^{j}\rightarrow \infty\)`
- Hence, the moving average equivalent of an explosive autoregression (or of a random walk, where `\(\phi =1\)`) has coefficients that are not summable

---

# AR models - Lag operators

- Lag operators are particularly useful when dealing with more complex model structures
- The straightforward `\(AR(1)\)` model can be written as,

`\begin{equation} \left( 1-\phi L\right) y_{t}=\varepsilon_{t} \end{equation}`

- Such a sequence `\(\left\{ y_{t}\right\}_{t=-\infty }^{\infty}\)` is bounded if there exists a finite number `\(k\)`, such that `\(|y_{t}| <k\)` for all `\(t\)`
- Provided `\(|\phi | <1\)` and we restrict ourselves to bounded sequences, we can multiply by `\(\left(1-\phi L\right) ^{-1}\)` on both sides of the equality (the lag polynomial is invertible),

`\begin{eqnarray} \left( 1-\phi L\right)^{-1} \left( 1-\phi L\right) y_{t}&=&\left( 1-\phi L\right)^{-1}\varepsilon_{t} \\ y_{t}&=&\left( 1-\phi L\right)^{-1}\varepsilon_{t} \end{eqnarray}`

---

# AR models - Lag operators

- Under the assumption that `\(|\phi |<1\)`, we can apply the geometric rule,

`\begin{equation} \left( 1-\phi L\right)^{-1}=\underset{j\rightarrow \infty }{\lim }\left( 1+\phi L+\left( \phi L\right)^{2}+ \ldots +\left( \phi L\right)^{j}\right) \end{equation}`

- This is based on the expression, `\(\left( 1-z\right)^{-1}=1+z+z^{2}+z^{3}+ \ldots \;\)`, which holds if `\(|z| < 1\)`
- Using this we can solve for,

`\begin{equation} y_{t}=\varepsilon_{t}+\phi \varepsilon_{t-1}+\phi^{2}\varepsilon_{t-2}+\phi^{3}\varepsilon_{t-3}+ \ldots =\overset{\infty }{\underset{j=0}{\sum }}\phi^{j}\varepsilon_{t-j} \end{equation}`

---

# AR models - Lag operators

- This expression could be written as a `\(MA(\infty)\)`,

`\begin{equation} y_{t}=\varepsilon_{t}+\theta_{1}\varepsilon_{t-1}+\theta_{2}\varepsilon_{t-2}+\theta_{3}\varepsilon_{t-3}+ \ldots =\overset{\infty}{\underset{j=0}{\sum }}\theta_{j}\varepsilon_{t-j} \end{equation}`

- Therefore, when `\(|\phi |<1\)`,

`\begin{equation} \overset{\infty }{\underset{j=0}{\sum }}|\theta _{j}|=\overset{\infty }{\underset{j=0}{\sum }}|\phi ^{j}| < \infty \end{equation}`
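---

# AR models - The `\(MA(\infty)\)` weights in R

A minimal sketch of the result above: `ARMAtoMA` from R's `stats` package returns the moving average weights of an ARMA model, which for an `\(AR(1)\)` should equal `\(\phi^{j}\)` (the value `\(\phi = 0.7\)` is an assumption for the example).

```r
# psi-weights of the MA(infinity) representation of an AR(1) with phi = 0.7
phi <- 0.7
ARMAtoMA(ar = phi, ma = 0, lag.max = 8)
phi^(1:8)    # identical, and absolutely summable since |phi| < 1
```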
---

# AR models - Unconditional Moments

- The unconditional first- and second-order moments of a stable `\(AR(1)\)` process may be derived from its `\(MA(\infty)\)` representation
- where for `\(y_{t}=\phi y_{t-1}+\varepsilon_{t}\)`,

`\begin{equation} \mathbb{E}\left[ y_{t}\right] = \mathbb{E}\left[ \varepsilon_{t}+\phi \varepsilon_{t-1}+\phi^{2}\varepsilon_{t-2}+\phi^{3}\varepsilon_{t-3}+ \ldots \right] =0 \end{equation}`

- The variance is then,

`\begin{eqnarray} \gamma \left( 0\right) &=&\mathsf{var}\left[ y_{t}\right] =\mathbb{E}\big[ y_{t}-\mathbb{E}\left[ y_{t}\right] \big]^{2} \\ &=&\mathbb{E}\left[ \varepsilon_{t}+\phi \varepsilon_{t-1}+\phi^{2}\varepsilon_{t-2}+\phi^{3}\varepsilon_{t-3}+ \ldots \right]^{2} \\ &=&\mathsf{var}\left[ \varepsilon_{t}\right] +\phi ^{2}\mathsf{var}\left[ \varepsilon_{t-1}\right] +\phi^{4}\mathsf{var}\left[\varepsilon_{t-2}\right] +\phi^{6}\mathsf{var}\left[ \varepsilon_{t-3}\right] + \ldots \\ &=&\left( 1+\phi^{2}+\phi^{4}+\phi^{6}+ \ldots \; \right) \sigma^{2} \\ &=&\frac{1}{1-\phi^{2}}\sigma^{2} \end{eqnarray}`

---

# AR models - Unconditional Moments

- The first-order covariance is then,

`\begin{eqnarray} \gamma \left( 1\right) &=&\mathbb{E}\Big[ \big(y_{t}-\mathbb{E}\left[ y_{t}\right] \big)\big(y_{t-1}-\mathbb{E}\left[ y_{t-1}\right] \big) \Big] \\ &=&\mathbb{E}\left[ (\varepsilon_{t}+\phi \varepsilon_{t-1}+\phi^{2}\varepsilon_{t-2}+ \ldots )\times (\varepsilon_{t-1}+\phi \varepsilon_{t-2}+ \ldots )\right] \\ &=&\left( \phi +\phi^{3}+\phi^{5}+ \ldots \right) \sigma^{2}=\phi \left( 1+\phi^{2}+\phi^{4}+ \ldots \right) \sigma^{2} \\ &=&\phi \frac{1}{1-\phi^{2}}\sigma^{2} \\ &=&\phi \mathsf{var}\left[ y_{t}\right] \end{eqnarray}`

- While for `\(j>1\)` we have,

`\begin{equation} \gamma \left( j\right) =\mathbb{E}\Big[ \big(y_{t}-\mathbb{E}\left[ y_{t}\right] \big)\big( y_{t-j}-\mathbb{E}\left[ y_{t-j}\right] \big) \Big] =\phi^{j} \mathsf{var} \left[ y_{t}\right] \end{equation}`

- which proves the result relating to the stationarity of the `\(AR(1)\)` model when `\(|\phi|<1\)`

---

# AR models - Unconditional Moments

- As noted previously, the ACF for an `\(AR(1)\)` process coincides with its impulse response function
- where the ACF of an `\(AR(1)\)` for `\(j = 1, \ldots ,J\)` is

`\begin{equation} \rho \left( 0\right) =\frac{\gamma \left( 0\right) }{\gamma \left( 0\right) } =1,\ \rho \left( 1\right) =\frac{\gamma \left( 1\right) }{\gamma \left( 0\right) }=\phi , \ldots , \rho \left( j\right) =\frac{\gamma \left( j\right) }{\gamma \left( 0\right) }=\phi^{j} \end{equation}`

- which equals the dynamic multipliers that may be summarised by the impulse response function

`\begin{equation} \frac{\partial y_{t}}{\partial \varepsilon_{t}}=1,\frac{\partial y_{t}}{\partial \varepsilon_{t-1}}=\phi , \ldots , \frac{\partial y_{t}}{\partial \varepsilon_{t-j}}=\phi^{j} \end{equation}`

---
background-image: url(image/ar12_acf.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure 4: Autocorrelation functions for `\(AR(1)\)` processes

---

# AR models - Adding a constant

- To ascertain how the results change after adding a constant,

`\begin{equation} y_{t}=\mu +\phi y_{t-1}+\varepsilon_{t} \end{equation}`

- We can define `\(\upsilon_{t}=\mu +\varepsilon_{t}\)`, such that,

`\begin{eqnarray} y_{t} &=&\phi y_{t-1}+\upsilon_{t} \\ y_{t} &=&(1-\phi L)^{-1}\upsilon_{t} \\ &=&\left( \frac{1}{1-\phi }\right) \mu +\varepsilon_{t}+\phi \varepsilon_{t-1}+\phi^{2}\varepsilon_{t-2}+ \ldots \end{eqnarray}`

- with unconditional first moment,

`\begin{equation} \mathbb{E}\left[ y_{t}\right] =\left( \frac{1}{1-\phi }\right) \mu \end{equation}`

- which does not depend on time
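---

# AR models - Simulating an `\(AR(1)\)` with a constant

A minimal simulation sketch of the two results above (the parameter values `\(\mu = 2\)` and `\(\phi = 0.8\)`, the seed and the sample size are assumptions for the example): the sample mean should settle near `\(\mu/(1-\phi)\)` and the sample ACF should decay roughly like `\(\phi^{j}\)`.

```r
# Simulate y_t = mu + phi * y_{t-1} + e_t
set.seed(123)
mu   <- 2
phi  <- 0.8
n    <- 5000
e    <- rnorm(n)
y    <- numeric(n)
y[1] <- mu / (1 - phi)                    # start at the unconditional mean
for (t in 2:n) y[t] <- mu + phi * y[t - 1] + e[t]

mean(y)                                   # theoretical mean: mu / (1 - phi) = 10
acf(y, lag.max = 5, plot = FALSE)         # sample autocorrelations
phi^(1:5)                                 # theoretical rho(j) = phi^j
```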
---

# AR models - Higher order processes

- For higher-order autoregressive processes, things become a bit more complicated, where

`\begin{equation} y_{t}=\phi_{1}y_{t-1}+\phi_{2}y_{t-2}+\varepsilon_{t} \end{equation}`

- We are no longer able to consider the value of `\(\phi_{1}\)` alone to determine whether the process is stationary
- To assess stationarity, we rewrite the `\(AR(2)\)` expression as a first-order difference equation

---

# AR models - Higher order processes

- Using a vector, `\(Z_{t}\)`, which is of dimension `\((2 \times 1)\)`,

`\begin{equation} Z_{t}= \left[ \begin{array}{c} {y_{t}}\\ {y_{t-1}} \end{array}\right] \end{equation}`

- With a vector for the errors,

`\begin{equation} \upsilon _{t}= \left[ \begin{array}{c} {\varepsilon_{t}}\\ {0} \end{array}\right] \end{equation}`

- And the `\((2 \times 2)\)` matrix for the coefficients,

`\begin{equation} \Gamma =\left[ \begin{array}{cc} \phi_{1} & \phi_{2} \\ 1 & 0 \end{array}\right] \end{equation}`

---

# AR models - Higher order processes

- The first-order vector difference equation can be written,

`\begin{equation} Z_{t}=\Gamma Z_{t-1}+ \upsilon_{t} \end{equation}`

- The matrix `\(\Gamma\)` is termed the *companion form* matrix of the `\(AR(2)\)` process
- To check for stationarity we can compute the eigenvalues of this matrix
- Moreover, the eigenvalues of `\(\Gamma\)` are the two values of `\(x\)` that satisfy the characteristic equation:

`\begin{equation} x^{2}-\phi_{1}x-\phi_{2}=0 \end{equation}`

---

# AR models - Higher order processes

- These eigenvalues `\((m_{1}\)` and `\(m_{2})\)` must then satisfy `\(\left( x-m_{1}\right) \left( x-m_{2}\right) = 0\)`, and can be found from the formula:

`\begin{equation} m_{1},m_{2}=\frac{\left( \phi_{1} \pm \sqrt{\phi_{1}^{2}+4\phi_{2}}\right)}{2} \end{equation}`

- Stationarity requires that the eigenvalues are less than one in absolute value
- In the `\(AR(2)\)` case, one can show that this will be the case if,

`\begin{eqnarray} \phi_{1}+\phi_{2} &<&1 \\ -\phi_{1}+\phi_{2} &<&1 \\ \phi_{2} &>&-1 \end{eqnarray}`

---
background-image: url(image/eigen.svg)
background-position: top
background-size: 85% 85%
class: clear, center, bottom

Figure 5: Eigenvalues for difference equation `\(x^{2}- 0.6 x - 0.2=0\)`

---

# AR models - Higher order processes

- The `\(AR(p)\)` can then be written as,

`\begin{equation} y_{t}=\phi_{1} y_{t-1}+\phi_{2} y_{t-2}+ \ldots + \phi_{p}y_{t-p}+\varepsilon_{t} \end{equation}`

- Checking for stationarity involves similar calculations
- In this case the `\(\Gamma\)` matrix will be of the form:

`\begin{equation} \Gamma =\left[ \begin{array}{cccccc} \phi _{1} & \phi _{2} & \phi _{3} & \dots & \phi _{p-1} & \phi _{p} \\ 1 & 0 & 0 & \dots & 0 & 0 \\ 0 & 1 & 0 & \dots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \dots & 1 & 0 \end{array}\right] \end{equation}`

- Provided the eigenvalues are less than one in absolute value (i.e. they lie within the unit circle), the `\(p^{\text{th}}\)` order autoregression is stable
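---

# AR models - Checking stationarity in R

A minimal sketch of the eigenvalue check above (the coefficients `\(\phi_1 = 0.6\)` and `\(\phi_2 = 0.2\)` are taken from the characteristic equation in Figure 5; the use of base R's `eigen` is an assumption).

```r
# Companion matrix of an AR(2) and the moduli of its eigenvalues
phi1 <- 0.6
phi2 <- 0.2
Gamma <- matrix(c(phi1, phi2,
                  1,    0), nrow = 2, byrow = TRUE)

abs(eigen(Gamma)$values)      # stationary if both moduli are less than one

# Equivalently, the two roots of x^2 - phi1 * x - phi2 = 0
(phi1 + c(1, -1) * sqrt(phi1^2 + 4 * phi2)) / 2
```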
---

# AR models - Identify the order of AR(p)

- As in the case of the `\(MA(q)\)` processes, one could try to use the ACF coefficients to identify the order of the `\(AR(p)\)` process
- However, since the `\(AR(p)\)` process passes its persistence on to successive lags, the ACF does not cut off and is of little use here
- As the PACF removes the effects of the persistence that is passed on through the intervening lags of the `\(AR(p)\)` process, it may be used to identify the order of an `\(AR(p)\)` process

---

# ARMA models

- We can specify an `\(ARMA(1,1)\)` process as,

`\begin{equation} y_{t}=\phi y_{t-1}+\varepsilon_{t}+\theta \varepsilon_{t-1} \end{equation}`

- Or, using the lag polynomials, a general form of an ARMA model is,

`\begin{equation} \phi \left( L\right) y_{t}= \theta \left( L\right) \varepsilon_{t} \end{equation}`

- Note that the number of lags, `\((p)\)` and `\((q)\)`, can differ
- For instance, an `\(ARMA(2,1)\)` combines an `\(AR(2)\)` with an `\(MA(1)\)`:

`\begin{eqnarray} \left( 1-\phi_{1}L-\phi_{2}L^{2}\right) y_{t} &=&\left( 1+\theta_{1}L\right) \varepsilon_{t}\\ y_{t} &=&\phi_{1}y_{t-1}+\phi_{2}y_{t-2}+\varepsilon_{t} +\theta_{1}\varepsilon_{t-1} \end{eqnarray}`

---

# ARMA processes

- Whether an `\(ARMA(p,q)\)` process is stationary depends solely on its autoregressive part
- Assume an `\(ARMA(1,1)\)` and use the lag operator,

`\begin{equation} \left( 1-\phi L\right) y_{t}=\left( 1+\theta L\right) \varepsilon_{t} \end{equation}`

- Multiplying by `\(\left( 1-\phi L\right)^{-1}\)` on both sides,

`\begin{eqnarray} y_{t} &=&\frac{\left( 1+\theta L\right) }{\left( 1-\phi L\right) } \varepsilon_{t} \\ &=&\left( 1-\phi L\right)^{-1}\varepsilon_{t} + \left( 1-\phi L \right)^{-1} \theta \varepsilon_{t-1} \end{eqnarray}`

---

# ARMA processes

- When `\(|\phi | < 1\)`, this can be written as the geometric process,

`\begin{eqnarray} y_{t} &=&\overset{\infty }{\underset{j=0}{\sum }}\left(\phi L\right)^{j}\varepsilon_{t}+\theta \overset{\infty }{\underset{j=0}{\sum }}\left(\phi L\right)^{j}\varepsilon_{t-1}\\ &=&\varepsilon_{t}+\overset{\infty }{\underset{j = 1}{\sum }}\phi^{j}\varepsilon_{t-j}+\theta \overset{\infty }{\underset{j=1}{\sum }}\phi^{j-1}\varepsilon_{t-j}\\ &=&\varepsilon_{t}+\overset{\infty }{\underset{j=1}{\sum }}\left( \phi^{j}+\theta \phi^{j-1}\right) \varepsilon_{t-j} \end{eqnarray}`

---
background-image: url(image/arma1_acf.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure 6: ACF and PACF for `\(AR(1)\)` with `\(\phi=0.5\)`

---
background-image: url(image/arma2_acf.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure 7: ACF and PACF for `\(MA(1)\)` with `\(\theta=0.6\)`

---
background-image: url(image/arma3_acf.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure 8: ACF and PACF for `\(ARMA(1,1)\)` with `\(\phi=0.5\)` and `\(\theta=0.6\)`

---

# Autocorrelation patterns

- When combining the AR and MA correlation functions, the results may be somewhat unclear
- The patterns could be consistent with an `\(ARMA(2,2)\)`, `\(ARMA(1,2)\)`, `\(ARMA(2,1)\)`, `\(ARMA(1,1)\)`, `\(MA(2)\)` or `\(AR(2)\)`
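---

# Autocorrelation patterns - Theoretical ACF and PACF in R

A minimal sketch of the patterns in Figures 6 to 8, using `ARMAacf` from R's `stats` package (an assumed tool choice) with the same parameter values as in the figures.

```r
# AR(1): ACF decays geometrically, PACF cuts off after lag 1
ARMAacf(ar = 0.5, lag.max = 6)
ARMAacf(ar = 0.5, lag.max = 6, pacf = TRUE)

# MA(1): ACF cuts off after lag 1, PACF decays
ARMAacf(ma = 0.6, lag.max = 6)
ARMAacf(ma = 0.6, lag.max = 6, pacf = TRUE)

# ARMA(1,1): both the ACF and the PACF tail off
ARMAacf(ar = 0.5, ma = 0.6, lag.max = 6)
ARMAacf(ar = 0.5, ma = 0.6, lag.max = 6, pacf = TRUE)
```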
---

# Seasonal ARMA Models

- In several cases the dependence on the past occurs at a seasonal lag `\(s\)`
- With monthly economic data, the behaviour in Jan 2010 may be related to that in Jan 2011
- Could introduce autoregressive and moving average terms that arise at a seasonal interval
- For example, an `\(ARMA(p,q)_s\)` model that takes the form `\(ARMA(1,1)_{12}\)` would be written as,

`\begin{eqnarray*} y_t = \phi y_{t-12} + \varepsilon_t + \theta \varepsilon_{t-12} \end{eqnarray*}`

- Estimation is relatively straightforward

---

# Seasonal ARMA Models - Identification

- The `\(MA(1)\)` with a seasonal lag `\((s = 12)\)` could be written as, `\(y_t = \varepsilon_t + \theta \varepsilon_{t-12}\)`
- It is easy to verify that

`\begin{eqnarray} \gamma(0) &=& (1 + \theta^2)\sigma^2\\ \gamma(12) &=& \theta \sigma^2\\ \gamma(j) &=& 0, \;\; \text{otherwise} \end{eqnarray}`

- The only non-zero autocorrelation, aside from lag zero, is `\(\rho(12) = \theta / (1+\theta^2)\)`

---

# Seasonal ARMA Models - Identification

- Similarly, for the `\(AR(1)\)` model with seasonal lag `\((s = 12)\)`, we could calculate,

`\begin{eqnarray} \gamma(0) &=& \sigma^2 / (1 - \phi^2)\\ \gamma(12k) &=& \sigma^2 \phi^k /( 1 - \phi^2) \;\; \text{for } k = 1, 2, \ldots\\ \gamma(j) &=& 0, \;\; \text{otherwise} \end{eqnarray}`

- These results suggest that the ACF and PACF patterns of the seasonal models are analogous to those of the non-seasonal models, but arise at the seasonal lags

---

# Seasonal ARMA Models - Identification

- Could allow for mixed seasonal models in the general `\(ARMA(p,q)_s\)` framework,

`\begin{eqnarray} y_t = \phi y_{t-12} + \varepsilon_t + \theta \varepsilon_{t-1} \end{eqnarray}`

- While estimation would be straightforward, the identification of the structural form may be problematic

---

# Box-Jenkins methodology

- In a real-world application we would not know the functional form of the underlying data generating process
- The respective parameters in these models would then need to be estimated
- Thereafter, we could assess the model fit
- This procedure is encapsulated in the Box & Jenkins (1979) methodology
- Identification, Estimation, Diagnostic testing

---

# Box-Jenkins - Identification

- Examine the time plot of the data to
  - detect and correct for outliers, missing values and structural breaks (if possible)
  - detect nonstationarity in the form of a pronounced trend or prolonged meander (and possibly correct for it)
- If you are uncertain about the degree of stationarity then perform unit root tests
- Plot the ACF and PACF to consider the persistence in the data
  - when the ACF quickly returns to zero then there will be no unit root
- Alternatively, if you think that the data represent white noise then use the `\(Q\)`-statistic

---

# Box-Jenkins - Identification

- Calculate the `\(Q\)`-statistic to test whether a group of autocorrelations is different from zero
- Originally developed by Box-Pierce (1970); better small-sample performance is reported for the Ljung and Box (1978) version

`\begin{eqnarray} Q = T(T +2) \sum_{j=1}^{s}\rho_j^2 / (T-j) \end{eqnarray}`

- If the sample value of `\(Q\)` exceeds the critical value of the `\(\chi^2\)` distribution with `\(s\)` degrees of freedom, then at least one value of `\(\rho_j\)` is statistically different from zero at the specified significance level

---

# Box-Jenkins - Identification

- Examine the ACF and PACF functions more closely to try to identify the order of a potential `\(ARMA(p,q)\)`
- For the ACF and PACF functions that were provided previously we would consider an `\(ARMA(2,2)\)`, an `\(ARMA(1,2)\)`, an `\(ARMA(2,1)\)` or an `\(ARMA(1,1)\)`
- We would also think about using a `\(MA(2)\)` or an `\(AR(2)\)`, but not a `\(MA(1)\)` or an `\(AR(1)\)`
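---

# Box-Jenkins - The `\(Q\)`-statistic in R

A minimal sketch of the Ljung-Box version of the `\(Q\)`-statistic discussed above, using `Box.test` from R's `stats` package (the simulated series, the seed and the choice of `\(s = 12\)` are assumptions for the example).

```r
set.seed(1)
wn  <- rnorm(200)                                  # white noise
ar1 <- arima.sim(model = list(ar = 0.7), n = 200)  # persistent series

Box.test(wn,  lag = 12, type = "Ljung-Box")   # should not reject the null
Box.test(ar1, lag = 12, type = "Ljung-Box")   # should reject: autocorrelated
```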
---

# Box-Jenkins - Estimation Stage

Fit each of the candidate models and examine the various `\(\phi_i\)` and `\(\theta_i\)` coefficients according to:

- Parsimony:
  - Additional coefficients increase the fit but reduce the degrees of freedom
  - Parsimonious models often produce better out-of-sample fit
- Stationarity and Invertibility:
  - The distribution theory underlying the use of the sample ACF and PACF as approximations for those of the true DGP assumes that `\(y_t\)` is stationary
  - `\(t\)`-statistics and `\(Q\)`-statistics also presume that the data is stationary
  - Be suspicious if the estimated value of `\(|\phi_1|\)` is close to unity
  - The model must be invertible, since the ACF and PACF assume that `\(y_t\)` can be approximated by an autoregressive model

---

# Box-Jenkins - Estimation Stage

- To evaluate the different candidate models, consider the goodness-of-fit measures:
  - Look at the `\(R^2\)` and the average of the residual sum of squares
  - The AIC and BIC are more suitable criteria since they weigh up parsimony and goodness-of-fit
  - Smaller values of the AIC are better (or where AIC `\(< 0\)`, choose the model with the most negative statistic)

---

# Box-Jenkins estimation - AIC & BIC

- Adding additional lags will reduce the sum of squares of the estimated residuals (and will lead to a higher `\(R^2\)`)
- But you will also lose degrees of freedom (which may be essential)
- The Akaike and Bayesian Information Criteria test for goodness-of-fit, while prizing parsimony

`\begin{eqnarray} AIC = \log \hat{\sigma}^2_k+\frac{T+ 2k}{T}\\ BIC = \log \hat{\sigma}^2_k+\frac{k \log T}{T} \end{eqnarray}`

- where `\(k\)` is the number of estimated parameters and `\(T\)` is the number of observations
- `\(\hat{\sigma}^2_k = \frac{SSR_k}{T}\)` is the estimated residual variance, based on the residual sum of squares `\(SSR_k\)`

---

# Box-Jenkins - Estimation Stage

- Make sure `\(T\)` is fixed when comparing an `\(AR(1)\)` & an `\(AR(2)\)`
- Including an additional parameter must decrease `\(SSR_k\)` by enough to offset the penalty if the AIC or BIC is to decrease
- Since `\(\log T\)` is greater than `\(2\)` (once `\(T \geq 8\)`), the BIC penalises extra parameters more heavily and favours more parsimonious models

---

# Box-Jenkins - Diagnostic checking

- Plot the residuals to look for outliers or periods where the model does not fit the data
- Construct the ACF and PACF of the residuals
- Serial correlation in the residuals implies that a systematic movement in the `\(y_t\)` sequence is not accounted for by the `\(ARMA(p,q)\)` coefficients
- Such models should be eliminated or re-estimated
- Use the `\(Q\)`-statistic to determine whether any or all of the ACF or PACF coefficients of the residuals are significant
- When applying the `\(Q\)`-statistic to the residuals of an `\(ARMA(p,q)\)` model, use a `\(\chi^2\)` distribution with `\(s-p-q\)` degrees of freedom
- Ensure that the standard errors of the coefficient estimates are appropriate; if not, re-estimate the model

---

# Box-Jenkins - Diagnostics & Forecasts

- If possible, fit `\(ARMA(p,q)\)` models to subsamples to check the stability of the DGP

`\begin{eqnarray} F = \frac{(RSS - RSS_1 - RSS_2)/k}{(RSS_1 + RSS_2)/(T-2k)} \end{eqnarray}`

- where `\(k\)` is the number of parameters, i.e. `\(p + q + 1\)` (with a constant)
- If all the coefficients are equal across the subsamples, `\((RSS_1 + RSS_2)\)` should equal `\(RSS\)` and `\(F=0\)`
- You could then use the model for forecasting `\(y_{T+1}, y_{T+2}, \ldots\)` for out-of-sample comparison
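---

# Box-Jenkins - Estimation and diagnostics in R

A minimal sketch of the estimation and diagnostic steps using `arima`, `AIC`, `BIC` and `Box.test` from R's `stats` package (the simulated data and the candidate orders are assumptions for the example).

```r
set.seed(7)
y <- arima.sim(model = list(ar = 0.5, ma = 0.6), n = 500)

# Fit a few candidate models and compare the information criteria
candidates <- list(ar2    = arima(y, order = c(2, 0, 0)),
                   ma2    = arima(y, order = c(0, 0, 2)),
                   arma11 = arima(y, order = c(1, 0, 1)))

sapply(candidates, AIC)     # smaller is better
sapply(candidates, BIC)

# Ljung-Box test on the residuals of the preferred model; fitdf accounts
# for the p + q estimated ARMA coefficients
best <- candidates$arma11
Box.test(residuals(best), lag = 12, type = "Ljung-Box", fitdf = 2)
```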
---

# Structural Breaks - Chow's Breakpoint

- Model two sub-samples of the data and see whether there are significant differences in the parameters
- Test whether the null hypothesis of "no structural change" holds after constructing an `\(F\)` test statistic for the parameters
- Could construct a model for a change at date `\(\tau\)`

`\begin{eqnarray} y_t = x_t^{\top} \beta_t + \varepsilon_t \end{eqnarray}`

- where

`\begin{equation} \beta_{t} = \left\{ \begin{array}{lcl} \beta & \; & t \leq \tau \\ \beta + \delta & \; & t > \tau \\ \end{array}\right. \end{equation}`

- Or alternatively we could test for a change in all the model parameters with an `\(F\)` test

---

# Structural Breaks - Chow's Breakpoint

- A major drawback is that the change point must be known *a priori*
- Must ensure that each sub-sample has at least as many observations as the number of estimated parameters

---

# Structural Breaks - Quandt LR Test

- An extension of the Chow test, where an `\(F\)` test statistic is calculated for all potential breakpoints within an interval `\([\underline{i}, \overline{\imath}]\)`
- Reject the null hypothesis of no structural change if the absolute value of any of the test statistics is too large
- Takes the form of a sup `\(F\)` test
- The asymptotic properties of this statistic are non-standard, so use the critical values that are referenced in the notes

---
background-image: url(image/qlr.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure 9: Quandt Likelihood Ratio Test - Breakpoint at observation 100 with `\(n=200\)` and `\(p =0.00\)`

---

# Structural Breaks - CUSUM Test

- The CUSUM test is based on the cumulative sum of the recursive residuals
- Plot the cumulative sum together with the 5% critical boundaries
- If the cumulative sum breaks either of the two boundaries there is parameter instability and a possible structural break
- Need to specify the model *a priori* to obtain the residuals

---
background-image: url(image/cusum.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure 10: CUSUM Test - Breakpoints for change in coefficients

---

# Conclusion

- Relatively simple ARMA models can be used to describe stationary univariate time series
- They are easy to estimate, and the straightforward Box & Jenkins method can identify possible functional forms for the underlying data generating process
- It is possible to test the data generating process for structural breaks
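---

# Structural Breaks - Tests in R

As a closing illustration of the Quandt LR and CUSUM tests, the following is a minimal sketch that relies on the `strucchange` package (an assumed package choice, not prescribed in the notes); the simulated data, the break in the mean and the trimming interval are also assumptions.

```r
library(strucchange)

set.seed(99)
y <- c(arima.sim(model = list(ar = 0.5), n = 100),
       arima.sim(model = list(ar = 0.5), n = 100) + 2)   # mean shift halfway
dat <- data.frame(y = y[-1], y_lag = y[-length(y)])

# Quandt LR: sequence of F statistics over candidate breakpoints, sup-F test
fs <- Fstats(y ~ y_lag, data = dat, from = 0.15, to = 0.85)
sctest(fs, type = "supF")

# CUSUM of recursive residuals, plotted with its 5% boundaries
cus <- efp(y ~ y_lag, data = dat, type = "Rec-CUSUM")
plot(cus)
```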