class: center, middle, inverse, title-slide

# Structural vector autoregressive models

### Kevin Kotzé

---

<!-- layout: true -->

<!-- background-image: url(image/logo.svg) -->
<!-- background-position: 2% 98% -->
<!-- background-size: 10% -->

---

# Contents

1. Introduction
1. Estimation & Identification
1. Impulse Response Functions
1. Variance Decompositions
1. Alternative restrictions for the coefficient matrix
1. Long-run restrictions

---

# Introduction

- SVAR models allow for:
  - contemporaneous variables that may be treated as explanatory variables
  - specific restrictions on the parameters in the coefficient and residual covariance matrices
- Allowing for contemporaneous variables is important in many economic studies, where we often deal with quarterly data
- They also allow for the identification of specific independent shocks that are not affected by covariance terms

---

# Introduction

- With the VAR model, the errors must have a positive definite covariance matrix
- This leads to difficulties when trying to evaluate the effect of an independent shock
- SVAR models have therefore become an indispensable tool for studying relationships and the effects of shocks in macroeconomics

---

# Incorporating contemporaneous variables

- Start off by assuming that the variables are treated symmetrically
- For the two-variable case, let
  - `\(y_{1,t}\)` be affected by current and past realisations of `\(y_{2,t}\)`
  - `\(y_{2,t}\)` be affected by current and past realisations of `\(y_{1,t}\)`

`\begin{eqnarray} y_{1,t} = b_{10} - b_{12} y_{2,t} + \gamma_{11}y_{1,t-1} + \gamma_{12}y_{2,t-1} + \varepsilon_{1,t} \\ y_{2,t} = b_{20} - b_{21} y_{1,t} + \gamma_{21}y_{1,t-1} + \gamma_{22}y_{2,t-1} + \varepsilon_{2,t} \end{eqnarray}`

- where both `\(y_{1,t}\)` and `\(y_{2,t}\)` are stationary
- `\(\varepsilon_{1,t}\)` and `\(\varepsilon_{2,t}\)` are white noise with standard deviations `\(\sigma_1\)` and `\(\sigma_2\)`
- `\(\varepsilon_{1,t}\)` and `\(\varepsilon_{2,t}\)` are uncorrelated, since we want to identify the effect of each independent shock
- Hence the covariance elements in `\(\Sigma_\varepsilon\)` are set to zero
- Note: `\(b_{12}\)` describes the contemporaneous effect of a change in `\(y_{2,t}\)` on `\(y_{1,t}\)`, and vice versa for `\(b_{21}\)`

---

# Incorporating contemporaneous variables

- Given the model:

`\begin{eqnarray} y_{1,t} = b_{10} - b_{12} y_{2,t} + \gamma_{11}y_{1,t-1} + \gamma_{12}y_{2,t-1} + \varepsilon_{1,t} \\ y_{2,t} = b_{20} - b_{21} y_{1,t} + \gamma_{21}y_{1,t-1} + \gamma_{22}y_{2,t-1} + \varepsilon_{2,t} \end{eqnarray}`

- There will be an indirect contemporaneous effect of `\(\varepsilon_{1,t}\)` on `\(y_{2,t}\)` if `\(b_{21} \ne 0\)`
- Similarly, `\(\varepsilon_{2,t}\)` affects `\(y_{1,t}\)` if `\(b_{12} \ne 0\)`
- This is a much richer characterisation of the dynamics than in the previous lecture
- In the previous model, `\(\varepsilon_{2,t}\)` could only affect `\(y_{1,t}\)` with a lag (through `\(y_{2,t-1}\)`), and vice versa
- However, the inclusion of contemporaneous parameters does present some challenges with parameter estimation (a simulation sketch of this system follows)
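---

# Sketch: simulating the structural system

- A minimal `numpy` sketch of the two-variable structural system above, solving the two contemporaneous equations jointly at each date; all parameter values are illustrative assumptions, not estimates

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed structural parameters (illustrative only)
b10, b20 = 0.1, 0.2        # intercepts
b12, b21 = 0.4, 0.2        # contemporaneous coefficients
g11, g12 = 0.5, 0.1        # lag coefficients, first equation
g21, g22 = 0.2, 0.6        # lag coefficients, second equation
sig1, sig2 = 1.0, 0.5      # standard deviations of the structural shocks

B = np.array([[1.0, b12],
              [b21, 1.0]])

T = 200
y = np.zeros((T, 2))
for t in range(1, T):
    eps = np.array([sig1, sig2]) * rng.standard_normal(2)
    rhs = np.array([b10 + g11 * y[t - 1, 0] + g12 * y[t - 1, 1],
                    b20 + g21 * y[t - 1, 0] + g22 * y[t - 1, 1]]) + eps
    y[t] = np.linalg.solve(B, rhs)   # B y_t = Gamma_0 + Gamma_1 y_{t-1} + eps_t
```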
---

# Standard VAR: Structural Form

- We can write the above *structural-form* of the model in matrix notation as:

`\begin{eqnarray} B \boldsymbol{y}_t = \Gamma_0 + \Gamma_1 \boldsymbol{y}_{t-1} + \varepsilon_t \end{eqnarray}`

- where

`\begin{eqnarray} B =\left[ \begin{array}{cc} 1 & b_{12} \\ b_{21} & 1 \end{array} \right], \hspace{0.5cm} \boldsymbol{y}_t = \left[ \begin{array}{c} y_{1,t} \\ y_{2,t} \end{array} \right], \hspace{0.5cm} \Gamma_0 = \left[ \begin{array}{c} b_{10} \\ b_{20} \end{array} \right] \end{eqnarray}`

`\begin{eqnarray} \Gamma_1 =\left[ \begin{array}{cc} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \\ \end{array} \right], \hspace{0.5cm} \text{and } \;\; \varepsilon_t = \left[ \begin{array}{c} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{array} \right] \end{eqnarray}`

---

# Standard VAR: Reduced-Form

- Premultiplication by `\(B^{-1}\)` gives us the VAR in *reduced-form*:

`\begin{eqnarray} \boldsymbol{y}_t = A_0 + A_1 \boldsymbol{y}_{t-1} + \boldsymbol{u}_t \end{eqnarray}`

- where `\(A_0 = B^{-1} \Gamma_0\)`, `\(A_1 = B^{-1}\Gamma_1\)` and `\(\boldsymbol{u}_t = B^{-1}\varepsilon_t\)`
- Now where:
  - `\(a_{i0}\)` is the `\(i\)`th element of `\(A_0\)`
  - `\(a_{ij}\)` is the element in row `\(i\)` and column `\(j\)` of the matrix `\(A_1\)`
  - `\(\boldsymbol{u}_{t}\)` has elements `\(u_{1,t}\)` and `\(u_{2,t}\)`

`\begin{eqnarray} y_{1,t} = a_{10} + a_{11}y_{1,t-1} + a_{12}y_{2,t-1} + u_{1,t} \\ y_{2,t} = a_{20} + a_{21}y_{1,t-1} + a_{22}y_{2,t-1} + u_{2,t} \end{eqnarray}`

---

# Standard VAR: Reduced-Form

- By using the relationship `\(\boldsymbol{u}_t = B^{-1}\varepsilon_t\)`, or:

`\begin{eqnarray} \left[ \begin{array}{c} u_{1,t} \\ u_{2,t} \end{array} \right] =\left[ \begin{array}{cc} 1 & b_{12} \\ b_{21} & 1 \end{array} \right]^{-1} \left[ \begin{array}{c} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{array} \right] \end{eqnarray}`

- We can show that,

`\begin{eqnarray} u_{1,t} = (\varepsilon_{1,t} - b_{12}\varepsilon_{2,t})/(1-b_{12}b_{21})\\ u_{2,t} = (\varepsilon_{2,t} - b_{21}\varepsilon_{1,t})/(1-b_{12}b_{21}) \end{eqnarray}`

---

# Standard VAR: Variance/covariance

- Since `\(\varepsilon_{1,t}\)` and `\(\varepsilon_{2,t}\)` are white noise processes
- The residuals `\(u_{1,t}\)` and `\(u_{2,t}\)` have zero means and constant variances, and are not autocorrelated
- However, as `\(\boldsymbol{u}_{t}\)` depends on both `\(\varepsilon_{1,t}\)` and `\(\varepsilon_{2,t}\)`, the two residuals will in general covary
- The covariance of the two terms is:

`\begin{eqnarray} \mathsf{cov} \left[ u_{1,t}, u_{2,t} \right] & = & \mathbb{E}\left[(\varepsilon_{1,t}-b_{12}\varepsilon_{2,t})(\varepsilon_{2,t}-b_{21}\varepsilon_{1,t})\right] / (1-b_{12}b_{21})^2 \\ & = & -\left[(b_{21}\sigma_1^2 + b_{12} \sigma_{2}^2)\right] / (1-b_{12}b_{21})^2 \end{eqnarray}`

- Since these moments are all time invariant, the variance/covariance matrix will be,

`\begin{eqnarray} \Sigma_{\boldsymbol{u}} =\left[ \begin{array}{cc} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \\ \end{array} \right] \end{eqnarray}`

- where `\(\mathsf{var}[ u_{i,t} ] = \sigma_{ii}\)` and `\(\sigma_{12} = \sigma_{21} = \mathsf{cov} \big[ u_{1,t}, u_{2,t}\big]\)`

---

# Estimation

- Note that in the *reduced-form*:
  - the RHS contains only predetermined variables
  - the error terms are serially uncorrelated with constant variance
- Hence we can estimate each equation with OLS, which is consistent and asymptotically efficient (an OLS sketch follows)
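---

# Sketch: OLS estimation of the reduced form

- A minimal `numpy` sketch of equation-by-equation OLS for the bivariate reduced-form VAR(1); the function name and return values are illustrative choices

```python
import numpy as np

def estimate_var1(y):
    """OLS estimates of A0 and A1 in y_t = A0 + A1 y_{t-1} + u_t."""
    Y = y[1:]                                            # regressand: y_t
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])   # regressors: [1, y_{t-1}]
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)         # one OLS fit per column of Y
    A0 = coef[0]                                         # intercepts
    A1 = coef[1:].T                                      # rows of A1 correspond to equations
    U = Y - X @ coef                                     # reduced-form residuals
    Sigma_u = U.T @ U / (len(U) - X.shape[1])            # residual covariance matrix
    return A0, A1, U, Sigma_u
```

- Applied to the simulated `y` from the earlier sketch, `A0`, `A1` and `Sigma_u` should be close to `\(B^{-1}\Gamma_0\)`, `\(B^{-1}\Gamma_1\)` and `\(B^{-1}\Sigma_\varepsilon (B^{-1})^{\prime}\)`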
---

# Identification

- The structural equations can't be estimated directly (due to the feedback effects from contemporaneous variables)
- However, we can estimate the *reduced-form* of the VAR model
- This would allow us to obtain the residuals `\(u_{1,t}\)` and `\(u_{2,t}\)` and the coefficients in the `\(A_0\)` and `\(A_1\)` matrices
- Could we use these to recover the *structural-form* parameter estimates, given the relationships between the structural and reduced forms?

---

# Identification

- Unfortunately not, since the *structural-form* contains 10 parameters:
  - `\(b_{10}, b_{20}, \gamma_{11}, \gamma_{12}, \gamma_{21}, \gamma_{22}, b_{12}, b_{21}, \sigma_1, \sigma_2\)`
- while the *reduced-form* contains 9 parameters:
  - `\(a_{10}, a_{20}, a_{11}, a_{12}, a_{21}, a_{22}, \mathsf{var}[u_{1,t}], \mathsf{var}[u_{2,t}], \mathsf{cov}[u_{1,t},u_{2,t}]\)`
- Hence there is no unique mapping that enables us to obtain the *structural-form* parameters from the *reduced-form* parameters

---

# Identification

- However, it may be possible to show that:
  - if one parameter in the *structural-form* is restricted to a calibrated value, then the structural system is exactly identified

---

# Recursive estimation

- Consider the method of recursive estimation (Sims, 1980)
- Suppose that you are willing to assume that `\(b_{21} = 0\)` in the structural system:

`\begin{eqnarray} y_{1,t} = b_{10} - b_{12} y_{2,t} + \gamma_{11}y_{1,t-1} + \gamma_{12}y_{2,t-1} + \varepsilon_{1,t}\\ y_{2,t} = b_{20} \hspace{1.26cm} + \gamma_{21}y_{1,t-1} + \gamma_{22}y_{2,t-1} + \varepsilon_{2,t} \end{eqnarray}`

`\begin{eqnarray} \text{such that } \; B^{-1} =\left[ \begin{array}{cc} 1 & - b_{12} \\ 0 & 1 \end{array} \right] \end{eqnarray}`

- Premultiplying by `\(B^{-1}\)` yields

`\begin{eqnarray} \left[ \begin{array}{c} y_{1,t} \\ y_{2,t} \end{array} \right] = \left[ \begin{array}{c} b_{10}-b_{12}b_{20} \\ b_{20} \end{array} \right] + \left[ \begin{array}{cc} \gamma_{11} - b_{12} \gamma_{21} & \gamma_{12} - b_{12} \gamma_{22}\\ \gamma_{21} & \gamma_{22} \end{array} \right] \cdot \end{eqnarray}`

`\begin{eqnarray} \left[ \begin{array}{c} y_{1,t-1} \\ y_{2,t-1} \end{array} \right] + \left[ \begin{array}{c} \varepsilon_{1,t} -b_{12} \varepsilon_{2,t} \\ \varepsilon_{2,t} \end{array} \right] \end{eqnarray}`

---

# Recursive estimation

- Take note of the previous expression:

`\begin{eqnarray} \left[ \begin{array}{c} y_{1,t} \\ y_{2,t} \end{array} \right] = \dots + \left[ \begin{array}{c} \varepsilon_{1,t} -b_{12} \varepsilon_{2,t} \\ \varepsilon_{2,t} \end{array} \right] \end{eqnarray}`

- Hence, by setting `\(b_{21} = 0\)`, shocks from `\(\varepsilon_{1,t}\)` do not affect contemporaneous values of `\(y_{2,t}\)`
- However, both `\(\varepsilon_{1,t}\)` and `\(\varepsilon_{2,t}\)` affect `\(y_{1,t}\)`
- Note also that `\(\varepsilon_{1,t-1}\)` could still influence `\(y_{2,t}\)` through its effect on `\(y_{1,t-1}\)`
- Furthermore, by returning to the relationship `\(\boldsymbol{u}_t = B^{-1}\varepsilon_t\)`,

`\begin{eqnarray} \left[ \begin{array}{c} u_{1,t} \\ u_{2,t} \end{array} \right] =\left[ \begin{array}{cc} 1 & b_{12} \\ 0 & 1 \end{array} \right]^{-1} \left[ \begin{array}{c} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{array} \right] \end{eqnarray}`

- We have `\(\varepsilon_{2,t}=u_{2,t}\)`, and using `\(b_{12} = - \mathsf{cov} [ u_{1,t}, u_{2,t}] / \sigma_2^2\)`, we can recover `\(\varepsilon_{1,t} = b_{12}\varepsilon_{2,t} + u_{1,t}\)` (sketched on the next slide)
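---

# Sketch: recovering the structural shocks

- A minimal `numpy` sketch of the recursive recovery under `\(b_{21} = 0\)`; `recover_shocks` is an illustrative name, and the residual matrix `U` is assumed to come from the earlier OLS sketch

```python
import numpy as np

def recover_shocks(U):
    """Recover structural shocks under the recursive restriction b21 = 0.

    U is a (T x 2) array of reduced-form residuals: columns are u1 and u2."""
    C = np.cov(U, rowvar=False)       # sample covariance of the residuals
    b12 = -C[0, 1] / C[1, 1]          # b12 = -cov(u1, u2) / var(u2)
    eps2 = U[:, 1]                    # eps_{2,t} = u_{2,t}
    eps1 = U[:, 0] + b12 * eps2       # eps_{1,t} = u_{1,t} + b12 * eps_{2,t}
    return b12, eps1, eps2
```

- By construction, the recovered `eps1` and `eps2` are uncorrelated in sample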
---

# Mapping the reduced to structural form

- From the reduced form (where all the coefficient matrices are premultiplied by `\(B^{-1}\)`):

`\begin{eqnarray} y_{1,t} = a_{10} + a_{11}y_{1,t-1} + a_{12}y_{2,t-1} + u_{1,t} \\ y_{2,t} = a_{20} + a_{21}y_{1,t-1} + a_{22}y_{2,t-1} + u_{2,t} \end{eqnarray}`

`\begin{eqnarray} \begin{array}{lcl} a_{10} = b_{10} - b_{12}b_{20} & \; & a_{11} = \gamma_{11} - b_{12}\gamma_{21} \\ a_{12} = \gamma_{12} - b_{12}\gamma_{22} & \; & a_{20} = b_{20} \\ a_{21} = \gamma_{21} & \; & a_{22} = \gamma_{22} \end{array} \end{eqnarray}`

`\begin{eqnarray} \begin{array}{l} \mathsf{var}[u_1] = \sigma_1^2 + b_{12}^2 \sigma_2^2 \\ \mathsf{var}[u_2] = \sigma_2^2\\ \mathsf{cov}[u_1, u_2] = -b_{12}\sigma_2^2 \end{array} \end{eqnarray}`

---

# Cholesky decomposition

- In the above example, we were able to recover the `\(\varepsilon_{1,t}\)` and `\(\varepsilon_{2,t}\)` sequences using the relationships `\(u_{1,t} = \varepsilon_{1,t}-b_{12}\varepsilon_{2,t}\)` and `\(u_{2,t} = \varepsilon_{2,t}\)`
- When `\(b_{21}=0\)`, `\(y_{1,t}\)` does not have a contemporaneous effect on `\(y_{2,t}\)` and `\(\varepsilon_{1,t}\)` does not affect `\(y_{2,t}\)`
- Observed values of `\(u_{2,t}\)` are attributed to pure shocks in `\(y_{2,t}\)`
- This procedure of setting the lower triangle of the `\(B\)` coefficient matrix equal to zero is termed applying the Cholesky decomposition
- It turns out that the number of restrictions that we need to impose is equivalent to the number of terms in the lower (or upper) triangle of the `\(B\)` matrix, which is `\([(K^2-K)/2]\)`
- The alternative ordering of the Cholesky decomposition is to let `\(b_{12}=0\)` (i.e. the upper triangle)

---

# IRF: MA representation

- In many cases it is useful to express an `\(AR(p)\)` process as an `\(MA(\infty)\)` process
- For example, the stationary univariate `\(AR(1)\)` model:

`\begin{eqnarray} y_t = \phi y_{t-1} + \varepsilon_t \end{eqnarray}`

- has the `\(MA(\infty)\)` representation,

`\begin{eqnarray} y_t = \sum_{i=0}^{\infty} \theta_i \varepsilon_{t-i} \end{eqnarray}`

- This representation is particularly useful for calculating impact multipliers and impulse response functions

---

# VMA representation

- Just as every stable `\(AR(p)\)` process has an `\(MA(\infty)\)` representation, every stable `\(VAR(p)\)` has a `\(VMA(\infty)\)` representation
- From:

`\begin{eqnarray} \left[ \begin{array}{c} y_{1,t} \\ y_{2,t} \end{array} \right] = \left[ \begin{array}{c} a_{10} \\ a_{20} \end{array} \right] + \left[ \begin{array}{cc} a_{11}& a_{12}\\ a_{21} & a_{22} \end{array} \right] \cdot \left[ \begin{array}{c} y_{1,t-1} \\ y_{2,t-1} \end{array} \right] + \left[ \begin{array}{c} u_{1,t} \\ u_{2,t} \end{array} \right] \end{eqnarray}`

- Where `\(\mu_1\)` and `\(\mu_2\)` are the mean values of `\(y_{1,t}\)` and `\(y_{2,t}\)` (the resulting VMA coefficients are sketched on the next slide);

`\begin{eqnarray} \left[ \begin{array}{c} y_{1,t} \\ y_{2,t} \end{array} \right] = \left[ \begin{array}{c} \mu_1 \\ \mu_2 \end{array} \right] + \sum_{i=0}^\infty \left[ \begin{array}{cc} a_{11}& a_{12}\\ a_{21} & a_{22} \end{array} \right]^i \cdot \left[ \begin{array}{c} u_{1,t-i} \\ u_{2,t-i} \end{array} \right] \end{eqnarray}`
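---

# Sketch: reduced-form VMA coefficients

- For a VAR(1), the VMA matrices above are simply the powers `\(A_1^i\)`; a minimal sketch, truncating the infinite sum at an assumed number of lags

```python
import numpy as np

def vma_coefficients(A1, n_lags=12):
    """Reduced-form VMA matrices Psi_i = A1^i, so that
    y_t = mu + sum_i Psi_i u_{t-i} for a stable VAR(1)."""
    Psi = [np.eye(A1.shape[0])]          # Psi_0 = I
    for _ in range(n_lags):
        Psi.append(A1 @ Psi[-1])         # Psi_i = A1 @ Psi_{i-1}
    return Psi
```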
---

# VMA representation

- Now, since `\(\boldsymbol{u}_t = B^{-1}\varepsilon_t\)`, where

`\begin{eqnarray} B^{-1} = \frac{1}{\det B} \left[ \begin{array}{cc} 1 & - b_{12}\\ - b_{21} & 1 \end{array} \right] = \frac{1}{1-b_{12}b_{21}} \left[ \begin{array}{cc} 1& - b_{12}\\ - b_{21} & 1 \end{array} \right] \end{eqnarray}`

- We have:

`\begin{eqnarray} \left[ \begin{array}{c} u_{1,t} \\ u_{2,t} \end{array} \right] = \frac{1}{1-b_{12}b_{21}} \left[ \begin{array}{cc} 1& - b_{12}\\ - b_{21} & 1 \end{array} \right] \left[ \begin{array}{c} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{array} \right] \end{eqnarray}`

- such that the SVAR model can be written as,

`\begin{eqnarray} \left[ \begin{array}{c} y_{1,t} \\ y_{2,t} \end{array} \right] = \left[ \begin{array}{c} \mu_1 \\ \mu_2 \end{array} \right] + \frac{1}{1-b_{12}b_{21}} \sum_{i=0}^\infty \left[ \begin{array}{cc} a_{11}& a_{12}\\ a_{21} & a_{22} \end{array} \right]^i \cdot \left[ \begin{array}{cc} 1& - b_{12}\\ - b_{21} & 1 \end{array} \right] \left[ \begin{array}{c} \varepsilon_{1,t-i} \\ \varepsilon_{2,t-i} \end{array} \right] \end{eqnarray}`

- This expression may be used to describe the effect of a shock in `\(\varepsilon_t\)` on the endogenous variables

---

# VMA representation

- The impact multipliers, which describe the effects of the shocks on the endogenous variables, are summarised in the matrix `\(\Theta_i\)`

`\begin{eqnarray} \Theta_i = \left[ \begin{array}{cc} \theta_{11}& \theta_{12}\\ \theta_{21}& \theta_{22} \end{array} \right]_i = \frac{A_1^i}{1-b_{12}b_{21}} \left[ \begin{array}{cc} 1& - b_{12}\\ - b_{21} & 1 \end{array} \right] \end{eqnarray}`

- where `\(\mu = [ \mu_1\; \mu_2 ]^{\prime}\)` and `\(\boldsymbol{y}_t = [ {y_{1,t}}\; {y_{2,t}} ]^{\prime}\)`, we are left with,

`\begin{eqnarray} \boldsymbol{y}_t = \mu + \sum_{i=0}^\infty \Theta_i \varepsilon_{t-i} \end{eqnarray}`

- This is a particularly useful expression, as the `\(\Theta_i\)` matrices describe the effects of the shocks, `\(\varepsilon_{1,t}\)` and `\(\varepsilon_{2,t}\)`, on the entire paths of `\(y_{1,t}\)` and `\(y_{2,t}\)`

---

# VMA representation

- For example, where the number in brackets refers to the lag `\(i\)` in `\(\theta_{jk}(i)\)`:
  - `\(\theta_{12}(0)\)` is the instantaneous impact of a one-unit change in `\(\varepsilon_{2,t}\)` on `\(y_{1,t}\)`
  - `\(\theta_{11}(1)\)` is the impact of a one-unit change in `\(\varepsilon_{1,t-1}\)` on `\(y_{1,t}\)` (i.e. the response after one period)
  - `\(\theta_{12}(1)\)` is the impact of a one-unit change in `\(\varepsilon_{2,t-1}\)` on `\(y_{1,t}\)`

---

# Impulse response functions

- The impact multipliers `\(\theta_{11}(i), \theta_{12}(i), \theta_{21}(i)\)` and `\(\theta_{22}(i)\)` are used to generate the impulse response functions for different values of `\(i\)`
- These visually represent the behaviour of `\(y_{1,t}\)` and `\(y_{2,t}\)` in response to the various shocks, `\(\varepsilon_{1,t}\)` and `\(\varepsilon_{2,t}\)`
- To avoid the problem of an under-identified system, we use the Cholesky decomposition;

`\begin{eqnarray} u_{1,t} = \varepsilon_{1,t} - b_{12} \varepsilon_{2,t}\\ u_{2,t} = \varepsilon_{2,t} \end{eqnarray}`

- Note that all the errors from `\(u_{2,t}\)` are attributed to `\(\varepsilon_{2,t}\)`
- We can then find `\(\varepsilon_{1,t}\)` using `\(b_{12}\)`, `\(u_{1,t}\)` and `\(\varepsilon_{2,t}\)`
- Although the Cholesky decomposition constrains the system such that `\(\varepsilon_{1,t}\)` has no direct effect on `\(y_{2,t}\)`, you should note that lagged values of `\(y_{1,t}\)` affect the contemporaneous value of `\(y_{2,t}\)` (an IRF sketch follows)
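---

# Sketch: Cholesky-orthogonalised IRFs

- A minimal sketch of orthogonalised impulse responses for a VAR(1); the lower-triangular Cholesky factor used here corresponds to the `\(b_{12}=0\)` ordering above (the first variable responds only to its own shock on impact), and it normalises the shocks to unit variance rather than to a unit diagonal on `\(B\)`, which only rescales the responses

```python
import numpy as np

def cholesky_irf(A1, Sigma_u, horizon=12):
    """Orthogonalised IRFs Theta_i = A1^i @ P, with P the lower-triangular
    Cholesky factor of Sigma_u."""
    P = np.linalg.cholesky(Sigma_u)            # impact matrix: u_t = P eps_t
    K = A1.shape[0]
    Theta = np.empty((horizon + 1, K, K))
    Theta[0] = P
    for i in range(1, horizon + 1):
        Theta[i] = A1 @ Theta[i - 1]
    return Theta    # Theta[i, j, k]: response of y_j to shock eps_k after i periods
```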
---

# Ordering of the Cholesky decomposition

- The ordering of the Cholesky decomposition (i.e. whether to set `\(b_{12}\)` or `\(b_{21}\)` to `\(0\)`) depends on the magnitude of the correlation between `\(u_{1,t}\)` and `\(u_{2,t}\)`
- Where `\(\rho_{12} = \sigma_{12}/\big(\sqrt{\sigma_{11}} \sqrt{\sigma_{22}}\big)\)`:
  - If the correlation is zero, then the ordering is immaterial
  - If the correlation is unity, then it is inappropriate to attribute the shock to a single source
  - If the correlation is between `\(0\)` and `\(1\)`, then you usually need to consider both orderings; if the results differ, then you need to investigate further
- Try where possible to relate the ordering to theoretical considerations (e.g. a shock to the US exchange rate may affect the SA exchange rate immediately, but not the other way around)

---

# Impulse response functions

- Note that with zero off-diagonal elements in the variance-covariance matrix, we could consider the effects of independent shocks
- Alternatively, we could order the variables from most exogenous to most endogenous when using a Cholesky decomposition

---

background-image: url(image/irf_gdp_une.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure: IRF - unemployment shock on output

---

background-image: url(image/irf_une_une.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure: IRF - unemployment shock on unemployment

---

# Variance Decompositions

- Suppose that you knew the coefficients of `\(A_0\)` and `\(A_1\)` and wanted to forecast values of `\(\boldsymbol{y}_{t+h}\)` conditional on `\(\boldsymbol{y}_t\)`
- The conditional expectation of `\(\boldsymbol{y}_{t+1}\)` is

`\begin{eqnarray} \mathbb{E}_t[\boldsymbol{y}_{t+1}] = A_0 + A_1 \boldsymbol{y}_t \end{eqnarray}`

- and the conditional expectation of `\(\boldsymbol{y}_{t+2}\)` is

`\begin{eqnarray} \mathbb{E}_t[\boldsymbol{y}_{t+2}] = [I + A_1]A_0 + A_1^2 \boldsymbol{y}_t \end{eqnarray}`

- such that the conditional expectation of `\(\boldsymbol{y}_{t+H}\)` is

`\begin{eqnarray} \mathbb{E}_t[\boldsymbol{y}_{t+H}] = [I + A_1 + A_1^2 + \ldots + A_1^{H-1}]A_0 + A_1^H \boldsymbol{y}_t \end{eqnarray}`

---

# Variance Decompositions: Forecast errors

- The one-step-ahead forecast error is `\(\big(\boldsymbol{y}_{t+1} - \mathbb{E}_t[\boldsymbol{y}_{t+1}]\big)\)`
- This equals `\(\boldsymbol{u}_{t+1}\)`, since `\(\mathbb{E}_t[\boldsymbol{y}_{t+1}] = A_0 + A_1 \boldsymbol{y}_t\)` and `\(\boldsymbol{y}_{t+1} = A_0 + A_1 \boldsymbol{y}_t + \boldsymbol{u}_{t+1}\)`
- The two-step-ahead forecast error is `\(\big(\boldsymbol{u}_{t+2} + A_1 \boldsymbol{u}_{t+1}\big)\)`
- The `\(H\)`-step-ahead forecast error is `\(\big(\boldsymbol{u}_{t+H} + A_1 \boldsymbol{u}_{t+H-1} + A_1^2 \boldsymbol{u}_{t+H-2} + \ldots + A_1^{H-1} \boldsymbol{u}_{t+1}\big)\)`
- Of course, it is possible to write the forecast errors in terms of the *structural-form* errors, `\(\varepsilon_{1,t}\)` and `\(\varepsilon_{2,t}\)`
- The forecast error variance decomposition tells us the proportion of the expected variance in a variable that is due to each of the shocks in the model (a sketch follows)
- If `\(\varepsilon_{2,t}\)` explains none of the forecast error variance of `\(y_{1,t}\)`, then `\(y_{1,t}\)` is exogenous, as it evolves independently of `\(\varepsilon_{2,t}\)` and `\(y_{2,t}\)`
- If `\(\varepsilon_{2,t}\)` explains all of the forecast error variance of `\(y_{1,t}\)`, then `\(y_{1,t}\)` is entirely endogenous
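---

# Sketch: forecast error variance decomposition

- A minimal sketch that computes FEVD shares from the orthogonalised IRFs of the earlier sketch; the `\(H\)`-step forecast error variance of variable `\(j\)` is `\(\sum_{i=0}^{H-1}\sum_k \theta_{jk}(i)^2\)`, and each shock's share is its own squared terms over that total

```python
import numpy as np

def fevd(Theta):
    """FEVD shares from orthogonalised IRFs.

    Theta: (H, K, K) array of impulse responses, e.g. from cholesky_irf.
    Returns an array of the same shape: entry [h, j, k] is the share of
    variable j's (h+1)-step forecast error variance due to shock k."""
    mse_parts = np.cumsum(Theta ** 2, axis=0)        # sum over lags, per (j, k)
    totals = mse_parts.sum(axis=2, keepdims=True)    # total variance per variable
    return mse_parts / totals                        # rows sum to one over shocks
```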
---

# Variance Decomposition

- The variance decomposition also has identification problems (as per the above)
- The Cholesky decomposition necessitates that all of the one-period forecast error variance of `\(y_{2,t}\)` is due to `\(\varepsilon_{2,t}\)`
- Similarly for the alternative ordering
- It is often useful to examine the variance decompositions at different horizons; as `\(H\)` increases, the decompositions should converge
- The analysis of impulse responses and variance decompositions may be termed innovation accounting

---

background-image: url(image/fevd.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure: Variance Decomposition

---

# Structural Decomposition

- In a three-variable model, where `\(C = B^{-1}\)`, the Cholesky decomposition would suggest,

`\begin{eqnarray} u_{1,t} = \varepsilon_{1,t}\\ u_{2,t} = c_{21}\varepsilon_{1,t} + \varepsilon_{2,t}\\ u_{3,t} = c_{31}\varepsilon_{1,t} + c_{32}\varepsilon_{2,t} + \varepsilon_{3,t} \end{eqnarray}`

- Sims (1986) and Bernanke (1986) provide examples of theoretical restrictions that may differ from the upper or lower triangle
- This involves estimating the relationships among the structural shocks using an economic model
- For example, they would consider a decomposition such as,

`\begin{eqnarray} u_{1,t} = \varepsilon_{1,t} + c_{13}\varepsilon_{3,t} \\ u_{2,t} = c_{21}\varepsilon_{1,t} + \varepsilon_{2,t} \\ u_{3,t} = c_{32}\varepsilon_{2,t} + \varepsilon_{3,t} \end{eqnarray}`

---

# Structural Decomposition

- Note that with this structural decomposition:
  - We have lost the triangular structure
  - Each variable is affected by its own structural innovation and the structural innovation in one other variable
- The requirement of `\((K^2-K)/2\)` restrictions is satisfied, so the conditions for exact identification are maintained

---

# Example of identifying restrictions

- Suppose that we have a two-variable model with a sample size of 5
- This gives us 5 residuals for each of `\(u_{1,t}\)` and `\(u_{2,t}\)`

`\(\;\)` | **1** | **2** | **3** | **4** | **5**
----------|---------|---------|---------|---------|---------
`\(u_{1,t}\)` | 1.0 | -0.5 | 0.0 | -1.0 | 0.5
`\(u_{2,t}\)` | 0.5 | -1.0 | 0.0 | -0.5 | 1.0

- Note that both `\(u_{1,t}\)` and `\(u_{2,t}\)` sum to zero
- `\(\sigma_{11}=0.5, \sigma_{12} = \sigma_{21} =0.4, \text{ and } \sigma_{22} =0.5\)`, which gives the variance/covariance matrix

`\begin{eqnarray} \Sigma_\boldsymbol{u} = \left[ \begin{array}{cc} 0.5 & 0.4 \\ 0.4 & 0.5 \end{array} \right] \end{eqnarray}`

---

# Example of identifying restrictions

- Since we premultiplied `\(\varepsilon_t\)` by `\(B^{-1}\)` to get `\(\boldsymbol{u}_t\)`
- We can derive values for `\(\Sigma_{\varepsilon}\)` from `\(\Sigma_\boldsymbol{u}\)` as

`\begin{eqnarray} \Sigma_{\varepsilon} = B \Sigma_\boldsymbol{u} B^{\prime} \end{eqnarray}`

- Hence,

`\begin{eqnarray} \left[ \begin{array}{cc} \mathsf{var}(\varepsilon_1) & 0 \\ 0 & \mathsf{var}(\varepsilon_2) \end{array} \right] = \left[ \begin{array}{cc} 1 & b_{12} \\ b_{21} & 1 \end{array} \right] \left[ \begin{array}{cc} 0.5 & 0.4 \\ 0.4 & 0.5 \end{array} \right] \left[ \begin{array}{cc} 1 & b_{21} \\ b_{12} & 1 \end{array} \right] \end{eqnarray}`

---

# Example of identifying restrictions

- This leaves us with,

`\begin{eqnarray} \mathsf{var}(\varepsilon_1) = 0.5 + 0.8b_{12} + 0.5b_{12}^2\\ 0 = 0.5b_{21} + 0.4b_{21}b_{12} + 0.4 + 0.5b_{12}\\ 0 = 0.5b_{21} + 0.4b_{21}b_{12} + 0.4 + 0.5b_{12}\\ \mathsf{var}(\varepsilon_2) = 0.5b^2_{21} + 0.8b_{21} + 0.5 \end{eqnarray}`

- Since the two middle lines are identical, we have 3 independent equations to solve for 4 unknowns (verified numerically on the next slide)
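---

# Sketch: verifying the example numerically

- A minimal sketch that imposes `\(b_{12} = 0\)` on the example above and confirms the values derived on the following slide

```python
import numpy as np

# Residual covariance matrix from the example
Sigma_u = np.array([[0.5, 0.4],
                    [0.4, 0.5]])

# With b12 = 0, the off-diagonal equation 0 = 0.5*b21 + 0.4 gives b21 = -0.8
b21 = -0.4 / 0.5
B = np.array([[1.0, 0.0],
              [b21, 1.0]])

Sigma_eps = B @ Sigma_u @ B.T
print(Sigma_eps)   # diagonal: var(eps1) = 0.5, var(eps2) = 0.18; off-diagonal zero
```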
---

# Identification: Cholesky decomposition

- When `\(b_{12} = 0\)` we have,

`\begin{eqnarray} \mathsf{var}(\varepsilon_1) = 0.5 && \\ 0 = 0.5b_{21} + 0.4 & \; \text{s.t. } & b_{21} = -0.8\\ 0 = 0.5b_{21} + 0.4 & \; \text{s.t. } & b_{21} = -0.8\\ \mathsf{var}(\varepsilon_2) = 0.5b^2_{21} + 0.8b_{21} + 0.5 = 0.18 && \end{eqnarray}`

- Since `\(\varepsilon_{1,t} = u_{1,t}\)` and `\(\varepsilon_{2,t} = -0.8 u_{1,t} + u_{2,t}\)`

`\(\;\)` | **1** | **2** | **3** | **4** | **5**
-------------------|---------|---------|---------|---------|---------
`\(\varepsilon_{1,t}\)` | 1.0 | -0.5 | 0.0 | -1.0 | 0.5
`\(\varepsilon_{2,t}\)` | -0.3 | -0.6 | 0.0 | 0.3 | 0.6

---

# Alternative identification restrictions

- If one shock, `\(\varepsilon_{2,t}\)`, has a one-for-one effect on `\(y_{1,t}\)`, s.t. `\(b_{12}=1\)`

`\begin{eqnarray} \mathsf{var}(\varepsilon_1) & = 0.5 + 0.8b_{12} + 0.5b_{12}^2 = & 1.8\\ \vdots & \vdots & \vdots \end{eqnarray}`

- From which we could derive `\(\varepsilon_t\)`

---

# Alternative identification restrictions

- Although there is little theory that informs us about the variance of the shocks
- If it is given that `\(\mathsf{var}(\varepsilon_1) = 1.8\)`, we could work out values for `\(b_{12}\)`

`\begin{eqnarray} \mathsf{var}(\varepsilon_1) &= 1.8 =& 0.5 + 0.8b_{12} + 0.5b_{12}^2\\ \vdots & \vdots & \vdots \end{eqnarray}`

- From which we could derive `\(\varepsilon_t\)`

---

# Alternative identification restrictions

- If we assume that `\(b_{12} = b_{21}\)`
- Then replacing `\(b_{21}\)` with `\(b_{12}\)` in the following

`\begin{eqnarray} 0 &= 0.5b_{21} + 0.4b_{21}b_{12} + 0.4 + 0.5b_{12}\\ \vdots & \vdots \end{eqnarray}`

- Allows us to derive values for `\(b_{12}\)`, and we can then solve for the rest

---

# Long-run restrictions

- It has been suggested that economic theory does not always provide enough meaningful contemporaneous restrictions
- As an alternative, we could impose restrictions on the long-run properties of the shocks, allowing for the neutrality of the effects of certain shocks over time
- Blanchard & Quah (1989) consider the use of such a restriction in a model for output and unemployment, which identifies demand and supply shocks
- This bivariate VAR would need a single restriction
- They suggested that output growth and unemployment are driven by two orthogonal structural shocks
  - demand-side shocks have a temporary effect on real GNP
  - supply-side (productivity) shocks have a permanent effect on real GNP
- The rate of unemployment is considered stationary, so no shock can change unemployment permanently

---

# Decomposition using Blanchard-Quah

- If the logarithm of output, `\(y_{1,t}\)`, is `\(I(1)\)`, then output growth, `\(\Delta y_{1,t}\)`, is `\(I(0)\)`
- Assume that the rate of unemployment, `\(y_{2,t}\)`, is affected by the same variables and is `\(I(0)\)`
- The bivariate moving average representation, where `\(\boldsymbol{y}_t\)` is a vector of both variables, is

`\begin{eqnarray} \boldsymbol{y}_{t}=\sum_{i=0}^{\infty}\Theta_{i}\varepsilon_{t-i} \end{eqnarray}`

---

# Decomposition using Blanchard-Quah

- Which may be expanded as

`\begin{eqnarray} \left[ \begin{array}{c} \Delta y_{1,t} \\ y_{2,t} \end{array} \right] = \left[ \begin{array}{cc} \theta_{11}(0) & \theta_{12}(0) \\ \theta_{21}(0) & \theta_{22}(0) \end{array} \right] \left[ \begin{array}{c} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{array} \right] + \\ \left[ \begin{array}{cc} \theta_{11}(1) & \theta_{12}(1) \\ \theta_{21}(1) & \theta_{22}(1) \end{array} \right] \left[ \begin{array}{c} \varepsilon_{1,t-1} \\ \varepsilon_{2,t-1} \end{array} \right] + \ldots \end{eqnarray}`

- where the effect of `\(\varepsilon_{1,t-1}\)` on `\(\Delta y_{1,t}\)` is summarised by `\(\theta_{11}(1)\)` (a computational sketch follows)
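---

# Sketch: long-run (Blanchard-Quah) identification

- A minimal sketch for a VAR(1) in `\((\Delta y_{1,t}, y_{2,t})\)`: the cumulative long-run responses are `\((I-A_1)^{-1}C\)`, so choosing `\(C\)` such that this matrix is lower triangular imposes a single zero long-run restriction (here on the shock ordered second; reorder the shocks to match the `\(\sum\theta_{11}(i)=0\)` convention used above)

```python
import numpy as np

def blanchard_quah(A1, Sigma_u):
    """Impact matrix C (u_t = C eps_t, eps ~ (0, I)) such that the cumulative
    long-run response (I - A1)^{-1} C is lower triangular."""
    K = A1.shape[0]
    lr = np.linalg.inv(np.eye(K) - A1)     # long-run multiplier of the VAR(1)
    S = lr @ Sigma_u @ lr.T                # covariance of cumulative responses
    F = np.linalg.cholesky(S)              # lower-triangular long-run impacts
    C = (np.eye(K) - A1) @ F               # implied contemporaneous impacts
    return C                               # note: C @ C.T reproduces Sigma_u
```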
---

# Long-run restrictions

- Now, if `\(\varepsilon_{1,t}\)` has no long-run cumulative impact on `\(\Delta y_{1,t}\)`, we could impose the restriction

`\begin{eqnarray} \sum_{i=0}^{\infty}\theta_{11}(i)=0 \end{eqnarray}`

- which may be included in the coefficient matrix for the moving average representation,

`\begin{eqnarray} \sum_{i=0}^{\infty}\Theta_{i}=\left[ \begin{array}{cc} 0 & \sum_{i=0}^{\infty}\theta_{12}(i) \\ \sum_{i=0}^{\infty}\theta_{21}(i) & \sum_{i=0}^{\infty} \theta_{22}(i) \end{array} \right] = \sum_{i=0}^{\infty} \left[ \begin{array}{cc} 0 & \theta_{12}(i) \\ \theta_{21}(i) & \theta_{22}(i) \end{array} \right] \end{eqnarray}`

---

# Restrictions

- Hence, we can impose restrictions on either the short-run contemporaneous parameters or the long-run moving average components
- Alternatively, we could use a combination of the two
- The only condition is that the number of restrictions must equal `\([(K^2-K)/2]\)`

---

# Limitations of the VAR approach

- A major limitation of the traditional VAR approach is that it is highly parameterised
- In addition, all of the effects of omitted variables will be contained in the residuals
- This may lead to major distortions in the impulse responses, making them of little use for structural interpretations
- Measurement errors or mis-specifications of the model also make interpretation of the impulse responses difficult
- We can't make use of an infinite number of MA coefficients, since the dataset is finite (this may lead to bias in the parameter estimates)

---

# Summary

- Sims (1980) introduced VAR models as an alternative to the large-scale macroeconometric models that were used at the time
- The SVAR methodology has since gained widespread use in applied time series research
- It allows for the incorporation of contemporaneous variables and an investigation into the impact of individual shocks
- To identify the structural VAR model, we need to impose restrictions
- Widely used identification methods rely on short-run or long-run restrictions
  - the short-run restrictions were originally suggested by Sims (1986)
  - Blanchard & Quah (1989) introduced long-run restrictions

---

# Summary

- A system of `\(K\)` variables requires that we impose `\((K^2-K)/2\)` identifying restrictions for exact identification
- The use of the Cholesky decomposition ensures that the identified shocks from the VAR model are orthogonal (uncorrelated) and unique
- However, the choice of this method for imposing restrictions could affect the results of the model
- An impulse response function describes how a given (structural) shock affects a variable over time
- The forecast error variance decomposition attributes the forecast error variance to specific structural shocks at different horizons