class: center, middle, inverse, title-slide

# Volatility models

### Kevin Kotzé

---

<!-- layout: true -->
<!-- background-image: url(image/logo.svg) -->
<!-- background-position: 2% 98% -->
<!-- background-size: 10% -->

---

# Contents

1. Introduction
1. Structure of a model
1. ARCH model
1. GARCH model
1. Extensions to ARCH/GARCH
1. Stochastic Volatility model
1. Multivariate GARCH

---

# Introduction

- Many valuation methods for derivatives depend on volatility
- Measures such as value-at-risk (VaR) and expected shortfall (ES) depend on volatility
- Portfolio allocation in the Markowitz mean-variance framework depends on volatility
- Volatility or risk affects the spread between long and short-term interest rates
- A more accurate measure of volatility would allow us to identify a mispriced asset
- Facilitate more efficient allocation of capital
- Volatility models could be used to analyse time-varying risk premiums

---

# Introduction

- While volatility in asset returns is not directly observable, it has the following characteristics:
  - It is usually high for certain periods of time and low for other periods
  - It evolves over time in a continuous manner (volatility jumps are rare)
  - It does not diverge to infinity, as it varies within a fixed range
  - It seems to react differently to large price increases and decreases (asymmetric effects)
- If we do not account for these features of the data, the model is incorrectly specified and the parameter estimates may be inaccurate

---

# Introduction

- Three types of volatility measures for securities include:
  - Volatility as the conditional standard deviation of daily returns
  - Implied volatility that makes use of prices from options markets
  - Realised volatility that relies on high-frequency financial data to calculate intra-day returns and daily volatility measures
- In what follows we focus our attention on the first case

---

# Stylised facts

- Certain periods exhibit higher volatility than others
- The variance is
not constant; it is heteroscedastic
- Large changes are associated with other large changes
- These volatility clusters suggest that the variance in `\(t\)` is dependent on the variance in `\(t-1\)`
- Conditional variance is dependent on time
- Assumption of i.i.d. returns is violated
- But volatility in 2002 would not appear to depend on the volatility in 1999 (no long-run dependence)
- Unconditional volatility is independent of time
- These features are also displayed when taking the square or absolute value of the variable

---

background-image: url(image/SA_data.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure: FTSE/JSE All Share (daily returns)

---

# Stylised facts

- During certain periods the distribution of returns has fat tails:
  - This decreases with aggregation
  - May be partially attributed to volatility clustering, although many assets have non-Gaussian returns
- Changes in prices are negatively related to changes in volatility:
  - Leverage effects occur in equity markets
  - Volatility rises in response to lower-than-expected returns

---

# Structure of a model

- Consider the conditional mean and variance of `\(y_t\)` given the information set, `\(I_{t-1}\)`
- The first two moments are given by

`\begin{eqnarray} \mu_t &=& \mathbb{E} \big[ y_t|I_{t-1} \big] \\ \sigma^2_t&=& \mathsf{var} \big[ y_t|I_{t-1} \big] = \mathbb{E} \big[(y_t - \mu_t )^2|I_{t-1} \big] \end{eqnarray}`

- Usually assume that `\(y_t\)` follows a stationary `\(ARMA(p, q)\)` model, where `\(y_t = \mu_t+ a_t\)` and

`\begin{eqnarray} \mu_t= \phi_0 + \sum^p_{i=1} \phi_i y_{t-i} - \sum^q_{j=1} \theta_j a_{t-j} \end{eqnarray}`

- One could also include explanatory variables in the above expression

---

# Structure of a model

- Note that now we have

`\begin{eqnarray} \sigma^2_t = \mathsf{var}(y_t|I_{t-1}) = \mathsf{var}(a_t|I_{t-1}) \end{eqnarray}`

- `\(a_t\)` is referred to as the shock or innovation to variable `\(y_t\)` at time `\(t\)`
- `\(\mu_t\)` is the mean
equation for `\(y_t\)`
- `\(\sigma^2_t\)` is the volatility equation for `\(y_t\)`
- Modelling conditional heteroscedasticity amounts to describing the evolution of the conditional variance over time

---

# Model Building

- Construction of a volatility model consists of the following four steps:

1. Specify a mean equation after testing for serial dependence in the data. If necessary, make use of an econometric model (e.g. an ARMA model) for the return series to remove any linear dependence
1. Use the residuals from the mean equation to test for ARCH effects
1. Specify a volatility equation; if ARCH effects are statistically significant, perform a joint estimation of the mean and volatility equations
1. Check the fitted model carefully and refine it if necessary

---

# Testing for ARCH effects

- Assume that `\(a_t = y_t - \mu_t\)` are the residuals from the mean equation
- The squared series `\(a^2_t\)` is then used to check for conditional heteroscedasticity
- McLeod & Li (1983) apply the usual Ljung-Box statistic `\(Q(m)\)` to `\(a^2_t\)`
- The null hypothesis of the test is that the first `\(m\)` lags of the ACF of the `\(a^2_t\)` series are zero
- Engle (1982) uses a Lagrange multiplier test that is equivalent to an `\(F\)`-test for `\(\alpha_i = 0\)`, `\((i = 1, \ldots , m)\)` in the linear regression

`\begin{eqnarray} a^2_t = \alpha_0+ \alpha_1 a^2_{t-1} + \ldots + \alpha_m a^2_{t-m} + \upsilon_t , \;\;\; t = m + 1, \ldots , T \end{eqnarray}`

---

# ARCH Model

- ARCH models were first introduced by Engle (1982) for modelling inflation
- Implications for financial risk-modelling soon became apparent
- Some researchers (notably Stock & Watson) still use a variant of these models to forecast inflation
- The model has been extended & modified in many ways
- Surveys of the literature include Engle & Bollerslev (1986), Bollerslev (1992), and Bera & Higgins (1993), amongst others

---

# ARCH Model

- The basic idea of ARCH models is that
  - A shock to an asset return is serially
uncorrelated but dependent
  - The dependence of `\(a_t\)` is described by a quadratic function of its lagged values
- Specifically, an `\(ARCH(m)\)` model assumes

`\begin{eqnarray} a_t &=& \sigma_t \varepsilon_t, \\ \sigma^2_t &=& \alpha_0 + \alpha_1 a^2_{t-1} + \ldots + \alpha_m a^2_{t-m} \end{eqnarray}`

- where `\(\varepsilon_t\)` is a sequence of `\(\mathsf{i.i.d.}\)` random variables with mean `\(0\)` and variance `\(1\)`
  - Can follow the standard normal, student- `\(t\)`, generalised error distribution (GED), skew-distribution, etc.
- `\(\alpha_0 > 0\)`, and `\(\alpha_i \geq 0\)` for `\(i >0\)`
- Hence, large shocks tend to be followed by other large shocks
  - This may describe the volatility clustering

---

# ARCH: Properties of ARCH Models

- Consider the `\(ARCH(1)\)` model

`\begin{eqnarray} a_t = \sigma_t \varepsilon_t, \;\;\; \sigma^2_t = \alpha_0 + \alpha_1a^2_{t-1} \end{eqnarray}`

- Note that the unconditional mean of `\(a_t\)` remains zero, since

`\begin{eqnarray} \mathbb{E} \big[ a_t \big] = \mathbb{E} \Big\{ \mathbb{E} \big[ a_t|I_{t-1} \big] \Big\} = \mathbb{E} \Big\{ \sigma_t \mathbb{E} \big[ \varepsilon_t \big] \Big\} = 0 \end{eqnarray}`

- Secondly, the unconditional variance of `\(a_t\)` can then be derived as

`\begin{eqnarray} \mathsf{var}\big[ a_t \big] &=& \mathbb{E} \big[ a^2_t \big] = \mathbb{E} \Big\{ \mathbb{E} \big[ a^2_t|I_{t-1} \big] \Big\} \\ &=& \mathbb{E} \big[ \alpha_0 + \alpha_1 a^2_{t-1} \big] = \alpha_0+ \alpha_1 \mathbb{E} \big[ a^2_{t-1} \big] \end{eqnarray}`

---

# ARCH: Properties of ARCH Models

- Since `\(a_t\)` is a stationary process with `\(\mathbb{E} \big[ a_t \big] = 0\)`, `\(\mathsf{var} \big[ a_t \big] = \mathsf{var} \big[a_{t-1} \big] = \mathbb{E} \big[ a^2_{t-1} \big]\)`
- Therefore, we have `\(\mathsf{var}\big[ a_t \big] = \alpha_0 + \alpha_1 \mathsf{var} \big[ a_t \big]\)` and `\(\mathsf{var} \big[ a_t \big] = \alpha_0/(1 - \alpha_1)\)`
- Now since the variance of `\(a_t\)` must be positive, we require `\(0 \leq \alpha_1 < 1\)`
- When
calculating the higher-order moments, the excess kurtosis of `\(a_t\)` is positive and the tails are heavier than those of a normal distribution
- This is consistent with what is observed for asset price returns
- Similar results may be derived for the `\(ARCH(m)\)` model

---

# ARCH: Advantages and Weaknesses

- Key advantages of using an ARCH model include:
  - The model can produce volatility clusters
  - The shocks `\(a_t\)` in the model have heavy tails
- Weaknesses of these models include:
  - Assumes positive and negative shocks have the same effect
  - Somewhat restrictive - parameters need to be within particular intervals
  - Does not provide insight into the source of variations
  - Over-predicts volatility as it responds slowly to large isolated shocks

---

# ARCH: Making use of an ARCH Model

- If an ARCH effect is significant, we can use the PACF of `\(a^2_t\)` to determine the ARCH order
- Although `\(a^2_t\)` is not an efficient estimate of `\(\sigma^2_t\)`, it is informative when specifying the order `\(m\)`
- Several likelihood functions are used in ARCH estimation, depending on the distribution of `\(\varepsilon_t\)`
- For complex models: starting values, optimisation algorithm, use of analytic or numerical derivatives, convergence criteria, etc.
matter
- For a properly specified ARCH model, the standardised residuals are given by `\(\tilde{a}_t = {a_t}/{\sigma_t}\)`
- Use Ljung-Box statistics on `\(\tilde{a}_t\)` to check for remaining correlation in the mean equation, while the test on `\(\tilde{a}^2_t\)` is used for the volatility equation
- Could use other diagnostic tests for the appropriateness of these two equations
- Could forecast volatility by exploiting the recursive specification

---

# GARCH modelling

- Although the ARCH model has a simple functional form, it often requires a large `\(m\)`
- To simplify the model, Bollerslev (1986) proposed the generalised ARCH (GARCH) model

`\begin{eqnarray} a_t &=& \sigma_t \varepsilon_t \\ \sigma^2_t &=& \alpha_0 + \sum^m_{i=1} \alpha_i a^2_{t-i} + \sum^s_{j=1} \beta_j \sigma^2_{t-j} \end{eqnarray}`

- where `\(\varepsilon_t\)` is a sequence of `\(\mathsf{i.i.d.}\)` random variables with mean `\(0\)` and variance `\(1\)`
  - Can follow the standard normal, student- `\(t\)`, generalised error distribution (GED), skew-distribution, etc.
- `\(\alpha_0 > 0\)`, `\(\alpha_i \geq 0\)`, `\(\beta_j \geq 0\)`, and `\(\sum^{m,s}_{i,j=1} \left( \alpha_i + \beta_j \right) < 1\)`
- Incorporates AR and MA components in the volatility equation, which is more parsimonious and requires fewer restrictions

---

# GARCH modelling: Strengths & Weaknesses

- Consider the simple `\(GARCH(1,1)\)` model

`\begin{eqnarray} \sigma^2_t = \alpha_0 + \alpha_1 a^2_{t-1} + \beta_1 \sigma^2_{t-1}, \;\; 0 \leq \alpha_1, \;\; \beta_1 \leq 1, \;\; (\alpha_1+ \beta_1) < 1 \end{eqnarray}`

- Note that a large `\(a^2_{t-1}\)` or `\(\sigma^2_{t-1}\)` gives rise to a large `\(\sigma^2_t\)`
- It can be shown that the tails of the distribution for a `\(GARCH(1,1)\)` process are heavier than those of a normal distribution
- Forecasts can utilise the recursive nature of the model
- Has similar weaknesses to those of the ARCH model

---

# GARCH modelling: Forecasting

- As the volatility of a time series variable is not directly observable, comparing the forecasting performance of different models can be problematic
- Some compare the volatility forecasts `\(\sigma^2_{t+h}\)` with the squared shocks `\(a^2_{t+h}\)`
- However, we usually find a low correlation between `\(a^2_{t+h}\)` and `\(\sigma^2_{t+h}\)`
- Although `\(a^2_{t+1}\)` is a consistent estimate of `\(\sigma^2_{t+1}\)`, it is not always an accurate estimate of `\(\sigma^2_{t+1}\)`
  - A single observation of a random variable with a known mean value cannot provide an accurate estimate of its variance

---

# GARCH modelling: Practical points

- Estimate `\(y_t\)` using the best-fitting ARMA model
- Obtain the square of the fitted residuals, `\(a_t^2\)`
- Make use of the ACF and PACF to determine the order for `\(s\)` and `\(m\)`
- Use `\(Q\)`-statistics to test for groups of significant coefficients
- Rejecting the null that `\(a^2_t\)` is serially uncorrelated is equivalent to rejecting the null of no ARCH & GARCH errors

---

# GARCH modelling: Practical points

- In many `\(GARCH(1,1)\)` applications, the estimated `\(\alpha_1\)`
is close to zero and the estimated `\(\beta_1\)` is close to unity
- In this case, `\(\beta_1\)` becomes unidentified if `\(\alpha_1 = 0\)`
- The distribution of ML estimates can be ill-behaved when parameters are nearly unidentified
- Ma, Nelson and Startz (2007) show that in a GARCH model where `\(\alpha_1\)` is close to zero:
  - The estimated standard error for `\(\beta_1\)` is usually spuriously small
  - `\(t\)`-statistics for testing hypotheses about the true value of `\(\beta_1\)` are severely size distorted
  - The concentrated log-likelihood as a function of `\(\beta_1\)` exhibits multiple maxima

---

# Practical points on estimation

- To guard against spurious inference, Ma, Nelson and Startz (2007) recommend:
  - Compare estimates from pure `\(ARCH(m)\)` models, which do not suffer from the identification problem, with estimates from the `\(GARCH(1,1)\)`
  - If the volatility dynamics from these models are similar, then the spurious inference problem is not likely to be present
  - However, if they differ, then the value of `\(\beta_1\)` may be spuriously identified and there may not be any ARCH / GARCH effects
- Alternatively, use the Engle (1982) or McLeod & Li (1983) tests, which may only be applied to ARCH models

---

# The IGARCH model: Nelson (1990)

- The conditional volatility in financial returns is highly persistent
- The sum of the `\(\alpha\)` and `\(\beta\)` coefficients is often close to `\(1\)`
- By restricting the coefficients such that `\(\alpha + \beta = 1\)`, we have a parsimonious model that has interesting properties

`\begin{eqnarray} a_t = \sigma_t \varepsilon_t , \;\;\; \sigma^2_t= \alpha_0 + \beta_1 \sigma^2_{t-1} + (1 - \beta_1) a^2_{t-1} \end{eqnarray}`

- where the solution for `\(\sigma^2_t\)` is a slowly decaying exponential smoothing function and not a unit root (Nelson, 1990)

`\begin{eqnarray} \sigma^2_t = (1 - \beta_1) \big[ a^2_{t-1}+ \beta_1 a^2_{t-2} + \beta^2_1 a^2_{t-3} + \ldots \big] \end{eqnarray}`

- The conditional variance is a decaying function of current
and past `\(a_t^2\)`, where `\(\beta_1\)` is the discount factor

---

# The ARCH-M model

- Engle, Lilien and Robins (1987) extended the ARCH model to allow the mean to depend on its own conditional variance
- Modern finance theory suggests that volatility may be related to risk premia on assets
- Higher volatility should result in higher risk premia, since risk-averse agents require compensation for holding risky assets
- Risk premia will be an increasing function of the conditional variance of returns

---

# The ARCH-M model

`\begin{eqnarray} y_t &=& \mu + c \sigma^2_t + a_t \\ a_t &=& \sigma_t \varepsilon_t \\ \sigma^2_t &=& \alpha_0 + \alpha_1 a^2_{t-1} + \beta_1 \sigma^2_{t-1} \end{eqnarray}`

- The risk-premium parameter is `\(c\)`
- A positive `\(c\)` suggests that returns are positively related to past volatility

---

# Models with Asymmetry: EGARCH

- Nelson (1991) proposed a model where the conditional variance is in log-linear form:

`\begin{eqnarray} a_t &=& \sigma_t \varepsilon_t, \\ \log(\sigma^2_t ) &=& \alpha_0 + \frac{1 + \beta_1 L + \ldots +\beta_{s-1}L^{s-1}}{1 - \alpha_1 L - \ldots - \alpha_m L^m } g(\varepsilon_{t-1}) \end{eqnarray}`

- where the asymmetry in `\(g(\varepsilon_t )\)` is given by

`\begin{eqnarray} g(\varepsilon_t ) = \left\{ \begin{array}{cc} (\theta + \gamma )\varepsilon_t - \gamma \mathbb{E}(|\varepsilon_t|) & \text{if } \varepsilon_t \geq 0, \\ (\theta - \gamma )\varepsilon_t - \gamma \mathbb{E}(|\varepsilon_t|) & \text{if } \varepsilon_t < 0 \end{array} \right.
\end{eqnarray}`

---

# Models with Asymmetry: EGARCH

- The alternative form for the `\(EGARCH(m, s)\)` model may be expressed as

`\begin{eqnarray} \log (\sigma^2_t ) = \alpha_0 + \sum^m_{i=1} \alpha_i \frac{|a_{t-i}| + \gamma_i a_{t-i}}{\sigma_{t-i}} + \sum^s_{j=1} \beta_j \log (\sigma^2_{t-j} ) \end{eqnarray}`

- A positive `\(a_{t-i}\)` contributes `\(\alpha_i (1 + \gamma_i )|\varepsilon_{t-i}|\)` to the log volatility
- A negative `\(a_{t-i}\)` gives `\(\alpha_i (1 - \gamma_i )|\varepsilon_{t-i}|\)`
  - Where `\(\varepsilon_{t-i}= a_{t-i} / \sigma_{t-i}\)`
- The `\(\gamma_i\)` parameter signifies the leverage effect of `\(a_{t-i}\)`, which is expected to be negative

---

# Models with Asymmetry: TGARCH

- "Bad" news may affect volatility more than "good" news
- Glosten, Jagannathan, and Runkle (1993) propose the Threshold GARCH (often termed GJR-GARCH) model
- Consider the model

`\begin{eqnarray} \sigma^2_t = \alpha_0 + \sum^m_{i=1} (\alpha_i + \gamma_i N_{t-i} ) a^2_{t-i} + \sum^s_{j=1} \beta_j \sigma^2_{t-j} \end{eqnarray}`

- where `\(N_{t-i}\)` is an indicator for negative `\(a_{t-i}\)` and

`\begin{eqnarray} N_{t-i}= \left\{ \begin{array}{cc} 1 & \text{if } a_{t-i} < 0, \\ 0 & \text{if } a_{t-i} \geq 0 \end{array} \right.
\end{eqnarray}`

- A positive `\(a_{t-i}\)` contributes `\(\alpha_i a^2_{t-i}\)` to `\(\sigma^2_t\)`
- A negative `\(a_{t-i}\)` has a larger impact, `\((\alpha_i+ \gamma_i ) a^2_{t-i}\)`, with `\(\gamma_i >0\)`

---

# Asymmetric power ARCH model

- The general `\(APARCH(m, s)\)` model of Ding *et al.* (1993) could be written as

`\begin{eqnarray} y_t &=& \mu_t + a_t , \;\; a_t = \sigma_t \varepsilon_t , \;\; \varepsilon_t \sim D(0, 1) \\ \sigma^\delta_t &=& {\omega} +\sum^m_{i=1} \alpha_i \left(|a_{t-i}| + \gamma_i a_{t-i} \right)^\delta +\sum^s_{j=1} \beta_j \sigma^\delta_{t-j} \end{eqnarray}`

- where `\(\delta\)` is a positive real number
- When `\(\delta = 2\)` the APARCH model reduces to a TGARCH model
- When `\(\delta = 1\)` the model uses volatility directly in the volatility equation
- When `\(\delta \rightarrow 0\)` the model reduces to the EGARCH model

---

# Stochastic Volatility model

- Introduce a stochastic innovation to the conditional variance equation of `\(a_t\)`
- The additional stochastic term is used to explain the unexpected shocks to the volatility process

`\begin{eqnarray} a_t &=& \sigma_t \varepsilon_t \\ (1 - \alpha_1 L -\ldots-\alpha_m L^m) \log (\sigma^2_t ) &=& \alpha_0+ v_t \end{eqnarray}`

- where the `\(\varepsilon_t\)` are `\(\mathsf{i.i.d.}\)` `\(\mathcal{N}(0, 1)\)` and the `\(v_t\)` are `\(\mathsf{i.i.d.}\)` `\(\mathcal{N}(0, \sigma^2_v )\)`
- `\(\varepsilon_t\)` and `\(v_t\)` are independent, while `\(\alpha_0\)` is a constant
- The polynomial `\(1 - \sum^m_{i=1} \alpha_i L^i\)` must satisfy the usual stationarity conditions
- Although it is more flexible, parameter estimation is more difficult
- Usually provides a better in-sample fit, but is worse out-of-sample

---

# Multivariate GARCH

- Motivation:
  - The volatility of financial instruments may be interrelated
  - Portfolio allocation decisions are influenced by the degree of covariation of stock prices or volatility following a shock
  - Can be used to measure spillover effects

---

# Multivariate GARCH

- Ignoring the mean equation,
consider two variables:

`\begin{eqnarray} a_{1,t} = \sigma_{11,t} \varepsilon_{1,t}\\ a_{2,t} = \sigma_{22,t} \varepsilon_{2,t} \end{eqnarray}`

- Allowing for interrelated shocks, where `\(\sigma^2_{12,t} = \mathbb{E}_{t-1}[a_{1,t} a_{2,t}]\)`, the `\(vech\)` model may be expressed as

`\begin{eqnarray} \sigma^2_{11,t} &=& \alpha_{10} + \alpha_{11}a_{1,t-1}^{2} + \alpha_{12}a_{1,t-1}a_{2,t-1} + \alpha_{13}a_{2,t-1}^{2} + \ldots \\ &&\ldots + \beta_{11}\sigma^2_{11,t-1} + \beta_{12}\sigma^2_{12,t-1} + \beta_{13}\sigma^2_{22,t-1} \\ \sigma^2_{12,t} &=& \alpha_{20} + \alpha_{21}a_{1,t-1}^{2} + \alpha_{22}a_{1,t-1}a_{2,t-1} + \alpha_{23}a_{2,t-1}^{2} + \ldots \\ &&\ldots + \beta_{21}\sigma^2_{11,t-1} + \beta_{22}\sigma^2_{12,t-1} + \beta_{23}\sigma^2_{22,t-1} \\ \sigma^2_{22,t} &=& \alpha_{30} + \alpha_{31}a_{1,t-1}^{2} + \alpha_{32}a_{1,t-1}a_{2,t-1} + \alpha_{33}a_{2,t-1}^{2} + \ldots \\ &&\ldots + \beta_{31}\sigma^2_{11,t-1} + \beta_{32}\sigma^2_{12,t-1} + \beta_{33}\sigma^2_{22,t-1} \end{eqnarray}`

---

# Multivariate GARCH

- Hence the conditional variances `\(\sigma^2_{11,t}\)` and `\(\sigma^2_{22,t}\)` depend on:
  - their own past, `\(\sigma^2_{11,t-1}\)` and `\(\sigma^2_{22,t-1}\)`
  - the lagged conditional covariance, `\(\sigma^2_{12,t-1}\)`
  - the lagged squared errors, `\(a_{1,t-1}^{2}\)` and `\(a_{2,t-1}^{2}\)`
  - the product of the lagged errors, `\(a_{1,t-1} a_{2,t-1}\)`
- Note that this simple model would be difficult to estimate:
  - The above model has 2 variables and 21 coefficients
  - As the model complexity increases (with more lags or variables), so does the estimation problem
  - Overparameterised models may struggle with convergence if one of the parameters is poorly identified (which often occurs)
  - All the conditional variances must be positive
  - All the implied correlation coefficients, `\(\rho_{ij} = \sigma^2_{ij} / ({\sigma_{ii}}{\sigma_{jj}})\)`, should lie between `\(\pm 1\)`

---

# Multivariate GARCH: The BEKK model

- Engle & Kroner (1995) enter all the parameters as
quadratic forms to ensure that all the variances are positive
- For example, one variant of the model may take the form

`\begin{eqnarray} \sigma^2_{11,t} & = & \left[ \alpha^2_{10} + \alpha^2_{01}\right] + \left[ \alpha^2_{11}a_{1,t-1}^{2} + 2\alpha_{11} \alpha_{21}a_{1,t-1}a_{2,t-1} + \alpha_{21}^2 a_{2,t-1}^{2}\right] \ldots \\ & & \ldots+ \left[ \beta^2_{11}\sigma^2_{11,t-1} + 2\beta_{11}\beta_{21}\sigma^2_{12,t-1} + \beta_{21}^2 \sigma^2_{22,t-1}\right] \\ \vdots \;\; &=& \hspace{2cm}\vdots \end{eqnarray}`

- This implies that we still have a lot of parameters to estimate
- It does not overcome the problem of poorly identified parameters (and as such, convergence may not be attained)

---

# Multivariate GARCH: Diagonal vech model

- Restrict `\(\sigma^2_{ij}\)` to incorporate only lags of itself and the cross-products `\(a_{i,t}a_{j,t}\)`

`\begin{eqnarray} \sigma^2_{11,t} &=& \alpha_{10} + \alpha_{11}a_{1,t-1}^{2} + \beta_{11}\sigma^2_{11,t-1} \\ \sigma^2_{12,t} &=& \alpha_{20} + \alpha_{22}a_{1,t-1}a_{2,t-1} + \beta_{22}\sigma^2_{12,t-1} \\ \sigma^2_{22,t} &=& \alpha_{30} + \alpha_{33}a_{2,t-1}^{2} + \beta_{33}\sigma^2_{22,t-1} \end{eqnarray}`

- Although this will obviously limit the interaction effects, the measure of covariation in volatility, `\(\sigma^2_{12,t}\)`, may be useful
- However, shocks to variable `\(1\)` would not affect `\(\sigma^2_{22,t}\)`

---

# Multivariate GARCH: CCC model

- Restrict the correlation coefficients to be constant
- For each `\(i \ne j\)` we have `\(\sigma^2_{ij,t} = \rho_{ij}\big(\sigma_{ii,t} \sigma_{jj,t}\big)\)`

`\begin{eqnarray} \sigma^2_{12,t} = \rho_{12}{\big(\sigma_{11,t}\sigma_{22,t}\big)} \end{eqnarray}`

- where the expressions for `\(\sigma^2_{11}\)` and `\(\sigma^2_{22}\)` are as per the `\(vech\)` model
- Hence, the number of parameters to be estimated is still large (but not as many as in the `\(vech\)` model)

---

# Multivariate GARCH: Concluding remarks

- Nice concept, but implementation is tricky!
- Unless a specific postulate calls for a particular specification, you may need to estimate most of the above models to consider various types of interaction (or to identify where a parameter may be poorly identified)
- You would usually want to allow for more (rather than less) interaction, but if the parameters are poorly identified, then your results should be interpreted with caution
- There are many other variations that consider restrictions on the number of parameters - GO-GARCH, etc.
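---

# Appendix: A GARCH(1,1) simulation sketch

The GARCH(1,1) mechanics above are easy to verify by simulation. The following Python sketch (illustrative only, not part of the course material; the parameter values and Gaussian innovations are assumptions) simulates a `\(GARCH(1,1)\)` process and checks three of the stylised facts discussed in these slides: the sample variance settles near the unconditional variance `\(\alpha_0/(1 - \alpha_1 - \beta_1)\)`, the returns display positive excess kurtosis, and `\(a_t\)` is serially uncorrelated while `\(a^2_t\)` is not (volatility clustering).

```python
import math
import random
import statistics


def simulate_garch11(alpha0, alpha1, beta1, n, seed=42):
    """Simulate a_t = sigma_t * eps_t with
    sigma_t^2 = alpha0 + alpha1 * a_{t-1}^2 + beta1 * sigma_{t-1}^2,
    where eps_t is i.i.d. N(0, 1)."""
    assert alpha0 > 0 and alpha1 >= 0 and beta1 >= 0 and alpha1 + beta1 < 1
    rng = random.Random(seed)
    sigma2 = alpha0 / (1 - alpha1 - beta1)  # start at the unconditional variance
    a = math.sqrt(sigma2) * rng.gauss(0, 1)
    draws = []
    for _ in range(n):
        sigma2 = alpha0 + alpha1 * a ** 2 + beta1 * sigma2
        a = math.sqrt(sigma2) * rng.gauss(0, 1)
        draws.append(a)
    return draws


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation."""
    m = statistics.fmean(x)
    num = sum((x[i] - m) * (x[i - 1] - m) for i in range(1, len(x)))
    den = sum((v - m) ** 2 for v in x)
    return num / den


# Illustrative parameters: unconditional variance = 0.1 / (1 - 0.1 - 0.8) = 1
a = simulate_garch11(alpha0=0.1, alpha1=0.1, beta1=0.8, n=500_000)

mean_a = statistics.fmean(a)
sample_var = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)  # near 1
m4 = sum(x ** 4 for x in a) / len(a)
excess_kurtosis = m4 / sample_var ** 2 - 3      # > 0: heavier tails than Gaussian
rho_a = lag1_autocorr(a)                        # near 0: serially uncorrelated
rho_a2 = lag1_autocorr([x * x for x in a])      # > 0: volatility clustering
```

Note that the excess kurtosis is positive even though the innovations `\(\varepsilon_t\)` are Gaussian; this is the heavy-tail property of the ARCH family discussed earlier.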