There are many reasons for wanting to analyse the risk that is associated with an asset, and volatility is perhaps the most commonly used measure of this risk. The objective of this section is to consider the characteristics of volatility models and their respective applications.

Measures of volatility are used in many important financial and economic models. Volatility is a critical metric in the *value-at-risk* calculation and is one of the key variables in option pricing and asset allocation models. For example, the well-known Black-Scholes method for valuing the price of a call option is largely dependent on the measure of volatility. Therefore, if we are able to derive a more accurate measure of volatility then we could possibly identify mispriced derivatives. In addition, portfolio allocation techniques such as the Markowitz mean-variance framework also depend on volatility, so a better measure of this statistic would lead to a more efficient allocation of capital. More recently, volatility indices have also been introduced on many financial markets. For example, futures on the VIX index of the Chicago Board Options Exchange (CBOE) started to trade on March 26, 2004. Similarly, the South African Volatility Index (SAVI) was launched in 2007 as a measure of the implied volatility in the FTSE/JSE Top40 index.

Although we are able to define various aspects or types of volatility, this characteristic is not directly observable in practice; what we observe are the respective prices of assets. To estimate the volatility that is associated with an asset we would either need to transform the price data or derive a suitable model. This has several important implications for this particular research area, and the choice of model or transformation technique is of considerable importance when trying to describe this feature of the data.

There are many different types of volatility models that are described in what is an extremely broad literature. These include univariate models, such as the autoregressive conditional heteroscedastic (ARCH) model of Engle (1982), the generalised autoregressive conditional heteroscedastic (GARCH) model of Bollerslev (1986), the exponential generalised autoregressive conditional heteroscedastic (EGARCH) model of Nelson (1991), the threshold generalised autoregressive conditional heteroscedastic (TGARCH) model of Glosten, Jagannathan, and Runkle (1993) and Zakoian (1994), the non-symmetric generalised autoregressive conditional heteroscedastic (NGARCH) model of Engle and Ng (1993) and Duan (1995), and the stochastic volatility (SV) models of Melino and Turnbull (1990), Taylor (1994), Harvey, Ruiz, and Shephard (1994), and Jacquier, Polson, and Rossi (1994). In addition, there are also a number of multivariate extensions, which we will also consider.

In what follows, we discuss the advantages and weaknesses of each of the main classes of volatility models, before we consider a few applications of these models.

Although volatility is not directly observable, it has a number of characteristics that are present in many asset returns. Firstly, volatility is usually high for certain periods of time and low for other periods. This gives rise to volatility clusters. Secondly, volatility evolves over time in a continuous manner, where volatility jumps are rare. Thirdly, volatility does not diverge to infinity, as we note that it varies within some fixed range. This may provide an indication that volatility may be stationary. Fourthly, volatility seems to react differently to a big price increase and a big price drop, with the latter having a greater impact. This phenomenon is referred to as the asymmetric (or leverage) effect. These properties play an important role in the development of volatility models. For example, the EGARCH and TGARCH models were developed to capture the asymmetric effects in volatility that are induced by the differences between positive and negative price movements.

In financial applications, we typically estimate the volatility of an asset using the price of the security, its derivatives, or a combination of these two measures. To measure the daily volatility of a particular share that is quoted on a financial exchange we could make use of (i) the daily return for each trading day, (ii) tick-by-tick data for intra-day transactions and quotes, and (iii) the prices of options that are contingent on this particular share price. These three data sources give rise to three types of volatility measures for this share. They are as follows:

- Volatility as the conditional standard deviation of daily returns: This is the usual definition of volatility and is the focus of the volatility models that we discuss.
- Implied volatility: Using prices from options markets, one can use a pricing formula, for example, the Black-Scholes pricing formula, to deduce the volatility of the share price. This volatility measure is called the *implied volatility*. Since implied volatility is derived under certain assumptions that relate the price of an option to that of the underlying share price, it is often criticised as being model dependent. Experience shows that the implied volatility of an asset return tends to be larger than that obtained by using daily returns and a volatility model. This might be due to the risk premium for volatility in options markets or to the way that daily returns are calculated. The VIX on the CBOE and the SAVI on the FTSE/JSE are examples of implied volatility measures.
- Realised volatility: With the availability of high-frequency financial data, one can use intra-day returns, for example, 5-minute returns, to estimate the daily volatility. This volatility measure is called *realised volatility*, which has been the subject of intensive research in recent years.

In most practical applications, the time interval used to quote volatility is a year; thus, volatility is often annualised. If daily returns are used to estimate volatility, one can obtain the annualised volatility by multiplying the daily volatility by \(\sqrt{252}\), since there are typically 252 trading days in a year on most financial markets.
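As a brief illustration of this conversion, the following sketch estimates daily volatility from a hypothetical series of simulated daily log returns and annualises it with \(\sqrt{252}\) (the data, seed, and 1% daily standard deviation are assumptions made purely for illustration; numpy is assumed to be available):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical daily log returns with a 1% daily standard deviation
daily_returns = 0.01 * rng.standard_normal(252)

daily_vol = daily_returns.std(ddof=1)     # daily volatility estimate
annual_vol = daily_vol * np.sqrt(252)     # annualised volatility
```

With a 1% daily standard deviation, the annualised figure works out to roughly 16% per year.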

Let \(y_t\) be the observed value of a variable that is assumed to be stationary; this could represent the natural logarithm of the return of a security at time period \(t\). The basic idea behind a study of volatility is that while the first moment of the series is either serially uncorrelated or has minor lower-order serial correlation, the second moment of the variable displays some form of serial dependence.

For illustration, consider the daily logarithmic returns for the FTSE/JSE All Share, which is displayed in Figure 1. This time series appears to be stationary and random, and one could use a sample ACF to confirm this feature of the data. Indeed, the Ljung-Box statistic suggests that \(Q(12) = 18.68\) with a \(p\) value of \(0.10\) for \(y_t\). However, the sample ACF of the absolute log returns (i.e. \(|y_t|\)) suggests that this transformation of the variable is serially correlated. The Ljung-Box statistics of \(|y_t|\) suggest that \(Q(12) = 124.91\) with a \(p\) value close to zero. Consequently, the logarithmic returns of the FTSE/JSE All Share are serially uncorrelated, but dependent through the second moment. This is the feature that a univariate volatility model is designed to capture.
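The Ljung-Box calculation described above can be sketched as follows. Since the FTSE/JSE data are not reproduced here, the example simulates an ARCH-type series that shares the same qualitative feature: uncorrelated levels, but serially correlated absolute values (the parameter values and seed are illustrative assumptions):

```python
import numpy as np

def ljung_box(x, m):
    """Ljung-Box Q(m) statistic for the first m sample autocorrelations of x."""
    x = np.asarray(x) - np.mean(x)
    T = len(x)
    denom = np.sum(x**2)
    rho = np.array([np.sum(x[k:] * x[:-k]) / denom for k in range(1, m + 1)])
    return T * (T + 2) * np.sum(rho**2 / (T - np.arange(1, m + 1)))

# Simulate an ARCH(1)-type series: uncorrelated in levels, dependent in |y_t|
rng = np.random.default_rng(42)
T, alpha0, alpha1 = 2000, 0.1, 0.5
y = np.zeros(T)
for t in range(1, T):
    y[t] = np.sqrt(alpha0 + alpha1 * y[t - 1]**2) * rng.standard_normal()

Q_levels = ljung_box(y, 12)        # modest, as the levels are white
Q_abs = ljung_box(np.abs(y), 12)   # large: |y_t| is serially correlated
```

Comparing `Q_abs` with the 5% critical value of \(\chi^2_{12}\) (about 21.03) reproduces the pattern described for the FTSE/JSE returns.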

To put the volatility models in proper perspective, it is useful to consider the conditional mean and variance of \(y_t\) given the information set, \(I_{t-1}\). We could then summarise the first two moments as,

\[\begin{eqnarray} \nonumber \mu_t &=& \mathbb{E} \big[ y_t|I_{t-1} \big] \\ \sigma^2_t&=& \mathsf{var} \big[ y_t|I_{t-1} \big] = \mathbb{E} \big[(y_t - \mu_t )^2|I_{t-1} \big] \tag{2.1} \end{eqnarray}\]

where \(\mu_t\) is the conditional mean, given the information set \(I_{t-1}\) that is available at time \(t - 1\). Typically, \(I_{t-1}\) consists of all linear functions of the past observed values of this variable. When seeking to model the volatility of a financial security, we note that the serial dependence in the mean is usually weak, if it exists at all. Therefore, the equation for \(\mu_t\) in equation (2.1) should be simple, and we assume that \(y_t\) follows a simple time series model such as a stationary ARMA(\(p, q\)) model. For example, consider the log returns of the FTSE/JSE All Share. In this case the Ljung-Box statistics show that the returns have no significant serial correlation, while a simple one-sample test confirms that the mean of \(y_t\) is significantly different from zero. More specifically, the \(t\)-ratio for testing \(H_0 : \mu = 0\) versus \(H_1 : \mu \ne 0\) is 2.38 with \(p\) value 0.018. Thus, for the FTSE/JSE All Share returns, we have \(y_t= \mu_t+ a_t\) with \(\mu_t = \mu\), a constant.

In general, we assume that \(y_t\) follows an ARMA(\(p, q\)) model so that \(y_t = \mu_t+ a_t\), where \(\mu_t\) is given by

\[\begin{eqnarray} \mu_t= \phi_0 + \sum^p_{i=1} \phi_i y_{t-i} - \sum^q_{j=1} \theta_j a_{t-j} \tag{2.2} \end{eqnarray}\]

If some explanatory variables are available, then we entertain the model \(y_t = \mu_t + a_t\), where

\[\begin{eqnarray} \mu_t= \phi_0+\sum^k_{i=1} \beta_i x_{i,t-1} + \sum^p_{i=1} \phi_i z_{t-i} - \sum^q_{j=1} \theta_j a_{t-j} \tag{2.3} \end{eqnarray}\]

where \(z_{t-i} = y_{t-i} - \phi_0 - \sum^k_{j=1} \beta_j x_{j,t-i-1}\) denotes the adjusted return series after removing the effect of the explanatory variables, and the \(x_{i,t-j}\) are explanatory variables available at time \(t - j\).

The order (\(p, q\)) of an ARMA model may depend on the frequency of the return series. For example, daily returns of a market index often show some minor serial correlation, but monthly returns of the index may not contain any significant serial correlation. The explanatory variables \(x_{i,t-j}\) in equation (2.3) are flexible. For example, a dummy variable for Mondays can be used to study the effect of the weekend on daily stock returns. In the capital asset pricing model (CAPM), the mean equation of \(y_t\) can be written as \(y_t = \phi_0 + \beta r_{m,t}+ a_t\), where \(r_{m,t}\) denotes the market return.
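The CAPM-style mean equation above can be estimated with ordinary least squares. The sketch below uses simulated data, and the values \(\phi_0 = 0.0005\) and \(\beta = 1.2\) are assumptions made purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500
r_m = 0.01 * rng.standard_normal(T)              # hypothetical market returns
phi0_true, beta_true = 0.0005, 1.2               # assumed true parameters
y = phi0_true + beta_true * r_m + 0.005 * rng.standard_normal(T)

# Least squares fit of the mean equation y_t = phi0 + beta * r_{m,t} + a_t
X = np.column_stack([np.ones(T), r_m])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
phi0_hat, beta_hat = coef
a = y - X @ coef                                 # residuals for the volatility stage
```

The residual series `a` is the input to the ARCH-effect tests described below.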

Combining equations (2.1) and (2.3), we have

\[\begin{eqnarray} \sigma^2_t = \mathsf{var}(y_t|I_{t-1}) = \mathsf{var}(a_t|I_{t-1}) \tag{2.4} \end{eqnarray}\]

where the positive square root, \(\sigma_t\) is the measure of volatility. The conditional heteroscedastic models that are described below are concerned with the evolution of \(\sigma^2_t\). The manner under which \(\sigma^2_t\) evolves over time distinguishes one volatility model from another.

Conditional heteroscedastic models can be classified into two general categories. Those in the first category use an exact function to govern the evolution of \(\sigma^2_t\), whereas those in the second category use a stochastic equation to describe \(\sigma^2_t\). The ARCH/GARCH models belong to the first category, while the SV model belongs to the second.

In what follows, \(a_t\) is referred to as the shock or innovation to variable \(y_t\) at time \(t\). The model for \(\mu_t\) in equation (2.3) is referred to as the mean equation for \(y_t\) and the model for \(\sigma^2_t\) is the volatility equation for \(y_t\). Therefore, modelling conditional heteroscedasticity amounts to augmenting a traditional time series model with a dynamic equation that governs the time evolution of the conditional variance of the variable.

The construction of a traditional volatility model consists of the following four steps:

- Specify a mean equation after testing for serial dependence in the data. If necessary, make use of an econometric model (e.g. an ARMA model) for the return series to remove any linear dependence.
- Use the residuals from the mean equation to test for ARCH effects.
- Specify a volatility equation, if ARCH effects are statistically significant, and perform a joint estimation of the mean and volatility equations.
- Check the fitted model carefully and refine it if necessary.

In what follows, we describe the first two steps of the modelling procedure in detail before we introduce various specifications for important volatility models.

For most asset return series, the serial correlation in the mean equation is usually weak and may be nonexistent. Therefore, building a mean equation usually amounts to removing the sample mean from the data, if the sample mean is significantly different from zero. For some daily return series, a simple autoregressive model might be needed. In other cases, the mean equation may employ some explanatory variables, such as an indicator variable for weekend or January effects.

For ease of notation, let \(a_t = y_t - \mu_t\) represent the residuals from the mean equation. The squared series \(a^2_t\) is then used to check for conditional heteroscedasticity, which provides an indication of possible ARCH effects. If we suspect that there are ARCH effects, we could then make use of two formal tests. The first test, which is discussed in McLeod and Li (1983), applies the usual Ljung-Box statistic \(Q(m)\) to the \(a^2_t\) series, where the null hypothesis is that the first \(m\) autocorrelations of the \(a^2_t\) series are zero. The second test for conditional heteroscedasticity makes use of a Lagrange multiplier and is discussed in Engle (1982). This test is equivalent to the usual \(F\)-statistic for testing \(\alpha_i = 0\), \((i = 1, \ldots , m)\) in the linear regression,

\[\begin{eqnarray*} a^2_t = \alpha_0+ \alpha_1 a^2_{t-1} + \ldots + \alpha_m a^2_{t-m} + \upsilon_t , \;\;\; t = m + 1, \ldots , T \end{eqnarray*}\]

where \(\upsilon_t\) denotes the error term, \(m\) is a pre-specified positive integer, and \(T\) is the sample size. Specifically, the null hypothesis is \(H_0 : \alpha_1 = \ldots = \alpha_m = 0\) and the alternative hypothesis is \(H_1 : \alpha_i \ne 0\) for some \(i\) between \(1\) and \(m\).

Let \(SSR_0 = \sum^T_{t =m+1}\big(a^2_t - \bar{\omega}\big)^2\), where \(\bar{\omega} = (1/T) \sum^T_{t =1} a^2_t\) is the sample mean of \(a^2_t\), and let \(SSR_1 = \sum^T_{t =m+1} \hat{\upsilon}^2_t\), where \(\hat{\upsilon}_t\) is the least squares residual from the prior linear regression. Then we have

\[\begin{eqnarray*} F = \frac{(SSR_0 - SSR_1)/m}{SSR_1/(T - 2m - 1)} \end{eqnarray*}\]

which follows an \(F\)-distribution with degrees of freedom \(m\) and \(T - 2m - 1\) under \(H_0\). When \(T\) is sufficiently large, one can use \(m F\) as the test statistic, which has an asymptotic chi-squared distribution with \(m\) degrees of freedom under the null hypothesis. The decision rule is to reject the null hypothesis if \(m F > \chi^2_m (\alpha)\), where \(\chi^2_m (\alpha)\) is the upper \(100(1 - \alpha)\)th percentile of \(\chi^2_m\), or if the \(p\) value of \(m F\) is less than \(\alpha\), the Type I error.
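The \(F\)-statistic above can be computed directly from the two regressions. The sketch below, which assumes simulated data and illustrative parameter values, applies the test to a white noise series (where \(H_0\) should not be rejected) and to a simulated ARCH(1) series (where it should):

```python
import numpy as np

def arch_lm_F(a, m):
    """F statistic for Engle's ARCH-LM test: regress a_t^2 on m of its lags."""
    a2 = np.asarray(a)**2
    T = len(a2)
    y = a2[m:]
    X = np.column_stack([np.ones(T - m)] +
                        [a2[m - i:T - i] for i in range(1, m + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    SSR1 = np.sum((y - X @ beta)**2)
    SSR0 = np.sum((y - a2.mean())**2)      # omega_bar: full-sample mean of a_t^2
    return ((SSR0 - SSR1) / m) / (SSR1 / (T - 2 * m - 1))

rng = np.random.default_rng(2)
T = 2000
white = rng.standard_normal(T)             # no ARCH effects
arch = np.zeros(T)
for t in range(1, T):
    arch[t] = np.sqrt(0.1 + 0.5 * arch[t - 1]**2) * rng.standard_normal()

F_white = arch_lm_F(white, 5)              # small: consistent with H0
F_arch = arch_lm_F(arch, 5)                # large: reject H0
```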

The first model that provides a systematic framework for volatility modelling is the ARCH model of Engle (1982). The basic idea of ARCH models is that (i) the shock to an asset return is serially uncorrelated, but dependent, and (ii) the dependence of \(a_t\) can be described by a simple quadratic function of its lagged values. Specifically, an ARCH(\(m\)) model assumes that,

\[\begin{eqnarray}\nonumber a_t &=& \sigma_t \varepsilon_t, \\ \sigma^2_t &=& \alpha_0 + \alpha_1 a^2_{t-1} + \ldots + \alpha_m a^2_{t-m} \tag{3.1} \end{eqnarray}\]

where \(\varepsilon_t\) is a sequence of independent and identically distributed (\(\mathsf{i.i.d.}\)) random variables with mean \(0\) and variance \(1\), \(\alpha_0 >0\), and \(\alpha_i \geq 0\) for \(i >0\). The coefficients \(\alpha_i\) must satisfy some regularity conditions to ensure that the unconditional variance of \(a_t\) is finite. In practice, \(\varepsilon_t\) is often assumed to follow the standard normal distribution, a standardised Student \(t\)-distribution, or a generalised error distribution (GED). In some applications, a skewed distribution is also used for \(\varepsilon_t\).

From the structure of the model, it is seen that large past squared shocks, \(\{a^2_{t-i}\}^m_{i=1}\), imply a large conditional variance \(\sigma^2_t\) for the innovation \(a_t\). This means that, under the ARCH framework, large shocks tend to be followed by other large shocks, in the sense that the conditional probability of a large shock is greater when the immediate past contains large squared shocks. This feature may induce the volatility clustering that is observed in most asset returns.

**Remark**. Some authors follow the original notation of Engle (1982) and use \(h_t\) to denote the conditional variance in equation (3.1). In this case, the shock becomes \(a_t =\sqrt{h_t} \varepsilon_t\).

To understand the properties of the ARCH models, it is useful to study the ARCH(1) model in some detail. This model may be written as

\[\begin{eqnarray*} a_t = \sigma_t \varepsilon_t, \;\;\; \sigma^2_t = \alpha_0 + \alpha_1a^2_{t-1}, \end{eqnarray*}\]

where \(\alpha_0 > 0\) and \(\alpha_1 \geq 0\). Note that the unconditional mean of \(a_t\) remains zero, since

\[\begin{eqnarray*} \mathbb{E} \big[ a_t \big] = \mathbb{E} \Big\{ \mathbb{E} \big[ a_t|I_{t-1} \big] \Big\} = \mathbb{E} \Big\{ \sigma_t \mathbb{E} \big[ \varepsilon_t \big] \Big\} = 0. \end{eqnarray*}\]

Secondly, the unconditional variance of \(a_t\) can then be derived as

\[\begin{eqnarray*} \mathsf{var}\big[ a_t \big] &=& \mathbb{E} \big[ a^2_t \big] = \mathbb{E} \Big\{ \mathbb{E} \big[ a^2_t|I_{t-1} \big] \Big\} \\ &=& \mathbb{E} \big[ \alpha_0 + \alpha_1 a^2_{t-1} \big] = \alpha_0+ \alpha_1 \mathbb{E} \big[ a^2_{t-1} \big] \end{eqnarray*}\]

Since \(a_t\) is a stationary process with \(\mathbb{E} \big[ a_t \big] = 0\), we have \(\mathsf{var} \big[ a_t \big] = \mathsf{var} \big[a_{t-1} \big] = \mathbb{E} \big[ a^2_{t-1} \big]\). Therefore, \(\mathsf{var}\big[ a_t \big] = \alpha_0 + \alpha_1 \mathsf{var} \big[ a_t \big]\), so that \(\mathsf{var} \big[ a_t \big] = \alpha_0/(1 - \alpha_1)\). Since the variance of \(a_t\) must be positive and finite, we require \(0 \leq \alpha_1 < 1\).
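We could verify the expression \(\mathsf{var}[a_t] = \alpha_0/(1 - \alpha_1)\) by simulation. In the sketch below, the assumed values \(\alpha_0 = 0.1\) and \(\alpha_1 = 0.5\) imply an unconditional variance of \(0.2\):

```python
import numpy as np

rng = np.random.default_rng(4)
T, alpha0, alpha1 = 20000, 0.1, 0.5
a = np.zeros(T)
for t in range(1, T):
    a[t] = np.sqrt(alpha0 + alpha1 * a[t - 1]**2) * rng.standard_normal()

implied_var = alpha0 / (1 - alpha1)    # alpha_0 / (1 - alpha_1) = 0.2
sample_mean = a.mean()                 # close to 0
sample_var = a.var(ddof=1)             # close to implied_var
```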

Thirdly, in some applications we may need to make use of the higher order moments of \(a_t\), in which case \(\alpha_1\) must also satisfy some additional constraints. For instance, to study the tail behaviour of the distribution, we require that the fourth moment of \(a_t\) is finite. Under the normality assumption for \(\varepsilon_t\) in equation (3.1), we have

\[\begin{eqnarray*} \mathbb{E} \big[ a^4_t|I_{t-1} \big] = 3\Big\{ \mathbb{E} \big[ a^2_t|I_{t-1}\big] \Big\}^2 = 3 \big(\alpha_0+ \alpha_1a^2_{t-1} \big)^2 \end{eqnarray*}\]

Therefore,

\[\begin{eqnarray*} \mathbb{E} \big[ a^4_t \big] &=& \mathbb{E} \Big\{ \mathbb{E} \big[ a^4_t|I_{t-1}\big] \Big\} = 3 \mathbb{E} \big(\alpha_0 + \alpha_1 a^2_{t-1} \big)^2 \\ &=& 3\mathbb{E} \big( \alpha^2_0 + 2\alpha_0 \alpha_1 a^2_{t-1} + \alpha^2_1 a^4_{t-1} \big) \end{eqnarray*}\]

If \(a_t\) is fourth-order stationary with \(m_4= \mathbb{E} \big[ a^4_t \big]\), then we have

\[\begin{eqnarray*} m_4 &=& 3 \Big( \alpha^2_0 + 2\alpha_0 \alpha_1 \mathsf{var}\big[ a_t\big] + \alpha^2_1 m_4 \Big) \\ &=& 3 \alpha^2_0 \left( 1 + 2 \frac{\alpha_1}{1 - \alpha_1} \right) + 3 \alpha^2_1 m_4 \end{eqnarray*}\]

Consequently,

\[\begin{eqnarray*} m_4 = \frac{3 \alpha^2_0 (1 + \alpha_1)}{(1 - \alpha_1)(1 - 3\alpha^2_1)} \end{eqnarray*}\]

This result has two important implications: (i) since the fourth moment of \(a_t\) is positive, \(\alpha_1\) must also satisfy the condition \(1 - 3\alpha^2_1 >0\), that is, \(0 \leq \alpha^2_1 < \frac{1}{3}\); and (ii) the unconditional kurtosis of \(a_t\) is

\[\begin{eqnarray*} \frac{\mathbb{E} \big[ a^4_t \big]}{\Big( \mathsf{var} \big[a_t \big] \Big)^2} = 3 \frac{\alpha^2_0(1 + \alpha_1)}{(1 - \alpha_1)(1 - 3 \alpha^2_1)} \times \frac{(1 - \alpha_1)^2}{\alpha^2_0} = 3 \frac{1 - \alpha^2_1}{1 - 3\alpha^2_1} > 3 \end{eqnarray*}\]

Therefore, the excess kurtosis of \(a_t\) is positive so the tail of the distribution of \(a_t\) is heavier than that of a normal distribution. In other words, it is more likely that the shock \(a_t\) in a conditional Gaussian ARCH(1) model will produce *outliers*, than would be the case of a traditional Gaussian white noise series. This is in agreement with the empirical finding that *outliers* appear more often in asset returns than in an \(\mathsf{i.i.d.}\) sequence of a random normal variable.
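The kurtosis expression derived above can be wrapped in a small helper (the function name is ours, introduced purely for illustration):

```python
def arch1_kurtosis(alpha1):
    """Unconditional kurtosis of a Gaussian ARCH(1) process.

    Requires 3 * alpha1**2 < 1 so that the fourth moment exists.
    """
    if not 3 * alpha1**2 < 1:
        raise ValueError("fourth moment does not exist")
    return 3 * (1 - alpha1**2) / (1 - 3 * alpha1**2)
```

For \(\alpha_1 = 0\) the process is Gaussian white noise with kurtosis 3, and the kurtosis rises above 3 as \(\alpha_1\) increases towards \(1/\sqrt{3}\).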

These properties continue to hold for general ARCH models, but the formulas become more complicated for higher order ARCH models. For higher-order models, the condition \(\alpha_i \geq 0\) in equation (3.1) can be relaxed. However, we would still need to ensure that the conditional variance \(\sigma^2_t\) is positive for all \(t\). As an alternative, a simple way to achieve positive conditional variance is to rewrite an ARCH(\(m\)) model as

\[\begin{eqnarray} a_t = \sigma_t \varepsilon_t , \;\;\; \sigma^2_t= \alpha_0 + A^{\prime}_{m,t-1} \Omega A_{m,t-1} \tag{3.2} \end{eqnarray}\]

where \(A_{m,t-1} = (a_{t-1}, \ldots , a_{t-m})^{\prime}\) and \(\Omega\) is an \(m \times m\) non-negative definite matrix. The ARCH(\(m\)) model in equation (3.1) would then require that \(\Omega\) is diagonal. Therefore, we could show that the representation of Engle (1982) makes use of a parsimonious approach to approximate a quadratic function.^{1}

The key advantages of using an ARCH model to describe the volatility in a particular variable include the following:

- The model can produce volatility clusters
- The shocks \(a_t\) in the model have heavy tails

Both of these characteristics are consistent with the observed returns of most financial assets. However, these models also have a number of weaknesses, which include:

- The model assumes that positive and negative shocks have the same effects on volatility, since it depends on the square of the previous shocks. In most cases we can show that the price of a financial asset responds differently to positive and negative shocks.
- The ARCH model is rather restrictive. For instance, the parameter \(\alpha^2_1\) in an ARCH(1) model must lie within the interval \([0, \frac{1}{3})\) if the process is to have a finite fourth moment. This constraint becomes complicated for higher order ARCH models, and it limits the ability of ARCH models with Gaussian innovations to capture excess kurtosis.
- The ARCH model does not provide any new insight into our understanding of the source of variations in a financial time series. It merely provides a mechanical way to describe the behaviour of the conditional variance.
- ARCH models are likely to over-predict the volatility because they respond slowly to large isolated shocks in the time series.

It is relatively easy to specify an ARCH model. When doing so, it is worth taking note of the following:

**Order Determination**. If an ARCH effect is found to be significant, one can use the PACF of \(a^2_t\) to determine the ARCH order. The use of the PACF of \(a^2_t\) to select the ARCH order can be justified as follows. From the model in equation (3.1), we have

\[\begin{eqnarray*} \sigma^2_t = \alpha_0 + \alpha_1 a^2_{t-1} + \ldots + \alpha_m a^2_{t-m} \end{eqnarray*}\]

For a given sample, \(a^2_t\) is an unbiased estimate of \(\sigma^2_t\). Therefore, we expect that \(a^2_t\) is linearly related to \(a^2_{t-1}, \ldots , a^2_{t-m}\) in a manner similar to that of an autoregressive model of order \(m\). Note that a single \(a^2_t\) is generally not an efficient estimate of \(\sigma^2_t\), but it can serve as an approximation that could be informative in specifying the order \(m\).

Alternatively, define \(\eta_t = a^2_t - \sigma^2_t\). It can be shown that \(\eta_t\) is an uncorrelated series with mean \(0\). The ARCH model then becomes

\[\begin{eqnarray*} a^2_t = \alpha_0 + \alpha_1 a^2_{t-1} + \ldots + \alpha_m a^2_{t-m} + \eta_t \end{eqnarray*}\]

which is in the form of an AR(\(m\)) model for \(a^2_t\), except that \(\eta_t\) is not an \(\mathsf{i.i.d.}\) series. As noted previously, the PACF of \(a^2_t\) is a useful tool for determining the order \(m\). Since the \(\eta_t\) are not identically distributed, the least squares estimates of the prior model are consistent, but not efficient, and the PACF of \(a^2_t\) may not be effective when the sample size is small.
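The use of the PACF of \(a^2_t\) for order determination can be sketched as follows, where the sample PACF at lag \(k\) is obtained as the last coefficient of an OLS AR(\(k\)) fit to \(a^2_t\) (the ARCH(1) data are simulated and the parameter values are illustrative):

```python
import numpy as np

def pacf_at_lag(x, k):
    """Sample PACF at lag k: the last coefficient of an OLS AR(k) fit."""
    y = x[k:]
    X = np.column_stack([np.ones(len(y))] +
                        [x[k - i:len(x) - i] for i in range(1, k + 1)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[-1]

# Simulate an ARCH(1) series and inspect the PACF of a_t^2
rng = np.random.default_rng(3)
T, alpha0, alpha1 = 4000, 0.1, 0.3
a = np.zeros(T)
for t in range(1, T):
    a[t] = np.sqrt(alpha0 + alpha1 * a[t - 1]**2) * rng.standard_normal()

a2 = a**2
p1 = pacf_at_lag(a2, 1)   # sizeable, reflecting the ARCH(1) structure
p2 = pacf_at_lag(a2, 2)   # close to zero beyond the true order
```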

**Estimation**. Several likelihood functions are commonly used in ARCH estimation, depending on the distributional assumption of \(\varepsilon_t\). When dealing with more complicated models it is worth noting that: starting values, choice of optimisation algorithm, use of analytic or numerical derivatives, and convergence criteria all influence the resulting parameter estimates.^{2}
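For the Gaussian case, the conditional log-likelihood of an ARCH(1) model has a simple closed form, and a crude grid search can stand in for a numerical optimiser. The sketch below is an illustration under stated assumptions (simulated data; the grids, seed, and parameter values are ours), not a production estimator:

```python
import numpy as np

rng = np.random.default_rng(7)
T, alpha0_true, alpha1_true = 4000, 0.1, 0.5
a = np.zeros(T)
for t in range(1, T):
    a[t] = np.sqrt(alpha0_true + alpha1_true * a[t - 1]**2) * rng.standard_normal()

def neg_loglik(alpha0, alpha1, a):
    """Gaussian negative log-likelihood of ARCH(1), conditioning on a[0]."""
    s2 = alpha0 + alpha1 * a[:-1]**2
    return 0.5 * np.sum(np.log(s2) + a[1:]**2 / s2)

# Crude grid search in place of a numerical optimiser
grid0 = np.linspace(0.02, 0.3, 29)
grid1 = np.linspace(0.0, 0.9, 46)
_, alpha0_hat, alpha1_hat = min(
    (neg_loglik(g0, g1, a), g0, g1) for g0 in grid0 for g1 in grid1
)
```

In practice one would maximise the likelihood with a proper optimiser, which is where starting values, the algorithm, and convergence criteria enter.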

**Model Checking**. For a properly specified ARCH model, the standardised residuals are given by

\[\begin{eqnarray*} \tilde{a}_t = \frac{a_t}{\sigma_t} \end{eqnarray*}\]

which approximate a sequence of \(\mathsf{i.i.d.}\) random variables. Therefore, one can check the adequacy of a fitted ARCH model by examining the series \(\tilde{a}_t\). In particular, the Ljung-Box statistics of \(\tilde{a}_t\) can be used to check the adequacy of the mean equation, while those of \(\tilde{a}^2_t\) can be used to test the validity of the volatility equation. The skewness, kurtosis, and quantile-quantile plot (i.e. \(QQ\) plot) of \(\tilde{a}_t\) could then be used to check the validity of the distributional assumption for the residuals.
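These checks can be sketched as follows. For simplicity, the standardised residuals are formed with the true conditional standard deviations of a simulated ARCH(1) process, so that \(\tilde{a}_t\) is exactly \(\mathsf{i.i.d.}\); with estimated parameters the same diagnostics apply approximately (all parameter values and the seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
T, alpha0, alpha1 = 3000, 0.1, 0.5
a = np.zeros(T)
sig = np.zeros(T)
sig[0] = np.sqrt(alpha0 / (1 - alpha1))       # start at the unconditional level
a[0] = sig[0] * rng.standard_normal()
for t in range(1, T):
    sig[t] = np.sqrt(alpha0 + alpha1 * a[t - 1]**2)
    a[t] = sig[t] * rng.standard_normal()

a_tilde = a / sig                             # standardised residuals
s2 = a_tilde**2

# Under a correctly specified model, a_tilde is close to i.i.d. N(0, 1)
mean_ok = a_tilde.mean()                      # near 0
std_ok = a_tilde.std(ddof=1)                  # near 1
rho1 = np.corrcoef(s2[1:], s2[:-1])[0, 1]     # lag-1 autocorrelation of squares
```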

**Forecasting**. Forecasts from the ARCH model in equation (3.1) can be obtained recursively, as is the case with an AR model. Consider an ARCH(\(m\)) model at forecast origin \(t\), where the \(1\)-step ahead forecast of \(\sigma^2_{h=1}\) is

\[\begin{eqnarray*} \sigma^2_{h=1} = \alpha_0 + \alpha_1 a^2_t + \ldots + \alpha_m a^2_{t+1-m} \end{eqnarray*}\]

The \(2\)-step ahead forecast is then

\[\begin{eqnarray*} \sigma^2_{h=2} = \alpha_0+ \alpha_1 \sigma^2_{h=1} + \alpha_2 a^2_t + \ldots + \alpha_m a^2_{t+2-m} \end{eqnarray*}\]

and the \(H\)-step ahead forecast for \(\sigma^2_{h=H}\) is

\[\begin{eqnarray} \sigma^2_{h=H} = \alpha_0 + \sum^m_{i=1} \alpha_i \sigma^2_{h=H-i} \tag{3.3} \end{eqnarray}\]

where \(\sigma^2_{h=H-i} = a^2_{t+H-i}\) if \(H - i \leq 0\).
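The recursion in equation (3.3) can be sketched for a hypothetical ARCH(2) model, where the multi-step forecasts converge towards the unconditional variance \(\alpha_0/(1 - \alpha_1 - \alpha_2)\) (all numerical values are assumptions for illustration):

```python
import numpy as np

# Hypothetical ARCH(2) parameters and the last two squared shocks at origin t
alpha0, alphas = 0.1, np.array([0.3, 0.2])
shock_sq = np.array([0.5**2, (-1.2)**2])    # a_t^2, a_{t-1}^2

m = len(alphas)
forecasts = []                              # forecasts[H-1] holds the H-step forecast
for H in range(1, 51):
    s2 = alpha0
    for i in range(1, m + 1):
        if H - i >= 1:
            s2 += alphas[i - 1] * forecasts[H - i - 1]
        else:                               # H - i <= 0: use observed a_{t+H-i}^2
            s2 += alphas[i - 1] * shock_sq[i - H]
    forecasts.append(s2)

long_run = alpha0 / (1 - alphas.sum())      # unconditional variance of a_t
```

The 1-step forecast is \(0.1 + 0.3 \times 0.25 + 0.2 \times 1.44 = 0.463\), and the distant forecasts settle at the unconditional variance \(0.1/0.5 = 0.2\).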

Although the ARCH model has a simple functional form, it often requires many parameters to adequately describe the volatility process of a financial asset, when \(m\) takes on a reasonably large value. To simplify the model, Bollerslev (1986) proposes a useful extension that led to the development of the generalised ARCH (GARCH) model. As before, let \(a_t = y_t - \mu_t\) denote the residual from the mean equation for the time series \(y_t\). Then \(a_t\) follows a GARCH(\(m, s\)) model if

\[\begin{eqnarray} \nonumber a_t &=& \sigma_t \varepsilon_t \\ \sigma^2_t &=& \alpha_0 + \sum^m_{i=1} \alpha_i a^2_{t-i} + \sum^s_{j=1} \beta_j \sigma^2_{t-j} \tag{4.1} \end{eqnarray}\]

where again \(\varepsilon_t\) is a sequence of \(\mathsf{i.i.d.}\) random variables with mean \(0\) and variance \(1\), \(\alpha_0 >0\), \(\alpha_i \geq 0\), \(\beta_j \geq 0\), and \(\sum^{\max(m,s)}_{i=1} \left( \alpha_i + \beta_i \right) < 1\). Here, it is understood that \(\alpha_i = 0\) for \(i > m\) and \(\beta_j = 0\) for \(j > s\). The latter constraint on \(\alpha_i + \beta_i\) implies that the unconditional variance of \(a_t\) is finite, whereas its conditional variance \(\sigma^2_t\) evolves over time. As before, \(\varepsilon_t\) is assumed to follow a standard normal distribution, a standardised Student \(t\)-distribution, a GED, etc. Equation (4.1) reduces to a pure ARCH(\(m\)) model if \(s = 0\). The \(\alpha_i\) and \(\beta_j\) are referred to as the ARCH and GARCH parameters, respectively.

To understand the properties of GARCH models, it is informative to use the following representation. Let \(\eta_t = a^2_t - \sigma^2_t\) so that \(\sigma^2_t = a^2_t - \eta_t\). By plugging \(\sigma^2_{t-i} = a^2_{t-i} - \eta_{t-i} (i = 0, \ldots , s)\) into equation (4.1), we can rewrite the GARCH model as

\[\begin{eqnarray} a^2_t = \alpha_0 + \sum^{\max(m,s)}_{i=1} \left( \alpha_i + \beta_i \right) a^2_{t-i} + \eta_t - \sum^s_{j=1} \beta_j \eta_{t-j} \tag{4.2} \end{eqnarray}\]

It is easy to check that \(\eta_t\) is a martingale difference series (i.e. \(\mathbb{E} \big[ \eta_t \big] = 0\) and \(\mathsf{cov}(\eta_t , \eta_{t-j} ) = 0\) for \(j \geq 1\)). However, \(\eta_t\) in general is not an \(\mathsf{i.i.d.}\) sequence. Equation (4.2) would effectively take on an ARMA form for the squared series \(a^2_t\). Therefore, a GARCH model can be regarded as an application of the ARMA idea to the squared series \(a^2_t\). Using the unconditional mean of an ARMA model, we have

\[\begin{eqnarray*} \mathbb{E} \big[ a^2_t \big] = \frac{\alpha_0 }{ 1 - \sum^{\max(m,s)}_{i=1} \left( \alpha_i + \beta_i \right) } \end{eqnarray*}\]

provided the denominator of the prior fraction is positive.

The strengths and weaknesses of GARCH models can be easily seen by focusing on the simplest GARCH(\(1,1\)) model with

\[\begin{eqnarray} \sigma^2_t = \alpha_0 + \alpha_1 a^2_{t-1} + \beta_1 \sigma^2_{t-1}, \;\; 0 \leq \alpha_1, \;\; \beta_1 \leq 1, \;\; (\alpha_1+ \beta_1) < 1. \tag{4.3} \end{eqnarray}\]

Note firstly that a large \(a^2_{t-1}\) or \(\sigma^2_{t-1}\) gives rise to a large \(\sigma^2_t\). This implies that a large \(a^2_{t-1}\) tends to be followed by another large \(a^2_t\), which would generate the well-known behaviour of volatility clustering in financial time series. Secondly, it can be shown that if \(1 - 2 \alpha^2_1- ( \alpha_1+ \beta_1 )^2 >0\), then

\[\begin{eqnarray*} \frac{\mathbb{E}\big[a^4_t \big]}{ \big(\mathbb{E}\big[a^2_t \big]\big)^2} = \frac{3\big[1 - (\alpha_1+ \beta_1)^2\big]}{1 - (\alpha_1+ \beta_1)^2 - 2\alpha^2_1} > 3 \end{eqnarray*}\]

Consequently, and similar to ARCH models, the tail distribution of a GARCH(\(1,1\)) process is heavier than that of a normal distribution. Thirdly, the model provides a simple parametric function that can be used to describe the evolution of volatility in a variable.

Forecasts of a GARCH model can be obtained using methods similar to those of an ARMA model. Consider the GARCH(\(1,1\)) model in equation (4.3) and assume that the forecast origin is \(t\). For \(1\)-step ahead forecast, we have

\[\begin{eqnarray*} \sigma^2_{h=1} = \alpha_0+ \alpha_1 a^2_t + \beta_1 \sigma^2_t \end{eqnarray*}\]

where \(a_t\) and \(\sigma^2_t\) are known at the time index \(t\).

As \(\mathbb{E}\big[\varepsilon^2_{t+1}- 1|I_t \big] = 0\), the \(2\)-step ahead volatility forecast at the forecast origin \(t\) satisfies the equation

\[\begin{eqnarray*} \sigma^2_{h=2} = \alpha_0 + (\alpha_1+ \beta_1)\sigma^2_{h=1} \end{eqnarray*}\]

In general, we have

\[\begin{eqnarray} \sigma^2_{h=H} = \alpha_0 + (\alpha_1+ \beta_1) \sigma^2_{h=H-1}, \;\;\; H>1 \tag{4.4} \end{eqnarray}\]

This result is analogous to that of an ARMA(\(1,1\)) model. By repeated substitutions in equation (4.4), we obtain the \(H\)-step ahead forecast, which can be written as

\[\begin{eqnarray*} \sigma^2_{h=H} = \frac{\alpha_0 \big[ 1 - (\alpha_1 + \beta_1)^{H-1} \big]}{ 1 - \alpha_1 - \beta_1 } + \big( \alpha_1 + \beta_1 \big)^{H-1} \sigma^2_{h=1}. \end{eqnarray*}\]

Therefore,

\[\begin{eqnarray*} \sigma^2_{h=H} \rightarrow \frac{\alpha_0}{ 1 - \alpha_1 - \beta_1}, \;\;\; \text{as } H \rightarrow \infty \end{eqnarray*}\]

provided that \(\alpha_1 + \beta_1 < 1\). Consequently, the multi-step ahead volatility forecasts of a GARCH(\(1,1\)) model converge to the unconditional variance of \(a_t\), as the forecast horizon increases to infinity, provided that \(\mathsf{var}\big[ a_t \big]\) exists.
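The recursive forecasts of equation (4.4), the closed-form expression, and the convergence to the unconditional variance can be checked numerically (the GARCH(\(1,1\)) parameter values and the initial forecast below are assumptions for illustration):

```python
import numpy as np

alpha0, alpha1, beta1 = 0.05, 0.1, 0.8   # hypothetical GARCH(1,1) parameters
sigma2_1 = 0.3                           # 1-step ahead forecast at origin t
phi = alpha1 + beta1                     # persistence

# Recursive multi-step forecasts from equation (4.4)
rec = [sigma2_1]
for H in range(2, 101):
    rec.append(alpha0 + phi * rec[-1])

# Closed-form H-step forecasts from the repeated-substitution formula
H = np.arange(1, 101)
closed = alpha0 * (1 - phi**(H - 1)) / (1 - phi) + phi**(H - 1) * sigma2_1

long_run = alpha0 / (1 - alpha1 - beta1)  # unconditional variance: 0.5
```

The two sequences agree term by term, and the distant forecasts approach the unconditional variance \(0.05/0.1 = 0.5\).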

The literature on GARCH models is extremely large. For example, see Bollerslev, Chou, and Kroner (1992), Bollerslev, Engle, and Nelson (1994) and references therein. In most cases, the basic model encounters the same weaknesses as the ARCH model. For instance, it responds equally to positive and negative shocks. In addition, recent empirical studies of high frequency financial time series indicate that the tails of GARCH processes remain too short (i.e. not heavy enough), even with standardised Student \(t\) innovations.^{3}

As the volatility of a time series variable is not directly observable, comparing the forecasting performance of different volatility models can be problematic. Some researchers use out-of-sample forecasts and compare the volatility forecasts, \(\sigma^2_{h=H}\), with the realised squared shock, \(a^2_{t+H}\), to assess the forecasting performance of a volatility model. This approach often finds a low correlation coefficient between \(a^2_{t+H}\) and \(\sigma^2_{h=H}\), which would equate to a low \(R^2\) when regressing one on the other. However, such a finding is not surprising because \(a^2_{t+H}\) alone is not an adequate measure of the volatility at time index \(t+H\).

For example, consider the \(1\)-step ahead forecasts. From a statistical point of view, \(\mathbb{E} \big[ a^2_{t+1}|I_t \big] = \sigma^2_{h=1}\) so that \(a^2_{t+1}\) is a consistent estimate of \(\sigma^2_{h=1}\). However, it is not an accurate estimate, since a single observation of a random variable with a known mean value cannot provide an accurate estimate of its variance. Consequently, such an approach to evaluating the forecasting performance of volatility models is, strictly speaking, not appropriate. For more information concerning the forecasting evaluation of GARCH models, the interested reader is referred to Andersen and Bollerslev (1998).
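The point can be illustrated by simulation: even when the true conditional variance is known, the squared shock is a noisy proxy for it. The sketch below simulates a GARCH(\(1,1\)) process with assumed parameter values and reports the squared correlation between \(a^2_t\) and the true \(\sigma^2_t\).

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate a GARCH(1,1) process with assumed illustrative parameters
alpha0, alpha1, beta1 = 0.1, 0.1, 0.8
T = 20_000
a = np.empty(T)
sigma2 = np.empty(T)
sigma2[0] = alpha0 / (1 - alpha1 - beta1)  # start at the unconditional variance
a[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
for t in range(1, T):
    sigma2[t] = alpha0 + alpha1 * a[t - 1] ** 2 + beta1 * sigma2[t - 1]
    a[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

# Even against the TRUE conditional variance, the squared shock achieves
# only a modest R^2: a single squared observation is a noisy volatility proxy.
r = np.corrcoef(a**2, sigma2)[0, 1]
print(r**2)  # well below 1
```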

On the basis of equation (4.2), a two-pass estimation method can be used to estimate GARCH models. Firstly, if we ignore any ARCH effects, we could estimate the mean equation of a variable using maximum likelihood and denote the residual series by \(a_t\). Secondly, treating \(a^2_t\) as an observed time series, we could apply the maximum likelihood method to estimate the parameters in equation (4.2), denoting the AR and MA coefficient estimates by \(\hat{\phi}_i\) and \(\hat{\theta}_i\). The GARCH estimates are then obtained as \(\hat{\beta}_i = \hat{\theta}_i\) and \(\hat{\alpha}_i = \hat{\phi}_i - \hat{\theta}_i\). Obviously, such estimates are approximations to the true parameters and their statistical properties would need to be rigorously investigated. However, limited experience shows that this simple approach often provides good approximations, especially when the sample size is moderate or large.

When the AR polynomial of the GARCH representation in equation (4.2) has a unit root, then we have an integrated-GARCH (IGARCH) model. Thus, IGARCH models are unit-root GARCH models. Similar to ARIMA models, a key feature of IGARCH models is that the impact of past squared shocks \(\eta_{t-i}= a^2_{t-i} - \sigma^2_{t-i}\) for \(i >0\) on \(a^2_t\) does not dissipate.

An IGARCH(\(1,1\)) model could then be written as

\[\begin{eqnarray*} a_t = \sigma_t \varepsilon_t , \;\;\; \sigma^2_t= \alpha_0 + \beta_1 \sigma^2_{t-1} + (1 - \beta_1) a^2_{t-1} \end{eqnarray*}\]

where \(\varepsilon_t\) is defined as before and \(1>\beta_1 >0\).

Hence, the IGARCH(\(1,1\)) model is equivalent to a GARCH(\(1,1\)) when \(\alpha_1 + \beta_1= 1\). Repeated substitutions in equation (4.4) give

\[\begin{eqnarray} \sigma^2_{h=H} = \sigma^2_{h=1} + (H - 1)\alpha_0, \;\;\; H \geq 1, \tag{5.1} \end{eqnarray}\]

where \(t\) is the forecast origin. Consequently, the effect of \(\sigma^2_{h=1}\) on future volatilities is also persistent, and the volatility forecasts form a straight line with slope \(\alpha_0\). Nelson (1990) studied some probability properties of the volatility process \(\sigma^2_t\) under an IGARCH model. The process \(\sigma^2_t\) is a martingale for which some nice results are available in the literature. Under certain conditions, the volatility process is strictly stationary, but not weakly stationary because it does not have the first two moments.

The case of \(\alpha_0= 0\) is of particular interest in studying the IGARCH(\(1,1\)) model. In this case, the volatility forecasts are simply \(\sigma^2_{h=1}\) for all forecast horizons; see equation (5.1). This special IGARCH(\(1,1\)) model is the volatility model used in RiskMetrics, which is an approach for calculating value-at-risk.^{4} The model is also an exponential smoothing model for the \(a^2_t\) series. To see this, rewrite the model as

\[\begin{eqnarray*} \sigma^2_t &=& (1 - \beta_1)a^2_{t-1} + \beta_1\sigma^2_{t-1} \\ &=& (1 - \beta_1)a^2_{t-1} + \beta_1\Big[ (1 - \beta_1) a^2_{t-2}+ \beta_1 \sigma^2_{t-2} \Big] \\ &=& (1 - \beta_1)a^2_{t-1}+ (1 - \beta_1)\beta_1 a^2_{t-2} + \beta^2_1 \sigma^2_{t-2} \end{eqnarray*}\]

By repeated substitutions, we have

\[\begin{eqnarray*} \sigma^2_t = (1 - \beta_1) \big[ a^2_{t-1}+ \beta_1 a^2_{t-2} + \beta^2_1 a^2_{t-3} + \ldots \big] \end{eqnarray*}\]

which is the well-known exponential smoothing formulation with \(\beta_1\) being the discounting factor. Exponential smoothing methods can thus be used to estimate such an IGARCH(\(1,1\)) model.
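As a sketch, the recursion can be applied directly to a shock series. The discount factor \(\beta_1 = 0.94\) is the value popularised by RiskMetrics for daily data, and the simulated shocks below are assumed purely for illustration.

```python
import numpy as np

def ewma_variance(a, beta1=0.94, sigma2_0=None):
    """RiskMetrics-style exponentially weighted variance of a shock series.

    Applies sigma2_t = (1 - beta1) * a_{t-1}^2 + beta1 * sigma2_{t-1},
    i.e. the special IGARCH(1,1) model with alpha0 = 0.  beta1 = 0.94 is
    the discount factor RiskMetrics popularised for daily data.
    """
    a = np.asarray(a, dtype=float)
    sigma2 = np.empty(len(a))
    sigma2[0] = np.var(a) if sigma2_0 is None else sigma2_0
    for t in range(1, len(a)):
        sigma2[t] = (1 - beta1) * a[t - 1] ** 2 + beta1 * sigma2[t - 1]
    return sigma2

# Toy example on simulated i.i.d. shocks (assumed data, not real returns)
rng = np.random.default_rng(0)
shocks = rng.standard_normal(5_000)
sig2 = ewma_variance(shocks)
print(sig2.mean())  # hovers near the true variance of 1
```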

In several financial applications, the return of an asset may depend on its volatility. To model such a phenomenon, we could make use of the popular GARCH-M model, where the “M” stands for GARCH-in-mean. A simple example of a GARCH-M(\(1,1\)) model could be written as

\[\begin{eqnarray} \nonumber y_t &=& \mu + c \sigma^2_t + a_t \\ \nonumber a_t &=& \sigma_t \varepsilon_t \\ \sigma^2_t &=& \alpha_0 + \alpha_1 a^2_{t-1} + \beta_1 \sigma^2_{t-1} \tag{5.2} \end{eqnarray}\]

where \(\mu\) and \(c\) are constants. The parameter \(c\) is called the risk-premium parameter. A positive \(c\) indicates that the return is positively related to its past volatility. Other specifications of risk-premium have also been used in the literature, including \(y_t = \mu + c \sigma_t + a_t\) and \(y_t= \mu + c \log ( \sigma^2_t ) + a_t\).

The formulation of the GARCH-M model in equation (5.2) implies that there are serial correlations in the time series \(y_t\). These serial correlations are introduced by those in the volatility process, \(\sigma^2_t\). Therefore, the existence of risk-premium is another reason that the returns for some financial assets have serial correlations.

To allow for asymmetric effects between positive and negative asset returns, Nelson (1991) extended the GARCH framework to derive the exponential-GARCH (EGARCH) model. He considered the weighted innovation

\[\begin{eqnarray} g(\varepsilon_t ) = \theta \varepsilon_t + \gamma \Big[ |\varepsilon_t| - \mathbb{E} \big[|\varepsilon_t| \big] \Big] \tag{5.3} \end{eqnarray}\]

where \(\theta\) and \(\gamma\) are real constants. Both \(\varepsilon_t\) and \(|\varepsilon_t| - \mathbb{E} \big[|\varepsilon_t| \big]\) are zero-mean \(\mathsf{i.i.d.}\) sequences with continuous distributions. Therefore, \(\mathbb{E}[g(\varepsilon_t )] = 0\), and the asymmetry of \(g(\varepsilon_t )\) can be easily seen by rewriting it as

\[\begin{eqnarray*} g(\varepsilon_t ) = \left\{ \begin{array}{cc} (\theta + \gamma )\varepsilon_t - \gamma \mathbb{E}(|\varepsilon_t|) & \text{if } \varepsilon_t \geq 0, \\ (\theta - \gamma )\varepsilon_t - \gamma \mathbb{E}(|\varepsilon_t|) & \text{if } \varepsilon_t < 0 \end{array} \right. \end{eqnarray*}\]
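A quick Monte Carlo check illustrates both properties of \(g(\varepsilon_t)\) for Gaussian innovations, for which \(\mathbb{E}|\varepsilon_t| = \sqrt{2/\pi}\): its mean is zero, and with \(\theta < 0\) a negative shock produces a larger weighted innovation than a positive shock of the same size. The values \(\theta = -0.1\) and \(\gamma = 0.2\) are assumed for illustration.

```python
import numpy as np

def g(eps, theta=-0.1, gamma=0.2):
    """Weighted innovation of equation (5.3) for Gaussian eps.

    theta and gamma are assumed illustrative values; E|eps| = sqrt(2/pi)
    for a standard normal innovation.
    """
    return theta * eps + gamma * (np.abs(eps) - np.sqrt(2 / np.pi))

rng = np.random.default_rng(1)
eps = rng.standard_normal(1_000_000)
print(g(eps).mean())    # approximately 0, since E[g(eps)] = 0
print(g(1.0), g(-1.0))  # with theta < 0, a negative shock of the same
                        # size yields a larger weighted innovation
```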

**Remark**. For the standard Gaussian random variable \(\varepsilon_t , \mathbb{E} \big[|\varepsilon_t| \big] = \sqrt{2/\pi}\). For the standardised Student \(t\)-distribution in equation (8.1), we have

\[\begin{eqnarray*} \mathbb{E} \big[|\varepsilon_t| \big] = \frac{ 2 \sqrt{v - 2}((v + 1)/2) }{ (v - 1)(v/2) \sqrt{\pi}} \end{eqnarray*}\]

An EGARCH(\(m, s\)) model could then be written as

\[\begin{eqnarray} \nonumber a_t &=& \sigma_t \varepsilon_t, \\ \log(\sigma^2_t ) &=& \alpha_0 + \frac{1 + \beta_1 L + \ldots +\beta_{s-1}L^{s-1}}{1 - \alpha_1 L - \ldots - \alpha_m L^m } g(\varepsilon_{t-1}) \tag{5.4} \end{eqnarray}\]

where \(\alpha_0\) is a constant, \(L\) is the back-shift (or lag) operator such that \(L g(\varepsilon_t ) = g(\varepsilon_{t-1})\), and \(1 + \beta_1 L + \ldots + \beta_{s-1}L^{s-1}\) and \(1 - \alpha_1 L - \ldots - \alpha_m L^m\) are stationary polynomials and have no common factors. Again, equation (5.4) uses the usual ARMA parameterisation to describe the evolution of the conditional variance of \(a_t\). On the basis of this representation, some properties of the EGARCH model can be obtained in a similar manner as those of the GARCH model. For instance, the unconditional mean of \(\log(\sigma^2_t )\) is \(\alpha_0\). However, the model differs from the GARCH model in several ways.

Firstly, it uses logged conditional variance to relax the positiveness constraint of model coefficients. Secondly, the use of \(g(\varepsilon_t )\) enables the model to respond asymmetrically to positive and negative lagged values of \(a_t\). Some additional properties of the EGARCH model can be found in Nelson (1991).

To better understand the EGARCH model, let us consider the simple model with order (\(1,1\)):

\[\begin{eqnarray} a_t = \sigma_t \varepsilon_t , \;\; (1 - \alpha L) \log (\sigma^2_t ) = (1 - \alpha)\alpha_0 + g(\varepsilon_{t-1}) \tag{5.5} \end{eqnarray}\]

where the \(\varepsilon_t\) are \(\mathsf{i.i.d.}\) standard normal and the subscript of \(\alpha_1\) is omitted. In this case, \(\mathbb{E} \big[ |\varepsilon_t| \big] = \sqrt{2/\pi}\) and the model for \(\log(\sigma^2_t )\) becomes

\[\begin{eqnarray} (1 - \alpha L) \log(\sigma^2_t ) =\left\{ \begin{array}{cc} \alpha_{\star} + (\gamma + \theta)\varepsilon_{t-1} & \text{if } \varepsilon_{t-1} \geq 0, \\ \alpha_{\star} + (\gamma - \theta)(-\varepsilon_{t-1}) & \text{if } \varepsilon_{t-1} < 0 \end{array} \right. \tag{5.6} \end{eqnarray}\]

where \(\alpha_{\star} = (1 - \alpha)\alpha_0- \sqrt{2/\pi} \gamma\). This is a nonlinear function similar to that of the threshold autoregressive (TAR) model of Tong (1978, 1990). It suffices to say that for this simple EGARCH model the conditional variance evolves in a nonlinear manner depending on the sign of \(a_{t-1}\). Specifically, we have

\[\begin{eqnarray*} \sigma^2_t= \sigma^{2\alpha}_{t-1} \exp(\alpha_{\star}) \left\{ \begin{array}{cc} \exp \left[ (\gamma + \theta) \frac{a_{t-1}}{\sigma_{t-1}} \right] & \text{if } a_{t-1} \geq 0, \\ \exp \left[ (\gamma - \theta) \frac{|a_{t-1}|}{\sigma_{t-1}} \right] & \text{if } a_{t-1} < 0 \end{array} \right. \end{eqnarray*}\]

The coefficients \((\gamma + \theta)\) and \((\gamma - \theta)\) describe the asymmetry in response to positive and negative \(a_{t-1}\). Therefore, the model is nonlinear if \(\theta \ne 0\). Since negative shocks tend to have larger impacts, we expect \(\theta\) to be negative. For higher order EGARCH models, the nonlinearity becomes much more complicated.^{5}

For ease in estimation, one can use an alternative form for the EGARCH(\(m, s\)) model:

\[\begin{eqnarray} \log (\sigma^2_t ) = \alpha_0 + \sum^m_{i=1} \alpha_i \frac{|a_{t-i}| + \gamma_i a_{t-i}}{\sigma_{t-i}} + \sum^s_{j=1} \beta_j \log (\sigma^2_{t-j} ) \tag{5.7} \end{eqnarray}\]

Here, a positive \(a_{t-i}\) contributes \(\alpha_i (1 + \gamma_i )|\varepsilon_{t-i}|\) to the log volatility, whereas a negative \(a_{t-i}\) gives \(\alpha_i (1 - \gamma_i )|\varepsilon_{t-i}|\), where \(\varepsilon_{t-i}= a_{t-i} / \sigma_{t-i}\). The \(\gamma_i\) parameter thus signifies the leverage effect of \(a_{t-i}\). Again, we expect \(\gamma_i\) to be negative in real applications.

Another volatility model that is commonly used to handle leverage effects is the threshold generalised autoregressive conditional heteroscedastic (or TGARCH) model; see Glosten, Jagannathan, and Runkle (1993) and Zakoian (1994). Such a TGARCH(\(m, s\)) model may assume the form

\[\begin{eqnarray} \sigma^2_t = \alpha_0 + \sum^m_{i=1} (\alpha_i + \gamma_i N_{t-i} ) a^2_{t-i} + \sum^s_{j=1} \beta_j \sigma^2_{t-j} \tag{5.8} \end{eqnarray}\]

where \(N_{t-i}\) is an indicator for negative \(a_{t-i}\), that is,

\[\begin{eqnarray*} N_{t-i}= \left\{ \begin{array}{cc} 1 & \text{if } a_{t-i} < 0, \\ 0 & \text{if } a_{t-i} \geq 0 \end{array} \right. \end{eqnarray*}\]

and \(\alpha_i\), \(\gamma_i\), and \(\beta_j\) are non-negative parameters satisfying conditions similar to those of GARCH models. From the model, it is seen that a positive \(a_{t-i}\) contributes \(\alpha_i a^2_{t-i}\) to \(\sigma^2_t\), whereas a negative \(a_{t-i}\) has a larger impact \((\alpha_i+ \gamma_i ) a^2_{t-i}\) with \(\gamma_i >0\). The model uses zero as its threshold to separate the impacts of past shocks. Other threshold values can also be used.^{6} The model that is represented by (5.8) is also called the GJR model, in recognition of the contribution of Glosten, Jagannathan, and Runkle (1993).
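The asymmetry is easy to verify numerically. The sketch below implements the TGARCH(\(1,1\)) recursion of equation (5.8) with assumed parameter values and compares the next-period variance after a negative shock with that after a positive shock of the same magnitude.

```python
import numpy as np

def tgarch11_sigma2(a, alpha0=0.05, alpha1=0.05, gamma1=0.1, beta1=0.85):
    """Conditional variance of a TGARCH/GJR(1,1) model, equation (5.8).

    A negative lagged shock contributes (alpha1 + gamma1) * a^2 instead of
    alpha1 * a^2.  Parameter values are assumed for illustration.
    """
    sigma2 = np.empty(len(a))
    # Start at the approximate unconditional variance (E[N_t] = 1/2 for
    # a symmetric innovation distribution)
    sigma2[0] = alpha0 / (1 - alpha1 - gamma1 / 2 - beta1)
    for t in range(1, len(a)):
        leverage = gamma1 if a[t - 1] < 0 else 0.0
        sigma2[t] = alpha0 + (alpha1 + leverage) * a[t - 1] ** 2 + beta1 * sigma2[t - 1]
    return sigma2

# A negative shock raises next-period variance more than a positive one
neg = tgarch11_sigma2(np.array([-1.0, 0.0]))[1]
pos = tgarch11_sigma2(np.array([1.0, 0.0]))[1]
print(neg, pos)  # neg > pos
```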

The TGARCH model belongs to the class of asymmetric power autoregressive conditional heteroscedastic (APARCH) models of Ding, Granger, and Engle (1993). A general APARCH(\(m, s\)) model could be written as

\[\begin{eqnarray} \nonumber y_t &=& \mu_t + a_t , \;\; a_t = \sigma_t \varepsilon_t , \;\; \varepsilon_t \sim D(0, 1) \\ \sigma^\delta_t &=& {\omega} +\sum^m_{i=1} \alpha_i \left(|a_{t-i}| + \gamma_i a_{t-i} \right)^\delta +\sum^s_{j=1} \beta_j \sigma^\delta_{t-j} \tag{5.9} \end{eqnarray}\]

where \(\mu_t\) is the conditional mean, \(D(0, 1)\) denotes a distribution with mean zero and variance \(1\), \(\delta\) is a positive real number, and the coefficients \(\omega\), \(\alpha_i\), \(\gamma_i\), and \(\beta_j\) satisfy some regularity conditions so that the volatility is positive. Similar to GARCH models, the APARCH(\(1,1\)) model is often used in practice. Three special cases of the APARCH models are of interest. When \(\delta = 2\), the APARCH model reduces to a TGARCH model. When \(\delta = 1\), the model uses volatility directly in the volatility equation. The case of \(\delta = 0\) corresponds to the limit \(\delta \rightarrow 0\), in which the model becomes the EGARCH model of Nelson (1991).

The power function in equation (5.9) is a transformation used to improve the goodness of fit of the model. This appears to be a sensible approach if one is interested in prediction. On the other hand, except for some special values, it seems hard to find a good interpretation for the power parameter \(\delta\).

Another GARCH family model that can capture asymmetric volatility responses to past positive and negative shocks is described in Engle and Ng (1993) and Duan (1995). It takes the form

\[\begin{eqnarray} \nonumber y_t & =& \mu_t+ a_t , \;\;\;\; a_t = \sigma_t \varepsilon_t , \;\;\;\; \varepsilon_t \sim D(0, 1) \\ \sigma^2_t &=& \beta_0 + \beta_1\sigma^2_{t-1} + \beta_2(a_{t-1} - \theta \sigma_{t-1})^2 \tag{5.10} \end{eqnarray}\]

where \(\mu_t\) is the conditional mean, \(D(0, 1)\) denotes a distribution with mean \(0\) and variance \(1\), \(\beta_i\) are non-negative parameters with \(\beta_0 >0\), and \(\theta\) is a leverage parameter. The model in equation (5.10) is referred to as a non-symmetric GARCH(\(1,1\)), or NGARCH(\(1,1\)), model. It reduces to a GARCH(\(1,1\)) model if \(\theta = 0\).

To study properties of the NGARCH(\(1,1\)) model, we rewrite equation (5.10) as

\[\begin{eqnarray} \sigma^2_t = \beta_0 + \beta_1 \sigma^2_{t-1} + \beta_2\sigma^2_{t-1}(\varepsilon_{t-1} - \theta)^2 \tag{5.11} \end{eqnarray}\]

Taking expectation and using independence between \(\varepsilon_{t-1}\) and \(\sigma_{t-1}\), we have

\[\begin{eqnarray*} \mathbb{E} \big[ \sigma^2_t \big] &=& \beta_0 + \beta_1 \mathbb{E} \big[ \sigma^2_{t-1} \big] + \beta_2 \mathbb{E} \big[ \sigma^2_{t-1} \big] \mathbb{E}(\varepsilon_{t-1} - \theta)^2 \\ &=& \beta_0+ \beta_1 \mathbb{E} \big[ \sigma^2_{t-1} \big] + \beta_2 \mathbb{E} \big[ \sigma^2_{t-1} \big](1 + \theta^2) \end{eqnarray*}\]

If \(y_t\) is weakly stationary, \(\mathbb{E} \big[ \sigma^2_t \big] = \mathbb{E} \big[ \sigma^2_{t-1} \big]\) and we have

\[\begin{eqnarray*} \mathbb{E} \big[ \sigma^2_t \big] = \frac{\beta_0}{1 - \beta_1- \beta_2 (1 + \theta^2)} \end{eqnarray*}\]

which is the unconditional variance of \(y_t\). Consequently, we require that \(1 - \beta_1 - \beta_2(1 + \theta^2)>0\) for the NGARCH(\(1,1\)) model. Multiplying equation (5.11) by \(\varepsilon_{t-1}\) and taking expectation, we obtain

\[\begin{eqnarray*} \mathbb{E}(\varepsilon_{t-1}\sigma^2_t ) &=& -2 \theta \beta_2 \mathbb{E} \big[ \sigma^2_{t-1} \big] \\ &=& \frac{-2 \theta \beta_0 \beta_2 }{1 - \beta_1- \beta_2(1 + \theta^2)} \end{eqnarray*}\]

This result suggests that if \(\theta >0\) and \(\beta_2 >0\), then \(\varepsilon_{t-1}\) is negatively related to \(\sigma^2_t\). Therefore, \(\theta\) is a leverage parameter and should be positive. Finally, it can be shown that, under certain conditions, the shock \(a_t\) of an NGARCH(\(1,1\)) model has heavy tails even if \(\varepsilon_t\) is Gaussian.^{7}
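The unconditional variance expression can be checked by simulation. The sketch below simulates an NGARCH(\(1,1\)) process with assumed parameter values, chosen so that \(1 - \beta_1 - \beta_2(1 + \theta^2) > 0\), and compares the sample variance of \(a_t\) with the analytical value.

```python
import numpy as np

rng = np.random.default_rng(7)

# NGARCH(1,1) with assumed illustrative parameters; check the
# stationarity condition beta1 + beta2 * (1 + theta^2) < 1 first
beta0, beta1, beta2, theta = 0.05, 0.7, 0.1, 0.5
persistence = beta1 + beta2 * (1 + theta**2)
assert persistence < 1

# Simulate the recursion of equation (5.11) and compare the sample
# variance of a_t with beta0 / (1 - beta1 - beta2 * (1 + theta^2))
T = 400_000
a = np.empty(T)
sigma2 = np.empty(T)
sigma2[0] = beta0 / (1 - persistence)
for t in range(T):
    eps = rng.standard_normal()
    a[t] = np.sqrt(sigma2[t]) * eps
    if t + 1 < T:
        sigma2[t + 1] = beta0 + beta1 * sigma2[t] + beta2 * sigma2[t] * (eps - theta) ** 2

print(a.var(), beta0 / (1 - persistence))  # both close to each other
```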

An alternative approach to describing the evolution of volatility in a time series is to introduce a stochastic innovation to the conditional variance equation of \(a_t\). This additional stochastic term could be used to explain the unexpected shocks to the volatility process and is considered in Melino and Turnbull (1990), Taylor (1994), Harvey, Ruiz, and Shephard (1994), and Jacquier, Polson, and Rossi (1994). The resulting model is called a stochastic volatility (SV) model. Similar to EGARCH models, to ensure positiveness of the conditional variance, SV models use \(\log (\sigma^2_t )\) instead of \(\sigma^2_t\), where an SV model is defined as

\[\begin{eqnarray} \nonumber a_t &=& \sigma_t \varepsilon_t \\ (1 - \alpha_1 L -\ldots-\alpha_m L^m) \log (\sigma^2_t ) &=& \alpha_0+ v_t \tag{6.1} \end{eqnarray}\]

where the \(\varepsilon_t\) are \(\mathsf{i.i.d.}\) \(\mathcal{N}(0, 1)\), the \(v_t\) are \(\mathsf{i.i.d.}\) \(\mathcal{N}(0, \sigma^2_v )\), \(\varepsilon_t\) and \(v_t\) are independent, \(\alpha_0\) is a constant, and the zeros of the polynomial \(1 - \sum^m_{i=1} \alpha_i L^i\) lie outside the unit circle, which ensures stationarity. Adding the innovation \(v_t\) increases the flexibility of the model and its ability to describe the evolution of \(\sigma^2_t\). However, it also increases the difficulty in parameter estimation. To estimate an SV model, we need a quasi-likelihood method that may employ Kalman filtering or a Monte Carlo method. Jacquier, Polson, and Rossi (1994) provide some comparison of estimation results between quasi-likelihood and Markov chain Monte Carlo (MCMC) methods. The difficulty with estimating the parameters in an SV model is understandable, since for each shock \(a_t\) the model uses two innovations, \(\varepsilon_t\) and \(v_t\).^{8}
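As a sketch, an SV(\(1\)) model is straightforward to simulate, and the simulation illustrates that the additional innovation generates heavy tails in \(a_t\). The parameter values are assumed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# SV(1) model of equation (6.1): (1 - alpha1*L) log(sigma2_t) = alpha0 + v_t,
# with assumed illustrative parameter values
alpha0, alpha1, sigma_v = -0.2, 0.95, 0.2
T = 200_000
log_sig2 = np.empty(T)
log_sig2[0] = alpha0 / (1 - alpha1)  # unconditional mean of log(sigma2_t)
for t in range(1, T):
    log_sig2[t] = alpha0 + alpha1 * log_sig2[t - 1] + sigma_v * rng.standard_normal()
a = np.exp(log_sig2 / 2) * rng.standard_normal(T)  # a_t = sigma_t * eps_t

# The second innovation v_t makes a_t heavy-tailed relative to a Gaussian
kurt = (a**4).mean() / a.var() ** 2
print(kurt)  # exceeds the Gaussian value of 3
```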

Limited experience suggests that SV models often provide improvements in in-sample fit; however, their contributions to out-of-sample volatility forecasts are somewhat mixed.

More recently, the SV model has been extended to allow for long memory in volatility. A time series is a long-memory process if its autocorrelation function decays at a hyperbolic, instead of an exponential, rate as the lag increases. The extension to long-memory models in volatility studies is motivated by the fact that the autocorrelation function of the squared or absolute-valued series of an asset return often decays slowly, even though the return series has no serial correlation; see Ding, Granger, and Engle (1993).

A simple long-memory stochastic volatility (LMSV) model can be written as

\[\begin{eqnarray} a_t = \sigma_t \varepsilon_t , \;\;\; \sigma_t = \sigma \exp(u_t /2), \;\;\; (1 - L)^d u_t= \eta_t \tag{6.2} \end{eqnarray}\]

where \(\sigma >0\), the \(\varepsilon_t\) are \(\mathsf{i.i.d.}\) \(\mathcal{N}(0, 1)\), the \(\eta_t\) are \(\mathsf{i.i.d.}\) \(\mathcal{N}(0, \sigma^2_{\eta} )\) and independent of \(\varepsilon_t\), and \(0 < d < 0.5\). The feature of long memory stems from the fractional difference \((1 - L)^d\), which implies that the ACF of \(u_t\) decays slowly at a hyperbolic, instead of an exponential, rate as the lag increases. For the LMSV model, we have

\[\begin{eqnarray*} \log (a^2_t ) &=& \log (\sigma^2) + u_t+ \log (\varepsilon^2_t ) \\ &=& \Big[ \log (\sigma^2) + \mathbb{E} \big[ \log \varepsilon^2_t \big] \Big] + u_t+ \Big[\log (\varepsilon^2_t ) - \mathbb{E} \big[ \log \varepsilon^2_t \big] \Big] \equiv \mu + u_t+ e_t \end{eqnarray*}\]

Thus, the \(\log (a^2_t )\) series is a Gaussian long-memory signal plus a non-Gaussian white noise; see Breidt, Crato, and de Lima (1998). Estimation of the LMSV model is complicated, but the fractional difference parameter \(d\) can be estimated by using either a quasi-maximum likelihood method or a regression method. Using the log series of squared daily returns for companies in the S&P 500 index, Bollerslev and Jubinski (1999) and Ray and Tsay (2000) found that the median estimate of \(d\) is about \(0.38\). For applications, Ray and Tsay (2000) studied common long-memory components in daily stock volatilities of groups of companies classified by various characteristics. They found that companies in the same industrial or business sector tend to have more common long-memory components (e.g. big US national banks and financial institutions).
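The hyperbolic decay induced by the fractional difference can be seen directly from the series expansion of \((1 - L)^d\), whose coefficients satisfy a simple recursion; the value \(d = 0.4\) is assumed for illustration.

```python
import numpy as np

def frac_diff_weights(d, n):
    """Coefficients of (1 - L)^d expanded as a power series in L.

    pi_0 = 1 and pi_k = pi_{k-1} * (k - 1 - d) / k; for 0 < d < 0.5 the
    magnitudes decay hyperbolically, roughly like k**(-d - 1), rather
    than at the exponential rate of a stationary ARMA expansion.
    """
    w = np.empty(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 - d) / k
    return w

w = frac_diff_weights(d=0.4, n=2_000)

# Hyperbolic decay: |pi_k| * k**(1 + d) settles at a constant, whereas an
# exponentially decaying sequence would drive this product to zero
c1 = abs(w[1000]) * 1000**1.4
c2 = abs(w[1999]) * 1999**1.4
print(c1, c2)  # nearly identical
```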

In many cases, the volatilities of time series variables may be interrelated, where the effects of contemporaneous shocks are correlated with one another. Multivariate GARCH models seek to estimate the conditional volatilities of the variables simultaneously, and they also allow for an investigation into volatility spillovers. In addition, these models may be particularly useful in studies that are concerned with portfolio allocation decisions, which are influenced by the degree of covariation in stock prices or by the correlation in the volatilities of securities following a shock.

If we initially ignore the mean equations, we could consider the volatility equations for two variables:

\[\begin{eqnarray}\nonumber a_{1,t} = \sigma_{11,t} \varepsilon_{1,t}\\ \nonumber a_{2,t} = \sigma_{22,t} \varepsilon_{2,t} \end{eqnarray}\]

To allow for interrelated shocks, where \(\sigma^2_{12,t} = \mathbb{E}_{t-1}[a_{1,t} a_{2,t}]\), the multivariate *vech* model may be expressed as,

\[\begin{eqnarray} \nonumber \sigma^2_{11,t} &=& \alpha_{10} + \alpha_{11}a_{1,t-1}^{2} + \alpha_{12}a_{1,t-1}a_{2,t-1} + \alpha_{13}a_{2,t-1}^{2} + \ldots \\ \nonumber &&\ldots + \beta_{11}\sigma^2_{11,t-1} + \beta_{12}\sigma^2_{12,t-1} + \beta_{13}\sigma^2_{22,t-1} \\ \nonumber \sigma^2_{12,t} &=& \alpha_{20} + \alpha_{21}a_{1,t-1}^{2} + \alpha_{22}a_{1,t-1}a_{2,t-1} + \alpha_{23}a_{2,t-1}^{2} + \ldots \\ \nonumber &&\ldots + \beta_{21}\sigma^2_{11,t-1} + \beta_{22}\sigma^2_{12,t-1} + \beta_{23}\sigma^2_{22,t-1} \\ \nonumber \sigma^2_{22,t} &=& \alpha_{30} + \alpha_{31}a_{1,t-1}^{2} + \alpha_{32}a_{1,t-1}a_{2,t-1} + \alpha_{33}a_{2,t-1}^{2} + \ldots \\ \nonumber &&\ldots + \beta_{31}\sigma^2_{11,t-1} + \beta_{32}\sigma^2_{12,t-1} + \beta_{33}\sigma^2_{22,t-1} \\ \nonumber \end{eqnarray}\]

Hence the conditional variances \(\sigma^2_{11,t}\) and \(\sigma^2_{22,t}\) depend on:

- their own past values, \(\sigma^2_{11,t-1}\) and \(\sigma^2_{22,t-1}\)
- the lagged conditional covariance, \(\sigma^2_{12,t-1}\)
- the lagged squared errors, \(a_{1,t-1}^{2}\) and \(a_{2,t-1}^{2}\)
- the product of the lagged errors, \(a_{1,t-1} a_{2,t-1}\)

Note that this simple model, which includes only two variables, would be difficult to estimate as it has 21 coefficients (when excluding the coefficients that pertain to the mean equations). In addition, as the model complexity increases (with more lags or variables) so does the estimation problem. Such an over-parameterised model may also struggle with convergence if one of the parameters is poorly identified, which is something that will usually occur. In addition, since volatility is always positive, all the conditional variances must be positive, which would usually necessitate that all of the coefficients be non-negative, and all the implied correlation coefficients, \(\rho_{ij,t} = \sigma^2_{ij,t} / ({\sigma_{ii,t}}{\sigma_{jj,t}})\), should lie between \(\pm 1\).
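The count of 21 coefficients follows from a simple calculation: with \(k\) variables there are \(k(k+1)/2\) variance and covariance equations, each containing an intercept plus \(k(k+1)/2\) ARCH terms and \(k(k+1)/2\) GARCH terms.

```python
# Parameter count of the full vech model above: each of the k(k+1)/2
# variance/covariance equations has an intercept, k(k+1)/2 ARCH terms,
# and k(k+1)/2 GARCH terms (mean-equation coefficients excluded).
def vech_param_count(k):
    n_eq = k * (k + 1) // 2
    return n_eq * (1 + 2 * n_eq)

print(vech_param_count(2))  # 21 coefficients for two variables
print(vech_param_count(3))  # the count grows rapidly with more variables
```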

To simplify this model somewhat, Engle and Kroner (1995) transform the model so that all the parameters take on quadratic forms to ensure all the variances are positive. For example the first equation of such a BEKK multivariate GARCH model would then take the form:

\[\begin{eqnarray}\nonumber \sigma^2_{11,t} & = & \left[ \alpha^2_{10} + \alpha^2_{01}\right] + \left[ \alpha^2_{11}a_{1,t-1}^{2} + 2\alpha_{11} \alpha_{21}a_{1,t-1}a_{2,t-1} + \alpha_{21}^2 a_{2,t-1}^{2}\right] \ldots \\ \nonumber && \ldots+ \left[ \beta^2_{11}\sigma^2_{11,t-1} + 2\beta_{11}\beta_{21}\sigma^2_{12,t-1} + \beta_{21}^2 \sigma^2_{22,t-1}\right] \\ \nonumber \vdots \;\; &=& \hspace{2cm}\vdots \end{eqnarray}\]

Although this quadratic structure ensures that the conditional variances remain positive, there are still a large number of parameter estimates that would need to be obtained. In addition, this specification doesn’t overcome the problem of poorly identified parameters, and as such convergence may not be attained.

To further simplify this model and reduce the problems that are associated with parameter estimation, the *diagonal vech* model has been used in a number of empirical applications, where we restrict each \(\sigma^2_{ij,t}\) to incorporate only lags of itself and the corresponding cross-product \(a_{i,t-1}a_{j,t-1}\). This model could then be specified as,

\[\begin{eqnarray} \nonumber \sigma^2_{11,t} &=& \alpha_{10} + \alpha_{11}a_{1,t-1}^{2} + \beta_{11}\sigma^2_{11,t-1} \\ \nonumber \sigma^2_{12,t} &=& \alpha_{20} + \alpha_{22}a_{1,t-1}a_{2,t-1} + \beta_{22}\sigma^2_{12,t-1} \\ \nonumber \sigma^2_{22,t} &=& \alpha_{30} + \alpha_{33}a_{2,t-1}^{2} + \beta_{33}\sigma^2_{22,t-1} \end{eqnarray}\]

Note that although this obviously limits the interaction effects, the measure of covariation in volatility, \(\sigma^2_{12,t}\), may still be useful. However, it is worth noting that shocks to variable \(1\) would not affect \(\sigma^2_{22,t}\), which limits the degree of interaction between the two variables.

A further alternative specification of the Multivariate GARCH model is that of the constant conditional correlation model, which seeks to restrict the correlation coefficients to be constant, where for each \(i \ne j\) we have \(\sigma^2_{ij,t} = \rho_{ij}\big(\sigma_{ii,t} \sigma_{jj,t}\big)\). Hence the second equation in this model would be specified as,

\[\begin{eqnarray} \nonumber \sigma^2_{12,t} = \rho_{12}{\big(\sigma_{11,t}\sigma_{22,t}\big)} \end{eqnarray}\]

where the expressions for \(\sigma^2_{11,t}\) and \(\sigma^2_{22,t}\) are as per the *vech* model. Hence, the number of parameters to be estimated is still large (although not as many as in the *vech* model).

To conclude this section: while multivariate extensions of these models provide for interesting investigations, it should be noted that it is not easy to implement such a model effectively. In most cases you will find that unless a specific postulate calls for a particular specification, you may need to estimate most of the above models, to consider various types of interaction (or to potentially identify where a parameter may be poorly identified). You would usually want to allow for more (rather than less) interaction, but if the parameters are poorly identified then your results should be interpreted with caution. Of course, there are many other variations of multivariate GARCH models that place alternative restrictions on the number of parameters. An example of an alternative model is the GO-GARCH, which you may wish to consider.

Under the normality assumption, the likelihood function of an ARCH(\(m\)) model is

\[\begin{eqnarray*} f (a_1, \ldots , a_T|\alpha) &=& f (a_T|I_{T-1})f (a_{T-1}|I_{T-2}) \ldots f (a_{m+1}|I_m) f (a_1, \ldots , a_m|\alpha) \\ &=& \prod^T_{t=m+1} \frac{1}{\sqrt{2 \pi \sigma^2_t} } \exp \left( - \frac{a^2_t}{2\sigma^2_t} \right) \times f (a_1, \ldots , a_m|\alpha), \end{eqnarray*}\]

where \(\alpha = (\alpha_0, \alpha_1, \ldots , \alpha_m)^{\prime}\) and \(f (a_1, \ldots , a_m|\alpha)\) is the joint probability density function of \(a_1, \ldots , a_m\). As the exact form of \(f (a_1, \ldots , a_m|\alpha)\) is complicated, it is commonly dropped from the prior likelihood function, especially when the sample size is sufficiently large. This results in using the conditional likelihood function

\[\begin{eqnarray*} f \Big(a_{m+1}, \ldots , a_T|\alpha, a_1, \ldots , a_m \Big) = \prod^T_{t=m+1} \frac{1}{\sqrt{2 \pi \sigma^2_t}} \exp \left( - \frac{a^2_t}{2\sigma^2_t} \right) \end{eqnarray*}\]

where \(\sigma^2_t\) can be evaluated recursively. We refer to estimates obtained by maximizing the prior likelihood function as the conditional maximum likelihood estimates (MLEs) under normality.

Maximizing the conditional likelihood function is equivalent to maximizing its logarithm, which is easier to handle. The conditional log likelihood function is

\[\begin{eqnarray*} \ell \Big( a_{m+1}, \ldots , a_T|\alpha, a_1, \ldots , a_m \Big) = \sum^T_{t=m+1} \left[ - \frac{1}{2} \log (2\pi) - \frac{1}{2} \log (\sigma^2_t ) - \frac{1}{2} \frac{a^2_t}{\sigma^2_t} \right] \end{eqnarray*}\]

As the first term \(\log(2\pi)\) does not involve any parameters, the log likelihood function becomes

\[\begin{eqnarray*} \ell \big(a_{m+1}, \ldots , a_T|\alpha, a_1, \ldots , a_m \big) = - \sum^T_{t=m+1} \left[ \frac{1}{2} \log( \sigma^2_t ) + \frac{1}{2} \frac{a^2_t}{\sigma^2_t} \right] \end{eqnarray*}\]

where \(\sigma^2_t = \alpha_0 + \alpha_1 a^2_{t-1} + \ldots + \alpha_m a^2_{t-m}\) can be evaluated recursively.
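As a sketch, the conditional log likelihood above can be evaluated recursively in a few lines; the parameter vector and toy data below are assumed purely for illustration.

```python
import numpy as np

def arch_gaussian_loglik(alpha, a):
    """Conditional Gaussian log likelihood of an ARCH(m) model
    (constant log(2*pi) terms dropped, as in the text).

    alpha = (alpha_0, alpha_1, ..., alpha_m); the first m observations
    condition the likelihood and contribute no terms.
    """
    alpha = np.asarray(alpha, dtype=float)
    a = np.asarray(a, dtype=float)
    m = len(alpha) - 1
    ll = 0.0
    for t in range(m, len(a)):
        # sigma2_t = alpha_0 + sum_i alpha_i * a_{t-i}^2, evaluated recursively
        sigma2_t = alpha[0] + np.dot(alpha[1:], a[t - m:t][::-1] ** 2)
        ll -= 0.5 * np.log(sigma2_t) + 0.5 * a[t] ** 2 / sigma2_t
    return ll

# Toy check on a short series with an assumed ARCH(1) parameter vector
a = np.array([0.5, -1.0, 0.8, 0.3])
loglik = arch_gaussian_loglik([0.2, 0.3], a)
print(loglik)
```

In practice this function would be handed to a numerical optimiser to obtain the conditional MLEs.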

In some applications, it is more appropriate to assume that \(\varepsilon_t\) follows a heavy-tailed distribution such as a standardised Student \(t\)-distribution. Let \(x_v\) follow a Student \(t\)-distribution with \(v\) degrees of freedom. Then \(\mathsf{var} \big[ x_v \big] = v/(v - 2)\) for \(v >2\), and we use \(\varepsilon_t= x_v / \sqrt{v/(v - 2)}\). The probability density function of \(\varepsilon_t\) is

\[\begin{eqnarray} f \big( \varepsilon_t|v \big) = \frac{\Gamma((v + 1)/2)}{\Gamma (v/2) \sqrt{(v - 2) \pi }} \left( 1 + \frac{\varepsilon^2_t}{v - 2} \right)^{-(v+1)/2}, \;\; v >2, \tag{8.1} \end{eqnarray}\]

where \(\Gamma(x)\) is the usual gamma function (i.e. \(\Gamma(x) = \int^{\infty}_0 y^{x-1} e^{-y} dy\)). Using \(a_t = \sigma_t \varepsilon_t\), we obtain the conditional likelihood function of \(a_t\) as

\[\begin{eqnarray*} f \big( a_{m+1}, \ldots , a_T|\alpha, A_m \big) = \prod^T_{t=m+1} \frac{ \Gamma ((v + 1)/2)}{ \Gamma (v/2) \sqrt{(v - 2) \pi} } \frac{1}{\sigma_t} \left( 1 + \frac{a^2_t}{(v - 2)\sigma^2_t} \right)^{-(v+1)/2} \end{eqnarray*}\]

where \(v > 2\) and \(A_m = (a_1, a_2, \ldots , a_m)\). We refer to the estimates that maximise the prior likelihood function as the conditional MLEs under the \(t\)-distribution. The degrees of freedom of the \(t\)-distribution can be specified a priori or estimated jointly with other parameters. A value between \(3\) and \(6\) is often used if it is prespecified.

If the degrees of freedom \(v\) of the Student \(t\)-distribution is prespecified, then the conditional log likelihood function is

\[\begin{eqnarray}\nonumber \ell \big( a_{m+1}, \ldots , a_T |\alpha, A_m \big) = - \sum^T_{t=m+1} \left[ \frac{v + 1}{2} \log \left( 1 + \frac{a^2_t}{(v - 2)\sigma^2_t} \right) + \frac{1}{2} \log (\sigma^2_t )\right] \\ \tag{8.2} \end{eqnarray}\]

If one wishes to estimate \(v\) jointly with other parameters, then the log likelihood function becomes

\[\begin{eqnarray*} &&\ell \big(a_{m+1}, \ldots , a_T|\alpha, v, A_m \big) \\ && \;\; = \big(T - m \big) \Big[ \log \Big( \Gamma \big( (v + 1)/2 \big) \Big) - \log \big( \Gamma (v/2) \big) - 0.5 \log \big( (v - 2) \pi \big) \Big] \\ &&\hspace{0.7cm} + \ell \big( a_{m+1}, \ldots , a_T|\alpha, A_m \big) \end{eqnarray*}\]

where the second term is given in equation (8.2).

Besides fat tails, empirical distributions of asset returns may also be skewed. To handle this additional characteristic of asset returns, the Student \(t\)-distribution has been modified to become a skew Student \(t\)-distribution. There are multiple versions of the skew Student \(t\)-distribution, but we shall adopt the approach of Fernandez and Steel (1998), which can introduce skewness into any continuous uni-modal and symmetric (with respect to \(0\)) univariate distribution. Specifically, for the innovation \(\varepsilon_t\) of an ARCH process, Lambert and Laurent (2001) apply the method of Fernandez and Steel (1998) to the standardised Student \(t\)-distribution in equation (8.1) to obtain a standardised skew Student \(t\)-distribution. The resulting probability density function is

\[\begin{eqnarray} g( \varepsilon_t | \xi , v) = \left\{ \begin{array}{cc} \frac{2}{ \xi + \frac{1}{\xi} } \varrho f \Big[ \xi \big( \varrho \varepsilon_t + \varpi \big)|v \Big] & \;\; \text{if } \varepsilon_t < -\varpi / \varrho \\ \frac{2}{ \xi + \frac{1}{\xi} } \varrho f \Big[ \big( \varrho \varepsilon_t + \varpi \big) / \xi |v \Big] & \;\; \text{if } \varepsilon_t \geq -\varpi / \varrho \end{array} \right. \tag{8.3} \end{eqnarray}\]

where \(f (\cdot)\) is the pdf of the standardised Student \(t\)-distribution in equation (8.1), \(\xi\) is the skewness parameter, \(v >2\) is the degrees of freedom, and the parameters \(\varrho\) and \(\varpi\) are given below

\[\begin{eqnarray*} \varpi = \frac{\Gamma \Big( \big( v - 1 \big)/2 \Big) \sqrt{v - 2}}{ \sqrt{\pi} \Gamma (v/2)} \Big( \xi - \frac{1}{ \xi} \Big) \\ \varrho^2 = \left( \xi^2 + \frac{1}{ \xi^2}- 1 \right) - \varpi^2 \end{eqnarray*}\]

In equation (8.3), \(\xi^2\) is equal to the ratio of probability masses above and below the mode of the distribution and, hence, it is a measure of the skewness.
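As a numerical check on equation (8.3), the density below (a sketch; the helper names are my own) should integrate to one, and the ratio of the probability masses above and below the mode should equal \(\xi^2\).

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

def std_t_pdf(x, v):
    """Standardised (unit-variance) Student-t density, as in equation (8.1)."""
    logc = gammaln((v + 1) / 2) - gammaln(v / 2) - 0.5 * np.log((v - 2) * np.pi)
    return np.exp(logc) * (1 + x**2 / (v - 2)) ** (-(v + 1) / 2)

def skew_t_pdf(eps, xi, v):
    """Standardised skew Student-t density of equation (8.3)."""
    varpi = (
        np.exp(gammaln((v - 1) / 2) - gammaln(v / 2))
        * np.sqrt(v - 2) / np.sqrt(np.pi) * (xi - 1 / xi)
    )
    varrho = np.sqrt(xi**2 + 1 / xi**2 - 1 - varpi**2)
    z = varrho * eps + varpi
    scale = 2 / (xi + 1 / xi) * varrho
    if eps < -varpi / varrho:
        return scale * std_t_pdf(xi * z, v)
    return scale * std_t_pdf(z / xi, v)

# numerical checks for illustrative values xi = 1.5 and v = 5
xi, v = 1.5, 5.0
varpi = (
    np.exp(gammaln((v - 1) / 2) - gammaln(v / 2))
    * np.sqrt(v - 2) / np.sqrt(np.pi) * (xi - 1 / xi)
)
varrho = np.sqrt(xi**2 + 1 / xi**2 - 1 - varpi**2)
mode = -varpi / varrho  # the switch point in equation (8.3)
below = quad(lambda x: skew_t_pdf(x, xi, v), -np.inf, mode)[0]
above = quad(lambda x: skew_t_pdf(x, xi, v), mode, np.inf)[0]
```

Here `above / below` recovers \(\xi^2 = 2.25\), confirming the interpretation of \(\xi\) as the skewness parameter; \(\xi = 1\) returns the symmetric standardised Student \(t\)-distribution.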

Andersen, T.G., and T. Bollerslev. 1998. “Answering the Skeptics: Yes, Standard Volatility Models Do Provide Accurate Forecasts.” *International Economic Review* 39: 885–905.

Bollerslev, T. 1986. “Generalized Autoregressive Conditional Heteroskedasticity.” *Journal of Econometrics* 31: 307–27.

Bollerslev, T., R.Y. Chou, and K.F. Kroner. 1992. “ARCH Modeling in Finance.” *Journal of Econometrics* 52: 5–59.

Bollerslev, T., R.F. Engle, and D.B. Nelson. 1994. “ARCH Models.” In *Handbook of Econometrics*, Volume 4, edited by R.F. Engle and D.L. McFadden, 2959–3038. Amsterdam: Elsevier Science.

Bollerslev, T., and D. Jubinski. 1999. “Equity Trading Volume and Volatility: Latent Information Arrivals and Common Long-Run Dependencies.” *Journal of Business and Economic Statistics* 17: 9–21.

Breidt, F.J., N. Crato, and P. de Lima. 1998. “On the Detection and Estimation of Long Memory in Stochastic Volatility.” *Journal of Econometrics* 83: 325–48.

Cao, C., and R.S. Tsay. 1992. “Nonlinear Time Series Analysis of Stock Volatilities.” *Journal of Applied Econometrics* 7: 165–85.

Ding, Z., C.W.J. Granger, and R.F. Engle. 1993. “A Long Memory Property of Stock Returns and a New Model.” *Journal of Empirical Finance* 1: 83–106.

Duan, J. 1995. “The GARCH Option Pricing Model.” *Mathematical Finance* 5: 13–32.

Engle, R.F. 1982. “Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation.” *Econometrica* 50: 987–1007.

Engle, R.F., and K. Kroner. 1995. “Multivariate Simultaneous Generalized ARCH.” *Econometric Theory* 11(1): 122–50.

Engle, R.F., and V. Ng. 1993. “Measuring and Testing the Impact of News on Volatility.” *Journal of Finance* 48: 1749–78.

Fernandez, C., and M.F.J. Steel. 1998. “On Bayesian Modelling of Fat Tails and Skewness.” *Journal of the American Statistical Association* 93: 359–71.

Glosten, L.R., R. Jagannathan, and D.E. Runkle. 1993. “On the Relation Between the Expected Value and the Volatility of Nominal Excess Return on Stocks.” *Journal of Finance* 48: 1779–1801.

Harvey, A.C., E. Ruiz, and N. Shephard. 1994. “Multivariate Stochastic Variance Models.” *Review of Economic Studies* 61: 247–64.

Jacquier, E., N.G. Polson, and P. Rossi. 1994. “Bayesian Analysis of Stochastic Volatility Models (with Discussion).” *Journal of Business and Economic Statistics* 12: 371–417.

Lambert, P., and S. Laurent. 2001. “Modelling Financial Time Series Using Garch-Type Models and a Skewed Student Density.” Working paper. Universite de Liege.

McLeod, A.I., and W.K. Li. 1983. “Diagnostic Checking ARMA Time Series Models Using Squared-Residual Autocorrelations.” *Journal of Time Series Analysis* 4: 269–73.

Melino, A., and S.M. Turnbull. 1990. “Pricing Foreign Currency Options with Stochastic Volatility.” *Journal of Econometrics* 45: 239–65.

Nelson, D.B. 1990. “Stationarity and Persistence in the GARCH(1,1) Model.” *Econometric Theory* 6: 318–34.

———. 1991. “Conditional Heteroskedasticity in Asset Returns: A New Approach.” *Econometrica* 59: 347–70.

Ray, B.K., and R.S. Tsay. 2000. “Long-Range Dependence in Daily Stock Volatilities.” *Journal of Business and Economic Statistics* 18: 254–62.

Taylor, S.J. 1994. “Modeling Stochastic Volatility: A Review and Comparative Study.” *Mathematical Finance* 4: 183–204.

Tong, H. 1978. “On a Threshold Model.” In *Pattern Recognition and Signal Processing*, edited by C.H. Chen. Amsterdam: Sijthoff & Noordhoff.

———. 1990. *Non-Linear Time Series: A Dynamical System Approach*. Oxford: Oxford University Press.

Tsay, R.S. 2010. *Analysis of Financial Time Series*. Hoboken (NJ): John Wiley & Sons.

Zakoian, J.M. 1994. “Threshold Heteroscedastic Models.” *Journal of Economic Dynamics and Control* 18: 931–55.

Tsay (2010) contains additional details on this representation.↩

Further discussion of the likelihood function for the ARCH(\(m\)) model is contained in the appendix.↩

For further information about kurtosis of GARCH models, see Tsay (2010).↩

See Cao and Tsay (1992) for details on how to obtain multi-step ahead volatility forecasts for nonlinear models (including EGARCH models).↩

See Tsay (2010) for the general concept of threshold models.↩

Tsay (2010) contains details relating to the use of MCMC methods to estimate SV models. For more discussions on SV models, see Taylor (1994).↩