Most macroeconomic and many financial variables are non-stationary. They drift upwards over time and often exhibit characteristics that suggest they contain a stochastic trend. In previous chapters, we considered various tests that could be used to determine whether or not a time series contains a stochastic trend. Variables that are integrated of the first or a higher order can be transformed into stationary variables by differencing the data.

In this chapter we will consider a selection of methods that permit us to work with non-stationary data, where the integrated variables are not transformed into stationary counterparts. Our attention will be focused on the conditions that are used to analyse cointegrated variables, where two or more time series are said to be cointegrated if they share a common stochastic trend. The idea was first described in Engle and Granger (1987), who later shared the 2003 Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel, awarded in part for methods of analysing time series variables with common trends.¹

This methodology can be used to derive more informative models, where we are able to make use of information that pertains to the stochastic trend. If the variables that we are seeking to model are cointegrated, we might not need to make the variables stationary, through differencing, and as a result the information that relates to the stochastic trend is not lost.

Moreover in economic theory, the relationship between economic variables may be described by a non-stationary trend, which could reflect the general equilibrium, while short-term departures from this trend could be modelled with the aid of an error-correction representation. As such, the concept of cointegration allows us to specify econometric models that are directly linked to economic theory.

The results of Engle and Granger (1987) have opened up a series of new applications for studying non-stationary time series data. Moreover, these results, and subsequent statistical developments have enhanced the popularity of the VAR methodology that was developed by Sims (1980).

1 Cointegration defined

In most cases, when the variables \(y_{1,t}\) and \(y_{2,t}\) are non-stationary \(I(1)\) variables, a linear combination of these variables will also be non-stationary. However, in a few cases the linear combination of these variables may be stationary. This happens when the variables share the same stochastic trends, which are cancelled out when combined. In these cases, we say that the variables are cointegrated.

To see how this translates into practice, consider two variables, \(y_{1,t}\) and \(y_{2,t}\), which are integrated of the first order, \(I(1)\). When regressing these variables on one another, we could rearrange the linear regression model, such that

\[\begin{eqnarray*} u_{t}=y_{1,t}-\beta_{1}y_{2,t} \end{eqnarray*}\]

Now if the error term, \(u_{t}\), is stationary, \(I(0)\), then by definition the combination \(y_{1,t}-\beta_{1}y_{2,t}\) must also be stationary, since the properties of the left-hand side must equal the properties of the right-hand side. Hence, while both \(y_{1,t}\) and \(y_{2,t}\) have stochastic trends, we say that the variables \(y_{1,t}\) and \(y_{2,t}\) are cointegrated, as the linear combination \(y_{1,t}-\beta_{1}y_{2,t}\) has the same statistical properties as an \(I(0)\) variable. Note that the stochastic trends are related through \(\beta_1\), which captures this feature (relating to the common stochastic trends) of the data. Of course, if \(u_{t}\) is non-stationary, \(I(1)\), as would usually be the case, then \(y_{1,t}\) and \(y_{2,t}\) are not cointegrated and regressing \(y_{1,t}\) on \(y_{2,t}\) would yield a spurious result.

Figure 1 provides a graphical illustration of two variables that have been simulated with the equations \(y_{1,t} = 0.1 + \mu_{y_{1t}} + \upsilon_{y_{1t}}\) and \(y_{2,t} = 0.3 + \mu_{y_{2t}} + \upsilon_{y_{2t}}\). Both variables have a positive drift, so the two series increase over time, as they are characterised as random walks plus drift, where \(\mu_{y_{it}}\) is the stochastic trend. Regressing \(y_{1,t}\) on \(y_{2,t}\) would usually produce a spurious regression, as in the panels on the left-hand side of Figure 1, where the error is non-stationary (i.e. a shock has a permanent effect on the error term). However, in certain cases when we regress \(y_{1,t}\) on \(y_{2,t}\) we produce a stationary error, which is reflected in the right-hand-side panel of Figure 1. In this case the effects of the stochastic trends have been removed.

Figure 1: Non-stationary variables
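The contrast between a spurious and a cointegrating regression can be sketched with a small simulation. The following is a minimal illustration rather than the code behind Figure 1: the loadings, drifts, sample size and helper names (`ols_residuals`, `ar1_coef`) are all assumptions made for this example. It compares the persistence of the regression residuals for a pair of series that share a stochastic trend with that of a pair of independent random walks.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 2000

def ols_residuals(y, x):
    """Residuals from regressing y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

def ar1_coef(u):
    """Least-squares AR(1) coefficient of u (no constant)."""
    return float(u[1:] @ u[:-1] / (u[:-1] @ u[:-1]))

# Common stochastic trend: a random walk with drift.
trend = np.cumsum(0.1 + rng.standard_normal(T))

# Cointegrated pair: both series load on the same trend plus stationary noise.
y2_c = trend + rng.standard_normal(T)
y1_c = 0.5 * trend + rng.standard_normal(T)

# Unrelated pair: two independent random walks with drift.
y1_s = np.cumsum(0.1 + rng.standard_normal(T))
y2_s = np.cumsum(0.3 + rng.standard_normal(T))

rho_coint = ar1_coef(ols_residuals(y1_c, y2_c))
rho_spur = ar1_coef(ols_residuals(y1_s, y2_s))
print(f"AR(1) coefficient of residuals, shared trend:      {rho_coint:.3f}")
print(f"AR(1) coefficient of residuals, independent walks: {rho_spur:.3f}")
```

A residual autoregressive coefficient close to one indicates that shocks to the error are (nearly) permanent, which is the spurious case; a coefficient well below one is what we would expect when the stochastic trends cancel.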

To relate these results to a general specification, we make use of matrix algebra, where \({\bf{y}}_{t}=(y_{1,t},y_{2,t})^{\prime}\) is a \((2\times1)\) vector of \(I(1)\) variables. The coefficient vector may then be given as \(\beta=(1,-\beta_{1})^{\prime}\), so that the relationship between the variables could be summarised as \(\beta^{\prime} {\bf{y}}_{t} = y_{1,t}-\beta_{1}y_{2,t}\). These variables will then be cointegrated when \(\beta^{\prime}{\bf{y}}_{t}\sim I(0)\).

The linear combination of variables that is used to derive \(\beta^{\prime}{\bf{y}}_{t}\) will typically be motivated by economic theory and is often referred to as the long-run equilibrium relationship. The idea is that variables in the \({\bf{y}}_{t}\) vector will drift together as they follow some form of long-run equilibrium. The vector \(\beta\) is termed the cointegrating vector, which summarises the relationship between the stochastic trends. When the components of \({\bf{y}}_{t}\) are integrated of order \(d\) and the reduction in the order of the combined variables is \(b\), then we note that \({\bf{y}}_{t}\sim CI(d, b)\). We will see that the concept of cointegration can easily be extended to a setting with \(n\) variables. However, as we will discuss further, with \(n\) variables there can be at most \(n-1\) linearly independent combinations that are responsible for cointegrating relationships.

1.1 Cointegration and common trends

Stock and Watson (1988) show that it is possible to express \(y_{1,t}\) and \(y_{2,t}\), which represent \(I(1)\) variables, as

\[\begin{eqnarray*} y_{1,t} &=& \mu_{y_{1t}} + \upsilon_{y_{1t}} \\ y_{2,t} &=& \mu_{y_{2t}} + \upsilon_{y_{2t}} \end{eqnarray*}\]

where \(\mu_{y_{it}}\) is the random walk component representing the trend in variable \(y_{i,t}\), and \(\upsilon_{y_{it}}\) is the stationary component. We are then able to multiply \(y_{1,t}\) by \(\beta_1\) and \(y_{2,t}\) by \(\beta_2\) to yield

\[\begin{eqnarray*} \beta_1 y_{1,t} &=& \beta_1 \mu_{y_{1t}} + \beta_1 \upsilon_{y_{1t}} \\ \beta_2 y_{2,t} &=& \beta_2 \mu_{y_{2t}} + \beta_2 \upsilon_{y_{2t}} \end{eqnarray*}\]

If these variables are \(CI(1,1)\), then a linear combination of these variables yields;

\[\begin{eqnarray*} \beta_{1}y_{1,t}+\beta_{2} y_{2,t}& = &\beta_{1}(\mu_{y_{1,t}}+\upsilon_{y_{1,t}})+\beta_{2}(\mu_{y_{2,t}}+\upsilon_{y_{2,t}})\\ & =& (\beta_{1}\mu_{y_{1,t}}+\beta_{2}\mu_{y_{2,t}})+(\beta_{1}\upsilon_{y_{1,t}}+\beta_{2}\upsilon_{y_{2,t}}) \end{eqnarray*}\]

If the irregular component, \((\beta_{1}\upsilon_{y_{1,t}}+\beta_{2}\upsilon_{y_{2,t}})\), is stationary and the linear combination of the variables, \(\beta_{1}y_{1,t}+\beta_{2}y_{2,t}\), is also stationary, then the stochastic trends \((\beta_{1}\mu_{y_{1,t}}+\beta_{2}\mu_{y_{2,t}})\) would need to vanish. Hence, for \(y_{1,t}\) and \(y_{2,t}\) to be \(CI(1,1)\),

\[\begin{eqnarray*} \mu_{y_{1,t}} = \frac{-\beta_{2}\mu_{y_{2,t}}}{\beta_{1}} \end{eqnarray*}\]

This implies that the two variables must have the same stochastic trend up to the scalar \({-\beta_{2}}/{\beta_{1}}\). The essential insight that is provided by Stock and Watson (1988) is that the parameters in the cointegrating vector must purge the trend from the linear combination of the variables. Such a cointegrating vector is unique only up to a scalar multiple, since the condition pins down the ratio \({-\beta_{2}}/{\beta_{1}}\) rather than the individual values of \(\beta_{1}\) and \(\beta_{2}\).
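The way in which the cointegrating vector purges the trend can be checked numerically. The sketch below assumes a hypothetical vector \((\beta_1, \beta_2) = (1, -2)\) and builds the trend of \(y_{1,t}\) from the trend of \(y_{2,t}\) using the condition above; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 1000

beta = np.array([1.0, -2.0])               # hypothetical (beta_1, beta_2)
mu2 = np.cumsum(rng.standard_normal(T))    # stochastic trend of y2
mu1 = -beta[1] / beta[0] * mu2             # trend of y1 implied by the condition above

v1 = rng.standard_normal(T)                # stationary components
v2 = rng.standard_normal(T)
y = np.column_stack([mu1 + v1, mu2 + v2])

combo = y @ beta                           # beta_1 * y1 + beta_2 * y2
stationary_part = beta[0] * v1 + beta[1] * v2

# The trends cancel, so only the stationary components remain.
trend_purged = np.allclose(combo, stationary_part)

# Any scalar multiple of beta purges the trend equally well.
lam = 3.5
scaled_purged = np.allclose(y @ (lam * beta), lam * stationary_part)
print(trend_purged, scaled_purged)
```

The second check illustrates why some normalisation (for example, setting the coefficient on one variable to one) is needed before a cointegrating vector can be reported.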

This analysis could also be extended to the \(n\) variable case, with the aid of the regression

\[\begin{equation*} {\bf{y}}_{t}=\mu_{t} + {\bf{u}}_{t} \end{equation*}\]

where \({\bf{y}}_{t}\) is a vector of \(\{y_{1_{t}},y_{2_{t}}, \ldots, y_{n_{t}}\}\) variables that are integrated of the first-order, and \(\mu_{t}\) is a vector of stochastic trends \(\{\mu_{1_{t}},\mu_{2_{t}}, \ldots, \mu_{n_{t}}\}\), and \({\bf{u}}_{t}\) is an \(n \times 1\) vector of irregular components.

In addition, if we can then express one trend as a linear combination of other trends it means that there exists a vector \(\beta\) such that

\[\begin{equation*} \beta_{1}\mu_{1_{t}}+\beta_{2}\mu_{2_{t}}+ \ldots +\beta_{n}\mu_{n_{t}}=0 \end{equation*}\]

where we once again multiply through by \(\beta^{\prime}\) to get \(\beta^{\prime} {\bf{y}}_{t}=\beta^{\prime} \mu_{t} + \beta^{\prime} {\bf{u}}_{t}\). Since \(\beta^{\prime}\mu_{t}=0\), we are left with \(\beta^{\prime} {\bf{y}}_{t} = \beta^{\prime} {\bf{u}}_{t}\), where both sides are stationary.

A cointegration model may make use of the term equilibrium which refers to the existence of a long-run relationship. This type of equilibrium can only occur if there is a common stochastic trend amongst the variables (i.e. two variables share a common equilibrium path). These variables will periodically move away from the equilibrium path, but the effect of this will not be permanent (i.e. the errors are stationary). Over time, the variables return towards the equilibrium path and the residuals in the cointegrated model are then described as equilibrium errors.

2 Error Correction Models

Where a cointegrating relationship may be used to define an equilibrium relationship, the time paths of cointegrated variables are influenced by the extent of any deviation from the long-run equilibrium. If the variables are cointegrated then they will return towards the equilibrium values, although they need not actually attain these values at a particular point in time. What is essential is that there is a force that will draw the variables towards the equilibrium values, so that the deviation from equilibrium (following a shock) is not permanent.

The deviation of a cointegrated variable from the path of equilibrium may be modelled with the aid of an error correction representation. Engle and Granger formalised the connection between this dynamic response to the errors and cointegration in the Engle-Granger representation theorem, which states that two variables are cointegrated if, and only if, there exists an error correction mechanism for the set of variables.

2.1 Example: Two cointegrated share prices

By way of example, consider two share prices, \(P_1\) and \(P_2\), which are cointegrated. Assume that the gap between the prices during the current period is relatively large when compared to the long-run equilibrium values (i.e. we are currently not at a point of equilibrium). In this case the low-priced share \(P_2\) must rise relative to the high-priced share \(P_1\). This may be accomplished by an increase in \(P_2\) or a decrease in \(P_1\), an increase in \(P_1\) with a larger increase in \(P_2\), or a decrease in \(P_1\) with a smaller decrease in \(P_2\).

The regression that describes the relative movements in the two share prices could then take the form,

\[\begin{eqnarray} P_{1,t}=\beta_{1}P_{2,t}+u_{t} \tag{2.1} \end{eqnarray}\]

If the errors, \(u_t\), are stationary then they may be described by the autoregression,

\[\begin{eqnarray} u_{t}=\phi_{1} u_{t-1}+\varepsilon_{t} \;\;\; \mathsf{with } \; |\phi_{1}| < 1 \tag{2.2} \end{eqnarray}\]

Hence after writing (2.1) as, \(u_{t} = P_{1,t}-\beta_{1}P_{2,t}\), and inserting it into (2.2), we have

\[\begin{eqnarray*} P_{1,t}- \beta_{1}P_{2,t} & = &\phi_{1}(P_{1,t-1}- \beta_{1}P_{2,t-1})+\varepsilon_{t} \\ P_{1,t} & = &\beta_{1}P_{2,t}+\phi_{1}(P_{1,t-1}- \beta_{1}P_{2,t-1})+\varepsilon_{t} \end{eqnarray*}\]

Subtracting \(P_{1,t-1}\) from both sides, and adding and subtracting \(\beta_{1}P_{2,t-1}\) on the right-hand side, provides us with,

\[\begin{eqnarray*} \Delta P_{1,t} &=& -(1-\phi_{1})(P_{1,t-1}- \beta_{1}P_{2,t-1})+ (\beta_{1}\Delta P_{2,t} + \varepsilon_{t}) \\ &=&\alpha(P_{1,t-1}- \beta_{1}P_{2,t-1})+\varepsilon_{1,t} \end{eqnarray*}\]

where \(\alpha=-(1-\phi_{1})\), while \(\Delta P_{2,t}\) is stationary and \(\varepsilon_{1,t} = (\beta_{1}\Delta P_{2,t} + \varepsilon_{t})\). Note that large persistence in the autoregressive error would imply a slow speed of adjustment. This representation is termed the error correction mechanism (ECM), which describes the manner in which the variables return to equilibrium. It could also be used to illustrate how the variables are influenced by deviations from equilibrium.
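Since the error correction form is only a rearrangement of (2.1) and (2.2), it should hold exactly for any data generated by those two equations. The following sketch verifies this for simulated prices; the values of \(\beta_1\) and \(\phi_1\), the sample size and the variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
T, beta1, phi1 = 500, 1.5, 0.8

eps = rng.standard_normal(T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = phi1 * u[t - 1] + eps[t]       # equation (2.2)

P2 = np.cumsum(rng.standard_normal(T))    # an I(1) price
P1 = beta1 * P2 + u                       # equation (2.1)

dP1 = np.diff(P1)
dP2 = np.diff(P2)
gap_lag = (P1 - beta1 * P2)[:-1]          # P_{1,t-1} - beta1 * P_{2,t-1}
alpha = -(1 - phi1)                       # speed of adjustment

# Error correction form: the change in P1 equals the adjustment term
# plus the composite disturbance beta1 * dP2_t + eps_t.
rhs = alpha * gap_lag + beta1 * dP2 + eps[1:]
ecm_identity_holds = np.allclose(dP1, rhs)
print(ecm_identity_holds)
```

With \(\phi_1 = 0.8\) the adjustment coefficient is \(-0.2\), so only a fifth of any deviation is removed in each period, which is consistent with the remark that persistent errors imply slow adjustment.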

If we assume that both share prices are \(CI(1,1)\), and normalise the equilibrium relationship on \(P_{2}\), then we could write the respective error correction mechanisms as,

\[\begin{eqnarray} \nonumber \Delta P_{1,t} &=& \alpha_{1}(P_{2,t-1}-\beta_1 P_{1,t-1}) + \varepsilon_{1,t}\\ \Delta P_{2,t} &=& \alpha_{2}(P_{2,t-1}-\beta_1 P_{1,t-1}) + \varepsilon_{2,t} \tag{2.3} \end{eqnarray}\]

Note that the long-term equilibrium would be described by \((P_{2,t} - \beta_1 P_{1,t})\), which is stationary when the variables are \(CI(1,1)\). If \(P_{1,t}\) is \(I(1)\) then \(\Delta P_{1,t}\) is stationary, and when the variables are \(CI(1,1)\) the error terms \(\varepsilon_{1,t}\) and \(\varepsilon_{2,t}\) would also need to be stationary.

As it currently stands, the expressions in (2.3) are only internally consistent if the two variables, \(P_{1,t}\) and \(P_{2,t}\), are cointegrated. To see this, we firstly note that the left-hand side and the error terms are assumed to be stationary. Therefore, if the \((P_{2,{t-1}}-\beta_1 P_{1,{t-1}})\) term is non-stationary, then the right-hand side of (2.3) is non-stationary, which is inconsistent with the left-hand side. For the model to be internally consistent, \(P_{1,t}\) and \(P_{2,t}\) must be cointegrated.

Hence the two share prices would need to be cointegrated with the vector \((1, - \beta_1)^{\prime}\), applied to \((P_{2,t}, P_{1,t})^{\prime}\), when they are of the order \(CI(1,1)\). The parameters \(\alpha_1\) and \(\alpha_2\) could then be used to describe the speed of adjustment, which provides information on how changes to share prices react to past deviations from the equilibrium path. Small absolute values of \(\alpha_i\) would then imply a relatively unresponsive relationship, where it would take a long time to return to equilibrium.

To ascertain how the long-run equilibrium is maintained, note that if \(P_{2,t-1}>\beta_1 P_{1,t-1}\), then the combined term \((P_{2,{t-1}}-\beta_1 P_{1,{t-1}})\) is positive. Now \(\Delta P_{2,t}\) depends negatively on the combined term, through \(-(1-\phi_{1})\), and since \(|\phi_{1}|<1\) by assumption, the total effect will be negative. In other words, if \(P_{2,t}\) is above its long-run equilibrium level relative to \(P_{1,t}\), the error correction mechanism will drive down \(P_{2,t}\) until the long-run equilibrium is restored. Conversely, if \(P_{1,t}\) is higher than its long-run equilibrium level relative to \(P_{2,t}\), then the error correction mechanism will drive up \(P_{2,t}\) until the long-run equilibrium is restored.
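The adjustment towards equilibrium can be traced out by iterating on (2.3) with the shocks switched off. In the sketch below the values of \(\beta_1\), \(\alpha_1\) and \(\alpha_2\) and the starting prices are hypothetical; the sign pattern (\(\alpha_1 > 0\), \(\alpha_2 < 0\)) is an assumption that makes both prices move to close a positive gap.

```python
import numpy as np

beta1 = 1.0
alpha1, alpha2 = 0.1, -0.2     # hypothetical speeds of adjustment
P1, P2 = [10.0], [12.0]        # P2 starts above its equilibrium value beta1 * P1

for _ in range(20):
    z = P2[-1] - beta1 * P1[-1]        # deviation from equilibrium
    P1.append(P1[-1] + alpha1 * z)     # first line of (2.3), shock set to zero
    P2.append(P2[-1] + alpha2 * z)     # second line of (2.3), shock set to zero

gaps = np.array(P2) - beta1 * np.array(P1)

# Subtracting beta1 times the first line from the second shows that the gap
# shrinks by the factor (1 + alpha2 - beta1 * alpha1) in every period.
decay = 1 + alpha2 - beta1 * alpha1
print(gaps[:5], decay)
```

With these values the gap falls by 30 per cent per period; values of \(\alpha_i\) closer to zero would push the factor towards one and slow the return to equilibrium.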

2.2 Vector Error Correction Representation

The above example of an error correction model is rather restrictive, in that we have specified an equilibrium relationship where the full adjustment in the respective variables would need to be attributed to a change that is described by the coefficient \(\beta_1\). The general form of the error correction mechanism allows for slightly richer dynamic interactions between the variables, which can be specified as,

\[\begin{eqnarray} \nonumber \Delta y_{1,t}&=& \gamma_{0} + \alpha_{1} \left[ y_{1,t-1}-\beta_1 y_{2,t-1} \right] + \sum_{i=1}^{K} \zeta_{1,i} \Delta y_{1,t-i} + \sum_{j=1}^{L} \zeta_{2,j} \Delta y_{2,t-j} + \varepsilon_{y_1,{t}} \\ &&\tag{2.4} \\ \nonumber \Delta y_{2,t}&=& \eta_{0} + \alpha_{2} \left[ y_{1,t-1}-\beta_1 y_{2,t-1} \right] + \sum_{i=1}^{K} \xi_{1,i} \Delta y_{2,t-i} + \sum_{j=1}^{L} \xi_{2,j} \Delta y_{1,t-j} + \varepsilon_{y_2,{t}} \\ && \tag{2.5} \end{eqnarray}\]

This representation is termed the vector error correction model (VECM), where it would be possible to show that if both \(\alpha_1\) and \(\alpha_2\) are equal to zero, then there is: no equilibrium relationship, no error-correction, and no cointegration.

Note that if \(y_{1,t}\) and \(y_{2,t}\) are \(CI(1,1)\) then all terms in either (2.4) or (2.5) are \(I(0)\) and statistical inference that makes use of standard \(t\) and \(F\) statistics would be applicable.

2.3 Autoregressive distributed lag model

Alternatively, one could employ an indirect specification of an error correction mechanism, which takes the form of the autoregressive distributed lag (ARDL) model. A simple example of such an ARDL model would be,

\[\begin{eqnarray} y_{1,t}=\phi_{1} y_{1,t-1}+\phi_{2} y_{2,t} + \varepsilon_{t} \tag{2.6} \end{eqnarray}\]

As such, the ARDL looks very similar to the autoregressive models studied earlier, except for the inclusion of the \(y_{2,t}\) term on the right-hand side. To see how the ARDL relates to the error correction model, we subtract \(y_{1,t-1}\) from both sides of equation (2.6), and add and subtract \(\phi_{2}y_{2,t-1}\) on the right-hand side, to provide

\[\begin{eqnarray*} \Delta y_{1,t}&=&\phi_{2}\Delta y_{2,t}- (1-\phi_{1})y_{1,t-1}+\phi_{2}y_{2,t-1}+\varepsilon_{t} \\ &=& \phi_{2}\Delta y_{2,t}-(1-\phi_{1})(y_{1,t-1}-\frac{\phi_{2}}{1-\phi_{1}}y_{2,t-1})+\varepsilon_{t} \end{eqnarray*}\]

We could then denote the long-term steady state of the variables as \(\bar{y}_{1}\) and \(\bar{y}_{2}\), where \(y_{1,t}=y_{1,t-1}=\bar{y}_{1}\) and \(y_{2,t}=y_{2,t-1}=\bar{y}_{2}\). This would allow us to describe the long-run relationship as,

\[\begin{eqnarray*} \bar{y}_{1}=\phi_{1}\bar{y}_{1}+\phi_{2}\bar{y}_{2} \end{eqnarray*}\]

which could be rearranged as,

\[\begin{eqnarray*} \bar{y}_{1}=\frac{\phi_{2}}{1-\phi_{1}}\bar{y}_{2} \end{eqnarray*}\]

Hence the long-run relationship between \(y_1\) and \(y_2\) is described by \(\frac{\phi_{2}}{1-\phi_{1}}\), which plays the same role as the cointegrating coefficient in the ECM that we derived earlier (although it is more general). As long as \(|\phi_{1}|<1\), it implies that \((1-\phi_{1})>0\), and the ARDL model corrects in the same manner as the error correction model that was described previously. The choice of whether to specify an error correction model, or an ARDL model where the equilibrium property is implicitly defined, is mostly a matter of convenience and interpretation.
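The long-run multiplier can be confirmed by iterating on (2.6) while holding \(y_2\) fixed. The coefficient values and the steady-state level of \(y_2\) in this sketch are hypothetical.

```python
phi1, phi2 = 0.6, 0.8        # hypothetical ARDL coefficients in (2.6)
y2_bar = 5.0                 # y2 held fixed at a steady-state level

long_run_multiplier = phi2 / (1 - phi1)

# Iterate (2.6) with the shock set to zero until y1 settles down.
y1 = 0.0
for _ in range(200):
    y1 = phi1 * y1 + phi2 * y2_bar

print(long_run_multiplier, y1, long_run_multiplier * y2_bar)
```

The iteration converges because \(|\phi_1| < 1\); with \(\phi_1\) equal to one the denominator would vanish and no steady state would exist.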

2.4 Summary

Before we proceed with a discussion that relates to the estimation of these models, we should summarise a few important concepts. Firstly, we should note that cointegration refers to linear combinations of non-stationary variables. It is possible that there are nonlinear cointegrating relationships, but we currently don’t know how to test for this. However, we can model regime-switching cointegrating relationships using the methods discussed in Balke and Fomby (1997), as well as fractionally integrated cointegrating relationships, as discussed in Johansen and Nielsen (2012). Cointegrating vectors are unique only up to a scalar: for every cointegrating vector \(\beta_1 , \beta_2, \ldots\) the vector \(\lambda \beta_1, \lambda \beta_2, \ldots\) is also a cointegrating vector, where \(\lambda\) is a non-zero scalar.

It is also important to reiterate that all of the variables must be integrated of the same order, where it is usually the case that a set of \(I(d)\) variables is not cointegrated. In addition, we should note that when two variables are integrated of different orders they cannot be cointegrated; however, it is possible to have multi-cointegration (i.e. \(b=2\)).

In addition, if \({\bf{y}}_{t}\) has \(n\) non-stationary components then there could be at most \(n-1\) linearly independent cointegrating vectors. Therefore, if \({\bf{y}}_{t}\) has two variables then there can only be one cointegrating vector, and the number of cointegrating vectors is termed the cointegrating rank.

There is also a special case where, if you have three variables of which two are \(I(2)\) and one is \(I(1)\), it may be possible (but unlikely) that the two \(I(2)\) variables are \(CI(2,1)\). The remaining \(I(1)\) variable may then share a common stochastic trend with the combination of the other two variables, whereupon the system will be stationary. Obviously, the chance of this occurring in practice is very small.

3 Engle-Granger procedure

To implement the first step of the Engle-Granger procedure, we need to test the variables for their order of integration. Therefore, if we think that \(y_{1,t}\) and \(y_{2,t}\) are possibly cointegrated, we would need to determine their order of integration with the use of an augmented Dickey-Fuller (ADF) or similar testing procedure. If both of these variables are stationary then they could not be cointegrated. In addition, if the variables are of different orders then there can also be no cointegration. However, where you have three or more variables, at least two variables must be of the same order. For example, a group could be \(CI(2,1)\) and this group may be cointegrated with a further set of \(I(1)\) variables.

If both of the variables are integrated of the same order, then we can proceed to estimate the long-run relationship between the two variables with the regression,

\[\begin{eqnarray} y_{1,t} = \beta_0 + \beta_1 y_{2,t} + u_t & \mathsf{or }& y_{1,t} = \beta_1 y_{2,t} + u_t \tag{3.1} \end{eqnarray}\]

If the variables are cointegrated, the use of OLS would yield super-consistent estimates of \(\beta_0\) and \(\beta_1\). Note that in this case, the variables are cointegrated when \(\hat{u}_t\) is stationary. To test for stationarity we could make use of an augmented Dickey-Fuller test once again, which would be specified as,

\[\begin{eqnarray*} \Delta \hat{u}_t = \pi_1 \hat{u}_{t-1} + \sum_{j=1}^k \gamma_{j} \Delta \hat{u}_{t-j} + \varepsilon_t \end{eqnarray*}\]

where \(\pi_1 = (\phi-1)\), and \(\phi\) is the autoregressive coefficient that would approximate a unitary value when \(\hat{u}_t\) follows a random walk. If we are unable to reject \(\pi_1 = 0\), then the residuals are not stationary and the variables will not be cointegrated. However, if we are able to reject \(\pi_1 = 0\) in favour of \(\pi_1 < 0\), then the residuals will be stationary and the variables are cointegrated. It is important to note that the calculated test statistic should be compared to the critical values that are included in Engle and Granger (1987) or Engle and Yoo (1987), as opposed to those that are contained in Dickey and Fuller (1981). In addition, if the OLS regression includes a constant then it would not be advisable to include a constant in the ADF regression (and vice versa).
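The two regressions that make up the test can be sketched with ordinary least squares. The example below is a minimal illustration on simulated data: the data generating process, the lag order `k` and the helper name `residual_adf_tstat` are assumptions, and no critical values are computed, since the resulting statistic would need to be compared with the tabulated cointegration critical values mentioned above.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 500

# Assumed DGP: y2 is a random walk and y1 = 1 + 2*y2 + u, with u a stationary AR(1).
y2 = np.cumsum(rng.standard_normal(T))
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + rng.standard_normal()
y1 = 1.0 + 2.0 * y2 + u

# Long-run regression (3.1) with a constant.
X = np.column_stack([np.ones(T), y2])
b_hat, *_ = np.linalg.lstsq(X, y1, rcond=None)
u_hat = y1 - X @ b_hat

def residual_adf_tstat(resid, k=1):
    """ADF-type regression on residuals: k lagged differences, no constant."""
    d = np.diff(resid)
    cols = [resid[k:-1]]                                      # u_hat_{t-1}
    cols += [d[k - j:len(d) - j] for j in range(1, k + 1)]    # lagged differences
    Z = np.column_stack(cols)
    target = d[k:]
    coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
    e = target - Z @ coef
    sigma2 = e @ e / (len(target) - Z.shape[1])
    cov = sigma2 * np.linalg.inv(Z.T @ Z)
    return coef[0], coef[0] / np.sqrt(cov[0, 0])

pi_hat, t_stat = residual_adf_tstat(u_hat, k=1)
print(f"slope: {b_hat[1]:.3f}, pi_1: {pi_hat:.3f}, t-statistic: {t_stat:.2f}")
```

The constant is left out of the second regression because the long-run regression already contains one, in line with the advice above.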

After it has been established that the residuals in the long-run relationship are stationary, we can then estimate the error-correction model. In this case we can make use of the lagged residuals \(\hat{u}_{t-1}\) from (3.1) to represent \([ y_{1,t-1}-\beta_1 y_{2,t-1} ]\). Hence,

\[\begin{eqnarray*} \Delta y_{1,t}&=& \gamma_{0} + \alpha_{1} \left[ y_{1,t-1}-\beta_1 y_{2,t-1} \right] + \ldots \\ && \sum_{i=1}^{K} \zeta_{1,i} \Delta y_{1,t-i} + \sum_{j=1}^{L} \zeta_{2,j} \Delta y_{2,t-j} + \varepsilon_{y_1,{t}} \\ \Delta y_{2,t} &=& \eta_{0} + \alpha_{2} \left[ y_{1,t-1}-\beta_1 y_{2,t-1} \right] + \ldots \\ && \sum_{i=1}^{K} \xi_{1,i} \Delta y_{2,t-i} + \sum_{j=1}^{L} \xi_{2,j} \Delta y_{1,t-j} + \varepsilon_{y_2,{t}} \end{eqnarray*}\]

where \(\beta_1\) is the parameter of the cointegrating vector. The speed of adjustment parameters are then given by \(\alpha_1\) and \(\alpha_2\), and they provide us with information about the amount of time that would pass before the variables converge on their respective equilibrium values. The white noise errors are then \(\varepsilon_{y_1,t}\) and \(\varepsilon_{y_2,t}\), and lagged values of the left-hand side variables are also included on the right-hand side to account for persistence in the left-hand side variables.

When the variables are cointegrated, OLS would prove to be an efficient method for parameter estimation and the usual t-statistics and F-statistics would continue to be appropriate.
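The second step can be sketched as a single OLS regression of \(\Delta y_{1,t}\) on a constant, the lagged residual and one lag of each differenced variable. The simulated data, the lag length of one and the variable names are assumptions; in this set-up the true speed of adjustment for \(y_{1,t}\) is \(-(1-0.5) = -0.5\).

```python
import numpy as np

rng = np.random.default_rng(4)
T = 2000

# Assumed DGP: y2 is a random walk, y1 = 2*y2 + u, u is AR(1) with coefficient 0.5.
y2 = np.cumsum(rng.standard_normal(T))
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + rng.standard_normal()
y1 = 2.0 * y2 + u

# First step: residuals from the long-run regression.
X = np.column_stack([np.ones(T), y2])
b_hat, *_ = np.linalg.lstsq(X, y1, rcond=None)
u_hat = y1 - X @ b_hat

# Second step: error correction equation for y1 with K = L = 1.
dy1, dy2 = np.diff(y1), np.diff(y2)
Z = np.column_stack([
    np.ones(T - 2),   # constant
    u_hat[1:-1],      # u_hat_{t-1}, the equilibrium error
    dy1[:-1],         # lagged change in y1
    dy2[:-1],         # lagged change in y2
])
coef, *_ = np.linalg.lstsq(Z, dy1[1:], rcond=None)
alpha1_hat = coef[1]
print(f"estimated speed of adjustment: {alpha1_hat:.3f}")
```

The same regression with \(\Delta y_{2,t}\) as the dependent variable would provide an estimate of \(\alpha_2\).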

To conclude the estimation of the model, we assess its adequacy. To do so, we check the residuals \(\varepsilon_{y_1,t}\) and \(\varepsilon_{y_2,t}\) to make sure they are white noise. If there is a problem with the amount of persistence then consider increasing the lag length in the error-correction model. Note that the speed of adjustment parameters, \(\alpha_1\) and \(\alpha_2\), describe the dynamics of the system, where large values of \(\alpha_2\) are associated with a large \(\Delta y_{2,t}\). If \(\alpha_2 = 0\) and \(\xi_{2,j}=0\) then \(\Delta y_{1,t}\) can’t Granger-cause \(\Delta y_{2,t}\), while if both \(\alpha_1 = 0\) and \(\alpha_2 = 0\) then there is no cointegration or error correction. In addition, \(\alpha_1\) and \(\alpha_2\) should also not be too big, since the variables would need to converge on the long-run values over time (i.e. not immediately, and without over-correcting too drastically).

While the Engle-Granger method is highly intuitive, it has a number of important limitations, which include the need to specify left-hand side variables at the very beginning of the procedure. For example, we would need to specify the long-run equation as one of the following,

\[\begin{eqnarray*} y_{1,t} = \beta_1 y_{2,t} + \upsilon_{y1,t} \\ y_{2,t} = \beta_2 y_{1,t} + \upsilon_{y2,t} \end{eqnarray*}\]

In certain circumstances it may be the case that \(\upsilon_{y1,t}\) is stationary, while \(\upsilon_{y2,t}\) is not. In addition, we may also get different results when including a constant in the long-run relationship. These types of problems would create some uncertainty around the results that are ultimately produced.
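One way to see why the choice of left-hand side variable matters is to note that, in a finite sample, the slope from regressing \(y_{2,t}\) on \(y_{1,t}\) is not the reciprocal of the slope from regressing \(y_{1,t}\) on \(y_{2,t}\). The sketch below uses simulated data and regressions that include a constant; the data generating process and the helper name `slope` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 200
y2 = np.cumsum(rng.standard_normal(T))
y1 = 2.0 * y2 + rng.standard_normal(T)

def slope(y, x):
    """OLS slope from regressing y on a constant and x."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (xc @ xc))

b12 = slope(y1, y2)    # y1 on the left-hand side
b21 = slope(y2, y1)    # y2 on the left-hand side
r2 = float(np.corrcoef(y1, y2)[0, 1] ** 2)

# The product of the two slopes equals the squared sample correlation,
# so b21 equals 1 / b12 only when the fit is perfect.
print(b12, 1 / b21, b12 * b21, r2)
```

The two normalisations therefore imply slightly different long-run coefficients, and hence different residual series for the second-stage test.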

In addition, the Engle-Granger procedure is not able to identify multiple cointegrating vectors (or their form). Lastly, the reliance on a two-step procedure is also problematic, as the residuals from the first step are used in the second step to obtain estimates for \(\alpha_1\). Hence, any misspecification error in step 1 is carried over to step 2, which would affect the final results (e.g. we may incorrectly include a constant in step 1).

4 Cointegration in a multivariate setting

In most of the preceding situations we were primarily concerned with investigating possible cointegration between two variables, namely \(y_{1,t}\) and \(y_{2,t}\). These concepts, tests, and models could easily be extended to more variables, particularly where each of these variables is \(I(1)\). As we will see, it is more convenient to analyse such forms of cointegration in a multivariate setting, where we make use of the vector autoregression (VAR) model we have discussed previously. For example, consider a first-order VAR for the \(n\times1\) vector \({\bf{y}}_{t}=[y_{1,t}, y_{2,t}, \ldots , y_{n,t}]^{\prime}\),

\[\begin{eqnarray} {\bf{y}}_{t}=\mu+\Pi_{1} {\bf{y}}_{t-1}+{\bf{u}}_{t} \tag{4.1} \end{eqnarray}\]

where \(\mu=[\mu_{1} , \mu_{2}, \ldots ,\mu_{n}]^{\prime}\) is a vector of constants, \({\bf{u}}_{t}=[ u_{1,t},u_{2,t}, \ldots , u_{n,t}]^{\prime}\) is a vector of error terms, and \(\Pi_{1}\) is an \((n \times n)\) matrix of coefficients. The stability of the VAR model is determined by the eigenvalues, \(\lambda\), of \(\Pi_{1}\) that are obtained by solving the characteristic equation

\[\begin{eqnarray*} | \; \Pi_{1}- \lambda I \; |=0 \end{eqnarray*}\]

If all eigenvalues have a modulus that is less than 1, the VAR is stable.
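The stability condition is straightforward to check numerically. The sketch below uses two hypothetical coefficient matrices, one stable and one with an eigenvalue equal to one; the helper name `is_stable` and the tolerance are assumptions made for this illustration.

```python
import numpy as np

def is_stable(Pi1, tol=1e-10):
    """True when every eigenvalue of Pi1 lies strictly inside the unit circle."""
    moduli = np.abs(np.linalg.eigvals(Pi1))
    return bool(np.max(moduli) < 1 - tol)

# Hypothetical VAR(1) coefficient matrices.
Pi_stable = np.array([[0.5, 0.1],
                      [0.2, 0.3]])
Pi_unit = np.array([[0.5, 0.5],
                    [0.25, 0.75]])   # each row sums to one, so one eigenvalue equals one

print(np.abs(np.linalg.eigvals(Pi_stable)), is_stable(Pi_stable))
print(np.abs(np.linalg.eigvals(Pi_unit)), is_stable(Pi_unit))
```

The small tolerance guards against an eigenvalue that is equal to one in theory being computed as marginally below one.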

For simplicity, consider the example when \(n=2\), so that equation (4.1) can be written as the bivariate VAR(1),

\[\begin{eqnarray*} \left[ \begin{array} [c]{c} y_{1,t}\\ y_{2,t} \end{array} \right] =\left[ \begin{array} [c]{c} \mu_{1}\\ \mu_{2} \end{array} \right] +\left[ \begin{array} [c]{cc} \pi_{1,1} & \pi_{1,2}\\ \pi_{2,1} & \pi_{2,2} \end{array} \right] \left[ \begin{array} [c]{c} y_{1,t-1}\\ y_{2,t-1} \end{array} \right] +\left[ \begin{array} [c]{c} u_{1,t}\\ u_{2,t} \end{array} \right] \end{eqnarray*}\]

\[\begin{eqnarray*} y_{1,t} & = & \mu_{1}+\pi_{1,1}y_{1,t-1}+\pi_{1,2}y_{2,t-1}+u_{1,t} \\ y_{2,t} & = & \mu_{2}+\pi_{2,1}y_{1,t-1}+\pi_{2,2}y_{2,t-1}+u_{2,t} \end{eqnarray*}\]

If this VAR(1) is stable we could conduct inference in the usual way. However, if one or more of the eigenvalues has a modulus equal to or above one, the VAR is unstable and the variables are non-stationary. To extend the concept of cointegration to multivariate VAR models we firstly consider the specification of the vector error correction model.

4.1 Vector error correction model (VECM)

In the same manner as for the single equation, we can rewrite the VAR(1) model from (4.1) as a vector error correction model (VECM)

\[\begin{eqnarray} \nonumber \Delta {\bf{y}}_{t} & = & \mu+(\Pi_{1}-I){\bf{y}}_{t-1}+{\bf{u}}_{t} \\ & = & \mu+\Pi {\bf{y}}_{t-1}+{\bf{u}}_{t} \;\;\; \mathsf{where } \; \Pi=(\Pi_{1}-I) \tag{4.2} \end{eqnarray}\]

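With two variables, we saw that there could exist at most one linear stationary relationship between the variables. With \(n\) variables, the number of linearly independent combinations of the variables in \({\bf{y}}_{t}\) that are stationary will provide us with information on the number of cointegrating vectors, which is given by the rank of \(\Pi\). Three possibilities exist: \(\Pi\) has a rank of zero, in which case there is no cointegration; \(\Pi\) has full rank \(n\), in which case all of the variables are stationary; or \(\Pi\) has reduced rank \(0<r<n\), which is the cointegrated case that is considered below.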

If the variables in the VAR(\(p\)) are cointegrated, then the VECM would include long-run cointegration and speed of adjustment parameters. Therefore, we can decompose \(\Pi\) as,

\[\begin{eqnarray*} \Pi=\alpha\beta^{\prime} \end{eqnarray*}\]

where \(\alpha\) and \(\beta\) are both of dimension \(n\times r\). We say that the matrix \(\beta\) is a matrix of cointegration parameters, so that the linear combination \(\beta^{\prime} {\bf{y}}_{t}\) is stationary. In addition, each of the \(r\) rows in \(\beta^{\prime}{\bf{y}}_t\) is a cointegrating (long-run) relation that induces stability.

The matrix \(\alpha\) contains the speed of adjustment parameters, which account for the time that it takes to move back to equilibrium (i.e. the speed at which the error corrects following a movement away from equilibrium). For instance, under the maintained assumption of cointegration in the bivariate case examined above, (4.2) can be written as:

\[\begin{eqnarray*} \left[ \begin{array} [c]{c} \Delta y_{1,t}\\ \Delta y_{2,t} \end{array} \right] =\left[ \begin{array} [c]{c} \mu_{1}\\ \mu_{2} \end{array} \right] +\left[ \begin{array} [c]{c} \alpha_{1}\\ \alpha_{2} \end{array} \right] \Big[\beta_{1}\mathsf{ \ }\beta_{2}\Big] \left[ \begin{array} [c]{c} y_{1,t-1}\\ y_{2,t-1} \end{array} \right] +\left[ \begin{array} [c]{c} u_{1,t}\\ u_{2,t} \end{array} \right] \end{eqnarray*}\]

which could be written as,

\[\begin{eqnarray*} \Delta y_{1,t} &=& \mu_{1}+\alpha_{1}(\beta_{1}y_{1,t-1}+\beta_{2}y_{2,t-1})+u_{1,t}\\ \Delta y_{2,t} &=& \mu_{2}+\alpha_{2}(\beta_{1}y_{1,t-1}+\beta_{2}y_{2,t-1})+u_{2,t} \end{eqnarray*}\]

The cointegration relationship (or combination) \(\beta^{\prime} {\bf{y}}_{t}\) would then be given by,

\[\begin{eqnarray*} \beta^{\prime}{\bf{y}}_{t}=\beta_{1}y_{1,t}+\beta_{2}y_{2,t}\sim I(0) \end{eqnarray*}\]

Hence, the nonstationary variables in the \({\bf{y}}_{t}\) vector are cointegrated if there is a linear combination of these variables that is stable (stationary). Such a linear combination of variables could be related to economic theory and is often referred to as a long-run equilibrium relationship. The intuition is that the variables will drift together along some form of long-run equilibrium path. These variables may of course move away from this path over certain periods of time, but there will always be forces that move the variables back towards the long-run equilibrium path.
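The decomposition of \(\Pi\) into speed of adjustment and cointegration parameters can be illustrated with a small numerical example. The matrix `Pi1` below is a hypothetical bivariate VAR(1) coefficient matrix with one unit eigenvalue, and the cointegrating vector is normalised on the first variable; both choices are assumptions made for this sketch.

```python
import numpy as np

# Hypothetical VAR(1) coefficient matrix with one eigenvalue equal to one.
Pi1 = np.array([[0.5, 0.5],
                [0.25, 0.75]])
Pi = Pi1 - np.eye(2)                   # long-run matrix in the VECM

rank = np.linalg.matrix_rank(Pi)       # reduced rank signals cointegration

# Normalise the cointegrating vector on y1, so that beta = (1, -1)'.
beta = np.array([[1.0], [-1.0]])
alpha = Pi[:, [0]] / beta[0, 0]        # with beta_1 = 1 the first column of Pi is alpha

print(rank)
print(alpha.ravel())                   # speeds of adjustment
print(np.allclose(alpha @ beta.T, Pi))
```

Here the adjustment coefficients are \(-0.5\) and \(0.25\): when \(y_{1,t-1}\) exceeds \(y_{2,t-1}\), the first variable falls and the second rises, so both equations work to close the gap.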

So far we have only investigated a situation with two variables, implying a maximum of one cointegration relationship. However, when \(n=3\), we can potentially have \(r=2\) equilibrium relationships. To see this, assume that we have \(n=3\) and \(r=2\). The respective matrices could then be written as,

\[\begin{eqnarray*} \nonumber \left[ \begin{array} [c]{c} \Delta y_{1,t}\\ \Delta y_{2,t}\\ \Delta y_{3,t} \end{array} \right] &=& \left[ \begin{array} [c]{c} \mu_{1}\\ \mu_{2}\\ \mu_{3} \end{array} \right] +\left[ \begin{array} [c]{cc} \alpha_{11} & \alpha_{12}\\ \alpha_{21} & \alpha_{22}\\ \alpha_{31} & \alpha_{32} \end{array} \right] \ldots \\ && \left[ \begin{array} [c]{ccc} \beta_{11} & \beta_{21} & \beta_{31}\\ \beta_{12} & \beta_{22} & \beta_{32} \end{array} \right] \left[ \begin{array} [c]{c} y_{1,t-1}\\ y_{2,t-1}\\ y_{3,t-1} \end{array} \right] +\left[ \begin{array} [c]{c} u_{1,t}\\ u_{2,t}\\ u_{3,t} \end{array} \right] \end{eqnarray*}\]

In this case, the three variables allow for two cointegration relationships, which we denote by \(\beta_{1}^{\prime} {\bf{y}}_{t-1}\) and \(\beta_{2}^{\prime} {\bf{y}}_{t-1}\). These could be expanded as,

\[\begin{eqnarray*}\nonumber \beta_{1}^{\prime}{\bf{y}}_{t} & =\beta_{11}y_{1,t}+\beta_{21}y_{2,t}+\beta_{31}y_{3,t}\sim I(0)\\ \beta_{2}^{\prime}{\bf{y}}_{t} & =\beta_{12}y_{1,t}+\beta_{22}y_{2,t}+\beta_{32}y_{3,t}\sim I(0) \end{eqnarray*}\]

Hence, we now have two linear relationships, \(\beta_{1}\) and \(\beta_{2}\), between the three variables, \(y_{1}\), \(y_{2}\) and \(y_{3}\) which are able to induce stationarity.
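The rank structure in the \(n=3\), \(r=2\) case can be checked numerically. The sketch below builds \(\Pi=\alpha\beta^{\prime}\) from hypothetical \(\alpha\) and \(\beta^{\prime}\) matrices (the numbers are assumptions chosen only for illustration) and confirms that the resulting \(3\times3\) matrix has rank 2. Exact fractions are used so that the rank calculation is not affected by rounding; the helper functions are written for this example and are not part of any library.

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def rank(m):
    """Rank of a matrix via exact Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in m]
    r = 0
    for c in range(len(rows[0])):
        # Find a row at or below position r with a non-zero entry in column c
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


# Hypothetical 3x2 matrix of adjustment speeds (one column per relationship)
alpha = [[Fraction(-1, 5), Fraction(0)],
         [Fraction(1, 10), Fraction(-3, 10)],
         [Fraction(0), Fraction(1, 5)]]

# Hypothetical 2x3 matrix beta': rows are y1 - y2 and y2 - y3
beta_t = [[1, -1, 0],
          [0, 1, -1]]

pi = matmul(alpha, beta_t)

print("Pi =", [[str(v) for v in row] for row in pi])
print("rank(Pi) =", rank(pi))
```

Although \(\Pi\) is a \(3\times3\) matrix, it has reduced rank because it is the product of a \(3\times2\) and a \(2\times3\) matrix, which is what limits the system to two cointegration relationships.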

Such a finding for the VAR(1) model could be generalised to a VAR(\(p\)) in the following way,

\[\begin{eqnarray*} \Delta {\bf{y}}_{t}=\mu+\alpha\beta^{\prime} {\bf{y}}_{t-1}+\Gamma_{1}\Delta {\bf{y}}_{t-1}+\Gamma_{2}\Delta {\bf{y}}_{t-2}+ \ldots \\ \nonumber +\Gamma_{p-1}\Delta {\bf{y}}_{t-p+1} + {\bf{u}}_{t} \end{eqnarray*}\]

where the \(p\) lags of the VAR in levels give rise to \(p-1\) lagged differences of the vector of variables.
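As a worked illustration of how this generalisation arises, consider the case \(p=2\). The coefficient matrices \(A_{1}\) and \(A_{2}\), and the symbol \(I\) for the identity matrix, are notation assumed here for the VAR in levels and may differ from the notation used elsewhere in the text. Subtracting \({\bf{y}}_{t-1}\) from both sides, and then adding and subtracting \(A_{2}{\bf{y}}_{t-1}\), gives

\[\begin{eqnarray*} {\bf{y}}_{t} &=& \mu + A_{1}{\bf{y}}_{t-1} + A_{2}{\bf{y}}_{t-2} + {\bf{u}}_{t} \\ \Delta {\bf{y}}_{t} &=& \mu + (A_{1}-I){\bf{y}}_{t-1} + A_{2}{\bf{y}}_{t-2} + {\bf{u}}_{t} \\ &=& \mu + (A_{1}+A_{2}-I){\bf{y}}_{t-1} - A_{2}\Delta {\bf{y}}_{t-1} + {\bf{u}}_{t} \end{eqnarray*}\]

so that \(\Pi=\alpha\beta^{\prime}=A_{1}+A_{2}-I\) and \(\Gamma_{1}=-A_{2}\). A VAR(2) in levels therefore corresponds to an error-correction form with a single lagged difference.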

5 Johansen procedure

Testing for cointegration in the multivariate case amounts to determining the rank of \(\Pi\), where we effectively need to determine the number of non-zero eigenvalues in \(\Pi\). Johansen (1988) established a novel method for determining the number of non-zero eigenvalues in a maximum likelihood framework. The method orders the estimated eigenvalues such that \(\hat{\lambda}_{1}>\hat{\lambda}_{2}> \ldots >\hat{\lambda}_{n}\), where \(\hat{\lambda}_{1}\) is the largest eigenvalue. Testing the null hypothesis that there are at most \(r\) cointegrating vectors would then amount to testing,

\[\begin{eqnarray} \nonumber H_{0}: \hat{\lambda}_{i}=0 \;\; \mathsf{for } \; i=r+1, \ldots ,n \end{eqnarray}\]

where only the first \(r\) eigenvalues are non-zero. For instance, if \(n=2\) and \(r=1\) as in the first example, the first eigenvalue, \(\hat{\lambda}_{1}\), will be non-zero and the second \(\hat{\lambda}_{2}\) will be zero.

In the three-variable case, when \(n=3\) and \(r=2\), the first two eigenvalues \(\{ \hat{\lambda}_{1}, \hat{\lambda}_{2} \}\) are non-zero and the third, \(\hat{\lambda}_{3}\), is zero. By adding more variables, this pattern will continue until \(n=r\). At the other extreme, when \(\Pi\) has rank \(r=0\) (i.e. \(\Pi=0\)), there is no long-run relationship and all the eigenvalues are equal to zero.

To determine the appropriate rank, we describe two test statistics: the trace statistic and the maximum eigenvalue statistic.

The trace statistic for the null hypothesis, \(H_{0}\), of at most \(r\) cointegration relationships is computed as,

\[\begin{eqnarray*} \lambda_{trace}=-T\sum_{i=r+1}^{n}\log(1-\hat{\lambda}_{i}) \;\;\; r=0,1,2, \ldots , n-1 \end{eqnarray*}\]

where the alternative hypothesis is that there are more than \(r\) cointegration relationships.

The maximum eigenvalue statistic for the null hypothesis of at most \(r\) cointegration relationships is then computed as,

\[\begin{eqnarray*} \lambda_{max}=-T\log(1-\hat{\lambda}_{r+1}) \;\;\; r=0,1,2,\ldots, n-1 \end{eqnarray*}\]

where the alternative hypothesis is that there are \(r+1\) cointegration relationships.

For both tests, the asymptotic distribution is non-standard and depends upon the deterministic components included (constant and trend), just as in the case of the univariate Dickey-Fuller test for unit roots. Tabulated critical values can be found in Johansen (1988) and Osterwald-Lenum (1992). In both cases, the calculated test statistic must exceed the tabulated critical value to reject the null hypothesis.
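The two statistics are straightforward to compute once the ordered eigenvalues are available. The sketch below assumes that the eigenvalues have already been estimated and sorted in descending order; the eigenvalues, the sample size \(T=100\), and the function names are hypothetical and serve only to illustrate the arithmetic. No critical values are included, as these must be taken from the published tables for the chosen deterministic specification.

```python
import math


def trace_statistic(eigenvalues, T, r):
    """Trace statistic for the null of at most r cointegration relationships.

    eigenvalues must be sorted in descending order, so eigenvalues[r:]
    holds lambda_{r+1}, ..., lambda_n.
    """
    return -T * sum(math.log(1.0 - lam) for lam in eigenvalues[r:])


def max_eigen_statistic(eigenvalues, T, r):
    """Maximum eigenvalue statistic: null of r against r + 1 relationships."""
    # eigenvalues[r] is lambda_{r+1} because Python indexing starts at zero
    return -T * math.log(1.0 - eigenvalues[r])


# Hypothetical ordered eigenvalues for n = 3 and a hypothetical sample size
eigs = [0.25, 0.10, 0.01]
T = 100

for r in range(len(eigs)):
    print(f"r = {r}: trace = {trace_statistic(eigs, T, r):.3f}, "
          f"max eigenvalue = {max_eigen_statistic(eigs, T, r):.3f}")
```

The trace statistic for a given \(r\) is the sum of the maximum eigenvalue statistics for \(r, r+1, \ldots, n-1\), so the two statistics coincide when \(r=n-1\). In practice the hypotheses are usually tested sequentially, starting at \(r=0\) and stopping at the first \(r\) for which the null cannot be rejected.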

6 Conclusion

When making use of nonstationary variables in a traditional time series model, we may generate spurious results, unless the variables are cointegrated. In this case, cointegration would imply that the variables share a common trend, which describes the long-run relationship between variables. There are many examples of cases where such relationships would arise in economic case studies.

To determine if the variables share one or more cointegrating relationships, we would first need to determine whether the variables are stationary or nonstationary (with the aid of unit root testing). Thereafter, if the variables are stationary, we could proceed to estimate the dynamic model in its stationary form. However, in those cases where the variables are nonstationary, we could proceed to test for cointegration with either the Engle-Granger, Johansen, or Bounds test procedure. Since cointegration is inherently a system property, a multivariate approach is usually preferred.

If we are then able to reject the null hypothesis of no cointegration, we can proceed to estimate an equilibrium correction model with the aid of a single equation model (i.e. ECM or ARDL) or a multivariate VECM. Of course, if the null of no cointegration cannot be rejected, we would need to proceed to estimate the model in first differences.

7 References

Balke, Nathan S., and Thomas B. Fomby. 1997. “Threshold Cointegration.” International Economic Review 38 (3): 627–45.

Dickey, D. A., and W. A. Fuller. 1981. “Likelihood Ratio Statistics for Autoregressive Time Series with a Unit Root.” Econometrica 49: 1057–72.

Engle, Robert F., and Clive W. J. Granger. 1987. “Co-Integration and Error Correction: Representation, Estimation, and Testing.” Econometrica 55 (2): 251–76.

Engle, Robert F., and S. Yoo. 1987. “Forecasting and Testing in Cointegrated Systems.” Journal of Econometrics 35: 143–59.

Johansen, Søren. 1988. “Statistical Analysis of Cointegration Vectors.” Journal of Economic Dynamics and Control 12 (2–3): 231–54.

Johansen, Søren, and Morten Nielsen. 2012. “Likelihood Inference for a Fractionally Cointegrated Vector Autoregressive Model.” Econometrica 80: 2667–2732.

Osterwald-Lenum, Michael. 1992. “A Note with Quantiles of the Asymptotic Distribution of the Maximum Likelihood Cointegration Rank Test Statistics.” Oxford Bulletin of Economics and Statistics 54 (3): 461–72.

Sims, Christopher A. 1980. “Comparison of Interwar and Postwar Business Cycles.” American Economic Review 70 (2): 250–57.

Stock, James H., and Mark W. Watson. 1988. “Testing for Common Trends.” Journal of the American Statistical Association 83 (404): 1097–1107.

In his acceptance speech, Clive Granger noted that while he had been puzzling over the problem of integrated variables, David Hendry suggested that the difference between integrated series might be stationary. Granger did not believe this and, while trying to prove that Hendry was wrong, he showed that Hendry was indeed correct; as a result the concept of cointegration was born.

Cointegration

by Kevin Kotzé