Many economic and financial time series exhibit some form of trending behaviour. Typical examples in economics include measures of output and employment, while in finance, examples consist of asset prices, dividends and financial market indices. The presence of deterministic or stochastic trends may induce nonstationary behaviour in a variable, which has important consequences for the construction and estimation of time series models. For example, a simple plot of the FTSE/JSE stock market index would suggest that the time series is not stationary, as the mean would appear to depend on time. If either the data or the models are not conditioned to account for this phenomenon, standard classical regression techniques (such as the use of ordinary least squares) would be inappropriate.

Many early approaches to contain the effects of trends in macroeconomic data would augment the specification of the model with a deterministic time trend. However, Nelson and Plosser (1982) show that this strategy could represent a misspecification of the dynamics of the model, and argue that accounting for the stochastic trend in many macroeconomic variables would be more appropriate. As we will see, it is important to distinguish between those variables that may contain a deterministic or stochastic trend, as the transformations that are required to induce stationarity differ from one another.

The identification of a unit root process has also attracted much interest in the empirical literature, as a random walk process may be regarded as a prototype for various economic and financial hypotheses (e.g. it can be used to test the efficient market hypothesis or exchange rate overshooting). In addition, the use of Bayesian estimation techniques has also allowed for many interesting estimation strategies that could be used to describe the long-run behaviour of variables.

1 An intuitive example

Early studies that consider the effects of incorporating variables that contain a stochastic trend in a regression model include the work of Yule (1926), which considers the possible relationship between mortality and marriage. The data for this study is sampled at an annual frequency for England & Wales over the period 1866 to 1911 and is displayed in Figure 1. The trends in these variables would suggest that both mortality rates and the number of marriages have decreased over time. After regressing mortality on the number of marriages, we are able to produce the results in Table 1, where we note that the regressor has an extremely large \(t\)-value. In addition, the joint explanatory power of the regression is relatively high, as the coefficient of determination is 0.91. This would suggest that there is a strong relationship between these two variables.1

However, after taking the first difference of the variables, which may be used to describe the change in the rate of mortality and in the total number of marriages, the results from the regression would suggest that there is no relationship between these variables. Table 2 includes the results of this regression, where we note that the measures for the significance of the regressor and the coefficient of determination are extremely small.2

Figure 1: Mortality and Marriage

Dependent Variable: Mortality

             Coefficient   Std. Error   t-value    prob
constant          -13.88         1.57     -8.82   0.000
marriage            0.04         0.00     20.51   0.000

\(R^2 = 0.91\)

Table 1: Regression results for mortality and marriage - Levels

Dependent Variable: \(\Delta\) Mortality

                      Coefficient   Std. Error   t-value    prob
constant                   -0.133        0.210     -0.63   0.531
\(\Delta\) marriage         0.011        0.043      0.27   0.788

\(R^2 = 0.001\)

Table 2: Regression results for mortality and marriage - First difference
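While the Yule regressions make use of historical data, the same phenomenon is easy to reproduce with simulated series: two independent random walks will frequently appear to be strongly related in levels, while their first differences are unrelated. The following minimal sketch uses the numpy and statsmodels packages, where the sample size and seed are illustrative.

```python
# Spurious regression: two independent random walks appear to be related
# in levels, but the relationship disappears in first differences.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
T = 200
x = np.cumsum(rng.standard_normal(T))  # random walk 1
y = np.cumsum(rng.standard_normal(T))  # random walk 2, independent of x

levels = sm.OLS(y, sm.add_constant(x)).fit()
diffs = sm.OLS(np.diff(y), sm.add_constant(np.diff(x))).fit()

print(f"levels:      t = {levels.tvalues[1]:6.2f}, R^2 = {levels.rsquared:.2f}")
print(f"differences: t = {diffs.tvalues[1]:6.2f}, R^2 = {diffs.rsquared:.2f}")
```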

In what follows we describe the differences that may exist in variables that have either a deterministic or a stochastic trend. Thereafter, we consider some of the consequences for time series analysis if these features of the data are ignored or misinterpreted. We also consider the use of autocorrelation functions that are used to describe the degree of persistence, before we consider more formal tests for the presence of a unit root. The tests that are described in this chapter are by no means exhaustive, and we refer the reader to Perron (2006) and Haldrup and Jansen (2006) for surveys.

2 Deterministic trend

As was noted in the introduction, many time series variables contain a trend, which may be either deterministic or stochastic. Hence, if we ignore the effect of a seasonal component, the variable \(y_t\) is composed of the following dynamic components,

\[\begin{eqnarray} \nonumber y_t = \text{trend} + \text{stationary component} + \text{irregular} \end{eqnarray}\]

When the trend is deterministic, we would know the value of this component at each and every point in time with absolute certainty. To remove the deterministic trend from such a variable we would need to regress \(y_t\) on time, where \(t = \{1, 2, 3, \dots, T \}\). This type of regression model could be structured as,

\[\begin{eqnarray} \nonumber y_t = \alpha t + \varepsilon_t \end{eqnarray}\]

Note that the residuals in this case, \(\varepsilon_t\), would be free of the deterministic component and could be used for further analysis. To show that a variable with a deterministic trend is non-stationary, we note that a stationary univariate time series process could be written as the moving average,

\[\begin{eqnarray} \nonumber y_{t}=\varepsilon_{t}+\theta_{1}\varepsilon_{t-1}+\theta_{2} \varepsilon_{t-2}+\theta_{3}\varepsilon_{t-3}+ \ldots \end{eqnarray}\]

where \(\varepsilon_{t} \sim \mathsf{i.i.d.} \mathcal{N}\left( 0,\sigma^{2}\right)\) and \(t=\{1,2,\ldots,T\}\). As noted previously, such a variable has a constant mean and variance, which do not depend on time. After introducing a deterministic time trend, the variable \(y_t\) may be expressed as

\[\begin{eqnarray} y_{t}=\alpha t+\theta(L)\varepsilon_{t} \tag{2.1} \end{eqnarray}\]

where the lag polynomial takes the form, \(\theta(L)=1+\theta_{1}L\) \(+\theta_{2}L^{2}+\theta_{3}L^{3}+ \ldots\), and the deterministic trend is simply the time index, \(t\), with a slope parameter, \(\alpha\). Since the expected value of all the white noise errors is zero, the expected mean of the variable would be,

\[\begin{eqnarray} \nonumber \mathbb{E}\left[ y_{t}\right] =\alpha t \end{eqnarray}\]

which would clearly depend on time. However, if we remove the expected time-varying mean from \(y_{t}\), then we are able to show that deviations from the expected mean are stationary

\[\begin{eqnarray*} y_{t}-\mathbb{E}\left[ y_{t}\right] & =& \alpha t+\theta(L)\varepsilon_{t}-\left( \alpha t\right) \\ & =& \theta(L)\varepsilon_{t} \end{eqnarray*}\]

Therefore, this time series variable includes a stationary component and a deterministic trend. We could show that this variable would return to a point on the deterministic trend after a stochastic shock (i.e. an irregular innovation). For this reason we call these variables trend-stationary (TS).
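As a brief illustration of this property, the sketch below simulates a trend-stationary process and removes the deterministic trend by regressing the series on a constant and time; the slope parameter and the DGP are purely illustrative.

```python
# Detrending a trend-stationary series: regressing y_t = alpha*t + e_t on a
# constant and time leaves a stationary residual that fluctuates around zero.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
T = 300
t = np.arange(1, T + 1)
y = 0.05 * t + rng.standard_normal(T)      # y_t = alpha*t + e_t, with alpha = 0.05

fit = sm.OLS(y, sm.add_constant(t)).fit()  # regress y_t on a constant and t
resid = fit.resid                          # deviations from the fitted trend

print(f"estimated slope: {fit.params[1]:.3f}")                     # close to 0.05
print(f"residual mean: {resid.mean():.3f}, std: {resid.std():.3f}")
```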

In addition to linear trends, an economic process may include a nonlinear trend. For example, we may wish to include a quadratic polynomial for the trend to describe a variable that characterises increasing returns to scale. A model that takes various orders of nonlinear deterministic trends could then take the form,

\[\begin{eqnarray} \nonumber y_t = \mu + \alpha_1 t + \alpha_2 t^2 + \alpha_3 t^3+ \ldots + \alpha_n t^n + \varepsilon_t \end{eqnarray}\]

To test for the inclusion of these nonlinear trends we would usually estimate a selection of different models and then compare the goodness-of-fit with the aid of various information criteria.
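For example, under the assumption that the candidate models are polynomial trends of increasing order, one could compare the AIC of each fitted specification, as in the following sketch (the quadratic DGP and all parameter values are illustrative).

```python
# Selecting the order of a deterministic polynomial trend with the AIC:
# the DGP below contains a quadratic trend, so the AIC should be minimised
# at order two.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
T = 300
t = np.arange(1, T + 1)
y = 2.0 + 0.01 * t + 0.0005 * t**2 + rng.standard_normal(T)

for n in range(1, 5):
    X = sm.add_constant(np.column_stack([t**k for k in range(1, n + 1)]))
    print(f"trend order {n}: AIC = {sm.OLS(y, X).fit().aic:.1f}")
```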

3 Stochastic trend

The counterpart to a deterministic trend is a stochastic trend, and as we will see below, the time series properties of a variable that has a deterministic trend are very different to those that have a stochastic trend. There are many examples of economic variables that have stochastic trends, where the values of variables are permanently affected by a shock (or innovation).3

3.1 Random walk

The simplest model of a variable with a stochastic trend is the random walk, which depends on past values of itself and Gaussian white noise errors,

\[\begin{eqnarray} y_{t}=y_{t-1}+\varepsilon_{t} \;\;\; \text{where } \; \varepsilon_{t}\sim \mathsf{i.i.d.} \mathcal{N}\left(0,\sigma^{2}\right) \tag{3.1} \end{eqnarray}\]

These random walk processes have a number of interesting features, where the best forecast of \(y_{t+1}\) at time \(t\), is given by

\[\begin{eqnarray}\nonumber \mathbb{E}\left[y_{t+1}|\;\;y_{t}\right] =y_{t} \end{eqnarray}\]

The mean square error (MSE) of the forecast, which describes the forecast error variance, grows with the forecast horizon, \(h\),

\[\begin{eqnarray}\nonumber \acute{\sigma}_{y}\left(h\right)=\mathsf{var}\left( y_{t+h}-\mathbb{E}\left[ y_{t+h} \mid y_{t}\right] \right) =\sigma^{2}h \end{eqnarray}\]

Note that in this instance, the forecasting horizon may be used to denote the progression of time, where the difference between a two and a one step-ahead forecast is one period of time. With the aid of this expression, we would suggest that the variance depends on time, which would imply that it is nonstationary. This may be confirmed with the aid of a recursive substitution exercise.

For the random walk model, \(y_{t}=y_{t-1}+\varepsilon_{t}\), we may recursively substitute lagged values of \(y_t\) to describe the evolution of the process,

\[\begin{eqnarray}\nonumber y_{t} & =& y_{t-1}+\varepsilon_{t}\\ \nonumber & =& y_{t-2}+\varepsilon_{t-1}+\varepsilon_{t}\\ \nonumber & =& y_{t-3}+\varepsilon_{t-2}+\varepsilon_{t-1}+\varepsilon_{t}\\ \nonumber & \vdots & \\ \nonumber y_{t} & =& \overset{t-1}{\underset{j=0}{\sum}}\varepsilon_{t-j}+y_{0} \end{eqnarray}\]

Therefore each shock, \(\varepsilon_{t-j}\), will influence subsequent values of \(y_{t}\). This would imply that a shock to a random walk has a permanent effect on the time series variable. Alternatively, we may infer that these variables have infinite memory. If we assume that \(y_{0}\) is equal to zero, without any loss of generality, we can write the random walk as,

\[\begin{eqnarray} \nonumber y_{t}=\overset{t-1}{\underset{j=0}{\sum}}\varepsilon_{t-j} \end{eqnarray}\]

This would allow us to write the mean and variance of the random walk as

\[\begin{eqnarray}\nonumber \mathbb{E}\left[y_{t}\right]=0 \;\;\; \text{and } \;\; \mathsf{var}\left( y_{t}\right) =\sigma^{2}t \end{eqnarray}\]

The covariance, \(\gamma_{t-j}\), between \(y_t\) and \(y_{t-j}\) with \(y_0=0\), would then be

\[\begin{eqnarray} \nonumber \mathbb{E}\big[(y_t - y_0)(y_{t-j}-y_0)\big] & = & \mathbb{E} \big[(\varepsilon_t + \varepsilon_{t-1} + \ldots + \varepsilon_1)(\varepsilon_{t-j} + \varepsilon_{t-j-1} + \ldots + \varepsilon_{1})\big]\\ \nonumber & = & \mathbb{E}\big[(\varepsilon_{t-j})^2 + (\varepsilon_{t-j-1})^2 + \ldots + (\varepsilon_1)^2\big]\\ \nonumber & = & (t-j)\sigma^2 \end{eqnarray}\]

which also depends on time. Hence, since the variance and covariance of the process depend on time, the random walk is certainly nonstationary. With such a process, the effect of a change in the error term in \(t-j\) will continue to affect \(y_{t}\), and the roots of the linear difference equation would contain a unitary element, which implies that such a process has a unit root.
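This result is easy to verify by simulation: the cross-sectional variance of a large number of independent random walks grows linearly with \(t\). A minimal sketch follows, with illustrative values for \(N\) and \(T\), and \(\sigma = 1\).

```python
# The variance of a random walk grows linearly with time: across N simulated
# paths, the cross-sectional variance at date t should be close to sigma^2 * t.
import numpy as np

rng = np.random.default_rng(2)
N, T = 10_000, 100
paths = np.cumsum(rng.standard_normal((N, T)), axis=1)  # N random walks, y_0 = 0

for t in (10, 50, 100):
    print(f"t = {t:3d}: sample variance = {paths[:, t - 1].var():6.1f} (theory: {t})")
```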

These processes are also termed difference-stationary (DS), as the first difference of the random walk would yield,

\[\begin{eqnarray}\nonumber y_{t} & =&y_{t-1} + \varepsilon_{t}\\ \nonumber \Delta\ y_{t} & =&\varepsilon_{t} \end{eqnarray}\]

where \(\Delta y_{t}\) is clearly stationary, as the expected mean and variance of the white noise error do not depend on time. If a variable, \(y_{t}\), can be made stationary after differencing it once, it is integrated of the first order. We use the notation \(I(1)\) to describe such a process. Stationary random variables, such as \(\Delta y_{t}\), are thus integrated of order zero (i.e. \(\Delta y_{t}\) is \(I(0)\)). If it is necessary to take the second difference to achieve stationarity, such that \(\Delta^2 y_t\) is \(I(0)\), the process is integrated of the second order, where we would use the notation \(I(2)\).

3.2 Random walk with drift

Adding a constant term to the random walk model in equation (3.1) results in a random walk with drift, which may be expressed as,

\[\begin{eqnarray}\nonumber y_{t}= \mu + y_{t-1}+\varepsilon_{t} \end{eqnarray}\]

Using recursive substitution we can show that the random walk with drift can be written as a function of a deterministic trend and stochastic term,

\[\begin{eqnarray} \nonumber y_{t} & =&\mu+y_{t-1}+\varepsilon_{t}\\ & =&\mu+(y_{t-2}+\mu+\varepsilon_{t-1})+\varepsilon_{t}\nonumber\\ & =&2\mu+(y_{t-3}+\mu+\varepsilon_{t-2})+\varepsilon_{t-1}+\varepsilon_{t}\nonumber\\ & \vdots & \nonumber\\ \ y_{t} & =&\mu \cdot t+\overset{t-1}{\underset{j=0}{\sum}}\varepsilon_{t-j} \tag{3.2} \end{eqnarray}\]

where we again assume that the starting value, \(y_{0}\), is equal to zero. In contrast with the random walk model in equation (3.1), a random walk with drift now also contains a deterministic trend, which results from the inclusion of the constant term, \(\mu\), that determines the slope of the deterministic trend. However, in contrast with the trend-stationary model, the deviations from the deterministic trend are not stationary. This would imply that each \(\varepsilon_{t-j}\) will influence the value of \(y_{t}\), even after removing the deterministic trend from the series.
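The contrast between the two cases can be sketched as follows: detrending a random walk with drift leaves a highly persistent (nonstationary) residual, while detrending a trend-stationary series leaves white noise. The DGPs and parameter values below are illustrative.

```python
# Detrending a random walk with drift versus a trend-stationary series:
# the detrended random walk remains highly persistent, the other does not.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
T = 500
t = np.arange(1, T + 1)
X = sm.add_constant(t)

rw_drift = np.cumsum(0.1 + rng.standard_normal(T))  # y_t = 0.1 + y_{t-1} + e_t
trend_st = 0.1 * t + rng.standard_normal(T)         # y_t = 0.1*t + e_t

for name, y in [("random walk + drift", rw_drift), ("trend-stationary", trend_st)]:
    r = sm.OLS(y, X).fit().resid
    rho1 = np.corrcoef(r[:-1], r[1:])[0, 1]         # lag-1 autocorrelation
    print(f"{name:19s}: lag-1 autocorrelation of detrended series = {rho1:.2f}")
```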

4 Implications of nonstationarity

When a time series process is stationary we noted that one is able to recover the infinite moving average form of an autoregressive process. In addition, a shock to such a process would only have a temporary effect on future values of the process, where the expected mean, variance and covariance do not depend on time.

In contrast with these properties, a shock to a nonstationary time series process would have a permanent effect on the future values of the process. In addition, it was also noted that the expected variance, covariance and/or mean, would depend on time. This finding may be substantiated with the results of the impulse response functions in Figure 2, where we have included the results from several autoregressive processes.

Figure 2: Impulse response functions for autoregressive processes

If a time series is trend-stationary, the expected mean value will depend on time, which would imply that it is nonstationary. We can simply remove the effects of this trend by regressing the series on time, and as a result the residuals will be stationary. However, after removing a deterministic trend from a random walk with drift, we are left with a random walk process, which will continue to display nonstationary behaviour. Examples of all of these processes are contained in Figure 3, where we have also included the results from stationary AR(1) processes with coefficients of \(\phi= 0.8\) and \(\phi= 0.4\).

Figure 3: Simulated time series processes

Time series variables that have a unit root can be transformed into stationary variables, by taking the first difference of the data. At this point it is worth noting that when taking the first difference of a process that has a deterministic trend, we could introduce a unit root. For example, consider the following trend stationary process,

\[\begin{eqnarray} \nonumber y_t = \alpha t + \varepsilon_t \end{eqnarray}\]

where the lag could be represented by, \(y_{t-1} = \alpha (t-1) + \varepsilon_{t-1}\). The first difference of the above trend stationary process could then take the form,

\[\begin{eqnarray}\nonumber \Delta y_t = \alpha + \varepsilon_t - \varepsilon_{t-1} \end{eqnarray}\]

where the full effect of the previous shock now enters the solution. Hence, the transformation is problematic: we have introduced a unit root in the moving average component, which renders the process non-invertible, as the effects of previous shocks do not dissipate from this representation with time.

It is worth noting that this result is very different to the one that would arise when the underlying time series has both a unit root and a deterministic trend. Consider, by way of example, the following process that has both deterministic and stochastic components:

\[\begin{eqnarray} \nonumber y_t = \alpha t + y_{t-1} + \varepsilon_t \end{eqnarray}\]

To make this process stationary, we would need to subtract \(y_{t-1}\) from both sides, which would ensure that we are left with the following time series process,

\[\begin{eqnarray}\nonumber \Delta y_t = \alpha t + \varepsilon_t \end{eqnarray}\]

To then remove the deterministic trend we could regress \(\Delta y_t\) on a time trend (i.e. \(t = 1,2,3,\ldots\)), which would provide a stationary residual; in this case the residual would be white noise, as there was no other stationary component in the original \(y_t\) process. We are then able to conclude that when a process has both a deterministic trend and a stochastic trend, it would be appropriate to take the first difference if we are looking to transform the variable into a stationary process. However, if such a process only has a deterministic trend (and not a unit root), then taking the first difference would induce a unit root in the moving average component, as noted above. A sketch of the first of these transformations is provided below.
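The sketch uses an illustrative DGP with \(\alpha = 0.01\); the residual from the final regression should be serially uncorrelated.

```python
# A process with both a deterministic trend and a unit root:
# first-difference the series, then remove the remaining deterministic trend.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
T = 500
t = np.arange(1, T + 1)
eps = rng.standard_normal(T)

y = np.zeros(T)
for i in range(1, T):
    y[i] = 0.01 * t[i] + y[i - 1] + eps[i]  # y_t = alpha*t + y_{t-1} + e_t

dy = np.diff(y)                             # Delta y_t = alpha*t + e_t
resid = sm.OLS(dy, sm.add_constant(t[1:])).fit().resid

rho1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
print(f"lag-1 autocorrelation of the detrended difference: {rho1:.3f}")  # near 0
```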

4.1 The autocorrelation function

As has been noted previously, the autocorrelation function may be used to describe the persistence in a process. When we are modelling a stationary AR(1) process, the first correlation coefficient, \(\rho_1\), is equivalent to the coefficient in the AR(1) model, \(\phi\). Similarly, the second correlation coefficient, \(\rho_2\), is equivalent to \(\phi^2\).

The subsequent values of the correlation coefficient, \(\rho_j\), may be derived from the more general expression, in which the covariance is divided by the product of the standard deviation of \(y_t\) and the standard deviation of \(y_{t-j}\). For a random walk, the standard deviation of \(y_t\) is \(\sqrt{\mathsf{var}(y_t)} = \sqrt{ t\sigma^2}\), while the standard deviation of \(y_{t-j}\) is \(\sqrt{\mathsf{var}(y_{t-j})} = \sqrt{(t-j)\sigma^2}\). The autocorrelation coefficient may then be derived as,

\[\begin{eqnarray} \nonumber \rho_j & = & (t-j)\sigma^2 \Big/ \left( \sqrt{(t-j)\sigma^2} \sqrt{t\sigma^2} \right) \\ \nonumber & = & (t-j) \Big/ \sqrt{(t-j)t}\\ \nonumber & = & \sqrt{(t-j) / t} \;\;\;\; < 1 \end{eqnarray}\]

In most cases, where the sample size \(t\) is large relative to \(j\), the ratio \((t-j)/t\) is approximately equal to unity, although it will in all instances be less than 1. This is rather unfortunate, as it implies that we are unable to use the autocorrelation function to distinguish between a process that has a unit root and a stationary AR(1) process with a high degree of persistence. A slowly decaying autocorrelation function indicates that the process has a large characteristic root, where the process may possibly include a true unit root, a deterministic trend, or both of these features; however, it could equally suggest that the process is stationary, but somewhat persistent. Furthermore, since the value of \(\rho_1\) is equivalent to the \(\hat{\phi}\) coefficient estimate in the AR(1) model, the parameter estimate will be biased, as it generates a value that is less than unity.

Figure 4: Autocorrelation functions for simulated processes

Examples of these processes are included in Figure 4, where we note that it would be difficult to use the autocorrelation function to distinguish between the various processes. Formal tests would therefore be required to determine whether the series contains a deterministic or a stochastic trend, both of these features, or neither of them.
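A comparison of this kind can be sketched as follows, where we compute the sample autocorrelation function for a simulated random walk and for a stationary but persistent AR(1) process with \(\phi = 0.95\) (the sample size and seed are illustrative).

```python
# The sample ACF of a random walk and of a persistent AR(1) both decay slowly,
# so the ACF alone cannot distinguish a unit root from near-unit-root behaviour.
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(5)
T = 500
eps = rng.standard_normal(T)

rw = np.cumsum(eps)                    # unit root process
ar = np.zeros(T)
for i in range(1, T):
    ar[i] = 0.95 * ar[i - 1] + eps[i]  # stationary AR(1), phi = 0.95

acf_rw = acf(rw, nlags=20)
acf_ar = acf(ar, nlags=20)
for k in (1, 5, 10, 20):
    print(f"lag {k:2d}: random walk {acf_rw[k]:.2f}, AR(1) {acf_ar[k]:.2f}")
```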

5 Tests for unit root

Several tests have been developed to determine the order of integration of a time series. In what follows, these tests have been separated into three groups. The first group of tests investigates the null hypothesis of a unit root against the alternative of stationarity, where the alternative could be stationarity in levels or stationarity around a deterministic trend (trend-stationarity). The second group of tests also considers the null hypothesis that there is a unit root, but allows for structural breaks that may arise at a known point in time, or where the date of such a break is unknown. The final group of test statistics investigates the null hypothesis that the process is stationary, against the alternative that the process has a unit root.

5.1 Dickey-Fuller & Augmented Dickey-Fuller test

The most widely used test for the presence of a unit root was originally proposed by Dickey and Fuller (1979), and it considers the null hypothesis that a series is a random walk against the alternative that it is stationary. To perform this test, we assume that we have an AR(1) process,

\[\begin{eqnarray} \nonumber y_{t}=\phi y_{t-1}+\varepsilon_{t} \;\;\; \text{where } \; \varepsilon_{t}\sim\mathsf{i.i.d.} \mathcal{N}\left(0,\sigma^{2}\right) \end{eqnarray}\]

With the use of this equation, we would want to determine whether \(|\phi|=1\), against the alternative that \(|\phi|<1\). If \(|\phi|=1\) then the above model would represent a random walk process, while if \(|\phi|<1\), the above process is stationary. As noted above, the value of the autocorrelation coefficient, \(\rho_1\), and the estimated value of \(\hat{\phi}\) would be biased towards a value that is less than one when the underlying data generating process contains a unit root (a simulation study that illustrates this property of integrated data is provided in the appendix to this chapter).

Hence, when comparing the results of a near unit root with that of a true unit root, we are primarily interested in determining the degree of certainty with which this coefficient has been estimated. This information may be obtained from the \(t\)-statistic that is associated with the coefficient estimate. Dickey and Fuller (1979) make use of the following test regression that is derived from the AR(1) model,

\[\begin{eqnarray} \nonumber y_{t}&=&\phi y_{t-1}+\varepsilon_{t}\\ \nonumber y_{t} - y_{t-1} &=&\phi y_{t-1} - y_{t-1}+\varepsilon_{t}\\ \nonumber \Delta y_{t}&=& (\phi -1) y_{t-1}+\varepsilon_{t}\\ \Delta y_{t}&=&\pi y_{t-1}+\varepsilon_{t} \tag{5.1} \end{eqnarray}\]

where \(\pi=\phi-1\). Thus, using equation (5.1), the test for a unit root would simply involve an investigation into the value of the \(\pi\) parameter, where

\[\begin{eqnarray} \nonumber H_{0}\; :\pi=0 \end{eqnarray}\]

If the null hypothesis is satisfied, this would imply that \(y_{t}\) is integrated of order one, such that \(y_{t}\sim I(1)\). The alternative hypothesis would then take the form,

\[\begin{eqnarray} \nonumber H_{1}\; :\pi<0 \end{eqnarray}\]

which implies that \(y_{t}\) is stationary, such that \(y_{t}\sim I(0)\). This testing procedure would imply that we would be looking to derive the \(t\)-statistic that is associated with the \(\pi\) parameter, which considers whether this parameter is significantly different from zero. Therefore, the test for the null hypothesis, \(H_{0}\) may be expressed as,

\[\begin{eqnarray} \nonumber \hat{t}_{DF}=\frac{\hat{\pi}}{SE\left(\hat{\pi}\right)} =\frac{\hat{\phi}-1}{SE\left(\hat{\phi}\right)} \end{eqnarray}\]

where \(SE\) denotes the standard error that is associated with the coefficient estimate. The Dickey-Fuller test is a one-sided test, since the alternative to the null hypothesis is that \(y_{t}\) is stationary (i.e. \(\phi < 1\)).4 Note, however, that the asymptotic distribution of this \(t\)-statistic is non-Gaussian, owing to the nonstationarity of the regressor under the null. This would imply that we cannot use the critical values from the standard \(t\)-distribution. The relevant critical values are included in the work of Dickey and Fuller (1979), Dickey and Fuller (1981) and MacKinnon (1991).5
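In practice the test is available in standard software. A minimal sketch using the adfuller function from the statsmodels package follows, applied to a simulated random walk; the reported critical values are the MacKinnon values rather than those of the standard \(t\)-distribution.

```python
# Dickey-Fuller test on a simulated random walk: the test statistic should be
# compared with the (non-Gaussian) Dickey-Fuller/MacKinnon critical values.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(6)
y = np.cumsum(rng.standard_normal(500))   # true unit root process

stat, pval, usedlag, nobs, crit, icbest = adfuller(y, regression="c", autolag="AIC")
print(f"ADF statistic: {stat:.2f}, p-value: {pval:.3f}")
print("critical values:", {k: round(v, 2) for k, v in crit.items()})
```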

The above test describes the procedure for investigating whether the null hypothesis assumes a unit root, while the alternative hypothesis is that of stationarity. This is appropriate for time series that do not drift systematically in any direction. However, if the time series is either increasing or decreasing over the sample, we would like to include a deterministic trend in the alternative hypothesis.6 This testing procedure would consider the use of the regression model,

\[\begin{eqnarray} \nonumber y_{t}=\beta_1 + \beta_2 t+\phi y_{t-1}+\varepsilon_{t} \end{eqnarray}\]

which can be rewritten as,

\[\begin{eqnarray} \Delta y_{t}=\beta_1 + \beta_2 t+\pi y_{t-1}+\varepsilon_{t} \tag{5.2} \end{eqnarray}\]

where \(\pi=\phi-1\), once again. This test for a unit root would still consider whether or not \(\pi=0\). However, in this case the implications are somewhat different, since

\[\begin{eqnarray} \nonumber H_{0}\; :\;\; \pi=0 \end{eqnarray}\]

which implies that \(y_{t}\sim I(1)\) with drift, against the alternative,

\[\begin{eqnarray} \nonumber H_{1}\; :\;\; \pi<0 \end{eqnarray}\]

which implies that \(y_{t}\sim I(0)\), but with a deterministic time trend (i.e. the process is trend-stationary). Note that the properties of the asymptotic distribution of the \(t\)-statistic will change if either a constant or a time trend is included in the estimated regression model. As such, the critical values would differ from those that are provided in the previous case.

Since the alternative hypotheses in both of the above tests do not allow for any persistence in the underlying process, the residuals may be autocorrelated. This led to the development of the augmented Dickey-Fuller (ADF) test, which is described in Dickey and Fuller (1981). It controls for residual autocorrelation by including lagged values of \(\Delta y_{t}\), which allows the underlying process to follow a higher order AR(\(p\)) process. To see how this works, consider an AR(2) representation,

\[\begin{eqnarray} \nonumber y_{t}=\beta_1 + \beta_2 t+\phi_{1}y_{t-1}+\phi_{2}y_{t-2}+\varepsilon_{t} \end{eqnarray}\]

which is the same as,

\[\begin{eqnarray} \nonumber y_{t}=\beta_1 + \beta_2 t+(\phi_{1}+\phi_{2})y_{t-1}-\phi_{2}(y_{t-1}-y_{t-2})+\varepsilon_{t} \end{eqnarray}\]

Subtracting \(y_{t-1}\) from both sides gives

\[\begin{eqnarray} \nonumber \Delta y_{t}=\beta_1+\beta_2 t+\pi y_{t-1}+\gamma_{1}\Delta y_{t-1}+\varepsilon_{t} \end{eqnarray}\]

where we have defined \(\pi=\phi_{1}+\phi_{2}-1\) and \(\gamma_{1}=-\phi_{2}\). Hence, if we allowed for \(p\) lags in the autoregressive process, we would have

\[\begin{eqnarray}\nonumber \Delta y_{t}=\beta_1 +\beta_2 t+\pi y_{t-1}+\overset{p}{\underset{j=1}{\sum}}\gamma_{j}\Delta y_{t-j}+\varepsilon_{t} \end{eqnarray}\]

where \(\pi=\sum_{j=1}^{p}\phi_{j}-1\) and \(\gamma_{j}=\sum_{k=j+1}^{p}\phi_{k}\), for \(j=\{1,2,3,\ldots, p\}\). The lag length \(p\) can be estimated using information criteria, such as the BIC or AIC. This allows us to isolate the persistence from the other stationary components, and the test may also be used to isolate the effects of intercepts and linear time trends, where we essentially have three test equations,

\[\begin{eqnarray} \Delta y_t = \pi y_{t-1} + \sum_{i=2}^{p}\gamma_i \Delta y_{t-i+1} + \varepsilon_t \tag{5.3} \\ \Delta y_t = \beta_1 + \pi y_{t-1} + \sum_{i=2}^{p}\gamma_i \Delta y_{t-i+1} + \varepsilon_t \tag{5.4} \\ \Delta y_t = \beta_1 + \beta_2 t + \pi y_{t-1} + \sum_{i=2}^{p}\gamma_i \Delta y_{t-i+1} + \varepsilon_t \tag{5.5} \end{eqnarray}\]

The differences between these regressions concern the inclusion of \(\beta_1\) and \(\beta_2\): equation (5.3) refers to a pure random walk model, equation (5.4) includes an intercept or drift term, and equation (5.5) includes both a drift and a linear time trend. In each case, the parameter of interest is \(\pi\), where if \(\pi =0\), the process \(y_t\) contains a unit root. Comparing the calculated \(t\)-statistic with the critical values from the Dickey-Fuller tables determines whether or not we should reject the null hypothesis, \(H_{0}: \; \pi =0\).

Although the method is the same regardless of which equation is used, the critical values of the \(t\)-statistic depend on whether an intercept or time trend is included, and they will also depend on the sample size.

Dickey and Fuller (1981) include three additional \(F\)-statistics, which we denote \(\varphi_1 , \varphi_2\) and \(\varphi_3\). These statistics are used to test joint hypotheses on the coefficients and may be used to determine whether (5.3), (5.4) or (5.5) are appropriate for the underlying data generating process. This is of importance as these test equations have different critical values.

The null hypothesis for equation (5.4), where \(\pi = \beta_1 = 0\) is tested using \(\varphi_1\), to determine whether the process could possibly include a constant. If we are not able to reject this null hypothesis, then we should make use of equation (5.3).

The null hypothesis for equation (5.5), where \(\pi = \beta_1 = \beta_2 = 0\) is tested using \(\varphi_2\), to determine whether the process could possibly include a constant and a time trend. If we are not able to reject this null hypothesis, then we should make use of equation (5.4). The joint hypothesis \(\pi = \beta_2 = 0\) may also be tested with the aid of \(\varphi_3\), which seeks to determine whether the process has a deterministic time trend.

The values for the \(\varphi_1 , \varphi_2\) and \(\varphi_3\) statistics are constructed as if they were \(F\)-tests,

\[\begin{eqnarray} \nonumber \varphi_i = \frac{[RSS(restricted) - RSS(unrestricted)] / r}{RSS(unrestricted) / (T-k)} \end{eqnarray}\]

where \(RSS(restricted)\) and \(RSS(unrestricted)\) are the sums of squared residuals for the two variants of the model, \(r\) is the number of restrictions, \(T\) is the number of usable observations, and \(k\) is the number of estimated parameters in the unrestricted model.

When comparing the calculated value of \(\varphi_i\) to the values in the Dickey-Fuller tables, we need to determine the significance level at which the restriction is binding, to test the null hypothesis that the data is generated by the restricted model. In this case, the alternative hypothesis is that the data is generated by the unrestricted model.

If the restriction is not binding, \(RSS(restricted)\) should be close to \(RSS(unrestricted)\), and \(\varphi_i\) will be small. This would imply that large values of \(\varphi_i\) suggest that the restriction is binding, which would result in a rejection of the null hypothesis.
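A sketch of this calculation for \(\varphi_3\) follows, using a simulated unit root process; the unrestricted model is the trend equation (without lagged differences, for simplicity), while the restricted model imposes \(\pi = \beta_2 = 0\). All values are illustrative.

```python
# Computing a phi-style F-statistic from restricted and unrestricted RSS:
# here phi_3, which imposes pi = beta_2 = 0 on the trend test equation.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
T = 500
y = np.cumsum(rng.standard_normal(T))                  # simulated unit root process

dy, ylag = np.diff(y), y[:-1]
trend = np.arange(1, T)                                # time index, T - 1 usable obs

X_u = sm.add_constant(np.column_stack([trend, ylag]))  # constant, t and y_{t-1}
rss_u = sm.OLS(dy, X_u).fit().ssr                      # unrestricted RSS
rss_r = sm.OLS(dy, np.ones((T - 1, 1))).fit().ssr      # restricted: constant only

r, k = 2, 3                                            # restrictions; unrestricted params
phi3 = ((rss_r - rss_u) / r) / (rss_u / (T - 1 - k))
print(f"phi_3 = {phi3:.2f} (compare with the Dickey-Fuller phi_3 tables)")
```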

5.2 Implementing an ADF test

When implementing the augmented Dickey-Fuller test, it has been suggested that one should employ a general-to-specific approach, where the first step is to make use of the test equation that includes a constant and time trend. If we find that we are unable to reject the null of a unit root, we would then need to consider the value of \(\varphi_3\). If we are unable to reject the null that the process does not include a time trend then we would need to estimate the subsequent test equation. The full details of this testing procedure are provided in Figure 5.

Figure 5: Augmented Dickey-Fuller: general-to-specific procedure

Note that if we suspect that the process is integrated of the second order, we would need to perform Dickey-Fuller tests on successive differences of \(y_t\). For example, if we want to test whether \(y_t \sim I(2)\) then we would estimate the equation,

\[\begin{eqnarray}\nonumber \Delta^2 y_t = \mu + \xi_1 \Delta y_{t-1} + \varepsilon_t \end{eqnarray}\]

If we cannot reject the null that \(\xi_1 = 0\), we would conclude that \(y_t\) is \(I(2)\).
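A sketch of this sequence of tests, applied to a simulated \(I(2)\) process (constructed by cumulating a random walk), is provided below.

```python
# Testing the order of integration by differencing: for an I(2) process the
# ADF test should only reject the unit root null on the second difference.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(8)
y = np.cumsum(np.cumsum(rng.standard_normal(500)))  # cumulated random walk: I(2)

for name, series in [("levels", y), ("1st diff", np.diff(y)), ("2nd diff", np.diff(y, 2))]:
    stat, pval, *_ = adfuller(series, regression="c", autolag="AIC")
    print(f"{name:8s}: ADF = {stat:6.2f}, p-value = {pval:.3f}")
```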

5.3 Unit roots and structural breaks

In a much cited paper, Perron (1989) showed that the ADF test has little power to discriminate between a stochastic and a deterministic trend when the data are subject to a structural break. This would imply that in the presence of structural breaks, the various ADF tests are biased towards the non-rejection of a unit root.

For example, consider the moving average representation of an autoregressive model with a level shift, \(y_t = S_t + \sum_{i=0}^{\infty} 0.5^i \varepsilon_{t-i}\). This time series has been simulated for 500 observations, where the level shift is described by \(S_t\), with \(S_t = 0\) for the first half of the sample (\(t = 1, \ldots, 249\)) and \(S_t = 10\) for the second half (\(t = 250, \ldots, 500\)). This time series is depicted in Figure 6.

Figure 6: Stationary time series plus structural break

If we were to fit an AR(1) model to this process, the coefficient would be biased towards unity, since low values are followed by low values, and high values are followed by high values. Hence, the ADF tests of this misspecified model may suggest that this process follows a random walk plus drift, when it is clearly just a stationary time series with a structural break.
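The bias is easy to reproduce. The sketch below uses a simplified variant of the process above (white noise around a level shift of ten at the midpoint of the sample); the fitted AR(1) coefficient lies close to unity even though the series is stationary within each regime.

```python
# A stationary series with a level shift: the estimated AR(1) coefficient is
# biased towards one, mimicking a unit root.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
T = 500
S = np.where(np.arange(1, T + 1) >= 250, 10.0, 0.0)  # level shift at tau = 250
y = S + rng.standard_normal(T)                       # stationary apart from the break

fit = sm.OLS(y[1:], sm.add_constant(y[:-1])).fit()   # AR(1) regression by OLS
print(f"estimated AR(1) coefficient: {fit.params[1]:.3f}")  # close to unity
```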

Perron (1989) includes a formal procedure for testing for unit roots in the presence of a structural change. The parameter \(\tau\) is used to denote the position of the structural break, which in the above example would occur at observation 250. This test could take one of the following three forms.

If we assume that the null considers a one-time jump (pulse) in the level of the unit root process, we could construct the hypothesis

\[\begin{eqnarray} \nonumber H_0 \; : \;\; y_t = \mu + y_{t-1} + \beta_1 D_P + \varepsilon_t \end{eqnarray}\]

where \(D_P = 1\) if \(t = \tau +1\), and 0 otherwise. This specification would describe a random walk plus drift with the addition of a structural break. Note that as a unit root process has infinite memory, the effect of the structural break at \(\tau +1\) will be present in the remainder of the time series. In addition, since we know that a random walk plus drift would usually trend upwards or downwards, an appropriate alternative hypothesis would be to consider a (level shift) structural break in the intercept of a stationary process that has a deterministic trend,7

\[\begin{eqnarray} \nonumber H_1 \; : \;\; y_t = \mu + \alpha t + \beta_2 D_L + \varepsilon_t \end{eqnarray}\]

where \(D_L = 1\) if \(t > \tau\), and 0 otherwise.

To consider a permanent change in the drift of a unit root process, we could construct the null hypothesis,

\[\begin{eqnarray} \nonumber H_0\; : \;\; y_t = \mu + y_{t-1} + \beta_1 D_L + \varepsilon_t \end{eqnarray}\]

where \(D_L = 1\) if \(t > \tau\), and 0 otherwise. In this case, the infinite memory of the random walk plus drift ensures that the inclusion of the level shift dummy produces behaviour that may be characterised by an increase (or decrease) in the drift. As such, an appropriate alternative hypothesis would be a trend-stationary process with a dummy variable that allows for such a change in slope,

\[\begin{eqnarray} \nonumber H_1 \; : \;\; y_t = \mu + \alpha t + \beta_3 D_T + \varepsilon_t \end{eqnarray}\]

where \(D_T = t-\tau\) if \(t > \tau\), and 0 otherwise.

To consider a change in both the level and the drift, we could construct a null hypothesis that makes use of the previous two specifications,

\[\begin{eqnarray}\nonumber H_0 \; : \;\; y_t = \mu + y_{t-1} + \beta_1 D_P + \beta_2 D_L + \varepsilon_t \end{eqnarray}\]

The alternative hypothesis would also make use of the previous two specifications, such that

\[\begin{eqnarray}\nonumber H_1 \; : \;\; y_t = \mu + \alpha t + \beta_2 D_L + \beta_1 D_T + \varepsilon_t \end{eqnarray}\]

After making use of this procedure, Perron (1989) found that there was less evidence of unit roots in economic time series than had previously been reported in the literature. To implement this procedure, one could estimate the model under the alternative hypothesis, which may contain the effects of the constant, time trend and structural break. The residuals from this model would then exclude the effects of these terms and could be tested using a simple ADF specification, as provided in equation (5.3). Alternatively, if we are testing the null of a one-time jump in a unit root process (against the alternative of a level shift in a trend-stationary process), one could combine these steps by estimating the equation,

\[\begin{eqnarray}\nonumber y_t = \mu + \phi_1 y_{t-1} + \alpha t + \beta_2 D_L + \sum_{i=1}^{k} \gamma_i \Delta y_{t-i} + \varepsilon_t \end{eqnarray}\]

Appropriate critical values for this hypothesis test are contained in Perron (1989). While this technique is highly intuitive, Christiano (1992) and a number of other researchers criticised the Perron approach on the basis that it requires prior knowledge about the exact date of the break point, which is not always available. This led to the development of a number of methods that treat the break point as unknown (prior to testing). Examples of these procedures are contained in the work of Perron and Vogelsang (1992), Banerjee, Lumsdaine, and Stock (1992), Perron (1997), and Vogelsang and Perron (1998).

While most of these studies provide interesting insights, the technique that is described in Zivot and Andrews (2002) is the most popular procedure for identifying a unit root with an unknown endogenous structural break. This procedure makes use of an optimisation routine that identifies the date of the endogenous structural shift as the point which gives the least favourable result for the null hypothesis of a random walk with drift.

Therefore, the test statistics are formulated as,

\[\begin{eqnarray} \nonumber \Delta y_t = \mu + \pi y_{t-1} + \alpha t + \beta_2 D_L (\hat{\lambda}) + \sum_{i=1}^{k} \gamma_i \Delta y_{t-i} + \varepsilon_t \\ \nonumber \Delta y_t = \mu + \pi y_{t-1} + \alpha t + \beta_3 D_T (\hat{\lambda}) + \sum_{i=1}^{k} \gamma_i \Delta y_{t-i} + \varepsilon_t \\ \nonumber \Delta y_t = \mu + \pi y_{t-1} + \alpha t + \beta_2 D_L (\hat{\lambda}) + \beta_3 D_T (\hat{\lambda}) + \sum_{i=1}^{k} \gamma_i \Delta y_{t-i} + \varepsilon_t \end{eqnarray}\]

where \(\hat{\lambda}\) is the estimated date of the structural break, and we are essentially interested in the value of \(\pi=\phi-1\). Critical values for this technique are provided in Zivot and Andrews (2002).
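The Zivot-Andrews procedure is implemented in the statsmodels package. The sketch below applies it to the stationary series with a level shift from the earlier example; the test should reject the unit root null and locate the break near the midpoint of the sample.

```python
# Zivot-Andrews test: the break date is chosen endogenously at the point that
# is least favourable to the unit root null hypothesis.
import numpy as np
from statsmodels.tsa.stattools import zivot_andrews

rng = np.random.default_rng(10)
T = 500
S = np.where(np.arange(T) >= 250, 10.0, 0.0)   # level shift at the midpoint
y = S + rng.standard_normal(T)                 # stationary apart from the break

stat, pval, crit, baselag, bpidx = zivot_andrews(y, regression="c", autolag="AIC")
print(f"ZA statistic: {stat:.2f}, p-value: {pval:.3f}, break index: {bpidx}")
```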

6 Testing the assumption of stationarity

An alternative testing procedure has been proposed by Kwiatkowski et al. (1992), who consider the null hypothesis that a series is stationary. In this case, the alternative hypothesis is that the variable is nonstationary (i.e. \(I(1)\)). This procedure is usually referred to as the KPSS test.

To consider the intuitive appeal of this procedure, assume that the data generating process has the form,

\[\begin{eqnarray} y_{t}=\mu+x_{t}+\upsilon_{t} \tag{6.1} \end{eqnarray}\]

where \(\mu\) is a constant, \(\upsilon_{t}\) is a stationary component, and \(x_{t}\) takes the form of a random walk, such that

\[\begin{eqnarray} x_{t}=x_{t-1}+\varepsilon_{t} \;\;\; \text{where }\; \varepsilon_{t}\sim \mathsf{i.i.d.} \mathcal{N} \left(0,\sigma^{2}\right) \tag{6.2} \end{eqnarray}\]

It can then be shown that if the variance of \(\varepsilon\) is zero, then \(x_{t}=x_{0}\) for all \(t\); that is, if there is no variation in the error term, \(\varepsilon_t\), then \(x_t\) must be constant. This would imply that \(y_{t}\) would be stationary when \(\sigma^{2}=0\), as it would only include constants and the stationary process, \(\upsilon_{t}\). Therefore, the test statistic could be formulated with the null hypothesis that \(y_{t}\) is stationary, where we specify,

\[\begin{eqnarray} \nonumber H_{0}\; :\sigma^{2}=0 \end{eqnarray}\]

which implies that \(x_{t}\) is a constant, against the alternative hypothesis,

\[\begin{eqnarray} \nonumber H_{1}\; :\sigma^{2}>0 \end{eqnarray}\]

which implies that \(x_{t}\) varies over time and \(y_{t}\) will be nonstationary. To derive the test statistic we regress \(y_{t}\) on a constant, \(\mu\), to obtain the residuals, which we call \(\hat{\upsilon}_{t}\). Thereafter, we calculate \(S_{t}=\sum_{s=1}^{t}\hat{\upsilon}_{s}\) and \(\hat{\sigma}_{\infty}^{2}\), which relates to the long-run variance of the process. The KPSS test statistic could then be derived with the aid of the following calculation,

\[\begin{eqnarray} KPSS=\frac{1}{T^{2}}\frac{\sum_{t=1}^{T}\hat{S}_{t}^{2}}{\hat{\sigma}_{\infty}^{2}} \tag{6.3} \end{eqnarray}\]

This test statistic may be augmented to allow for additional deterministic components, such as a deterministic trend. Note that any changes to the test equation would require a different set of critical values.
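The KPSS test is also available in the statsmodels package. A minimal sketch follows; note that the reported p-values are truncated to the range of the published tables, and that a small p-value now counts against stationarity, the reverse of the ADF convention.

```python
# KPSS test: the null hypothesis is stationarity, so rejection (a small
# p-value) is evidence of a unit root.
import numpy as np
from statsmodels.tsa.stattools import kpss

rng = np.random.default_rng(11)
stationary = rng.standard_normal(500)
unit_root = np.cumsum(rng.standard_normal(500))

for name, series in [("white noise", stationary), ("random walk", unit_root)]:
    stat, pval, lags, crit = kpss(series, regression="c", nlags="auto")
    print(f"{name:11s}: KPSS = {stat:5.2f}, p-value = {pval:.3f}")
```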

7 Bayesian analysis and unit roots

Up to this point we have adopted the classical statistical perspective, where we estimate the value of \(\phi\) in an autoregressive model. When using these classical techniques, the Dickey-Fuller testing procedure suggested that if the uncertainty with which we estimate the coefficient value is relatively high, and that coefficient is relatively close to one, then we would be unable to reject the null of a unit root.

When using Bayesian estimation techniques, all the parameters are treated as random variables, so we need to specify the moments for the prior distribution of the parameter. To derive the final posterior estimates we would then multiply the prior distribution by the likelihood function (which would provide a summary of the parameter estimates, conditional on the observed values of the data). Note that in this case, if the distribution for the likelihood function is relatively flat, which would occur when the data suggests that there is a great deal of uncertainty about the parameter estimates, then the posterior would converge on the prior. Similarly, when the likelihood function is relatively narrow and there is a great deal of certainty relating to the estimated parameter estimates, then the posterior would converge on the likelihood function.

Hence, if we suspect that the time series contains a unit root, then we would make use of a prior distribution that has a mean value of unity. If the data strongly suggest that this is not a unit root process, then the posterior would converge on the value that is provided by the likelihood function, to provide a parameter estimate that is less than one. Similarly, if the data suggest that there is a great deal of uncertainty about the possible value of the parameter, then the final parameter estimate would be unity (or close to unity). In this way the final parameter estimate is not biased. For further use of Bayesian techniques in the presence of a unit root, see Sims (1988) and Sims and Uhlig (1991).
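To make this intuition concrete, consider a sketch under the simplifying assumption of a normal prior, \(\phi \sim \mathcal{N}(1, \tau^{2})\), and a likelihood that is summarised by the least squares estimate \(\hat{\phi}\) with sampling variance \(s^{2}\). The posterior mean is then the precision-weighted average,

\[\begin{eqnarray} \nonumber \mathbb{E}\left[ \phi \mid y \right] = \frac{\tau^{-2} \cdot 1 + s^{-2}\, \hat{\phi}}{\tau^{-2}+s^{-2}} \end{eqnarray}\]

so that a flat likelihood (large \(s^{2}\)) pulls the posterior towards the prior mean of unity, while a precise likelihood (small \(s^{2}\)) pulls it towards \(\hat{\phi}\).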

8 Conclusion

Standard regressions that are performed on nonstationary data may provide spurious results. This is important, since many time series variables have deterministic or stochastic trends, which would imply that they are nonstationary. If a process returns to its (non-zero) trend value after a shock, we say that it has a deterministic trend and is trend-stationary. These variables can be made stationary by removing the deterministic time trend. Time series variables that are integrated of order one, \(I(1)\), can be made stationary by differencing. Such variables are often termed difference-stationary, or we say that they have one unit root. The most widely used unit root test is the augmented Dickey-Fuller test, which should be employed within a general-to-specific procedure. The Perron test should be used in the presence of a known structural break, while the Zivot-Andrews test should be used for an unknown endogenous structural break. An alternative method that tests the null hypothesis of stationarity is the KPSS test.

9 Appendix

9.1 Monte Carlo simulations for the bias in a unit root

In the tutorial we constructed a number of simulation exercises, where we noted that after generating a random walk process 10,000 times, the estimated coefficients for \(\hat{\phi}\) were biased towards values below 1. The results of this simulation exercise are contained in Figure 7.

Figure 7: Bias in unit root process when \(\phi=1\)

To make use of a Monte Carlo simulation for a data generating process (DGP) that may have been generated by a particular model, we need to specify information relating to: the form of the model and its parameter values; the initial value, \(y_0\), and the sample size, \(T\); and the number of simulations, \(N\).

Therefore, if we assume that the DGP is generated by an AR(1) model that does not have a constant, such as:

\[\begin{eqnarray} \nonumber y_t = \phi y_{t-1} + \varepsilon_t, \;\;\; \text{for } t = 1, \ldots, T \;\;\; \text{and } \varepsilon_t \sim \mathsf{i.i.d.} \mathcal{N}(0, \sigma^2 ) \end{eqnarray}\]

Then we would need to specify values for the following terms, where by way of example, \[ y_0 = 0, \; \phi = 1, \; \sigma = 1 \; \text{ and } \; T = 100. \]

We would then be able to generate values for the variables with the aid of some form of simulation, where the number of simulations would need to take on a defined value, e.g. \(N = 10,000\). Thereafter, we could estimate an AR(1) model for each of these simulated time series, which could be used to investigate the bias in the estimated value of \(\hat{\phi}\). A sketch of such an exercise is provided below.
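```python
# Monte Carlo study of the bias in phi-hat when the DGP is a driftless random
# walk: y_0 = 0, phi = 1, sigma = 1, T = 100 and N = 10,000 replications.
import numpy as np

rng = np.random.default_rng(12)
N, T = 10_000, 100
phi_hat = np.empty(N)

for i in range(N):
    y = np.cumsum(rng.standard_normal(T))              # random walk with y_0 = 0
    phi_hat[i] = (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])  # OLS slope, no constant

print(f"mean of phi-hat:    {phi_hat.mean():.3f}")     # below one: downward bias
print(f"share of draws < 1: {(phi_hat < 1).mean():.2%}")
```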

9.2 Power studies

The power of a test is the probability of rejecting the null hypothesis given that the null hypothesis is not true (that is, one minus the probability of a type II error).

For example, consider the power of the Dickey-Fuller test, where we assume that the \(5\%\) critical value of the one-sided \(t\)-test for \(\phi = 1\) is known and is denoted by \(\tau_{0.05}\). We would then like to ascertain the power of the test when \(\phi < 1\): that is, how often the test correctly rejects the null of a unit root when we know that the series does not contain one.

To obtain a sample of estimated \(t\)-statistics we could: (i) simulate a stationary AR(1) process for a given value of \(\phi < 1\); (ii) estimate the Dickey-Fuller test regression and store the \(t\)-statistic for \(\pi\); (iii) repeat these steps a large number of times; and (iv) compute the proportion of test statistics that fall below \(\tau_{0.05}\), which provides an estimate of the power of the test (as in the sketch below).
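For simplicity, the rejection decision in this sketch is based on the p-value reported by the adfuller function rather than on a stored value of \(\tau_{0.05}\); the number of replications is illustrative.

```python
# Power of the Dickey-Fuller test for a stationary but persistent AR(1):
# simulate under phi = 0.95 and count how often the unit root null is rejected.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(13)
N, T, phi = 1_000, 100, 0.95
rejections = 0

for _ in range(N):
    eps = rng.standard_normal(T)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = phi * y[t - 1] + eps[t]   # stationary AR(1): the null is false
    rejections += adfuller(y, regression="c", autolag="AIC")[1] < 0.05

print(f"estimated power at phi = {phi}: {rejections / N:.1%}")   # low power
```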

We could then consider different values of \(\phi\) to investigate the relation between power and \(\phi\), which may be used to draw a power function. These studies suggest that the power of the Dickey-Fuller test is relatively low. For example, when making use of a simulation exercise for a stationary time series process that has a long memory, where \(\phi = 0.95\), we noted that the Dickey-Fuller test was only able to reject the null of a unit root 4.3% of the time (when using the \(5\%\) critical values).

10 References

Banerjee, A., R. L. Lumsdaine, and J. H. Stock. 1992. “Recursive and Sequential Tests of the Unit-Root and Trend-Break Hypotheses: Theory and International Evidence.” Journal of Business and Economic Statistics 10(3): 271–87.

Christiano, Lawrence J. 1992. “Searching for a Break in GNP.” Journal of Business and Economic Statistics 10(3): 237–50.

Dickey, D. A., and W. A. Fuller. 1979. “Distribution of the Estimates for Autoregressive Time Series with a Unit Root.” Journal of American Statistical Association 74(366): 427–31.

———. 1981. “Likelihood Ratio Statistics for Autoregressive Time Series with a Unit Root.” Econometrica 49: 1057–72.

Haldrup, N., and W. Jansen. 2006. “Palgrave Handbook of Econometrics: Vol 1 Econometric Theory.” In, edited by T. Mills and K. Patterson. Palgrave Macmillan.

Kwiatkowski, D., P. C. B. Phillips, P. Schmidt, and Y. Shin. 1992. “Testing the Null Hypothesis of Stationarity Against the Alternative of a Unit Root: How Sure Are We That Economic Time Series Have a Unit Root?” Journal of Econometrics 54(1): 159–78.

MacKinnon, J. 1991. “Long-Run Economic Relationships: Readings in Cointegration.” In, edited by R. F. Engle and C. W. J. Granger. Advanced Texts in Econometrics. Oxford: Oxford University Press.

Nelson, C.R., and C.I. Plosser. 1982. “Trends and Random Walks in Macroeconomic Time Series: Some Evidence and Implications.” Journal of Monetary Economics 10: 139–62.

Perron, Pierre. 1989. “The Great Crash, the Oil Price Shock, and the Unit Root Hypothesis.” Econometrica, 1361–1401.

———. 1997. “Further Evidence on Breaking Trend Functions in Macroeconomic Variables.” Journal of Econometrics 80: 355–85.

———. 2006. “Dealing with Structural Breaks.” In Palgrave Handbook of Econometrics, Volume 1, 278–352. Palgrave Macmillan.

Perron, Pierre, and Timothy Vogelsang. 1992. “Nonstationarity and Level Shifts with an Application to Purchasing Power Parity.” Journal of Business and Economic Statistics 10: 301–20.

Sims, Christopher A. 1988. “Bayesian Skepticism on Unit Root Econometrics.” Journal of Economic Dynamics and Control 12 (2-3): 463–74.

Sims, Christopher A., and Harald Uhlig. 1991. “Understanding Unit Rooters: A Helicopter Tour.” Econometrica 59(6): 1591–9.

Vogelsang, Timothy, and Pierre Perron. 1998. “Additional Tests for a Unit Root Allowing for a Break in the Trend Function at an Unknown Time.” International Economic Review 39: 1073–1100.

Yule, G. U. 1926. “Why Do We Sometimes Get Nonsense-Correlations Between Time Series?” Journal of the Royal Statistical Society 89: 1–64.

Zivot, Eric, and Donald Andrews. 2002. “Further Evidence on the Great Crash, the Oil-Price Shock, and the Unit Root Hypothesis.” Journal of Business and Economic Statistics 20: 25–44.


  1. Similar results were obtained after including a deterministic time trend in the model.

  2. The earlier results for the regression in levels may have been due to the dramatic improvements in medicine and a change in preferences (to get married) that may have occurred over this period of time.

  3. Such an example may allow for instances where a change in technology permanently affects the level of output.

  4. This would imply that it is only when the calculated value of the test statistic is smaller (or more negative) than the critical value that we are able to reject the null of a unit root.

  5. The values of Dickey and Fuller (1979) have been included in the urca package.

  6. For example, the level of economic output is usually increasing over time.

  7. Figure 3 contains an example of a random walk plus drift.