class: center, middle, inverse, title-slide

# Nonstationarity and Unit Roots

### Kevin Kotzé

---

<!-- layout: true -->
<!-- background-image: url(image/logo.svg) -->
<!-- background-position: 2% 98% -->
<!-- background-size: 10% -->

---

# Contents

1. Introduction
1. Deterministic or stochastic trends
1. The ACF for simulated time series
1. Testing for stationarity
1. Unit roots with structural breaks
1. Tests that assume stationarity
1. Bayesian analysis and unit roots
1. Conclusion

---

# Introduction

- Presence of deterministic or stochastic trends may induce nonstationarity
- A plot of GDP or the JSE index (in levels) would suggest that the mean depends on time
- If either the data or the models are not conditioned to account for this phenomenon, standard classical regression techniques (i.e. OLS) may be inappropriate

---

# What's with the trend?

- Consider the historic example from Yule (1926) on mortality and marriage:
  - Yearly data on the standardised mortality in England & Wales
  - Proportion of Church of England marriages (1866-1911)
- Graph suggests a strong positive relationship
- Correlation coefficient of 0.9515 suggests likewise
- The inclusion of a constant and/or deterministic trend produces similar results
- However, taking the first difference of the variables suggests that there is no relationship

---

background-image: url(image/Mort_Marry.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure: Mortality and Marriage

---

# Yule (1926) for Mortality and Marriage

Dependent Variable: `\(\Delta\)` mortality

Sample: 1866 to 1911

`\(\;\)` | Coefficient | Std.Error | `\(t\)`-value | `\(prob\)`
------------------|-------------|-----------|-----------|--------
constant | -0.133 | 0.210 | -0.63 | 0.531
`\(\Delta\)` marriage | 0.011 | 0.043 | 0.270 | 0.788
`\(R^2\)` | 0.001 | | |

---

# Introduction

Many series have a trend, which may be deterministic or stochastic

`\begin{eqnarray} y_t = \text{trend} + \text{stationary component} + \text{irregular} \end{eqnarray}`

- When the trend is deterministic it could be removed by regressing the data on the variable:

`\begin{eqnarray} x_t = 1,2,3,\dots, T \end{eqnarray}`

- The residual should then be stationary

---

# Trend stationary series

- Consider the autoregressive model with a deterministic trend,

`\begin{eqnarray} y_t = \alpha t + \phi_1 y_{t-1} + \varepsilon_t \end{eqnarray}`

- Which can be expressed in moving average form,

`\begin{eqnarray} y_t = \alpha t + \theta (L) \varepsilon_t \end{eqnarray}`

- where `\(\theta(L)=1+\theta_{1}L +\theta_{2}L^{2}+\theta_{3}L^{3}+ \ldots\)`
  - the deterministic trend is simply the time index `\(t\)`
  - with a slope parameter `\(\alpha\)`
- Note that the first moment clearly depends on time,

`\begin{eqnarray} \mathbb{E}\left[ y_{t}\right] =\alpha t \end{eqnarray}`

---

# Trend stationary series

- However, deviations of `\(y_{t}\)` from its expected mean are stationary,

`\begin{eqnarray} y_{t}-\mathbb{E}\left[ y_{t}\right] & = \alpha t+\theta(L)\varepsilon_{t}-\left( \alpha t\right) \\ & =\theta(L)\varepsilon_{t} \end{eqnarray}`

- The time series will return to the trend after a shock
- We call this a trend-stationary (TS) process

---

# Nonlinear deterministic trends

- Economic problems often encounter nonlinear trends, arising from increasing returns to scale, etc.
- Hence,

`\begin{eqnarray} y_t = \mu + \alpha_1 t + \alpha_2 t^2 + \alpha_3 t^3+ \ldots + \alpha_n t^n + \varepsilon_t \end{eqnarray}`

- Usually fit polynomial trends of several orders and then remove the fitted components
  - compare models with the AIC, etc.
- Common to many observed time series variables

---

# Stochastic trend - Random Walk

- Stochastic trends are permanently affected by innovations
- Simplest model of a variable with a stochastic trend is the random walk
- Depends on past values of itself and white noise errors

`\begin{eqnarray} y_{t}=y_{t-1}+\varepsilon_{t} \;\;\; \text{where } \; \varepsilon_{t}\sim \mathsf{i.i.d.} \mathcal{N}\left(0,\sigma^{2}\right) \end{eqnarray}`

- Implies that the best forecast of `\(y_{t+1}\)` at time `\(t\)` is,

`\begin{eqnarray} \mathbb{E}\left[y_{t+1}|\;y_{t}\right] =y_{t} \end{eqnarray}`

- where the MSE grows linearly with the forecast horizon `\(h\)`,

`\begin{eqnarray} \sigma^{2}\left(h\right)=\mathsf{var}\left(y_{t+h}-\mathbb{E}\left[ y_{t+h}| y_{t}\right] \right) =\sigma^{2}h \end{eqnarray}`

- The effect of shocks is not eroded with time

---

# Stochastic trend - Random Walk

- For the random walk model `\(y_{t}=y_{t-1}+\varepsilon_{t}\)` we may use recursive substitution to describe the evolution of the process,

`\begin{eqnarray} y_{t} & =& y_{t-1}+\varepsilon_{t}\\ & =& y_{t-2}+\varepsilon_{t-1}+\varepsilon_{t}\\ & =& y_{t-3}+\varepsilon_{t-2}+\varepsilon_{t-1}+\varepsilon_{t}\\ & \vdots & \\ y_{t} & =& \overset{t-1}{\underset{j=0}{\sum}}\varepsilon_{t-j}+y_{0} \end{eqnarray}`

- Therefore each shock, `\(\varepsilon_{t-j}\)`, will contribute to subsequent values of `\(y_{t}\)`
- Hence, a shock to a random walk has a permanent effect and persists forever (infinite memory)

---

# Stochastic trend - Random Walk

- If `\(y_{0}\)` is taken to be zero (in the finite past), the mean and variance of a random walk are,

`\begin{eqnarray} \mathbb{E}\left[y_{t}\right]=0 \;\;\; \text{and } \;\; \mathsf{var}\left( y_{t}\right) =\sigma^{2}t \end{eqnarray}`

- Thus `\(y_{t}\)` is nonstationary as the variance depends on time
- The covariance, `\(\gamma_{t-j}\)`, between `\(y_t\)` and `\(y_{t-j}\)` with `\(y_0=0\)`,

`\begin{eqnarray} \mathbb{E}[(y_t - y_0)(y_{t-j}-y_0)] & = & \mathbb{E}[(\varepsilon_t + \varepsilon_{t-1} + \ldots + \varepsilon_1) \times \\ & & \;\;\; (\varepsilon_{t-j} + \varepsilon_{t-j-1} + \ldots + \varepsilon_{1})]\\ & = & \mathbb{E}[(\varepsilon_{t-j})^2 + (\varepsilon_{t-j-1})^2 + \ldots + (\varepsilon_1)^2]\\ & = & (t-j)\sigma^2 \end{eqnarray}`

- which also depends on time

---

# Stochastic trend - Random Walk

- The random walk has a unit root, as there is no tendency to return to the mean after a shock
- Also termed difference-stationary (DS), as the first difference of the random walk yields

`\begin{eqnarray} y_{t}- y_{t-1} & =&\varepsilon_{t}\\ \Delta y_{t} & =&\varepsilon_{t} \end{eqnarray}`

- where `\(\Delta y_{t}\)` is a sequence of stationary white noise errors
- We also refer to `\(y_{t}\)` as `\(I(1)\)` and `\(\Delta y_{t}\)` as `\(I(0)\)`

---

# Stochastic trend - Random Walk plus Drift

- Adding a constant term provides the random walk with drift,

`\begin{eqnarray} y_{t}=\mu + y_{t-1}+\varepsilon_{t} \end{eqnarray}`

- Using recursive substitution we can show that it comprises a deterministic and a stochastic trend

`\begin{eqnarray} y_{t} & =&\mu+y_{t-1}+\varepsilon_{t}\\ & =&\mu+(y_{t-2}+\mu+\varepsilon_{t-1})+\varepsilon_{t}\\ & =&2\mu+(y_{t-3}+\mu+\varepsilon_{t-2})+\varepsilon_{t-1}+\varepsilon_{t}\\ & \vdots & \\ y_{t} & =&\mu \cdot t+\overset{t-1}{\underset{j=0}{\sum}}\varepsilon_{t-j}+y_{0} \end{eqnarray}`

- In contrast to the trend-stationary model, deviations from the deterministic trend are not stationary
- All past values of `\(\varepsilon_{t}\)` will influence `\(y_{t}\)`, even after removing the deterministic trend

---

# Features of non-stationary processes

- In stationary time series:
  - Shocks are temporary
  - Revert back to long-run mean
- In non-stationary time series:
  - Shocks have permanent effects
  - Mean and/or variance are time dependent
- To identify a non-stationary time series:
  - No long-run mean to which the series returns
  - Variance is time dependent (increases with time)
  - Theoretical autocorrelations don't
decay and the sample correlogram dissipates slowly

---

background-image: url(image/sim_irf.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure: Impulse response functions for autoregressive processes

---

# First difference of trend stationary process

- Worth noting that taking the first difference of a trend stationary process would introduce a unit root in the moving average component

`\begin{eqnarray} y_t = \alpha t + \varepsilon_t \end{eqnarray}`

- where the lag could be represented by, `\(y_{t-1} = \alpha (t-1) + \varepsilon_{t-1}\)`
- First difference of the above process is

`\begin{eqnarray} \Delta y_t = \alpha + \varepsilon_t - \varepsilon_{t-1} \end{eqnarray}`

- where the full effect of the previous shock remains in the solution
- Hence, although `\(\Delta y_t\)` is stationary, it is non-invertible: the unit root in the MA term means the effects of previous shocks do not dissipate from the representation

---

# Deterministic & stochastic trends

- Consider the following process that has both deterministic and stochastic components

`\begin{eqnarray} y_t = \alpha t + y_{t-1} + \varepsilon_t \end{eqnarray}`

- To remove the stochastic trend, subtract `\(y_{t-1}\)` from both sides

`\begin{eqnarray} \Delta y_t = \alpha t + \varepsilon_t \end{eqnarray}`

- We are then able to remove the deterministic trend by regressing `\(\Delta y_t\)` on a variable with a deterministic trend (i.e. `\(x = 1,2,3,\ldots\)`)
- What will be left is a stationary process, which in this case will be white noise
- When the process has a unit root there is no problem with taking the first difference, even if it also has a deterministic trend; however, if it does not have a unit root, then differencing introduces a non-invertible unit root in the MA term

---

background-image: url(image/sim_proc.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure: Different simulated time series

---

background-image: url(image/simul_acf.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure: The ACF for simulated time series

---

# Correlation of Non-stationary Processes

- The correlation coefficient `\(\rho_j\)` is derived from the covariance divided by the product of the standard deviations of `\(y_t\)` and `\(y_{t-j}\)`. For a random walk,
  - Standard deviation of `\(y_t\)`: `\(\sqrt{\mathsf{var}(y_t)} = \sqrt{ t\sigma^2}\)`
  - Standard deviation of `\(y_{t-j}\)`: `\(\sqrt{\mathsf{var}(y_{t-j})} = \sqrt{(t-j)\sigma^2}\)`

`\begin{eqnarray} \rho_j & = & (t-j)\sigma^2 / \left( \sqrt{(t-j)\sigma^2} \sqrt{t\sigma^2} \right) \\ & = & (t-j) / \sqrt{(t-j)t}\\ & = & \sqrt{(t-j) / t} \;\;\;\; < 1 \end{eqnarray}`

- For reasonable values of `\(j\)`, when the sample size `\(t\)` is large the ratio `\((t-j)/t\)` is approximately equal to unity
- For adjacent periods `\(j=1\)` the correlation coefficient `\(\rho_1\)` approaches unity as `\(t \rightarrow \infty\)`

---

# The ACF for non-stationary time series

- Hence the sample autocorrelation function for a random walk will show only a slight tendency to decay
- Makes it impossible to use the ACF to distinguish between a unit root and a near unit root
- Provides a rough indicator of whether a trend is present
- Slowly decaying ACF indicates:
  - Large characteristic root, or
  - Series may possibly include a true unit root process, or
  - Series may
possibly include a trend stationary process, or
  - Series may include both these features, or it may be stationary (but highly persistent)
- Sharply decaying ACF indicates the series is stationary

---

# Tests for unit roots

- Need a formal test to distinguish between:
  - Stationary long-memory process
  - Random walk, random walk plus drift
  - Stationary process with deterministic trend
  - Non-stationary process with deterministic trend
- Tests need to account for parameter estimates that may be biased
- We will see that existing tests have little power to distinguish between a unit root and a near unit root

---

# Tests for unit roots

- Several tests consider the order of integration of a time series, which may be separated into three groups:
  - First group tests the null that there is a unit root against the alternative of stationarity
    - The alternative could be stationarity in levels or around a deterministic trend (trend-stationarity)
  - Second group tests as above but allows for structural breaks, observed or unknown
  - Third group assumes the opposite of the above-mentioned tests, namely the null of stationarity against the alternative of a unit root

---

# Dickey-Fuller test

- Dickey and Fuller (1979) test whether a series is a random walk against the alternative that it is (trend) stationary
- Assume an `\(AR(1)\)` process:

`\begin{eqnarray} y_{t}=\phi y_{t-1}+\varepsilon_{t} \;\;\; \text{where } \; \varepsilon_{t}\sim\mathsf{i.i.d.} \mathcal{N}\left(0,\sigma^{2}\right) \end{eqnarray}`

- We would want to know whether `\(|\phi|=1\)` or `\(|\phi|<1\)`
  - If `\(|\phi|=1\)` we have a random walk
  - If `\(|\phi|<1\)`, the `\(AR(1)\)` process is stationary
- The autocorrelation coefficient, `\(\rho_1\)`, is equal to the coefficient estimate, `\(\hat{\phi}\)`, which is biased below unity in the presence of a unit root

---

# Dickey-Fuller test

- When we write the AR(1) in first differences,

`\begin{eqnarray} y_{t}&=&\phi y_{t-1}+\varepsilon_{t}\\ y_{t} - y_{t-1} &=&\phi y_{t-1} - y_{t-1}+\varepsilon_{t}\\ \Delta y_{t}&=& ( \phi -1 ) y_{t-1}+\varepsilon_{t}\\ \Delta y_{t}&=&\pi y_{t-1}+\varepsilon_{t} \end{eqnarray}`

- where `\(\pi=\phi-1\)`
- The test for a unit root involves the estimate of the `\(\pi\)` parameter

`\begin{eqnarray} H_{0}:\pi=0 \end{eqnarray}`

- which implies that `\(y_{t}\sim I(1)\)`, or the alternative,

`\begin{eqnarray} H_{1}:\pi<0 \end{eqnarray}`

- which implies that `\(y_{t}\sim I(0)\)`

---

# Dickey-Fuller test

- To test whether the coefficient in a regression is statistically different from zero we can use a `\(t\)`-test
- This involves making use of the standard errors or `\(t\)`-statistics

`\begin{eqnarray} \hat{t}_{DF}=\frac{\hat{\phi}-1}{SE\left(\hat{\phi}\right)}=\frac{\hat{\pi}}{SE\left(\hat{\pi}\right)} \end{eqnarray}`

- Note that the potential bias also implies that we cannot use the standard critical `\(t\)`-values

---

# Dickey & Fuller (1979)

- To derive critical values for the unit root test, the Dickey-Fuller approach:
  - Generates thousands of random walks and calculates `\(\hat{\phi}\)` for each
  - Anything within the resulting interval may be deemed consistent with a random walk
- In the presence of an intercept and trend:
  - 90% of the `\(\hat{\phi}\)` lie within `\(3.15\)` std errors of unity
  - 95% of the `\(\hat{\phi}\)` lie within `\(3.45\)` std errors of unity
  - 99% of the `\(\hat{\phi}\)` lie within `\(4.04\)` std errors of unity
- If we were trying to determine whether the previous data contained a unit root:
  - With a `\(t\)`-statistic of `\(-2.24\)`, we can't reject the null of a unit root
  - It needs to be smaller (more negative) than the critical value to reject the null of a unit root
- It is only when we are extremely certain that `\(\pi \neq 0\)` that we can conclude that no unit root is present

---

# Dickey & Fuller (1979)

- The above test is appropriate when there is no deterministic trend
- For trending time series like the level of GDP we would like to include a deterministic trend in the alternative hypothesis
- The testing
strategy will then be to specify,

`\begin{eqnarray} y_{t}=\beta_1 + \beta_2 t+\phi y_{t-1}+\varepsilon_{t} \end{eqnarray}`

- which can be rewritten as,

`\begin{eqnarray} \Delta y_{t}=\beta_1 + \beta_2 t+\pi y_{t-1}+\varepsilon_{t} \end{eqnarray}`

- where `\(\pi=\phi-1\)` and it still involves a test of `\(\pi=0\)`,

`\begin{eqnarray} H_{0} \; : \; \pi=0 \end{eqnarray}`

- which implies `\(y_{t}\sim I(1)\)` with drift, and the alternative,

`\begin{eqnarray} H_{1} \; : \; \pi<0 \end{eqnarray}`

- which implies `\(y_{t}\sim I(0)\)` with a deterministic time trend

---

# Augmented Dickey-Fuller (1981)

- Since the lag structure in the original test is not extensive, the ADF test allows for persistence in the stationary `\(\Delta y_{t}\)`, which may follow a higher order `\(AR(p)\)` process
- For example, when the stationary part follows an `\(AR(2)\)` process,

`\begin{eqnarray} y_{t}=\beta_1 + \beta_2 t+\phi_{1}y_{t-1}+\phi_{2}y_{t-2}+\varepsilon_{t} \end{eqnarray}`

- which is the same as,

`\begin{eqnarray} y_{t}= \beta_1 + \beta_2 t+(\phi_{1}+\phi_{2})y_{t-1}-\phi_{2}(y_{t-1}-y_{t-2})+\varepsilon_{t} \end{eqnarray}`

- subtracting `\(y_{t-1}\)` from both sides gives

`\begin{eqnarray} \Delta y_{t}= \beta_1+\beta_2 t+\pi y_{t-1}+\gamma_{1}\Delta y_{t-1}+\varepsilon_{t} \end{eqnarray}`

- where `\(\pi=\phi_{1}+\phi_{2}-1\)` and `\(\gamma_{1}=-\phi_{2}\)`

---

# Augmented Dickey-Fuller (1981)

- Allows us to isolate the persistence from other stationary components
- May also isolate the effects of intercepts and linear time trends,

`\begin{eqnarray} \Delta y_t = \pi y_{t-1} + \sum_{i=2}^{p}\gamma_i \Delta y_{t-i+1} + \varepsilon_t \\ \Delta y_t = \beta_1 + \pi y_{t-1} + \sum_{i=2}^{p}\gamma_i \Delta y_{t-i+1} + \varepsilon_t \\ \Delta y_t = \beta_1 + \beta_2 t + \pi y_{t-1} + \sum_{i=2}^{p}\gamma_i \Delta y_{t-i+1} + \varepsilon_t \end{eqnarray}`

- Choice of lags may be determined by information criteria

---

# Types of Dickey-Fuller tests

- Differences between these regressions concern `\(\beta_1\)` and `\(\beta_2\)`
  - First is a pure random walk model
  - Second adds an intercept or drift term
  - Third includes drift and a linear time trend
- In each case the parameter of interest is `\(\pi\)`:
  - If `\(\pi =0\)` then `\(y_t\)` contains a unit root
  - Comparing the `\(t\)`-statistic with the Dickey-Fuller tables determines whether to reject the null, `\(\pi =0\)`
- Although the method is the same regardless of the equation used:
  - The critical values of the `\(t\)`-statistics depend on whether an intercept or time trend is included
  - Dickey-Fuller (1979) also show that the critical values depend on the sample size

---

# Dickey Fuller (1981)

- Dickey Fuller (1981) provide 3 additional `\(F\)`-statistics `\((\varphi_1 , \varphi_2\)` and `\(\varphi_3)\)` to test joint hypotheses on the coefficients
- The null for the second equation, `\(\pi = \beta_1 = 0\)`, is tested using `\(\varphi_1\)`
  - i.e. the process doesn't have a constant
- The null for the third equation, `\(\pi = \beta_1 = \beta_2 = 0\)`, is tested using `\(\varphi_2\)`
  - i.e. the process doesn't have a constant or trend
- The joint hypothesis `\(\pi = \beta_2 = 0\)` is tested using `\(\varphi_3\)`
  - i.e.
process doesn't have a trend

---

# Dickey Fuller (1981)

- The `\(\varphi_1, \varphi_2\)` and `\(\varphi_3\)` statistics are constructed according to the traditional `\(F\)`-test methodology;

`\begin{eqnarray} \varphi_i = \frac{[RSS(restricted) - RSS(unrestricted)] / r}{RSS(unrestricted) / (T-k)} \end{eqnarray}`

- where `\(RSS(restricted)\)` and `\(RSS(unrestricted)\)` are the sums of squared residuals from the restricted and unrestricted models
- `\(r\)` = number of restrictions
- `\(T\)` = number of usable observations
- `\(k\)` = number of estimated parameters in the unrestricted model

---

# Dickey Fuller (1981)

- Comparing the calculated value of `\(\varphi_i\)` to the values in the Dickey-Fuller tables
  - Determines the significance level at which the restriction is binding
- The null is that the data is generated by the restricted model
- The alternative is that the data is generated by the unrestricted model
- If the restriction is not binding, `\(RSS(restricted)\)` should be close to `\(RSS(unrestricted)\)` and `\(\varphi_i\)` will be small
- Hence large values of `\(\varphi_i\)` suggest a binding restriction and a rejection of the null
- The general testing procedure involves a general-to-specific methodology that is described in the next slide

---

background-image: url(image/genSpec.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure: General-to-Specific Procedure

---

# Multiple roots

- If more than one root is suspected, perform the Dickey-Fuller test on successive differences of `\(y_t\)`
- Where only one root was suspected we could use, for example;

`\begin{eqnarray} \Delta y_t = \mu + \pi y_{t-1} + \varepsilon_t \end{eqnarray}`

- If two roots are suspected then estimate the equation;

`\begin{eqnarray} \Delta^2 y_t = \mu + \xi_1 \Delta y_{t-1} + \varepsilon_t \end{eqnarray}`

- If you cannot reject the null that `\(\xi_1 = 0\)`, conclude that `\(y_t\)` is `\(I(2)\)`

---

# Structural Change

- Perron (1989) showed that the ADF test has little power to discriminate between a stochastic and a deterministic trend when the data are subject to a structural break
- When there are structural breaks the various ADF tests are biased towards the non-rejection of a unit root
- Consider the moving average representation of an autoregressive model, `\(y_t = S_t + 0.5 \sum_{i=1}^{t} \varepsilon_i\)`, where `\(S_t = 0\)` for `\(t = 1, \ldots, 249\)` and `\(S_t = 1\)` for `\(t = 250, \ldots, 500\)`

---

background-image: url(image/struct1.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure: Stationary plus structural break

---

# Stationary process with structural break

- If we were to fit an `\(AR(1)\)` to this process the coefficient would be biased towards unity
  - low values are followed by low values
  - high values are followed by high values
- The ADF tests of this misspecified model would suggest a random walk plus drift

---

# Testing for structural change

- Perron (1989) develops a formal procedure for testing for unit roots in the presence of a structural change at time `\(\tau\)`, which could take three forms:
- The null considers a one-time jump (pulse) in the level of the unit root process

`\begin{eqnarray} H_0: \;\; y_t = \mu + y_{t-1} + \beta_1 D_P + \varepsilon_t \end{eqnarray}`

`\(\hspace{1cm}\)` where `\(D_P = 1\)` if `\(t = \tau +1\)`, and `\(0\)` otherwise

- The alternative considers a level-shift in the intercept of a stationary process that has a deterministic trend

`\begin{eqnarray} H_1: \;\; y_t = \mu + \alpha t + \beta_2 D_L + \varepsilon_t \end{eqnarray}`

`\(\hspace{1cm}\)` where `\(D_L = 1\)` if `\(t > \tau\)`, and `\(0\)` otherwise

---

# Perron (1989) - Pulse Dummy

**Step 1:** Detrend the data under the alternative hypothesis and obtain the residuals `\(\hat{y}\)`

**Step 2:** Estimate the regression `\(\hat{y}_t = \phi \hat{y}_{t-1} + \varepsilon_t\)`.
Under the null, `\(\phi\)` is unity

- Perron showed that when the residuals are identically and independently distributed, the distribution of `\(\hat{\phi}\)` depends on the proportion of observations occurring prior to the break, `\(\lambda = \tau/T\)`, which influences the critical values

**Step 3:** Perform diagnostic checks to see whether the residuals are serially uncorrelated. If there is serial correlation then use the augmented form of the regression; `\(\hat{y}_t = \phi \hat{y}_{t-1} + \sum_{i=1}^{k} \gamma_i \Delta \hat{y}_{t-i} + \varepsilon_t\)`

**Step 4:** Calculate the `\(t\)`-statistic for the null `\(\phi =1\)` and compare it to the Perron critical values. If the calculated `\(t\)`-statistic is smaller (more negative) than the critical value, then reject the null of a unit root

---

# Perron (1989) - Pulse Dummy

- The previous steps can be completed simultaneously with:

`\begin{equation} y_t = \mu + \phi_1 y_{t-1} + \alpha t + \beta_2 D_L + \sum_{i=1}^{k} \gamma_i \Delta y_{t-i} + \varepsilon_t \end{equation}`

---

# Perron (1989) - Change in drift

- To consider a permanent change in drift, as opposed to a change in the slope of the trend,

`\begin{eqnarray} H_0: \;\; y_t = \mu + y_{t-1} + \beta_1 D_L + \varepsilon_t \end{eqnarray}`

`\(\hspace{1cm}\)` where `\(D_L = 1\)` if `\(t > \tau\)`, and `\(0\)` otherwise

- The alternative considers a trend-stationary process with a change in slope,

`\begin{eqnarray} H_1: \;\; y_t = \mu + \alpha t + \beta_3 D_T + \varepsilon_t \end{eqnarray}`

`\(\hspace{1cm}\)` where `\(D_T = t-\tau\)` if `\(t > \tau\)`, and `\(0\)` otherwise

---

# Perron (1989) - Change in level and drift

- To consider both these hypotheses together,

`\begin{eqnarray} H_0: \;\; y_t = \mu + y_{t-1} + \beta_1 D_P + \beta_2 D_L + \varepsilon_t \end{eqnarray}`

- For which the alternative is,

`\begin{eqnarray} H_1: \;\; y_t = \mu + \alpha t + \beta_2 D_L + \beta_3 D_T + \varepsilon_t \end{eqnarray}`

---

# Critique of Perron (1989)

- But what if you don't know the date of the break *a priori*?
  - Perron & Vogelsang (1992)
  - Perron (1997)
  - Vogelsang & Perron (1998)
- Currently most researchers use the Zivot & Andrews (2002) method

---

# Endogenous Structural Break

- The date of the endogenous structural break is the point in time which gives the least favourable result for the null hypothesis of a random walk with drift
- Test statistics are formulated as;

`\begin{eqnarray} \Delta y_t = \mu + \pi y_{t-1} + \alpha t + \beta_2 D_L (\hat{\lambda}) + \sum_{i=1}^{k} \gamma_i \Delta y_{t-i} + \varepsilon_t \\ \Delta y_t = \mu + \pi y_{t-1} + \alpha t + \beta_3 D_T (\hat{\lambda}) + \sum_{i=1}^{k} \gamma_i \Delta y_{t-i} + \varepsilon_t \\ \Delta y_t = \mu + \pi y_{t-1} + \alpha t + \beta_2 D_L (\hat{\lambda}) + \beta_3 D_T (\hat{\lambda}) + \sum_{i=1}^{k} \gamma_i \Delta y_{t-i} + \varepsilon_t \end{eqnarray}`

- where the break fraction `\(\lambda\)` is estimated and `\(\pi=\phi-1\)`
- Critical values are in Zivot & Andrews (2002)

---

# KPSS tests for unit roots

- Test the null hypothesis that a series is `\(I(0)\)` against the alternative that it is `\(I(1)\)`
- Kwiatkowski, Phillips, Schmidt, and Shin (1992) test this hypothesis
- Assuming that we ignore the trend for simplicity,

`\begin{eqnarray} y_{t}=\mu+x_{t}+\upsilon_{t} \end{eqnarray}`

- where `\(\mu\)` is a constant, `\(\upsilon_{t}\)` is a stationary `\(I(0)\)` process, and `\(x_{t}\)` is a random walk,

`\begin{eqnarray} x_{t}=x_{t-1}+\varepsilon_{t} \;\;\; \text{where } \; \varepsilon_{t}\sim \mathsf{i.i.d.} \mathcal{N} \left(0,\sigma^{2}\right) \end{eqnarray}`

- If the variance is zero, `\(\sigma^{2}=0\)`, then `\(x_{t}\)` is constant for all `\(t\)` and `\(y_{t}\)` will be stationary, as it is composed of constants and the stationary process `\(\upsilon_{t}\)`

---

# KPSS tests for unit roots

- Hence, the KPSS test may be specified as,

`\begin{eqnarray} H_{0}:\sigma^{2}=0 \end{eqnarray}`

- which implies that `\(x_{t}\)` is a constant, against the alternative,

`\begin{eqnarray} H_{1}:\sigma^{2}>0
\end{eqnarray}`

- which implies that `\(x_{t}\)` and `\(y_{t}\)` are nonstationary `\(I(1)\)` processes
- The KPSS test statistic may therefore be estimated from,

`\begin{eqnarray} KPSS=\frac{1}{T^{2}}\frac{\sum_{t=1}^{T}\hat{S}_{t}^{2}}{\hat{\sigma}_{\infty}^{2}} \end{eqnarray}`

- where `\(\hat{S}_{t}=\sum_{s=1}^{t}\hat{\upsilon}_{s}\)` and `\(\hat{\upsilon}_{t}\)` is the residual of a regression of `\(y_{t}\)` on a constant, `\(\mu\)`
- `\(\hat{\sigma}_{\infty}^{2}\)` is an estimator of the long-run variance of the process `\(\upsilon_{t}\)` using `\(\hat{\upsilon}_{t}\)`

---

# Bayesian analysis and unit roots

- When the coefficient is relatively close to one and the uncertainty surrounding the estimate is relatively high, we are unable to reject the null of a unit root when using these classical techniques
- In such cases we cannot make use of the results of such a spurious regression in levels, as the coefficient estimates will be biased
- The use of Bayesian inference may get around this problem with the aid of carefully specified priors

---

# Bayesian analysis and unit roots

- Bayesian techniques require that we provide a prior distribution for all the parameter estimates
- These priors are then evaluated, after they are subjected to the likelihood function of the data, to generate a posterior estimate
- The degree to which the posterior differs from the prior is partly dependent upon the second moment of the likelihood function
- If the likelihood function is sharply peaked away from the prior mean, then the posterior will differ significantly from the prior

---

# Bayesian analysis and unit roots

- Sims (1988) and Sims & Uhlig (1991) show that this feature of Bayesian statistics could overcome the problems of biased coefficients
- For example, if we make use of a mean value of unity for the prior distribution and there is a great degree of uncertainty in the likelihood function, then the posterior will remain close to unity
- These parameter estimates would not be biased

---

background-image: url(image/bayes1.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure: Bayesian estimation with reasonably flat likelihood function

---

background-image: url(image/bayes2.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure: Bayesian estimation with narrow likelihood function

---

# Conclusion

- Standard regressions on nonstationary data may provide spurious results
- Time series that have deterministic or stochastic trends are nonstationary
- If a process returns to its (non-zero) trend value after a shock we say that it is trend-stationary
- Time series that contain a stationary part and a deterministic trend can be made stationary by removing the deterministic time trend
- Time series which are integrated of order one, `\(I(1)\)`, can be made stationary by differencing the time series
  - For this reason they are often called difference-stationary, or unit root processes

---

# Conclusion

- The most widely used unit root test is the Augmented Dickey-Fuller test, which tests the null of a unit root
- The Perron test should be used in the presence of a known structural break, while the Zivot-Andrews test should be used when we suspect an unknown endogenous structural break
- We could also use the KPSS test, which considers the null of stationarity
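---

# Appendix: simulating the Dickey-Fuller regression

The Dickey-Fuller regression `\(\Delta y_{t}=\beta_1+\pi y_{t-1}+\varepsilon_{t}\)` discussed above is easy to simulate. The following Python/NumPy sketch is not part of the original slides: the sample size, seed, and AR coefficient of 0.5 are illustrative assumptions, and the hand-rolled OLS stands in for a packaged implementation. It computes `\(\hat{\pi}/SE(\hat{\pi})\)` for a simulated random walk and a simulated stationary AR(1):

```python
import numpy as np

rng = np.random.default_rng(42)
T = 500

# Random walk: y_t = y_{t-1} + e_t  -- contains a unit root, so it is I(1)
rw = np.cumsum(rng.standard_normal(T))

# Stationary AR(1): u_t = 0.5 u_{t-1} + e_t  -- no unit root, so it is I(0)
u = np.zeros(T)
shocks = rng.standard_normal(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + shocks[t]

def df_tstat(y):
    """t-statistic on pi in the Dickey-Fuller regression
    Delta y_t = beta_1 + pi * y_{t-1} + e_t (intercept, no augmentation)."""
    dy = np.diff(y)                                    # Delta y_t
    X = np.column_stack([np.ones(len(dy)), y[:-1]])    # [constant, y_{t-1}]
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)      # OLS estimates
    resid = dy - X @ beta
    s2 = resid @ resid / (len(dy) - X.shape[1])        # residual variance
    se_pi = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1]) # std error of pi-hat
    return beta[1] / se_pi

print("DF t-statistic, random walk:     ", round(df_tstat(rw), 2))
print("DF t-statistic, stationary AR(1):", round(df_tstat(u), 2))
```

Only the stationary AR(1) should produce a statistic well below the Dickey-Fuller critical values (roughly `\(-2.86\)` at 5% for the intercept case); the random walk statistic typically stays above them, so the null of a unit root is not rejected. In applied work one would use a tested routine (e.g. `adfuller` and `kpss` in Python's statsmodels, or `ur.df` in R's urca package) rather than this bare-bones regression.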