State-space models deal with dynamic time series problems that involve unobserved variables or parameters that describe the evolution in the state of the underlying system. This area of mathematical statistics is relevant to many areas of econometric research, as we often encounter unobserved variables that may be included in a model: output gaps, business cycles, expectational values of certain variables, permanent income streams, *ex ante* real interest rates, reservation wages, etc. In addition, this framework is also relevant to those who are interested in financial research, as state-space methods are used in the estimation of the many variants of stochastic volatility models.

The basic approach to state-space modelling assumes that the development over time of a system under investigation is determined by an unobserved series of vectors, \(\{\alpha_{1}, \ldots ,\alpha_{n}\}\), that are associated with an observed series of observations, \(\{y_{1}, \ldots ,y_{n}\}\). The relationship between the \(\alpha_{t}\)’s and the \(y_{t}\)’s is specified by the state-space model, and the purpose of state-space analysis is to infer the relevant properties of the \(\alpha_{t}\)’s from our knowledge of the state-space model and the realisation of the observations \(\{y_{1}, \ldots ,y_{n}\}\).^{1}

A time series is a set of observations, \(\{y_{1}, \ldots ,y_{n}\}\), ordered in time, that may be expressed in the additive form,^{2}

\[\begin{eqnarray} y_{t} = \mu_{t} + \gamma_{t} + \varepsilon_{t} \tag{1.1} \end{eqnarray}\]

where,

- \(\mu_{t}\) is a slowly varying component called the *trend*
- \(\gamma_{t}\) is a periodic component of fixed period called the *seasonal*
- \(\varepsilon_{t}\) is an irregular component called the *error*

To develop suitable models for \(\mu_{t}\) and \(\gamma_{t}\) we may choose to make use of a random walk process to describe the scalar series \(\alpha_{t}\),^{3} such that,

\[\begin{eqnarray} \alpha_{t+1} = \alpha_{t} + \eta_{t} \tag{1.2} \end{eqnarray}\]

If \(\mu_{t}=\alpha_{t}\) (where \(\alpha_{t}\) is a random walk), \(\gamma_{t}=0\) (no seasonal is present), and all variables are normally distributed, then we may rewrite equations (1.1) and (1.2) as,^{4}

\[\begin{eqnarray} y_{t} = \alpha_{t} + \varepsilon_{t}, & \;\;\;\; & \varepsilon_{t} \sim \mathsf{i.i.d.} \mathcal{N}(0,V_{\varepsilon}) \tag{1.3} \\ \alpha_{t+1} = \alpha_{t} + \eta_{t}, & \;\;\;\; & \eta_{t} \sim \mathsf{i.i.d.} \mathcal{N}(0,W_{\eta}) \tag{1.4} \end{eqnarray}\]

Such a model has state-space characteristics as we have:

- a **measurement equation** that describes the relation between the observed variables \(\{y_{1}, \ldots ,y_{n}\}\) and the unobserved state variables (the \(\alpha_{t}\)’s)
- a **state equation** that reflects the dynamics of the unobserved state variables \(\{\alpha_{1}, \ldots ,\alpha_{n}\}\)

The objective of state-space methodology is to infer the relevant properties of the \(\alpha_{t}\)’s from our knowledge of the observations \(\{y_{1}, \ldots ,y_{n}\}\). Although it would be relatively straightforward to find a solution to this problem in a stationary model, the inclusion of the random walk \(\alpha_{t}\) implies that the distributions of the random variables \(y_{t}\) and \(\alpha_{t}\) depend on \(t\). This would make for rather cumbersome multivariate calculations when finding the conditional mean of \(\alpha_t\), as well as the variances and covariances of \(\varepsilon_{t}\) and \(\eta_{t}\), given \(\{y_{1}, \ldots ,y_{n}\}\).

Although the state-space form is ideally suited to dynamic time series models that involve unobserved components, it also provides a unified representation for a wide range of ARIMA and time-varying regression models.^{5} Indeed, this framework is also flexible enough to encapsulate different specifications of non-parametric and nonlinear spline regression models.

These models pay particular attention to the set of \(m\) state variables that evolve over time. These state variables may be subject to systematic distortions and an element of noise. The respective state variables could then be contained in an \(m \times 1\) vector, which is denoted \(\alpha_t\), while the \(N\) observed variables may then be described by an \(N \times 1\) vector, \(y_t\). This allows for the derivation of the measurement equation;

\[\begin{eqnarray} y_{t} = F_{t}\alpha_{t} + S_{t}\varepsilon_{t}, & \;\;\;\; & \varepsilon_{t} \sim \mathsf{i.i.d.} \mathcal{N}(0,V_{\varepsilon}) \tag{1.5} \end{eqnarray}\]where \(F_t\) and \(S_t\) are fixed matrices for the respective coefficients of order \(N \times m\) and \(N \times r\). In this case, \(r\) refers to the dimensions of the disturbance vector in the measurement equation. \(\varepsilon_t\) is an \(r \times 1\) vector with zero mean and covariance matrix, \(V_{\varepsilon}\).^{6}

The state equation could then be described as;

\[\begin{eqnarray} \alpha_{t+1} = G_{t}\alpha_{t} + R_{t}\eta_{t}, & \;\;\;\; & \eta_{t} \sim \mathsf{i.i.d.} \mathcal{N}(0,W_\eta) \tag{1.6} \end{eqnarray}\]where \(G_t\) and \(R_t\) are fixed coefficient matrices of order \(m \times m\) and \(m \times g\). In this case \(g\) refers to the dimensions of the disturbance vector in the transition equation. \(\eta_t\) is a \(g \times 1\) vector with zero mean and covariance matrix, \(W_\eta\).
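
To make the general form concrete, the following sketch simulates data from equations (1.5) and (1.6) in **Python**; the function name and the illustrative parameter values are our own, but the system matrices follow the notation in the text.

```python
import numpy as np

def simulate_state_space(F, G, S, R, V, W, alpha0, n, seed=0):
    """Simulate y_t = F a_t + S e_t and a_{t+1} = G a_t + R n_t."""
    rng = np.random.default_rng(seed)
    N, m = F.shape
    y = np.zeros((n, N))
    alpha = np.zeros((n + 1, m))
    alpha[0] = alpha0
    for t in range(n):
        eps = rng.multivariate_normal(np.zeros(V.shape[0]), V)
        eta = rng.multivariate_normal(np.zeros(W.shape[0]), W)
        y[t] = F @ alpha[t] + S @ eps          # measurement equation (1.5)
        alpha[t + 1] = G @ alpha[t] + R @ eta  # state equation (1.6)
    return y, alpha

# e.g. a scalar system where every system matrix is 1 (a local level model)
y, a = simulate_state_space(np.eye(1), np.eye(1), np.eye(1), np.eye(1),
                            np.array([[0.2]]), np.array([[0.1]]),
                            np.zeros(1), n=100)
```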

The disturbances in the measurement and transition equations are taken to be uncorrelated over all time periods. They are also assumed to be uncorrelated with the initial state vector \(\alpha_0\), which is to say;

\[\begin{eqnarray*} \left\{ \begin{array}{ll} \varepsilon_t\\ \eta_t\\ \end{array} \right\} \sim \mathsf{i.i.d.} \mathcal{N} \left[ 0, \left( \begin{array}{ll} V_\varepsilon & 0\\ 0 & W_\eta\\ \end{array} \right)\right] \end{eqnarray*}\]and

\[\begin{eqnarray*} \mathbb{E}\left[\alpha_{0} \eta_{t}^{\prime}\right]=0, & \;\;\;\; & \mathbb{E}\left[\alpha_{0} \varepsilon_{t}^{\prime}\right]=0 \end{eqnarray*}\]The covariance matrix of the error terms could then be included in \(\Omega\) and the coefficient matrix may be included in \(\Phi\), such that the solution to this problem is found after deriving:

\[\begin{eqnarray*} \Phi = \left\{ \begin{array}{ll} F_t\\ G_t\\ \end{array} \right\} , \; \; \; \; \Omega = \left\{ \begin{array}{ll} V & 0\\ 0 & W\\ \end{array} \right\}. \end{eqnarray*}\]which represent the unknowns in any standard regression model (where we usually make strong assumptions about the covariance matrix that are later subject to a barrage of tests).

The structure of these matrices is particularly important, as they need to be specified before the model is estimated, and they can get a little tricky when you have several equations. For example, the model,

\[\begin{eqnarray*} y_{t} = \mu_{t} + \varepsilon_{t}\\ \mu_{t+1} = \mu_{t} + \beta_{t} + \eta_{t}\\ \beta_{t+1} = \beta_{t} + \zeta_{t} \end{eqnarray*}\]may have the coefficient and covariance matrices,

\[\begin{eqnarray*} \Phi = \left\{ \begin{array}{ll} 1 & 0\\ 1 & 1\\ 0 & 1\\ \end{array} \right\} , \; \; \; \; \Omega = \left\{ \begin{array}{lll} V_\varepsilon & 0 & 0\\ 0 & W_\eta & 0\\ 0 & 0 & W_\zeta\\ \end{array} \right\}. \end{eqnarray*}\]There are several software packages that have pre-programmed routines that may assist in the formulation of state-space models. For example, **EViews** has developed the `sspace` object module, **Oxmetrics** has `STAMP`, and one can use the `SsfPack` module in **Ox**.^{7} These models can also be estimated in **RATS** with the aid of the `DLM` command.

As one would expect, there are a number of packages in **R** that may be used, including `StructTS`, `sspir`, `dlm`, `MASS`, `RWinBugs`, `RStan`, etc. Alternatively, you could also use one of the toolboxes in **Python**, **Matlab** or **Gauss**.^{8}

A simple example of a state-space model is the local level model, where the level component (or intercept term) is allowed to vary over time. It may be formulated by defining the respective *measurement* and *state* equations as,

\[\begin{eqnarray*} y_{t} = \mu_{t} + \varepsilon_{t}, & \; & \varepsilon_{t} \sim \mathsf{i.i.d.} \mathcal{N}(0,V_\varepsilon) \\ \mu_{t+1} = \mu_{t} + \xi_{t}, & \; & \xi_{t} \sim \mathsf{i.i.d.} \mathcal{N}(0,W_\xi) \end{eqnarray*}\]

This would imply that the matrices for the general form of the model would simplify to,^{9}

\[\begin{eqnarray*} \alpha_{t} = \mu_{t}, \; \eta_{t} = \xi_{t}, \; F_{t} = G_{t} = S_{t} = R_{t} = 1, \; V = V_\varepsilon, \; W = W_\xi \end{eqnarray*}\]

In the case of the local level model, note that the dynamic properties relating to the state of the system at time \(t+1\) are expressed as a function of the state of the system at time \(t\). If \(\xi_{t} = 0\) for \(t=1, \ldots ,T\) the model would reduce to a traditional linear regression model that may be solved analytically. However, if \(\xi_t\) is modelled as a stochastic process, then we are not able to solve the model analytically and would need to make use of an iterative optimisation procedure (such as maximum likelihood, nonlinear least squares, or MCMC methods).
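
One way to estimate such a model numerically is by maximum likelihood; the sketch below uses the `UnobservedComponents` routine from **Python**'s `statsmodels` package (one of many possible tools, and the simulated `y` is merely a placeholder for actual data).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=200)) + rng.normal(size=200)  # placeholder series

model = sm.tsa.UnobservedComponents(y, level='local level')
result = model.fit(disp=False)       # numerical maximum likelihood

print(result.params)                 # estimated variances (irregular and level)
print(-result.llf)                   # negative log-likelihood
mu_hat = result.smoothed_state[0]    # smoothed estimate of the level
```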

By way of example, we have applied this model to the South African deflator for the period 1960 to 2014, where the series is represented as the year-on-year logarithmic change in the price index. The results are depicted in Figure 1, where we see how inflation has changed over time. Note how the state equation provides a smoothed version of the observed inflationary process, which represents the random walk component of this particular time series.

In terms of the in-sample statistics that are used to describe the goodness-of-fit of this model, the value of the negative log-likelihood is \(261.5\), the variance of the irregular component (\(\hat{V}\)) is \(0.233\) and the variance of the level (\(\hat{W}\)) is \(0.014\). The value for the level, \(\mu\), at the final state at period 2014Q1 is \(1.91\)%. The residuals for the measurement equation are provided in Figure 2, where we note that they would appear to represent white noise.^{10}

To compare different state-space models we often make use of an information criterion, such as the AIC, which in this case may be formulated as:

\[\begin{eqnarray*} AIC =\left[ -2 \log \ell + 2(n) \right] = 526.9 \end{eqnarray*}\]where \(\log \ell\) is the maximised value of the log-likelihood function and \(n\) is the number of parameters that are to be estimated. When applying this procedure to compare models, smaller values denote better-fitting models. This technique is extremely helpful when comparing the in-sample fit of different models, as it compensates for the number of parameters in the model specification.
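
Written out directly, the calculation is a one-liner; with the reported negative log-likelihood of \(261.5\) and two estimated variance parameters, it reproduces the figure above to within rounding.

```python
def aic(neg_loglik, n_params):
    """AIC = -2 log-likelihood + 2 n, with the negative log-likelihood given."""
    return 2.0 * neg_loglik + 2.0 * n_params

print(aic(261.5, 2))  # 527.0, close to the reported 526.9 (261.5 is rounded)
```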

The local level trend model is formulated with the aid of two state equations, which add a slope component, \(\upsilon_{t}\), to the specification of the local level model. It may be derived as follows,

\[\begin{eqnarray*} y_{t} = \mu_{t} + \varepsilon_{t}, & \; & \varepsilon_{t} \sim \mathsf{i.i.d.} \mathcal{N}(0,V_\varepsilon) \\ \mu_{t+1} = \mu_{t} + \upsilon_t + \xi_{t}, & \; & \xi_{t} \sim \mathsf{i.i.d.} \mathcal{N}(0,W_\xi) \\ \upsilon_{t+1} = \upsilon_{t} +\zeta_{t}, & \; & \zeta_{t} \sim \mathsf{i.i.d.} \mathcal{N}(0,W_\zeta) \end{eqnarray*}\]Or alternatively,

\[\begin{eqnarray*} \alpha_{t} = \binom{\mu_{t}}{\upsilon_{t}}, \; \eta_{t} = \binom{\xi_{t}}{\zeta_{t}}, \; G_{t} = \left( \begin{array}{cc} 1 & 1 \\ 0 & 1 \end{array} \right), \; F_{t} = \binom{1}{0}, \; S_{t} = 1, \; \dots \end{eqnarray*}\] \[\begin{eqnarray*} W = \left( \begin{array}{cc} W_\xi & 0 \\ 0 & W_\zeta \end{array} \right), \; R_{t} = \left( \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right), \; V = V_\varepsilon \end{eqnarray*}\]Note that in this case, \(\upsilon_t\) is the slope of the trend component, which differs from the slope of the regression line. Hence, this parameter is not the same as the coefficient in the classic regression model.

When all state disturbances \(\xi_t\) and \(\zeta_t\) are set to zero it is easy to see that;

\[\begin{eqnarray*} y_{t} = \mu_{1} + \upsilon_{1}g_{t} + \varepsilon_{t}, & \; & \varepsilon_{t} \sim \mathsf{i.i.d.} \mathcal{N}(0,V_\varepsilon) \end{eqnarray*}\]where \(g_t\) is a variable that represents time (i.e. \(g_t = t\)). This example would therefore represent a traditional linear regression model, where we regress an observed variable on time and a constant.^{11}

When we make use of this specification to model the South African inflation data, where both the level and the slope may vary over time, then we observe the results that are contained in Figure 3. The results for the stochastic slope could then be graphed separately in Figure 4. In this case the slope component does not appear to contain a great deal of information.

The statistical results for this model are similar to those that were presented previously, where we note that the value of the negative log-likelihood is \(341.6\), the variance of the irregular component (\(\hat{V}_\varepsilon\)) is \(0.23\), the variance of the level (\(\hat{W}_\xi\)) is \(0.04\), and the variance of the trend (\(\hat{W}_\zeta\)) is \(0.002\). The value for the level, \(\mu\), at the final state in period 2014Q1 is \(2.07\%\) and the value for the \(AIC\) is \(689.15\). This is greater than that of the local level model, which would suggest that the relative fit of the local level trend model is not as good.

When modelling the behaviour of a time series, one should be conscious of the possibility that the data is influenced by seasonal characteristics. For quarterly data, such a seasonal component could be modelled within the framework,

\[\begin{eqnarray} y_{t} = \mu_{t} +\gamma_{1,t} + \varepsilon_{t}, & \; & \varepsilon_{t} \sim \mathsf{i.i.d.} \mathcal{N}(0,V_\varepsilon) \\ \mu_{t+1} = \mu_{t} + \upsilon_{t} + \xi_{t}, & \; & \xi_{t} \sim \mathsf{i.i.d.} \mathcal{N}(0,W_\xi) \\ \upsilon_{t+1} = \upsilon_{t} +\zeta_{t}, & \; & \zeta_{t} \sim \mathsf{i.i.d.} \mathcal{N}(0,W_\zeta) \\ \gamma_{1,t+1} = -\gamma_{1,t} -\gamma_{2,t} -\gamma_{3,t} +\omega_{t}, & \; & \omega_{t} \sim \mathsf{i.i.d.} \mathcal{N}(0,W_\omega) \tag{2.1}\\ \gamma_{2,t+1} = \gamma_{1,t}, \tag{2.2}\\ \gamma_{3,t+1} = \gamma_{2,t} \tag{2.3} \end{eqnarray}\]Or alternatively,^{12}

\[\begin{eqnarray*} \alpha_{t} = \left( \begin{array}{c} \mu_{t} \\ \upsilon_{t} \\ \gamma_{1,t} \\ \gamma_{2,t} \\ \gamma_{3,t} \end{array} \right), \; \eta_{t} = \left( \begin{array}{c} \xi_{t} \\ \zeta_{t} \\ \omega_{t} \end{array} \right), \; G_{t} = \left( \begin{array}{ccccc} 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & -1 & -1 & -1 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{array} \right), \; F_{t} = \left( \begin{array}{ccccc} 1 & 0 & 1 & 0 & 0 \end{array} \right), \; \dots \end{eqnarray*}\] \[\begin{eqnarray*} W = \left( \begin{array}{ccc} W_\xi & 0 & 0 \\ 0 & W_\zeta & 0 \\ 0 & 0 & W_\omega \end{array} \right), \; R_{t} = \left( \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array} \right), \; V = V_\varepsilon \end{eqnarray*}\]

where the \(\gamma\)’s refer to the seasonal components and the disturbance \(\omega\) allows the seasonal to change over time. When adding a seasonal component to a state-space model we usually require several state equations (i.e. the frequency minus one). In the above example, the last two equations, which represent identities, imply that \(\gamma_2\) follows \(\gamma_1\) and \(\gamma_3\) follows \(\gamma_2\).
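
For quarterly data, the seasonal block of the system matrices can be written down directly from equations (2.1) to (2.3); a sketch of this construction follows (the variable names are our own).

```python
import numpy as np

# Transition block for the three seasonal states (gamma_1, gamma_2, gamma_3)
G_seasonal = np.array([[-1., -1., -1.],   # (2.1): gamma_{1,t+1} = -g1 - g2 - g3 + w_t
                       [ 1.,  0.,  0.],   # (2.2): gamma_{2,t+1} = gamma_{1,t}
                       [ 0.,  1.,  0.]])  # (2.3): gamma_{3,t+1} = gamma_{2,t}

# Only the first seasonal state receives the disturbance omega_t
R_seasonal = np.array([[1.],
                       [0.],
                       [0.]])
```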

When we allow both the level and the slope to vary, and include a seasonal variable for monthly data (which would imply that we would need to include eleven different \(\gamma\) components to describe South African inflation), we are able to produce the results that are contained in Figure 5. The separate graph for the seasonal component is then depicted in Figure 6, which would once again appear to contain white noise.

The graph for the seasonal component suggests that there has not been much variation in the seasonal over time. Once again, the statistical results for this model are similar to those above, where we note that the value of the negative log-likelihood is \(371.43\).^{13} The \(AIC\) is again greater than that of the local level model; therefore, this model fails to improve upon the fit of the first model.

In certain cases we may be interested in an assessment of the impact of a structural change on a particular time series over time. In a state-space framework, such effects can be described by adding intervention variables to any of the above models. Structural changes may result in a *level shift*, where the value of the level of the time series suddenly exhibits a permanent change at that point in time where the structural change takes place. Alternatively, it may be that a *slope shift* is experienced, where the value that is attached to the slope coefficient experiences a permanent change after the structural break. A third possibility is that of a *pulse* effect, where the value of the level suddenly changes and then returns to the previous level (i.e. prior to the structural change).

To determine the impact of a structural change in a local level model, one could estimate;

\[\begin{eqnarray*} y_{t} = \mu_{t} + \varepsilon_{t}, & \; & \varepsilon_{t} \sim \mathsf{i.i.d.} \mathcal{N}(0,V_\varepsilon) \\ \mu_{t+1} = \mu_{t} + \lambda_t w_{t} + \xi_{t}, & \; & \xi_{t} \sim \mathsf{i.i.d.} \mathcal{N}(0,W_\xi) \\ \lambda_{t+1} = \lambda_t \end{eqnarray*}\]where \(w_t\) is a dummy variable.^{14} To investigate whether a level shift has occurred, we would set the dummy to zero before the proposed structural change, and to one thereafter. If we wanted to determine whether this change was the result of a change in slope, then we could include a slope coefficient in the state equation for the intervention variable, as we did previously. Alternatively, we could consider whether this break had a pulse effect, which would involve setting the dummy variable to 1 only at the point of the proposed structural change. We could then compare the information criteria from the respective models to see which of the models fits the data best.
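
The dummies themselves are simple to construct; a hypothetical sketch for a break at observation `tb` in a sample of length `T` follows.

```python
import numpy as np

T, tb = 100, 60                                     # hypothetical break point
level_shift = (np.arange(T) >= tb).astype(float)    # 0 before the break, 1 after
pulse = (np.arange(T) == tb).astype(float)          # 1 only at the break itself
```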

To determine whether there is a structural break in the level, using the above model, we would consider whether the coefficient estimate for \(\lambda_t\) is significantly different from zero. Note that in the above example, we assume that \(\lambda_t\) is a fixed regression coefficient that will take on a single value. In this sense, one could employ the methodology that is presented in Chow (1960) for this analysis. Similarly, if we wanted to test for a structural break in the slope, then we would need to include the dummy and accompanying coefficient in the slope equation (rather than in the level).

To describe the effect of explanatory variables on a time series within a state-space framework, we add the explanatory variables to the measurement equation of the model, such that the general representation produces;

\[\begin{eqnarray*} y_{t} = \mu_{t} + \beta_t x_{t}+ \varepsilon_{t}, & \; & \varepsilon_{t} \sim \mathsf{i.i.d.} \mathcal{N}(0,V_\varepsilon) \\ \mu_{t+1} = \mu_{t} + \xi_{t}, & \; & \xi_{t} \sim \mathsf{i.i.d.} \mathcal{N}(0,W_\xi) \\ \beta_{t+1} = \beta_{t} +\tau_{t}, & \; & \tau_{t} \sim \mathsf{i.i.d.} \mathcal{N}(0,W_\tau) \end{eqnarray*}\]Should we wish to do so, we could then include an additional state equation to allow for a time-varying parameter. In this case, the coefficient \(\beta\) is allowed to vary over time. Note that if we removed the stochastic components, \(\tau\) and \(\xi\), then the above model would represent a traditional linear regression model.

In the following example, we make use of the dataset from Durbin and Watson (1951), which includes per capita consumption of spirits in the UK, per capita income, and the relative price of spirits on an annual basis from 1870 to 1938. In this example we allow for a stochastic trend that describes changes in tastes and habits that cannot be modelled explicitly. We also allow the \(\beta\) parameters to be time-varying, so that the regression coefficient would reflect current behaviour. The results of this model are shown in Figure 7.

The output for the regression coefficient is provided in the second panel and may be interpreted in a similar way to that of an ordinary regression coefficient. Hence, the average for the \(\beta\) coefficient, which is associated with price sensitivity (and has a value of around \(-0.9277\)), indicates that a one per cent increase in price leads to a fall in spirit consumption of about \(0.93\) per cent (on average). It is worth noting that as the value of this coefficient increases (in absolute terms) over time, it would suggest that consumers have become slightly more price sensitive.

Turning to the smoothed trend (which may be used to describe tastes or preferences), we note that there has been a gradual decline over time. Hence, over this period tastes moved away from hard spirits.

As is the case with all the other models, one could then test the residuals to ensure that they represent white noise.

In the state-space framework the stochastic state components are associated with estimation errors. These stochastic errors have unique variances, which allows for the construction of confidence intervals around each of the state components. These may be used to describe the uncertainty in each of the estimated processes. Using a multiplier of two (which corresponds to an approximate \(95\%\) interval), we can produce confidence intervals using the standard formula,

\[\begin{eqnarray*} \mu_{t} \pm 2\sqrt{W} \end{eqnarray*}\]An example of these confidence intervals, when applied to the local level model for the South African inflation data, is provided in Figure 8.

The values for the state components can be estimated with an iterative filter, such as the one that has been proposed by Kalman (1960) and Kalman and Bucy (1961). This procedure would usually involve both filtering and smoothing, which are described below.^{15}

For a given model and set of observed variables, \(\{y_{1}, \ldots ,y_{n}\}\), the Kalman filter produces successive one-step ahead predictions conditional on the past and concurrent observations. Once we have point estimates for the one-step ahead predictions of the filtered state, we can then also calculate the variance of this filtered-state variable.

Given the structure of the model, as presented in equations (1.5) and (1.6), the estimated Kalman filtered state at point \(t+1\) is denoted \(\alpha_{t+1}\), such that \(\alpha_{t+1} = \mathbb{E} \big[\alpha_{t+1} | y_{1}, \ldots, y_{t} \big]\), and the estimated variance of this filtered-state variable is denoted \(P_{t+1} = \mathsf{var} \big[ \alpha_{t+1} | y_{1}, \ldots, y_{t} \big]\). The central formula in the recursive Kalman filter may then be represented by,

\[\begin{eqnarray*} \alpha_{t+1} = \alpha_{t} + K_{t}(y_{t} - F_{t}'\alpha_{t}) \end{eqnarray*}\]For the local level model this would reduce to,

\[\begin{eqnarray*} \mu_{t+1} = \mu_{t} + K_{t}(y_{t} - \mu_{t}) \end{eqnarray*}\]If we consider a sample of hypothetical annual data in Figure 9, for the period 1978 to 1983, we note that at time point \(t = 1980\), the current value of the filtered state, \(\alpha_t\), is based on the past observations, \(\{ y_{1978}, y_{1979} \}\). If at this point we had not observed the value \(y_t\), then the best estimate for \(\alpha_{t+1}\) would be \(\alpha_t\), as indicated by the arrow extending from \(\alpha_t\). However, at time point \(t\) we do indeed observe \(y_t\) and this value is fed into the above algorithm for the Kalman filter. The discrepancy between \(y_t\) and \(\alpha_t\) in 1980 is then used to update the estimate for \(\alpha_{t+1}\). Since the discrepancy, \(y_{t} - \alpha_{t}\), is a negative number, the estimate decreases from \(\alpha_t\) to \(\alpha_{t+1}\). Note that given the expression for the Kalman filter, larger values of \(K_t\) would lead to \(y_t\) being more influential on subsequent values of the state variable \(\alpha_t\). In addition, one may also observe that since the update of the filtered state at point \(t+1\) is based on values from time point \(t\), it always projects forward by one observation. This will be addressed in our discussion relating to the smoothing algorithm.

Note that the original measurement equation for the local level model would imply that \(\varepsilon_{t+1} = y_{t+1} - \alpha_{t+1}\). These errors may be called the one-step ahead *innovation errors* (where they are regarded as innovations, since they bring new information into the Kalman filter). To summarise the variance of the innovation errors, we use \(\sigma_{\varepsilon,t}\). In a similar way, we call the difference, \(\alpha_{t+1} - \alpha_t\), the one-step ahead filtered-state estimation-error, or more commonly the *prediction error*. We denote the variance of the prediction error, \(P_t\).

The value of \(K_t\) is called the Kalman gain and it reflects a compromise between two sources of uncertainty. When the uncertainty of the state, based on past observations of the state variable, is relatively large, then the prediction error variance will be large. In such cases, it would be ideal if \(K_t\) were to tend towards a value of one, as this would allow the new information that is obtained from \(y_t\) to have a large impact on the next value of the state. Similarly, if the difference between the observed variable \(y_t\) and the estimated state variable \(\alpha_t\) is highly volatile (as would be the case when \(y_t\) includes a number of outliers), then it would be ideal if \(K_t\) were to tend towards zero.

Therefore, an appropriate statistic for the value of \(K_t\) may be derived as follows:

\[\begin{eqnarray*} K_t = \frac{P_{t}}{\sigma_{\varepsilon,t}}. \end{eqnarray*}\]In the case of the local level model the value for \(P_t\) may be derived from the variance of \(\xi_t\), which is \(W_{\xi}\). Similarly, the value for \(\sigma_{\varepsilon,t}\) is obtained from the variance of \(\varepsilon_t\). Over time the values of \(P_t\) and \(\sigma_{\varepsilon,t}\) converge towards constant values, which implies that \(K_t\) would also converge on a constant value.^{16} An example of this is provided in Figure 10.
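
The full recursion for the local level model can be sketched as follows; the gain \(K_t = P_t/\sigma_{\varepsilon,t}\) matches the expression above, while the variance updates use the standard local level recursions, which the text only sketches.

```python
import numpy as np

def local_level_filter(y, V, W, a0=0.0, P0=1e7):
    """Kalman filter for y_t = mu_t + eps_t (var V), mu_{t+1} = mu_t + xi_t (var W)."""
    n = len(y)
    a = np.zeros(n + 1)   # one-step ahead state estimates
    P = np.zeros(n + 1)   # prediction error variances
    K = np.zeros(n)       # Kalman gains
    a[0], P[0] = a0, P0   # a large P0 mimics a diffuse initialisation
    for t in range(n):
        sigma_t = P[t] + V                         # innovation variance
        K[t] = P[t] / sigma_t                      # Kalman gain
        a[t + 1] = a[t] + K[t] * (y[t] - a[t])     # state update
        P[t + 1] = P[t] * (1 - K[t]) + W           # variance update
    return a, P, K
```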

The Kalman filter is a recursive algorithm that evaluates the respective one-step ahead estimates, so the values of \(\alpha_{t+1}\) are largely related to the observation from the previous period, \(y_t\). The general idea behind the smoothing algorithm is to relate the estimate of \(\alpha_t\) to the observation that was realised during the same period, \(y_t\). Many potential algorithms exist to achieve this objective, and the interested reader is referred to Kitagawa and Gersch (1996) for an extensive treatment of these techniques.

One example of a smoother that may achieve this objective would start at the last observation of the time series, \(T\), and move backwards towards the first observation. Such a smoothing algorithm could be specified in a similar way to the original Kalman filter; for the filtered state of the local level model,

\[\begin{eqnarray*} \alpha^s_{t-1} = \alpha_{t} + J_{t-1}(\alpha^s_{t} - \alpha_{t}) \end{eqnarray*}\]where \(\alpha^s_{t}\) is the smoothed estimate and \(\alpha_t\) is the filtered estimate. The value for \(J_{t-1}\) would then be determined by the ratio between the variance in \((\alpha^s_{t} - \alpha_{t})\) and the variance in \(\alpha_{t}\).
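
A self-contained sketch of this backward pass is given below. The forward loop stores the filtered moments, and the backward loop applies the recursion above; because the one-step prediction of the state equals the current filtered estimate in the local level model, the smoothing gain reduces to \(J_t = P_t/(P_t + W)\) (an assumption the text leaves implicit).

```python
import numpy as np

def local_level_smooth(y, V, W, a0=0.0, P0=1e7):
    """Filter forwards, then smooth backwards (local level model)."""
    n = len(y)
    af = np.zeros(n); Pf = np.zeros(n)       # filtered moments a_{t|t}, P_{t|t}
    a, P = a0, P0                            # one-step ahead (predicted) moments
    for t in range(n):
        K = P / (P + V)                      # Kalman gain
        af[t] = a + K * (y[t] - a)           # filtered state
        Pf[t] = P * (1 - K)                  # filtered variance
        a, P = af[t], Pf[t] + W              # predict one step ahead
    a_s = af.copy()                          # smoothed states
    for t in range(n - 2, -1, -1):           # backward (smoothing) pass
        J = Pf[t] / (Pf[t] + W)              # smoothing gain
        a_s[t] = af[t] + J * (a_s[t + 1] - af[t])
    return a_s
```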

In the following example, we seek to estimate a state-space model for a simulated random-walk process that has \(50\) observations, where we make use of the assumption that the errors are distributed \(\xi_t \sim \mathsf{i.i.d.} \mathcal{N} (0,1)\) and \(\varepsilon_t \sim \mathsf{i.i.d.} \mathcal{N} (0,1)\). In addition, we make use of the starting value, \(\alpha_0 \sim \mathcal{N} (0,1)\), for the state variable. After generating values for the filtered and smoothed values of the state variable, we are then able to inspect the values for the Kalman gain and the prediction errors. These are provided in Figure 10. Note that the Kalman gain quickly converges to a constant value. In addition, we also note that the prediction errors would appear to display behaviour that is consistent with a white noise process. These values are subjected to further tests in the following subsection.
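
The experiment can be reproduced along the following lines (with hypothetical seed values); with \(V = W = 1\) the gain settles at around \(0.618\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, V, W = 50, 1.0, 1.0
mu = rng.normal() + np.cumsum(rng.normal(0.0, 1.0, n))  # random walk, alpha_0 ~ N(0,1)
y = mu + rng.normal(0.0, 1.0, n)                        # observed series

a, P = 0.0, 1e7                     # diffuse-style starting values
gains, errors = [], []
for t in range(n):
    K = P / (P + V)                 # Kalman gain
    gains.append(K)
    errors.append(y[t] - a)         # prediction (innovation) error
    a = a + K * (y[t] - a)          # filtered state
    P = P * (1 - K) + W             # variance recursion

print(np.round(gains[:3], 3), round(gains[-1], 3))  # converges quickly to ~0.618
```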

Residuals should be independent, homoskedastic, and normally distributed. To investigate whether they satisfy these properties we consider the behaviour of the standardised prediction errors, which are defined as;

\[\begin{eqnarray*} e_t = \frac{\xi_t}{\sqrt{P_{t}}} \end{eqnarray*}\]These standardised prediction errors are displayed in Figure 11, which seeks to describe the inflation gap (i.e. the difference between core and actual inflation) with a local level trend model that includes seasonal and intervention components, and an explanatory variable (the output gap).

To test for **independence** we make use of the Box-Ljung statistic, where the residual autocorrelation at lag \(l\) is defined as,

\[\begin{eqnarray*} r_{l} = \frac{\sum_{t=l+1}^{T} (e_t - \bar{e})(e_{t-l} - \bar{e})}{\sum_{t=1}^{T} (e_t - \bar{e})^2} \end{eqnarray*}\]

and \(\bar{e}\) is the mean of the \(T\) residuals. The Box-Ljung statistic may then be expressed as,

\[\begin{eqnarray*} Q(k) = T(T+2) \sum_{l=1}^{k} \frac{r_l^2}{T-l} \end{eqnarray*}\]for lags \(l=1, \ldots ,k\). This value is then compared to a \(\chi^2\) distribution with \((k-w+1)\) degrees of freedom (where \(w\) is the number of hyperparameters or disturbance variances). If the calculated value is less than the critical value at some level of significance (e.g. \(5\%\)), \(Q(k) < \chi^2_{(k-w+1; 0.05)}\), it implies that the null of independence is not rejected and there is no reason to assume that the residuals are serially correlated.
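
A direct transcription of this statistic, applied to a vector `e` of standardised prediction errors, might look as follows (`scipy` is assumed for the \(\chi^2\) critical value).

```python
import numpy as np
from scipy import stats

def box_ljung(e, k, w):
    """Q(k) statistic and its 5% chi-squared critical value."""
    T = len(e)
    e = e - e.mean()
    denom = np.sum(e ** 2)
    r = np.array([np.sum(e[l:] * e[:-l]) for l in range(1, k + 1)]) / denom
    Q = T * (T + 2) * np.sum(r ** 2 / (T - np.arange(1, k + 1)))
    crit = stats.chi2.ppf(0.95, df=k - w + 1)
    return Q, crit     # independence is not rejected when Q < crit
```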

The second assumption of **homoskedasticity** of the residuals may be tested by comparing the variance of the residuals in the first third of the series with the variance of the residuals in the last third of the series. Hence, the following statistic,

\[\begin{eqnarray*} H(h) = \frac{\sum_{t=T-h+1}^{T} e_t^2}{\sum_{t=d+1}^{d+h} e_t^2} \end{eqnarray*}\]

where \(d\) is the number of diffuse initial state values (there is usually one element for each state equation) and \(h\) is the nearest integer to \((T-d)/3\). This value is then compared to an \(F\)-distribution with \((h,h)\) degrees of freedom. When using a \(5\%\) level of significance, the critical values for a two-tailed test correspond to the upper and lower \(2.5\%\) tails of the \(F\)-distribution. When \(H(h)>1\) we test \(H(h) < F(h,h; 0.025)\); when \(H(h)<1\) we test \(1/ H(h) < F(h,h; 0.025)\). If the statistic is smaller than the critical value, the null of equal variances is not rejected and there is no reason to assume a departure from homoskedasticity in the residuals.
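
The same test, written as a short function over the standardised prediction errors `e` (with `d` diffuse initial values), might look as follows.

```python
import numpy as np
from scipy import stats

def h_test(e, d):
    """H(h) statistic and its two-tailed 5% F critical value."""
    T = len(e)
    h = int(round((T - d) / 3))
    H = np.sum(e[-h:] ** 2) / np.sum(e[d:d + h] ** 2)
    stat = H if H > 1 else 1.0 / H             # compare the larger tail
    crit = stats.f.ppf(0.975, h, h)
    return stat, crit   # equal variances are not rejected when stat < crit
```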

**Normality** of the residuals can be tested with the following statistic, which considers the skewness and kurtosis of the residual distribution,

\[\begin{eqnarray*} N = T \left( \frac{S^2}{6} + \frac{(K-3)^2}{24} \right) \end{eqnarray*}\]

with the skewness, \(S\), and the kurtosis, \(K\), being defined as,

\[\begin{eqnarray*} S= \frac{\frac{1}{T} \sum_{t=1}^{T} (e_t - \bar{e})^3}{\sqrt{\left(\frac{1}{T} \sum_{t=1}^{T} (e_t - \bar{e})^2\right)^3}} \end{eqnarray*}\] \[\begin{eqnarray*} K=\frac{\frac{1}{T} \sum_{t=1}^{T} (e_t - \bar{e})^4}{\left(\frac{1}{T} \sum_{t=1}^{T} (e_t - \bar{e})^2\right)^2} \end{eqnarray*}\]If \(N<\chi^2_{(2;0.05)}\) the null hypothesis of normality is not rejected, and there is no reason to assume that the residuals are not normally distributed.
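
These expressions translate directly into code; the sketch below computes \(S\), \(K\), and the \(N\) statistic for a vector `e` of standardised prediction errors.

```python
import numpy as np
from scipy import stats

def normality_test(e):
    """N statistic with its 5% chi-squared(2) critical value."""
    T = len(e)
    c = e - e.mean()
    m2 = np.mean(c ** 2)
    S = np.mean(c ** 3) / m2 ** 1.5            # skewness
    K = np.mean(c ** 4) / m2 ** 2              # kurtosis
    N = T * (S ** 2 / 6 + (K - 3) ** 2 / 24)
    crit = stats.chi2.ppf(0.95, df=2)
    return N, crit      # normality is not rejected when N < crit
```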

A further diagnostic tool involves an inspection of the auxiliary residuals, which are obtained by dividing the smoothed disturbances by the square root of their corresponding variances,

\[\begin{eqnarray*} \frac{\hat{\varepsilon}_t}{\sqrt{\mathsf{var}(\hat{\varepsilon}_t)}} \; \mathsf{and }\; \; \frac{\hat{\eta}_t}{\sqrt{\mathsf{var}(\hat{\eta}_t)}} \end{eqnarray*}\]This results in the standardised smoothed disturbances. Inspection of the standardised smoothed observation disturbances allows for the detection of outliers, whilst inspection of the standardised smoothed state disturbances allows for the detection of structural shifts.

If an outlier is detected, then one should check for measurement error and, where necessary, include a pulse intervention (dummy) variable. For structural breaks in the level, include a level-shift intervention variable. The inclusion of such variables should always be based on some underlying theory concerning the cause of the structural break or outlier.

To compute forecasts of a time series we simply continue with the Kalman filter. When we arrive at the end of the sample, the update of the filtered state at time point \(t=n\) equals,

\[\begin{eqnarray*} \alpha_n = \alpha_{n-1} + K_{n-1}(y_{n-1} - F^{\prime}_{n-1} \alpha_{n-1}) \end{eqnarray*}\]The last observation, \(y_n\), can then be used to update the filtered state at time point \(t=n+1\) as follows,

\[\begin{eqnarray*} \alpha_{n+1} = \alpha_{n} + K_{n}(y_{n} - F^{\prime}_{n} \alpha_{n}) \end{eqnarray*}\]From \(n+1\) onwards the filtered state no longer changes, and by letting \(\bar{\alpha}_{n+1} = \alpha_{n+1}\) the forecasts simply become \(\bar{\alpha}_{n+j} = \bar{\alpha}_{n+1}\) for \(j = 1, \ldots, f\), where \(f\) refers to the number of time points for the forecast (i.e. the lead time or forecasting horizon). Such forecasts are useful since they provide information on future developments based on the past and, in addition, they may also be used for out-of-sample testing to determine whether the series behaves according to our expectations during future periods. For such exercises we are often less conservative with confidence limits, which may be decreased to \(90\%\) (where, \(\alpha_t \pm 1.64 \sqrt{P_t}\)) or even \(85\%\). Such an exercise is presented in Figure 12, where we can see that the confidence interval quickly increases with time.

Missing observations are easily dealt with and are treated as if they were forecasts. In estimating the filtered state the values of the prediction errors and the Kalman gain \(K_t\) are set to zero whenever the value of an observation is missing.
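
In code this amounts to a single branch inside the filter loop: when an observation is missing, the update is skipped and the prediction is simply carried forward. A sketch for the local level model follows (appending `NaN`s at the end of the sample yields forecasts).

```python
import numpy as np

def filter_with_missing(y, V, W, a0=0.0, P0=1e7):
    """Local level filter that treats NaN entries as missing observations."""
    n = len(y)
    a = np.zeros(n + 1); P = np.zeros(n + 1)
    a[0], P[0] = a0, P0
    for t in range(n):
        if np.isnan(y[t]):                         # missing: K_t and the
            a[t + 1], P[t + 1] = a[t], P[t] + W    # prediction error are zero
        else:
            K = P[t] / (P[t] + V)
            a[t + 1] = a[t] + K * (y[t] - a[t])
            P[t + 1] = P[t] * (1 - K) + W
    return a, P
```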

State-space models are easily generalised to cases where two or more time series need to be analysed simultaneously.

A multivariate time series model can take the state-space form,

\[\begin{eqnarray*} y_t = F_t \alpha_t + \varepsilon_t & \; & \varepsilon_t \sim \mathsf{i.i.d.} \mathcal{N}(0,V_\varepsilon) \\ \alpha_{t+1} = G_t \alpha_t + R_t \eta_t & \; & \eta_t \sim \mathsf{i.i.d.} \mathcal{N}(0,W_\eta) \end{eqnarray*}\]for \(t=1, \ldots ,n\), where the vector \(y_t\) contains the \(p\) observed time series at time \(t\) and the vector \(\varepsilon_t\) contains a disturbance for each of the \(p\) variables. In this case the variance matrix \(V_\varepsilon\) of order \(p \times p\) describes the unknown variance-covariance structure. The \(\alpha_t\) vector contains unobserved variables and unknown fixed effects. Matrix \(F_t\) is of order \(p \times m\) and it links the unobserved factors and the regression effects of the state vector with the observation vector. Matrix \(G_t\) is the transition matrix of order \(m \times m\). The \(\eta_t\) vector of order \(r \times 1\) contains the state disturbances with zero means and unknown variances and covariances collected in the variance matrix \(W_\eta\) of order \(r \times r\). In standard cases \(r=m\) and the matrix \(R_t\) is the identity matrix \(I_m\), but it may be specified freely.

To illustrate the application of the state-space framework to multivariate analyses, consider the case where \(p=2\); the vectors and matrices could be given by,

\[\begin{eqnarray*} \alpha_{t} = \left( \begin{array}{c} \mu_{t}^{(1)} \\ \upsilon_{t}^{(1)} \\ \beta_{t}^{(1)} \\ \mu_{t}^{(2)} \\ \upsilon_{t}^{(2)} \\ \beta_{t}^{(2)} \\ \end{array} \right), \; \eta_{t} = \left( \begin{array}{c} \xi_{t}^{(1)} \\ \zeta_{t}^{(1)} \\ \xi_{t}^{(2)} \\ \zeta_{t}^{(2)} \end{array} \right), \; G_{t} = \left( \begin{array}{cccccc} 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{array} \right), \; F_{t} = \left( \begin{array}{cccccc} 1 & 0 & x_t & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & x_t \end{array} \right) \end{eqnarray*}\] \[\begin{eqnarray*} R_{t} = \left( \begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 \end{array} \right),\; V_{t} = \left( \begin{array}{cc} V_{\varepsilon}^{(1)} & \mathsf{cov}(\varepsilon^{(1)},\varepsilon^{(2)}) \\ \mathsf{cov}(\varepsilon^{(1)},\varepsilon^{(2)}) & V_{\varepsilon}^{(2)} \end{array} \right), \; \end{eqnarray*}\] \[\begin{eqnarray*} W_{t} = \left( \begin{array}{cccc} W_{\xi}^{(1)} & 0 & \mathsf{cov}(\xi^{(1)},\xi^{(2)}) & 0 \\ 0 & W_{\zeta}^{(1)} & 0 & \mathsf{cov}(\zeta^{(1)},\zeta^{(2)}) \\ \mathsf{cov}(\xi^{(1)},\xi^{(2)}) & 0 & W_{\xi}^{(2)} & 0\\ 0 & \mathsf{cov}(\zeta^{(1)},\zeta^{(2)}) & 0 & W_{\zeta}^{(2)} \end{array} \right), \; \end{eqnarray*}\]These matrices imply a bivariate local linear model where the same explanatory variable \(x_t\) is applied to both series in \(y_t\) and the superscripts denote whether they belong to the first or second series respectively. Such a system would describe the following two observation equations,

\[\begin{eqnarray*} y_{t}^{(1)} = \mu_{t}^{(1)} +\beta_{t}^{(1)} x_t + \varepsilon_{t}^{(1)},\\ y_{t}^{(2)} = \mu_{t}^{(2)} +\beta_{t}^{(2)} x_t + \varepsilon_{t}^{(2)}, \end{eqnarray*}\]and the following six state equations,

\[\begin{eqnarray*} \mu_{t+1}^{(1)} = \mu_{t}^{(1)} + \upsilon_{t}^{(1)} + \xi_{t}^{(1)}, \\ \upsilon_{t+1}^{(1)} = \upsilon_{t}^{(1)} +\zeta_{t}^{(1)}, \\ \beta_{t+1}^{(1)} = \beta_{t}^{(1)},\\ \mu_{t+1}^{(2)} = \mu_{t}^{(2)} + \upsilon_{t}^{(2)} + \xi_{t}^{(2)}, \\ \upsilon_{t+1}^{(2)} = \upsilon_{t}^{(2)} +\zeta_{t}^{(2)}, \\ \beta_{t+1}^{(2)} = \beta_{t}^{(2)} \end{eqnarray*}\]Similarly, we could estimate a state-space model where the explanatory variable is only included in the first and not the second equation such that the observation equations are given by

\[\begin{eqnarray*} y_{t}^{(1)} = \mu_{t}^{(1)} +\beta_{t}^{(1)} x_t + \varepsilon_{t}^{(1)},\\ y_{t}^{(2)} = \mu_{t}^{(2)} + \varepsilon_{t}^{(2)}, \end{eqnarray*}\]Of course we could also make changes to the variance-covariance matrices without affecting the underlying model equations.

Common slope components exist when the slope components are correlated, that is, when \(\mathsf{cov}(\zeta^{(1)},\zeta^{(2)}) \ne 0\). A multivariate model with unobserved component vectors that depend on correlated disturbances is termed a seemingly unrelated time series model, which implies that although the disturbances of the components can be correlated, the equations remain *seemingly unrelated*. The level of dependence is measured most effectively by the correlation between the two disturbances,

\[\begin{eqnarray*} \mathsf{corr}(\zeta^{(1)}, \zeta^{(2)}) = \frac{\mathsf{cov}(\zeta^{(1)},\zeta^{(2)})}{\sqrt{W_{\zeta}^{(1)} W_{\zeta}^{(2)}}} \end{eqnarray*}\]where \(-1 \leq \mathsf{corr}(\zeta^{(1)}, \zeta^{(2)}) \leq 1\). When the correlation is close to zero the slope components do not have much in common; however, when it is close to one (in absolute value), the one slope can be expressed as a linear combination of the other slope and the following variance matrix has rank one,

\[\begin{eqnarray*} \left( \begin{array}{cc} \sigma_{\zeta^{(1)}}^{2} & \mathsf{cov}(\zeta^{(1)},\zeta^{(2)}) \\ \mathsf{cov}(\zeta^{(1)},\zeta^{(2)}) & \sigma_{\zeta^{(2)}}^{2} \end{array} \right), \; \end{eqnarray*}\]The same arguments apply to the disturbances of the other components (including the irregular vector \(\varepsilon_t\)). For example, when the variance matrix of the disturbance vector associated with the level component has rank one, we have \(\mathsf{corr}(\xi^{(1)}, \xi^{(2)}) = \pm 1\),

\[\begin{eqnarray*} \left( \begin{array}{cc} W_{\xi}^{(1)} & \mathsf{cov}(\xi^{(1)},\xi^{(2)}) \\ \mathsf{cov}(\xi^{(1)},\xi^{(2)}) & W_{\xi}^{(2)} \end{array} \right), \; \end{eqnarray*}\]In a local level model without a slope, the level is said to be common.^{17}

When constructing economic or financial models we may wish to impose rank restrictions such that

\[\begin{eqnarray*} \left( \begin{array}{cc} W_{\zeta}^{(1)} & \mathsf{cov}(\zeta^{(1)},\zeta^{(2)}) \\ \mathsf{cov}(\zeta^{(1)},\zeta^{(2)}) & W_{\zeta}^{(2)} \end{array} \right) = \left( \begin{array}{cc} a & 0 \\ b & c \end{array} \right) \left( \begin{array}{cc} a & b \\ 0 & c \end{array} \right) \end{eqnarray*}\]The (possible) existence of a common component can lead to more insights into certain aspects of the time series of interest. Where the variance matrices \(V_t\) and \(W_t\) are restricted to only include diagonal elements, we are effectively performing unrelated univariate analyses.

Rather than estimate the output and inflation gaps with univariate models we may wish to model these series as a bivariate model where we can test to see whether the components of these series move together. Such a general unrestricted model could take the form,

\[\begin{eqnarray*} \left( \begin{array}{c} \pi_t \\ y_t \end{array} \right) = \left( \begin{array}{c} \mu_t^\pi \\ \mu_t^y \end{array} \right) + \left( \begin{array}{c} \psi_t^\pi \\ \psi_t^y \end{array} \right) + \left( \begin{array}{c} \varepsilon_t^\pi \\ \varepsilon_t^y \end{array} \right) \end{eqnarray*}\]where the \(\mu_t^\pi\) and \(\mu_t^y\) trend components could be modelled as random walk processes, which could be integrated. We could then investigate whether the cyclical components exhibit similar behaviour, such that the cycle of inflation depends on both the GDP cycle and the series' own intrinsic cycle. Hence,

\[\begin{eqnarray*} \pi_t = \mu^\pi_t + \beta \psi_t^y + \psi_t^{\pi \dagger} + \varepsilon_t^{\pi} \end{eqnarray*}\]We could then compare the respective models, one with the restriction and one without the restriction to determine whether the inclusion of the restriction improves the model fit (by comparing AIC and other statistics). In this case the restriction does not improve the fit of the model.

This section considers the relative merits of state-space and Box-Jenkins methods. It starts off with a brief recap of central elements of the Box-Jenkins approach.

A typical Box-Jenkins approach to time series proceeds as follows. For a time series that includes some non-stationary features (due to trend and/or seasonal effects), the observed time series is transformed into a stationary series using time and lag functions (often through differencing). This may involve removing the trend by taking the first difference of an observed series, \(y_t\), to create a new series \(y_t^\star\),

\[\begin{eqnarray*} y_t^\star = \Delta y_t = y_t - y_{t-1} \end{eqnarray*}\]Alternatively the seasonal with periodicity \(s\) may be removed by differencing,

\[\begin{eqnarray*} y_t^\star = \Delta_s y_t = y_t - y_{t-s} \end{eqnarray*}\]Or in certain cases it may be necessary to remove both the trend and the seasonal, which would involve,

\[\begin{eqnarray*} y_t^\star = \Delta \Delta_s y_t = (y_t - y_{t-s}) - (y_{t-1} - y_{t-s-1}) \end{eqnarray*}\]In cases where the variable is still not stationary, the differencing procedure can be continued by taking the second difference,

\[\begin{eqnarray*} y_t^\star = \Delta^2 \Delta_s^2 y_t, \end{eqnarray*}\]After sufficient differencing, the appropriate AR(p), MA(q), or ARMA(p,q) model that can best account for the differenced time series needs to be identified, where the residuals of the best model should follow a random (white noise) process.
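
The differencing operations above translate directly into array operations; the series below is an illustrative construction.

```python
import numpy as np

y = np.arange(24.0) + np.tile([0.0, 1.0, 0.0, -1.0], 6)  # trend plus quarterly seasonal
s = 4
d1  = y[1:] - y[:-1]           # first difference, removes the trend
ds  = y[s:] - y[:-s]           # seasonal difference, removes the seasonal
dds = ds[1:] - ds[:-1]         # both: (y_t - y_{t-s}) - (y_{t-1} - y_{t-s-1})
```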

To consider the similarities between state-space and ARIMA models, recall that the local level model has the form,

\[\begin{eqnarray*} y_t = \mu_t + \varepsilon_t \end{eqnarray*}\] \[\begin{eqnarray} \mu_t = \mu_{t-1} + \eta_t \tag{5.1} \end{eqnarray}\]The first difference of \(y_t\) then yields,

\[\begin{eqnarray} \Delta y_{t} = y_t - y_{t-1} = \mu_{t} - \mu_{t-1} + \varepsilon_t - \varepsilon_{t-1} \tag{5.2} \end{eqnarray}\]Since (5.1) implies that,

\[\begin{eqnarray} \mu_t - \mu_{t-1} = \eta_t \tag{5.3} \end{eqnarray}\]We can rewrite (5.2) as,

\[\begin{eqnarray*} \Delta y_{t} = y_t - y_{t-1} = \eta_{t-1} + \varepsilon_t - \varepsilon_{t-1} \end{eqnarray*}\]which is stationary and has the same correlogram as an MA(1) process. This implies that the local level model is an ARIMA(0,1,1). Similarly, we can also show that a local level trend model can be represented as an ARIMA(0,2,2). A comprehensive overview of the equivalences between state-space and ARIMA models is provided in Harvey (1989). Finally, it should be noted that ARIMA models could also be put into state-space form and fitted by state-space methods.
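
This equivalence is easy to verify empirically: the first difference of a simulated local level series should display a single negative autocorrelation at lag one, at roughly \(-V/(2V + W)\), and nothing beyond, exactly as an MA(1) would (the simulation values below are illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)
n, V, W = 5000, 1.0, 0.5
mu = np.cumsum(rng.normal(0.0, np.sqrt(W), n))     # random walk level
y = mu + rng.normal(0.0, np.sqrt(V), n)            # local level series
dy = np.diff(y)                                    # first difference

def acf(x, l):
    x = x - x.mean()
    return np.sum(x[l:] * x[:-l]) / np.sum(x ** 2)

print([round(acf(dy, l), 3) for l in (1, 2, 3)])   # ~[-0.4, 0, 0]: an MA(1) pattern
```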

Despite the similarities between the two approaches, they differ in a number of regards. The most important of these is that the state-space approach seeks to explicitly model the non-stationarity in terms of the trend and the seasonal components, whilst the Box-Jenkins approach treats these as nuisance components which need to be removed prior to any analysis. Hence, the state-space approach seeks to provide an explicit structural framework for the simultaneous decomposition of a time series into the respective dynamic components, whilst the Box-Jenkins approach is primarily concerned with short-run dynamics and short-run forecasts.

Furthermore, since we are never really sure whether a series is stationary or non-stationary (and this is particularly so for most economic and financial time series), there may be problems with the application of the Box-Jenkins approach. In state-space methods stationarity is not required, and it is often easier to deal with missing data, time-varying regression coefficients, and multivariate extensions in this framework.

Models with unobserved components are frequently encountered in economics and finance. These processes may be modelled with the aid of a state-space representation that provides a parsimonious way of modelling dynamic multivariate systems. These models avoid the need to use *ad hoc* proxy variables for the unobservable variables, which can result in biased and inconsistent parameter estimates. Researchers are able to employ the Kalman filter for the identification and extraction of the unobserved components in the model. This iterative technique has been used in many examples, in cases where the state variables exhibit linear properties.

Finally, it is worth noting that Bayesian techniques are frequently used to estimate the parameters in these models, since all of the parameters and unobserved variables may then be treated as random variables. In addition, these estimation techniques also lend themselves to instances where a nonlinear filter or smoother may need to be employed.^{18}

Anderson, B. D. O., and J. B. Moore. 1979. *Optimal Filtering*. Englewood Cliffs, NJ: Prentice-Hall.

Carter, C.K., and R. Kohn. 1994. “On Gibbs Sampling for State Space Models.” *Biometrika* 81: 541–53.

Chow, G. C. 1960. “Tests of Equality Between Sets of Coefficients in Two Linear Regressions.” *Econometrica* 28: 591–605.

Commandeur, Jacques J.F., and Siem Jan Koopman. 2007. *An Introduction to State Space Time Series Analysis*. Oxford: Oxford University Press.

De Jong, P. 1991. “The Diffuse Kalman Filter.” *Annals of Statistics* 19: 1073–83.

Durbin, J., and G. Watson. 1951. “Testing for Serial Correlation in Least Squares Regression II.” *Biometrika* 38: 159–78.

Durbin, James, and Siem Jan Koopman. 2001. *Time Series Analysis by State Space Methods*. Oxford: Oxford University Press.

———. 2012. *Time Series Analysis by State Space Methods*. Second. Oxford: Oxford University Press.

Hamilton, James D. 1994. *Time Series Analysis*. Princeton: Princeton University Press.

Harvey, Andrew C. 1989. *Forecasting, Structural Time Series Models and the Kalman Filter*. Cambridge: Cambridge University Press.

———. 1993. *Time Series Models*. Cambridge, Mass: MIT Press.

Kalman, R. E. 1960. “A New Approach to Linear Filtering and Prediction Problems.” *Journal of Basic Engineering, Transactions ASME, Series D* 82: 35–45.

Kalman, R. E., and R. S. Bucy. 1961. “New Results in Linear Filtering and Prediction Theory.” *Journal of Basic Engineering, Transactions ASME, Series D* 83: 95–108.

Kim, Chang-Jin, and Charles R. Nelson. 1998. *State-Space Models with Regime-Switching: Classical and Gibbs-Sampling Approaches with Applications*. Cambridge, Mass: MIT Press.

Kitagawa, Genshiro, and Will Gersch. 1996. *Smoothness Priors Analysis of Time Series*. Vol. 116. Springer Verlag.

Petris, Giovanni, Sonia Petrone, and Patrizia Campagnoli. 2009. *Dynamic Linear Models with R*. Edited by Robert Gentleman, Kurt Hornik, and Giovanni Parmigiani. New York: Springer.

Pole, P.J., and M. West. 1988. “Nonnormal and Nonlinear Dynamic Bayesian Modeling.” In *Bayesian Analysis of Time Series and Dynamic Linear Models*, edited by J.C. Spall. New York: Marcel Dekker.

Prado, Raquel, and Mike West. 2010. *Time Series - Modeling, Computation, and Inference*. Boca-Raton, Florida: Chapman & Hall.

Shumway, R., and D. Stoffer. 2010. *Time Series Analysis and Its Applications*. New York: Springer-Verlag.

Stroud, J. R., P. Muller, and N. G. Polson. 2003. “Nonlinear State-Space Models with State Dependent Variance.” *Journal of the American Statistical Association* 98: 377–86.

West, M., and J. Harrison. 1997. *Bayesian Forecasting and Dynamic Models*. Second edition. New York: Springer-Verlag.

The interested reader is referred to a number of excellent alternative expositions that include Commandeur and Koopman (2007), Harvey (1989), Harvey (1993), Hamilton (1994), Kim and Nelson (1998), Shumway and Stoffer (2010), and Durbin and Koopman (2012).↩

In many economic applications the components combine multiplicatively, however, by working with logarithmic values we are able to reduce the multiplicative model to the form that is represented by equation (1.1).↩

Of course many other processes could be used to describe the evolution of \(\alpha\), i.e. an AR(2), ARMA(2,1), etc.↩

This linear Gaussian state-space model is commonly referred to as the local level model.↩

In what follows, \(V_\varepsilon\) is used to describe the vector of terms for the variance of the stochastic error in the measurement equation; and \(W_\eta\) is used to describe the vector of terms for the variance of the respective stochastic errors in the state equations.↩

EViews is not able to apply restrictions on multivariate models.↩

A useful Matlab toolbox is `SSM`.↩

Although specifying the matrices in the general form would seem like a pointless (and thankless) task, it should be borne in mind that when specifying more complex models, the software that you need to use would usually require that the input takes this form.↩

One could test this formally with the aid of autocorrelation functions and *Q*-statistics.↩

To confirm this specification of the model, consider period \(t=1\): \(\;y_1 = \mu_1 + \varepsilon_1\), where \(\; \mu_2 = \mu_1 + \upsilon_1\) and \(\; \upsilon_2 = \upsilon_1\). Then in period \(t=2\): \(\;y_2 = \mu_2 + \varepsilon_2 = \mu_1 + \upsilon_1 + \varepsilon_2\), where \(\; \mu_3 = \mu_2 + \upsilon_2 = \mu_1 + 2 \upsilon_1\) and \(\; \upsilon_3 = \upsilon_1\). Then finally in \(t=3\): \(\;y_3 = \mu_3 + \varepsilon_3 = \mu_1 + 2 \upsilon_1 + \varepsilon_3\), where \(\; \mu_4 = \mu_3 + \upsilon_3 = \mu_1 + 3 \upsilon_1\) and \(\; \upsilon_4 = \upsilon_1\).↩

In the above model, equations (2.1) to (2.3) refer to the construction of a matrix of dummy variables. To see how this is constructed note that we could choose to make use of the fact that \(\gamma_{2,t} = \gamma_{1,t-1}\) and \(\gamma_{3,t} = \gamma_{1,t-2}\). This would imply that equation (2.1) would refer to the passage of time.↩

In this particular case the Hessian is singular as the variance of the stochastic term in the seasonal is zero. This prevents us from reporting on the variance of the error terms and we should conclude that the seasonal should not be included.↩

i.e. \(w_t\) is a binary variable that takes on values of \(0\) or \(1\).↩

For alternative explanations see Anderson and Moore (1979), Harvey (1989), or Durbin and Koopman (2001).↩

With this in mind, the starting value for \(P_t\) would usually be an extremely large number, as we assume that there is little useful information about the initial value of \(a_0\). This ensures that the initial values of \(K_t\) would tend towards one. The only case where this may not be appropriate is where we are seeking to model a random walk process, where one should make use of a diffuse initialisation (De Jong 1991).↩

However, for a local level model with a stochastic slope that has full rank in the variance-covariance matrix for the slope and reduced rank in the variance-covariance of the level components, the resulting level components are not in common as they cannot be expressed as linear functions of the other.↩

The interested reader is referred to Carter and Kohn (1994), Pole and West (1988), West and Harrison (1997), Stroud, Muller, and Polson (2003), Petris, Petrone, and Campagnoli (2009), and Prado and West (2010).↩