class: center, middle, inverse, title-slide

# Forecasting

### Kevin Kotzé

---

<!-- layout: true -->
<!-- background-image: url(image/logo.svg) -->
<!-- background-position: 2% 98% -->
<!-- background-size: 10% -->

---

# Contents

1. Introduction
1. Forecasts with autoregressive models
1. Forecast errors and uncertainty
1. Forecast evaluation
1. Model combination
1. Alternative forecasting strategies
1. Conclusion

---

# Introduction

- A significant part of the time series literature considers the ability of a model to predict the future behaviour of a variable
- The future can of course be the next minute, day, month, year, etc. (i.e. trader vs. governor)
- This is not always easy, as economic outcomes result from complex interactions between individuals, firms & organisations
- This has resulted in the development of modern macroeconomic models that are large and mathematically complicated
- However, the forecasts from relatively simple models are in many cases comparable
- Techniques for evaluation need to be carefully applied
  - To determine whether a forecast provides a reasonable degree of accuracy
  - A forecast should represent the value that will eventually be realised
  - Should characterise the uncertainties associated with the forecast

---

# Notation

- Want to generate a number of `\(h\)`-step ahead forecasts, where `\(h=\{1,2, \ldots, 8\}\)`
- The forecasting horizon may be represented by `\(H\)`, where in this case, `\(H=8\)`
- `\(I_t\)` contains all the information at time `\(t\)`, which usually stems from all current & past realised values of a variable or variables
- For an out-of-sample evaluation we compare the forecasts against future realisations
- Divide the sample of size `\(T + H\)` into an in-sample portion `\(R\)` and an out-of-sample portion `\(P\)`
- To perform an out-of-sample evaluation one would usually generate a number of successive forecasts, where `\(I_t\)` moves incrementally towards `\(T\)`
- Make use of recursive schemes or rolling window schemes

---

background-image: url(image/notation.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure : Notation for different forecasting schemes

---

# Forecasts with autoregressive models

- Given an `\(AR(1)\)` model with `\(\varepsilon_{t} \sim \mathsf{i.i.d.} \; \mathcal{N} (0,\sigma ^{2})\)`,

`\begin{eqnarray}
y_{t}=\phi_{1}y_{t-1}+\varepsilon_{t}
\end{eqnarray}`

- To calculate future values of `\(y_{t}\)` over `\(h\)`-steps ahead, we iterate forward

`\begin{eqnarray}
y_{t + 1} &=&\phi_{1}y_{t}+\varepsilon_{t + 1}\\
y_{t + 2} &=&\phi_{1}y_{t + 1}+\varepsilon_{t + 2} \\
\vdots &=&\vdots\\
y_{t + H} &=&\phi_{1}y_{t + H-1}+\varepsilon_{t + H}
\end{eqnarray}`

---

# Forecasts with autoregressive models

- Inserting the first line into the second,

`\begin{eqnarray}
y_{t+2} &=&\phi_{1}(\phi_{1}y_{t}+\varepsilon_{t+1})+\varepsilon _{t+2}\\
&=&\phi_{1}^{2}y_{t}+\phi_{1}\varepsilon_{t+1}+\varepsilon _{t+2}
\end{eqnarray}`

- Such that we could derive the representation,

`\begin{eqnarray}
y_{t + h}=\phi_{1}^{h}y_{t}+\overset{h - 1}{\underset{i = 0}{\sum }}\phi_{1}^{i}\varepsilon_{t + h - i}
\end{eqnarray}`

- Thus `\(y_{t + h}\)` is a function of `\(I_t\)`, which comprises information relating to current and past values of `\(y_t\)`
- Actual observed (i.e. realised) future values of `\(y_{t + h}\)` will also contain the effects of future shocks
- We don't have information about these shocks at time `\(t\)`
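---

# Forecasts with autoregressive models

A minimal sketch of the iteration above, with assumed illustrative values for `\(\phi_1\)` and `\(y_t\)`; setting the unknown future shocks to their mean of zero yields the point forecasts discussed on the next slide:

```python
import numpy as np

phi1, y_t, H = 0.7, 2.0, 8   # assumed illustrative values

# Iterate the AR(1) recursion forward with future shocks set to zero
forecasts = []
y_hat = y_t
for h in range(1, H + 1):
    y_hat = phi1 * y_hat               # one more step of the recursion
    forecasts.append(y_hat)

# Matches the closed form phi1^h * y_t
closed_form = [phi1**h * y_t for h in range(1, H + 1)]
print(np.allclose(forecasts, closed_form))   # True
```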
---

# Forecasts with autoregressive models

- The point forecast refers to the conditional expectation, `\(\mathbb{E}_t \left[ y_{t + h}|I_t \right] = \mathbb{E}_t \left[ y_{t + h}|y_{t}\right]\)`
- When the error is Gaussian white noise, this conditional expectation is easy to compute:
  - The one-step ahead forecast is, `\(\mathbb{E}_t \left[ y_{t + 1}|y_{t} \right] =\phi_{1}y_{t}\)`
  - The two-step ahead forecast is, `\(\mathbb{E}_t \left[ y_{t + 2}|y_{t}\right] =\phi_{1}^{2}y_{t}\)`
- Or, using the more general expression,

`\begin{equation}
\mathbb{E}_t \left[ y_{t + h} | y_{t}\right] =\phi_{1}^{h}y_{t}
\end{equation}`

- `\(\mathbb{E}_t \left[ y_{t + h} | y_{t}\right]\)` is sometimes called a predictor, which we can denote `\(\acute{y}_t(h)\)`

---

# Forecasts with autoregressive models

- For a stable process, where `\(|\phi_{1}|<1\)`, in `\(y_{t}=\phi_{1}y_{t-1}+\varepsilon_{t}\)`

`\begin{equation}
\mathbb{E}_t \left[ y_{t + h} | y_{t}\right] \rightarrow 0 \;\;\; \text{when } \; h \rightarrow \infty
\end{equation}`

- The effect of shocks to `\(y_{t}\)` dissipates as the forecast horizon increases
- When including an intercept in the stable `\(AR(1)\)` equation,

`\begin{equation}
y_{t }=\mu +\phi_{1}y_{t - 1}+\varepsilon_{t}
\end{equation}`

- After recursive substitution we derive the expression,

`\begin{equation}
\mathbb{E}_t \left[ y_{t + h} | y_{t}\right] =(1+\phi_{1}+\phi_{1}^{2}+ \ldots + \phi_{1}^{h -1})\mu +\phi_{1}^{h}y_{t}
\end{equation}`

- which implies,

`\begin{equation}
\mathbb{E}_t \left[ y_{t + h} | y_{t}\right] \longrightarrow \frac{\mu }{1-\phi_{1}}\text{ when }h \rightarrow \infty
\end{equation}`

---

# Forecasts with autoregressive models

- The case of an `\(AR(p)\)` model with intercept,

`\begin{equation}
y_{t}=\mu +\overset{p}{\underset{i = 1}{\sum}} \phi_{i} y_{t - i}+\varepsilon_{t}
\end{equation}`

- We take the conditional expectation at each forecast horizon,

`\begin{eqnarray}
\mathbb{E}_t \left[ y_{t + 1} | y_{t}\right] &=&\mu +\phi_{1}y_{t }+\phi_{2}y_{t-1 }+ \ldots +\phi_{p}y_{t -p+1} \\
\mathbb{E}_t \left[ y_{t + 2} | y_{t}\right] &=&\mu +\phi_{1}\mathbb{E}_t \left[ y_{t + 1} | y_{t}\right] +\phi_{2}y_{t }+ \ldots +\phi_{p}y_{t -p+2} \\
\vdots &=&\vdots\\
\mathbb{E}_t \left[ y_{t + h} | y_{t}\right] &=&\mu +\phi_{1}\mathbb{E}_t \left[ y_{t - 1+h} | y_{t}\right] +\phi_{2} \mathbb{E}_t \left[ y_{t - 2+h} | y_{t}\right] + \ldots \\
&&+\phi_{p}\mathbb{E}_t \left[ y_{t -p+h} | y_{t}\right]
\end{eqnarray}`

---

# Forecast errors and uncertainty

- The forecast error, `\(\acute{e}_t(h)\)`, in period `\(t+h\)` is

`\begin{eqnarray}
\acute{e}_t(h) = y_{t+h}-\acute{y}_t(h)
\end{eqnarray}`

- where `\(y_{t+h}\)` is the *ex-post* actual realisation of the variable
- Using this expression and the recursive formulas derived previously, we calculate the forecast error at different horizons

`\begin{eqnarray}
\acute{e}_t(1) = y_{t+1}-\acute{y}_t(1) &=&(\phi_{1}y_{t}+\varepsilon_{t+1})-\phi_{1}y_{t}=\varepsilon_{t+1} \\
\acute{e}_t(2) = y_{t+2}-\acute{y}_t(2) &=&(\phi_{1}^{2}y_{t}+\phi_{1}\varepsilon _{t+1}+\varepsilon_{t+2})-\phi_{1}^{2}y_{t}=\phi_{1}\varepsilon_{t+1}+\varepsilon_{t+2} \\
\vdots &=&\vdots
\end{eqnarray}`
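---

# Forecast errors and uncertainty

These error expressions are easy to verify by simulation; a short sketch with assumed parameter values:

```python
import numpy as np

rng = np.random.default_rng(42)
phi1, sigma, y_t = 0.7, 0.3, 2.0      # assumed illustrative values

eps = rng.normal(0, sigma, size=2)    # future shocks eps_{t+1}, eps_{t+2}
y_t1 = phi1 * y_t + eps[0]            # realised y_{t+1}
y_t2 = phi1 * y_t1 + eps[1]           # realised y_{t+2}

e1 = y_t1 - phi1 * y_t                # one-step forecast error
e2 = y_t2 - phi1**2 * y_t             # two-step forecast error

print(np.isclose(e1, eps[0]))                    # True
print(np.isclose(e2, phi1 * eps[0] + eps[1]))    # True
```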
---

# Forecast errors and uncertainty

- At horizon `\(h\)`,

`\begin{eqnarray}
\acute{e}_t(h) &=& y_{t+h}-\acute{y}_t(h)=\left( \phi_{1}^{h}y_{t}+\overset{h -1}{\underset{i =0}{\sum }}\phi_{1}^{i}\varepsilon_{t+h-i}\right) -\phi_{1}^{h}y_{t} \\
&=& \overset{h -1}{\underset{i =0}{\sum }}\phi_{1}^{i}\varepsilon_{t+h-i}
\end{eqnarray}`

- This suggests that the forecast error is a moving average of the future shocks, with weights given by the coefficients of the MA representation of the `\(AR(1)\)` process
- If we assume that the original model is correctly specified and the residuals are Gaussian white noise, then the expected value of all future realisations of the forecast error will be zero

---

# Forecast errors and uncertainty

- Therefore, when we assume that `\(\mathbb{E}_t \left[ \varepsilon_{t + h} | I_t\right] = 0\)`, it implies that

`\begin{eqnarray}
\mathbb{E}_t \left[\acute{e}_t(h) \right]= \mathbb{E}_t \left[ y_{t+h}-\acute{y}_t(h)\right] =\mathbb{E}_t \left[ y_{t+h}\right] -\mathbb{E}_t \left[ \acute{y}_t(h)\right] =0
\end{eqnarray}`

- If this is the case, the predictor is unbiased

---

# Mean square errors

- The MSE is a quadratic loss function that is widely used to evaluate the forecasting accuracy of a particular model
- May also be used for the forecast error variance when constructing forecast intervals
- Denote the MSE for the `\(h\)`-step ahead forecast error as `\(\acute{\sigma}_{y}(h)\)`, where

`\begin{eqnarray}
\acute{\sigma}_{y}(h) &=& \mathbb{E}_t \left[ \left(y_{t+h}-\acute{y}_t(h) \right)^{2}\right] \\
&=&\mathbb{E}_t \left[ \left( \overset{h -1}{\underset{i =0}{\sum }}\phi_{1}^{i}\varepsilon_{t+ h -i}\right) \left( \overset{h -1}{\underset{i=0}{\sum }}\phi_{1}^{i}\varepsilon_{t + h -i}\right) \right]
\end{eqnarray}`

- where we can move the `\(\phi\)` terms outside the expectation
- and `\(\mathbb{E}_t \left[ \varepsilon_{t + h -i}\varepsilon _{t + h -i}\right] =\sigma_{\varepsilon}^{2}\)` for all `\(h\)`, while the cross-products of distinct shocks have zero expectation

---

# Mean square errors

- Hence, `\(\acute{\sigma}_{y}(h)=\sigma_{\varepsilon }^{2}\sum_{i = 0}^{h -1}\phi_{1}^{2i}\)`, where,

`\begin{eqnarray}
\acute{\sigma}_{y}(1) &=&\sigma_{\varepsilon }^{2} \\
\acute{\sigma}_{y}(2) &=&\sigma_{\varepsilon }^{2}+\phi_{1}^{2}\sigma_{\varepsilon }^{2}=\acute{\sigma}_{y}(1)+\phi_{1}^{2}\sigma_{\varepsilon }^{2}\\
\acute{\sigma}_{y}(3) &=&\sigma_{\varepsilon }^{2}+\phi_{1}^{2}\sigma_{\varepsilon }^{2}+\phi_{1}^{4}\sigma_{\varepsilon }^{2}=\acute{\sigma}_{y}(2)+\phi_{1}^{4}\sigma_{\varepsilon }^{2} \\
&\vdots& \\
\acute{\sigma}_{y}(h) &=&\sigma_{\varepsilon }^{2}(1+\phi_{1}^{2}+\phi_{1}^{4}+ \ldots +\phi_{1}^{2(h-1)})\\
&=& \acute{\sigma}_{y}(h-1)+\phi_{1}^{2(h-1)}\sigma_{\varepsilon }^{2}
\end{eqnarray}`

- As `\(h\)` increases, this converges to the unconditional variance of the process

---

# Mean square errors

- Therefore,

`\begin{eqnarray}
\acute{\sigma}_{y}(h)\rightarrow \frac{\sigma_{\varepsilon }^{2}}{1 -\phi_{1}^{2}} \;\; \text{as } \;\; h\rightarrow \infty
\end{eqnarray}`

- Assuming that the errors are distributed `\(\varepsilon_t \sim \mathcal{N}(0,\sigma^2)\)`
- The forecast errors will then have a similar distribution, such that `\(\acute{e}_t(h) \sim \mathcal{N}\left(0, \acute{\sigma}_{y}(h)\right)\)`
- This would only arise when we are using the correct specification of the underlying data generating process
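---

# Mean square errors

The recursion and its limit can be confirmed numerically; a sketch with assumed values for `\(\phi_1\)` and `\(\sigma_{\varepsilon}^{2}\)`:

```python
phi1, sigma2 = 0.7, 0.1    # assumed illustrative values

# Build the forecast-error variance recursively:
# mse(h) = mse(h-1) + phi1^(2(h-1)) * sigma2
mse = [sigma2]
for h in range(2, 21):
    mse.append(mse[-1] + phi1 ** (2 * (h - 1)) * sigma2)

print(round(mse[0], 4), round(mse[4], 4), round(mse[-1], 4))
# 0.1  0.1905  0.1961  -- approaching the unconditional variance:
print(round(sigma2 / (1 - phi1**2), 4))   # 0.1961
```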
---

# Uncertainty

- While the forecasts from a correctly specified model will on average be equal to the true value that we want to forecast (unbiased)
- They will not be equal to the true value of the process at each and every period of time (there is a degree of variance)
- In many instances we supplement the information of a point forecast with a measure of uncertainty
  - e.g. central banks publish fan charts for inflation forecasts
  - communicate their view on possible paths for future inflation

---

background-image: url(image/inf_fan.svg)
background-position: top
background-size: 85% 85%
class: clear, center, bottom

Figure : Uncertainty - South African inflation fan chart (SARB June 2014)

---

# Uncertainty

- When the model residuals are `\(\varepsilon_{t}\sim \mathsf{i.i.d.} \;\; \mathcal{N}(0,\sigma_{\varepsilon }^{2})\)`, the forecast errors are,

`\begin{eqnarray}
y_{t+h}-\acute{y}_t(h)\sim \;\; \mathcal{N}(0,\acute{\sigma}_{y}(h))
\end{eqnarray}`

- Such that,

`\begin{eqnarray}
\frac{y_{t+h}-\acute{y}_t(h)}{\sqrt{\acute{\sigma}_{y}(h)}}\sim \;\; \mathcal{N}(0,1)
\end{eqnarray}`

- Denoting by `\(z_{\alpha }\)` the upper `\(\alpha \cdot 100\%\)` critical value of the standard normal distribution,
- The bounds of the `\(\left(1-\alpha\right)100\%\)` forecast interval are then,

`\begin{eqnarray}
\left[\acute{y}_t(h)-z_{\alpha /2}\sqrt{\acute{\sigma}_{y}(h)}\;\;\; , \;\;\; \acute{y}_t(h)+z_{\alpha /2}\sqrt{\acute{\sigma}_{y}(h)}\right]
\end{eqnarray}`

---

# Uncertainty

- Example:
  - Let `\(\varepsilon_{t}\sim \mathcal{N}(0,0.1)\)`, and `\(\alpha =0.05\)`, implying that `\(z_{\alpha /2}=1.96\)`
  - Assume an `\(AR(1)\)` with `\(\mu =0.4\)`, `\(\phi_{1}=0.7\)`, and `\(y_{t}=2\)`
  - For forecast horizons `\(h=1,5,10\)` we can derive the point forecasts and intervals

|                       | Point Estimate | MSE  | Lower Bound | Upper Bound |
|-----------------------|----------------|------|-------------|-------------|
| `\(\acute{y}_t(1)\)`  | 1.80           | 0.10 | 1.18        | 2.42        |
| `\(\acute{y}_t(5)\)`  | 1.45           | 0.19 | 0.59        | 2.30        |
| `\(\acute{y}_t(10)\)` | 1.35           | 0.20 | 0.48        | 2.22        |

---

# Uncertainty

- Example:
  - The second column shows the MSE, which converges to the unconditional variance of the autoregressive process as `\(h\)` increases
  - When computing the intervals in the last two columns for a large number of time series, about `\(\left( 1-\alpha \right) 100\%\)` of the intervals will contain the actual value of the random variable `\(y_{t + h}\)`
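---

# Uncertainty

The table can be reproduced in a few lines; a sketch using the example's values (note the shock variance of 0.1):

```python
import numpy as np

mu, phi1, sigma2, y_t, z = 0.4, 0.7, 0.1, 2.0, 1.96

y_hat, mse = y_t, 0.0
for h in range(1, 11):
    y_hat = mu + phi1 * y_hat                # iterated point forecast
    mse += phi1 ** (2 * (h - 1)) * sigma2    # forecast-error variance
    if h in (1, 5, 10):
        half = z * np.sqrt(mse)
        print(h, round(y_hat, 2), round(mse, 2),
              round(y_hat - half, 2), round(y_hat + half, 2))
# 1 1.8 0.1 1.18 2.42
# 5 1.45 0.19 0.59 2.3
# 10 1.35 0.2 0.48 2.22
```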
---

# Uncertainty

- Alternative:
  - Could also construct a density forecast by simulating `\(n\)` forecasts from the normal distribution with mean `\(\acute{y}_t(h)\)` and variance `\(\acute{\sigma}_{y}(h)\)` across all horizons `\((h)\)`
  - From the vector of forecasts `\((n \times 1)\)` for each `\(h\)`, a forecast interval can be derived by sorting the numbers (e.g. from lowest to highest)
  - If `\(n\)` is not big enough, this procedure will not provide a true representation of the analytical intervals above
  - The following two graphs show the case where `\(n=100\)` and `\(n=10,000\)`
  - This has implications for the use of empirical non-parametric distributions that may be used to calculate forecast intervals

---

background-image: url(image/fig_32bc.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure : Uncertainty - Simulated forecasts with `\(n=100\)` and `\(n=10,000\)`

---

# Forecast evaluation

- Different loss functions emphasise different aspects of the forecast
- Focus on the bias, root mean squared error, MAE, Diebold-Mariano & Clark-West statistics
- Also interested in the density of the forecast as a measure of uncertainty
- The dataset for a variable extends over `\(t=\{1,\ldots , R, \ldots , T+H\}\)`, where `\(R\)` is the end of the initial in-sample period
- `\(\{ R+1, \ldots , T+H\}\)` are the observations for out-of-sample evaluation
- The out-of-sample forecast error is simply the difference between the realisation and the forecast

`\begin{eqnarray}
\acute{e}_R(h)= y_{R+h}-\acute{y}_R(h)
\end{eqnarray}`

- After we have obtained the first forecast error we update to generate a vector of forecast errors,

`\begin{eqnarray}
\acute{\mathbf{e}}_{h}= \left[ \acute{e}_R(h), \acute{e}_{R+1}(h), \acute{e}_{R+2}(h), \ldots , \acute{e}_{T}(h)\right]^{\top}
\end{eqnarray}`

---

# Forecast evaluation

- Could use a recursive scheme or rolling window scheme
- Usually want to evaluate both the short-term and long-term forecasting performance
- Each vector of `\(h\)`-step ahead forecasting errors `\(\acute{\mathbf{e}}_{h}\)` is placed in a separate column

`\begin{equation}
\acute{\mathbf{e}}_{H} = \left\{
\begin{array}{cccc}
\acute{e}_R(1) & \acute{e}_R(2) & \ldots & \acute{e}_{R}(H)\\
\acute{e}_{R+1}(1) & \acute{e}_{R+1}(2)& \ldots & \acute{e}_{R+1}(H)\\
\vdots & \vdots & \ddots & \vdots\\
\acute{e}_{T}(1) & \acute{e}_{T}(2) & \ldots &\acute{e}_{T}(H) \\
\end{array}
\right\}
\end{equation}`

- The respective columns or rows of the matrix each represent a time series
- The first column represents the one-step ahead forecast errors over the out-of-sample period

---

# Forecast evaluation

- This is termed pseudo (or quasi) out-of-sample forecast evaluation:
  - We just pretend that we did not have this information
  - The model may account for features in the data that only appear after the end of the initial in-sample period
- Some of the economic literature makes use of real-time forecast errors
  - Macroeconomic data is subject to revision, so forecasts are evaluated using the first publication of the data at each point in time

---

# Forecast evaluation - Bias

- The expected value of the forecast error is a measure of the bias,

`\begin{equation}
\mathbb{E}_t \left[\acute{e}_R(h)\right]= \mathbb{E}_t\left[ y_{R+h}-\acute{y}_R(h)\right]
\end{equation}`

- with the vector `\(\acute{\mathbf{e}}_{h}\)` at hand, we can calculate the bias, `\(\overline{\acute{\mathbf{e}}}_{h}\)`, as

`\begin{equation}
\overline{\acute{\mathbf{e}}}_{h}=\frac{1}{\left(T -R +1\right)} \; \overset{T}{\underset{i = R}{\sum }} \acute{e}_{i}(h)
\end{equation}`

- The model whose estimated bias is closest to zero is preferred on this criterion
- With a bias we continuously make the same mistake of either predicting too high or too low (on aggregate), or we make a few large errors on one side of the distribution
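---

# Forecast evaluation

A sketch of a recursive (expanding window) evaluation for a simulated `\(AR(1)\)`, building the matrix of errors and the bias per horizon; all parameter values are assumed for illustration:

```python
import numpy as np

def ar1_fit(y):
    """OLS estimates of mu and phi1 in y_t = mu + phi1*y_{t-1} + e_t."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    mu, phi1 = np.linalg.lstsq(X, y[1:], rcond=None)[0]
    return mu, phi1

rng = np.random.default_rng(0)
n, R, H = 200, 120, 8                      # sample size and split (assumed)
y = np.zeros(n)
for t in range(1, n):                      # simulate the "data"
    y[t] = 0.4 + 0.7 * y[t - 1] + rng.normal(0, 0.3)

T = n - H - 1                              # last forecast origin
errors = np.zeros((T - R + 1, H))          # the matrix of errors e_t(h)
for i, t in enumerate(range(R, T + 1)):
    mu, phi1 = ar1_fit(y[: t + 1])         # recursive scheme: re-estimate
    y_hat = y[t]
    for h in range(1, H + 1):
        y_hat = mu + phi1 * y_hat          # iterated h-step forecast
        errors[i, h - 1] = y[t + h] - y_hat

print(np.round(errors.mean(axis=0), 3))    # bias at each horizon
```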
---

# Forecast evaluation - RMSE

- The RMSE measures the size of the forecast error,

`\begin{equation}
\sqrt{\mathbb{E}_t \Big[ \big\{ \acute{e}_R(h) \big\}^{2}\Big] }=\sqrt{\mathbb{E}_t \Big[ \big\{y_{R+h}-\acute{y}_R(h)\big\}^{2}\Big] }
\end{equation}`

- An estimate of the RMSE can be derived from the vector of out-of-sample forecast errors,

`\begin{equation}
\text{RMSE}_{h}=\sqrt{\frac{1}{\left(T -R +1 \right)} \; \overset{T}{\underset{i = R}{\sum }} \acute{e}^2_{i}(h) }
\end{equation}`

- The RMSE is symmetric, so whether we forecast too high or too low is equally bad
- Since it is quadratic, large errors attract a relatively large penalty
- Smaller forecast errors are considered better than larger ones

---

# Forecast evaluation - RMSE

- The RMSE has two sources of error, where for an `\(AR(1)\)` model with intercept the forecast error is,

`\begin{equation}
\acute{e}_t (1) = y_{t+1} - \acute{y}_t (1)=\varepsilon _{t+1}+\left[ (\mu -\hat{\mu})+\left( \phi_{1}-\hat{\phi}_{1}\right) y_{t}\right]
\end{equation}`

- Note that the value of `\(\varepsilon _{t+1}\)` is unknown
- And there is some uncertainty about `\(\hat{\mu}\)` and `\(\hat{\phi}_{1}\)`
- Therefore, deriving the MSE from the above,

`\begin{equation}
\mathbb{E}_t \left[ \left(y_{t+1}-\acute{y}_t \left(1\right)\right)^{2}\right] =\sigma_{\varepsilon}^{2}+\mathsf{var}\left[ (\mu -\hat{\mu})+\left( \phi_{1}-\hat{\phi}_{1}\right) y_{t}\right]
\end{equation}`

- which is comprised of uncertainty relating to the shock and parameter uncertainty

---

# Mean absolute errors

- An alternative to the RMSE quadratic loss function is the linear MAE (or MAPE) loss function

`\begin{eqnarray}
\text{MAE} &=& \mathbb{E}_t \Big[ \left| y_{t+h}-\acute{y}_t(h) \right|\Big]
\end{eqnarray}`

- Since this is a linear penalty, it imposes smaller penalties on large forecast errors (when compared to the RMSE)
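---

# Forecast evaluation - RMSE and MAE

Given the matrix of out-of-sample errors from the earlier sketch (a random stand-in is used here so the snippet runs on its own), the RMSE and MAE at each horizon are one line each:

```python
import numpy as np

rng = np.random.default_rng(1)
errors = rng.normal(0, 0.3, size=(72, 8))    # stand-in for the e_t(h) matrix

rmse = np.sqrt((errors ** 2).mean(axis=0))   # quadratic loss per horizon
mae = np.abs(errors).mean(axis=0)            # linear loss per horizon
print(np.round(rmse, 3))
print(np.round(mae, 3))
```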
---

# Forecast evaluation - Diebold-Mariano

- To determine which model is responsible for the superior forecast we usually compare the RMSE or MAE
- However, the difference can be very small, and we might enquire as to whether the difference is statistically significant
- Assume we have two models with squared forecasting errors,

`\begin{eqnarray}
\acute{e}^{2}_{t,1} (h) &=&\left(y_{t+h}-\acute{y}_{t,1} \left(h\right) \right)^{2} \\
\acute{e}^{2}_{t,2} (h) &=&\left(y_{t+h}-\acute{y}_{t,2} \left(h\right) \right)^{2}
\end{eqnarray}`

- where `\(d_{t, h}\)` is the difference between `\(\acute{e}^{2}_{t,1} (h)\)` and `\(\acute{e}^{2}_{t,2} (h)\)`

`\begin{equation}
d_{t, h}=\acute{e}^{2}_{t,1} (h) - \acute{e}^{2}_{t,2} (h)
\end{equation}`

- Could apply this statistic along the rows or columns of `\(\acute{\mathbf{e}}_{H}\)` to see whether the forecast errors differ at a given point in time or forecasting horizon

---

# Forecast evaluation - Diebold-Mariano

- Then run an OLS regression with `\(d_{t, h}\)` as the dependent variable and a constant,

`\begin{equation}
d_{t, h}=\beta _{0}+u_{t}
\end{equation}`

- This is called a Diebold-Mariano (1995) test, where the hypothesis is,

`\begin{equation}
H_{0}:\beta _{0}=0 \;\;\text{ vs } \;\; H_{1}:\beta _{0}\neq 0
\end{equation}`

- The null hypothesis implies no significant difference in performance
- Therefore, if `\(\beta _{0}=0\)`, it will be the case that `\(\mathbb{E}_t \left[ d_{t, h}\right] =\mathbb{E}_t \left[ u_{t}\right] =0\)`
- A rejection of the null means that the forecast performance of the two models is significantly different at some given significance level
- Note that `\(d_{t, h}\)` will be serially correlated, so we should use HAC standard errors
- When the models are nested we should make use of the Clark-West (2007) statistic

---

background-image: url(image/fig_33.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure : Recursive forecasting - Hairy plot

---

background-image: url(image/RMSEtime.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure : Recursive forecasting - Average RMSE over time for two models

---

background-image: url(image/RMSEstep.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure : Comparing RMSE for one-step to eight-step

---

# Forecast density evaluation - PIT

- Compare the probability density of a particular `\(h\)`-step ahead forecast to the distribution of the realised data
- Usually evaluate `\(h=1\)`, as it gets more complicated when `\(h>1\)`
- Use a histogram to depict the empirical distribution of the PITs
- The solid line represents the number of draws that are expected to be in each bin under a `\(U(0,1)\)` distribution
- The dashed lines represent the 95% confidence interval constructed under the normal approximation to the binomial distribution

---

background-image: url(image/PIT.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure : Forecast density evaluation - Reasonably good PIT for `\(h=1\)`
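---

# Forecast density evaluation - PIT

A sketch of the PIT calculation for one-step ahead Gaussian forecast densities, with assumed parameter values; under a well-calibrated density the PITs are `\(U(0,1)\)`:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
mu, phi1, sigma = 0.4, 0.7, 0.3       # assumed true AR(1) parameters

y = np.zeros(500)                     # simulate the data
for t in range(1, 500):
    y[t] = mu + phi1 * y[t - 1] + rng.normal(0, sigma)

# PIT: evaluate the one-step forecast CDF at each realised value
pits = norm.cdf(y[1:], loc=mu + phi1 * y[:-1], scale=sigma)

counts, _ = np.histogram(pits, bins=10, range=(0, 1))
print(counts)    # roughly 50 per bin (the solid line in the figure)
```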
---

# Model combination

- Many different strategies to combine model forecasts exist, both for point and density forecasts
- Two simple strategies for combining point forecasts use equal or MSE weighting
- Let `\(\acute{y}_{h}^{c}\)` denote the combined forecast for `\(h\)` steps
- Forecasts for each of the two models are `\(\acute{y}_{h, j}\)`, for `\(j=\{1,2\}\)`
- Use a linear function,

`\begin{eqnarray}
\acute{y}_{h}^{c} &=& w_{h, 1}\acute{y}_{h, 1}+w_{h, 2}\acute{y}_{h, 2}\\
&=& w_{h}\acute{y}_{h, 1}+\left( 1-w_{h}\right) \acute{y}_{h, 2}
\end{eqnarray}`

- where `\(w_{h, j}\)`, for `\(j=1,2\)`, is the weight attached to model `\(j\)`

---

# Model combination

- Two simple schemes, equal weights and MSE weights:

`\begin{eqnarray}
\text{Equal weights: } & \;\;\;\; & w_{h, j}=\frac{1}{2} \;\; \text{ for } \; j=1,2
\end{eqnarray}`

`\begin{eqnarray}
\text{MSE weights: } & \;\; & \left(1-w_{h, j}\right)=\frac{MSE_{h, j}}{\sum_{j=1}^{2}MSE_{h, j}} \;\; \text{for } \;\;j=1,2
\end{eqnarray}`

- where `\(MSE_{h, j}\)` can be derived as above, so that the model with the smaller MSE receives the larger weight

---

# Forecasting with other models

- For some cases (and for some data) a slightly different approach may be more suitable
- Direct forecasting:

`\begin{eqnarray}
y_{t}=\mu +\phi_{1}x_{t-h}+\varepsilon_{t}
\end{eqnarray}`

- where, for example, `\(h=4\)` and `\(\mu=0\)`
- estimate `\(y_{t}=\phi_{1}x_{t - 4}+\varepsilon_{t}\)` to derive the direct `\(4\)`-period ahead forecast `\(\mathbb{E}[y_{t+4}|x_t]=\phi_{1}x_{t}\)`

---

# Autoregressive distributed lag model

- The ADL model can be written as,

`\begin{eqnarray}
y_{t}=\mu + \sum^p_{i=1}\phi_{i}y_{t-i}+ \sum_{k=1}^K \sum^{J_{k}}_{j=1} \beta_{j, k }x_{t-j, k}+ \varepsilon_{t}
\end{eqnarray}`

- where `\(k=1,\ldots, K\)` additional regressors `\(x_{t-j, k}\)` are included in the model
- each additional regressor may have a different number of lags
- for example, if `\(K=2\)`, where `\(J_{1}=2\)` and `\(J_{2}=1\)`, we have

`\begin{eqnarray}
y_{t}= \mu +\overset{p}{\underset{i=1}{\sum }}\phi_{i}y_{t-i}+ \beta_{1,1}x_{t-1, 1} + \beta_{2,1}x_{t-2, 1} + \beta_{1,2}x_{t-1, 2} + \varepsilon_{t}
\end{eqnarray}`

- But future values of `\(x\)` (i.e. `\(x_{t + 2,1}, x_{t +1,1}\)` and `\(x_{t + 2, 2}\)` for a three-step ahead forecast) are not known at `\(t\)` and have to be predicted outside of the model

---

# Conclusion

- Forecasts from an autoregressive process can be obtained by iterating the process forward and employing the conditional expectation operator
- Assuming that the process is stable (i.e. stationary), that the errors are Gaussian white noise, and that we are making use of the correct specification for the data generating process:
  - the conditional expectation operator ensures that the variance of the forecast error is minimised
  - the mean of the autoregressive forecast converges on the unconditional mean of the process, and the variance of the forecast converges on the unconditional variance of the process (when the forecast horizon becomes large)

---

# Conclusion

- Density and interval forecasts can be constructed based on the assumed distribution of the errors or with the aid of nonparametric methods
- To evaluate the empirical performance of different forecasting models, an out-of-sample forecasting experiment should be conducted
- The bias, RMSE, MAE and/or Diebold-Mariano statistics are commonly used evaluation criteria for forecasts
- The density of the forecast distribution can be evaluated with the aid of a PIT or log score
- Combining many individual forecasts into one combined forecast may benefit from diversification gains