Volatility Forecasting: GARCH(1,1) Model
In the previous post of this series on volatility forecasting, I described the simple and the exponentially weighted moving average volatility forecasting models.
In particular, I showed that these two models belong to the generic family of weighted moving average volatility forecasting models1, whose members represent the volatility of an asset as a weighted moving average of its past squared returns2.
Another member of this family is the Generalized AutoRegressive Conditional Heteroscedasticity (GARCH) model, widely used in financial time series modelling and implemented in most statistical and econometric software packages3.
In this blog post, I will detail the simplest but often very useful4 GARCH(1,1) volatility forecasting model and illustrate its practical performance in the context of monthly volatility forecasting for various ETFs.
Mathematical preliminaries (reminders)
This section contains reminders from a previous blog post.
Volatility modelling and volatility proxies
Let $r_t$ be the (logarithmic) return of an asset over a time period $t$ (a day, a week, a month...), over which its (conditional) mean return is assumed to be zero.
Then:
-
The asset (conditional) variance is defined as $ \sigma_t^2 = \mathbb{E} \left[ r_t^2 \right] $
From this definition, the squared return $r_t^2$ of an asset is a (noisy5) variance estimator - or variance proxy5 - for that asset variance over the considered time period.
Another example of an asset variance proxy is the Parkinson range of an asset.
The generic notation for an asset variance proxy in this blog post is $\tilde{\sigma}_t^2$.
-
The asset (conditional) volatility is defined as $ \sigma_t = \sqrt { \sigma_t^2 } $
The generic notation for an asset volatility proxy in this blog post is $\tilde{\sigma}_t$.
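To make the two variance proxies mentioned above concrete, here is a minimal Python sketch. The function names and the prices are hypothetical, chosen only for illustration, and the Parkinson proxy is written in its standard form $\frac{\ln(H/L)^2}{4 \ln 2}$, which this post does not spell out.

```python
import math

def squared_return_proxy(prev_close: float, close: float) -> float:
    """Squared close-to-close logarithmic return r_t^2 (a noisy variance proxy)."""
    r = math.log(close / prev_close)
    return r * r

def parkinson_proxy(high: float, low: float) -> float:
    """Parkinson range-based variance proxy: ln(H/L)^2 / (4 ln 2)."""
    return math.log(high / low) ** 2 / (4.0 * math.log(2.0))

# Hypothetical prices for a single period (illustrative numbers only).
prev_close, high, low, close = 100.0, 102.0, 99.0, 101.0

var_sq = squared_return_proxy(prev_close, close)  # variance proxy from closes
var_pk = parkinson_proxy(high, low)               # variance proxy from the range

# The matching volatility proxies are the square roots of the variance proxies.
vol_sq, vol_pk = math.sqrt(var_sq), math.sqrt(var_pk)
```

Both functions return a variance proxy $\tilde{\sigma}_t^2$ for one period; which one is less noisy in practice depends on the data.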
Weighted moving average volatility forecasting model
Boudoukh et al.1 show that many seemingly different methods of volatility forecasting actually share the same underlying representation of the estimate of an asset next period’s variance $\hat{\sigma}_{T+1}^2$ as a weighted moving average of that asset past periods’ variance proxies $\tilde{\sigma}^2_t$, $t=1..T$, with
\[\hat{\sigma}_{T+1}^2 = w_0 + \sum_{i=1}^{k} w_i \tilde{\sigma}^2_{T+1-i}\], where:
- $1 \leq k \leq T$ is the size of the moving average, possibly time-dependent
- $w_i, i=0..k$ are the weights of the moving average, possibly time-dependent as well
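The generic formula above translates almost literally into code. The sketch below is only an illustration under my own naming (`wma_variance_forecast` is a hypothetical helper, not part of any library), with constant weights and variance proxies stored oldest first.

```python
from typing import Sequence

def wma_variance_forecast(proxies: Sequence[float],
                          weights: Sequence[float],
                          w0: float = 0.0) -> float:
    """Return w0 + sum_{i=1..k} w_i * proxy_{T+1-i}.

    proxies: variance proxies for periods 1..T, oldest first.
    weights: w_1..w_k, where w_1 applies to the most recent proxy.
    """
    k = len(weights)
    if not 1 <= k <= len(proxies):
        raise ValueError("the moving average size k must satisfy 1 <= k <= T")
    most_recent_first = list(reversed(proxies))[:k]
    return w0 + sum(w * p for w, p in zip(weights, most_recent_first))

# Illustrative variance proxies (hypothetical values).
proxies = [1.0e-4, 4.0e-4, 2.0e-4, 3.0e-4]

random_walk = wma_variance_forecast(proxies, [1.0])   # k = 1, last proxy only
sma = wma_variance_forecast(proxies, [0.5, 0.5])      # simple moving average, k = 2
```

Different members of the family only differ in how `weights` and `w0` are chosen.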
GARCH(1,1) volatility forecasting model
The GARCH(p,q) model
Definition
Bollerslev4’s GARCH model is a generalization of Engle’s ARCH econometric model which captures the time-varying nature of the (conditional) variance of certain time series like asset returns.
Under a GARCH(p,q) model, an asset next period’s conditional variance $\sigma_{T+1}^2$ is modeled as a recursive linear function of its own $p$ lagged conditional variances $\sigma_{T}^2, \sigma_{T-1}^2…$ and of its $q$ lagged squared returns $r_{T}^2, r_{T-1}^2…$, which leads to the formula
\[\hat{\sigma}_{T+1}^2 = \omega + \sum_{i=1}^p \beta_i \hat{\sigma}_{T+1-i}^2+ \sum_{j=1}^q \alpha_j r_{T+1-j}^2\], where:
- The parameters $\omega$, $\alpha_j$, $j=1..q$ and $\beta_i$, $i=1..p$ are non-negative and subject to various inequality constraints depending on working assumptions6
- The initial conditional variance $\hat{\sigma}_1^2$ is usually taken equal to $r_1^2$, but cf. Pelagatti and Lisi7 for a thorough discussion of this subject
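As a rough illustration of the GARCH(p,q) recursion, the sketch below filters a series of returns into conditional variances. The helper name is hypothetical, and the initialization of the first $\max(p,q)$ conditional variances to the corresponding squared returns is my own assumption, extending the usual $\hat{\sigma}_1^2 = r_1^2$ convention; other choices are possible.

```python
from typing import List, Sequence

def garch_variance_path(returns: Sequence[float], omega: float,
                        alphas: Sequence[float],
                        betas: Sequence[float]) -> List[float]:
    """Conditional variances for periods 1..T+1 under a GARCH(p,q) model.

    alphas: alpha_1..alpha_q, applied to lagged squared returns.
    betas:  beta_1..beta_p, applied to lagged conditional variances.
    """
    p, q = len(betas), len(alphas)
    m = max(p, q)
    r2 = [r * r for r in returns]
    if m < 1 or len(r2) < m:
        raise ValueError("need max(p, q) >= 1 and at least max(p, q) returns")
    # Assumption: the first m conditional variances equal the squared returns.
    sigma2 = r2[:m]
    # 0-based index t corresponds to period t + 1.
    for t in range(m, len(r2) + 1):
        arch = sum(alphas[j] * r2[t - 1 - j] for j in range(q))
        garch = sum(betas[i] * sigma2[t - 1 - i] for i in range(p))
        sigma2.append(omega + arch + garch)
    return sigma2

# Hypothetical returns and parameters, for illustration only.
path = garch_variance_path([0.01, -0.02, 0.015], omega=1e-6,
                           alphas=[0.10], betas=[0.85])
next_period_variance = path[-1]  # forecast for period T+1
```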
Squared returns vs. generic variance proxy
Molnar8 notes that in GARCH type of models, demeaned squared returns serve as a way to calculate innovations to the volatility8 so that replacing the squared returns by more precise volatility estimates will produce better GARCH models, regarding both in-sample fit and out-of-sample forecasting performance8.
Molnar8 then proposes to modify the GARCH(p,q) model for the estimation of an asset next period’s conditional variance $\sigma_{T+1}^2$ as follows
\[\hat{\sigma}_{T+1}^2 = \omega + \sum_{i=1}^p \beta_i \hat{\sigma}_{T+1-i}^2+ \sum_{j=1}^q \alpha_j \tilde{\sigma}_{T+1-j}^2\], where $\tilde{\sigma}^2_t$, $t=1..T$ are the asset past periods’ variance proxies.
Note that replacing squared returns by less noisy variance proxies is already discussed at length in the previous blog post in the case of the simple and the exponentially weighted moving average volatility forecasting models.
The GARCH(1,1) model
Definition
Because the GARCH(1,1) model works surprisingly well in comparison with much more complex [GARCH] models8, it is usually the main GARCH model used in practice.
Under this model, the generic GARCH formula for the estimate of an asset next period’s conditional variance can be re-parametrized as follows
\[\hat{\sigma}_{T+1}^2 = \gamma \tilde{\sigma}^2 + \alpha \tilde{\sigma}^2_{T} + \beta \hat{\sigma}_{T}^2\], where:
- $\alpha$, $\beta$ and $\gamma$ are positive parameters summing to one
- $\tilde{\sigma}^2$ is a strictly positive parameter, corresponding to the asset unconditional variance9
The GARCH(1,1) model thus estimates an asset next period’s conditional variance $\hat{\sigma}_{T+1}^2$ as a weighted average10 of three different variance estimators:
- A long-term variance estimator $\tilde{\sigma}^2$
- A short-term variance estimator $\tilde{\sigma}^2_{T}$
- The current GARCH(1,1) variance estimator $\hat{\sigma}_{T}^2$
and the weights $\alpha$, $\beta$ and $\gamma$ determine the speed with which the model adapts to short-term variance vs. reverts to its long-term variance.
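The re-parametrized recursion can be sketched in a few lines of Python. The function name and the numbers below are hypothetical, and the first conditional variance is set to the first variance proxy, which is one possible initialization among others.

```python
from typing import List, Sequence

def garch11_filter(proxies: Sequence[float], alpha: float, beta: float,
                   long_run_var: float) -> List[float]:
    """Conditional variances for periods 1..T+1 under the GARCH(1,1) model.

    Each update is a convex combination of the long-term variance, the latest
    variance proxy and the current GARCH(1,1) variance estimate.
    """
    gamma = 1.0 - alpha - beta
    if min(alpha, beta, gamma) <= 0.0:
        raise ValueError("alpha, beta and gamma must be positive and sum to one")
    sigma2 = [proxies[0]]  # assumption: initial variance = first variance proxy
    for proxy in proxies:
        sigma2.append(gamma * long_run_var + alpha * proxy + beta * sigma2[-1])
    return sigma2

# Hypothetical variance proxies and parameters, for illustration only.
proxies = [1.0e-4, 4.0e-4, 2.0e-4]
path = garch11_filter(proxies, alpha=0.1, beta=0.8, long_run_var=2.0e-4)
forecast = path[-1]  # variance forecast for period T+1
```

A larger $\alpha$ makes the forecast react faster to the latest proxy, while a larger $\gamma$ pulls it more strongly toward the long-term variance.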
Relationship with the generic weighted moving average model
By developing the recursive definition of the GARCH(1,1) model, it is possible to see that it is a specific kind of weighted moving average volatility forecasting model, with:
- $k = T$
- $w_0 = \gamma \tilde{\sigma}^2 \sum_{j=0}^{T-1} \beta^j$
- $w_1 = \alpha$, $w_2 = \alpha \beta$, …, $w_{T-1} = \alpha \beta^{T-2}$, $w_T = \alpha \beta^{T-1}$, that is, exponentially decreasing weights emphasizing recent past variance proxies vs. more distant ones in the model, exactly like in the exponentially weighted moving average volatility forecasting model11
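The unrolling can be checked numerically. The sketch below (hypothetical function names, illustrative numbers) compares the recursive forecast with its weighted moving average form. Unrolling the recursion down to period 1 also leaves a term $\beta^T \hat{\sigma}_1^2$ coming from the initial conditional variance, which is not listed among the weights above and which vanishes as $T$ grows; the code keeps it explicitly so that the two computations match exactly.

```python
import math
from typing import Sequence

def garch11_recursive(proxies: Sequence[float], alpha: float, beta: float,
                      long_run_var: float) -> float:
    """Variance forecast for period T+1 computed with the recursion."""
    gamma = 1.0 - alpha - beta
    sigma2 = proxies[0]  # assumption: initial variance = first variance proxy
    for proxy in proxies:
        sigma2 = gamma * long_run_var + alpha * proxy + beta * sigma2
    return sigma2

def garch11_unrolled(proxies: Sequence[float], alpha: float, beta: float,
                     long_run_var: float) -> float:
    """Same forecast written as a weighted moving average of the proxies."""
    T = len(proxies)
    gamma = 1.0 - alpha - beta
    w0 = gamma * long_run_var * sum(beta ** j for j in range(T))
    # w_i = alpha * beta^(i-1) applies to the proxy of period T+1-i.
    weighted = sum(alpha * beta ** (i - 1) * proxies[T - i]
                   for i in range(1, T + 1))
    initial = beta ** T * proxies[0]  # contribution of the initial variance
    return w0 + weighted + initial

proxies = [1.0e-4, 4.0e-4, 2.0e-4, 3.0e-4, 1.5e-4]
a = garch11_recursive(proxies, 0.1, 0.8, 2.0e-4)
b = garch11_unrolled(proxies, 0.1, 0.8, 2.0e-4)
```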
Volatility forecasting formulas
Under a GARCH(1,1) volatility forecasting model, the generic weighted moving average volatility forecasting formula becomes:
- To estimate an asset next period’s volatility:
\[\hat{\sigma}_{T+1} = \sqrt{ \gamma \tilde{\sigma}^2 + \alpha \tilde{\sigma}^2_{T} + \beta \hat{\sigma}_{T}^2 }\]
- To estimate an asset $h$-period-ahead volatility12, $h \geq 2$:
\[\hat{\sigma}_{T+h} = \sqrt{ \tilde{\sigma}^2 + \left( \alpha + \beta \right)^{h-1} \left( \hat{\sigma}_{T+1}^2 - \tilde{\sigma}^2 \right) }\]
- To estimate an asset aggregated volatility12 over the next $h$ periods:
\[\hat{\sigma}_{T+1:T+h} = \sqrt{ \sum_{i=1}^{h} \hat{\sigma}_{T+i}^2 }\]
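For illustration, multi-period GARCH(1,1) forecasts can be sketched as follows, assuming that the variance forecast for period $T+h$ reverts geometrically, at rate $\alpha + \beta$, from the next period's variance forecast toward the long-term variance, and that the aggregated variance over $h$ periods is the sum of the per-period variance forecasts. Function names and numbers are hypothetical.

```python
import math

def garch11_h_step_variance(next_var: float, alpha: float, beta: float,
                            long_run_var: float, h: int) -> float:
    """Variance forecast for period T+h, given the forecast for period T+1."""
    if h < 1:
        raise ValueError("h must be at least 1")
    return long_run_var + (alpha + beta) ** (h - 1) * (next_var - long_run_var)

def garch11_aggregated_volatility(next_var: float, alpha: float, beta: float,
                                  long_run_var: float, h: int) -> float:
    """Volatility over periods T+1..T+h: sqrt of the summed variance forecasts."""
    total = sum(garch11_h_step_variance(next_var, alpha, beta, long_run_var, i)
                for i in range(1, h + 1))
    return math.sqrt(total)

# Hypothetical inputs: next-period variance above the long-term variance.
next_var, long_run_var, alpha, beta = 4.0e-4, 1.0e-4, 0.1, 0.8

var_2 = garch11_h_step_variance(next_var, alpha, beta, long_run_var, 2)
vol_2 = math.sqrt(var_2)  # 2-period-ahead volatility forecast
agg_2 = garch11_aggregated_volatility(next_var, alpha, beta, long_run_var, 2)
```

With $\alpha + \beta < 1$, forecasts at long horizons converge to the long-term volatility $\sqrt{\tilde{\sigma}^2}$.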
How to determine the parameters of a GARCH(1,1) model?
The parameters of a GARCH(1,1) model - either $\omega$, $\alpha$ and $\beta$ or $\alpha$, $\beta$, $\gamma$ and $\tilde{\sigma}^2$ - are typically determined by maximum likelihood estimation (MLE) with a Gaussian13 or Student’s $t$ assumption for the distribution of the innovations.
A note of caution, though.
There are plenty of software packages able to do this estimation, but the underlying optimization problem has been documented to be numerically difficult and prone to error14 due to a one dimensional manifold in the parameter space where the likelihood function is large and almost constant14, which tends to “trap” numerical algorithms.
Possible remedies have been suggested in Zumbach14 and in Kristensen and Linton15, like reformulating the optimization problem in an alternative parameter space or using a closed-form estimator for the GARCH(1,1) parameters that does not rely on any numerical optimization procedure, but unfortunately, these remedies are not sufficient due to the problematic16 finite sample behavior of the maximum likelihood estimates…
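To give an idea of what the estimation involves, here is a deliberately naive sketch: a Gaussian (quasi-)likelihood evaluated on a coarse grid of $(\alpha, \beta)$ values, with the long-term variance fixed to the sample average of squared returns. This is neither one of the remedies cited above nor what any particular software package does; the function names, the grid step and the simulated data are all my own assumptions, and the recursion is started at the long-term variance rather than at $r_1^2$ to avoid taking the logarithm of zero.

```python
import math
import random
from typing import List, Sequence, Tuple

def gaussian_nll(returns: Sequence[float], alpha: float, beta: float,
                 long_run_var: float) -> float:
    """Negative Gaussian log-likelihood of zero-mean returns under GARCH(1,1)."""
    gamma = 1.0 - alpha - beta
    sigma2 = long_run_var  # assumption: start at the long-term variance
    nll = 0.0
    for r in returns:
        nll += 0.5 * (math.log(2.0 * math.pi) + math.log(sigma2) + r * r / sigma2)
        sigma2 = gamma * long_run_var + alpha * r * r + beta * sigma2
    return nll

def fit_by_grid(returns: Sequence[float],
                step: float = 0.05) -> Tuple[float, float, float]:
    """Coarse grid search over alpha, beta > 0 with alpha + beta < 1."""
    long_run_var = sum(r * r for r in returns) / len(returns)
    n = round(1.0 / step)
    best = None
    for i in range(1, n):
        for j in range(1, n - i):
            alpha, beta = i * step, j * step
            nll = gaussian_nll(returns, alpha, beta, long_run_var)
            if best is None or nll < best[0]:
                best = (nll, alpha, beta)
    return best[1], best[2], long_run_var

# Simulated GARCH(1,1) returns with hypothetical "true" parameters.
random.seed(0)
true_alpha, true_beta, true_var = 0.10, 0.85, 1.0e-4
sigma2, returns = true_var, []  # type: float, List[float]
for _ in range(1000):
    r = math.sqrt(sigma2) * random.gauss(0.0, 1.0)
    returns.append(r)
    sigma2 = (1.0 - true_alpha - true_beta) * true_var + true_alpha * r * r + true_beta * sigma2

alpha_hat, beta_hat, var_hat = fit_by_grid(returns)
```

A grid search avoids getting trapped by a local optimizer, but it does nothing about the flatness of the likelihood itself, so neighbouring grid points can have nearly identical likelihood values.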
Implementation in Portfolio Optimizer
Portfolio Optimizer implements the GARCH(1,1) volatility forecasting model through the endpoint /assets/volatility/forecast/garch.
This endpoint supports the 4 variance proxies below:
- Squared close-to-close returns
- Demeaned squared close-to-close returns
- The Parkinson range
- The jump-adjusted Parkinson range
Internally, this endpoint:
- Assumes that the asset unconditional variance $\tilde{\sigma}^2$ is equal to its long-term average value $\frac{1}{T} \sum_{t=1}^{T} \tilde{\sigma}^2_t$
- Automatically determines the optimal value of the GARCH(1,1) parameters $\alpha$, $\beta$ and $\gamma$ using a proprietary numerical optimization procedure
Example of usage - Volatility forecasting at monthly level for various ETFs
As an example of usage, I propose to enrich the results of the previous blog post, in which monthly forecasts produced by different volatility models are compared - using Mincer-Zarnowitz17 regressions - to the next month’s close-to-close observed volatility for 10 ETFs representative18 of miscellaneous asset classes:
- U.S. stocks (SPY ETF)
- European stocks (EZU ETF)
- Japanese stocks (EWJ ETF)
- Emerging markets stocks (EEM ETF)
- U.S. REITs (VNQ ETF)
- International REITs (RWX ETF)
- U.S. 7-10 year Treasuries (IEF ETF)
- U.S. 20+ year Treasuries (TLT ETF)
- Commodities (DBC ETF)
- Gold (GLD ETF)
Averaged results for all ETFs/regression models over each ETF price history19 are the following20:
Volatility model | Variance proxy | $\bar{\alpha}$ | $\bar{\beta}$ | $\bar{R^2}$ |
---|---|---|---|---|
Random walk | Squared close-to-close returns | 5.8% | 0.66 | 44% |
SMA, optimal $k \in \left[ 1, 5, 10, 15, 20 \right]$ days | Squared close-to-close returns | 5.8% | 0.68 | 46% |
EWMA, optimal $\lambda$ | Squared close-to-close returns | 4.7% | 0.73 | 45% |
GARCH(1,1) | Squared close-to-close returns | -1.3% | 0.98 | 43% |
Random walk | Parkinson range | 5.6% | 0.94 | 44% |
SMA, optimal $k \in \left[ 1, 5, 10, 15, 20 \right]$ days | Parkinson range | 5.1% | 1.00 | 47% |
EWMA, optimal $\lambda$ | Parkinson range | 4.3% | 1.06 | 48% |
GARCH(1,1) | Parkinson range | 2.7% | 1.18 | 47% |
Random walk | Jump-adjusted Parkinson range | 4.9% | 0.70 | 45% |
SMA, optimal $k \in \left[ 1, 5, 10, 15, 20 \right]$ days | Jump-adjusted Parkinson range | 5.1% | 0.71 | 47% |
EWMA, optimal $\lambda$ | Jump-adjusted Parkinson range | 4.0% | 0.76 | 45% |
GARCH(1,1) | Jump-adjusted Parkinson range | -1.0% | 1.00 | 45% |
From these, it is possible to conclude the following:
- The two GARCH(1,1) models using improved variance proxies produce volatility forecasts with better r-squared than the GARCH(1,1) model using squared returns (lines #8 and #12 vs. line #4), which is in agreement with Molnar8
- The two GARCH(1,1) models using variance proxies that integrate close prices produce nearly unbiased forecasts (lines #4 and #12), which, together with their relatively high r-squared, makes them volatility forecasting models to recommend in these cases
- The GARCH(1,1) using the Parkinson range as variance proxy produces the most biased forecasts (line #8), which makes it a volatility forecasting model to avoid in this case
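For readers unfamiliar with Mincer-Zarnowitz regressions, the idea is to regress the observed volatility on the forecasted volatility, $\sigma_{observed} = \alpha + \beta \hat{\sigma} + \epsilon$, an unbiased forecast corresponding to $\alpha = 0$ and $\beta = 1$. Below is a minimal ordinary least squares sketch with a hypothetical function name and made-up data; it is not the exact code behind the table above.

```python
from typing import Sequence, Tuple

def mincer_zarnowitz(forecasts: Sequence[float],
                     observed: Sequence[float]) -> Tuple[float, float, float]:
    """OLS fit of observed = alpha + beta * forecast; returns (alpha, beta, R^2)."""
    n = len(forecasts)
    if n != len(observed) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_f = sum(forecasts) / n
    mean_o = sum(observed) / n
    sxx = sum((f - mean_f) ** 2 for f in forecasts)
    syy = sum((o - mean_o) ** 2 for o in observed)
    sxy = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecasts, observed))
    beta = sxy / sxx
    alpha = mean_o - beta * mean_f
    r_squared = sxy * sxy / (sxx * syy)
    return alpha, beta, r_squared

# Made-up monthly volatility forecasts and observations (illustration only):
# the observations are an exact linear function of the forecasts.
forecasts = [0.10, 0.12, 0.15, 0.20]
observed = [0.02 + 0.9 * f for f in forecasts]

alpha, beta, r_squared = mincer_zarnowitz(forecasts, observed)
```

In this toy example the fit recovers an intercept of about 0.02 and a slope of about 0.9, that is, a forecast that is biased despite a perfect r-squared, which is why bias and r-squared are read together in the table.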
Conclusion
The GARCH(1,1) volatility forecasting model exhibits good practical performance for a wide range of assets, as empirically demonstrated in the previous section.
Nevertheless, because this model is still unable to describe certain aspects often found in financial data3, many different extensions have been proposed over the years, such as AGARCH, EGARCH, QGARCH and TGARCH3.
I will not discuss these further, though, and next in this series dedicated to volatility forecasting, I will detail a model that was initially developed for use with high frequency data.
Meanwhile, feel free to connect with me on LinkedIn or to follow me on Twitter.
–
- See Boudoukh, J., Richardson, M., & Whitelaw, R.F. (1997). Investigation of a class of volatility estimators, Journal of Derivatives, 4 Spring, 63-71.
- Or more generally, of a weighted moving average of one of its past variance proxies.
- See Brandon Williams, GARCH(1,1) models, B. Sc. Thesis, 15 July 2011.
- See Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3), 307–327.
- See Andrew J. Patton, Volatility forecast comparison using imperfect volatility proxies, Journal of Econometrics, Volume 160, Issue 1, 2011, Pages 246-256.
- See Daniel B. Nelson and Charles Q. Cao, Inequality Constraints in the Univariate GARCH Model, Journal of Business & Economic Statistics, Vol. 10, No. 2 (Apr., 1992), pp. 229-235.
- See Pelagatti, M., Lisi, F. (2009). Variance initialisation in GARCH estimation. In Paganoni, A.M., Sangalli, L.M., Secchi, P., Vantini, S. (eds.), S.Co. 2009 Sixth Conference Complex Data Modeling and Computationally Intensive Statistical Methods for Estimation and Prediction, Maggioli Editore, Milan.
- See Peter Molnar (2016): High-low range in GARCH models of stock return volatility, Applied Economics.
- Also called the asset long-term variance.
- More precisely, a convex combination.
- Which is not surprising since in fact, exponential smoothing is a constrained version of GARCH (1,1)1, without mean-reversion.
- See Brooks, Chris and Persand, Gitanjali (2003) Volatility forecasting for risk management. Journal of Forecasting, 22(1). pp. 1-22.
- In which case, the Gaussian MLE is usually considered as a quasi-maximum likelihood estimate.
- See Zumbach, G. (2000). The Pitfalls in Fitting Garch(1,1) Processes. In: Dunis, C.L. (eds) Advances in Quantitative Asset Management. Studies in Computational Finance, vol 1. Springer, Boston, MA.
- See Dennis Kristensen and Oliver Linton, A Closed-Form Estimator for the GARCH(1,1) Model, Econometric Theory, Vol. 22, No. 2 (Apr., 2006), pp. 323-337.
- See Mincer, J. and V. Zarnowitz (1969). The evaluation of economic forecasts. In J. Mincer (Ed.), Economic Forecasts and Expectations.
- These ETFs are used in the Adaptive Asset Allocation strategy from ReSolve Asset Management, described in the paper Adaptive Asset Allocation: A Primer21.
- The common ending price history of all the ETFs is 31 August 2023, but there is no common starting price history, as all ETFs started trading on different dates.
- For all models, I used an expanding window for the volatility forecast computation.
- See Butler, Adam and Philbrick, Mike and Gordillo, Rodrigo and Varadi, David, Adaptive Asset Allocation: A Primer.