Volatility Forecasting: GARCH(1,1) Model

In the previous post of this series on volatility forecasting, I described the simple and the exponentially weighted moving average volatility forecasting models.

In particular, I showed that these two models belong to the generic family of weighted moving average volatility forecasting models¹, whose members represent the volatility of an asset as a weighted moving average of its past squared returns².

Another member of this family is the Generalized AutoRegressive Conditional Heteroscedasticity (GARCH) model, widely used in financial time series modelling and implemented in most statistics and econometric software packages³.

In this blog post, I will detail the simplest but often very useful⁴ GARCH(1,1) volatility forecasting model and I will illustrate its practical performance in the context of monthly volatility forecasting for various ETFs.

Mathematical preliminaries (reminders)

This section contains reminders from a previous blog post.

Volatility modelling and volatility proxies

Let $r_t$ be the (logarithmic) return of an asset over a time period $t$ (a day, a week, a month...), over which its (conditional) mean return is assumed to be zero.

Then:

The asset (conditional) variance is defined as $ \sigma_t^2 = \mathbb{E} \left[ r_t^2 \right] $

From this definition, the squared return $r_t^2$ of an asset is a (noisy⁵) variance estimator - or variance proxy⁵ - for that asset’s variance over the considered time period.

Another example of an asset variance proxy is the Parkinson range of an asset.

The generic notation for an asset variance proxy in this blog post is $\tilde{\sigma}_t^2$.
The asset (conditional) volatility is defined as $ \sigma_t = \sqrt { \sigma_t^2 } $

The generic notation for an asset volatility proxy in this blog post is $\tilde{\sigma}_t$.
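As a small illustration of the two variance proxies mentioned above, the sketch below computes a squared close-to-close logarithmic return and a Parkinson range for a single period. The function names and the sample prices are hypothetical, and the Parkinson formula used is the standard $\frac{\ln(H/L)^2}{4 \ln 2}$ one, with $H$ and $L$ the high and low prices of the period.

```python
import math

def squared_return_proxy(prev_close: float, close: float) -> float:
    """Squared logarithmic close-to-close return, a noisy variance proxy."""
    return math.log(close / prev_close) ** 2

def parkinson_proxy(high: float, low: float) -> float:
    """Parkinson range variance proxy: ln(high / low)^2 / (4 ln 2)."""
    return math.log(high / low) ** 2 / (4.0 * math.log(2.0))

# Hypothetical prices for a single day
var_cc = squared_return_proxy(prev_close=100.0, close=101.0)
var_pk = parkinson_proxy(high=102.0, low=99.5)

# The corresponding volatility proxies are the square roots of the variance proxies
vol_cc, vol_pk = math.sqrt(var_cc), math.sqrt(var_pk)
```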

Weighted moving average volatility forecasting model

Boudoukh et al.¹ show that many seemingly different methods of volatility forecasting actually share the same underlying representation of the estimate of an asset next period’s variance $\hat{\sigma}_{T+1}^2$ as a weighted moving average of that asset past periods’ variance proxies $\tilde{\sigma}^2_t$, $t=1..T$, with

\[\hat{\sigma}_{T+1}^2 = w_0 + \sum_{i=1}^{k} w_i \tilde{\sigma}^2_{T+1-i}\]

, where:

$1 \leq k \leq T$ is the size of the moving average, possibly time-dependent
$w_i, i=0..k$ are the weights of the moving average, possibly time-dependent as well
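To make the generic formula concrete, here is a minimal sketch in which the simple moving average and a truncated, non-normalized exponentially weighted moving average are obtained from the same function just by changing the weights. The function name, the sample proxies and the decay factor `lam` are assumptions made for illustration.

```python
def wma_variance_forecast(proxies, weights, w0=0.0):
    """Next period's variance forecast: w0 + sum_i w_i * proxy_{T+1-i}.

    proxies: past variance proxies, oldest first (proxies[-1] is period T)
    weights: [w_1, ..., w_k], w_1 applying to the most recent proxy
    """
    k = len(weights)
    if not 1 <= k <= len(proxies):
        raise ValueError("need 1 <= k <= T")
    return w0 + sum(w * proxies[-i] for i, w in enumerate(weights, start=1))

proxies = [0.0001, 0.0004, 0.0002, 0.0003]  # hypothetical daily variance proxies

# Simple moving average over the last k = 2 periods: equal weights 1/k
sma = wma_variance_forecast(proxies, [0.5, 0.5])

# Exponentially weighted moving average: weights (1 - lam) * lam^(i-1)
lam = 0.94
ewma_weights = [(1 - lam) * lam ** (i - 1) for i in range(1, len(proxies) + 1)]
ewma = wma_variance_forecast(proxies, ewma_weights)
```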

GARCH(1,1) volatility forecasting model

The GARCH(p,q) model

Definition

Bollerslev⁴’s GARCH model is a generalization of Engle’s ARCH econometric model which captures the time-varying nature of the (conditional) variance of certain time series like asset returns.

Under a GARCH(p,q) model, an asset next period’s conditional variance $\sigma_{T+1}^2$ is modeled as a recursive linear function of its own $p$ lagged conditional variances $\sigma_{T}^2, \sigma_{T-1}^2…$ and of its $q$ lagged squared returns $r_{T}^2, r_{T-1}^2…$, which leads to the formula

\[\hat{\sigma}_{T+1}^2 = \omega + \sum_{i=1}^p \beta_i \hat{\sigma}_{T+1-i}^2+ \sum_{j=1}^q \alpha_j r_{T+1-j}^2\]

, where:

The parameters $\omega$, $\alpha_j$, $j=1..q$ and $\beta_i$, $i=1..p$ are non-negative and subject to various inequality constraints depending on working assumptions⁶
The initial conditional variance $\hat{\sigma}_1^2$ is usually taken equal to $r_1^2$, but cf. Pelagatti and Lisi⁷ for a thorough discussion of this subject
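The one-step GARCH(p,q) update can be sketched as follows. This is only an illustration of the formula above: the function name, the parameter values and the history are hypothetical, and no constraint beyond having enough lags is checked.

```python
def garch_next_variance(omega, betas, alphas, past_variances, past_sq_returns):
    """One-step GARCH(p,q) variance from lagged variances and squared returns.

    betas:  [beta_1, ..., beta_p], applied to the p most recent conditional variances
    alphas: [alpha_1, ..., alpha_q], applied to the q most recent squared returns
    Both history lists are ordered oldest first.
    """
    p, q = len(betas), len(alphas)
    if len(past_variances) < p or len(past_sq_returns) < q:
        raise ValueError("not enough history for the requested lags")
    garch_part = sum(b * past_variances[-i] for i, b in enumerate(betas, start=1))
    arch_part = sum(a * past_sq_returns[-j] for j, a in enumerate(alphas, start=1))
    return omega + garch_part + arch_part

# Hypothetical parameters and history with p = 2 lagged variances, q = 1 lagged squared return
sigma2_next = garch_next_variance(
    omega=1e-6,
    betas=[0.6, 0.2],
    alphas=[0.1],
    past_variances=[0.00020, 0.00030],
    past_sq_returns=[0.00010, 0.00040],
)
```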

Squared returns vs. generic variance proxy

Molnar⁸ notes that in GARCH type of models, demeaned squared returns serve as a way to calculate innovations to the volatility⁸ so that replacing the squared returns by more precise volatility estimates will produce better GARCH models, regarding both in-sample fit and out-of-sample forecasting performance⁸.

Molnar⁸ then proposes to modify the GARCH(p,q) model for the estimation of an asset next period’s conditional variance $\sigma_{T+1}^2$ as follows

\[\hat{\sigma}_{T+1}^2 = \omega + \sum_{i=1}^p \beta_i \hat{\sigma}_{T+1-i}^2+ \sum_{j=1}^q \alpha_j \tilde{\sigma}_{T+1-j}^2\]

, where $\tilde{\sigma}^2_t$, $t=1..T$ are the asset past periods’ variance proxies.

Note that replacing squared returns by less noisy variance proxies is already discussed at length in the previous blog post in the case of the simple and the exponentially weighted moving average volatility forecasting models.

The GARCH(1,1) model

Definition

Because the GARCH(1,1) model works surprisingly well in comparison with much more complex [GARCH] models⁸, it is usually the main GARCH model used in practice.

Under this model, the generic GARCH formula for the estimate of an asset next period’s conditional variance can be re-parametrized as follows

\[\hat{\sigma}_{T+1}^2 = \gamma \tilde{\sigma}^2 + \alpha \tilde{\sigma}^2_{T} + \beta \hat{\sigma}_{T}^2\]

, where:

$\alpha$, $\beta$ and $\gamma$ are positive parameters summing to one
$\tilde{\sigma}^2$ is a strictly positive parameter, corresponding to the asset unconditional variance⁹

The GARCH(1,1) model thus estimates an asset next period’s conditional variance $\hat{\sigma}_{T+1}^2$ as a weighted average¹⁰ of three different variance estimators:

A long-term variance estimator $\tilde{\sigma}^2$
A short-term variance estimator $\tilde{\sigma}^2_{T}$
The current GARCH(1,1) variance estimator $\hat{\sigma}_{T}^2$

The weights $\alpha$, $\beta$ and $\gamma$ determine how quickly the model adapts to short-term variance versus how quickly it reverts to its long-term variance.
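The recursion can be sketched in a few lines. In this illustration, the function name and the sample values are hypothetical, the initial conditional variance is taken equal to the first variance proxy, and the long-term variance is simply passed in as an input.

```python
def garch11_filter(proxies, alpha, beta, long_run_var):
    """Run the GARCH(1,1) recursion over variance proxies (oldest first).

    Returns [sigma2_hat_1, ..., sigma2_hat_{T+1}]; the last entry is the
    next period's variance forecast. gamma is implied by alpha + beta + gamma = 1.
    """
    gamma = 1.0 - alpha - beta
    if min(alpha, beta, gamma) <= 0.0:
        raise ValueError("alpha, beta and gamma must be positive and sum to one")
    sigma2 = [proxies[0]]  # initial conditional variance set to the first proxy
    for proxy in proxies:
        sigma2.append(gamma * long_run_var + alpha * proxy + beta * sigma2[-1])
    return sigma2

proxies = [0.0001, 0.0004, 0.0002, 0.0003]   # hypothetical variance proxies
long_run_var = sum(proxies) / len(proxies)    # one possible long-term variance estimate
path = garch11_filter(proxies, alpha=0.10, beta=0.85, long_run_var=long_run_var)
next_variance = path[-1]
```

Because each step is a convex combination, the forecast always stays between the smallest and the largest of the three estimators it averages.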

Relationship with the generic weighted moving average model

By developing the recursive definition of the GARCH(1,1) model, it is possible to see that it is a specific kind of weighted moving average volatility forecasting model, with:

$k = T$
$w_0 = \gamma \tilde{\sigma}^2 \sum_{j=0}^{T-1} \beta^j$
$w_1 = \alpha$, $w_2 = \alpha \beta$, …, $w_{T-1} = \alpha \beta^{T-2}$, $w_T = \alpha \beta^{T-1}$, that is, exponentially decreasing weights emphasizing recent past variance proxies vs. more distant ones in the model, exactly like in the exponentially weighted moving average volatility forecasting model¹¹
The initial conditional variance $\hat{\sigma}_1^2$ additionally enters the forecast with the weight $\beta^T$, which vanishes as $T$ grows
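This equivalence can be checked numerically. The sketch below, with hypothetical parameter values, compares the recursive GARCH(1,1) forecast with its unrolled weighted moving average form. The constant term is taken as $\gamma \tilde{\sigma}^2 \sum_{j=0}^{T-1} \beta^j$, and the contribution $\beta^T \hat{\sigma}_1^2$ of the initial conditional variance is kept explicit.

```python
alpha, beta = 0.10, 0.85
gamma = 1.0 - alpha - beta
proxies = [0.0001, 0.0004, 0.0002, 0.0003]   # hypothetical, oldest first
long_run_var = 0.00025                        # hypothetical long-term variance
T = len(proxies)

# Recursive form, starting from sigma2_hat_1 = first proxy
sigma2_1 = proxies[0]
recursive = sigma2_1
for proxy in proxies:
    recursive = gamma * long_run_var + alpha * proxy + beta * recursive

# Unrolled form: constant term, exponentially decaying weights, initial-value term
w0 = gamma * long_run_var * sum(beta ** j for j in range(T))
weights = [alpha * beta ** (i - 1) for i in range(1, T + 1)]   # w_1, ..., w_T
unrolled = (
    w0
    + sum(w * proxies[-i] for i, w in enumerate(weights, start=1))
    + beta ** T * sigma2_1   # contribution of the initial variance, small for large T
)

assert abs(recursive - unrolled) < 1e-12
```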

Volatility forecasting formulas

Under a GARCH(1,1) volatility forecasting model, the generic weighted moving average volatility forecasting formula becomes:

To estimate an asset next period’s volatility:
\[\hat{\sigma}_{T+1} = \sqrt{ \gamma \tilde{\sigma}^2 + \alpha \tilde{\sigma}^2_{T} + \beta \hat{\sigma}_{T}^2 }\]
To estimate an asset’s $h$-period-ahead volatility¹², $h \geq 2$:
\[\hat{\sigma}_{T+h} = \sqrt{ \tilde{\sigma}^2 + \left( \alpha + \beta \right)^{h-1} \left( \hat{\sigma}_{T+1}^2 - \tilde{\sigma}^2 \right) }\]
To estimate an asset’s aggregated volatility¹² over the next $h$ periods:
\[\hat{\sigma}_{T+1:T+h} = \sqrt{ \sum_{i=1}^{h} \hat{\sigma}^2_{T+i} } = \sqrt{ h \tilde{\sigma}^2 + \frac{\left( \alpha + \beta \right)^h - 1}{ \left( \alpha + \beta \right) - 1} \left( \hat{\sigma}_{T+1}^2 - \tilde{\sigma}^2 \right) }\]
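These formulas can be sketched as follows, under hypothetical daily inputs. The aggregated volatility is computed here as the square root of the plain sum of the per-period variance forecasts, that is, the first expression of the aggregated formula. The function names are assumptions made for illustration.

```python
import math

def garch11_variance_ahead(sigma2_next, alpha, beta, long_run_var, h):
    """h-period-ahead variance forecast, h >= 1 (h = 1 returns sigma2_next)."""
    return long_run_var + (alpha + beta) ** (h - 1) * (sigma2_next - long_run_var)

def garch11_aggregated_vol(sigma2_next, alpha, beta, long_run_var, h):
    """Volatility over the next h periods: square root of the summed variances."""
    total = sum(
        garch11_variance_ahead(sigma2_next, alpha, beta, long_run_var, i)
        for i in range(1, h + 1)
    )
    return math.sqrt(total)

# Hypothetical daily inputs: next-day forecast above the long-term variance
alpha, beta, long_run_var, sigma2_next = 0.10, 0.85, 0.0001, 0.0004
vol_5_ahead = math.sqrt(garch11_variance_ahead(sigma2_next, alpha, beta, long_run_var, 5))
vol_month = garch11_aggregated_vol(sigma2_next, alpha, beta, long_run_var, 21)
```

As the horizon grows, the per-period forecast decays geometrically, at rate $\alpha + \beta$, from the next period’s forecast toward the long-term variance.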

How to determine the parameters of a GARCH(1,1) model?

The parameters of a GARCH(1,1) model - either $\omega$, $\alpha$ and $\beta$ or $\alpha$, $\beta$, $\gamma$ and $\tilde{\sigma}^2$ - are typically determined by maximum likelihood estimation (MLE) with a Gaussian¹³ or Student’s $t$ assumption for the distribution of the innovations.
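For illustration, the Gaussian negative log-likelihood that such an estimation minimizes can be sketched as below, in the $\omega$, $\alpha$, $\beta$ parametrization. The function name and the sample returns are hypothetical. The initial conditional variance is set to $r_1^2$, and the stationarity condition $\alpha + \beta < 1$ is imposed as a working assumption.

```python
import math

def garch11_gaussian_nll(returns, omega, alpha, beta):
    """Gaussian negative log-likelihood of zero-mean returns under GARCH(1,1),
    conditional on the first observation."""
    if omega <= 0.0 or alpha < 0.0 or beta < 0.0 or alpha + beta >= 1.0:
        return math.inf  # outside the region assumed admissible here
    sigma2 = returns[0] ** 2  # initial conditional variance, r_1^2
    nll = 0.0
    for prev_r, r in zip(returns, returns[1:]):
        sigma2 = omega + alpha * prev_r ** 2 + beta * sigma2
        nll += 0.5 * (math.log(2.0 * math.pi) + math.log(sigma2) + r ** 2 / sigma2)
    return nll

# Hypothetical returns; a lower value means a better fit under the Gaussian assumption
returns = [0.010, -0.020, 0.005, 0.015, -0.010]
nll_a = garch11_gaussian_nll(returns, omega=1e-5, alpha=0.10, beta=0.80)
nll_b = garch11_gaussian_nll(returns, omega=1e-5, alpha=0.05, beta=0.90)
preferred = "a" if nll_a < nll_b else "b"
```

In practice, this function would be passed to a numerical optimizer rather than compared on two hand-picked parameter sets.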

A note of caution, though.

There are plenty of software packages able to do this estimation, but the underlying optimization problem has been documented to be numerically difficult and prone to error¹⁴ due to a one dimensional manifold in the parameter space where the likelihood function is large and almost constant¹⁴, which tends to “trap” numerical algorithms.

Possible remedies have been suggested in Zumbach¹⁴ and in Kristensen and Linton¹⁵, such as reformulating the optimization problem in an alternative parameter space or using a closed-form estimator for the GARCH(1,1) parameters that does not rely on any numerical optimization procedure. Unfortunately, these remedies are not sufficient, due to the problematic¹⁶ finite sample behavior of the maximum likelihood estimates…

Implementation in Portfolio Optimizer

Portfolio Optimizer implements the GARCH(1,1) volatility forecasting model through the endpoint /assets/volatility/forecast/garch.

This endpoint supports the 4 variance proxies below:

Squared close-to-close returns
Demeaned squared close-to-close returns
The Parkinson range
The jump-adjusted Parkinson range

Internally, this endpoint:

Assumes that the asset unconditional variance $\tilde{\sigma}^2$ is equal to its long-term average value $\frac{1}{T} \sum_{t=1}^{T} \tilde{\sigma}^2_t$
Automatically determines the optimal value of the GARCH(1,1) parameters $\alpha$, $\beta$ and $\gamma$ using a proprietary numerical optimization procedure
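To give an idea of what these two steps involve, here is a deliberately naive sketch. It fixes the long-term variance to the average of the squared returns and then searches a coarse grid of $(\alpha, \beta)$ values for the smallest Gaussian negative log-likelihood. This is not the procedure used by Portfolio Optimizer, which is proprietary. The function names, the grid step and the sample returns are all assumptions.

```python
import math

def nll_with_targeting(returns, alpha, beta):
    """Gaussian negative log-likelihood with the long-term variance fixed
    to the sample average of squared returns."""
    gamma = 1.0 - alpha - beta
    long_run_var = sum(r * r for r in returns) / len(returns)
    sigma2, nll = returns[0] ** 2, 0.0
    for prev_r, r in zip(returns, returns[1:]):
        sigma2 = gamma * long_run_var + alpha * prev_r ** 2 + beta * sigma2
        nll += 0.5 * (math.log(2.0 * math.pi * sigma2) + r * r / sigma2)
    return nll

def grid_search(returns, step=0.05):
    """Brute-force search over (alpha, beta) with alpha, beta and gamma positive."""
    n = round(1.0 / step)
    candidates = [
        (i * step, j * step)
        for i in range(1, n)
        for j in range(1, n - i)   # keeps gamma = 1 - alpha - beta > 0
    ]
    return min(candidates, key=lambda ab: nll_with_targeting(returns, *ab))

# Hypothetical daily returns, for illustration only
returns = [0.004, -0.012, 0.009, 0.021, -0.017, 0.003, -0.006, 0.011, -0.002, 0.008]
alpha, beta = grid_search(returns)
gamma = 1.0 - alpha - beta
```

Fixing the long-term variance beforehand leaves only two free parameters, which is what makes a brute-force search like this one feasible at all.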

Example of usage - Volatility forecasting at monthly level for various ETFs

As an example of usage, I propose to enrich the results of the previous blog post, in which monthly forecasts produced by different volatility models are compared - using Mincer-Zarnowitz¹⁷ regressions - to the next month’s close-to-close observed volatility for 10 ETFs representative¹⁸ of miscellaneous asset classes:

U.S. stocks (SPY ETF)
European stocks (EZU ETF)
Japanese stocks (EWJ ETF)
Emerging markets stocks (EEM ETF)
U.S. REITs (VNQ ETF)
International REITs (RWX ETF)
U.S. 7-10 year Treasuries (IEF ETF)
U.S. 20+ year Treasuries (TLT ETF)
Commodities (DBC ETF)
Gold (GLD ETF)
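For reference, a Mincer-Zarnowitz regression is an ordinary least squares regression of the observed volatility on the forecast volatility. An unbiased forecast has an intercept close to 0 and a slope close to 1, and the r-squared measures the share of the observed volatility’s variation explained by the forecast. A minimal sketch, with a hypothetical function name and made-up data:

```python
def mincer_zarnowitz(forecasts, realized):
    """OLS fit of realized = intercept + slope * forecast + error.

    Returns (intercept, slope, r_squared).
    """
    n = len(forecasts)
    mean_f = sum(forecasts) / n
    mean_y = sum(realized) / n
    sxx = sum((f - mean_f) ** 2 for f in forecasts)
    sxy = sum((f - mean_f) * (y - mean_y) for f, y in zip(forecasts, realized))
    syy = sum((y - mean_y) ** 2 for y in realized)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_f
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared

# Hypothetical monthly volatility forecasts and observed volatilities
forecasts = [0.10, 0.15, 0.12, 0.20, 0.18]
realized = [0.11, 0.14, 0.13, 0.22, 0.16]
intercept, slope, r_squared = mincer_zarnowitz(forecasts, realized)
```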

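Averaged results for all ETFs/regression models over each ETF price history¹⁹ are the following²⁰, where $\bar{\alpha}$, $\bar{\beta}$ and $\bar{R^2}$ are the average intercept, slope and r-squared of the Mincer-Zarnowitz regressions (not to be confused with the GARCH(1,1) parameters $\alpha$ and $\beta$):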

Volatility model	Variance proxy	$\bar{\alpha}$	$\bar{\beta}$	$\bar{R^2}$
Random walk	Squared close-to-close returns	5.8%	0.66	44%
SMA, optimal $k \in \left[ 1, 5, 10, 15, 20 \right]$ days	Squared close-to-close returns	5.8%	0.68	46%
EWMA, optimal $\lambda$	Squared close-to-close returns	4.7%	0.73	45%
GARCH(1,1)	Squared close-to-close returns	-1.3%	0.98	43%
Random walk	Parkinson range	5.6%	0.94	44%
SMA, optimal $k \in \left[ 1, 5, 10, 15, 20 \right]$ days	Parkinson range	5.1%	1.00	47%
EWMA, optimal $\lambda$	Parkinson range	4.3%	1.06	48%
GARCH(1,1)	Parkinson range	2.7%	1.18	47%
Random walk	Jump-adjusted Parkinson range	4.9%	0.70	45%
SMA, optimal $k \in \left[ 1, 5, 10, 15, 20 \right]$ days	Jump-adjusted Parkinson range	5.1%	0.71	47%
EWMA, optimal $\lambda$	Jump-adjusted Parkinson range	4.0%	0.76	45%
GARCH(1,1)	Jump-adjusted Parkinson range	-1.0%	1.00	45%

From these, it is possible to conclude the following:

The two GARCH(1,1) models using improved variance proxies produce volatility forecasts with better r-squared than the GARCH(1,1) model using squared returns (lines #8 and #12 vs. line #4), which is in agreement with Molnar⁸
The two GARCH(1,1) models using variance proxies that integrate close prices produce nearly unbiased forecasts (lines #4 and #12), which, together with their relatively high r-squared, makes them volatility forecasting models to recommend in these cases
The GARCH(1,1) model using the Parkinson range as variance proxy produces the most biased forecasts of the three GARCH(1,1) models (line #8), which makes it a volatility forecasting model to avoid in this case

Conclusion

The GARCH(1,1) volatility forecasting model exhibits good practical performance for a wide range of assets, as empirically demonstrated in the previous section.

Nevertheless, because this model is still unable to describe certain aspects often found in financial data³, many different extensions have been proposed over the years, like AGARCH, EGARCH, QGARCH, TGARCH³…

I will not discuss these further, though, and next in this series dedicated to volatility forecasting, I will detail a model that was initially developed for use with high frequency data.

Meanwhile, feel free to connect with me on LinkedIn or to follow me on Twitter.

–

¹ See Boudoukh, J., Richardson, M., & Whitelaw, R.F. (1997). Investigation of a class of volatility estimators, Journal of Derivatives, 4 Spring, 63-71.
² Or more generally, of a weighted moving average of one of its past variance proxies.
³ See Brandon Williams, GARCH(1,1) models, B.Sc. Thesis, 15 July 2011.
⁴ See Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3), 307–327.
⁵ See Andrew J. Patton, Volatility forecast comparison using imperfect volatility proxies, Journal of Econometrics, Volume 160, Issue 1, 2011, Pages 246-256.
⁶ See Daniel B. Nelson and Charles Q. Cao, Inequality Constraints in the Univariate GARCH Model, Journal of Business & Economic Statistics, Vol. 10, No. 2 (Apr., 1992), pp. 229-235.
⁷ See Pelagatti, M., Lisi, F. (2009). Variance initialisation in GARCH estimation. In Paganoni, A.M., Sangalli, L.M., Secchi, P., Vantini, S. (eds.), S.Co. 2009 Sixth Conference Complex Data Modeling and Computationally Intensive Statistical Methods for Estimation and Prediction, Maggioli Editore, Milan.
⁸ See Peter Molnar (2016): High-low range in GARCH models of stock return volatility, Applied Economics.
⁹ Also called the asset long-term variance.
¹⁰ More precisely, a convex combination.
¹¹ Which is not surprising since in fact, exponential smoothing is a constrained version of GARCH (1,1)¹, without mean-reversion.
¹² See Brooks, Chris and Persand, Gitanjali (2003) Volatility forecasting for risk management. Journal of Forecasting, 22(1). pp. 1-22.
¹³ In which case, the Gaussian MLE is usually considered as a quasi-maximum likelihood estimate.
¹⁴ See Zumbach, G. (2000). The Pitfalls in Fitting Garch(1,1) Processes. In: Dunis, C.L. (eds) Advances in Quantitative Asset Management. Studies in Computational Finance, vol 1. Springer, Boston, MA.
¹⁵ See Dennis Kristensen and Oliver Linton, A Closed-Form Estimator for the GARCH(1,1) Model, Econometric Theory, Vol. 22, No. 2 (Apr., 2006), pp. 323-337.
¹⁶ See for example here and there.
¹⁷ See Mincer, J. and V. Zarnowitz (1969). The evaluation of economic forecasts. In J. Mincer (Ed.), Economic Forecasts and Expectations.
¹⁸ These ETFs are used in the Adaptive Asset Allocation strategy from ReSolve Asset Management, described in the paper Adaptive Asset Allocation: A Primer²¹.
¹⁹ The common ending price history of all the ETFs is 31 August 2023, but there is no common starting price history, as all ETFs started trading on different dates.
²⁰ For all models, I used an expanding window for the volatility forecast computation.
²¹ See Butler, Adam and Philbrick, Mike and Gordillo, Rodrigo and Varadi, David, Adaptive Asset Allocation: A Primer.