Volatility Forecasting: GARCH(1,1) Model
In the previous post of this series on volatility forecasting, I described the simple and the exponentially weighted moving average volatility forecasting models.
In particular, I showed that these two models belong to the generic family of weighted moving average volatility forecasting models^{1}, whose members represent the volatility of an asset as a weighted moving average of its past squared returns^{2}.
Another member of this family is the Generalized AutoRegressive Conditional Heteroscedasticity (GARCH) model, widely used in financial time series modelling and implemented in most statistical and econometric software packages^{3}.
In this blog post, I will detail the simplest but often very useful^{4} GARCH(1,1) volatility forecasting model, and I will illustrate its practical performance in the context of monthly volatility forecasting for various ETFs.
Mathematical preliminaries (reminders)
This section contains reminders from a previous blog post.
Volatility modelling and volatility proxies
Let $r_t$ be the (logarithmic) return of an asset over a time period $t$ (a day, a week, a month...).
Then:

The asset (conditional) variance is defined as $ \sigma_t^2 = \mathbb{E} \left[ r_t^2 \right] $
From this definition, the squared return $r_t^2$ of an asset is a (noisy^{5}) variance estimator, or variance proxy^{5}, for that asset’s variance over the considered time period.
Another example of an asset variance proxy is the Parkinson range of an asset.
The generic notation for an asset variance proxy in this blog post is $\tilde{\sigma}_t^2$.

The asset (conditional) volatility is defined as $ \sigma_t = \sqrt { \sigma_t^2 } $
The generic notation for an asset volatility proxy in this blog post is $\tilde{\sigma}_t$.
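To make the two variance proxies above concrete, here is a minimal Python sketch computing them for a single period. It assumes that close, high and low prices are available and uses the standard Parkinson (1980) scaling factor $\frac{1}{4 \ln 2}$ for the Parkinson range; the exact variant used elsewhere in this series may differ, and the function names are hypothetical.

```python
import math


def squared_log_return(prev_close: float, close: float) -> float:
    """Squared close-to-close log return r_t^2, a noisy variance proxy."""
    r = math.log(close / prev_close)
    return r * r


def parkinson_variance(high: float, low: float) -> float:
    """Parkinson range variance proxy (ln(H/L))^2 / (4 ln 2), assumed standard form."""
    if low <= 0 or high < low:
        raise ValueError("expected 0 < low <= high")
    hl = math.log(high / low)
    return hl * hl / (4.0 * math.log(2.0))


# Usage: one period with previous close 100, close 101, high 102 and low 99.
proxy_sq = squared_log_return(100.0, 101.0)
proxy_pk = parkinson_variance(102.0, 99.0)
# A volatility proxy is the square root of the corresponding variance proxy.
vol_pk = math.sqrt(proxy_pk)
```

Both functions return a variance proxy $\tilde{\sigma}_t^2$; taking the square root gives the corresponding volatility proxy $\tilde{\sigma}_t$.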
Weighted moving average volatility forecasting model
Boudoukh et al.^{1} show that many seemingly different methods of volatility forecasting actually share the same underlying representation of the estimate $\hat{\sigma}_{T+1}^2$ of an asset’s next-period variance as a weighted moving average of that asset’s past periods’ variance proxies $\tilde{\sigma}^2_t$, $t=1..T$, with
\[\hat{\sigma}_{T+1}^2 = w_0 + \sum_{i=1}^{k} w_i \tilde{\sigma}^2_{T+1-i}\], where:
- $1 \leq k \leq T$ is the size of the moving average, possibly time-dependent
- $w_i, i=0..k$ are the weights of the moving average, possibly time-dependent as well
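A minimal Python sketch of this generic weighted moving average forecast, with the hypothetical function name `wma_variance_forecast` and the assumed convention that proxies are stored oldest first, so that $w_1$ multiplies the most recent proxy $\tilde{\sigma}^2_T$:

```python
from typing import Sequence


def wma_variance_forecast(proxies: Sequence[float], w0: float,
                          weights: Sequence[float]) -> float:
    """Next-period variance forecast w_0 + sum_{i=1..k} w_i * proxy_{T+1-i}.

    proxies: past variance proxies, oldest first (proxies[-1] is period T).
    weights: w_1, ..., w_k, where w_1 applies to the most recent proxy.
    """
    k = len(weights)
    if not 1 <= k <= len(proxies):
        raise ValueError("the moving average size k must satisfy 1 <= k <= T")
    # proxies[-i] is the proxy of period T + 1 - i
    return w0 + sum(w * proxies[-i] for i, w in enumerate(weights, start=1))


# Simple moving average over the last k = 2 periods: w_0 = 0 and w_i = 1/k.
sma_forecast = wma_variance_forecast([1.0, 2.0, 3.0], 0.0, [0.5, 0.5])
```

The simple moving average model corresponds to $w_0 = 0$ and $w_i = 1/k$, as in the usage line; other members of the family only change how $w_0$ and the $w_i$ are chosen.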
GARCH(1,1) volatility forecasting model
The GARCH(p,q) model
Definition
Bollerslev^{4}’s GARCH model is a generalization of Engle’s ARCH econometric model which captures the time-varying nature of the (conditional) variance of certain time series like asset returns.
Under a GARCH(p,q) model, an asset’s next-period conditional variance $\sigma_{T+1}^2$ is modeled as a recursive linear function of its own $p$ lagged conditional variances $\sigma_{T}^2, \sigma_{T-1}^2, \dots$ and of its $q$ lagged squared returns $r_{T}^2, r_{T-1}^2, \dots$, which leads to the formula
\[\hat{\sigma}_{T+1}^2 = \omega + \sum_{i=1}^p \beta_i \hat{\sigma}_{T+1-i}^2 + \sum_{j=1}^q \alpha_j r_{T+1-j}^2\], where:
- The parameters $\omega$, $\alpha_j$, $j=1..q$ and $\beta_i$, $i=1..p$ are non-negative and subject to various inequality constraints depending on working assumptions^{6}
- The initial conditional variance $\hat{\sigma}_1^2$ is usually taken equal to $r_1^2$, but see Pelagatti and Lisi^{7} for a thorough discussion of this subject
Squared returns vs. generic variance proxy
Molnar^{8} notes that in GARCH-type models, demeaned squared returns serve as a way to calculate innovations to the volatility^{8}, so that replacing the squared returns by more precise volatility estimates will produce better GARCH models, regarding both in-sample fit and out-of-sample forecasting performance^{8}.
Molnar^{8} then proposes to modify the GARCH(p,q) model for the estimation of an asset’s next-period conditional variance $\sigma_{T+1}^2$ as follows:
\[\hat{\sigma}_{T+1}^2 = \omega + \sum_{i=1}^p \beta_i \hat{\sigma}_{T+1-i}^2 + \sum_{j=1}^q \alpha_j \tilde{\sigma}_{T+1-j}^2\], where $\tilde{\sigma}^2_t$, $t=1..T$ are the asset’s past periods’ variance proxies.
Note that replacing squared returns by less noisy variance proxies was already discussed at length in the previous blog post, in the case of the simple and the exponentially weighted moving average volatility forecasting models.
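The GARCH(p,q) recursion, in both its original form (squared returns) and Molnar’s form (any variance proxy), can be sketched in Python as below. This is an illustration under stated assumptions, not a reference implementation: the initial conditional variance is set to the first proxy, following the usual convention, and, for $p$ or $q$ larger than one, lags reaching before the start of the sample are simply clamped to the first observation. The function name `garch_pq_filter` is hypothetical.

```python
from typing import List, Sequence


def garch_pq_filter(proxies: Sequence[float], omega: float,
                    alphas: Sequence[float], betas: Sequence[float]) -> List[float]:
    """Conditional variance estimates of a GARCH(p,q) model.

    proxies[n] is the variance proxy of period n + 1: squared returns for the
    original GARCH model, or any other variance proxy for Molnar's variant.
    The result sigma2 has len(proxies) + 1 entries, sigma2[n] being the
    estimate for period n + 1, so sigma2[-1] is the forecast for period T + 1.
    """
    if not proxies:
        raise ValueError("at least one variance proxy is required")
    sigma2 = [proxies[0]]  # initial conditional variance set to the first proxy
    for n in range(1, len(proxies) + 1):
        # Lags that reach before the first period are clamped to index 0
        # (a simplifying assumption that only matters when p or q > 1).
        arch_part = sum(a * proxies[max(n - j, 0)] for j, a in enumerate(alphas, start=1))
        garch_part = sum(b * sigma2[max(n - i, 0)] for i, b in enumerate(betas, start=1))
        sigma2.append(omega + garch_part + arch_part)
    return sigma2


# Usage: standard GARCH(1,1) on squared returns.
returns = [0.01, -0.02, 0.015]
next_variance = garch_pq_filter([r * r for r in returns], 1e-6, [0.1], [0.85])[-1]
```

Switching to Molnar’s variant only requires passing, for example, Parkinson ranges instead of squared returns as `proxies`; the recursion itself is unchanged.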
The GARCH(1,1) model
Definition
Because the GARCH(1,1) model works surprisingly well in comparison with much more complex [GARCH] models^{8}, it is the GARCH model most commonly used in practice.
Under this model, and with $\omega = \gamma \tilde{\sigma}^2$, the generic GARCH formula for the estimate of an asset’s next-period conditional variance can be reparametrized as follows:
\[\hat{\sigma}_{T+1}^2 = \gamma \tilde{\sigma}^2 + \alpha \tilde{\sigma}^2_{T} + \beta \hat{\sigma}_{T}^2\], where:
- $\alpha$, $\beta$ and $\gamma$ are positive parameters summing to one
- $\tilde{\sigma}^2$ is a strictly positive parameter, corresponding to the asset’s unconditional variance^{9}
The GARCH(1,1) model thus estimates an asset’s next-period conditional variance $\hat{\sigma}_{T+1}^2$ as a weighted average^{10} of three different variance estimators:
- A long-term variance estimator $\tilde{\sigma}^2$
- A short-term variance estimator $\tilde{\sigma}^2_{T}$
- The current GARCH(1,1) variance estimator $\hat{\sigma}_{T}^2$

and the weights $\alpha$, $\beta$ and $\gamma$ determine how fast the model adapts to the short-term variance vs. reverts to its long-term variance.
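As a sketch (hypothetical function names, assuming $\alpha + \beta < 1$ so that $\gamma > 0$), the mapping from the standard parameters $(\omega, \alpha, \beta)$ to the reparametrized ones uses $\gamma = 1 - \alpha - \beta$ and $\omega = \gamma \tilde{\sigma}^2$, and the one-period update is then a convex combination of the three variance estimators:

```python
from typing import Tuple


def to_long_run_form(omega: float, alpha: float,
                     beta: float) -> Tuple[float, float, float, float]:
    """Map (omega, alpha, beta) to (alpha, beta, gamma, long-term variance)."""
    gamma = 1.0 - alpha - beta
    if omega <= 0 or alpha <= 0 or beta <= 0 or gamma <= 0:
        raise ValueError("expected omega, alpha, beta > 0 and alpha + beta < 1")
    return alpha, beta, gamma, omega / gamma  # since omega = gamma * long-term variance


def garch11_next_variance(long_term_var: float, last_proxy: float, last_var: float,
                          alpha: float, beta: float, gamma: float) -> float:
    """Next-period variance as a convex combination of the long-term variance,
    the last variance proxy and the current GARCH(1,1) variance estimate."""
    return gamma * long_term_var + alpha * last_proxy + beta * last_var


# Usage: omega = 0.1, alpha = 0.2, beta = 0.7, last proxy 2.0, current estimate 1.0.
a, b, g, lt_var = to_long_run_form(0.1, 0.2, 0.7)
next_var = garch11_next_variance(lt_var, 2.0, 1.0, a, b, g)
```

With these values, the long-term variance is $0.1 / 0.1 = 1$, and the update returns the same value as the standard form $\omega + \alpha \tilde{\sigma}^2_T + \beta \hat{\sigma}_T^2 = 0.1 + 0.4 + 0.7 = 1.2$.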
Relationship with the generic weighted moving average model
By unrolling the recursive definition of the GARCH(1,1) model, it is possible to see that it is a specific kind of weighted moving average volatility forecasting model, with:
- $k = T$
- $w_0 = \gamma \tilde{\sigma}^2 \sum_{j=0}^{T-1} \beta^j$, up to an additional term $\beta^T \hat{\sigma}_1^2$ coming from the initial conditional variance, whose influence vanishes as $T$ grows since $\beta < 1$
- $w_1 = \alpha$, $w_2 = \alpha \beta$, …, $w_{T-1} = \alpha \beta^{T-2}$, $w_T = \alpha \beta^{T-1}$, that is, exponentially decreasing weights emphasizing recent variance proxies over more distant ones, exactly like in the exponentially weighted moving average volatility forecasting model^{11}
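The equivalence between the recursive and the weighted moving average forms can be checked numerically with the Python sketch below (hypothetical function names). With $w_0$ including the long-term variance factor $\tilde{\sigma}^2$, unrolling the recursion down to the initial conditional variance $\hat{\sigma}_1^2$, here assumed equal to the first proxy, also leaves a residual term $\beta^T \hat{\sigma}_1^2$, which the sketch keeps explicit.

```python
from typing import List, Sequence, Tuple


def garch11_recursive(proxies: Sequence[float], alpha: float, beta: float,
                      gamma: float, long_term_var: float) -> float:
    """sigma^2_{T+1} from the GARCH(1,1) recursion, starting at sigma^2_1 = proxies[0]."""
    var = proxies[0]
    for proxy in proxies:  # var holds sigma^2_t before the update for period t
        var = gamma * long_term_var + alpha * proxy + beta * var
    return var


def garch11_wma_weights(alpha: float, beta: float, gamma: float,
                        long_term_var: float, T: int) -> Tuple[float, List[float]]:
    """Weights (w_0, [w_1, ..., w_T]) of the unrolled GARCH(1,1) model."""
    w0 = gamma * long_term_var * sum(beta ** j for j in range(T))
    weights = [alpha * beta ** (i - 1) for i in range(1, T + 1)]
    return w0, weights


def garch11_unrolled(proxies: Sequence[float], alpha: float, beta: float,
                     gamma: float, long_term_var: float) -> float:
    """Same forecast, written as a weighted moving average of the proxies."""
    T = len(proxies)
    w0, weights = garch11_wma_weights(alpha, beta, gamma, long_term_var, T)
    # w_i multiplies the proxy of period T + 1 - i, i.e. proxies[T - i]
    wma = w0 + sum(w * proxies[T - i] for i, w in enumerate(weights, start=1))
    # Residual contribution of the initial conditional variance sigma^2_1
    return wma + beta ** T * proxies[0]


proxies = [1.0, 2.0, 0.5]
recursive = garch11_recursive(proxies, 0.2, 0.7, 0.1, 1.0)
unrolled = garch11_unrolled(proxies, 0.2, 0.7, 0.1, 1.0)
```

Both functions return the same forecast (here $1.04$), which illustrates that the GARCH(1,1) model is an exponentially weighted moving average of the proxies plus a constant pulling the forecast toward $\tilde{\sigma}^2$.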
Volatility forecasting formulas
Under a GARCH(1,1) volatility forecasting model, the generic weighted moving average volatility forecasting formula becomes:

To estimate an asset’s next-period volatility:
\[\hat{\sigma}_{T+1} = \sqrt{ \gamma \tilde{\sigma}^2 + \alpha \tilde{\sigma}^2_{T} + \beta \hat{\sigma}_{T}^2 }\] 
To estimate an asset’s volatility $h$ periods ahead^{12}, $h \geq 2$:
\[\hat{\sigma}_{T+h} = \sqrt{ \tilde{\sigma}^2 + \left( \alpha + \beta \right)^{h-1} \left( \hat{\sigma}_{T+1}^2 - \tilde{\sigma}^2 \right) }\]
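To estimate an asset’s aggregated volatility^{12} over the next $h$ periods, as the square root of the sum of the $h$ per-period variance forecasts:
\[\hat{\sigma}_{T+1:T+h} = \sqrt{ h \tilde{\sigma}^2 + \frac{1 - \left( \alpha + \beta \right)^{h}}{1 - \left( \alpha + \beta \right)} \left( \hat{\sigma}_{T+1}^2 - \tilde{\sigma}^2 \right) }\]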
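A minimal Python sketch of the $h$-period-ahead forecast (hypothetical function names), which makes the mean reversion visible: the gap between the forecast variance and the long-term variance $\tilde{\sigma}^2$ shrinks by a factor $\alpha + \beta$ at each additional step.

```python
import math


def garch11_h_step_variance(next_var: float, long_term_var: float,
                            alpha: float, beta: float, h: int) -> float:
    """Variance forecast for period T + h from the one-step forecast sigma^2_{T+1}."""
    if h < 1:
        raise ValueError("h must be at least 1")
    persistence = alpha + beta  # below 1 for a mean-reverting model
    return long_term_var + persistence ** (h - 1) * (next_var - long_term_var)


def garch11_h_step_volatility(next_var: float, long_term_var: float,
                              alpha: float, beta: float, h: int) -> float:
    """Volatility forecast for period T + h (square root of the variance forecast)."""
    return math.sqrt(garch11_h_step_variance(next_var, long_term_var, alpha, beta, h))


# Forecasts starting at sigma^2_{T+1} = 2 drift down toward sqrt(1.0) as h grows.
path = [garch11_h_step_volatility(2.0, 1.0, 0.1, 0.8, h) for h in (1, 2, 10, 200)]
```

For $h = 1$ the function returns the one-step forecast itself, and for large $h$ it converges to the long-term volatility $\sqrt{\tilde{\sigma}^2}$.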
How to determine the parameters of a GARCH(1,1) model?
The parameters of a GARCH(1,1) model (either $\omega$, $\alpha$ and $\beta$, or $\alpha$, $\beta$, $\gamma$ and $\tilde{\sigma}^2$) are typically determined by maximum likelihood estimation (MLE) with a Gaussian^{13} or Student’s $t$ assumption for the distribution of the innovations.
A note of caution, though.
There are plenty of software packages able to perform this estimation, but the underlying optimization problem has been documented to be numerically difficult and prone to error^{14}, due to a one-dimensional manifold in the parameter space where the likelihood function is large and almost constant^{14}, which tends to “trap” numerical algorithms.
Possible remedies have been suggested by Zumbach^{14} and by Kristensen and Linton^{15}, like reformulating the optimization problem in an alternative parameter space or using a closed-form estimator of the GARCH(1,1) parameters that does not rely on any numerical optimization procedure, but unfortunately, these remedies are not sufficient due to the problematic^{16} finite-sample behavior of the maximum likelihood estimates…
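To fix ideas, the Gaussian negative log-likelihood that such an estimation minimizes can be sketched in Python as follows, assuming zero-mean returns, squared returns as innovations and $\hat{\sigma}_1^2 = r_1^2$. This is a hypothetical helper, not the objective function of any particular package, and the Student’s $t$ variant is left out; returning `inf` outside the admissible region is one simple way, among others, to let a generic minimizer handle the constraints.

```python
import math
from typing import Sequence


def garch11_gaussian_nll(returns: Sequence[float], omega: float,
                         alpha: float, beta: float) -> float:
    """Gaussian negative log-likelihood of a zero-mean GARCH(1,1) model.

    The conditional variance starts at sigma^2_1 = r_1^2. Parameters outside
    omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1 give +inf.
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        return math.inf
    # Fall back to the unconditional variance if r_1 = 0 (r_1^2 would be falsy).
    var = returns[0] ** 2 or omega / (1.0 - alpha - beta)
    nll = 0.0
    for t, r in enumerate(returns):
        if t > 0:
            var = omega + alpha * returns[t - 1] ** 2 + beta * var
        nll += 0.5 * (math.log(2.0 * math.pi) + math.log(var) + r * r / var)
    return nll


# Usage: evaluate the objective at one candidate parameter set.
nll_value = garch11_gaussian_nll([0.01, -0.02, 0.015], 1e-6, 0.1, 0.85)
```

Evaluating this function on a grid of $(\alpha, \beta)$ values for real data is a simple way to see how flat the likelihood can be in some directions of the parameter space.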
Implementation in Portfolio Optimizer
Portfolio Optimizer implements the GARCH(1,1) volatility forecasting model through the endpoint /assets/volatility/forecast/garch.
This endpoint supports the 4 variance proxies below:
- Squared close-to-close returns
- Demeaned squared close-to-close returns
- The Parkinson range
- The jump-adjusted Parkinson range
Internally, this endpoint:
- Assumes that the asset’s unconditional variance $\tilde{\sigma}^2$ is equal to its long-term average value $\frac{1}{T} \sum_{t=1}^{T} \tilde{\sigma}^2_t$
- Automatically determines the optimal values of the GARCH(1,1) parameters $\alpha$, $\beta$ and $\gamma$ using a proprietary numerical optimization procedure
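Portfolio Optimizer’s optimization procedure is proprietary, so the following Python sketch is only a naive stand-in for illustration: with $\tilde{\sigma}^2$ fixed to the average proxy as described above (here, squared returns), it searches a grid of $(\alpha, \beta)$ values for the best Gaussian quasi-likelihood, with $\gamma = 1 - \alpha - \beta$. The function names, the grid and the simulated data are assumptions. A grid search cannot get stuck the way a local optimizer can, but it only finds the best point of the grid, and a flat likelihood still makes that point sensitive to the data.

```python
import math
import random
from typing import Sequence, Tuple


def variance_targeted_nll(returns: Sequence[float], alpha: float, beta: float) -> float:
    """Gaussian negative log-likelihood of a GARCH(1,1) model whose long-term
    variance is fixed to the average squared return (variance targeting)."""
    sq = [r * r for r in returns]
    long_term_var = sum(sq) / len(sq)
    gamma = 1.0 - alpha - beta
    var = sq[0] if sq[0] > 0 else long_term_var  # initial conditional variance
    nll = 0.0
    for t in range(len(sq)):
        if t > 0:
            var = gamma * long_term_var + alpha * sq[t - 1] + beta * var
        nll += 0.5 * (math.log(2.0 * math.pi) + math.log(var) + sq[t] / var)
    return nll


def grid_search_garch11(returns: Sequence[float], alphas: Sequence[float],
                        betas: Sequence[float]) -> Tuple[float, float, float]:
    """Return (alpha, beta, nll) minimizing variance_targeted_nll over the grid."""
    best = None
    for a in alphas:
        for b in betas:
            if a <= 0 or b <= 0 or a + b >= 1:
                continue  # gamma = 1 - alpha - beta must stay strictly positive
            value = variance_targeted_nll(returns, a, b)
            if best is None or value < best[2]:
                best = (a, b, value)
    if best is None:
        raise ValueError("no admissible (alpha, beta) pair in the grid")
    return best


# Usage on simulated GARCH(1,1) returns (alpha = 0.1, beta = 0.85).
random.seed(0)
long_run, true_alpha, true_beta = 1e-4, 0.1, 0.85
var, simulated = long_run, []
for _ in range(500):
    r = math.sqrt(var) * random.gauss(0.0, 1.0)
    simulated.append(r)
    var = (1 - true_alpha - true_beta) * long_run + true_alpha * r * r + true_beta * var
alpha_grid = [0.02 * i for i in range(1, 16)]  # 0.02 .. 0.30
beta_grid = [0.50 + 0.02 * i for i in range(25)]  # 0.50 .. 0.98
alpha_hat, beta_hat, best_nll = grid_search_garch11(simulated, alpha_grid, beta_grid)
```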
Example of usage - Volatility forecasting at monthly level for various ETFs
As an example of usage, I propose to enrich the results of the previous blog post, in which monthly forecasts produced by different volatility models are compared, using Mincer-Zarnowitz^{17} regressions, to the next month’s observed close-to-close volatility for 10 ETFs representative^{18} of various asset classes:
- U.S. stocks (SPY ETF)
- European stocks (EZU ETF)
- Japanese stocks (EWJ ETF)
- Emerging markets stocks (EEM ETF)
- U.S. REITs (VNQ ETF)
- International REITs (RWX ETF)
- U.S. 7-10 year Treasuries (IEF ETF)
- U.S. 20+ year Treasuries (TLT ETF)
- Commodities (DBC ETF)
- Gold (GLD ETF)
The Mincer-Zarnowitz regression results, averaged over all ETFs and computed over each ETF’s price history^{19}, are the following^{20}:
| Volatility model | Variance proxy | $\bar{\alpha}$ (MZ intercept) | $\bar{\beta}$ (MZ slope) | $\bar{R^2}$ |
|---|---|---|---|---|
| Random walk | Squared close-to-close returns | 5.8% | 0.66 | 44% |
| SMA, optimal $k \in \left[ 1, 5, 10, 15, 20 \right]$ days | Squared close-to-close returns | 5.8% | 0.68 | 46% |
| EWMA, optimal $\lambda$ | Squared close-to-close returns | 4.7% | 0.73 | 45% |
| GARCH(1,1) | Squared close-to-close returns | 1.3% | 0.98 | 43% |
| Random walk | Parkinson range | 5.6% | 0.94 | 44% |
| SMA, optimal $k \in \left[ 1, 5, 10, 15, 20 \right]$ days | Parkinson range | 5.1% | 1.00 | 47% |
| EWMA, optimal $\lambda$ | Parkinson range | 4.3% | 1.06 | 48% |
| GARCH(1,1) | Parkinson range | 2.7% | 1.18 | 47% |
| Random walk | Jump-adjusted Parkinson range | 4.9% | 0.70 | 45% |
| SMA, optimal $k \in \left[ 1, 5, 10, 15, 20 \right]$ days | Jump-adjusted Parkinson range | 5.1% | 0.71 | 47% |
| EWMA, optimal $\lambda$ | Jump-adjusted Parkinson range | 4.0% | 0.76 | 45% |
| GARCH(1,1) | Jump-adjusted Parkinson range | 1.0% | 1.00 | 45% |
From these, it is possible to conclude the following:
- The two GARCH(1,1) models using improved variance proxies produce volatility forecasts with a higher r-squared than the GARCH(1,1) model using squared returns (lines #8 and #12 vs. line #4), which is in agreement with Molnar^{8}
- The two GARCH(1,1) models using variance proxies that integrate close prices produce nearly unbiased forecasts (lines #4 and #12), which, together with their relatively high r-squared, makes them volatility forecasting models to recommend in these cases
- The GARCH(1,1) model using the Parkinson range as variance proxy produces the most biased forecasts of the three GARCH(1,1) models (line #8), which makes it a volatility forecasting model to avoid in this case
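For readers who want to reproduce this kind of comparison, a Mincer-Zarnowitz regression is an ordinary least squares fit of the observed volatility on the forecast volatility, and can be sketched in Python as below (hypothetical function name). An unbiased forecast corresponds to an intercept $\alpha \approx 0$ and a slope $\beta \approx 1$, while $R^2$ measures how much of the variation in observed volatility the forecast explains. The exact setup behind the table above (expanding windows, averaging across ETFs) is not reproduced here.

```python
from typing import Sequence, Tuple


def mincer_zarnowitz(forecasts: Sequence[float],
                     realized: Sequence[float]) -> Tuple[float, float, float]:
    """OLS fit of realized = alpha + beta * forecast + error.

    Returns (alpha, beta, r_squared).
    """
    n = len(forecasts)
    if n != len(realized) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx = sum(forecasts) / n
    my = sum(realized) / n
    sxx = sum((x - mx) ** 2 for x in forecasts)
    syy = sum((y - my) ** 2 for y in realized)
    sxy = sum((x - mx) * (y - my) for x, y in zip(forecasts, realized))
    if sxx == 0 or syy == 0:
        raise ValueError("forecasts and realized values must not be constant")
    beta = sxy / sxx
    alpha = my - beta * mx
    r_squared = sxy * sxy / (sxx * syy)  # squared correlation for a single regressor
    return alpha, beta, r_squared


# Usage: monthly volatility forecasts vs. the volatility observed the next month.
mz_alpha, mz_beta, mz_r2 = mincer_zarnowitz([0.12, 0.18, 0.15, 0.25],
                                            [0.13, 0.20, 0.14, 0.27])
```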
Conclusion
The GARCH(1,1) volatility forecasting model exhibits good practical performance across a wide range of assets, as illustrated empirically in the previous section.
Nevertheless, because it is unable to describe certain aspects often found in financial data^{3}, many variations have been proposed in the literature^{3} (AGARCH, EGARCH, QGARCH, TGARCH…).
Next in this series, I will detail one such variation, very recent^{21}, whose main characteristic is its ability to adapt to time-varying GARCH parameters.
Meanwhile, feel free to connect with me on LinkedIn or to follow me on Twitter.
–

[1] See Boudoukh, J., Richardson, M., & Whitelaw, R.F. (1997). Investigation of a class of volatility estimators, Journal of Derivatives, 4 Spring, 63-71.

[2] Or more generally, as a weighted moving average of one of its past variance proxies.

[3] See Brandon Williams, GARCH(1,1) models, B.Sc. Thesis, 15 July 2011.

[4] See Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3), 307-327.

[5] See Andrew J. Patton, Volatility forecast comparison using imperfect volatility proxies, Journal of Econometrics, Volume 160, Issue 1, 2011, Pages 246-256.

[6] See Daniel B. Nelson and Charles Q. Cao, Inequality Constraints in the Univariate GARCH Model, Journal of Business & Economic Statistics, Vol. 10, No. 2 (Apr., 1992), pp. 229-235.

[7] See Pelagatti, M., Lisi, F. (2009). Variance initialisation in GARCH estimation. In Paganoni, A.M., Sangalli, L.M., Secchi, P., Vantini, S. (eds.), S.Co. 2009 Sixth Conference Complex Data Modeling and Computationally Intensive Statistical Methods for Estimation and Prediction, Maggioli Editore, Milan.

[8] See Peter Molnar (2016): High-low range in GARCH models of stock return volatility, Applied Economics.

[9] Also called the asset’s long-term variance.

[10] More precisely, a convex combination.

[11] Which is not surprising since, in fact, exponential smoothing is a constrained version of GARCH(1,1)^{1}, without mean reversion.

[12] See Brooks, Chris and Persand, Gitanjali (2003) Volatility forecasting for risk management. Journal of Forecasting, 22(1). pp. 1-22.

[13] In this case, the Gaussian MLE is usually considered a quasi-maximum likelihood estimate.

[14] See Zumbach, G. (2000). The Pitfalls in Fitting Garch(1,1) Processes. In: Dunis, C.L. (eds) Advances in Quantitative Asset Management. Studies in Computational Finance, vol 1. Springer, Boston, MA.

[15] See Dennis Kristensen and Oliver Linton, A Closed-Form Estimator for the GARCH(1,1) Model, Econometric Theory, Vol. 22, No. 2 (Apr., 2006), pp. 323-337.

[17] See Mincer, J. and V. Zarnowitz (1969). The evaluation of economic forecasts. In J. Mincer (Ed.), Economic Forecasts and Expectations.

[18] These ETFs are used in the Adaptive Asset Allocation strategy from ReSolve Asset Management, described in the paper Adaptive Asset Allocation: A Primer^{22}.

[19] The price histories of all the ETFs end on 31 August 2023, but they have no common start date, as the ETFs started trading on different dates.

[20] For all models, I used an expanding window for the volatility forecast computation.

[21] At the date of publication of this blog post.

[22] See Butler, Adam and Philbrick, Mike and Gordillo, Rodrigo and Varadi, David, Adaptive Asset Allocation: A Primer.