<h1>Volatility Forecasting: GARCH(1,1) Model</h1>
<p>Roman R. · 2024-03-11 · <a href="https://portfoliooptimizer.io/blog/volatility-forecasting-garch11-model">portfoliooptimizer.io</a></p>
<p>In the <a href="/blog/volatility-forecasting-simple-and-exponentially-weighted-moving-average-models/">previous post</a> of this series on volatility forecasting, I described the simple and the exponentially
weighted moving average volatility forecasting models.</p>
<p>In particular, I showed that these two models belong to the generic family of weighted moving average volatility forecasting models<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote">1</a></sup>, whose members
represent the volatility of an asset as a <a href="https://en.wikipedia.org/wiki/Moving_average">weighted moving average</a> of its past squared returns<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote">2</a></sup>.</p>
<p>Another member of this family is the <em><a href="https://en.wikipedia.org/wiki/Autoregressive_conditional_heteroskedasticity">Generalized AutoRegressive Conditional Heteroscedasticity (GARCH) model</a></em>,
<em>widely used in financial time series modelling and implemented in most statistics and econometric software packages</em><sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup>.</p>
<p>In this blog post, I will detail the <em>simplest but often very useful</em><sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote">4</a></sup> GARCH(1,1) volatility forecasting model and illustrate its practical performance in the context of monthly volatility forecasting for various ETFs.</p>
<h2 id="mathematical-preliminaries-reminders">Mathematical preliminaries (reminders)</h2>
<p>This section contains reminders from a <a href="/blog/volatility-forecasting-simple-and-exponentially-weighted-moving-average-models/">previous blog post</a>.</p>
<h3 id="volatility-modelling-and-volatility-proxies">Volatility modelling and volatility proxies</h3>
<p>Let $r_t$ be the (<a href="https://en.wikipedia.org/wiki/Rate_of_return#Logarithmic_or_continuously_compounded_return">logarithmic</a>) return of an asset over a time period $t$ (a day, a week, a month…).</p>
<p>Then:</p>
<ul>
<li>
<p>The asset (conditional) variance is defined as $ \sigma_t^2 = \mathbb{E} \left[ r_t^2 \right] $</p>
<p>From this definition, the squared return $r_t^2$ of an asset is a (noisy<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote">5</a></sup>) <em>variance estimator</em> - or <em>variance proxy</em><sup id="fnref:6:1" role="doc-noteref"><a href="#fn:6" class="footnote">5</a></sup> - for that asset variance over the considered time period.</p>
<p>Another example of an asset variance proxy is <a href="/blog/range-based-volatility-estimators-overview-and-examples-of-usage/">the Parkinson range</a> of an asset.</p>
<p>The generic notation for an asset variance proxy in this blog post is $\tilde{\sigma}_t^2$.</p>
</li>
<li>
<p>The asset (conditional) volatility is defined as $ \sigma_t = \sqrt { \sigma_t^2 } $</p>
<p>The generic notation for an asset volatility proxy in this blog post is $\tilde{\sigma}_t$.</p>
</li>
</ul>
<h3 id="weighted-moving-average-volatility-forecasting-model">Weighted moving average volatility forecasting model</h3>
<p>Boudoukh et al.<sup id="fnref:3:1" role="doc-noteref"><a href="#fn:3" class="footnote">1</a></sup> show that many seemingly different methods of volatility forecasting actually share the same underlying representation of the estimate of an asset next period’s variance $\hat{\sigma}_{T+1}^2$ as
a weighted moving average of that asset past periods’ variance proxies $\tilde{\sigma}^2_t$, $t=1..T$, with</p>
\[\hat{\sigma}_{T+1}^2 = w_0 + \sum_{i=1}^{k} w_i \tilde{\sigma}^2_{T+1-i}\]
<p>, where:</p>
<ul>
<li>$1 \leq k \leq T$ is the size of the moving average, possibly time-dependent</li>
<li>$w_i, i=0..k$ are the weights of the moving average, possibly time-dependent as well</li>
</ul>
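<p>To make this representation concrete, the generic formula above can be sketched in a few lines of Python (a hypothetical helper for illustration, not taken from any particular package):</p>

```python
import numpy as np

def wma_variance_forecast(proxies, weights, w0=0.0):
    """Estimate next period's variance as w0 + sum_{i=1..k} w_i * proxy_{T+1-i}.

    proxies : past variance proxies, oldest first (sigma~^2_1 .. sigma~^2_T)
    weights : w_1..w_k, with w_1 applying to the most recent proxy
    """
    proxies = np.asarray(proxies, dtype=float)
    weights = np.asarray(weights, dtype=float)
    k = len(weights)
    recent = proxies[-k:][::-1]  # most recent k proxies, newest first
    return w0 + float(np.dot(weights, recent))

# With equal weights over the last k proxies, this reduces to a simple
# moving average variance estimate
print(wma_variance_forecast([0.01, 0.04, 0.02, 0.03], [1/3, 1/3, 1/3]))  # ≈ 0.03
```

The simple and exponentially weighted moving average models of the previous post both fit this signature, differing only in how <code>weights</code> is chosen.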
<h2 id="garch11-volatility-forecasting-model">GARCH(1,1) volatility forecasting model</h2>
<h3 id="the-garchpq-model">The GARCH(p,q) model</h3>
<h4 id="definition">Definition</h4>
<p>Bollerslev<sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote">4</a></sup>’s GARCH model is a generalization of Engle’s <a href="https://en.wikipedia.org/wiki/Autoregressive_conditional_heteroskedasticity">ARCH</a> <a href="https://en.wikipedia.org/wiki/Econometric_model">econometric model</a>, which captures the time-varying nature of the (conditional) variance of certain time series like asset returns.</p>
<p>Under a GARCH(p,q) model, an asset next period’s conditional variance $\sigma_{T+1}^2$ is modeled as a recursive linear function of its own $p$ lagged conditional variances $\sigma_{T}^2, \sigma_{T-1}^2…$ and of its $q$
lagged squared returns $r_{T}^2, r_{T-1}^2…$, which leads to the formula</p>
\[\hat{\sigma}_{T+1}^2 = \omega + \sum_{i=1}^p \beta_i \hat{\sigma}_{T+1-i}^2 + \sum_{j=1}^q \alpha_j r_{T+1-j}^2\]
<p>, where:</p>
<ul>
<li>The parameters $\omega$, $\alpha_j$, $j=1..q$ and $\beta_i$, $i=1..p$ are non-negative and subject to various inequality constraints depending on working assumptions<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote">6</a></sup></li>
<li>The initial conditional variance $\hat{\sigma}_1^2$ is usually taken equal to $r_1^2$, but see Pelagatti and Lisi<sup id="fnref:13" role="doc-noteref"><a href="#fn:13" class="footnote">7</a></sup> for a thorough discussion of this subject</li>
</ul>
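<p>As an illustration, the GARCH(p,q) recursion above can be sketched in Python as follows (a sketch, not a production implementation; the initial variance is set to $r_1^2$ as mentioned, and pre-sample lags are clamped to the initial values):</p>

```python
import numpy as np

def garch_pq_variances(returns, omega, alpha, beta):
    """Run the GARCH(p,q) recursion on a return series.

    alpha : q coefficients on lagged squared returns (alpha_1..alpha_q)
    beta  : p coefficients on lagged conditional variances (beta_1..beta_p)
    Returns the conditional variance estimates sigma^2_1 .. sigma^2_{T+1},
    with sigma^2_1 initialized to r_1^2 and pre-sample lags clamped.
    """
    r2 = np.asarray(returns, dtype=float) ** 2
    p, q = len(beta), len(alpha)
    T = len(r2)
    sigma2 = np.empty(T + 1)
    sigma2[0] = r2[0]
    for t in range(1, T + 1):
        var_lags = sum(beta[i] * sigma2[max(t - 1 - i, 0)] for i in range(p))
        ret_lags = sum(alpha[j] * r2[max(t - 1 - j, 0)] for j in range(q))
        sigma2[t] = omega + var_lags + ret_lags
    return sigma2

# ARCH(1) special case: next period's variance is just the last squared return
print(garch_pq_variances([1.0, 2.0], 0.0, [1.0], []))  # [1. 1. 4.]
```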
<h4 id="squared-returns-vs-generic-variance-proxy">Squared returns vs. generic variance proxy</h4>
<p>Molnar<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote">8</a></sup> notes that <em>in GARCH type of models, demeaned squared returns serve as a way to calculate innovations to the volatility</em><sup id="fnref:9:1" role="doc-noteref"><a href="#fn:9" class="footnote">8</a></sup> so that <em>replacing the squared returns by more
precise volatility estimates will produce better GARCH models, regarding both in-sample fit and out-of-sample forecasting performance</em><sup id="fnref:9:2" role="doc-noteref"><a href="#fn:9" class="footnote">8</a></sup>.</p>
<p>Molnar<sup id="fnref:9:3" role="doc-noteref"><a href="#fn:9" class="footnote">8</a></sup> then proposes to modify the GARCH(p,q) model for the estimation of an asset next period’s conditional variance $\sigma_{T+1}^2$ as follows</p>
\[\hat{\sigma}_{T+1}^2 = \omega + \sum_{i=1}^p \beta_i \hat{\sigma}_{T+1-i}^2 + \sum_{j=1}^q \alpha_j \tilde{\sigma}_{T+1-j}^2\]
<p>, where $\tilde{\sigma}^2_t$, $t=1..T$ are the asset past periods’ variance proxies.</p>
<p>Note that replacing squared returns by less noisy variance proxies has already been discussed at length in the
<a href="/blog/volatility-forecasting-simple-and-exponentially-weighted-moving-average-models/">previous blog post</a> in the case of the simple and the exponentially weighted moving average volatility forecasting models.</p>
<h3 id="the-garch11-model">The GARCH(1,1) model</h3>
<h4 id="definition-1">Definition</h4>
<p>Because the GARCH(1,1) model <em>works surprisingly well in comparison with much more complex [GARCH] models</em><sup id="fnref:9:4" role="doc-noteref"><a href="#fn:9" class="footnote">8</a></sup>, it is usually the main GARCH model used in practice.</p>
<p>Under this model, the generic GARCH formula for the estimate of an asset next period’s conditional variance can be re-parametrized as follows</p>
\[\hat{\sigma}_{T+1}^2 = \gamma \tilde{\sigma}^2 + \alpha \tilde{\sigma}^2_{T} + \beta \hat{\sigma}_{T}^2\]
<p>, where:</p>
<ul>
<li>$\alpha$, $\beta$ and $\gamma$ are positive parameters summing to one</li>
<li>$\tilde{\sigma}^2$ is a strictly positive parameter, corresponding to the asset unconditional variance<sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote">9</a></sup></li>
</ul>
<p>The GARCH(1,1) model thus estimates an asset next period’s conditional variance $\hat{\sigma}_{T+1}^2$ as a weighted average<sup id="fnref:12" role="doc-noteref"><a href="#fn:12" class="footnote">10</a></sup> of three different variance estimators:</p>
<ul>
<li>A long-term variance estimator $\tilde{\sigma}^2$</li>
<li>A short-term variance estimator $\tilde{\sigma}^2_{T}$</li>
<li>The current GARCH(1,1) variance estimator $\hat{\sigma}_{T}^2$</li>
</ul>
<p>and the weights $\alpha$, $\beta$ and $\gamma$ determine the speed with which the model adapts to short-term variance vs. reverts to its long-term variance.</p>
<h4 id="relationship-with-the-generic-weighted-moving-average-model">Relationship with the generic weighted moving average model</h4>
<p>By expanding the recursive definition of the GARCH(1,1) model, it is possible to see that it is a specific kind of weighted moving average volatility forecasting model, with:</p>
<ul>
<li>$k = T$</li>
<li>$w_0 = \gamma \tilde{\sigma}^2 \sum_{j=0}^{T-1} \beta^j$</li>
<li>$w_1 = \alpha$, $w_2 = \alpha \beta$, …, $w_{T-1} = \alpha \beta^{T-2}$, $w_T = \alpha \beta^{T-1}$, that is, exponentially decreasing weights emphasizing recent past variance proxies over more distant ones in the model, exactly like in the exponentially weighted moving average volatility forecasting model<sup id="fnref:14" role="doc-noteref"><a href="#fn:14" class="footnote">11</a></sup></li>
</ul>
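<p>This equivalence is easy to verify numerically. The short Python check below (a sketch, assuming the initial GARCH variance $\hat{\sigma}_1^2$ is set to the first proxy, with its residual $\beta^T$ term kept explicit) compares the recursion against its weighted moving average expansion:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
proxies = rng.uniform(0.01, 0.05, size=50)  # toy variance proxies
gamma, alpha, beta = 0.1, 0.1, 0.8          # positive weights summing to one
lt_var = proxies.mean()                     # long-term variance sigma~^2

# Recursive GARCH(1,1) forecast, initialized with the first proxy
sigma2 = proxies[0]
for p in proxies:
    sigma2 = gamma * lt_var + alpha * p + beta * sigma2

# Same forecast written as a weighted moving average of the past proxies
T = len(proxies)
w0 = gamma * lt_var * sum(beta ** j for j in range(T))
wma = w0 + sum(alpha * beta ** (i - 1) * proxies[T - i] for i in range(1, T + 1))
wma += beta ** T * proxies[0]               # residual initial-variance term

print(np.isclose(sigma2, wma))  # True
```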
<h4 id="volatility-forecasting-formulas">Volatility forecasting formulas</h4>
<p>Under a GARCH(1,1) volatility forecasting model, the generic weighted moving average volatility forecasting formula becomes:</p>
<ul>
<li>
<p>To estimate an asset next period’s volatility:</p>
\[\hat{\sigma}_{T+1} = \sqrt{ \gamma \tilde{\sigma}^2 + \alpha \tilde{\sigma}^2_{T} + \beta \hat{\sigma}_{T}^2 }\]
</li>
<li>
<p>To estimate an asset’s volatility $h$ periods ahead<sup id="fnref:16" role="doc-noteref"><a href="#fn:16" class="footnote">12</a></sup>, $h \geq 2$:</p>
\[\hat{\sigma}_{T+h} = \sqrt{ \tilde{\sigma}^2 + \left( \alpha + \beta \right)^{h-1} \left( \hat{\sigma}_{T+1}^2 - \tilde{\sigma}^2 \right) }\]
</li>
<li>
<p>To estimate an asset aggregated volatility<sup id="fnref:16:1" role="doc-noteref"><a href="#fn:16" class="footnote">12</a></sup> over the next $h$ periods:</p>
\[\hat{\sigma}_{T+1:T+h} = \sqrt{h} \hat{\sigma}_{T+1}\]
</li>
</ul>
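<p>These three formulas can be sketched together in a few lines of Python (a hypothetical helper for illustration):</p>

```python
import math

def garch11_vol_forecasts(lt_var, proxy_T, var_T, alpha, beta, gamma, h):
    """Next period's, h-period-ahead and aggregated volatility forecasts.

    lt_var  : unconditional (long-term) variance sigma~^2
    proxy_T : last variance proxy sigma~^2_T
    var_T   : current GARCH(1,1) variance estimate sigma^2_T
    """
    var_next = gamma * lt_var + alpha * proxy_T + beta * var_T
    vol_next = math.sqrt(var_next)
    # h-period-ahead: reverts to lt_var at geometric rate (alpha + beta)
    vol_h = math.sqrt(lt_var + (alpha + beta) ** (h - 1) * (var_next - lt_var))
    # Aggregated volatility over the next h periods (square-root-of-time rule)
    vol_agg = math.sqrt(h) * vol_next
    return vol_next, vol_h, vol_agg

# With alpha + beta < 1, the long-horizon forecast reverts to the
# long-term volatility sqrt(lt_var)
print(garch11_vol_forecasts(0.04, 0.09, 0.04, 0.1, 0.8, 0.1, h=1000)[1])  # ≈ 0.2
```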
<h4 id="how-to-determine-the-parameters-of-a-garch11-model">How to determine the parameters of a GARCH(1,1) model?</h4>
<p>The parameters of a GARCH(1,1) model - either $\omega$, $\alpha$ and $\beta$ or $\alpha$, $\beta$, $\gamma$ and $\tilde{\sigma}^2$ - are typically determined by
<a href="https://en.wikipedia.org/wiki/Maximum_likelihood_estimation">maximum likelihood estimation (MLE)</a> with a Gaussian<sup id="fnref:18" role="doc-noteref"><a href="#fn:18" class="footnote">13</a></sup> or Student’s $t$ assumption for the distribution of the innovations.</p>
<p>A note of caution, though.</p>
<p>There are plenty of software packages able to do this estimation, but the underlying optimization problem <em>has been documented to be numerically
difficult and prone to error</em><sup id="fnref:15" role="doc-noteref"><a href="#fn:15" class="footnote">14</a></sup> due to <em>a one dimensional manifold in the parameter space where the likelihood function is large and almost constant</em><sup id="fnref:15:1" role="doc-noteref"><a href="#fn:15" class="footnote">14</a></sup>, which tends to “trap” numerical algorithms.</p>
<p>Possible remediations have been suggested by Zumbach<sup id="fnref:15:2" role="doc-noteref"><a href="#fn:15" class="footnote">14</a></sup> and by Kristensen and Linton<sup id="fnref:17" role="doc-noteref"><a href="#fn:17" class="footnote">15</a></sup>, such as reformulating the optimization problem in an alternative parameter space or using
a closed-form estimator for the GARCH(1,1) parameters that does not rely on any numerical optimization procedure, but unfortunately, these remediations are not always sufficient due to the
problematic<sup id="fnref:19" role="doc-noteref"><a href="#fn:19" class="footnote">16</a></sup> finite sample behavior of the maximum likelihood estimates.</p>
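<p>To give an idea of what this estimation looks like in practice, below is a bare-bones Gaussian (quasi-)maximum likelihood sketch using <code>scipy.optimize.minimize</code>; production-grade packages add the refinements discussed above (reparametrizations, better starting values, analytical gradients), so this is illustrative only:</p>

```python
import numpy as np
from scipy.optimize import minimize

def neg_gaussian_loglik(params, returns):
    """Negative Gaussian log-likelihood of a GARCH(1,1) model (omega, alpha, beta)."""
    omega, alpha, beta = params
    r2 = returns ** 2
    sigma2 = np.empty_like(r2)
    sigma2[0] = r2[0]  # usual, if debatable, initialization
    for t in range(1, len(r2)):
        sigma2[t] = omega + alpha * r2[t - 1] + beta * sigma2[t - 1]
    return 0.5 * np.sum(np.log(sigma2) + r2 / sigma2)

rng = np.random.default_rng(42)
r = 0.01 * rng.standard_normal(500)  # toy i.i.d. return series

res = minimize(
    neg_gaussian_loglik,
    x0=[1e-6, 0.1, 0.8],                             # starting values
    args=(r,),
    bounds=[(1e-12, None), (0.0, 1.0), (0.0, 1.0)],  # non-negativity constraints
    method="L-BFGS-B",
)
omega_hat, alpha_hat, beta_hat = res.x
```

Note that the flat likelihood manifold mentioned above means the fitted parameters can depend noticeably on the starting values, which is precisely the pitfall documented by Zumbach.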
<h2 id="implementation-in-portfolio-optimizer">Implementation in Portfolio Optimizer</h2>
<p><strong>Portfolio Optimizer</strong> implements the GARCH(1,1) volatility forecasting model through the endpoint <a href="https://docs.portfoliooptimizer.io/"><code class="language-plaintext highlighter-rouge">/assets/volatility/forecast/garch</code></a>.</p>
<p>This endpoint supports the four variance proxies below:</p>
<ul>
<li>Squared close-to-close returns</li>
<li>Demeaned squared close-to-close returns</li>
<li>The Parkinson range</li>
<li>The jump-adjusted Parkinson range</li>
</ul>
<p>Internally, this endpoint:</p>
<ul>
<li>Assumes that the asset unconditional variance $\tilde{\sigma}^2$ is equal to its long-term average value $\frac{1}{T} \sum_{t=1}^{T} \tilde{\sigma}^2_t$</li>
<li>Automatically determines the optimal value of the GARCH(1,1) parameters $\alpha$, $\beta$ and $\gamma$ using a proprietary numerical optimization procedure</li>
</ul>
<h2 id="example-of-usage---volatility-forecasting-at-monthly-level-for-various-etfs">Example of usage - Volatility forecasting at monthly level for various ETFs</h2>
<p>As an example of usage, I propose to enrich the results of <a href="/blog/volatility-forecasting-simple-and-exponentially-weighted-moving-average-models/">the previous blog post</a>, in which
monthly forecasts produced by different volatility models are compared - using Mincer-Zarnowitz<sup id="fnref:21" role="doc-noteref"><a href="#fn:21" class="footnote">17</a></sup> regressions - to the next month’s close-to-close observed volatility for 10 ETFs representative<sup id="fnref:20" role="doc-noteref"><a href="#fn:20" class="footnote">18</a></sup> of various asset classes:</p>
<ul>
<li>U.S. stocks (SPY ETF)</li>
<li>European stocks (EZU ETF)</li>
<li>Japanese stocks (EWJ ETF)</li>
<li>Emerging markets stocks (EEM ETF)</li>
<li>U.S. REITs (VNQ ETF)</li>
<li>International REITs (RWX ETF)</li>
<li>U.S. 7-10 year Treasuries (IEF ETF)</li>
<li>U.S. 20+ year Treasuries (TLT ETF)</li>
<li>Commodities (DBC ETF)</li>
<li>Gold (GLD ETF)</li>
</ul>
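<p>As a reminder, a Mincer-Zarnowitz regression regresses the realized volatility on the forecast volatility, an unbiased and informative forecast giving an intercept close to 0, a slope close to 1 and a high $R^2$. A minimal sketch (hypothetical helper) of such a regression:</p>

```python
import numpy as np

def mincer_zarnowitz(realized, forecast):
    """OLS regression realized = a + b * forecast; returns (a, b, R^2).

    An unbiased, informative forecast gives a ~ 0, b ~ 1 and a high R^2.
    """
    realized = np.asarray(realized, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    X = np.column_stack([np.ones_like(forecast), forecast])
    (a, b), *_ = np.linalg.lstsq(X, realized, rcond=None)
    fitted = a + b * forecast
    ss_res = np.sum((realized - fitted) ** 2)
    ss_tot = np.sum((realized - realized.mean()) ** 2)
    return a, b, 1.0 - ss_res / ss_tot

# A perfect forecaster: zero intercept, unit slope, R^2 of one
a, b, r2 = mincer_zarnowitz([0.10, 0.20, 0.15], [0.10, 0.20, 0.15])
```

The $\bar{\alpha}$, $\bar{\beta}$ and $\bar{R^2}$ columns in the table below correspond to the averages of these regression outputs.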
<p>Averaged results for all ETFs/regression models over each ETF price history<sup id="fnref:22" role="doc-noteref"><a href="#fn:22" class="footnote">19</a></sup> are the following<sup id="fnref:23" role="doc-noteref"><a href="#fn:23" class="footnote">20</a></sup>:</p>
<table>
<thead>
<tr>
<th>Volatility model</th>
<th>Variance proxy</th>
<th>$\bar{\alpha}$</th>
<th>$\bar{\beta}$</th>
<th>$\bar{R^2}$</th>
</tr>
</thead>
<tbody>
<tr>
<td>Random walk</td>
<td>Squared close-to-close returns</td>
<td>5.8%</td>
<td>0.66</td>
<td>44%</td>
</tr>
<tr>
<td>SMA, optimal $k \in \left[ 1, 5, 10, 15, 20 \right]$ days</td>
<td>Squared close-to-close returns</td>
<td>5.8%</td>
<td>0.68</td>
<td>46%</td>
</tr>
<tr>
<td>EWMA, optimal $\lambda$</td>
<td>Squared close-to-close returns</td>
<td>4.7%</td>
<td>0.73</td>
<td>45%</td>
</tr>
<tr>
<td><strong>GARCH(1,1)</strong></td>
<td><strong>Squared close-to-close returns</strong></td>
<td><strong>-1.3%</strong></td>
<td><strong>0.98</strong></td>
<td><strong>43%</strong></td>
</tr>
<tr>
<td>Random walk</td>
<td>Parkinson range</td>
<td>5.6%</td>
<td>0.94</td>
<td>44%</td>
</tr>
<tr>
<td>SMA, optimal $k \in \left[ 1, 5, 10, 15, 20 \right]$ days</td>
<td>Parkinson range</td>
<td>5.1%</td>
<td>1.00</td>
<td>47%</td>
</tr>
<tr>
<td>EWMA, optimal $\lambda$</td>
<td>Parkinson range</td>
<td>4.3%</td>
<td>1.06</td>
<td>48%</td>
</tr>
<tr>
<td><strong>GARCH(1,1)</strong></td>
<td><strong>Parkinson range</strong></td>
<td><strong>2.7%</strong></td>
<td><strong>1.18</strong></td>
<td><strong>47%</strong></td>
</tr>
<tr>
<td>Random walk</td>
<td>Jump-adjusted Parkinson range</td>
<td>4.9%</td>
<td>0.70</td>
<td>45%</td>
</tr>
<tr>
<td>SMA, optimal $k \in \left[ 1, 5, 10, 15, 20 \right]$ days</td>
<td>Jump-adjusted Parkinson range</td>
<td>5.1%</td>
<td>0.71</td>
<td>47%</td>
</tr>
<tr>
<td>EWMA, optimal $\lambda$</td>
<td>Jump-adjusted Parkinson range</td>
<td>4.0%</td>
<td>0.76</td>
<td>45%</td>
</tr>
<tr>
<td><strong>GARCH(1,1)</strong></td>
<td><strong>Jump-adjusted Parkinson range</strong></td>
<td><strong>-1.0%</strong></td>
<td><strong>1.00</strong></td>
<td><strong>45%</strong></td>
</tr>
</tbody>
</table>
<p>From these, it is possible to conclude the following:</p>
<ul>
<li>The two GARCH(1,1) models using improved variance proxies produce volatility forecasts with better $R^2$ than the GARCH(1,1) model using squared returns (lines #8 and #12 vs. line #4), which is in agreement with Molnar<sup id="fnref:9:5" role="doc-noteref"><a href="#fn:9" class="footnote">8</a></sup></li>
<li>The two GARCH(1,1) models using variance proxies that integrate close prices produce nearly unbiased forecasts (lines #4 and #12), which, together with their relatively high $R^2$, makes them the volatility forecasting models to recommend in these cases</li>
<li>The GARCH(1,1) model using the Parkinson range as variance proxy produces the most biased forecasts (line #8), which makes it a volatility forecasting model to avoid in this case</li>
</ul>
<h2 id="conclusion">Conclusion</h2>
<p>The GARCH(1,1) volatility forecasting model exhibits good practical performance for a wide range of assets, as empirically demonstrated in the previous section.</p>
<p>Nevertheless, because it is <em>unable to describe certain aspects often found in financial data</em><sup id="fnref:2:1" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup>, many variations have been proposed in the literature<sup id="fnref:2:2" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup> (AGARCH, EGARCH, QGARCH, TGARCH…).</p>
<p>Next in this series, I will detail such a variation - very recent<sup id="fnref:24" role="doc-noteref"><a href="#fn:24" class="footnote">21</a></sup> - whose main characteristic is its capability to adapt to time-varying GARCH parameters.</p>
<p>Meanwhile, feel free to <a href="https://www.linkedin.com/in/roman-rubsamen/">connect with me on LinkedIn</a> or to <a href="https://twitter.com/portfoliooptim">follow me on Twitter</a>.</p>
<p>–</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:3" role="doc-endnote">
<p>See <a href="https://www.pm-research.com/content/iijderiv/4/3/63">Boudoukh, J., Richardson, M., & Whitelaw, R.F. (1997). Investigation of a class of volatility estimators, Journal of Derivatives, 4 Spring, 63-71</a>. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:3:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>Or more generally, of a weighted moving average of one of its past variance proxies. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>See <a href="https://math.berkeley.edu/~btw/thesis4.pdf">Brandon Williams, GARCH(1,1) models, B. Sc. Thesis, 15. Juli 2011</a>. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:2:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:2:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a></p>
</li>
<li id="fn:1" role="doc-endnote">
<p>See <a href="">Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3), 307–327</a>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:1:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>See <a href="https://www.sciencedirect.com/science/article/abs/pii/S030440761000076X">Andrew J. Patton, Volatility forecast comparison using imperfect volatility proxies, Journal of Econometrics, Volume 160, Issue 1, 2011, Pages 246-256</a>. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:6:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:8" role="doc-endnote">
<p>See <a href="https://www.jstor.org/stable/1391681">Daniel B. Nelson and Charles Q. Cao, Inequality Constraints in the Univariate GARCH Model, Journal of Business & Economic Statistics, Vol. 10, No. 2 (Apr., 1992), pp. 229-235</a>. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:13" role="doc-endnote">
<p>See <a href="https://www.researchgate.net/publication/237530561_VARIANCE_INITIALISATION_IN_GARCH_ESTIMATION">Pelagatti, M., Lisi, F. (2009). Variance initialisation in GARCH estimation. In Paganoni, A.M., Sangalli, L.M., Secchi, P., Vantini, S. (eds.), S.Co. 2009 Sixth Conference Complex Data Modeling and Computationally Intensive Statistical Methods for Estimation and Prediction, Maggioli Editore, Milan</a>. <a href="#fnref:13" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:9" role="doc-endnote">
<p>See <a href="http://dx.doi.org/10.1080/00036846.2016.1170929">Peter Molnar (2016): High-low range in GARCH models of stock return volatility, Applied Economics</a>. <a href="#fnref:9" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:9:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:9:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:9:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a> <a href="#fnref:9:4" class="reversefootnote" role="doc-backlink">↩<sup>5</sup></a> <a href="#fnref:9:5" class="reversefootnote" role="doc-backlink">↩<sup>6</sup></a></p>
</li>
<li id="fn:11" role="doc-endnote">
<p>Also called the asset long-term variance. <a href="#fnref:11" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:12" role="doc-endnote">
<p>More precisely, a convex combination. <a href="#fnref:12" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:14" role="doc-endnote">
<p>Which is not surprising since <em>in fact, exponential smoothing is a constrained version of GARCH (1,1)</em><sup id="fnref:3:2" role="doc-noteref"><a href="#fn:3" class="footnote">1</a></sup>, without mean-reversion. <a href="#fnref:14" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:16" role="doc-endnote">
<p>See <a href="https://centaur.reading.ac.uk/21316/">Brooks, Chris and Persand, Gitanjali (2003) Volatility forecasting for risk management. Journal of Forecasting, 22(1). pp. 1-22</a>. <a href="#fnref:16" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:16:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:18" role="doc-endnote">
<p>In which case, the Gaussian MLE is usually considered as a <a href="https://en.wikipedia.org/wiki/Quasi-maximum_likelihood_estimate">quasi-maximum likelihood estimate</a>. <a href="#fnref:18" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:15" role="doc-endnote">
<p>See <a href="https://link.springer.com/chapter/10.1007/978-1-4615-4389-3_8">Zumbach, G. (2000). The Pitfalls in Fitting Garch(1,1) Processes. In: Dunis, C.L. (eds) Advances in Quantitative Asset Management. Studies in Computational Finance, vol 1. Springer, Boston, MA</a>. <a href="#fnref:15" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:15:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:15:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a></p>
</li>
<li id="fn:17" role="doc-endnote">
<p>See <a href="https://www.jstor.org/stable/4093228">Dennis Kristensen and Oliver Linton, A Closed-Form Estimator for the GARCH(1,1) Model, Econometric Theory, Vol. 22, No. 2 (Apr., 2006), pp. 323-337</a>. <a href="#fnref:17" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:19" role="doc-endnote">
<p>See for example <a href="https://ntguardian.wordpress.com/2017/11/02/problems-estimating-garch-parameters-r/">here</a> and <a href="https://ntguardian.wordpress.com/2019/01/28/problems-estimating-garch-parameters-r-part-2-rugarch/">there</a>. <a href="#fnref:19" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:21" role="doc-endnote">
<p>See <a href="https://econpapers.repec.org/bookchap/nbrnberch/1214.htm">Mincer, J. and V. Zarnowitz (1969). The evaluation of economic forecasts. In J. Mincer (Ed.), Economic Forecasts and Expectations</a>. <a href="#fnref:21" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:20" role="doc-endnote">
<p>These ETFs are used in the <em>Adaptive Asset Allocation</em> strategy from <a href="https://investresolve.com/">ReSolve Asset Management</a>, described in the paper <em>Adaptive Asset Allocation: A Primer</em><sup id="fnref:25" role="doc-noteref"><a href="#fn:25" class="footnote">22</a></sup>. <a href="#fnref:20" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:22" role="doc-endnote">
<p>The price histories of all the ETFs end on 31 August 2023, but there is no common starting date, as the ETFs started trading on different dates. <a href="#fnref:22" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:23" role="doc-endnote">
<p>For all models, I used an expanding window for the volatility forecast computation. <a href="#fnref:23" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:24" role="doc-endnote">
<p>At the date of publication of this blog post. <a href="#fnref:24" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:25" role="doc-endnote">
<p>See <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2328254">Butler, Adam and Philbrick, Mike and Gordillo, Rodrigo and Varadi, David, Adaptive Asset Allocation: A Primer</a>. <a href="#fnref:25" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Roman R.In the previous post of this series on volatility forecasting, I described the simple and the exponentially weighted moving average volatility forecasting models. In particular, I showed that these two models belong to the generic family of weighted moving average volatility forecasting models1, whose members represent the volatility of an asset as a weighted moving average of its past squared returns2. Another member of this family is the Generalized AutoRegressive Conditional Heteroscedasticity (GARCH) model, widely used in financial time series modelling and implemented in most statistics and econometric software packages3. In this blog post, I will detail the simplest but often very useful4 GARCH(1,1) volatility forecasting model and I will illustrate its practical performances in the context of monthly volatility forecasting for various ETFs. Mathematical preliminaries (reminders) This section contains reminders from a previous blog post. Volatility modelling and volatility proxies Let $r_t$ be the (logarithmic) return of an asset over a time period $t$ (a day, a week, a month..). Then: The asset (conditional) variance is defined as $ \sigma_t^2 = \mathbb{E} \left[ r_t^2 \right] $ From this definition, the squared return $r_t^2$ of an asset is a (noisy5) variance estimator - or variance proxy5 - for that asset variance over the considered time period. Another example of an asset variance proxy is the Parkinson range of an asset. The generic notation for an asset variance proxy in this blog post is $\tilde{\sigma}_t^2$. The asset (conditional) volatility is defined as $ \sigma_t = \sqrt { \sigma_t^2 } $ The generic notation for an asset volatility proxy in this blog post is $\tilde{\sigma}_t$. 
Weighted moving average volatility forecasting model Boudoukh et al.1 shows that many seemingly different methods of volatility forecasting actually share the same underlying representation of the estimate of an asset next period’s variance $\hat{\sigma}_{T+1}^2$ as a weighted moving average of that asset past periods’ variance proxies $\tilde{\sigma}^2_t$, $t=1..T$, with \[\hat{\sigma}_{T+1}^2 = w_0 + \sum_{i=1}^{k} w_i \tilde{\sigma}^2_{T+1-i}\] , where: $1 \leq k \leq T$ is the size of the moving average, possibly time-dependent $w_i, i=0..k$ are the weights of the moving average, possibly time-dependent as well GARCH(1,1) volatility forecasting model The GARCH(p,q) model Definition Bollerslev4’s GARCH model is a generalization of Engle’s ARCH econometric model which captures the time-varying nature of the (conditional) variance of certain time series like asset returns. Under a GARCH(p,q) model, an asset next period’s conditional variance $\sigma_{T+1}^2$ is modeled as recursive linear function of its own $p$ lagged conditional variances $\sigma_{T}^2, \sigma_{T-1}^2…$ and of its $q$ lagged squared returns $r_{T}^2, r_{T-1}^2…$, which leads to the formula \[\hat{\sigma}_{T+1}^2 = \omega + \sum_{i=1}^p \beta_i \hat{\sigma}_{T+1-i}^2+ \sum_{j=1}^q \alpha_j r_{T+1-i}^2\] , where: The parameters $\omega$, $\alpha_j$, $j=1..q$ and $\beta_i$, $i=1..p$ are non-negative and subject to various inequality constraints depending on working assumptions6 The initial conditional variance $\hat{\sigma}_1^2$ is usually taken equal to $r_1^2$, but c.f. Pelagatti and Lisi7 for a thorough discussion about this subject Squared returns v.s. generic variance proxy Molnar8 notes that in GARCH type of models, demeaned squared returns serve as a way to calculate innovations to the volatility8 so that replacing the squared returns by more precise volatility estimates will produce better GARCH models, regarding both in-sample fit and out-of-sample forecasting performance8. 
Molnar8 then proposes to modify the GARCH(p,q) model for the estimation of an asset next period’s conditional variance $\sigma_{T+1}^2$ as follows \[\hat{\sigma}_{T+1}^2 = \omega + \sum_{i=1}^p \beta_i \hat{\sigma}_{T+1-i}^2+ \sum_{j=1}^q \alpha_j \tilde{\sigma}_{T+1-i}^2\] , where $\tilde{\sigma}^2_t$, $t=1..T$ are the asset past periods’ variance proxies. To be noted that replacing squared returns by less noisy variance proxies is already discussed at length in the previous blog post in the case of the simple and the exponentially weighted moving average volatility forecasting models. The GARCH(1,1) model Definition Because the GARCH(1,1) model works surprisingly well in comparison with much more complex [GARCH] models8, it is usually the main GARCH model used in practice. Under this model, the generic GARCH formula for the estimate of an asset next period’s conditional variance can be re-parametrized as follows \[\hat{\sigma}_{T+1}^2 = \gamma \tilde{\sigma}^2 + \alpha \tilde{\sigma}^2_{T} + \beta \hat{\sigma}_{T}^2\] , where: $\alpha$, $\beta$ and $\gamma$ are positive parameters summing to one $\tilde{\sigma}^2$ is a strictly positive parameter, corresponding to the asset unconditional variance9 The GARCH(1,1) model thus estimates an asset next period’s conditional variance $\hat{\sigma}_{T+1}^2$ as a weighted average10 of three different variance estimators: A long-term variance estimator $\tilde{\sigma}^2$ A short-term variance estimator $\tilde{\sigma}^2_{T}$ The current GARCH(1,1) variance estimator $\hat{\sigma}_{T}^2$ and the weights $\alpha$, $\beta$ and $\gamma$ determine the speed with which the model adapts to short-term variance v.s. reverts to its long-term variance. 
Relationship with the generic weighted moving average model By developing the recursive definition of the GARCH(1,1) model, it is possible to see that it is a specific kind of weighted moving average volatility forecasting model, with: $k = T$ $w_0 = \gamma \sum_{k=0}^{T-1} \beta^k$ $w_1 = \alpha$, $w_2 = \alpha \beta$, …, $w_{T-1} = \alpha \beta^{T-2}$, $w_T = \alpha \beta^{T-1}$, that is, exponentially decreasing weights emphasizing recent past variance proxies v.s. more distant ones in the model, exactly like in the exponentially weighted moving average volatility forecasting model11 Volatility forecasting formulas Under a GARCH(1,1) volatility forecasting model, the generic weighted moving average volatility forecasting formula becomes: To estimate an asset next period’s volatility: \[\hat{\sigma}_{T+1} = \sqrt{ \gamma \tilde{\sigma}^2 + \alpha \tilde{\sigma}^2_{T} + \beta \hat{\sigma}_{T}^2 }\] To estimate an asset next $h$-period’s ahead volatility12, $h \geq 2$: \[\hat{\sigma}_{T+h} = \sqrt{ \tilde{\sigma}^2 + \left( \alpha + \beta \right)^{h-1} \left( \hat{\sigma}_{T+1} - \tilde{\sigma}^2 \right) }\] To estimate an asset aggregated volatility12 over the next $h$ periods: \[\hat{\sigma}_{T+1:T+h} = \sqrt{h} \hat{\sigma}_{T+1}\] How to determine the parameters of a GARCH(1,1) model? The parameters of a GARCH(1,1) model - either $\omega$, $\alpha$ and $\beta$ or $\alpha$, $\beta$, $\gamma$ and $\tilde{\sigma}^2$ - are typically determined by maximum likelihood estimation (MLE) with a Gaussian13 or Student’s $t$ assumption for the distribution of the innovations. A note of caution, though. There are plenty of software packages able to do this estimation, but the underlying optimization problem has been documented to be numerically difficult and prone to error14 due to a one dimensional manifold in the parameter space where the likelihood function is large and almost constant14, which tends to “trap” numerical algorithms. 
<p>Possible remediations have been suggested in Zumbach<sup>14</sup> and in Kristensen and Linton<sup>15</sup>, like reformulating the optimization problem in an alternative parameter space or using a closed-form estimator for the GARCH(1,1) parameters that does not rely on any numerical optimization procedure, but unfortunately, these remediations are not sufficient due to the problematic<sup>16</sup> finite sample behavior of the maximum likelihood estimates…</p>
<h2 id="implementation-in-portfolio-optimizer">Implementation in Portfolio Optimizer</h2>
<p>Portfolio Optimizer implements the GARCH(1,1) volatility forecasting model through the endpoint <code class="language-plaintext highlighter-rouge">/assets/volatility/forecast/garch</code>.</p>
<p>This endpoint supports the 4 variance proxies below:</p>
<ul>
<li>Squared close-to-close returns</li>
<li>Demeaned squared close-to-close returns</li>
<li>The Parkinson range</li>
<li>The jump-adjusted Parkinson range</li>
</ul>
<p>Internally, this endpoint:</p>
<ul>
<li>Assumes that the asset’s unconditional variance $\tilde{\sigma}^2$ is equal to its long-term average value $\frac{1}{T} \sum_{t=1}^{T} \tilde{\sigma}^2_t$</li>
<li>Automatically determines the optimal value of the GARCH(1,1) parameters $\alpha$, $\beta$ and $\gamma$ using a proprietary numerical optimization procedure</li>
</ul>
<h2 id="example-of-usage---volatility-forecasting-at-monthly-level-for-various-etfs">Example of usage - Volatility forecasting at monthly level for various ETFs</h2>
<p>As an example of usage, I propose to enrich the results of the previous blog post, in which monthly forecasts produced by different volatility models are compared - using Mincer-Zarnowitz<sup>17</sup> regressions - to the next month’s close-to-close observed volatility for 10 ETFs representative<sup>18</sup> of misc. asset classes:</p>
<ul>
<li>U.S. stocks (SPY ETF)</li>
<li>European stocks (EZU ETF)</li>
<li>Japanese stocks (EWJ ETF)</li>
<li>Emerging markets stocks (EEM ETF)</li>
<li>U.S. REITs (VNQ ETF)</li>
<li>International REITs (RWX ETF)</li>
<li>U.S. 7-10 year Treasuries (IEF ETF)</li>
<li>U.S. 20+ year Treasuries (TLT ETF)</li>
<li>Commodities (DBC ETF)</li>
<li>Gold (GLD ETF)</li>
</ul>
<p>Averaged results for all ETFs/regression models over each ETF price history<sup>19</sup> are the following<sup>20</sup>:</p>
<table>
<thead>
<tr><th>Volatility model</th><th>Variance proxy</th><th>$\bar{\alpha}$</th><th>$\bar{\beta}$</th><th>$\bar{R^2}$</th></tr>
</thead>
<tbody>
<tr><td>Random walk</td><td>Squared close-to-close returns</td><td>5.8%</td><td>0.66</td><td>44%</td></tr>
<tr><td>SMA, optimal $k \in \left[ 1, 5, 10, 15, 20 \right]$ days</td><td>Squared close-to-close returns</td><td>5.8%</td><td>0.68</td><td>46%</td></tr>
<tr><td>EWMA, optimal $\lambda$</td><td>Squared close-to-close returns</td><td>4.7%</td><td>0.73</td><td>45%</td></tr>
<tr><td>GARCH(1,1)</td><td>Squared close-to-close returns</td><td>-1.3%</td><td>0.98</td><td>43%</td></tr>
<tr><td>Random walk</td><td>Parkinson range</td><td>5.6%</td><td>0.94</td><td>44%</td></tr>
<tr><td>SMA, optimal $k \in \left[ 1, 5, 10, 15, 20 \right]$ days</td><td>Parkinson range</td><td>5.1%</td><td>1.00</td><td>47%</td></tr>
<tr><td>EWMA, optimal $\lambda$</td><td>Parkinson range</td><td>4.3%</td><td>1.06</td><td>48%</td></tr>
<tr><td>GARCH(1,1)</td><td>Parkinson range</td><td>2.7%</td><td>1.18</td><td>47%</td></tr>
<tr><td>Random walk</td><td>Jump-adjusted Parkinson range</td><td>4.9%</td><td>0.70</td><td>45%</td></tr>
<tr><td>SMA, optimal $k \in \left[ 1, 5, 10, 15, 20 \right]$ days</td><td>Jump-adjusted Parkinson range</td><td>5.1%</td><td>0.71</td><td>47%</td></tr>
<tr><td>EWMA, optimal $\lambda$</td><td>Jump-adjusted Parkinson range</td><td>4.0%</td><td>0.76</td><td>45%</td></tr>
<tr><td>GARCH(1,1)</td><td>Jump-adjusted Parkinson range</td><td>-1.0%</td><td>1.00</td><td>45%</td></tr>
</tbody>
</table>
<p>From these, it is possible to conclude the following:</p>
<ul>
<li>The two GARCH(1,1) models using improved variance proxies produce volatility forecasts with better r-squared than the GARCH(1,1) model using squared returns (lines #8 and #12 v.s. line #4), which is in agreement with Molnar<sup>8</sup></li>
<li>The two GARCH(1,1) models using variance proxies that integrate close prices produce nearly unbiased forecasts (lines #4 and #12), which, together with their relatively high r-squared, makes them volatility forecasting models to recommend in these cases</li>
<li>The GARCH(1,1) model using the Parkinson range as variance proxy produces the most biased forecasts (line #8), which makes it a volatility forecasting model to avoid in this case</li>
</ul>
<h2 id="conclusion">Conclusion</h2>
<p>The GARCH(1,1) volatility forecasting model exhibits good practical performances for a wide range of assets, as empirically demonstrated in the previous section.</p>
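<p>For reference, a Mincer-Zarnowitz regression simply regresses realized volatility on forecast volatility, $\sigma_{t} = \alpha + \beta \hat{\sigma}_{t} + \epsilon_t$, an unbiased forecast corresponding to $\alpha \approx 0$ and $\beta \approx 1$. A minimal OLS sketch (the data in the usage example is purely illustrative):</p>

```python
import numpy as np


def mincer_zarnowitz(realized, forecast):
    """OLS regression realized = alpha + beta * forecast + eps.
    Returns (alpha, beta, r_squared); an unbiased forecast has
    alpha ~ 0 and beta ~ 1."""
    X = np.column_stack([np.ones_like(forecast), forecast])
    coefs, _, _, _ = np.linalg.lstsq(X, realized, rcond=None)
    residuals = realized - X @ coefs
    r_squared = 1.0 - residuals.var() / realized.var()
    return coefs[0], coefs[1], r_squared


# Illustrative usage with made-up monthly volatilities
forecast = np.array([0.10, 0.15, 0.22, 0.18, 0.30])
realized = np.array([0.12, 0.14, 0.25, 0.16, 0.28])
alpha, beta, r2 = mincer_zarnowitz(realized, forecast)
```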
<p>Nevertheless, because it is unable to describe certain aspects often found in financial data<sup>3</sup>, many variations have been proposed in the literature<sup>3</sup> (AGARCH, EGARCH, QGARCH, TGARCH…).</p>
<p>Next in this series, I will detail one such variation - very recent<sup>21</sup> - whose main characteristic is its capability to adapt to time-varying GARCH parameters.</p>
<p>Meanwhile, feel free to connect with me on LinkedIn or to follow me on Twitter.</p>
<ol>
<li>See Boudoukh, J., Richardson, M., &amp; Whitelaw, R.F. (1997). Investigation of a class of volatility estimators. Journal of Derivatives, 4(Spring), 63-71.</li>
<li>Or more generally, of a weighted moving average of one of its past variance proxies.</li>
<li>See Brandon Williams, GARCH(1,1) models, B.Sc. Thesis, July 15, 2011.</li>
<li>See Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3), 307–327.</li>
<li>See Andrew J. Patton, Volatility forecast comparison using imperfect volatility proxies. Journal of Econometrics, 160(1), 2011, 246-256.</li>
<li>See Daniel B. Nelson and Charles Q. Cao, Inequality Constraints in the Univariate GARCH Model. Journal of Business &amp; Economic Statistics, 10(2), 1992, 229-235.</li>
<li>See Pelagatti, M., Lisi, F. (2009). Variance initialisation in GARCH estimation. In Paganoni, A.M., Sangalli, L.M., Secchi, P., Vantini, S. (eds.), S.Co. 2009 Sixth Conference Complex Data Modeling and Computationally Intensive Statistical Methods for Estimation and Prediction, Maggioli Editore, Milan.</li>
<li>See Peter Molnar (2016). High-low range in GARCH models of stock return volatility. Applied Economics.</li>
<li>Also called the asset’s long-term variance.</li>
<li>More precisely, a convex combination.</li>
<li>Which is not surprising since in fact, exponential smoothing is a constrained version of GARCH(1,1)<sup>1</sup>, without mean-reversion.</li>
<li>See Brooks, Chris and Persand, Gitanjali (2003). Volatility forecasting for risk management. Journal of Forecasting, 22(1), 1-22.</li>
<li>In which case, the Gaussian MLE is usually considered as a quasi-maximum likelihood estimate.</li>
<li>See Zumbach, G. (2000). The Pitfalls in Fitting Garch(1,1) Processes. In: Dunis, C.L. (eds.), Advances in Quantitative Asset Management. Studies in Computational Finance, vol 1. Springer, Boston, MA.</li>
<li>See Dennis Kristensen and Oliver Linton, A Closed-Form Estimator for the GARCH(1,1) Model. Econometric Theory, 22(2), 2006, 323-337.</li>
<li>See for example here and there.</li>
<li>See Mincer, J. and V. Zarnowitz (1969). The evaluation of economic forecasts. In J. Mincer (ed.), Economic Forecasts and Expectations.</li>
<li>These ETFs are used in the Adaptive Asset Allocation strategy from ReSolve Asset Management, described in the paper Adaptive Asset Allocation: A Primer<sup>22</sup>.</li>
<li>The common ending price history of all the ETFs is 31 August 2023, but there is no common starting price history, as all ETFs started trading on different dates.</li>
<li>For all models, I used an expanding window for the volatility forecast computation.</li>
<li>At the date of publication of this blog post.</li>
<li>See Butler, Adam and Philbrick, Mike and Gordillo, Rodrigo and Varadi, David, Adaptive Asset Allocation: A Primer.</li>
</ol>
Random Portfolio Benchmarking: Simulation-based Performance Evaluation in Finance2024-02-06T00:00:00-06:002024-02-06T00:00:00-06:00https://portfoliooptimizer.io/blog/random-portfolio-benchmarking-simulation-based-performance-evaluation-in-finance<p>As noted in Surz<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote">1</a></sup>, <em>the question “Is [a mutual fund’s]<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote">2</a></sup> performance good?” can only be answered relative to something</em><sup id="fnref:4:1" role="doc-noteref"><a href="#fn:4" class="footnote">1</a></sup>, typically by comparing that fund to a benchmark like a financial index or to a peer group.</p>
<p>Unfortunately, these two methodologies are not without issues. For example, it is very difficult to create an index <em>captur[ing] the essence of the people, process, and philosophy behind an investment product</em><sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup> and peer groups have well-known biases<sup id="fnref:4:2" role="doc-noteref"><a href="#fn:4" class="footnote">1</a></sup>
(classification bias, survivorship bias…).</p>
<p>In a series of papers (Surz<sup id="fnref:2:1" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup>, Surz<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote">4</a></sup>, Surz<sup id="fnref:4:3" role="doc-noteref"><a href="#fn:4" class="footnote">1</a></sup>…), Ronald J. Surz proposes an <em>innovation that combines the better aspects of both [methodologies] while eliminating their undesirable properties</em><sup id="fnref:2:2" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup>. This innovation
consists in evaluating a fund against the fund manager’s <em>true opportunity set</em><sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote">5</a></sup>, defined as the set of <em>all of the possible portfolios that the manager could have conceivably
held following his unique investment process</em><sup id="fnref:3:1" role="doc-noteref"><a href="#fn:3" class="footnote">4</a></sup>.</p>
<p>In practice, the fund manager’s opportunity set is approximated by the simulation of thousands of random portfolios in the same universe of assets as the one of the fund manager and satisfying the same constraints (long-only,
long-short…) and rules (portfolio rebalancing rules…) as those of the fund manager. Then, because these portfolios do not exhibit any particular skill<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote">6</a></sup>, they can be <em>used as the control group to test [the]
fund manager skill</em><sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote">5</a></sup>, thus allowing to apply <em>modern statistics to the problem of performance evaluation</em><sup id="fnref:3:2" role="doc-noteref"><a href="#fn:3" class="footnote">4</a></sup>.</p>
<p>In this blog post, I will describe how to generate random portfolios, detail Surz’s original methodology as well as some of its variations and illustrate the usage of random portfolios with a couple of examples like
the creation of synthetic benchmarks or the evaluation and monitoring of trading strategies.</p>
<h2 id="mathematical-preliminaries">Mathematical preliminaries</h2>
<h3 id="definition">Definition</h3>
<p>Let be:</p>
<ul>
<li>$n$ the number of assets in a universe of assets</li>
<li>$\mathcal{C} \subset \mathbb{R}^{n}$ a subset of $ \mathbb{R}^{n}$ representing the constraints imposed<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote">7</a></sup> on a fund manager when investing in that universe of assets, for example:
<ul>
<li>Short sale constraints</li>
<li>Concentration constraints (assets, sectors, industries…)</li>
<li>Leverage constraints</li>
<li>Cardinality constraints (i.e., minimum or maximum number of assets constraints)</li>
<li>Portfolio volatility constraints</li>
<li>Portfolio tracking error constraints</li>
<li>…</li>
</ul>
</li>
</ul>
<p>Then, a <em>(constrained) random portfolio</em> in that universe of assets is a vector of portfolio weights $w \in \mathbb{R}^{n}$ generated at random over the set $\mathcal{C}$.</p>
<h3 id="generation-at-random-vs-generation-uniformly-at-random">Generation at random v.s. generation uniformly at random</h3>
<p>In the context of performance evaluation, as in Surz’s methodology, it is theoretically preferable that random portfolios are generated <em>uniformly</em> at random over the set $\mathcal{C}$, so as not to introduce any biases<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote">8</a></sup>
which would otherwise defeat the purpose of using these portfolios as an unbiased control group<sup id="fnref:2:3" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup>.</p>
<p>Figure 1 and Figure 2 illustrate the difference between random portfolios generated at random v.s. uniformly at random in the case of a three-asset universe subject to long-only
and full investment constraints<sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote">9</a></sup>:</p>
<ul>
<li>In Figure 1, the random portfolios are visibly concentrated in the middle of the standard 2-simplex in $\mathbb{R}^3$<sup id="fnref:11:1" role="doc-noteref"><a href="#fn:11" class="footnote">9</a></sup> - these random portfolios are NOT generated uniformly at random</li>
<li>In Figure 2, the random portfolios seem to be “well spread” over the standard 2-simplex in $\mathbb{R}^3$<sup id="fnref:11:2" role="doc-noteref"><a href="#fn:11" class="footnote">9</a></sup> - these random portfolios are generated uniformly at random</li>
</ul>
<figure>
<a href="/assets/images/blog/random-portfolio-benchmarking-non-uniform-random-generation-shaw.png"><img src="/assets/images/blog/random-portfolio-benchmarking-non-uniform-random-generation-shaw.png" alt="Random portfolios not generated uniformly at random over a standard simplex, three-asset universe. Source: Shaw." /></a>
<figcaption>Figure 1. Random portfolios not generated uniformly at random over a standard simplex, three-asset universe. Source: Shaw.</figcaption>
</figure>
<figure>
<a href="/assets/images/blog/random-portfolio-benchmarking-uniform-random-generation-shaw.png"><img src="/assets/images/blog/random-portfolio-benchmarking-uniform-random-generation-shaw.png" alt="Random portfolios generated uniformly at random over a standard simplex, three-asset universe. Source: Shaw." /></a>
<figcaption>Figure 2. Random portfolios generated uniformly at random over a standard simplex, three-asset universe. Source: Shaw.</figcaption>
</figure>
<p>One important remark, though, is that real-life portfolios are usually binding on at least one of their constraints, so that generating random portfolios biased toward the boundary of the geometrical
object associated to the constraints set $\mathcal{C}$ might not be a real problem in practice<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote">10</a></sup>.</p>
<h3 id="generation-of-random-portfolios-over-the-standard-simplex">Generation of random portfolios over the standard simplex</h3>
<p>When the constraints imposed on a portfolio are 1) a full investment constraint and 2) a long-only constraint, the subset $\mathcal{C}$ is then equal to</p>

\[\mathcal{C} = \{w \in \mathbb{R}^{n} \textrm{ s.t. } \sum_{i=1}^n w_i = 1, w_i \geq 0, i = 1..n\}\]
<p>and the geometrical object associated to that subset is called<sup id="fnref:12" role="doc-noteref"><a href="#fn:12" class="footnote">11</a></sup> a <a href="https://en.wikipedia.org/wiki/Simplex#Standard_simplex">standard simplex</a>, already illustrated in Figure 1 and Figure 2 with $n = 3$.</p>
<p>Several algorithms exist to generate points uniformly at random over a standard simplex<sup id="fnref:13" role="doc-noteref"><a href="#fn:13" class="footnote">12</a></sup>, among which:</p>
<ul>
<li>An algorithm based on differences of sorted <a href="https://en.wikipedia.org/wiki/Continuous_uniform_distribution">uniform random variables</a></li>
<li>An algorithm based on normalized <a href="https://en.wikipedia.org/wiki/Exponential_distribution">unit exponential random variables</a></li>
</ul>
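<p>The second algorithm above - normalizing a vector of unit exponential random variables - can be sketched in a few lines; this is one possible implementation, not necessarily the one used by any of the packages discussed in this post:</p>

```python
import numpy as np


def random_portfolios_standard_simplex(n_assets, n_portfolios, rng=None):
    """Portfolio weights generated uniformly at random over the standard simplex
    {w >= 0, sum(w) = 1}, via normalized unit exponential random variables."""
    rng = np.random.default_rng(rng)
    e = rng.exponential(scale=1.0, size=(n_portfolios, n_assets))
    return e / e.sum(axis=1, keepdims=True)


# e.g. 1000 random long-only, fully invested portfolios over a 3-asset universe
weights = random_portfolios_standard_simplex(3, 1000, rng=42)
```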
<h3 id="generation-of-random-portfolios-over-the-restricted-standard-simplex">Generation of random portfolios over the restricted standard simplex</h3>
<p>When the constraints imposed on a portfolio are 1) a full investment constraint, 2) a long-only constraint and 3) minimum/maximum asset weights constraints, the subset $\mathcal{C}$ is then equal to</p>

\[\mathcal{C} = \{w \in \mathbb{R}^{n} \textrm{ s.t. } \sum_{i=1}^n w_i = 1, 1 \geq u_i \geq w_i \geq l_i \geq 0, i = 1..n\}\]
<p>, where:</p>
<ul>
<li>$l_1, …, l_n$ represent minimum asset weights constraints</li>
<li>$u_1, …, u_n$ represent maximum asset weights constraints</li>
</ul>
<p>The geometrical object associated to that subset is a “restricted” standard simplex, as illustrated in Figure 3, with:</p>
<ul>
<li>$n = 3$</li>
<li>$ 0.7 \geq w_1 \geq 0.1$</li>
<li>$ 0.8 \geq w_2 \geq 0 $</li>
<li>$ 0.6 \geq w_3 \geq 0.1 $</li>
</ul>
<figure>
<a href="/assets/images/blog/random-portfolio-benchmarking-restricted-simplex-borkowski-piepel.png"><img src="/assets/images/blog/random-portfolio-benchmarking-restricted-simplex-borkowski-piepel.png" alt="Example of restricted standard simplex, three-asset universe. Source: Piepel." /></a>
<figcaption>Figure 3. Example of restricted standard simplex, three-asset universe. Source: Piepel.</figcaption>
</figure>
<p>Generating points uniformly at random over a restricted standard simplex is much more complex than generating points uniformly at random over a standard simplex.</p>
<p>Fortunately, this problem has been studied at least since the 1990’s by people working in the statistical domain of the <a href="https://en.wikipedia.org/wiki/Design_of_experiments">design of experiments</a>,
and an algorithm based on the <a href="https://en.wikipedia.org/wiki/Conditional_probability_distribution">conditional distribution method</a> has been published<sup id="fnref:15" role="doc-noteref"><a href="#fn:15" class="footnote">13</a></sup> in 2000.</p>
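<p>A full implementation of the conditional distribution method is beyond the scope of this post; when the bounds are loose, though, simple rejection sampling - draw uniformly over the standard simplex and keep only the draws satisfying the bounds - also produces exactly uniform samples, at the cost of potentially many rejections. A sketch of that naive alternative:</p>

```python
import numpy as np


def random_portfolios_restricted_simplex(l, u, n_portfolios, rng=None,
                                         max_tries=100_000):
    """Uniform weights over {sum(w) = 1, l <= w <= u} by rejection sampling.
    Exactly uniform, but inefficient when the bounds are tight; the
    conditional distribution method scales much better in that case."""
    rng = np.random.default_rng(rng)
    l, u = np.asarray(l, dtype=float), np.asarray(u, dtype=float)
    accepted = []
    for _ in range(max_tries):
        e = rng.exponential(size=l.size)
        w = e / e.sum()  # uniform over the standard simplex
        if np.all(w >= l) and np.all(w <= u):  # keep only in-bounds draws
            accepted.append(w)
            if len(accepted) == n_portfolios:
                break
    return np.array(accepted)


# Bounds from Figure 3: 0.1 <= w1 <= 0.7, 0 <= w2 <= 0.8, 0.1 <= w3 <= 0.6
weights = random_portfolios_restricted_simplex([0.1, 0.0, 0.1],
                                               [0.7, 0.8, 0.6], 100, rng=0)
```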
<h3 id="generation-of-random-portfolios-over-a-convex-polytope">Generation of random portfolios over a convex polytope</h3>
<p>When the constraints imposed on a portfolio consist in generic linear constraints, the subset $\mathcal{C}$ is then equal to</p>

\[\mathcal{C} = \{w \in \mathbb{R}^{n} \textrm{ s.t. } A_e w = b_e, A_i w \leq b_i \}\]
<p>, where:</p>
<ul>
<li>$A_e \in \mathbb{R}^{n_e \times n}$ and $b_e \in \mathbb{R}^{n_e}$, $n_e \geq 1$, represent linear equality constraints</li>
<li>$A_i \in \mathbb{R}^{n_i \times n}$ and $b_i \in \mathbb{R}^{n_i}$, $n_i \geq 1$, represent linear inequality constraints</li>
</ul>
<p>The geometrical object associated to that subset is called a <a href="https://en.wikipedia.org/wiki/Convex_polytope">convex polytope</a>, illustrated in Figure 4, with:</p>
<ul>
<li>$n = 3$</li>
<li>$\sum_{i=1}^n w_i = 1$</li>
<li>$1 \geq w_i \geq 0, i = 1..n$</li>
<li>$w_1 > w_2$</li>
<li>$2w_3 > w_2 > 0.5w_3$</li>
</ul>
<figure>
<a href="/assets/images/blog/random-portfolio-benchmarking-general-simplex-tervonen.png"><img src="/assets/images/blog/random-portfolio-benchmarking-general-simplex-tervonen.png" alt="Example of standard simplex with additional linear inequality constraints, three-asset universe. Source: Tervonen et al." /></a>
<figcaption>Figure 4. Example of standard simplex with additional linear inequality constraints, three-asset universe. Source: Tervonen et al.</figcaption>
</figure>
<p>As one can guess, generating points uniformly at random over a convex polytope is another level higher in terms of complexity, and while some algorithms exist to do so<sup id="fnref:17" role="doc-noteref"><a href="#fn:17" class="footnote">14</a></sup>, they are impractical in high dimension.</p>
<p>From the literature, what is possible to achieve instead is to generate points <em>asymptotically</em> uniformly at random, using <a href="https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo">Markov chain Monte Carlo (MCMC)</a> algorithms
like the Hit-And-Run algorithm<sup id="fnref:18" role="doc-noteref"><a href="#fn:18" class="footnote">15</a></sup>.</p>
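<p>The Hit-And-Run algorithm itself is short: from the current point, pick a random direction, compute the feasible segment along that direction, and jump to a uniformly random point on it. The sketch below handles inequality constraints only and assumes a bounded polytope and a known interior starting point; handling equality constraints such as $\sum_{i=1}^n w_i = 1$ additionally requires projecting the random directions onto their null space, which is omitted here:</p>

```python
import numpy as np


def hit_and_run(A, b, w0, n_samples, rng=None):
    """Asymptotically uniform samples over the bounded polytope {w : A w <= b},
    starting from a strictly interior point w0."""
    rng = np.random.default_rng(rng)
    w = np.asarray(w0, dtype=float)
    samples = []
    for _ in range(n_samples):
        d = rng.standard_normal(w.size)
        d /= np.linalg.norm(d)  # uniform random direction
        # Feasible segment {w + t d}: each row a_k requires a_k . (w + t d) <= b_k
        ad = A @ d
        slack = b - A @ w  # nonnegative for an interior point
        t_max = np.min(slack[ad > 1e-12] / ad[ad > 1e-12], initial=np.inf)
        t_min = np.max(slack[ad < -1e-12] / ad[ad < -1e-12], initial=-np.inf)
        w = w + rng.uniform(t_min, t_max) * d  # uniform point on the segment
        samples.append(w.copy())
    return np.array(samples)
```

In practice, a burn-in period and/or thinning of the chain would also be applied, since consecutive samples are correlated.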
<h3 id="generation-of-random-portfolios-over-a-generic-constraint-set">Generation of random portfolios over a generic constraint set</h3>
<p>When the constraints imposed on a portfolio are generic, that is, quadratic (volatility or tracking error constraints) and/or non-convex (threshold constraints) and/or integer (maximum number of assets constraints,
round lot constraints), the final boss has arrived.</p>
<p>With such constraints, the only reasonable<sup id="fnref:22" role="doc-noteref"><a href="#fn:22" class="footnote">16</a></sup> approach in the literature seems to be to recast the problem of generating random points over the constraint set $\mathcal{C}$ as a (penalized) optimization problem and use
<a href="https://en.wikipedia.org/wiki/Genetic_algorithm">genetic algorithms</a> to solve it<sup id="fnref:10:1" role="doc-noteref"><a href="#fn:10" class="footnote">10</a></sup>.</p>
<p>The underlying idea is to randomly<sup id="fnref:20" role="doc-noteref"><a href="#fn:20" class="footnote">17</a></sup> minimize an objective function $f$ essentially made of penalties - the higher the distance of a point $w \in \mathbb{R}^{n}$ from the constraint set $C$, the higher the value of the objective function $f(w)$ -,
so that at optimum, a random point satisfying all the constraints<sup id="fnref:19" role="doc-noteref"><a href="#fn:19" class="footnote">18</a></sup> is found.</p>
<p>Of course, the points generated this way are not generated uniformly at random, but when the constraints are fully generic, that requirement should probably be dropped altogether as mentioned in Dawson and Young<sup id="fnref:9:1" role="doc-noteref"><a href="#fn:9" class="footnote">8</a></sup>.</p>
<h2 id="investment-funds-performance-evaluation-with-random-portfolios">Investment fund’s performance evaluation with random portfolios</h2>
<h3 id="rationale">Rationale</h3>
<p>Evaluating the performance of an investment fund is actually a <a href="https://en.wikipedia.org/wiki/Statistical_hypothesis_testing">statistical hypothesis test</a> in disguise<sup id="fnref:23" role="doc-noteref"><a href="#fn:23" class="footnote">19</a></sup>, in which:</p>
<ul>
<li>The null hypothesis is <em>The investment fund exhibits no particular “performance” over the considered time period</em></li>
<li>The <a href="https://en.wikipedia.org/wiki/Test_statistic">test statistic</a> is a quantitative measure of “performance” over the considered time period (e.g. annualized return, holding period return, risk-adjusted return…)</li>
<li>The (empirical) distribution of the test statistic under the null hypothesis is computed from a sample made of either
<ul>
<li>One observation - the investment fund’s benchmark</li>
<li>A small number of observations - the investment fund’s peer group</li>
<li>Any desired number of observations - random portfolios generated from the investment fund’s universe of assets and obeying the fund’s constraints and rules</li>
</ul>
</li>
</ul>
<p>From this perspective, using a benchmark, a peer group or random portfolios for performance evaluation <em>is essentially a choice between sampling approaches</em><sup id="fnref:4:4" role="doc-noteref"><a href="#fn:4" class="footnote">1</a></sup>.</p>
<p>Still, random portfolios represent a more rigorous approach to performance evaluation than the two other alternatives, for various reasons highlighted in different papers<sup id="fnref:4:5" role="doc-noteref"><a href="#fn:4" class="footnote">1</a></sup><sup id="fnref:9:2" role="doc-noteref"><a href="#fn:9" class="footnote">8</a></sup><sup id="fnref:10:2" role="doc-noteref"><a href="#fn:10" class="footnote">10</a></sup><sup id="fnref:23:1" role="doc-noteref"><a href="#fn:23" class="footnote">19</a></sup> and summarized by Dawson<sup id="fnref:9:3" role="doc-noteref"><a href="#fn:9" class="footnote">8</a></sup> as follows</p>
<blockquote>
<p>A set of uniformly distributed, stochastically generated, portfolios that by construction incorporate no investment strategy, bias or skill form an effective control set for any portfolio measurement metric. This allows the true value of a strategy or model to be identified. They also provide a mechanism to differentiate between effects due to “market conditions” and effects due to either the management of a portfolio, or the constraints the management is obliged to work within.</p>
</blockquote>
<h3 id="surzs-methodology">Surz’s methodology</h3>
<p>Using random portfolios to evaluate an investment fund’s performance over a considered time period is a two-step <a href="https://en.wikipedia.org/wiki/Monte_Carlo_method">Monte Carlo</a> simulation process<sup id="fnref:30" role="doc-noteref"><a href="#fn:30" class="footnote">20</a></sup>:</p>
<ol>
<li>
<p>Modeling step</p>
<p>Identify the fund’s main characteristics.</p>
</li>
<li>
<p>Computational step</p>
</li>
</ol>
<ul>
<li>
<p>Random portfolios simulation</p>
<p>Generate random portfolios compatible with the identified fund’s main characteristics and simulate their evolution through the considered time period.</p>
</li>
<li>
<p>Performance evaluation</p>
<p>Determine the level of statistical significance of the fund’s performance over the considered time period using the previously simulated random portfolios.</p>
</li>
</ul>
<h4 id="step-1---identifying-the-funds-main-characteristics">Step 1 - Identifying the fund’s main characteristics</h4>
<p>Dawson<sup id="fnref:9:4" role="doc-noteref"><a href="#fn:9" class="footnote">8</a></sup> notes that <em>an investment strategy is essentially a set of constraints [and rules] that are carefully constructed to (hopefully) remove, in aggregate, poor performing regions of the solution space</em><sup id="fnref:9:5" role="doc-noteref"><a href="#fn:9" class="footnote">8</a></sup>. Thus,
it is very important to identify as precisely as possible the main characteristics of the fund, even though <em>this […] information is not always available, as only the manager knows [it] exactly</em><sup id="fnref:23:2" role="doc-noteref"><a href="#fn:23" class="footnote">19</a></sup>.</p>
<p>From a practical perspective, these characteristics need to be provided as<sup id="fnref:24" role="doc-noteref"><a href="#fn:24" class="footnote">21</a></sup>:</p>
<ul>
<li>A universe of $n$ assets modeling the fund’s universe of assets</li>
<li>A constraint set $\mathcal{C} \subset \mathbb{R}^{n}$ modeling the fund’s constraints</li>
<li>An ensemble of decision rules modeling<sup id="fnref:25" role="doc-noteref"><a href="#fn:25" class="footnote">22</a></sup> the fund’s rebalancing rules</li>
</ul>
<p>For examples of this step in different contexts, c.f. Kothari and Warner<sup id="fnref:29" role="doc-noteref"><a href="#fn:29" class="footnote">23</a></sup> and Surz<sup id="fnref:32" role="doc-noteref"><a href="#fn:32" class="footnote">24</a></sup>.</p>
<h4 id="step-2a---simulating-random-portfolios">Step 2a - Simulating random portfolios</h4>
<p>Thanks to the previous step, it becomes possible to simulate thousands of random portfolios that the fund’s manager could have potentially held over the considered time period.</p>
<p>The only difficulty at this point is computational, due to the impact of the constraint set on the algorithmic machinery required to generate portfolio weights at random.</p>
<p>As a side note, and <em>for a fair comparison [with the fund under analysis], transaction costs, as well as all other kinds of costs, must be considered</em><sup id="fnref:10:3" role="doc-noteref"><a href="#fn:10" class="footnote">10</a></sup> when simulating random portfolios.</p>
<h4 id="step-2b---evaluating-the-funds-performance">Step 2b - Evaluating the fund’s performance</h4>
<p>The ranking of the fund’s performance against the performance of all the simulated random portfolios provides a direct measure of the statistical significance of that fund’s performance<sup id="fnref:4:6" role="doc-noteref"><a href="#fn:4" class="footnote">1</a></sup>.</p>
<p>Indeed, under the null hypothesis that the investment fund exhibits no particular performance over the considered time period, <a href="https://en.wikipedia.org/wiki/P-value">the $p$-value</a> of the statistical hypothesis test
mentioned at the beginning of this section is defined as<sup id="fnref:26" role="doc-noteref"><a href="#fn:26" class="footnote">25</a></sup></p>
\[p = \frac{n_x + 1}{N + 1}\]
<p>, where:</p>
<ul>
<li>$n_x$ is the number of random portfolios whose performance over the considered time period is as extreme or more extreme than that of the fund under evaluation</li>
<li>$N$ is the number of simulated random portfolios</li>
</ul>
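<p>The $p$-value formula above translates directly into code; the sketch below assumes a right-tailed test ("higher performance is better"), with hypothetical performance figures in the usage example:</p>

```python
import numpy as np


def random_portfolio_p_value(fund_perf, random_perfs):
    """p-value of the null hypothesis 'no particular performance', by ranking
    the fund against N simulated random portfolios: p = (n_x + 1) / (N + 1),
    where n_x counts random portfolios performing at least as well as the fund."""
    random_perfs = np.asarray(random_perfs)
    n_x = np.sum(random_perfs >= fund_perf)  # as extreme or more extreme
    N = random_perfs.size
    return (n_x + 1) / (N + 1)


# Hypothetical holding period returns of 9 random portfolios v.s. a fund at 85%
p = random_portfolio_p_value(0.85, [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
```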
<p>This idea is illustrated in Figure 5, which depicts:</p>
<ul>
<li>The distribution of the holding period return of 1000 random portfolios made of stocks belonging to the S&P 500 and subject to misc. turnover and cardinality constraints (solid line)</li>
<li>The 95th percentile of that distribution (dashed line)</li>
</ul>
<figure>
<a href="/assets/images/blog/random-portfolio-benchmarking-performance-evaluation-stein.png"><img src="/assets/images/blog/random-portfolio-benchmarking-performance-evaluation-stein.png" alt="Distribution of the holding period return of random portfolios simulated from stocks belonging to the S&P 500, 2005 - 2010. Source: Stein." /></a>
<figcaption>Figure 5. Distribution of the holding period return of random portfolios simulated from stocks belonging to the S&P 500, 2005 - 2010. Source: Stein.</figcaption>
</figure>
<p>From this figure, if we were to evaluate the performances of a fund <em>trading these securities and operating under constraints similar to those simulated, [it] would have to obtain a return equal of at least 80%
during the 6 year period for the null of no particular performance to be rejected</em><sup id="fnref:1:2" role="doc-noteref"><a href="#fn:1" class="footnote">5</a></sup> at a 95% confidence level.</p>
<h3 id="variations-on-surzs-methodology">Variations on Surz’s methodology</h3>
<p>Burns<sup id="fnref:10:4" role="doc-noteref"><a href="#fn:10" class="footnote">10</a></sup><sup id="fnref:26:1" role="doc-noteref"><a href="#fn:26" class="footnote">25</a></sup> and Lisi<sup id="fnref:23:3" role="doc-noteref"><a href="#fn:23" class="footnote">19</a></sup> both extend Surz’s methodology from one time period to several time periods - which allows <em>to consider the persistence of results over time</em><sup id="fnref:23:4" role="doc-noteref"><a href="#fn:23" class="footnote">19</a></sup> - by describing different ways of
combining individual $p$-values<sup id="fnref:28" role="doc-noteref"><a href="#fn:28" class="footnote">26</a></sup>. As a by-product,
Burns<sup id="fnref:10:5" role="doc-noteref"><a href="#fn:10" class="footnote">10</a></sup><sup id="fnref:26:2" role="doc-noteref"><a href="#fn:26" class="footnote">25</a></sup> empirically demonstrates that Stouffer’s method<sup id="fnref:27" role="doc-noteref"><a href="#fn:27" class="footnote">27</a></sup> for combining $p$-values should be preferred to <a href="https://en.wikipedia.org/wiki/Fisher%27s_method">Fisher’s</a> when evaluating a fund’s performance
over multiple time periods.</p>
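<p>For concreteness, the textbook form of Stouffer’s method converts each period’s $p$-value into a $z$-score, averages them, and converts back; this sketch uses equal weights per period, which is an assumption - Burns’ exact weighting scheme may differ:</p>

```python
import numpy as np
from scipy.stats import norm


def stouffer_combined_p_value(p_values):
    """Stouffer's method for combining m per-period p-values:
    z_i = Phi^{-1}(1 - p_i), Z = sum(z_i) / sqrt(m), combined p = 1 - Phi(Z)."""
    p = np.asarray(p_values, dtype=float)
    z = norm.ppf(1.0 - p)  # per-period z-scores
    return 1.0 - norm.cdf(z.sum() / np.sqrt(p.size))


# e.g. two periods of mild evidence combine into stronger evidence
p_combined = stouffer_combined_p_value([0.05, 0.05])
```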
<p>Stein<sup id="fnref:1:3" role="doc-noteref"><a href="#fn:1" class="footnote">5</a></sup> proposes to replace Surz’s methodology by testing <em>whether the distribution of returns of the [fund under evaluation] is
<a href="https://en.wikipedia.org/wiki/Stochastic_ordering">stochastically greater</a> than that of [a] chosen percentile random fund</em> using the <a href="https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test">Mann–Whitney <em>U</em> test</a>.</p>
<h3 id="caveats">Caveats</h3>
<p>Surz<sup id="fnref:4:7" role="doc-noteref"><a href="#fn:4" class="footnote">1</a></sup> warns that <em>there are many ways to implement a [Monte Carlo simulation] approach [with random portfolios], some better than others, and some worse</em><sup id="fnref:4:8" role="doc-noteref"><a href="#fn:4" class="footnote">1</a></sup>.</p>
<p>In particular, Surz<sup id="fnref:4:9" role="doc-noteref"><a href="#fn:4" class="footnote">1</a></sup> argues that, when selecting a number of assets at random to satisfy portfolio cardinality constraints, it is required to use <em>value-weighted sampling, so that the probability of choosing a given [asset] is proportionate to its outstanding
capitalization</em><sup id="fnref:4:10" role="doc-noteref"><a href="#fn:4" class="footnote">1</a></sup>. Otherwise, some <em>macroeconomic [in]consistency</em><sup id="fnref:4:11" role="doc-noteref"><a href="#fn:4" class="footnote">1</a></sup> would be introduced by the Monte Carlo simulation process, which would ultimately bias the end results. This point is confirmed for both
U.S. and global stocks in Arnott et al.<sup id="fnref:28:1" role="doc-noteref"><a href="#fn:28" class="footnote">26</a></sup>, who show that random portfolios <em>introduce, often unintentionally, value and small cap tilts</em><sup id="fnref:28:2" role="doc-noteref"><a href="#fn:28" class="footnote">26</a></sup>. Nevertheless, other authors<sup id="fnref:23:5" role="doc-noteref"><a href="#fn:23" class="footnote">19</a></sup> argue on the contrary that
using equally-weighted sampling is more representative of the behavior of an “unskilled” manager, which is exactly what random portfolios are supposed to model!</p>
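<p>The difference between value-weighted and equally-weighted sampling of assets can be sketched as follows; the market capitalizations below are hypothetical, for illustration only:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical market capitalizations for a 10-asset universe (largest first)
market_caps = np.array([2500., 1200., 800., 400., 300., 150., 100., 80., 50., 20.])
n_assets, n_selected, n_simulations = len(market_caps), 3, 20_000

# Value-weighted sampling: selection probability proportional to capitalization
p_value_weighted = market_caps / market_caps.sum()

counts_vw = np.zeros(n_assets)
counts_ew = np.zeros(n_assets)
for _ in range(n_simulations):
    counts_vw[rng.choice(n_assets, size=n_selected, replace=False, p=p_value_weighted)] += 1
    counts_ew[rng.choice(n_assets, size=n_selected, replace=False)] += 1

# Equally-weighted sampling over-represents small caps vs. value-weighted sampling
print("Largest asset selection frequency :",
      counts_vw[0] / n_simulations, "(VW)", counts_ew[0] / n_simulations, "(EW)")
print("Smallest asset selection frequency:",
      counts_vw[-1] / n_simulations, "(VW)", counts_ew[-1] / n_simulations, "(EW)")
```

<p>The output makes the value and small cap tilts of equally-weighted sampling explicit, which is exactly the macroeconomic inconsistency Surz warns about.</p>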
<p>More generally, random portfolios might have “unfair” characteristics vs. the fund under evaluation (higher volatility, higher beta, lower turnover…), which must be either acknowledged or controlled for, depending on the exact circumstances.</p>
<h2 id="implementations">Implementations</h2>
<h3 id="implementation-in-portfolio-optimizer">Implementation in <strong>Portfolio Optimizer</strong></h3>
<p>Through the endpoint <a href="https://docs.portfoliooptimizer.io/"><code class="language-plaintext highlighter-rouge">/portfolios/simulation/random</code></a>, <strong>Portfolio Optimizer</strong> allows you to:</p>
<ul>
<li>Generate the weights of a random portfolio subject to general linear inequality constraints</li>
<li>Simulate the evolution of a random portfolio subject to general linear inequality constraints over a specified time period, with 3 different portfolio rebalancing strategies:
<ul>
<li>Buy and hold</li>
<li>Rebalancing toward the weights of the initial random portfolio</li>
<li>Rebalancing toward the weights of a new random portfolio</li>
</ul>
</li>
</ul>
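<p>As a conceptual aside, under long-only and full investment constraints alone, generating the weights of a random portfolio amounts to drawing a point uniformly at random on the standard simplex (see the footnotes). The sketch below is only a minimal illustration of that underlying mathematics, not the actual <strong>Portfolio Optimizer</strong> implementation, which supports general linear inequality constraints:</p>

```python
import numpy as np

def random_portfolio_weights(n_assets: int, rng: np.random.Generator) -> np.ndarray:
    """Draw portfolio weights uniformly at random on the standard simplex
    (long-only, fully invested). Normalizing i.i.d. standard exponential
    draws is equivalent to sampling from a flat Dirichlet distribution,
    which is the uniform distribution on the simplex."""
    exponentials = rng.exponential(size=n_assets)
    return exponentials / exponentials.sum()

rng = np.random.default_rng(123)
weights = random_portfolio_weights(5, rng)
print(weights, weights.sum())
```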
<h3 id="implementations-elsewhere">Implementations elsewhere</h3>
<p>I am aware of two commercial implementations of random portfolios for performance evaluation as described in this blog post:</p>
<ul>
<li><strong><a href="https://www.portfolioprobe.com/">Portfolio Probe</a></strong>, from Patrick Burns<sup id="fnref:21" role="doc-noteref"><a href="#fn:21" class="footnote">28</a></sup> of <a href="https://www.burns-stat.com/">Burns Statistics</a></li>
<li><strong><a href="https://ppca-inc.com/PODs/pipods.htm">PIPODs</a></strong>, from Ronald Surz<sup id="fnref:21:1" role="doc-noteref"><a href="#fn:21" class="footnote">28</a></sup> of <a href="https://ppca-inc.com/">PPCA Inc.</a>, which seems to have been discontinued</li>
</ul>
<h2 id="examples-of-usage">Examples of usage</h2>
<h3 id="hedge-funds-performance-evaluation">Hedge funds performance evaluation</h3>
<p>The performances of hedge funds are notoriously difficult to evaluate due to the lack of both proper benchmarks and homogeneous peer groups<sup id="fnref:32:1" role="doc-noteref"><a href="#fn:32" class="footnote">24</a></sup>.</p>
<p>Random portfolios offer a solution to this problem <em>because they can reflect the unique specifications of each individual hedge fund</em><sup id="fnref:32:2" role="doc-noteref"><a href="#fn:32" class="footnote">24</a></sup>.</p>
<p>More details can be found in Surz<sup id="fnref:32:3" role="doc-noteref"><a href="#fn:32" class="footnote">24</a></sup> or in Molyboga and Ahelec<sup id="fnref:33" role="doc-noteref"><a href="#fn:33" class="footnote">29</a></sup>.</p>
<h3 id="investment-strategy-returns-dispersion-analysis">Investment strategy returns dispersion analysis</h3>
<p>Kritzman and Page<sup id="fnref:31" role="doc-noteref"><a href="#fn:31" class="footnote">30</a></sup> use random portfolios to compare the relative importance of different investment strategies (investment at global asset class level, at individual stock level, etc.)
and conclude that<sup id="fnref:31:1" role="doc-noteref"><a href="#fn:31" class="footnote">30</a></sup></p>
<blockquote>
<p>Contrary to perceived doctrine, dispersion around average performance arising from security selection is substantially greater than dispersion around average performance arising from all other investment choices. Moreover, asset allocation, widely considered the most important investment choice, produces the least dispersion; thus, from a normative perspective it is the least important investment choice.</p>
</blockquote>
<p>Figure 6, which quantifies that dispersion around average performance, <em>shows the extent to which a talented investor (top 25th or 5th percentile) could have improved upon average performance by engaging in various
investment choices across a global universe. It also shows how far below average an unlucky investor (bottom 75th or 95th percentile) could have performed, depending on the choice of investment discretion</em><sup id="fnref:31:2" role="doc-noteref"><a href="#fn:31" class="footnote">30</a></sup>.</p>
<figure>
<a href="/assets/images/blog/random-portfolio-benchmarking-returns-dispersion-kritzman.png"><img src="/assets/images/blog/random-portfolio-benchmarking-returns-dispersion-kritzman.png" alt="5th, 25th, 75th, and 95th percentile of the annualized difference from average performance of random portfolios simulated from misc. global asset classes, 1987-2001. Source: Kritzman and Page." /></a>
<figcaption>Figure 6. 5th, 25th, 75th, and 95th percentile of the annualized difference from average performance of random portfolios simulated from misc. global asset classes, 1987-2001. Source: Kritzman and Page.</figcaption>
</figure>
<p>While this conclusion has been criticized<sup id="fnref:34" role="doc-noteref"><a href="#fn:34" class="footnote">31</a></sup>, the underlying methodology - returns dispersion analysis - provides valuable insight to investors in that it allows them to understand
<em>the potential impact of [any] investment choice, irrespective of investment behavior</em><sup id="fnref:31:3" role="doc-noteref"><a href="#fn:31" class="footnote">30</a></sup>.</p>
<p>For example, let’s suppose a French investor would like to passively invest in an MSCI World ETF, but is worried by both<sup id="fnref:35" role="doc-noteref"><a href="#fn:35" class="footnote">32</a></sup>:</p>
<ul>
<li>The massive ~70% weight of the United States in that index</li>
<li>The ridiculous ~3% weight of his home country in that index</li>
</ul>
<p>Thus, this investor would rather invest in an MSCI World ETF “tilted” away from the United States and toward France, although he has no precise idea of how to implement such a tilt.</p>
<p>Here, returns dispersion analysis through random portfolios can help our investor understand the impact of his proposed tactical deviation from the baseline investment strategy on performances, at least historically<sup id="fnref:38" role="doc-noteref"><a href="#fn:38" class="footnote">33</a></sup>.</p>
<p>For this, Figure 7 depicts the range of annualized returns achieved over the period 31 December 2008 - 29 December 2023 by 10000 random portfolios invested in the 23 countries of the MSCI World index when<sup id="fnref:36" role="doc-noteref"><a href="#fn:36" class="footnote">34</a></sup>:</p>
<ul>
<li>The weights of the United States and France are allowed to vary randomly between 0% and 100%</li>
<li>The weights of all the other countries are kept constant relative to their reference weights</li>
</ul>
<figure>
<a href="/assets/images/blog/random-portfolio-benchmarking-msci-world-france-tilt.png"><img src="/assets/images/blog/random-portfolio-benchmarking-msci-world-france-tilt.png" alt="Quartiles of the annualized returns of random portfolios from countries included in the MSCI World tilted away from the U.S. and toward France, 31 December 2008 - 29 December 2023." /></a>
<figcaption>Figure 7. Quartiles of the annualized returns of random portfolios from countries included in the MSCI World tilted away from the U.S. and toward France, 31 December 2008 - 29 December 2023.</figcaption>
</figure>
<p>It is clear from Figure 7 that deviating from the MSCI World index by under-weighting the United States and over-weighting France has been a relatively bad strategy over the considered period,
with a median annualized return of ~8.30% compared to an annualized return of ~11.30% for the MSCI World<sup id="fnref:37" role="doc-noteref"><a href="#fn:37" class="footnote">35</a></sup>!</p>
<p>Whether history will repeat itself remains to be seen, but thanks to this historical returns dispersion analysis, our investor is at least able to make an informed decision about his proposed investment strategy.</p>
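<p>For readers wishing to experiment, the returns dispersion analysis above can be sketched as follows; the monthly means and covariances below are made up for the example (the blog post itself uses actual MSCI country index returns), and the 23 countries are collapsed into three buckets for brevity:</p>

```python
import numpy as np

rng = np.random.default_rng(7)
n_months, n_portfolios = 180, 10_000

# Synthetic monthly returns for three buckets: U.S., France, rest of the index
# (made-up means/covariances, for illustration only)
returns = rng.multivariate_normal(
    mean=[0.009, 0.006, 0.007],
    cov=[[0.0020, 0.0012, 0.0014],
         [0.0012, 0.0025, 0.0015],
         [0.0014, 0.0015, 0.0018]],
    size=n_months)

annualized = np.empty(n_portfolios)
for i in range(n_portfolios):
    # Random U.S. and France weights; the remaining weight goes to the other
    # countries (rescaled if needed so that the portfolio stays fully invested)
    w_us, w_fr = rng.uniform(0, 1, size=2)
    total = w_us + w_fr
    if total > 1:
        w_us, w_fr, total = w_us / total, w_fr / total, 1.0
    portfolio_returns = returns @ np.array([w_us, w_fr, 1.0 - total])
    annualized[i] = (1 + portfolio_returns).prod() ** (12 / n_months) - 1

print("Quartiles of annualized returns:", np.percentile(annualized, [25, 50, 75]))
```

<p>The resulting quartiles play the same role as the boxes in Figure 7: they summarize the historical dispersion of outcomes attributable to the proposed tilt alone.</p>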
<h3 id="synthetic-benchmark-construction">Synthetic benchmark construction</h3>
<p>Stein<sup id="fnref:1:4" role="doc-noteref"><a href="#fn:1" class="footnote">5</a></sup> proposes to use random portfolios in order to construct a “synthetic” benchmark, representative of all possible investment strategies within a given universe
of assets and subject to a given set of constraints and rules.</p>
<p>In more detail, Stein<sup id="fnref:1:5" role="doc-noteref"><a href="#fn:1" class="footnote">5</a></sup> proposes to construct such a benchmark directly from <em>the time series of returns of a [well chosen] single random portfolio</em><sup id="fnref:1:6" role="doc-noteref"><a href="#fn:1" class="footnote">5</a></sup>, which is typically the median<sup id="fnref:4:12" role="doc-noteref"><a href="#fn:4" class="footnote">1</a></sup> random portfolio
w.r.t. a chosen performance measure.</p>
<p>A similar idea is discussed in Lisi<sup id="fnref:23:6" role="doc-noteref"><a href="#fn:23" class="footnote">19</a></sup>, in which such a benchmark is this time constructed from the cross-sectional returns of the random portfolios.</p>
<p>Figure 8 illustrates Lisi’s methodology; on it, the green line represents a synthetic benchmark made of the 95th percentile<sup id="fnref:45" role="doc-noteref"><a href="#fn:45" class="footnote">36</a></sup> of the distribution of the random portfolios’ holding period returns at each time $t$.</p>
<figure>
<a href="/assets/images/blog/random-portfolio-benchmarking-synthetic-benchmark-lisi.png"><img src="/assets/images/blog/random-portfolio-benchmarking-synthetic-benchmark-lisi.png" alt="Time series of the 95th percentile of cross-sectional holding period returns of random portfolios. Source: Lisi." /></a>
<figcaption>Figure 8. Time series of the 95th percentile of cross-sectional holding period returns of random portfolios. Source: Lisi.</figcaption>
</figure>
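<p>Lisi’s cross-sectional construction can be sketched as follows, using synthetic asset returns and random weights in place of actual market data (all figures below are made up for the example):</p>

```python
import numpy as np

rng = np.random.default_rng(1)
n_months, n_portfolios, n_assets = 60, 5_000, 5

# Synthetic asset returns and uniform-on-the-simplex random portfolio weights
asset_returns = rng.normal(0.005, 0.03, size=(n_months, n_assets))
weights = rng.dirichlet(np.ones(n_assets), size=n_portfolios)   # one row per portfolio

# Holding period (cumulative) returns of each random portfolio at each time t,
# with portfolios rebalanced each month toward their constant random weights
portfolio_returns = asset_returns @ weights.T                    # (n_months, n_portfolios)
holding_period_returns = (1 + portfolio_returns).cumprod(axis=0) - 1

# Synthetic benchmarks: chosen cross-sectional percentiles at each time t
benchmark_median = np.percentile(holding_period_returns, 50, axis=1)
benchmark_95th = np.percentile(holding_period_returns, 95, axis=1)
print("Final holding period returns:", benchmark_median[-1], benchmark_95th[-1])
```

<p>The time series <code class="language-plaintext highlighter-rouge">benchmark_95th</code> corresponds to the green line of Figure 8.</p>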
<p>The possibility of creating synthetic benchmarks is particularly useful when evaluating the performances of tactical asset allocation (TAA) strategies,
because their allocation might completely change from one period to another, making their comparison with a static benchmark - like a buy and hold portfolio or a 60/40 stock-bond portfolio - difficult to justify <em>a priori</em>.</p>
<p>For example, Figure 9 compares over the period 30th November 2006 - 31st January 2024<sup id="fnref:42" role="doc-noteref"><a href="#fn:42" class="footnote">37</a></sup>:</p>
<ul>
<li>The <em>Global Tactical Asset Allocation (GTTA)</em><sup id="fnref:39" role="doc-noteref"><a href="#fn:39" class="footnote">38</a></sup> strategy of <a href="https://mebfaber.com/">Mebane Faber</a>, which invests monthly - depending on the quantitative rules described in Faber<sup id="fnref:39:1" role="doc-noteref"><a href="#fn:39" class="footnote">38</a></sup> - within a five-asset universe made of:
<ul>
<li>U.S. Equities, represented by the SPY ETF</li>
<li>International Equities, represented by the EFA ETF</li>
<li>Intermediate-Term U.S. Treasury Bonds, represented by the IEF ETF</li>
<li>U.S. Real Estate, represented by the VNQ ETF</li>
<li>Global Commodities, represented by the DBC ETF</li>
</ul>
</li>
<li>An (equal-weighted) buy and hold portfolio within Faber’s five-asset universe, which is the benchmark of the <em>GTTA</em> strategy proposed in Faber<sup id="fnref:39:2" role="doc-noteref"><a href="#fn:39" class="footnote">38</a></sup></li>
<li>A synthetic benchmark <em>à la Lisi<sup id="fnref:23:7" role="doc-noteref"><a href="#fn:23" class="footnote">19</a></sup></em> constructed from the median cross-sectional holding period returns of 25000 random portfolios simulated<sup id="fnref:41" role="doc-noteref"><a href="#fn:41" class="footnote">39</a></sup> within Faber’s five-asset universe</li>
</ul>
<figure>
<a href="/assets/images/blog/random-portfolio-benchmarking-synthetic-benchmark-gtta.png"><img src="/assets/images/blog/random-portfolio-benchmarking-synthetic-benchmark-gtta.png" alt="Global Tactical Asset Allocation (GTTA) strategy vs. misc. benchmarks, 30th November 2006 - 31st January 2024." /></a>
<figcaption>Figure 9. Global Tactical Asset Allocation (GTTA) strategy vs. misc. benchmarks, 30th November 2006 - 31st January 2024.</figcaption>
</figure>
<p>In Figure 9, the buy and hold portfolio and the synthetic benchmark both appear to behave quite similarly<sup id="fnref:48" role="doc-noteref"><a href="#fn:48" class="footnote">40</a></sup>, but the synthetic benchmark should theoretically be preferred for performance comparison purposes
because it better reflects the dynamics of the <em>GTTA</em> strategy (varying asset weights, varying exposure) while guaranteeing the complete absence of skill<sup id="fnref:40" role="doc-noteref"><a href="#fn:40" class="footnote">41</a></sup>.</p>
<h3 id="trading-strategy-evaluation">Trading strategy evaluation</h3>
<p>Random portfolios also find applications in evaluating (quantitative) trading strategies.</p>
<p>One of these applications, described in Dawson<sup id="fnref:9:6" role="doc-noteref"><a href="#fn:9" class="footnote">8</a></sup>, consists in evaluating the significance of an alpha signal through the comparison of two different samples of random portfolios:</p>
<blockquote>
<p>It is […] very difficult to show conclusively the effect that a model, strategy […] has on the real investment process. […] The ideal solution would be to generate a set of portfolios, constrained in the same way as the portfolio(s) built with the theory or model under investigation, only without the information produced by the theory. It would then be possible to compare the distributions of the portfolio characteristics with and without the added information from the new theory, giving strong statistical evidence of the effects of the new information.</p>
</blockquote>
<p>As a practical illustration, let’s come back to Faber’s <em>GTTA</em> strategy introduced in the previous subsection and compare:</p>
<ul>
<li>A sample of 10000 random portfolios simulated within Faber’s five-asset universe, rebalanced each month toward a random portfolio invested<sup id="fnref:44" role="doc-noteref"><a href="#fn:44" class="footnote">42</a></sup> in all the assets selected for investment by the <em>GTTA</em> strategy rules</li>
<li>A sample of 10000 random portfolios simulated within Faber’s five-asset universe, rebalanced each month toward a random portfolio invested<sup id="fnref:44:1" role="doc-noteref"><a href="#fn:44" class="footnote">42</a></sup> in all the assets NOT selected for investment by the <em>GTTA</em> strategy rules</li>
</ul>
<p>The resulting ranges of annualized returns over the period 30th November 2006 - 31st January 2024 are displayed in Figure 10, which highlights a clear under-performance of the second sample of random portfolios.</p>
<figure>
<a href="/assets/images/blog/random-portfolio-benchmarking-evaluation-gtta.png"><img src="/assets/images/blog/random-portfolio-benchmarking-evaluation-gtta.png" alt="Quartiles of the annualized returns of the Global Tactical Asset Allocation (GTTA) strategy original asset selection rules vs. inverted asset selection rules using random portfolios for weighting the selected assets, 30th November 2006 - 31st January 2024." /></a>
<figcaption>Figure 10. Quartiles of the annualized returns of the Global Tactical Asset Allocation (GTTA) strategy original asset selection rules vs. inverted asset selection rules using random portfolios for weighting the selected assets, 30th November 2006 - 31st January 2024.</figcaption>
</figure>
<p>This kind of under-performance is exactly what is expected under the hypothesis that the <em>GTTA</em> strategy asset selection rules correspond to a true alpha signal.</p>
<p>Indeed, as Arnott et al.<sup id="fnref:28:3" role="doc-noteref"><a href="#fn:28" class="footnote">26</a></sup> put it:</p>
<blockquote>
<p>In inverting the strategies, we tacitly examine whether these strategies outperform because they are predicated on meaningful investment theses and deep insights on capital markets, or for reasons unrelated to the investment theses. If the investment beliefs are the source of outperformance, then contradicting those beliefs should lead to underperformance.</p>
</blockquote>
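<p>The comparison of the two samples of random portfolios can be sketched as follows; here, a hypothetical alpha signal is made informative <em>by construction</em> (the selected assets receive a small positive drift), so that random portfolios built against the signal should under-perform. Everything below - the universe size, the drift, the selection rule - is made up for the example:</p>

```python
import numpy as np

rng = np.random.default_rng(5)
n_months, n_assets, n_portfolios = 120, 5, 500

# Hypothetical alpha signal: each month, 2 of the 5 assets are "selected",
# and the signal is informative by construction (+1% drift when selected)
signal = np.zeros((n_months, n_assets), dtype=bool)
for t in range(n_months):
    signal[t, rng.choice(n_assets, size=2, replace=False)] = True
asset_returns = rng.normal(0.0, 0.03, size=(n_months, n_assets)) + 0.01 * signal

def sample_annualized_returns(selection_mask):
    """Random portfolios rebalanced each month toward random weights over
    the assets present in the selection mask for that month."""
    out = np.empty(n_portfolios)
    for i in range(n_portfolios):
        monthly = np.empty(n_months)
        for t in range(n_months):
            idx = np.flatnonzero(selection_mask[t])
            monthly[t] = asset_returns[t, idx] @ rng.dirichlet(np.ones(idx.size))
        out[i] = (1 + monthly).prod() ** (12 / n_months) - 1
    return out

with_signal = sample_annualized_returns(signal)       # assets selected by the rules
against_signal = sample_annualized_returns(~signal)   # assets NOT selected
print("Median annualized return, with signal   :", np.median(with_signal))
print("Median annualized return, against signal:", np.median(against_signal))
```

<p>Comparing the two resulting distributions - for example with the Mann–Whitney <em>U</em> test mentioned earlier - then gives statistical evidence of the effect of the signal.</p>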
<h3 id="trading-strategy-monitoring">Trading strategy monitoring</h3>
<p>Going beyond trading strategy evaluation, random portfolios can also be used to monitor how the performances of a trading strategy differ between in-sample and out-of-sample periods.</p>
<p>I propose to illustrate the associated process with another tactical asset allocation strategy, the <em>Global Equities Momentum (GEM)</em><sup id="fnref:46" role="doc-noteref"><a href="#fn:46" class="footnote">43</a></sup> strategy of <a href="https://dualmomentum.net/">Gary Antonacci</a>, which invests monthly -
depending on the quantitative rules described in Antonacci<sup id="fnref:46:1" role="doc-noteref"><a href="#fn:46" class="footnote">43</a></sup> - in one asset among a three-asset universe made of:</p>
<ul>
<li>U.S. Equities, represented by the S&amp;P 500 Index</li>
<li>International Equities, represented by the MSCI ACWI ex-USA Index</li>
<li>U.S. Bonds, represented by the Barclays Capital US Aggregate Bond Index</li>
</ul>
<p>Because this strategy was published in 2013, let’s first check <em>GEM</em> performances during the in-sample period 1989-2012 (<a href="https://docs.google.com/spreadsheets/d/1jCLJNg8gHs_e1snq7wAuTP03ZUK_b1rUCXUto0_JRHU/edit?usp=sharing">Google Sheet</a>).</p>
<p>Figure 11 depicts the equity curves of a couple of random portfolios simulated<sup id="fnref:47" role="doc-noteref"><a href="#fn:47" class="footnote">44</a></sup> within Antonacci’s three-asset universe (in solid) v.s. the GEM equity curve (in dashed) over that period.</p>
<figure>
<a href="/assets/images/blog/gem-random-portfolios.png"><img src="/assets/images/blog/gem-random-portfolios.png" alt="Global Equities Momentum (GEM) strategy v.s. random portfolios, in-sample period 1989-2012" /></a>
<figcaption>Figure 11. Global Equities Momentum (GEM) strategy v.s. random portfolios, in-sample period 1989-2012.</figcaption>
</figure>
<p>For the real thing, Figure 12 depicts the Sharpe Ratio distribution of 10000 random portfolios simulated<sup id="fnref:47:1" role="doc-noteref"><a href="#fn:47" class="footnote">44</a></sup> within Antonacci’s three-asset universe over that period, with the
red line corresponding to the <em>GEM</em> Sharpe Ratio.</p>
<figure>
<a href="/assets/images/blog/gem-sharpe-ratios-in-sample.png"><img src="/assets/images/blog/gem-sharpe-ratios-in-sample.png" alt="Global Equities Momentum (GEM) strategy v.s. random portfolios, distribution of Sharpe Ratios, in-sample period 1989-2012" /></a>
<figcaption>Figure 12. Global Equities Momentum (GEM) strategy v.s. random portfolios, distribution of Sharpe Ratios, in-sample period 1989-2012.</figcaption>
</figure>
<p>Pretty amazing, as the <em>GEM</em> Sharpe Ratio is among the best obtainable Sharpe Ratios over the in-sample period!</p>
<p>Let’s now check <em>GEM</em> performances during the out-of-sample period 2013-October 2020 (<a href="https://docs.google.com/spreadsheets/d/1mzus6uIV9JGwELAO_1sAtv1gB9iAX3FQd5LJJ5P61Go/edit?usp=sharing">Google Sheet</a>).</p>
<p>Figure 13 depicts the Sharpe Ratio distribution of 10000 random portfolios simulated<sup id="fnref:47:2" role="doc-noteref"><a href="#fn:47" class="footnote">44</a></sup> within Antonacci’s three-asset universe over that new period, with the
red line again corresponding to the <em>GEM</em> Sharpe Ratio.</p>
<figure>
<a href="/assets/images/blog/gem-sharpe-ratios-out-of-sample.png"><img src="/assets/images/blog/gem-sharpe-ratios-out-of-sample.png" alt="Global Equities Momentum (GEM) strategy v.s. random portfolios, distribution of Sharpe Ratios, out-of-sample period 2013-October 2020" /></a>
<figcaption>Figure 13. Global Equities Momentum (GEM) strategy v.s. random portfolios, distribution of Sharpe Ratios, out-of-sample period 2013-October 2020.</figcaption>
</figure>
<p>Pretty blah this time, with the <em>GEM</em> Sharpe Ratio roughly comparable to the median random portfolio Sharpe Ratio, which by definition exhibits no particular skill…</p>
<p>Such a difference in the <em>GEM</em> Sharpe Ratio <em>relative to its simulated peer group</em><sup id="fnref:4:13" role="doc-noteref"><a href="#fn:4" class="footnote">1</a></sup> between the in-sample period and the out-of-sample period is puzzling, but analyzing a particular tactical asset
allocation strategy is out of the scope of this blog post, so I will leave the “why” question unanswered.</p>
<p>The same process can be applied to any trading strategy in order to detect a potential shift in that strategy’s performances, so don’t hesitate to put today’s abundant computing power to work!</p>
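<p>Such a monitoring process can be sketched as follows, with a strategy’s Sharpe Ratio ranked within its simulated peer group of random portfolios over an “in-sample” and an “out-of-sample” sub-period; all data below is synthetic, for illustration only, and the random portfolios are simply rebalanced each month toward constant random weights:</p>

```python
import numpy as np

rng = np.random.default_rng(9)

def sharpe_ratio(monthly_returns):
    # Annualized Sharpe Ratio of a series of monthly returns (zero risk-free rate)
    return np.sqrt(12) * monthly_returns.mean() / monthly_returns.std(ddof=1)

def percentile_rank(strategy_returns, asset_returns, n_portfolios=10_000):
    """Rank a strategy's Sharpe Ratio within a simulated peer group of random
    portfolios (rebalanced monthly toward constant uniform-on-the-simplex
    weights) over the same universe and the same period."""
    weights = rng.dirichlet(np.ones(asset_returns.shape[1]), size=n_portfolios)
    peer_sharpes = np.array([sharpe_ratio(asset_returns @ w) for w in weights])
    return (peer_sharpes < sharpe_ratio(strategy_returns)).mean()

# Synthetic illustration: a 3-asset universe and a made-up "strategy"
asset_returns = rng.normal(0.005, 0.03, size=(120, 3))
strategy_returns = asset_returns @ np.array([0.5, 0.3, 0.2]) + rng.normal(0.002, 0.005, size=120)

# Compare the strategy's rank over two sub-periods; a large drop in rank
# between the two would flag a potential shift in performances
print("First sub-period rank :", percentile_rank(strategy_returns[:60], asset_returns[:60]))
print("Second sub-period rank:", percentile_rank(strategy_returns[60:], asset_returns[60:]))
```

<p>A rank close to 100% corresponds to the situation of Figure 12, while a rank close to 50% corresponds to the situation of Figure 13.</p>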
<h2 id="conclusion">Conclusion</h2>
<p>Dawson<sup id="fnref:9:7" role="doc-noteref"><a href="#fn:9" class="footnote">8</a></sup> notes that <em>[Monte Carlo analysis] is not a tool that has been readily applied to the investment process in the past, due to the perceived complexity of the problem</em><sup id="fnref:9:8" role="doc-noteref"><a href="#fn:9" class="footnote">8</a></sup>.</p>
<p>Through this blog post, I hope to have demonstrated that random portfolios are not necessarily complex to use, and come with many benefits for performance evaluation.</p>
<p>Feel free to randomly reach out <a href="https://www.linkedin.com/in/roman-rubsamen/">on LinkedIn</a> or <a href="https://twitter.com/portfoliooptim">on Twitter</a>.</p>
<p>–</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:4" role="doc-endnote">
<p>See <a href="https://jpm.pm-research.com/content/32/4/54">Surz, Ronald, A Fresh Look at Investment Performance Evaluation: Unifying Best Practices to Improve Timeliness and Reliability, Journal of Portfolio Management, Vol. 32, No. 4, Summer 2006, pp 54-65</a>. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:4:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:4:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:4:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a> <a href="#fnref:4:4" class="reversefootnote" role="doc-backlink">↩<sup>5</sup></a> <a href="#fnref:4:5" class="reversefootnote" role="doc-backlink">↩<sup>6</sup></a> <a href="#fnref:4:6" class="reversefootnote" role="doc-backlink">↩<sup>7</sup></a> <a href="#fnref:4:7" class="reversefootnote" role="doc-backlink">↩<sup>8</sup></a> <a href="#fnref:4:8" class="reversefootnote" role="doc-backlink">↩<sup>9</sup></a> <a href="#fnref:4:9" class="reversefootnote" role="doc-backlink">↩<sup>10</sup></a> <a href="#fnref:4:10" class="reversefootnote" role="doc-backlink">↩<sup>11</sup></a> <a href="#fnref:4:11" class="reversefootnote" role="doc-backlink">↩<sup>12</sup></a> <a href="#fnref:4:12" class="reversefootnote" role="doc-backlink">↩<sup>13</sup></a> <a href="#fnref:4:13" class="reversefootnote" role="doc-backlink">↩<sup>14</sup></a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>Or a trading strategy, or whatever. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>See <a href="https://www.pm-research.com/content/iijinvest/3/2/36">Surz, R. J. 1994. Portfolio opportunity distributions: an innovation in performance evaluation. The Journal of Investing, 3(2): 36-41</a>. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:2:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:2:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:2:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>See <a href="https://ppca-inc.com/pdf/accurate-benchmarking.pdf">Surz, Ron. Accurate Benchmarking is Gone But Not Forgotten: The Imperative Need to Get Back to Basics, Journal of Performance Measurement, Vol. 11, No. 3, Spring, pp 34-43</a>. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:3:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:3:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a></p>
</li>
<li id="fn:1" role="doc-endnote">
<p>See <a href="https://www.tandfonline.com/doi/abs/10.1080/10293523.2014.11082564">Roberto Stein, Not fooled by randomness: Using random portfolios to analyse investment funds, Investment Analysts Journal, 43:79, 1-15</a>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:1:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:1:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:1:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a> <a href="#fnref:1:4" class="reversefootnote" role="doc-backlink">↩<sup>5</sup></a> <a href="#fnref:1:5" class="reversefootnote" role="doc-backlink">↩<sup>6</sup></a> <a href="#fnref:1:6" class="reversefootnote" role="doc-backlink">↩<sup>7</sup></a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>By construction. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:7" role="doc-endnote">
<p>These <em>can be imposed by the firm that offers the funds, for example in terms of the prospectus and investment goals, or self-imposed trading behavior that the manager maintains over his career</em><sup id="fnref:1:7" role="doc-noteref"><a href="#fn:1" class="footnote">5</a></sup>; these can also be imposed by regulatory bodies or stock exchanges. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:9" role="doc-endnote">
<p>See <a href="https://www.sciencedirect.com/science/article/abs/pii/B9780750654487500082">Dawson, R. and R. Young: 2003, Near-uniformly Distributed, Stochastically Generated Portfolios. In: S. Satchell and A. Scowcroft (eds.): Advances in Portfolio Construction and Implementation. Butterworth–Heinemann</a>. <a href="#fnref:9" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:9:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:9:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:9:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a> <a href="#fnref:9:4" class="reversefootnote" role="doc-backlink">↩<sup>5</sup></a> <a href="#fnref:9:5" class="reversefootnote" role="doc-backlink">↩<sup>6</sup></a> <a href="#fnref:9:6" class="reversefootnote" role="doc-backlink">↩<sup>7</sup></a> <a href="#fnref:9:7" class="reversefootnote" role="doc-backlink">↩<sup>8</sup></a> <a href="#fnref:9:8" class="reversefootnote" role="doc-backlink">↩<sup>9</sup></a></p>
</li>
<li id="fn:11" role="doc-endnote">
<p>The $3$-dimensional geometrical object associated to the subset of $\mathbb{R}^3$ modeling long-only and full investment constraints is <a href="https://en.wikipedia.org/wiki/Simplex#Standard_simplex">the standard 2-simplex in $\mathbb{R}^3$</a>. <a href="#fnref:11" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:11:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:11:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a></p>
</li>
<li id="fn:10" role="doc-endnote">
<p>See <a href="https://link.springer.com/chapter/10.1007/3-540-36626-1_11">Burns, P. (2007). Random Portfolios for Performance Measurement. In: Kontoghiorghes, E.J., Gatu, C. (eds) Optimisation, Econometric and Financial Analysis. Advances in Computational Management Science, vol 9. Springer, Berlin, Heidelberg</a>. <a href="#fnref:10" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:10:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:10:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:10:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a> <a href="#fnref:10:4" class="reversefootnote" role="doc-backlink">↩<sup>5</sup></a> <a href="#fnref:10:5" class="reversefootnote" role="doc-backlink">↩<sup>6</sup></a></p>
</li>
<li id="fn:12" role="doc-endnote">
<p>Such an object is also called a unit simplex or the probabilistic simplex. <a href="#fnref:12" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:13" role="doc-endnote">
<p>See <a href="https://link.springer.com/article/10.1007/s10479-009-0567-7">Onn, S., Weissman, I. Generating uniform random vectors over a simplex with implications to the volume of a certain polytope and to multivariate extremes. Ann Oper Res 189, 331–342 (2011)</a>. <a href="#fnref:13" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:15" role="doc-endnote">
<p>See <a href="https://www.sciencedirect.com/science/article/abs/pii/S0167715299000954?via%3Dihub">Kai-Tai Fang, Zhen-Hai Yang, On uniform design of experiments with restricted mixtures and generation of uniform distribution on some domains, Statistics & Probability Letters, Volume 46, Issue 2, 2000, Pages 113-120</a>. <a href="#fnref:15" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:17" role="doc-endnote">
<p>See <a href="https://www.tandfonline.com/doi/abs/10.1080/03610918408812382">Paul A. Rubin (1984) Generating random points in a polytope, Communications in Statistics - Simulation and Computation, 13:3, 375-396</a>. <a href="#fnref:17" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:18" role="doc-endnote">
<p>See <a href="https://www.sciencedirect.com/science/article/abs/pii/S0377221714005396">Gert van Valkenhoef, Tommi Tervonen, Douwe Postmus, Notes on “Hit-And-Run enables efficient weight generation for simulation-based multiple criteria decision analysis”, European Journal of Operational Research, Volume 239, Issue 3, 2014, Pages 865-867</a>. <a href="#fnref:18" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:22" role="doc-endnote">
<p>The <a href="https://en.wikipedia.org/wiki/Rejection_sampling">rejection method</a> is not a reasonable approach because <em>the probability of a portfolio being accepted is generally extremely small when realistic constraints are in place</em><sup id="fnref:10:6" role="doc-noteref"><a href="#fn:10" class="footnote">10</a></sup>. <a href="#fnref:22" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:20" role="doc-endnote">
<p>Due to the nature of genetic algorithms. <a href="#fnref:20" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:19" role="doc-endnote">
<p>Within a given numerical tolerance. <a href="#fnref:19" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:23" role="doc-endnote">
<p>See <a href="https://www.tandfonline.com/doi/abs/10.1080/14697688.2010.513337">Francesco Lisi (2011) Dicing with the market: randomized procedures for evaluation of mutual funds, Quantitative Finance, 11:2, 163-172</a>. <a href="#fnref:23" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:23:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:23:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:23:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a> <a href="#fnref:23:4" class="reversefootnote" role="doc-backlink">↩<sup>5</sup></a> <a href="#fnref:23:5" class="reversefootnote" role="doc-backlink">↩<sup>6</sup></a> <a href="#fnref:23:6" class="reversefootnote" role="doc-backlink">↩<sup>7</sup></a> <a href="#fnref:23:7" class="reversefootnote" role="doc-backlink">↩<sup>8</sup></a></p>
</li>
<li id="fn:30" role="doc-endnote">
<p>Some authors like Kritzman and Page<sup id="fnref:31:4" role="doc-noteref"><a href="#fn:31" class="footnote">30</a></sup> consider the usage of random portfolios to be a bootstrap simulation process, and not a Monte Carlo one. <a href="#fnref:30" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:24" role="doc-endnote">
<p>To be noted that the universe of assets, constraints and rules may well be time-dependent; for example, the universe of assets at a given point in time might be completely different from that at an earlier or later point in time. <a href="#fnref:24" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:25" role="doc-endnote">
<p>The boundary between the universe of assets and the rebalancing rules might not always be perfectly clear; at heart, the rebalancing rules must model the trading behaviour of the fund. <a href="#fnref:25" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:29" role="doc-endnote">
<p>See <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=75871">Kothari, S.P. and Warner, Jerold B., Evaluating Mutual Fund Performance (August 1997)</a>. <a href="#fnref:29" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:32" role="doc-endnote">
<p>See <a href="https://www.pm-research.com/content/iijwealthmgmt/7/4/78">Ronald J. Surz, Testing the Hypothesis “Hedge Fund Performance Is Good”, The Journal of Wealth Management, Spring 2005, 7 (4) 78-83</a>. <a href="#fnref:32" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:32:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:32:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:32:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a></p>
</li>
<li id="fn:26" role="doc-endnote">
<p>See <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=630123">Burns, Patrick J., Performance Measurement Via Random Portfolios (December 2, 2004)</a>. <a href="#fnref:26" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:26:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:26:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a></p>
</li>
<li id="fn:28" role="doc-endnote">
<p>See <a href="https://www.pm-research.com/content/iijpormgmt/39/4/91">Robert D. Arnott, Jason Hsu, Vitali Kalesnik, Phil Tindall, The Surprising Alpha From Malkiel’s Monkey and Upside-Down Strategies, The Journal of Portfolio Management, Summer 2013, 39 (4) 91-105</a>. <a href="#fnref:28" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:28:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:28:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:28:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a></p>
</li>
<li id="fn:27" role="doc-endnote">
<p>See <a href="https://doi.org/10.1093/biomet/asx076">N A Heard, P Rubin-Delanchy, Choosing between methods of combining p-values, Biometrika, Volume 105, Issue 1, March 2018, Pages 239–246</a>. <a href="#fnref:27" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:21" role="doc-endnote">
<p>I have no affiliation. <a href="#fnref:21" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:21:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:33" role="doc-endnote">
<p>See <a href="https://doi.org/10.1057/jam.2016.3">Molyboga, M., Ahelec, C. A simulation-based methodology for evaluating hedge fund investments. J Asset Manag 17, 434–452 (2016)</a>. <a href="#fnref:33" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:31" role="doc-endnote">
<p>See <a href="https://www.pm-research.com/content/iijpormgmt/29/4/11">Kritzman, Mark and Sébastien Page (2003), The Hierarchy of Investment Choice, Journal of Portfolio Management 29, number 4, pages 11-23</a>. <a href="#fnref:31" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:31:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:31:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:31:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a> <a href="#fnref:31:4" class="reversefootnote" role="doc-backlink">↩<sup>5</sup></a></p>
</li>
<li id="fn:34" role="doc-endnote">
<p>See <a href="https://www.pm-research.com/content/iijpormgmt/31/1/118">Staub, R. (2004). The Hierarchy of Investment Choice. The Journal of Portfolio Management, 31(1), 118–123</a>. <a href="#fnref:34" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:35" role="doc-endnote">
<p>C.f. <a href="https://www.msci.com/documents/10199/178e6643-6ae6-47b9-82be-e1fc565ededb">the MSCI World index factsheet</a>. <a href="#fnref:35" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:38" role="doc-endnote">
<p>If future asset prices are available, for example thanks to <a href="/blog/bootstrap-simulation-with-portfolio-optimizer-usage-for-financial-planning/">a bootstrap simulation</a>, nothing prevents a returns dispersion analysis from integrating them. <a href="#fnref:38" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:36" role="doc-endnote">
<p>In more details, the methodology is as follows: 1) Gross USD monthly price data for all the 23 countries represented in the MSCI World index has been collected from <a href="https://www.msci.com/end-of-day-data-country">the MSCI website</a>, 2) The MSCI World <a href="/blog/index-tracking-reproducing-the-performance-of-a-financial-market-index-and-more/">tracking portfolio</a> - exhibiting a nearly null tracking error - has been computed over the period 31 December 2008 - 29 December 2023, which gives reference weights for the 23 countries represented in the MSCI World (e.g., United States ~50% and France ~5%), 3) Using <strong>Portfolio Optimizer</strong>, the evolution of 10000 random portfolios has been simulated over the period 31 December 2008 - 29 December 2023, these portfolios being a) constrained so that all country weights are positive, sum to one and all country weights except those for the United States and France are kept constant v.s. their reference weights and b) monthly rebalanced toward random portfolios in order to encompass any possible tilting, 4) The annualized return of each of these 10000 random portfolios has been computed. <a href="#fnref:36" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:37" role="doc-endnote">
<p>Which, in addition, is near the top of the achievable annualized returns! <a href="#fnref:37" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:45" role="doc-endnote">
<p>To be noted that in practice, this 95th percentile would probably need to be replaced by the 50th percentile because <em>the benchmark return [should] always rank median</em><sup id="fnref:4:14" role="doc-noteref"><a href="#fn:4" class="footnote">1</a></sup>. <a href="#fnref:45" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:42" role="doc-endnote">
<p>(Adjusted) prices have been retrieved using <a href="https://api.tiingo.com/">Tiingo</a>. <a href="#fnref:42" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:39" role="doc-endnote">
<p>See <a href="https://ssrn.com/abstract=962461">Faber, Meb, A Quantitative Approach to Tactical Asset Allocation (February 1, 2013). The Journal of Wealth Management, Spring 2007</a>. <a href="#fnref:39" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:39:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:39:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a></p>
</li>
<li id="fn:41" role="doc-endnote">
<p>In more details, using <strong>Portfolio Optimizer</strong>, the evolution of 25000 random portfolios has been simulated over the considered period, these portfolios being a) constrained so that all weights are positive and sum to a random exposure between 0% and 100% and b) monthly rebalanced toward random portfolios in order to encompass any possible tactical allocation. <a href="#fnref:41" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:48" role="doc-endnote">
<p>Which justifies <em>a posteriori</em> the use of the buy and hold portfolio as a benchmark. <a href="#fnref:48" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:40" role="doc-endnote">
<p>As a side note, Figure 9 highlights that an equal-weighted buy and hold portfolio is a tough benchmark to beat, c.f. DeMiguel et al.<sup id="fnref:43" role="doc-noteref"><a href="#fn:43" class="footnote">45</a></sup>! <a href="#fnref:40" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:44" role="doc-endnote">
<p>Long-only and fully invested. <a href="#fnref:44" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:44:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:46" role="doc-endnote">
<p>Gary Antonacci, Dual Momentum Investing: An Innovative Strategy for Higher Returns With Lower Risk <a href="#fnref:46" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:46:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:47" role="doc-endnote">
<p>In more details, using <strong>Portfolio Optimizer</strong>, the evolution of 10000 random portfolios has been simulated over the considered period, these portfolios being a) constrained so that all weights are positive and sum to 100% and b) monthly rebalanced toward random portfolios in order to encompass any possible tactical allocation. <a href="#fnref:47" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:47:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:47:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a></p>
</li>
<li id="fn:43" role="doc-endnote">
<p>See <a href="https://ssrn.com/abstract=1376199">DeMiguel, Victor and Garlappi, Lorenzo and Uppal, Raman, Optimal Versus Naive Diversification: How Inefficient is the 1/N Portfolio Strategy? (May 2009). The Review of Financial Studies, Vol. 22, Issue 5, pp. 1915-1953, 2009</a>. <a href="#fnref:43" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Roman R.As noted in Surz1, the question “Is [a mutual fund’s]2 performance good?” can only be answered relative to something1, typically by comparing that fund to a benchmark like a financial index or to a peer group. Unfortunately, these two methodologies are not without issues. For example, it is very difficult to create an index captur[ing] the essence of the people, process, and philosophy behind an investment product3 and peer groups have well-known biases1 (classification bias, survivorship bias…). In a series of papers (Surz3, Surz4, Surz1…), Ronald J. Surz proposes an innovation that combines the better aspects of both [methodologies] while eliminating their undesirable properties3. This innovation consists in evaluating a fund against the fund manager’s true opportunity set5, defined as the set of all of the possible portfolios that the manager could have conceivably held following his unique investment process4. In practice, the fund manager’s opportunity set is approximated by the simulation of thousands of random portfolios in the same universe of assets as the one of the fund manager and satisfying the same constraints (long-only, long-short…) and rules (portfolio rebalancing rules…) as those of the fund manager. Then, because these portfolios do not exhibit any particular skill6, they can be used as the control group to test [the] fund manager skill5, thus allowing to apply modern statistics to the problem of performance evaluation4. In this blog post, I will describe how to generate random portfolios, detail Surz’s original methodology as well as some of its variations and illustrate the usage of random portfolios with a couple of examples like the creation of synthetic benchmarks or the evaluation and monitoring of trading strategies. 
Mathematical preliminaries Definition Let be: $n$ the number of assets in a universe of assets $\mathcal{C} \subset \mathbb{R}^{n}$ a subset of $ \mathbb{R}^{n}$ representing the constraints imposed7 on a fund manager when investing in that universe of assets, for example: Short sale constraints Concentration constraints (assets, sectors, industries…) Leverage constraints Cardinality constraints (i.e., minimum or maximum number of assets constraints) Portfolio volatility constraints Portfolio tracking error constraints … Then, a (constrained) random portfolio in that universe of assets is a vector of portfolio weights $w \in \mathbb{R}^{n}$ generated at random over the set $\mathcal{C}$. Generation at random v.s. generation uniformly at random In the context of performance evaluation, as in Surz’s methodology, it is theoretically preferable that random portfolios are generated uniformly at random over the set $\mathcal{C}$, so as not to introduce any biases8 which would otherwise defeat the purpose of using these portfolios as an unbiased control group3. Figure 1 and Figure 2 illustrate the difference between random portfolios generated at random v.s. uniformly at random in the case of a three-asset universe subject to long-only and full investment constraints9: In Figure 1, the random portfolios are visibly concentrated in the middle of the standard 2-simplex in $\mathbb{R}^3$9 - these random portfolios are NOT generated uniformly at random In Figure 2, the random portfolios seem to be “well spread” over the standard 2-simplex in $\mathbb{R}^3$9 - these random portfolios are generated uniformly at random Figure 1. Random portfolios not generated uniformly at random over a standard simplex, three-asset universe. Source: Shaw. Figure 2. Random portfolios generated uniformly at random over a standard simplex, three-asset universe. Source: Shaw. 
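As an aside, the qualitative difference between Figure 1 and Figure 2 is easy to reproduce numerically. The sketch below (illustrative only; the universe size and sample count are arbitrary choices) generates long-only, fully invested portfolios two ways: by normalizing independent uniform random variables, which concentrates points in the middle of the simplex, and by normalizing independent unit exponential random variables, which is one of the algorithms producing a uniform distribution over the simplex:

```python
import numpy as np

rng = np.random.default_rng(42)
n_assets, n_portfolios = 3, 10_000

# Figure 1 style: normalizing independent uniforms does NOT give a uniform
# distribution over the simplex - points cluster around the center.
u = rng.uniform(size=(n_portfolios, n_assets))
w_biased = u / u.sum(axis=1, keepdims=True)

# Figure 2 style: normalizing independent unit exponentials gives points
# uniformly distributed over the standard simplex (a flat Dirichlet).
e = rng.exponential(size=(n_portfolios, n_assets))
w_uniform = e / e.sum(axis=1, keepdims=True)

def corner_share(w):
    """Fraction of portfolios with one weight above 80%."""
    return np.mean(w.max(axis=1) > 0.8)
```

Both samples are valid long-only, fully invested portfolios, but the first one contains markedly fewer portfolios near the corners of the simplex; plotting the two weight clouds reproduces the contrast between Figure 1 and Figure 2.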
One important remark, though, is that real-life portfolios are usually binding on at least one of their constraints, so that generating random portfolios biased toward the boundary of the geometrical object associated to the constraints set $\mathcal{C}$ might not be a real problem in practice10. Generation of random portfolios over the standard simplex When the constraints imposed on a portfolio are 1) a full investment constraint and 2) a long-only constraint, the subset $C$ is then equal to \[C = \{w \in \mathbb{R}^{n} \textrm{ s.t. } \sum_{i=1}^n w_i = 1, w_i \geq 0, i = 1..n\}\] and the geometrical object associated to that subset is called11 a standard simplex, already illustrated in Figure 1 and Figure 2 with $n = 3$. Several algorithms exist to generate points uniformly at random over a standard simplex12, among which: An algorithm based on differences of sorted uniform random variables An algorithm based on normalized unit exponential random variables Generation of random portfolios over the restricted standard simplex When the constraints imposed on a portfolio are 1) a full investment constraint, 2) a long-only constraint and 3) minimum/maximum asset weights constraints, the subset $C$ is then equal to \[C = \{w \in \mathbb{R}^{n} \textrm{ s.t. } \sum_{i=1}^n w_i = 1, 1 \geq u_i \geq w_i \geq l_i \geq 0, i = 1..n\}\] , where: $l_1, …, l_n$ represent minimum asset weights constraints $u_1, …, u_n$ represent maximum asset weights constraints The geometrical object associated to that subset is a “restricted” standard simplex, as illustrated in Figure 3, with: $n = 3$ $ 0.7 \geq w_1 \geq 0.1$ $ 0.8 \geq w_2 \geq 0 $ $ 0.6 \geq w_3 \geq 0.1 $ Figure 3. Example of restricted standard simplex, three-asset universe. Source: Piepel. Generating points uniformly at random over a restricted standard simplex is much more complex than generating points uniformly at random over a standard simplex. 
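For a small universe with loosely binding bounds, a naive rejection step on top of uniform simplex sampling gives a feel for the problem. This is only an illustrative sketch reusing the bounds of the Figure 3 example; as noted earlier, rejection becomes hopeless once realistic constraints are in place, which is precisely why dedicated algorithms matter:

```python
import numpy as np

rng = np.random.default_rng(0)
lower = np.array([0.1, 0.0, 0.1])    # l_i from the Figure 3 example
upper = np.array([0.7, 0.8, 0.6])    # u_i from the Figure 3 example

def sample_restricted_simplex(n_samples):
    """Uniform samples over the restricted simplex by rejection from the
    standard simplex - only workable when the bounds leave a sizeable
    fraction of the simplex feasible."""
    accepted = []
    while len(accepted) < n_samples:
        # Uniform points on the standard simplex via normalized exponentials.
        e = rng.exponential(size=(4 * n_samples, 3))
        w = e / e.sum(axis=1, keepdims=True)
        # Keep only the points satisfying the minimum/maximum weight bounds.
        ok = np.all((w >= lower) & (w <= upper), axis=1)
        accepted.extend(w[ok])
    return np.array(accepted[:n_samples])

w = sample_restricted_simplex(1000)
```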
Fortunately, this problem has been studied at least since the 1990s by people working in the statistical domain of the design of experiments, and an algorithm based on the conditional distribution method has been published13 in 2000. Generation of random portfolios over a convex polytope When the constraints imposed on a portfolio consist of generic linear constraints, the subset $C$ is then equal to \[C = \{w \in \mathbb{R}^{n} \textrm{ s.t. } A_e w = b_e, A_i w \leq b_i \}\] , where: $A_e \in \mathbb{R}^{n_e \times n}$ and $b_e \in \mathbb{R}^{n_e}$, $n_e \geq 1$, represent linear equality constraints $A_i \in \mathbb{R}^{n_i \times n}$ and $b_i \in \mathbb{R}^{n_i}$, $n_i \geq 1$, represent linear inequality constraints The geometrical object associated to that subset is called a convex polytope, illustrated in Figure 4, with: $n = 3$ $\sum_{i=1}^n w_i = 1$ $1 \geq w_i \geq 0, i = 1..n$ $w_1 > w_2$ $2w_3 > w_2 > 0.5w_3$ Figure 4. Example of standard simplex with additional linear inequality constraints, three-asset universe. Source: Tervonen et al. As one can guess, generating points uniformly at random over a convex polytope is another level higher in terms of complexity, and while some algorithms exist to do so14, they are impractical in high dimensions. From the literature, what can be achieved instead is to generate points asymptotically uniformly at random, using Markov chain Monte Carlo (MCMC) algorithms like the Hit-And-Run algorithm15. Generation of random portfolios over a generic constraint set When the constraints imposed on a portfolio are generic, that is, quadratic (volatility or tracking error constraints) and/or non-convex (threshold constraints) and/or integer (maximum number of assets constraints, round lot constraints), the final boss has arrived.
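Before turning to the generic case, the Hit-And-Run idea for the convex polytope setting can be sketched as follows. This is a minimal illustration using the Figure 4 constraints; a production sampler would add burn-in, thinning and convergence diagnostics:

```python
import numpy as np

rng = np.random.default_rng(123)

# Inequality part of the Figure 4 polytope, written as A_i w <= b_i:
# long-only, w2 <= w1, 0.5*w3 <= w2 <= 2*w3 (full investment is kept as
# the equality constraint sum(w) = 1, handled via the direction choice).
A_i = np.array([
    [-1.0,  0.0,  0.0],   # -w1 <= 0
    [ 0.0, -1.0,  0.0],   # -w2 <= 0
    [ 0.0,  0.0, -1.0],   # -w3 <= 0
    [-1.0,  1.0,  0.0],   # w2 - w1 <= 0
    [ 0.0,  1.0, -2.0],   # w2 - 2*w3 <= 0
    [ 0.0, -1.0,  0.5],   # 0.5*w3 - w2 <= 0
])
b_i = np.zeros(6)

def hit_and_run(w, n_steps):
    """Asymptotically uniform samples over {A_i w <= b_i, sum(w) = 1},
    started from a strictly feasible point w."""
    samples = []
    for _ in range(n_steps):
        d = rng.standard_normal(3)
        d -= d.mean()                  # move within the sum(w) = 1 hyperplane
        d /= np.linalg.norm(d)
        # Feasible chord {w + t d}: each A_i row bounds t on one side.
        slack, speed = b_i - A_i @ w, A_i @ d
        t_hi = np.min(slack[speed > 1e-12] / speed[speed > 1e-12])
        t_lo = np.max(slack[speed < -1e-12] / speed[speed < -1e-12])
        # Jump to a uniformly chosen point on the chord.
        w = w + rng.uniform(t_lo, t_hi) * d
        samples.append(w)
    return np.array(samples)

start = np.array([0.5, 0.2, 0.3])      # strictly feasible starting portfolio
samples = hit_and_run(start, 5000)
```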
With such constraints, the only reasonable16 approach in the literature seems to be to recast the problem of generating random points over the constraint set $C$ as a (penalized) optimization problem and use genetic algorithms to solve it10. The underlying idea is to randomly17 minimize an objective function $f$ essentially made of penalties - the higher the distance of a point $w \in \mathbb{R}^{n}$ from the constraint set $C$, the higher the value of the objective function $f(w)$ -, so that at optimum, a random point satisfying all the constraints18 is found. Of course, the points generated this way are not generated uniformly at random, but when the constraints are fully generic, that requirement should probably be dropped altogether as mentioned in Dawson and Young8. Investment fund’s performance evaluation with random portfolios Rationale Evaluating the performance of an investment fund is actually a statistical hypothesis test in disguise19, in which: The null hypothesis is The investment fund exhibits no particular “performance” over the considered time period The test statistic is a quantitative measure of “performance” over the considered time period (e.g. annualized return, holding period return, risk-adjusted return…) The (empirical) distribution of the test statistic under the null hypothesis is computed from a sample made of either One observation - the investment fund’s benchmark A small number of observations - the investment fund’s peer group Any desired number of observations - random portfolios generated from the investment fund’s universe of assets and obeying the fund’s constraints and rules From this perspective, using a benchmark, a peer group or random portfolios for performance evaluation is essentially a choice between sampling approaches1.
Still, random portfolios represent a more rigorous approach to performance evaluation than the two other alternatives, for various reasons highlighted in different papers181019 and summarized by Dawson and Young8 as follows A set of uniformly distributed, stochastically generated, portfolios that by construction incorporate no investment strategy, bias or skill form an effective control set for any portfolio measurement metric. This allows the true value of a strategy or model to be identified. They also provide a mechanism to differentiate between effects due to “market conditions” and effects due to either the management of a portfolio, or the constraints the management is obliged to work within. Surz’s methodology Using random portfolios to evaluate an investment fund’s performance over a considered time period is a two-step Monte Carlo simulation process20: Modeling step Identify the fund’s main characteristics. Computational step Random portfolios simulation Generate random portfolios compatible with the identified fund’s main characteristics and simulate their evolution through the considered time period. Performance evaluation Determine the level of statistical significance of the fund’s performance over the considered time period using the previously simulated random portfolios. Step 1 - Identifying the fund’s main characteristics Dawson and Young8 note that an investment strategy is essentially a set of constraints [and rules] that are carefully constructed to (hopefully) remove, in aggregate, poor performing regions of the solution space8. Thus, it is very important to identify as precisely as possible the main characteristics of the fund, even though this […] information is not always available, as only the manager knows [it] exactly19.
From a practical perspective, these characteristics need to be provided as21: A universe of $n$ assets modeling the fund’s universe of assets A constraint set $\mathcal{C} \subset \mathbb{R}^{n}$ modeling the fund’s constraints An ensemble of decision rules modeling22 the fund’s rebalancing rules For examples of this step in different contexts, c.f. Kothari and Warner23 and Surz24. Step 2a - Simulating random portfolios Thanks to the previous step, it becomes possible to simulate thousands of random portfolios that the fund’s manager could have potentially held over the considered time period. The only difficulty at this point is computational, due to the impact of the constraint set on the algorithmic machinery required to generate portfolio weights at random. As a side note, and for a fair comparison [with the fund under analysis], transaction costs, as well as all other kinds of costs, must be considered10 when simulating random portfolios. Step 2b - Evaluating the fund’s performance The ranking of the fund’s performance against the performance of all the simulated random portfolios provides a direct measure of the statistical significance of that fund’s performance1. Indeed, under the null hypothesis that the investment fund exhibits no particular performance over the considered time period, the $p$-value of the statistical hypothesis test mentioned at the beginning of this section is defined as25 \[p = \frac{n_x + 1}{N + 1}\] , where: $n_x$ is the number of random portfolios whose performance over the considered time period is as extreme or more extreme than that of the fund under evaluation $N$ is the number of simulated random portfolios This idea is illustrated in Figure 5, which depicts: The distribution of the holding period return of 1000 random portfolios made of stocks belonging to the S&P 500 and subject to misc. turnover and cardinality constraints (solid line) The 95th percentile of that distribution (dashed line) Figure 5.
Distribution of the holding period return of random portfolios simulated from stocks belonging to the S&P 500, 2005 - 2010. Source: Stein. From this figure, if we were to evaluate the performances of a fund trading these securities and operating under constraints similar to those simulated, [it] would have to obtain a return equal to at least 80% during the 6-year period for the null of no particular performance to be rejected5 at a 95% confidence level. Variations on Surz’s methodology Burns1025 and Lisi19 both extend Surz’s methodology from one time period to several time periods - which allows to consider the persistence of results over time19 - by describing different ways of combining individual $p$-values26. As a by-product, Burns1025 empirically demonstrates that Stouffer’s method27 for combining $p$-values should be preferred to Fisher’s when evaluating a fund’s performance over multiple time periods. Stein5 proposes to replace Surz’s methodology by testing whether the distribution of returns of the [fund under evaluation] is stochastically greater than that of [a] chosen percentile random fund using the Mann–Whitney U test. Caveats Surz1 warns that there are many ways to implement a [Monte Carlo simulation] approach [with random portfolios], some better than others, and some worse1. In particular, Surz1 argues that, when selecting a number of assets at random to satisfy portfolio cardinality constraints, it is required to use value-weighted sampling, so that the probability of choosing a given [asset] is proportionate to its outstanding capitalization1. Otherwise, some macroeconomic [in]consistency1 would be introduced by the Monte Carlo simulation process, which would ultimately bias the end results. This point is confirmed for both U.S. and global stocks in Arnott et al.26, who show that random portfolios introduce, often unintentionally, value and small cap tilts26.
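The $p$-value formula above, and the combination of per-period $p$-values à la Stouffer, are straightforward to compute. In the sketch below, the fund and random-portfolio performance figures are made up purely for illustration:

```python
import math
from statistics import NormalDist

import numpy as np

rng = np.random.default_rng(1)

def random_portfolio_p_value(fund_perf, random_perfs):
    """p = (n_x + 1) / (N + 1), where n_x is the number of random
    portfolios performing at least as well as the fund."""
    n_x = int(np.sum(np.asarray(random_perfs) >= fund_perf))
    return (n_x + 1) / (len(random_perfs) + 1)

def stouffer(p_values):
    """Stouffer's method: combine per-period p-values through the sum of
    their standard normal z-scores."""
    nd = NormalDist()
    z_sum = sum(nd.inv_cdf(1.0 - p) for p in p_values)
    return 1.0 - nd.cdf(z_sum / math.sqrt(len(p_values)))

# Made-up example: a fund's performance in each of 4 periods, compared to
# 999 simulated random portfolios per period.
fund_perfs = (0.22, 0.18, 0.12, 0.25)
p_periods = [random_portfolio_p_value(f, rng.normal(0.05, 0.10, 999))
             for f in fund_perfs]
p_combined = stouffer(p_periods)
```

A fund beating 2 of 3 random portfolios gets `random_portfolio_p_value(2.0, [1.0, 3.0, 0.5]) == 0.5`, never exactly zero, which is the point of the `+ 1` terms.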
Nevertheless, other authors19 argue on the contrary that using equally-weighted sampling is more representative of the behavior of an “unskilled” manager, which is exactly what random portfolios are supposed to model! More generally, random portfolios might have “unfair” characteristics v.s. the fund under evaluation (higher volatility, higher beta, lower turnover…), which must either be acknowledged or controlled for depending on the exact circumstances. Implementations Implementation in Portfolio Optimizer Through the endpoint /portfolios/simulation/random, Portfolio Optimizer allows to: Generate the weights of a random portfolio subject to general linear inequality constraints Simulate the evolution of a random portfolio subject to general linear inequality constraints over a specified time period, with 3 different portfolio rebalancing strategies: Buy and hold Rebalancing toward the weights of the initial random portfolio Rebalancing toward the weights of a new random portfolio Implementations elsewhere I am aware of two commercial implementations of random portfolios for performance evaluation as described in this blog post: Portfolio Probe, from Patrick Burns28 of Burns Statistics PIPODs, from Ronald Surz28 of PPCA Inc., which seems to be discontinued Examples of usage Hedge funds performance evaluation The performances of hedge funds are notoriously difficult to evaluate due to the lack of both proper benchmarks and homogeneous peer groups24. Random portfolios offer a solution to this problem because they can reflect the unique specifications of each individual hedge fund24. More details can be found in Surz24 or in Molyboga and Ahelec29. Investment strategy returns dispersion analysis The study by Kritzman and Page30 uses random portfolios to compare the relative importance of different investment strategies (investment at global asset class level, at individual stock level, etc.)
and concludes that30 Contrary to perceived doctrine, dispersion around average performance arising from security selection is substantially greater than dispersion around average performance arising from all other investment choices. Moreover, asset allocation, widely considered the most important investment choice, produces the least dispersion; thus, from a normative perspective it is the least important investment choice. Figure 6, which quantifies that dispersion around average performance, shows the extent to which a talented investor (top 25th or 5th percentile) could have improved upon average performance by engaging in various investment choices across a global universe. It also shows how far below average an unlucky investor (bottom 75th or 95th percentile) could have performed, depending on the choice of investment discretion30. Figure 6. 5th, 25th, 75th, and 95th percentile of the annualized difference from average performance of random portfolios simulated from misc. global asset classes, 1987-2001. Source: Kritzman and Page. While this conclusion has been criticized31, the underlying methodology - returns dispersion analysis - provides valuable insight to investors in that it allows to understand the potential impact of [any] investment choice, irrespective of investment behavior30. For example, let’s suppose a French investor would like to passively invest in an MSCI World ETF, but is worried by both32: The massive ~70% weight of the United States in that index The ridiculous ~3% weight of his home country in that index Thus, this investor would rather like to invest in an MSCI World ETF “tilted” away from the United States and toward France, although he has no precise idea about how to implement such a tilt. Here, returns dispersion analysis through random portfolios can help our investor understand the impact on performances of his proposed tactical deviation from the baseline investment strategy, at least historically33. 
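The mechanics of such a returns dispersion analysis are easy to sketch. The code below uses made-up monthly returns in place of the actual MSCI country data, collapses the 21 unchanged countries into a single bucket, and randomly splits the combined US + France reference weight between the two:

```python
import numpy as np

rng = np.random.default_rng(2024)
n_months, n_portfolios = 180, 10_000

# Made-up monthly returns standing in for the US, France and a single
# bucket of the remaining MSCI World countries.
r = rng.normal([0.009, 0.006, 0.007], 0.04, size=(n_months, 3))

# Reference weights (US, France, rest); only the US/France split varies.
w_ref = np.array([0.70, 0.03, 0.27])
combined = w_ref[0] + w_ref[1]
split = rng.uniform(size=n_portfolios)
w = np.column_stack([split * combined, (1 - split) * combined,
                     np.full(n_portfolios, w_ref[2])])

# Fixed-weight, monthly rebalanced portfolios: annualized return of each,
# then the quartiles of the resulting dispersion.
port_r = r @ w.T                                   # (n_months, n_portfolios)
ann_ret = (1 + port_r).prod(axis=0) ** (12 / n_months) - 1
q1, med, q3 = np.percentile(ann_ret, [25, 50, 75])
```

The quartiles `q1`, `med`, `q3` are the kind of numbers summarized in the Figure 7 box plot.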
For this, Figure 7 depicts the range of annualized returns achieved over the period 31 December 2008 - 29 December 2023 by 10000 random portfolios invested in the 23 countries of the MSCI World index when34: The weights of the United States and France are made to vary randomly between 0% and 100% The weights of all the other countries are kept constant v.s. their reference weights Figure 7. Quartiles of the annualized returns of random portfolios from countries included in the MSCI World tilted away from the U.S. and toward France, 31 December 2008 - 29 December 2023. It is clear from Figure 7 that deviating from the MSCI World index by under-weighting the United States and over-weighting France has been a relatively bad strategy over the considered period, with a median annualized return of ~8.30% compared to an annualized return of ~11.30% for the MSCI World35! Whether history will repeat itself remains to be seen, but thanks to this historical returns dispersion analysis, our investor is at least equipped to make an educated decision regarding his proposed investment strategy. Synthetic benchmark construction Stein5 proposes to use random portfolios in order to construct a “synthetic” benchmark, representative of all possible investment strategies within a given universe of assets and subject to a given set of constraints and rules. In more details, Stein5 proposes to construct such a benchmark directly from the time series of returns of a [well chosen] single random portfolio5, which is typically the median1 random portfolio w.r.t. a chosen performance measure. A similar idea is discussed in Lisi19, in which such a benchmark is this time constructed from the cross-sectional returns of the random portfolios. Figure 8 illustrates Lisi’s methodology; on it, the green line represents a synthetic benchmark made of the 95th percentile36 of the distribution of the random portfolios’ holding period returns at each time $t$ Figure 8.
Time series of the 95th percentile of cross-sectional holding period returns of random portfolios. Source: Lisi. The possibility to create synthetic benchmarks is particularly useful when evaluating the performances of tactical asset allocation (TAA) strategies, because their allocation might completely change from one period to another, making their comparison with a static benchmark - like a buy and hold portfolio or a 60/40 stock-bond portfolio - difficult to justify a priori. For example, Figure 9 compares over the period 30th November 2006 - 31st January 202437: The Global Tactical Asset Allocation (GTTA)38 strategy of Mebane Faber, which invests monthly - depending on the quantitative rules described in Faber38 - within a five-asset universe made of: U.S. Equities, represented by the SPY ETF International Equities, represented by the EFA ETF Intermediate-Term U.S. Treasury Bonds, represented by the IEF ETF U.S. Real Estate, represented by the VNQ ETF Global Commodities, represented by the DBC ETF An (equal-weighted) buy and hold portfolio within Faber’s five-asset universe, which is the benchmark of the GTTA strategy proposed in Faber38 A synthetic benchmark à la Lisi19 constructed from the median cross-sectional holding period returns of 25000 random portfolios simulated39 within Faber’s five-asset universe Figure 9. Global Tactical Asset Allocation (GTTA) strategy v.s. misc. benchmarks, 30th November 2006 - 31st January 2024. In Figure 9, the buy and hold portfolio and the synthetic benchmark both appear to behave quite similarly40, but the synthetic benchmark should theoretically be preferred for performance comparison purposes because it better reflects the dynamics of the GTTA strategy (varying asset weights, varying exposure) while guaranteeing the complete absence of skill41. Trading strategy evaluation Random portfolios also find applications in evaluating (quantitative) trading strategies.
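A Lisi-style synthetic benchmark is simple to construct once the random portfolio paths are simulated. The sketch below uses made-up returns in place of actual ETF data for a five-asset universe, and fixed-weight monthly-rebalanced random portfolios as a simplification of rebalancing toward a new random portfolio each month:

```python
import numpy as np

rng = np.random.default_rng(99)
n_months, n_portfolios, n_assets = 120, 1000, 5

# Made-up monthly asset returns standing in for a five-asset universe.
r = rng.normal(0.006, 0.035, size=(n_months, n_assets))

# Fixed-weight random portfolios, uniform over the standard simplex.
w = rng.dirichlet(np.ones(n_assets), size=n_portfolios)

# Wealth path of each random portfolio under monthly rebalancing.
wealth = np.cumprod(1 + r @ w.T, axis=0)           # (n_months, n_portfolios)

# Synthetic benchmark: a chosen percentile of the cross-sectional holding
# period returns at each date (median à la Lisi, 95th as in Figure 8).
median_benchmark = np.percentile(wealth - 1, 50, axis=1)
p95_benchmark = np.percentile(wealth - 1, 95, axis=1)
```

By construction the benchmark series embeds no skill whatsoever, which is exactly what makes it a defensible comparison point for a TAA strategy.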
<p>One of these applications, described in Dawson<sup>8</sup>, consists in evaluating the significance of an alpha signal through the comparison of two different samples of random portfolios:</p>
<blockquote>
<p>It is […] very difficult to show conclusively the effect that a model, strategy […] has on the real investment process. […] The ideal solution would be to generate a set of portfolios, constrained in the same way as the portfolio(s) built with the theory or model under investigation, only without the information produced by the theory. It would then be possible to compare the distributions of the portfolio characteristics with and without the added information from the new theory, giving strong statistical evidence of the effects of the new information.</p>
</blockquote>
<p>As a practical illustration, let’s come back to Faber’s GTAA strategy introduced in the previous subsection and compare:</p>
<ul>
<li>A sample of 10000 random portfolios simulated within Faber’s five-asset universe, rebalanced each month toward a random portfolio invested<sup>42</sup> in all the assets selected for investment by the GTAA strategy rules</li>
<li>A sample of 10000 random portfolios simulated within Faber’s five-asset universe, rebalanced each month toward a random portfolio invested<sup>42</sup> in all the assets NOT selected for investment by the GTAA strategy rules</li>
</ul>
<p>The resulting ranges of annualized returns over the period 30th November 2006 - 31st January 2024 are displayed in Figure 10, which highlights a clear under-performance of the second sample of random portfolios.</p>
<figure>
<figcaption>Figure 10. Quartiles of the annualized returns of the Global Tactical Asset Allocation (GTAA) strategy original asset selection rules v.s. inverted asset selection rules, using random portfolios for weighting the selected assets, 30th November 2006 - 31st January 2024.</figcaption>
</figure>
<p>This kind of under-performance is exactly what is expected under the hypothesis that the GTAA strategy asset selection rules correspond to a true alpha signal.</p>
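<p>A minimal sketch of this two-sample comparison, on toy returns rather than the actual ETF data, and with fixed random weights instead of the monthly re-randomization used above:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
T = 206                       # ~number of monthly returns, Nov 2006 - Jan 2024
# Toy monthly returns for a five-asset universe; in the post, real ETF
# returns (SPY, EFA, IEF, VNQ, DBC) are used instead.
returns = rng.normal(0.005, 0.04, size=(T, 5))

def annualized_return(r):
    return np.prod(1.0 + r) ** (12.0 / len(r)) - 1.0

def quartiles(selected, n_portfolios=10_000):
    """Quartiles of the annualized returns of random long-only portfolios
    restricted to the selected assets."""
    anns = np.empty(n_portfolios)
    for i in range(n_portfolios):
        w = rng.dirichlet(np.ones(len(selected)))   # uniform on the simplex
        anns[i] = annualized_return(returns[:, selected] @ w)
    return np.percentile(anns, [25, 50, 75])

q_selected = quartiles([0, 1, 2])   # assets picked by the (toy) signal
q_excluded = quartiles([3, 4])      # inverted selection
```

<p>Comparing <code class="language-plaintext highlighter-rouge">q_selected</code> and <code class="language-plaintext highlighter-rouge">q_excluded</code> on real data is the essence of the Figure 10 analysis.</p>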
<p>Indeed, as Arnott et al.<sup>26</sup> put it:</p>
<blockquote>
<p>In inverting the strategies, we tacitly examine whether these strategies outperform because they are predicated on meaningful investment theses and deep insights on capital markets, or for reasons unrelated to the investment theses. If the investment beliefs are the source of outperformance, then contradicting those beliefs should lead to underperformance.</p>
</blockquote>
<h2 id="trading-strategy-monitoring">Trading strategy monitoring</h2>
<p>Going beyond trading strategy evaluation, random portfolios can also be used to monitor how the performances of a trading strategy differ between in-sample and out-of-sample periods.</p>
<p>I propose to illustrate the associated process with another tactical asset allocation strategy, the Global Equities Momentum (GEM)<sup>43</sup> strategy of Gary Antonacci, which invests monthly - depending on the quantitative rules described in Antonacci<sup>43</sup> - in one asset among a three-asset universe made of:</p>
<ul>
<li>U.S. Equities, represented by the S&P 500 Index</li>
<li>International Equities, represented by the MSCI ACWI ex-USA Index</li>
<li>U.S. Bonds, represented by the Barclays Capital US Aggregate Bond Index</li>
</ul>
<p>Because this strategy was published in 2013, let’s first check GEM performances during the in-sample period 1989-2012 (Google Sheet).</p>
<p>Figure 11 depicts the equity curves of a couple of random portfolios simulated<sup>44</sup> within Antonacci’s three-asset universe (in solid) v.s. the GEM equity curve (in dashed) over that period.</p>
<figure>
<figcaption>Figure 11. Global Equities Momentum (GEM) strategy v.s. random portfolios, in-sample period 1989-2012.</figcaption>
</figure>
<p>For the real thing, Figure 12 depicts the Sharpe Ratio distribution of 10000 random portfolios simulated<sup>44</sup> within Antonacci’s three-asset universe over that period, with the red line corresponding to the GEM Sharpe Ratio.</p>
<figure>
<figcaption>Figure 12. Global Equities Momentum (GEM) strategy v.s. random portfolios, distribution of Sharpe Ratios, in-sample period 1989-2012.</figcaption>
</figure>
<p>Pretty amazing, as the GEM Sharpe Ratio is among the best obtainable Sharpe Ratios over the in-sample period!</p>
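<p>The percentile rank of a strategy's Sharpe Ratio within its random-portfolio peer group can be computed along these lines (toy returns and a stand-in strategy below, not the actual GEM data):</p>

```python
import numpy as np

rng = np.random.default_rng(1)
T, n = 288, 3                 # 24 years of monthly returns, 3-asset universe
asset_returns = rng.normal(0.005, 0.03, size=(T, n))           # toy data
strategy_returns = asset_returns @ np.array([0.5, 0.3, 0.2])   # stand-in strategy

def sharpe_ratio(r, periods_per_year=12):
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)

# Peer group of 10000 random long-only, fully invested portfolios
peer_sharpes = np.array([
    sharpe_ratio(asset_returns @ rng.dirichlet(np.ones(n)))
    for _ in range(10_000)
])
percentile_rank = (peer_sharpes < sharpe_ratio(strategy_returns)).mean() * 100
```

<p>A percentile rank near 50 indicates a strategy roughly comparable to the median random portfolio, i.e., no detectable skill.</p>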
<p>Let’s now check GEM performances during the out-of-sample period 2013-October 2020 (Google Sheet).</p>
<p>Figure 13 depicts the Sharpe Ratio distribution of 10000 random portfolios simulated<sup>44</sup> within Antonacci’s three-asset universe over that new period, with the red line again corresponding to the GEM Sharpe Ratio.</p>
<figure>
<figcaption>Figure 13. Global Equities Momentum (GEM) strategy v.s. random portfolios, distribution of Sharpe Ratios, out-of-sample period 2013-October 2020.</figcaption>
</figure>
<p>Pretty blah this time, with the GEM Sharpe Ratio roughly comparable to the median random portfolio Sharpe Ratio, which by definition exhibits no particular skill…</p>
<p>Such a difference in the GEM Sharpe Ratio relative to its simulated peer group<sup>1</sup> between the in-sample period and the out-of-sample period is puzzling, but analyzing a particular tactical asset allocation strategy is out of scope of this blog post, so I need to leave the “why” question unanswered.</p>
<p>The same process can be applied to any trading strategy in order to detect a potential shift in that strategy’s performances, so don’t hesitate to abuse the computing power available with today’s computers!</p>
<h2 id="conclusion">Conclusion</h2>
<p>Dawson<sup>8</sup> notes that <em>[Monte Carlo analysis] is not a tool that has been readily applied to the investment process in the past, due to the perceived complexity of the problem</em><sup>8</sup>.</p>
<p>Through this blog post, I hope to have demonstrated that random portfolios are not necessarily complex to use, and come with many benefits for performance evaluation.</p>
<p>Feel free to randomly reach out on LinkedIn or on Twitter.</p>
<div class="footnotes">
<ol>
<li>See Surz, Ronald, A Fresh Look at Investment Performance Evaluation: Unifying Best Practices to Improve Timeliness and Reliability, Journal of Portfolio Management, Vol. 32, No. 4, Summer 2006, pp. 54-65.</li>
<li>Or a trading strategy, or whatever.</li>
<li>See Surz, R. J. (1994). Portfolio opportunity distributions: an innovation in performance evaluation. The Journal of Investing, 3(2), 36-41.</li>
<li>See Surz, Ron, Accurate Benchmarking is Gone But Not Forgotten: The Imperative Need to Get Back to Basics, Journal of Performance Measurement, Vol. 11, No. 3, Spring, pp. 34-43.</li>
<li>See Roberto Stein, Not fooled by randomness: Using random portfolios to analyse investment funds, Investment Analysts Journal, 43:79, 1-15.</li>
<li>By construction.</li>
<li>These can be imposed by the firm that offers the funds, for example in terms of the prospectus and investment goals, or self-imposed trading behavior that the manager maintains over his career<sup>5</sup>; these can also be imposed by regulatory bodies or stock exchanges.</li>
<li>See Dawson, R. and R. Young (2003), Near-uniformly Distributed, Stochastically Generated Portfolios. In: S. Satchell and A. Scowcroft (eds.): Advances in Portfolio Construction and Implementation. Butterworth-Heinemann.</li>
<li>The $3$-dimensional geometrical object associated to the subset of $\mathbb{R}^3$ modeling long-only and full investment constraints is the standard 2-simplex in $\mathbb{R}^3$.</li>
<li>See Burns, P. (2007). Random Portfolios for Performance Measurement. In: Kontoghiorghes, E.J., Gatu, C. (eds), Optimisation, Econometric and Financial Analysis. Advances in Computational Management Science, vol 9. Springer, Berlin, Heidelberg.</li>
<li>Such an object is also called a unit simplex or the probabilistic simplex.</li>
<li>See Onn, S., Weissman, I. Generating uniform random vectors over a simplex with implications to the volume of a certain polytope and to multivariate extremes. Ann Oper Res 189, 331-342 (2011).</li>
<li>See Kai-Tai Fang, Zhen-Hai Yang, On uniform design of experiments with restricted mixtures and generation of uniform distribution on some domains, Statistics & Probability Letters, Volume 46, Issue 2, 2000, Pages 113-120.</li>
<li>See Paul A. Rubin (1984), Generating random points in a polytope, Communications in Statistics - Simulation and Computation, 13:3, 375-396.</li>
<li>See Gert van Valkenhoef, Tommi Tervonen, Douwe Postmus, Notes on “Hit-And-Run enables efficient weight generation for simulation-based multiple criteria decision analysis”, European Journal of Operational Research, Volume 239, Issue 3, 2014, Pages 865-867.</li>
<li>The rejection method is not a reasonable approach because the probability of a portfolio being accepted is generally extremely small when realistic constraints are in place<sup>10</sup>.</li>
<li>Due to the nature of genetic algorithms.</li>
<li>Within a given numerical tolerance.</li>
<li>See Francesco Lisi (2011), Dicing with the market: randomized procedures for evaluation of mutual funds, Quantitative Finance, 11:2, 163-172.</li>
<li>Some authors like Kritzman and Page<sup>30</sup> consider the usage of random portfolios to be a bootstrap simulation process, and not a Monte Carlo one.</li>
<li>To be noted that the universe of assets, constraints and rules might perfectly well be time-dependent; for example, at a given point in time, the universe of assets might be completely different from that at an earlier or later point in time.</li>
<li>The frontier between the universe of assets and the rebalancing rules might not always be perfectly clear; at heart, the rebalancing rules must model the trading behaviour of the fund.</li>
<li>See Kothari, S.P. and Warner, Jerold B., Evaluating Mutual Fund Performance (August 1997).</li>
<li>See Ronald J. Surz, Testing the Hypothesis “Hedge Fund Performance Is Good”, The Journal of Wealth Management, Spring 2005, 7(4), 78-83.</li>
<li>See Burns, Patrick J., Performance Measurement Via Random Portfolios (December 2, 2004).</li>
<li>See Robert D. Arnott, Jason Hsu, Vitali Kalesnik, Phil Tindall, The Surprising Alpha From Malkiel’s Monkey and Upside-Down Strategies, The Journal of Portfolio Management, Summer 2013, 39(4), 91-105.</li>
<li>See N. A. Heard, P. Rubin-Delanchy, Choosing between methods of combining p-values, Biometrika, Volume 105, Issue 1, March 2018, Pages 239-246.</li>
<li>I have no affiliation.</li>
<li>See Molyboga, M., Ahelec, C. A simulation-based methodology for evaluating hedge fund investments. J Asset Manag 17, 434-452 (2016).</li>
<li>See Kritzman, Mark and Sébastien Page (2003), The Hierarchy of Investment Choice, Journal of Portfolio Management, 29(4), 11-23.</li>
<li>See Staub, R. (2004). The Hierarchy of Investment Choice. The Journal of Portfolio Management, 31(1), 118-123.</li>
<li>C.f. the MSCI World index factsheet.</li>
<li>If future asset prices are available, for example thanks to a bootstrap simulation, nothing prevents a returns dispersion analysis from integrating them.</li>
<li>In more detail, the methodology is as follows: 1) Gross USD monthly price data for all the 23 countries represented in the MSCI World index has been collected from the MSCI website, 2) The MSCI World tracking portfolio - exhibiting a nearly null tracking error - has been computed over the period 31 December 2008 - 29 December 2023, which gives reference weights for the 23 countries represented in the MSCI World (e.g., United States ~50% and France ~5%), 3) Using Portfolio Optimizer, the evolution of 10000 random portfolios has been simulated over the period 31 December 2008 - 29 December 2023, these portfolios being a) constrained so that all country weights are positive, sum to one and all country weights except those for the United States and France are kept constant v.s. their reference weights and b) monthly rebalanced toward random portfolios in order to encompass any possible tilting, 4) The annualized return of each of these 10000 random portfolios has been computed.</li>
<li>Which, in addition, is near the top of the achievable annualized returns!</li>
<li>To be noted that in practice, this 95th quantile would probably need to be replaced by the 50th quantile because the benchmark return [should] always rank median<sup>1</sup>.</li>
<li>(Adjusted) prices have been retrieved using Tiingo.</li>
<li>See Faber, Meb, A Quantitative Approach to Tactical Asset Allocation (February 1, 2013). The Journal of Wealth Management, Spring 2007.</li>
<li>In more detail, using Portfolio Optimizer, the evolution of 25000 random portfolios has been simulated over the considered period, these portfolios being a) constrained so that all weights are positive and sum to a random exposure between 0% and 100% and b) monthly rebalanced toward random portfolios in order to encompass any possible tactical allocation.</li>
<li>Which justifies a posteriori the use of the buy and hold portfolio as a benchmark.</li>
<li>As a side note, Figure 9 highlights that an equal-weighted buy and hold portfolio is a tough benchmark to beat, c.f. DeMiguel et al.<sup>45</sup>!</li>
<li>Long-only and fully invested.</li>
<li>See Gary Antonacci, Dual Momentum Investing: An Innovative Strategy for Higher Returns With Lower Risk.</li>
<li>In more detail, using Portfolio Optimizer, the evolution of 10000 random portfolios has been simulated over the considered period, these portfolios being a) constrained so that all weights are positive and sum to 100% and b) monthly rebalanced toward random portfolios in order to encompass any possible tactical allocation.</li>
<li>See DeMiguel, Victor and Garlappi, Lorenzo and Uppal, Raman, Optimal Versus Naive Diversification: How Inefficient is the 1/N Portfolio Strategy? (May 2009). The Review of Financial Studies, Vol. 22, Issue 5, pp. 1915-1953, 2009.</li>
</ol>
</div>
<h1>Sparse Index Tracking: Limiting the Number of Assets in an Index Tracking Portfolio</h1>
<p>2024-01-08T00:00:00-06:00, https://portfoliooptimizer.io/blog/sparse-index-tracking-limiting-the-number-of-assets-in-an-index-tracking-portfolio</p>
<p>In the <a href="/blog/index-tracking-reproducing-the-performance-of-a-financial-market-index-and-more/">previous post</a>, I introduced the <em>index tracking problem</em><sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote">1</a></sup>, which consists
in finding a portfolio that tracks as closely as possible<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote">2</a></sup> a given financial market index.</p>
<p>Because such a portfolio might contain any number of assets, with for example an <a href="https://en.wikipedia.org/wiki/S%26P_500">S&P 500</a> tracking portfolio possibly containing ~500 stocks, <em>it is [sometimes desirable]
that the tracking portfolio consists of a small number of assets</em><sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote">3</a></sup> in order to <em>simplify the execution, avoid small and illiquid positions, and large transaction costs</em><sup id="fnref:3:1" role="doc-noteref"><a href="#fn:3" class="footnote">3</a></sup>.</p>
<p>In other words, it is sometimes desirable to impose a constraint on the maximum number of assets contained in an index tracking portfolio, which leads to an extension of the regular index tracking problem called the <em>sparse index tracking problem</em><sup id="fnref:24" role="doc-noteref"><a href="#fn:24" class="footnote">4</a></sup>.</p>
<p>In this new post, I will describe the mathematics of the sparse index tracking problem and I will detail a few examples of usage like:</p>
<ul>
<li>Replicating a fund of funds to reduce fees while keeping the number of directly-invested funds manageable</li>
<li>Replicating the S&P 500 with only ~50 stocks as an (extreme) illustration of the <em>optimized sampling</em><sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote">5</a></sup> ETF methodology</li>
</ul>
<h2 id="mathematical-preliminaries">Mathematical preliminaries</h2>
<h3 id="the-general-sparse-index-tracking-optimization-problem">The general sparse index tracking optimization problem</h3>
<p>Let be:</p>
<ul>
<li>$T$, a number of time periods</li>
<li>$r_{idx} = \left( r_{idx, 1}, …, r_{idx, T} \right) \in \mathcal{R}^{T}$, the vector of the index arithmetic returns over each of the $T$ time periods</li>
<li>$n$, the number of assets in the universe of the sparse index tracking portfolio</li>
<li>$X \in \mathcal{R}^{T \times n}$ the matrix of the $n$ assets arithmetic returns over each of the $T$ time periods</li>
<li>$w = \left( w_1,…,w_n \right) \in \mathcal{R}^{n} $ the vector of the weights of a portfolio in each of the $n$ assets</li>
</ul>
<p>The vector of the sparse index tracking portfolio weights $w^*$ is then the<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote">6</a></sup> solution to the optimization problem which consists in minimizing a tracking error measure
$f \left( X w - r_{idx} \right)$ - with $f$ some <a href="https://en.wikipedia.org/wiki/Loss_function">loss function</a> - subject to additional constraints like full investment constraint, no short sale constraint, etc. and
a cardinality constraint<sup id="fnref:3:2" role="doc-noteref"><a href="#fn:3" class="footnote">3</a></sup></p>
\[w^* = \operatorname{argmin} f \left( X w - r_{idx} \right) \newline \textrm{s.t. } \begin{cases} \sum_{i=1}^{n} w_i = 1 \newline 0 \leqslant w_i \leqslant 1, i = 1..n \newline ... \newline \lVert w \rVert_0 \le n_{max} \end{cases}\]
<p>, where:</p>
<ul>
<li>$\lVert . \rVert_0$ is the zero “norm”<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote">7</a></sup> of a vector, that is, the number of non-zero elements in that vector</li>
<li>$n_{max}$ is the maximum number of assets with non-zero weights desired in the sparse index tracking portfolio</li>
</ul>
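<p>For concreteness, the zero “norm” is trivial to compute, with a numerical tolerance to absorb the floating-point noise typically present in optimizer outputs:</p>

```python
import numpy as np

def l0_norm(w, tol=1e-10):
    """Zero "norm" of a weights vector: the number of (numerically)
    non-zero elements."""
    return int(np.sum(np.abs(w) > tol))

w = np.array([0.4, 0.0, 0.35, 0.0, 0.25])
# This portfolio satisfies the cardinality constraint for any n_max >= 3
assert l0_norm(w) == 3
```
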
<p>From a computational perspective, and whatever the exact definition of the loss function $f$, it is known that <em>when a cardinality constraint restricting the number of stocks is introduced, the problem of optimizing the composition
of a portfolio tends to become <a href="https://en.wikipedia.org/wiki/NP-hardness">NP-hard</a></em><sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote">8</a></sup>, which <em>means that exact solutions to instances of realistic sizes are computationally intractable, and thus inexact solution methods are the
only practical ones</em><sup id="fnref:8:1" role="doc-noteref"><a href="#fn:8" class="footnote">8</a></sup>.</p>
<p>Standard quadratic optimization methods guaranteeing an optimal solution are thus unfortunately not usable to solve the sparse index tracking optimization problem<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote">9</a></sup>…</p>
<h3 id="heuristics-for-solving-the-general-sparse-index-tracking-optimization-problem">Heuristics for solving the general sparse index tracking optimization problem</h3>
<p>While an exhaustive enumeration of all possible methods for solving the sparse index tracking optimization problem is out of scope of this post, most of them<sup id="fnref:34" role="doc-noteref"><a href="#fn:34" class="footnote">10</a></sup> seem to fall into one of three rough categories.</p>
<h4 id="combinatorial-optimization-methods">Combinatorial optimization methods</h4>
<p>The cardinality constraint present in the sparse index tracking problem makes it a combinatorial problem, so that standard combinatorial optimization methods (continuous relaxation of a mixed integer programming formulation<sup id="fnref:8:2" role="doc-noteref"><a href="#fn:8" class="footnote">8</a></sup><sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote">11</a></sup>,
genetic algorithms<sup id="fnref:12" role="doc-noteref"><a href="#fn:12" class="footnote">12</a></sup><sup id="fnref:13" role="doc-noteref"><a href="#fn:13" class="footnote">13</a></sup>…) can be used to find an approximate solution.</p>
<p>As a side note, thanks to the relationship of the sparse index tracking problem with combinatorial optimization and hence with <a href="https://en.wikipedia.org/wiki/Operations_research">operations research</a>,
it is possible to find datasets for different index tracking problems - ranging from a universe of 31 assets to a universe of 2151 assets - in the <a href="http://people.brunel.ac.uk/~mastjjb/jeb/orlib/indtrackinfo.html">OR-Library</a>.</p>
<h4 id="signal-processing-methods">Signal processing methods</h4>
<p>The sparse index tracking problem in finance is closely related to a similar problem in signal processing called <a href="https://en.wikipedia.org/wiki/Compressed_sensing">compressed sensing</a>,
for which specific algorithms have been developed over the years, like the iterative hard thresholding algorithm<sup id="fnref:16" role="doc-noteref"><a href="#fn:16" class="footnote">14</a></sup> or the compressive sampling matching pursuit (CoSaMP) algorithm<sup id="fnref:17" role="doc-noteref"><a href="#fn:17" class="footnote">15</a></sup>.</p>
<p>Note, though, that signal processing methods relying on the <em>least absolute shrinkage and selection operator (<a href="https://en.wikipedia.org/wiki/Lasso_(statistics)">LASSO</a>)</em>, which consists in approximating
the $l_0$-norm constraint by an $l_1$-norm constraint, are not applicable to the sparse index tracking problem. Indeed, when full investment and no short sales constraints are imposed<sup id="fnref:20" role="doc-noteref"><a href="#fn:20" class="footnote">16</a></sup>,
we have, for any portfolio weights vector $w$</p>
\[\lVert w \rVert_1 = \sum_{i=1}^n \left| w_i \right| = \sum_{i=1}^n w_i = 1\]
<p>, so that <em>the</em> $l_1$<em>-norm reduces to a constant and is therefore irrelevant</em><sup id="fnref:3:3" role="doc-noteref"><a href="#fn:3" class="footnote">3</a></sup>.</p>
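<p>This degeneracy is easy to check numerically: every long-only, fully invested portfolio has an $l_1$-norm of exactly one, regardless of how many assets it actually holds, so an $l_1$ penalty cannot distinguish sparse portfolios from dense ones:</p>

```python
import numpy as np

rng = np.random.default_rng(7)
for _ in range(1000):
    w = rng.dirichlet(np.ones(10))    # random long-only, fully invested portfolio
    # ||w||_1 = sum(|w_i|) = sum(w_i) = 1 for every such portfolio
    assert np.isclose(np.abs(w).sum(), 1.0)
```
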
<h4 id="non-linear-optimization-methods">Non-linear optimization methods</h4>
<p>At the crossroads with signal processing methods, generic non-linear optimization methods taking into account cardinality constraints have also been developed over the years:</p>
<ul>
<li>
<p>Methods that directly integrate cardinality constraints, like projected gradient methods<sup id="fnref:14" role="doc-noteref"><a href="#fn:14" class="footnote">17</a></sup><sup id="fnref:15" role="doc-noteref"><a href="#fn:15" class="footnote">18</a></sup> or coordinate descent methods<sup id="fnref:15:1" role="doc-noteref"><a href="#fn:15" class="footnote">18</a></sup></p>
</li>
<li>
<p>Methods that indirectly integrate cardinality constraints by either:</p>
<ul>
<li>Relaxing the $l_0$-norm constraint into an “easier” constraint, like the entropic lower bound algorithm of Jacquet et al.<sup id="fnref:19" role="doc-noteref"><a href="#fn:19" class="footnote">19</a></sup></li>
<li>Reformulating the $l_0$-norm constraint as a $l_0$-norm penalty integrated in the tracking error measure<sup id="fnref:18" role="doc-noteref"><a href="#fn:18" class="footnote">20</a></sup> and replacing that $l_0$-norm penalty with an “easier” penalty, like the $q$-norm approximation algorithm of Jansen and van Dijk<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote">21</a></sup> or the <a href="https://en.wikipedia.org/wiki/MM_algorithm">majorization-minimization</a> algorithm of Benidis et al.<sup id="fnref:3:4" role="doc-noteref"><a href="#fn:3" class="footnote">3</a></sup></li>
</ul>
</li>
</ul>
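<p>To make the flavour of these methods concrete, here is a deliberately simplified projected-gradient sketch with hard thresholding, in the spirit of the projected gradient methods above. Note that the exact Euclidean projection onto the intersection of the simplex and the cardinality constraint is more involved than the clip/threshold/renormalize heuristic used here:</p>

```python
import numpy as np

def sparse_index_tracking(X, r_idx, n_max, n_iter=2000):
    """Heuristic solver for min (1/T) ||X w - r_idx||_2^2 subject to
    w >= 0, sum(w) = 1 and ||w||_0 <= n_max. Sketch only: the
    "projection" used is approximate, not the exact Euclidean one."""
    T, n = X.shape
    L = 2.0 * np.linalg.norm(X, 2) ** 2 / T   # Lipschitz constant of the gradient
    w = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        grad = (2.0 / T) * X.T @ (X @ w - r_idx)
        w = np.clip(w - grad / L, 0.0, None)  # gradient step + no short sales
        if np.count_nonzero(w) > n_max:       # hard thresholding
            w[np.argsort(w)[:-n_max]] = 0.0   # keep the n_max largest weights
        s = w.sum()
        w = w / s if s > 0 else np.full(n, 1.0 / n)   # full investment
    return w

rng = np.random.default_rng(3)
X = rng.normal(0.0, 0.01, size=(250, 20))     # toy daily asset returns
w_true = np.zeros(20); w_true[:4] = 0.25
r_idx = X @ w_true                            # toy index, tracked exactly by 4 assets
w = sparse_index_tracking(X, r_idx, n_max=5)
```

<p>On this toy problem, where the index is an exact combination of four assets, the iteration recovers a feasible portfolio with at most five holdings and a far lower tracking error than the equal-weighted portfolio.</p>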
<h3 id="the-sparse-index-trackingempirical-tracking-error-optimization-problem">The sparse index tracking/empirical tracking error optimization problem</h3>
<p>Like in the <a href="/blog/index-tracking-reproducing-the-performance-of-a-financial-market-index-and-more/">previous post of this series</a>, this post will use the empirical tracking error as the preferred tracking error measure.</p>
<p>In this case, the general sparse index tracking optimization problem becomes</p>
\[w^* = \operatorname{argmin} \frac{1}{T} \lVert X w - r_{idx} \rVert_2^2 \newline \textrm{s.t. } \begin{cases} \sum_{i=1}^{n} w_i = 1 \newline 0 \leqslant w_i \leqslant 1, i = 1..n \newline ... \newline \lVert w \rVert_0 \le n_{max} \end{cases}\]
<p>, which, as noted by Benidis et al.<sup id="fnref:3:5" role="doc-noteref"><a href="#fn:3" class="footnote">3</a></sup>, is nothing else than a constrained sparse regression problem:</p>
<blockquote>
<p>the sparse index tracking problem is similar to many sparsity formulations in the signal processing area in the sense that it is a regression problem with some sparsity requirements</p>
</blockquote>
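<p>Under this loss function, the objective is straightforward to evaluate for any candidate portfolio:</p>

```python
import numpy as np

def empirical_tracking_error(X, w, r_idx):
    """Empirical tracking error (1/T) * ||X w - r_idx||_2^2 of the
    portfolio w against the index returns r_idx."""
    diff = X @ w - r_idx
    return float(diff @ diff) / X.shape[0]

# A portfolio whose returns match the index exactly has a null ETE
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
w = np.array([0.1, 0.2, 0.3, 0.4])
assert empirical_tracking_error(X, w, X @ w) < 1e-18
```
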
<h2 id="implementation-in-portfolio-optimizer">Implementation in Portfolio Optimizer</h2>
<p><strong>Portfolio Optimizer</strong> allows to compute an approximate solution to the sparse index tracking optimization problem under the empirical tracking error measure through the endpoint <a href="https://docs.portfoliooptimizer.io/"><code class="language-plaintext highlighter-rouge">/portfolios/replication/index-tracking/sparse</code></a>.</p>
<p>In addition, <strong>Portfolio Optimizer</strong> allows to impose misc. constraints like minimum and maximum asset weights or minimum and maximum group weights constraints<sup id="fnref:23" role="doc-noteref"><a href="#fn:23" class="footnote">22</a></sup>.</p>
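<p>A hypothetical call to this endpoint could look as follows; the base URL and, above all, the payload field names below are placeholders chosen for illustration and must be checked against the API documentation before use:</p>

```python
import json
from urllib import request

# Illustrative only: the exact request schema (field names, shapes) must be
# taken from the Portfolio Optimizer API documentation; these are placeholders.
payload = {
    "assets": 3,
    "assetsReturns": [
        {"assetReturns": [0.011, -0.018, 0.013, 0.004]},
        {"assetReturns": [0.002, 0.001, -0.004, 0.003]},
        {"assetReturns": [0.015, -0.025, 0.020, 0.006]},
    ],
    "indexReturns": [0.010, -0.020, 0.015, 0.005],
    "maximumAssets": 2,
}

req = request.Request(
    "https://api.portfoliooptimizer.io/v1/portfolios/replication/index-tracking/sparse",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# response = json.load(request.urlopen(req))   # network call, not executed here
```
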
<h2 id="examples-of-usage">Examples of usage</h2>
<h3 id="replicating-the-sp-500-by-extreme-optimized-sampling">Replicating the S&P 500 by (extreme) optimized sampling</h3>
<p>One of the most immediate applications of the sparse index tracking problem is the ability to replicate a financial market index without investing in all its constituents, which is desirable for the reasons explained in the introduction of this post.</p>
<p>In order to illustrate this, I propose to partially replicate one of the numerical experiments in Benidis et al.<sup id="fnref:3:6" role="doc-noteref"><a href="#fn:3" class="footnote">3</a></sup>.</p>
<p>In detail, using the S&P 500 data provided in <a href="https://cran.r-project.org/web/packages/sparseIndexTracking/index.html">the R package <em>sparseIndexTracking</em></a> - which consists of the daily returns of the S&P 500 index and of 386 of its constituents
over the period 1st January 2010 to 31st December 2010 - I will solve<sup id="fnref:35" role="doc-noteref"><a href="#fn:35" class="footnote">23</a></sup> the following 80 sparse index tracking problems:</p>
<ul>
<li><strong>Index</strong> - S&P 500</li>
<li><strong>Tracking assets</strong> - 386 stocks included in the S&P 500</li>
<li><strong>Maximum number of assets</strong> - Varying from 20 to 100</li>
<li><strong>Misc. constraints</strong> - Maximum asset weight constraint of 5%</li>
</ul>
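<p>The shape of this experiment - sweeping the maximum number of assets and recording the resulting empirical tracking error - can be sketched as follows, with a crude correlation-based stand-in for the actual sparse optimizer and synthetic data in place of the S&P 500 dataset:</p>

```python
import numpy as np

rng = np.random.default_rng(5)
T, n = 250, 100                           # toy stand-in for the 386-stock dataset
X = rng.normal(0.0004, 0.01, size=(T, n))
r_idx = X @ rng.dirichlet(np.ones(n))     # toy index returns

def ete(w):
    d = X @ w - r_idx
    return d @ d / T

def crude_sparse_portfolio(n_max):
    """Crude stand-in for the actual sparse optimizer: equal-weight the
    n_max assets most correlated with the index (illustration only)."""
    corr = np.array([np.corrcoef(X[:, j], r_idx)[0, 1] for j in range(n)])
    keep = np.argsort(corr)[-n_max:]
    w = np.zeros(n)
    w[keep] = 1.0 / n_max
    return w

# One ETE per tested cardinality, the raw material of a Figure 1-style plot
etes = {n_max: ete(crude_sparse_portfolio(n_max)) for n_max in range(20, 101, 20)}
```
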
<p>The empirical tracking error of the resulting 80 sparse S&P 500 tracking portfolios, as well as - for reference - the empirical tracking error of the regular S&P 500 tracking portfolio<sup id="fnref:32" role="doc-noteref"><a href="#fn:32" class="footnote">24</a></sup>, is depicted in Figure 1.</p>
<figure>
<a href="/assets/images/blog/sparse-index-tracking-portfolio-ete-sp500.png"><img src="/assets/images/blog/sparse-index-tracking-portfolio-ete-sp500.png" alt="Sparse S&P 500 tracking portfolios over the period 01 January 2010 - 31 December 2010, empirical tracking error v.s. maximum number of stocks" /></a>
<figcaption>Figure 1. Sparse S&P 500 tracking portfolios over the period 01 January 2010 - 31 December 2010, empirical tracking error v.s. maximum number of stocks</figcaption>
</figure>
<p>From this figure, it is visible that a sparse tracking portfolio made of ~100 stocks tracks the S&P 500 with the same level of tracking error as a regular tracking portfolio made of many more stocks<sup id="fnref:32:1" role="doc-noteref"><a href="#fn:32" class="footnote">24</a></sup>.</p>
<p>But it is possible to do better!</p>
<p>A sparse tracking portfolio made of only ~50 stocks is actually sufficient, as illustrated in Figure 2.</p>
<figure>
<a href="/assets/images/blog/sparse-index-tracking-portfolio-growth-sp500.png"><img src="/assets/images/blog/sparse-index-tracking-portfolio-growth-sp500.png" alt="Sparse S&P 500 55-stock tracking portfolios over the period 01 January 2010 - 31 December 2010, performances comparison with S&P 500" /></a>
<figcaption>Figure 2. Sparse 55-stock S&P 500 tracking portfolio over the period 01 January 2010 - 31 December 2010, performances comparison with the S&P 500</figcaption>
</figure>
<p>Of course, such a small portfolio might be less “diversified”<sup id="fnref:33" role="doc-noteref"><a href="#fn:33" class="footnote">25</a></sup> than its 100-stock counterpart, but the main takeaway is that a portfolio made of ~50-100 stocks adequately replicates the S&P 500, which opens the door to
<em>direct investing</em><sup id="fnref:31" role="doc-noteref"><a href="#fn:31" class="footnote">26</a></sup> for DIY investors.</p>
<h3 id="automatically-determining-the-proper-composite-benchmark-for-a-mutual-fund">Automatically determining the proper composite benchmark for a mutual fund</h3>
<p>In <a href="/blog/index-tracking-reproducing-the-performance-of-a-financial-market-index-and-more/">the previous post of this series</a>, the index tracking machinery was shown to allow to easily and automatically construct a benchmark for any mutual fund.</p>
<p>By extension, the sparse index tracking machinery also allows to do the same, with an important advantage for multi-asset mutual funds - the automated construction of the best composite benchmark with a given number of constituents.</p>
<h3 id="replicating-a-fund-of-funds-to-reduce-fees">Replicating a fund of funds to reduce fees</h3>
<p>As a last example of usage, I propose to build upon <a href="https://www.linkedin.com/posts/finominal_whats-the-likelihood-that-all-22-funds-in-activity-7112822889518616576-lEu4?utm_source=share&utm_medium=member_desktop">a LinkedIn post</a>
from <a href="https://finominal.com/">Finominal</a>, in which the following question is asked:</p>
<blockquote>
<p>What’s the likelihood that all […] funds in JP’s Investor Growth & Income Fund contribute value?</p>
</blockquote>
<p>A bit of context first.</p>
<p>The <em><a href="https://am.jpmorgan.com/us/en/asset-management/adv/products/jpmorgan-investor-growth-income-fund-a-4812c2858">JPMorgan Investor Growth & Income Fund</a></em> is a fund of funds (FoF) whose <em>main investment strategy is to invest in other J.P. Morgan Funds</em><sup id="fnref:25" role="doc-noteref"><a href="#fn:25" class="footnote">27</a></sup>.
As of 31 December 2023, it is invested in 25 such other J.P. Morgan funds (<a href="https://am.jpmorgan.com/us/en/asset-management/adv/products/jpmorgan-core-bond-fund-r6-4812c0100"><em>JPMorgan Core Bond R6</em></a>,
<a href="https://am.jpmorgan.com/us/en/asset-management/adv/products/jpmorgan-us-equity-fund-r6-48121l817"><em>JPMorgan US Equity R6</em></a>…).</p>
<p>One problem with FoFs is that they usually have higher fees than traditional mutual funds <em>because they include the management fees charged by the underlying funds</em><sup id="fnref:26" role="doc-noteref"><a href="#fn:26" class="footnote">28</a></sup>.</p>
<p>The <em>JPMorgan Investor Growth & Income Fund</em> is no different, with a total expense ratio ranging from 1.47% to 0.72%, the latter being for its institutional class share (ticker ONGFX).</p>
<p>Inspired by the results from <a href="/blog/index-tracking-reproducing-the-performance-of-a-financial-market-index-and-more/">the previous post of this series</a>, it is possible to reduce those fees by investing directly in all the underlying J.P. Morgan funds
in proportions determined by solving<sup id="fnref:35:1" role="doc-noteref"><a href="#fn:35" class="footnote">23</a></sup> the following regular index tracking problem:</p>
<ul>
<li><strong>Index</strong> - The <em>JPMorgan Investor Growth & Income Fund</em> (ONGFX)</li>
<li><strong>Tracking assets</strong> - 24 of the 25 J.P. Morgan funds held by the <em>JPMorgan Investor Growth & Income Fund</em> as of 31 December 2023, knowing that the 25th fund is marked as restricted for investment by Morningstar<sup id="fnref:27" role="doc-noteref"><a href="#fn:27" class="footnote">29</a></sup></li>
</ul>
<p>Using price data<sup id="fnref:28" role="doc-noteref"><a href="#fn:28" class="footnote">30</a></sup> for the period 01 January 2023 - 31 December 2023, this gives the regular tracking portfolio whose weights are displayed in Figure 3 and whose performances are compared with the <em>JPMorgan Investor Growth & Income Fund</em> in Figure 4.</p>
<figure>
<a href="/assets/images/blog/sparse-index-tracking-portfolio-full-replication-weights-jpm.png"><img src="/assets/images/blog/sparse-index-tracking-portfolio-full-replication-weights-jpm.png" alt="JPMorgan Investor Growth & Income Fund tracking portfolio over the period 01 January 2023 - 31 December 2023, performances comparison with JPMorgan Investor Growth & Income Fund" /></a>
<figcaption>Figure 3. JPMorgan Investor Growth & Income Fund tracking portfolio over the period 01 January 2023 - 31 December 2023, composition</figcaption>
</figure>
<figure>
<a href="/assets/images/blog/sparse-index-tracking-portfolio-full-replication-growth-jpm.png"><img src="/assets/images/blog/sparse-index-tracking-portfolio-full-replication-growth-jpm.png" alt="JPMorgan Investor Growth & Income Fund tracking portfolio over the period 01 January 2023 - 31 December 2023, composition" /></a>
<figcaption>Figure 4. JPMorgan Investor Growth & Income Fund tracking portfolio over the period 01 January 2023 - 31 December 2023, performances comparison with JPMorgan Investor Growth & Income Fund</figcaption>
</figure>
<p>This tracking portfolio has:</p>
<ul>
<li>A practically null tracking error vs. the <em>JPMorgan Investor Growth & Income Fund</em> (Figure 4)</li>
<li>An expense ratio of ~0.43%<sup id="fnref:29" role="doc-noteref"><a href="#fn:29" class="footnote">31</a></sup> vs. at least 0.72% for the <em>JPMorgan Investor Growth & Income Fund</em></li>
</ul>
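<p>The blended expense ratio of such a tracking portfolio is simply the weight-average of the underlying funds' expense ratios. A minimal sketch, with hypothetical weights and fees (the actual Morningstar figures are not reproduced here):</p>

```python
# Blended expense ratio of a tracking portfolio = weighted average of the
# underlying funds' expense ratios. Weights and fees below are hypothetical.
weights = [0.40, 0.25, 0.20, 0.15]                  # portfolio weights (sum to 1)
expense_ratios = [0.0050, 0.0035, 0.0045, 0.0030]   # per-fund expense ratios

blended = sum(w * er for w, er in zip(weights, expense_ratios))
# 0.40*0.50% + 0.25*0.35% + 0.20*0.45% + 0.15*0.30% = 0.4225%
```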
<p>So, this tracking portfolio is an interesting alternative to the <em>JPMorgan Investor Growth & Income Fund</em>, but there is a catch - it is invested in a total of 22 J.P. Morgan funds (Figure 3), which might be cumbersome to manage…</p>
<p>An even better tracking portfolio would have a smaller number of directly-invested funds.</p>
<p>Does such a unicorn exist?</p>
<p>Figures 5 and 6 answer this question, by solving<sup id="fnref:35:2" role="doc-noteref"><a href="#fn:35" class="footnote">23</a></sup> the following sequence of sparse index tracking problems over the period 01 January 2023 - 31 December 2023:</p>
<ul>
<li><strong>Index</strong> - The <em>JPMorgan Investor Growth & Income Fund</em> (ONGFX)</li>
<li><strong>Tracking assets</strong> - The 24 non-restricted J.P. Morgan funds held by the <em>JPMorgan Investor Growth & Income Fund</em> as of 31 December 2023, knowing that the 25th fund is marked as restricted for investment by Morningstar<sup id="fnref:27:1" role="doc-noteref"><a href="#fn:27" class="footnote">29</a></sup></li>
<li><strong>Maximum number of assets</strong> - Varying from 1 to 24</li>
</ul>
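<p>A simple way to mimic such a cardinality sweep is a naive "select and refit" heuristic: solve the regular tracking problem, keep the $n_{max}$ largest weights, and re-solve restricted to that support. This is an illustration only - not the algorithm used by Portfolio Optimizer - and runs on synthetic data:</p>

```python
# Naive "select and refit" heuristic for sparse index tracking (an
# illustration only, NOT Portfolio Optimizer's algorithm):
# 1) solve the regular tracking problem over all assets,
# 2) keep the n_max largest weights,
# 3) re-solve the tracking problem restricted to that support.
import numpy as np
from scipy.optimize import minimize

def solve_tracking(X, r_idx):
    """Minimize (1/T) * ||X w - r_idx||^2 s.t. sum(w) = 1, 0 <= w_i <= 1."""
    n = X.shape[1]
    res = minimize(
        lambda w: np.mean((X @ w - r_idx) ** 2),
        x0=np.full(n, 1.0 / n),
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
        options={"ftol": 1e-12, "maxiter": 500},
    )
    return res.x

def sparse_tracking(X, r_idx, n_max):
    w_full = solve_tracking(X, r_idx)
    support = np.argsort(w_full)[-n_max:]          # indices of n_max largest weights
    w_sub = solve_tracking(X[:, support], r_idx)   # refit on the reduced support
    w = np.zeros(X.shape[1])
    w[support] = w_sub
    return w

rng = np.random.default_rng(42)
T, n = 120, 10
X = rng.normal(0.004, 0.03, (T, n))
w_true = np.zeros(n)
w_true[[1, 4, 7]] = [0.5, 0.3, 0.2]   # index driven by only 3 of the 10 assets
r_idx = X @ w_true
w3 = sparse_tracking(X, r_idx, n_max=3)
```

<p>On this toy problem the heuristic recovers the 3-asset support; on real data, thresholding-style refinements<sup id="fnref:16:1" role="doc-noteref"><a href="#fn:16" class="footnote">14</a></sup> of this idea generally perform better.</p>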
<figure>
<a href="/assets/images/blog/sparse-index-tracking-portfolio-growth-jpm.png"><img src="/assets/images/blog/sparse-index-tracking-portfolio-growth-jpm.png" alt="Sparse JPMorgan Investor Growth & Income Fund tracking portfolio over the period 01 January 2023 - 31 December 2023, performances comparison with JPMorgan Investor Growth & Income Fund" /></a>
<figcaption>Figure 5. Sparse JPMorgan Investor Growth & Income Fund tracking portfolio over the period 01 January 2023 - 31 December 2023, performances comparison with JPMorgan Investor Growth & Income Fund</figcaption>
</figure>
<figure>
<a href="/assets/images/blog/sparse-index-tracking-portfolio-fees-jpm.png"><img src="/assets/images/blog/sparse-index-tracking-portfolio-fees-jpm.png" alt="Sparse JPMorgan Investor Growth & Income Fund tracking portfolio over the period 01 January 2023 - 31 December 2023, fees" /></a>
<figcaption>Figure 6. Sparse JPMorgan Investor Growth & Income Fund tracking portfolio over the period 01 January 2023 - 31 December 2023, fees</figcaption>
</figure>
<p>From these two figures, the sweet spot for the number of directly-invested funds in terms of tracking error/fees/manageability seems to be ~6-8.</p>
<p>Coming back to the initial question from Finominal, this analysis confirms that there is <em>basically zero likelihood that all […] funds in JP’s Investor Growth & Income Fund contribute value</em>, because their number can be drastically reduced<sup id="fnref:30" role="doc-noteref"><a href="#fn:30" class="footnote">32</a></sup>.</p>
<h2 id="conclusion">Conclusion</h2>
<p>That’s it for the first blog post of 2024!</p>
<p>Next in this series, I will discuss another extension of the regular index tracking problem, aiming this time at tracking a financial index with “dynamic” asset weights over the considered $T$ time periods.</p>
<p>As usual, feel free to connect with me <a href="https://www.linkedin.com/in/roman-rubsamen/">on LinkedIn</a> or to follow me <a href="https://twitter.com/portfoliooptim">on Twitter</a>.</p>
<p>–</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Also called the <em>benchmark replication problem</em>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>In terms of a given tracking error measure, like the <em>empirical tracking error</em><sup id="fnref:3:7" role="doc-noteref"><a href="#fn:3" class="footnote">3</a></sup>, also called the <em>mean squared tracking error</em>. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>See <a href="https://ieeexplore.ieee.org/document/8384194">Konstantinos Benidis; Yiyong Feng; Daniel P. Palomar, Optimization Methods for Financial Index Tracking: From Theory to Practice , now, 2018.</a>. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:3:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:3:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:3:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a> <a href="#fnref:3:4" class="reversefootnote" role="doc-backlink">↩<sup>5</sup></a> <a href="#fnref:3:5" class="reversefootnote" role="doc-backlink">↩<sup>6</sup></a> <a href="#fnref:3:6" class="reversefootnote" role="doc-backlink">↩<sup>7</sup></a> <a href="#fnref:3:7" class="reversefootnote" role="doc-backlink">↩<sup>8</sup></a></p>
</li>
<li id="fn:24" role="doc-endnote">
<p>Also called the <em>partial benchmark replication problem</em>. <a href="#fnref:24" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>See <a href="https://www.pm-research.com/content/iijpormgmt/34/2/143">Kees van Montfort, Elout Visser, Laurens Fijn van Draat, Index Tracking by Means of Optimized Sampling, The Journal of Portfolio Management Winter 2008, 34 (2) 143-152</a>. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>Or a solution, in case multiple solutions exist. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:7" role="doc-endnote">
<p>The zero “norm” is not a real norm. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:8" role="doc-endnote">
<p>See <a href="https://www.sciencedirect.com/science/article/abs/pii/S0305054817302265">Purity Mutunge, Dag Haugland, Minimizing the tracking error of cardinality constrained portfolios, Computers & Operations Research, Volume 90, 2018, Pages 33-41</a>. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:8:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:8:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a></p>
</li>
<li id="fn:9" role="doc-endnote">
<p>To be noted that in the specific case of a quadratic loss function, Mutunge and Haugland<sup id="fnref:8:3" role="doc-noteref"><a href="#fn:8" class="footnote">8</a></sup> establish that the sparse index tracking optimization problem is actually <a href="https://en.wikipedia.org/wiki/Strong_NP-completeness">strongly NP-hard</a>. <a href="#fnref:9" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:34" role="doc-endnote">
<p>Excluding simple heuristics like selecting the largest assets according to a given criterion. For example, <em>a widely used method, especially for a market capitalization weighted index, is to select the largest K assets according to their market capitalizations</em><sup id="fnref:3:8" role="doc-noteref"><a href="#fn:3" class="footnote">3</a></sup>; as another example, greedy or reverse greedy algorithms can be used on top of a non-sparse formulation of the index tracking problem to include or exclude assets one by one<sup id="fnref:19:1" role="doc-noteref"><a href="#fn:19" class="footnote">19</a></sup>. <a href="#fnref:34" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:11" role="doc-endnote">
<p>See <a href="https://www.sciencedirect.com/science/article/abs/pii/S037722170800283X">N.A. Canakgoz, J.E. Beasley, Mixed-integer programming approaches for index tracking and enhanced indexation, European Journal of Operational Research, Volume 196, Issue 1, 2009, Pages 384-399</a>. <a href="#fnref:11" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:12" role="doc-endnote">
<p>See <a href="https://www.sciencedirect.com/science/article/abs/pii/S0377221702004253">J.E. Beasley, N. Meade, T.-J. Chang, An evolutionary heuristic for the index tracking problem, European Journal of Operational Research, Volume 148, Issue 3, 2003, Pages 621-643</a>. <a href="#fnref:12" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:13" role="doc-endnote">
<p>See <a href="https://link.springer.com/chapter/10.1007/978-1-4757-5226-7_1">Gilli, M., Kellezi, E. (2002). The Threshold Accepting Heuristic for Index Tracking. In: Pardalos, P.M., Tsitsiringos, V.K. (eds) Financial Engineering, E-commerce and Supply Chain. Applied Optimization, vol 70. Springer, Boston, MA</a>. <a href="#fnref:13" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:16" role="doc-endnote">
<p>See <a href="https://www.sciencedirect.com/science/article/pii/S1063520309000384">T. Blumensath and M. Davies, Iterative hard thresholding for compressed sensing, Appl. Comput. Harmon. Anal., 27 (2009), pp. 265–274.</a>. <a href="#fnref:16" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:17" role="doc-endnote">
<p>See <a href="https://www.sciencedirect.com/science/article/pii/S1063520308000638">Deanna Needell and Joel A Tropp. Cosamp: iterative signal recovery from incomplete and inaccurate samples. Applied and Computational Harmonic Analysis, 26(3):301–321, 2009</a>. <a href="#fnref:17" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:20" role="doc-endnote">
<p>Which is usually the case in practice… <a href="#fnref:20" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:14" role="doc-endnote">
<p>See <a href="https://link.springer.com/article/10.1007/s11425-016-9124-0">Xu, F., Dai, Y., Zhao, Z. et al. Efficient projected gradient methods for cardinality constrained optimization. Sci. China Math. 62, 245–268 (2019).</a>. <a href="#fnref:14" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:15" role="doc-endnote">
<p>See <a href="https://epubs.siam.org/doi/abs/10.1137/120869778?journalCode=sjope8">Amir Beck, Yonina C. Eldar, Sparsity Constrained Nonlinear Optimization: Optimality Conditions and Algorithms, 2013, SIAM Journal on Optimization, 1480-1509, 23, 3</a> <a href="#fnref:15" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:15:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:19" role="doc-endnote">
<p>See <a href="https://inria.hal.science/hal-03874638/">Quentin Jacquet, Agnes Bialecki, Laurent El Ghaoui, Stéphane Gaubert, Riadh Zorgati. Entropic Lower Bound of Cardinality for Sparse Optimization. 2022.</a>. <a href="#fnref:19" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:19:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:18" role="doc-endnote">
<p>It is not trivial to show that such a reformulation is equivalent; c.f. Jansen and van Dijk<sup id="fnref:10:1" role="doc-noteref"><a href="#fn:10" class="footnote">21</a></sup> for a detailed proof. <a href="#fnref:18" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:10" role="doc-endnote">
<p>See <a href="https://www.pm-research.com/content/iijpormgmt/28/2/33">Roel Jansen, Ronald van Dijk, Optimal Benchmark Tracking with Small Portfolios, The Journal of Portfolio Management, Winter 2002, 28 (2) 33-39</a>. <a href="#fnref:10" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:10:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:23" role="doc-endnote">
<p>The possibility to include minimum and maximum group weights constraints is important in practice, because it allows for example to set an upper limit on fees or to (try to) impose sector neutrality in the computed portfolio, a characteristic often overlooked; c.f. for example Che et al.<sup id="fnref:22" role="doc-noteref"><a href="#fn:22" class="footnote">33</a></sup> <a href="#fnref:23" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:35" role="doc-endnote">
<p>Using <strong>Portfolio Optimizer</strong>. <a href="#fnref:35" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:35:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:35:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a></p>
</li>
<li id="fn:32" role="doc-endnote">
<p>The non-sparse S&P 500 tracking portfolio, subject to the same maximum asset weight constraint of 5% as the sparse S&P 500 tracking portfolios, is made of 286 stocks. <a href="#fnref:32" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:32:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:33" role="doc-endnote">
<p>For example, subject to sector concentration bias, as highlighted in Che et al.<sup id="fnref:22:1" role="doc-noteref"><a href="#fn:22" class="footnote">33</a></sup>. <a href="#fnref:33" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:31" role="doc-endnote">
<p>Direct investing solutions are usually provided by asset management firms (<a href="https://www.fidelity.com/learning-center/trading-investing/direct-indexing">Fidelity</a>, <a href="https://frec.com/">Frec</a>…). In the case of a stock market index, investing directly in stocks composing that stock market index, as opposed to indirectly through an ETF, has several advantages (null expense ratio, possibility to do tax-loss harvesting…). <a href="#fnref:31" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:25" role="doc-endnote">
<p>C.f. the <a href="https://am.jpmorgan.com/JPMorgan/TADF/4812C2858/SP?site=JPMorgan">JPMorgan Investor Growth & Income Fund prospectus</a>. <a href="#fnref:25" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:26" role="doc-endnote">
<p>C.f. <a href="https://en.wikipedia.org/wiki/Fund_of_funds">Wikipedia</a>. <a href="#fnref:26" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:27" role="doc-endnote">
<p>C.f. <a href="https://www.morningstar.com/funds/xnas/ONGFX/quote">Morningstar</a>. <a href="#fnref:27" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:27:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:28" role="doc-endnote">
<p>(Adjusted) prices have been retrieved using <a href="https://api.tiingo.com/">Tiingo</a>. <a href="#fnref:28" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:29" role="doc-endnote">
<p>Computation made using the expense ratios reported by Morningstar for the different J.P. Morgan funds; you will have to trust me on this one. Note also that the fees associated with the potential rebalancing of the tracking portfolio are not taken into account and would need to be properly minimized in real life. <a href="#fnref:29" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:30" role="doc-endnote">
<p>In view of Figure 5 and Figure 6, I would take the <em>returns increased</em> part of <a href="https://www.linkedin.com/posts/finominal_whats-the-likelihood-that-all-22-funds-in-activity-7112822889518616576-lEu4?utm_source=share&utm_medium=member_desktop">Finominal’s LinkedIn post</a> with a grain of salt, although it is certain that returns will increase thanks to lowering the fees. <a href="#fnref:30" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:22" role="doc-endnote">
<p>See <a href="https://doi.org/10.3390/math10152645">Che, Y.; Chen, S.; Liu, X. Sparse Index Tracking Portfolio with Sector Neutrality. Mathematics 2022, 10, 2645.</a>. <a href="#fnref:22" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:22:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
</ol>
</div>Roman R.Beyond Modified Value-at-Risk: Application of Gaussian Mixtures to the Computation of Value-at-Risk2023-12-18T00:00:00-06:002023-12-18T00:00:00-06:00https://portfoliooptimizer.io/blog/beyond-modified-value-at-risk-application-of-gaussian-mixtures-to-the-computation-of-value-at-risk<p>In <a href="/blog/corrected-cornish-fisher-expansion-improving-the-accuracy-of-modified-value-at-risk/">a previous post</a>, I described a parametric approach to computing Value-at-Risk (VaR) -
called <em>modified VaR</em><sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote">1</a></sup><sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote">2</a></sup> - that adjusts Gaussian VaR for asymmetry and fat tails present in financial asset returns<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote">3</a></sup> thanks to the usage of a
<a href="https://en.wikipedia.org/wiki/Cornish%E2%80%93Fisher_expansion">Cornish–Fisher expansion</a>.</p>
<p>Modified VaR, when properly used<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote">4</a></sup>, provides accurate estimates of the VaR for a wide range of non-normal portfolio return distributions. Unfortunately, for mathematical reasons, it cannot be computed for
an even wider range of such distributions<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote">5</a></sup>, which poses an issue in practice.</p>
<p>In this blog post, I will describe another parametric approach to computing VaR - called <em>Gaussian mixture Value-at-Risk</em> for the lack of a better name - which
this time adjusts Gaussian VaR for higher moments thanks to the usage of a Gaussian mixture distribution and is free of any computational restriction.</p>
<p>After providing the formula for the Gaussian mixture VaR and explaining how to fit a Gaussian mixture distribution to an empirical return distribution, I will show how to compute the Value-at-Risk of Bitcoin
under the Gaussian mixture model described in the <a href="https://www.blackrock.com/">BlackRock</a> paper <em>Asset Allocation with Crypto: Application of Preferences for Positive Skewness</em> from Ang et al.<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote">6</a></sup></p>
<h2 id="mathematical-preliminaries">Mathematical preliminaries</h2>
<h3 id="univariate-gaussian-mixture-distribution">Univariate Gaussian mixture distribution</h3>
<p>A random variable $X$ is said to follow a univariate Gaussian mixture distribution, written as $X \sim \mathcal{GM} \left( \left( \mu_i, \sigma_i, p_i \right)_{i=1}^k \right) $,
if its cumulative distribution function (c.d.f.) $F_X$ is of the form<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote">7</a></sup></p>
\[F_X(x) = \sum_{i=1}^k p_i \Phi \left( \frac{x - \mu_i}{\sigma_i} \right)\]
<p>, where:</p>
<ul>
<li>$k \geq 1$ is the (integer) number of mixture components</li>
<li>$p_i \in \left[ 0,1 \right]$, $i=1..k$, are the probabilities of the mixture components, with $\sum_{i=1}^kp_i = 1$</li>
<li>$\mu_i \in \mathbb{R}$, $i=1..k$, are the means of the mixture components</li>
<li>$\sigma_i \gt 0$, $i=1..k$, are the standard deviations of the mixture components</li>
<li>$\Phi$ is the c.d.f. of the standard normal distribution</li>
</ul>
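<p>As a concrete illustration, the Gaussian mixture c.d.f. above is straightforward to evaluate numerically; below is a minimal sketch in Python using SciPy, with arbitrary example parameters for a two-component mixture:</p>

```python
import numpy as np
from scipy.stats import norm

def gaussian_mixture_cdf(x, ps, mus, sigmas):
    """C.d.f. of a univariate Gaussian mixture: sum_i p_i * Phi((x - mu_i) / sigma_i)."""
    ps, mus, sigmas = map(np.asarray, (ps, mus, sigmas))
    return float(np.sum(ps * norm.cdf((x - mus) / sigmas)))

# Illustrative two-component mixture: a "normal" regime and a "rare events" regime
ps, mus, sigmas = [0.95, 0.05], [0.01, -0.10], [0.04, 0.15]

# Sanity checks: a valid c.d.f. tends to 0 and 1 in the tails
print(gaussian_mixture_cdf(-1.0, ps, mus, sigmas))  # close to 0
print(gaussian_mixture_cdf(1.0, ps, mus, sigmas))   # close to 1
```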
<h3 id="reminder-on-value-at-risk">Reminder on Value-at-Risk</h3>
<p>The (percentage) VaR of a portfolio of financial assets corresponds to the percentage of portfolio wealth that can be lost over a certain time horizon and with a certain probability.</p>
<p>Formally, the VaR $VaR_{\alpha}$ of a portfolio over a time horizon $T$ (1 day, 10 days…) and at a confidence level $\alpha$% $\in ]0,1[$ (95%, 97.5%, 99%…) can be defined as<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote">8</a></sup></p>
\[\text{VaR}_{\alpha} (X) = - F_X^{-1}(1 - \alpha)\]
<p>, where:</p>
<ul>
<li>$X$ is a random variable representing the portfolio return over the time horizon $T$</li>
<li>$F_X^{-1}$ is the inverse c.d.f. - also called the <a href="https://en.wikipedia.org/wiki/Quantile_function">quantile function</a> - of the random variable $X$</li>
</ul>
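<p>When $F_X^{-1}$ is replaced by the empirical quantile function of observed returns, this definition yields the historical VaR used later in this post; a quick sketch with NumPy, on simulated returns:</p>

```python
import numpy as np

def historical_var(returns, alpha):
    """Historical VaR at confidence level alpha: minus the (1 - alpha) empirical quantile of the returns."""
    return -np.quantile(returns, 1.0 - alpha)

rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.05, size=10_000)  # simulated portfolio returns

# For Gaussian returns, the 95% VaR should be close to 0.05 * 1.645
print(historical_var(returns, 0.95))
```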
<h2 id="gaussian-mixture-value-at-risk">Gaussian mixture Value-at-Risk</h2>
<h3 id="definition">Definition</h3>
<p>When the return distribution of a portfolio is approximated by a Gaussian mixture distribution, that is, when $X \sim \mathcal{GM} \left( \left( \mu_i, \sigma_i, p_i \right)_{i=1}^k \right) $ with $X$ a
random variable representing the portfolio return over a given time horizon $T$, the resulting parametric VaR can be called <em>Gaussian mixture Value-at-Risk</em> (GmVaR).</p>
<p>Mathematically, the Gaussian mixture VaR $GmVaR_{\alpha}$ over the chosen time horizon $T$ and at a confidence level $\alpha$% is implicitly defined by the following equation<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote">9</a></sup>,
obtained by mixing together the formula for the VaR of a portfolio and the formula for the c.d.f. of a Gaussian mixture distribution</p>
\[\sum_{i=1}^k p_i \Phi \left( - \frac{ GmVaR_{\alpha} + \mu_i }{\sigma_i} \right) = 1 - \alpha\]
<h3 id="computation">Computation</h3>
<p>Because the formula of the previous subsection is implicit in $GmVaR_{\alpha}$, the effective computation of the Gaussian mixture VaR must involve a numerical procedure such as:</p>
<ul>
<li>A non-linear least-squares minimization procedure<sup id="fnref:10:1" role="doc-noteref"><a href="#fn:10" class="footnote">9</a></sup></li>
<li>A bisection procedure<sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote">10</a></sup></li>
<li>Or more generally, a procedure to compute a quantile function from a closed-form c.d.f.</li>
</ul>
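<p>Such a procedure can be sketched as follows with SciPy's <code class="language-plaintext highlighter-rouge">brentq</code> root finder (a refinement of plain bisection), solving the implicit equation of the previous subsection directly; the bracketing bounds below are a crude heuristic:</p>

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def gaussian_mixture_var(alpha, ps, mus, sigmas):
    """Solve sum_i p_i * Phi(-(v + mu_i) / sigma_i) = 1 - alpha for v = GmVaR_alpha."""
    ps, mus, sigmas = map(np.asarray, (ps, mus, sigmas))
    f = lambda v: np.sum(ps * norm.cdf(-(v + mus) / sigmas)) - (1.0 - alpha)
    # Bracket the root using a crude bound on the mixture component quantiles
    lo = -(np.max(mus) + 10 * np.max(sigmas))
    hi = -(np.min(mus) - 10 * np.max(sigmas))
    return brentq(f, lo, hi)

# Single-component sanity check: GmVaR reduces to the Gaussian VaR, sigma * Phi^{-1}(alpha) - mu
print(gaussian_mixture_var(0.95, [1.0], [0.0], [1.0]))  # about 1.645
```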
<h3 id="rationale">Rationale</h3>
<p>It has been observed since the 1970s<sup id="fnref:12" role="doc-noteref"><a href="#fn:12" class="footnote">11</a></sup> that Gaussian mixture distributions adequately approximate the unconditional distributions of financial asset returns. For example,
Kon<sup id="fnref:13" role="doc-noteref"><a href="#fn:13" class="footnote">12</a></sup> shows that two, three and four-component Gaussian mixture distributions can be used to model the daily returns of 3 U.S. stock market indexes and of 30 Dow Jones stocks. Similarly,
Cuevas-Covarrubias et al.<sup id="fnref:17" role="doc-noteref"><a href="#fn:17" class="footnote">13</a></sup> show how two-component Gaussian mixture distributions exhibit <em>a natural capacity to fit leptokurtic distributions</em><sup id="fnref:17:1" role="doc-noteref"><a href="#fn:17" class="footnote">13</a></sup> using the daily returns of 3 Mexican stocks.</p>
<p>Compared to alternative distributions, Gaussian mixture distributions <em>offer a versatile combination of precision and simplicity</em><sup id="fnref:17:2" role="doc-noteref"><a href="#fn:17" class="footnote">13</a></sup> for modeling asset returns. Indeed,
Gaussian mixture distributions are both flexible enough to handle arbitrary asset return distributions<sup id="fnref:16" role="doc-noteref"><a href="#fn:16" class="footnote">14</a></sup> and simple enough to be numerically tractable<sup id="fnref:16:1" role="doc-noteref"><a href="#fn:16" class="footnote">14</a></sup>. In contrast,
<em>[alternative distributions] tend to suffer from one of two evils: either they are too restrictive in the variety of p.d.f. shapes that can be achieved,
or not restrictive enough, in the sense that they have too many degrees of freedom for calibration to be feasible</em><sup id="fnref:16:2" role="doc-noteref"><a href="#fn:16" class="footnote">14</a></sup>.</p>
<p>As for applications of Gaussian mixture distributions to VaR, Hull and White<sup id="fnref:15" role="doc-noteref"><a href="#fn:15" class="footnote">15</a></sup> used a two-component Gaussian mixture distribution more than 25 years ago in order to illustrate their non-normal VaR computation methodology.
More recent papers include for example Saissi Hassani and Dionne<sup id="fnref:10:2" role="doc-noteref"><a href="#fn:10" class="footnote">9</a></sup>, who perform VaR backtests for several parametric models, among which two and three-component Gaussian mixture models.</p>
<p>The Gaussian mixture VaR is thus a well-known extension of the Gaussian VaR when dealing with real-life asset returns.</p>
<h2 id="fitting-a-univariate-gaussian-mixture-distribution-to-an-empirical-portfolio-return-distribution">Fitting a univariate Gaussian mixture distribution to an empirical portfolio return distribution</h2>
<p>Let $r_1,…,r_T \in \mathbb{R}$ be the returns of a portfolio observed over $T$ time periods.</p>
<p>In order to compute the Gaussian mixture VaR of this portfolio using the formula of the previous section, it is first required to approximate the empirical distribution of the returns $r_1,…,r_T$ by a
Gaussian mixture distribution.</p>
<p>This is usually done by determining the “best” <a href="https://en.wikipedia.org/wiki/Estimator">statistical estimators</a> for the unknown parameters $k$ and $\left(p_i, \mu_i, \sigma_i \right)$, $i=1..k$ of that
Gaussian mixture distribution, with “best” to be defined.</p>
<h3 id="how-to-determine-the-number-of-mixture-components">How to determine the number of mixture components?</h3>
<h4 id="manually">Manually</h4>
<p>Because it is possible to <em>interpret the components of a Gaussian mixture return distribution as market regimes, with a latent variable that represents the active regime and a return distribution that is Gaussian, given the regime</em><sup id="fnref:8:1" role="doc-noteref"><a href="#fn:8" class="footnote">7</a></sup>,
the number of mixture components $k$ can be chosen manually.</p>
<p>For example:</p>
<ul>
<li>Choosing $k = 2$ means defining two market regimes in the portfolio returns - a <em>normal</em> regime and a <em>rare events</em> regime, sometimes called a <em>distressed regime</em><sup id="fnref:16:3" role="doc-noteref"><a href="#fn:16" class="footnote">14</a></sup> or a <em>bliss regime</em><sup id="fnref:6:1" role="doc-noteref"><a href="#fn:6" class="footnote">6</a></sup></li>
<li>Choosing $k = 3$ means defining three market regimes in the portfolio returns - the typical <em>bear</em>, <em>neutral</em> and <em>bull</em> market regimes<sup id="fnref:19" role="doc-noteref"><a href="#fn:19" class="footnote">16</a></sup></li>
</ul>
<h4 id="automatically">Automatically</h4>
<p>The number of mixture components $k$ can also be chosen automatically through two general families of methods:</p>
<ul>
<li>
<p>Methods that determine the number of mixture components independently of the remaining mixture parameters</p>
<p>Methods from this family typically iterate over a number of potential values for $k$, determine the remaining parameters $\left(p_i, \mu_i, \sigma_i \right)$, $i=1..k$, and evaluate how well the resulting $k$-component Gaussian mixture distribution approximates the portfolio return distribution.</p>
<p>The chosen number of mixture components is then the value $k^*$, among these potential values for $k$, that leads to the best approximation of the portfolio return distribution<sup id="fnref:23" role="doc-noteref"><a href="#fn:23" class="footnote">17</a></sup>.</p>
<p>Examples from this family include the <a href="https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test">Kolmogorov-Smirnov test</a><sup id="fnref:18" role="doc-noteref"><a href="#fn:18" class="footnote">18</a></sup> or the more common Bayesian Information Criterion (BIC)<sup id="fnref:25" role="doc-noteref"><a href="#fn:25" class="footnote">19</a></sup>.</p>
</li>
<li>
<p>Methods that determine the number of mixture components together with the remaining mixture parameters</p>
<p>Methods from this family determine all the parameters of the Gaussian mixture distribution at once.</p>
<p>Examples from this family include modified Expectation-Maximization algorithms<sup id="fnref:24" role="doc-noteref"><a href="#fn:24" class="footnote">20</a></sup> or ad-hoc procedures<sup id="fnref:26" role="doc-noteref"><a href="#fn:26" class="footnote">21</a></sup>.</p>
</li>
</ul>
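<p>The BIC-based approach from the first family can be sketched with scikit-learn, here on synthetic two-regime returns (the candidate range for $k$ is arbitrary):</p>

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)
# Synthetic two-regime returns: 95% "normal" regime, 5% "rare events" regime
returns = np.concatenate([rng.normal(0.01, 0.04, 950),
                          rng.normal(-0.10, 0.15, 50)]).reshape(-1, 1)

# Fit candidate mixtures and keep the number of components minimizing the BIC
bics = {k: GaussianMixture(n_components=k, n_init=10, random_state=0).fit(returns).bic(returns)
        for k in range(1, 5)}
k_star = min(bics, key=bics.get)
print(k_star)
```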
<h3 id="how-to-determine-the-remaining-mixture-parameters">How to determine the remaining mixture parameters?</h3>
<p>In this subsection, the number of mixture components $k$ is supposed to be known<sup id="fnref:30" role="doc-noteref"><a href="#fn:30" class="footnote">22</a></sup>.</p>
<h4 id="likelihood-maximization">Likelihood maximization</h4>
<p>Sensible values for the unknown mixture parameters $\left(p_i, \mu_i, \sigma_i \right)$, $i=1..k$, are maximum-likelihood estimates, defined as the choice of parameters
$\left(p_i^*, \mu_i^*, \sigma_i^* \right)$, $i=1..k$, that globally<sup id="fnref:28" role="doc-noteref"><a href="#fn:28" class="footnote">23</a></sup> maximizes the log-likelihood function $L$ associated to the returns $r_1,…,r_T$ and given by</p>
\[L \left( p_1, \mu_1, \sigma_1,..., p_k, \mu_k, \sigma_k \right) = \sum_{i=1}^T \ln \left[ \sum_{j=1}^k p_j \frac{1}{\sqrt{2 \pi} \sigma_j} \exp \left( - \frac{1}{2} \left( \frac{ r_i - \mu_j}{ \sigma_j} \right )^2 \right) \right]\]
<p>Numerically, several procedures can be used to compute these maximum-likelihood estimates, like Newton’s method<sup id="fnref:29" role="doc-noteref"><a href="#fn:29" class="footnote">24</a></sup>, Fisher’s method of scoring<sup id="fnref:29:1" role="doc-noteref"><a href="#fn:29" class="footnote">24</a></sup> or the Expectation-Maximization (EM) algorithm<sup id="fnref:25:1" role="doc-noteref"><a href="#fn:25" class="footnote">19</a></sup> which is actually <em>the
method of choice for learning mixtures of Gaussians</em><sup id="fnref:27" role="doc-noteref"><a href="#fn:27" class="footnote">25</a></sup><sup id="fnref:34" role="doc-noteref"><a href="#fn:34" class="footnote">26</a></sup>.</p>
<p>Unfortunately, due to the complexity<sup id="fnref:33" role="doc-noteref"><a href="#fn:33" class="footnote">27</a></sup> of the log-likelihood function $L$, none of these procedures is guaranteed to produce “true” maximum-likelihood estimates<sup id="fnref:32" role="doc-noteref"><a href="#fn:32" class="footnote">28</a></sup>. Even worse, the behavior of all these procedures
is known to <em>be highly dependent from [the initial guess of the unknown parameters] and it may fail as a result of degeneracies</em><sup id="fnref:32:1" role="doc-noteref"><a href="#fn:32" class="footnote">28</a></sup>, which led practitioners to
implement miscellaneous numerical tweaks, like running these procedures <em>several times with different, randomly chosen starting points</em><sup id="fnref:35" role="doc-noteref"><a href="#fn:35" class="footnote">29</a></sup>, in order to <em>avoid [obtaining] a poor approximation to the true maximum</em><sup id="fnref:35:1" role="doc-noteref"><a href="#fn:35" class="footnote">29</a></sup>.</p>
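<p>The random-restart tweak amounts to keeping the parameter set with the highest log-likelihood $L$; evaluating $L$ itself is straightforward, as in this sketch with made-up returns and candidate parameters:</p>

```python
import numpy as np
from scipy.stats import norm

def gm_log_likelihood(returns, ps, mus, sigmas):
    """Log-likelihood of returns under a Gaussian mixture: sum_i ln(sum_j p_j * N(r_i; mu_j, sigma_j))."""
    r = np.asarray(returns)[:, None]
    dens = np.asarray(ps) * norm.pdf(r, loc=np.asarray(mus), scale=np.asarray(sigmas))
    return float(np.sum(np.log(np.sum(dens, axis=1))))

rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.05, 100)

# Among several candidate parameter sets, keep the one with the highest log-likelihood
candidates = [([1.0], [0.0], [0.05]), ([1.0], [0.0], [0.5])]
best = max(candidates, key=lambda c: gm_log_likelihood(returns, *c))
print(best)
```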
<h4 id="turbulence-partitioning-of-returns">Turbulence partitioning of returns</h4>
<p>Although <em>in most applications, parameters of a [Gaussian] mixture model are estimated by maximizing the likelihood</em><sup id="fnref:35:2" role="doc-noteref"><a href="#fn:35" class="footnote">29</a></sup>, there are other procedures to determine the remaining parameters of a Gaussian mixture distribution
(minimum distance estimation<sup id="fnref:41" role="doc-noteref"><a href="#fn:41" class="footnote">30</a></sup>…) which have <em>often been shown to be more robust to departures from underlying assumptions than is maximum likelihood</em><sup id="fnref:41:1" role="doc-noteref"><a href="#fn:41" class="footnote">30</a></sup>.</p>
<p>While an exhaustive review of these alternative procedures is out of scope of this post, I would still like to mention a procedure described in
<a href="/blog/the-turbulence-index-regime-based-partitioning-of-asset-returns/">another post</a> in the context of a two-component Gaussian mixture distribution.</p>
<p>This procedure is based on a measure of statistical unusualness of asset returns popularized by Kritzman and Li<sup id="fnref:36" role="doc-noteref"><a href="#fn:36" class="footnote">31</a></sup> called the <em>Turbulence Index</em>, and can easily be generalized to a $k$-component Gaussian mixture distribution as follows:</p>
<ul>
<li>Compute the turbulence index values $d(r_t)$, $t=1..T$, associated to the portfolio returns<sup id="fnref:37" role="doc-noteref"><a href="#fn:37" class="footnote">32</a></sup></li>
<li>Partition the portfolio returns into $k$ partitions based on their turbulence index values, either through manual percentage-based thresholding<sup id="fnref:20" role="doc-noteref"><a href="#fn:20" class="footnote">33</a></sup> or through automated clustering (1D $k$-means<sup id="fnref:38" role="doc-noteref"><a href="#fn:38" class="footnote">34</a></sup>…).</li>
<li>For each of these partitions $P_i$, $i=1..k$:
<ul>
<li>Define the estimate of the $i$-th Gaussian mixture component probability $p_i^{**}$ as the proportion of the portfolio returns belonging to $P_i$</li>
<li>Define the estimate of the $i$-th Gaussian mixture component mean $\mu_i^{**}$ as the mean of the portfolio returns belonging to $P_i$</li>
<li>Define the estimate of the $i$-th Gaussian mixture component standard deviation $\sigma_i^{**}$ as the standard deviation of the portfolio returns belonging to $P_i$</li>
</ul>
</li>
</ul>
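<p>The steps above can be sketched as follows; for this illustration I make the simplifying assumption that, for a single asset, the turbulence index reduces to a squared standardized return (the univariate case of the Mahalanobis distance), and I use scikit-learn's <code class="language-plaintext highlighter-rouge">KMeans</code> for the 1D clustering:</p>

```python
import numpy as np
from sklearn.cluster import KMeans

def turbulence_partition_fit(returns, k=2, random_state=0):
    """Estimate Gaussian mixture parameters by partitioning returns on their turbulence index values."""
    r = np.asarray(returns)
    # Single-asset turbulence index: squared standardized return (simplified univariate case)
    d = ((r - r.mean()) / r.std()) ** 2
    labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(d.reshape(-1, 1))
    # Order partitions by increasing average turbulence, so component 1 is the "normal" regime
    order = np.argsort([d[labels == i].mean() for i in range(k)])
    params = []
    for i in order:
        part = r[labels == i]
        params.append((len(part) / len(r), part.mean(), part.std()))
    return params  # list of (p_i, mu_i, sigma_i) estimates

rng = np.random.default_rng(1)
returns = np.concatenate([rng.normal(0.01, 0.03, 95), rng.normal(0.20, 0.05, 5)])
print(turbulence_partition_fit(returns))
```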
<p>Compared to likelihood maximization, this procedure is:</p>
<ul>
<li>Easy to implement, with no local optima and no convergence issue to worry about</li>
<li>Easy to interpret, with a one-to-one relationship between partitions and market regimes</li>
</ul>
<p>In addition, as illustrated in <a href="/blog/the-turbulence-index-regime-based-partitioning-of-asset-returns/">the aforementioned post</a>, the quality of the resulting parameter estimates is on par with that of
the maximum-likelihood parameter estimates, at least in the case of a two-asset universe made of U.S. stocks and U.S. Treasuries.</p>
<h2 id="implementation-in-portfolio-optimizer">Implementation in Portfolio Optimizer</h2>
<p><strong>Portfolio Optimizer</strong> implements the computation of a portfolio Gaussian mixture Value-at-Risk through the endpoint <a href="https://docs.portfoliooptimizer.io/"><code class="language-plaintext highlighter-rouge">/portfolios/analysis/value-at-risk/gaussian/mixture</code></a>.</p>
<p>This endpoint:</p>
<ul>
<li>Fits a univariate Gaussian mixture distribution to an empirical portfolio (logarithmic) return distribution, using either
<ul>
<li>A variation of the Expectation-Maximization algorithm described in the previous section</li>
<li>The turbulence partitioning procedure also described in the previous section, relying on either a percentage-based thresholding of turbulence index values or a 1D $k$-means clustering of turbulence index values</li>
</ul>
</li>
<li>Computes the associated Gaussian mixture Value-at-Risk, using a variation of the bisection procedure</li>
</ul>
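<p>A call to this endpoint might look as follows; to be clear, the request field names below are assumptions made for illustration purposes only, and the exact schema should be checked against the API documentation:</p>

```python
import json
import urllib.request

# Hypothetical request body - the field names are assumptions, see the API documentation
payload = {
    "assets": 1,
    "assetsReturns": [[0.05, -0.10, 0.02, 0.08, -0.03]],
    "portfolios": [{"assetsWeights": [1.0]}],
    "confidenceLevel": 0.95,
}

request = urllib.request.Request(
    "https://api.portfoliooptimizer.io/v1/portfolios/analysis/value-at-risk/gaussian/mixture",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(request)  # network call not executed in this sketch
print(json.loads(request.data))
```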
<p>To be noted that it is possible to let <strong>Portfolio Optimizer</strong> automatically determine the best number of Gaussian mixture components<sup id="fnref:46" role="doc-noteref"><a href="#fn:46" class="footnote">35</a></sup>.</p>
<h2 id="example-of-usage---computing-the-value-at-risk-of-bitcoin">Example of usage - Computing the Value-at-Risk of Bitcoin</h2>
<p>In their paper, Ang et al.<sup id="fnref:6:2" role="doc-noteref"><a href="#fn:6" class="footnote">6</a></sup> empirically demonstrate that <em>the very positively skewed returns [of Bitcoin] are well
captured by a [two-component] mixture of Normals distribution</em><sup id="fnref:6:3" role="doc-noteref"><a href="#fn:6" class="footnote">6</a></sup>. They then use this result to suggest that <em>given a small probability regime where Bitcoin may generate high returns, investors with
preferences for positive skewness of portfolio returns should have portfolio exposure to Bitcoin</em><sup id="fnref:39" role="doc-noteref"><a href="#fn:39" class="footnote">36</a></sup>.</p>
<p>As an example of usage of the Gaussian mixture VaR, I propose to compute the VaR of Bitcoin under such a two-component Gaussian mixture model.</p>
<h3 id="bitcoin-returns-analysis">Bitcoin returns analysis</h3>
<p>I propose to start by comparing the Bitcoin price data used in Ang et al.<sup id="fnref:6:4" role="doc-noteref"><a href="#fn:6" class="footnote">6</a></sup> to the Bitcoin price data used in this post:</p>
<ul>
<li>
<p>Figures 1<sup id="fnref:42" role="doc-noteref"><a href="#fn:42" class="footnote">37</a></sup> and 2 compare the cumulative returns of Bitcoin over the period 31 August 2010 - 31 December 2021</p>
<figure>
<a href="/assets/images/blog/beyond-modified-var-bitcoin-cumulative-returns-ang.png"><img src="/assets/images/blog/beyond-modified-var-bitcoin-cumulative-returns-ang.png" alt="Cumulative Bitcoin returns over the period 31 August 2010 - 31 December 2021, Ang et al.'s data. Source: Ang et al." /></a>
<figcaption>Figure 1. Cumulative Bitcoin returns over the period 31 August 2010 - 31 December 2021, Ang et al.'s data. Source: Ang et al.</figcaption>
</figure>
<figure>
<a href="/assets/images/blog/beyond-modified-var-bitcoin-cumulative-returns.png"><img src="/assets/images/blog/beyond-modified-var-bitcoin-cumulative-returns.png" alt="Cumulative Bitcoin returns over the period 31 August 2010 - 31 December 2021, this post's data." /></a>
<figcaption>Figure 2. Cumulative Bitcoin returns over the period 31 August 2010 - 31 December 2021, this post's data.</figcaption>
</figure>
</li>
<li>
<p>Figures 3<sup id="fnref:42:1" role="doc-noteref"><a href="#fn:42" class="footnote">37</a></sup> and 4 compare the annualized mean, standard deviation and Sharpe ratio of Bitcoin monthly (log) returns over the same period</p>
<figure>
<a href="/assets/images/blog/beyond-modified-var-bitcoin-moments-ang.png"><img src="/assets/images/blog/beyond-modified-var-bitcoin-moments-ang.png" alt="Summary statistics of Bitcoin monthly returns over the period 31 August 2010 - 31 December 2021, Ang et al.'s data. Source: Ang et al." /></a>
<figcaption>Figure 3. Summary statistics of Bitcoin monthly returns over the period 31 August 2010 - 31 December 2021, Ang et al.'s data. Source: Ang et al.</figcaption>
</figure>
<figure>
<a href="/assets/images/blog/beyond-modified-var-bitcoin-moments.png"><img src="/assets/images/blog/beyond-modified-var-bitcoin-moments.png" alt="Summary statistics of Bitcoin monthly returns over the period 31 August 2010 - 31 December 2021, this post's data." /></a>
<figcaption>Figure 4. Summary statistics of Bitcoin monthly returns over the period 31 August 2010 - 31 December 2021, this post's data.</figcaption>
</figure>
</li>
</ul>
<p>From these figures, it is clear that the Bitcoin data used in Ang et al.<sup id="fnref:6:5" role="doc-noteref"><a href="#fn:6" class="footnote">6</a></sup> differ from the Bitcoin data used in this post (Figure 3 v.s. Figure 4), although not that much (Figure 1 v.s. Figure 2).</p>
<p>To be noted that this is not surprising, because there is no official price for Bitcoin due to its decentralized nature.</p>
<h3 id="fitting-a-gaussian-mixture-model-to-bitcoin-returns">Fitting a Gaussian mixture model to Bitcoin returns</h3>
<p>Similar to Ang et al.<sup id="fnref:6:6" role="doc-noteref"><a href="#fn:6" class="footnote">6</a></sup>, I now propose to fit a two-component Gaussian mixture model to the Bitcoin monthly (log) returns over the period 31 August 2010 - 31 December 2021.</p>
<h4 id="likelihood-maximization-1">Likelihood maximization</h4>
<p>The EM algorithm, as implemented by the <a href="https://scikit-learn.org/">scikit-learn</a> method <a href="https://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html"><code class="language-plaintext highlighter-rouge">sklearn.mixture.GaussianMixture</code></a>
with default parameters, leads to three different maximum likelihood estimates<sup id="fnref:47" role="doc-noteref"><a href="#fn:47" class="footnote">38</a></sup></p>
<ul>
<li>$\left(p_1^1, \mu_1^1, \sigma_1^1 \right)$ = $\left( 0.96, 0.66, 0.84 \right)$ and $\left(p_2^1, \mu_2^1, \sigma_2^1 \right)$ = $\left( 0.04, 14.98, 0.89 \right)$</li>
<li>$\left(p_1^2, \mu_1^2, \sigma_1^2 \right)$ = $\left( 0.91, 0.54, 0.82 \right)$ and $\left(p_2^2, \mu_2^2, \sigma_2^2 \right)$ = $\left( 0.09, 7.45, 2.03 \right)$</li>
<li>$\left(p_1^3, \mu_1^3, \sigma_1^3 \right)$ = $\left( 0.91, 0.55, 0.82 \right)$ and $\left(p_2^3, \mu_2^3, \sigma_2^3 \right)$ = $\left( 0.09, 7.81, 2.03 \right)$</li>
</ul>
<p>This situation confirms that an <em>important drawback of EM is that its solution can highly depend on its starting position and consequently produce sub-optimal maximum likelihood estimates</em><sup id="fnref:35:3" role="doc-noteref"><a href="#fn:35" class="footnote">29</a></sup>!</p>
<p>Fortunately, increasing the number of random initializations performed in the <code class="language-plaintext highlighter-rouge">sklearn.mixture.GaussianMixture</code> method leads to the optimal maximum likelihood estimates,
but somebody unaware of the numerical properties of the EM algorithm might conclude too quickly that <em>the mixture of Normals can be estimated <strong>easily</strong> by maximum likelihood or EM algorithms</em><sup id="fnref:6:7" role="doc-noteref"><a href="#fn:6" class="footnote">6</a></sup>.</p>
<p>These “true” maximum likelihood estimates are $ \left(p_1^*, \mu_1^*, \sigma_1^* \right)$ = $\left( 0.96, 0.66, 0.84 \right)$ and $\left(p_2^*, \mu_2^*, \sigma_2^* \right)$ = $\left( 0.04, 14.98, 0.89 \right)$.</p>
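<p>In scikit-learn, increasing the number of random initializations is a one-parameter change, through <code class="language-plaintext highlighter-rouge">n_init</code>; the sketch below illustrates it on synthetic two-regime returns rather than the actual Bitcoin data:</p>

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
# Synthetic stand-in for positively skewed returns: a dominant regime plus a small high-mean regime
returns = np.concatenate([rng.normal(0.5, 0.8, 130), rng.normal(15.0, 0.9, 7)]).reshape(-1, 1)

# n_init=1 is the default; n_init=25 reruns EM from 25 random starts and keeps the best fit
gm = GaussianMixture(n_components=2, n_init=25, random_state=0).fit(returns)

# Report (p_i, mu_i, sigma_i) per component, largest-probability component first
order = np.argsort(gm.weights_)[::-1]
for i in order:
    print(gm.weights_[i], gm.means_[i, 0], np.sqrt(gm.covariances_[i, 0, 0]))
```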
<h4 id="turbulence-partitioning">Turbulence partitioning</h4>
<p>The turbulence partitioning procedure used with an exact 1D $k$-means clustering leads to the same<sup id="fnref:43" role="doc-noteref"><a href="#fn:43" class="footnote">39</a></sup> estimates as the “true” maximum likelihood estimates,
which empirically demonstrates the validity of this procedure to fit the parameters of a univariate Gaussian mixture distribution.</p>
<p>As a reminder, these estimates are $ \left(p_1^{**}, \mu_1^{**}, \sigma_1^{**} \right)$ = $\left( 0.96, 0.66, 0.84 \right)$ and $\left(p_2^{**}, \mu_2^{**}, \sigma_2^{**} \right)$ = $\left( 0.04, 14.98, 0.89 \right)$.</p>
<p>One of the advantages of the turbulence partitioning procedure is that it makes it easy to explain how the Gaussian mixture parameters have been estimated:</p>
<ul>
<li>
<p>$p_1^{**}$ (resp. $p_2^{**}$) is the proportion of Bitcoin returns whose turbulence index values are classified as “normal” (resp. “outliers”) by the 1D $k$-means clustering method</p>
<p>Graphically, thanks to Figure 5 which displays the 137 turbulence index values associated to the Bitcoin monthly (log) returns over the period 31 August 2010 - 31 December 2021:</p>
<ul>
<li>
<p>$p_1^{**}$ is the proportion of Bitcoin returns whose turbulence index values are below the red horizontal line, that is, $p_1^{**} = \frac{132}{137} \approx 0.96$</p>
</li>
<li>
<p>$p_2^{**}$ is the proportion of Bitcoin returns whose turbulence index values are above the red horizontal line, that is, $p_2^{**} = \frac{5}{137} \approx 0.04$</p>
</li>
</ul>
<figure>
<a href="/assets/images/blog/beyond-modified-var-bitcoin-turbulence-index.png"><img src="/assets/images/blog/beyond-modified-var-bitcoin-turbulence-index.png" alt="Bitcoin monthly turbulence index values, 31 August 2010 - 31 December 2021." /></a>
<figcaption>Figure 5. Bitcoin monthly turbulence index values, 31 August 2010 - 31 December 2021.</figcaption>
</figure>
</li>
<li>
<p>$\mu_1^{**}$ (resp. $\mu_2^{**}$) is the mean of the Bitcoin returns whose turbulence index values are classified as “normal” (resp. “outliers”) by the 1D $k$-means clustering method</p>
</li>
<li>
<p>$\sigma_1^{**}$ (resp. $\sigma_2^{**}$) is the standard deviation of the Bitcoin returns whose turbulence index values are classified as “normal” (resp. “outliers”) by the 1D $k$-means clustering method</p>
</li>
</ul>
<p>In terms of market regimes, the first (resp. second) Gaussian mixture component models the behavior of Bitcoin during a <em>normal</em> (resp. <em>rare events</em>) market regime.</p>
<p>As a side note, the turbulence partitioning procedure highlights that only 5 returns have been used to compute the parameter estimates $\left(p_2^{**}, \mu_2^{**}, \sigma_2^{**} \right)$, which
raises the question of the statistical significance of both these estimates<sup id="fnref:44" role="doc-noteref"><a href="#fn:44" class="footnote">40</a></sup> and of the maximum likelihood estimates $\left(p_2^*, \mu_2^*, \sigma_2^* \right)$, since they are equal…</p>
<h4 id="evaluation">Evaluation</h4>
<p>Figure 6 and Figure 7 compare the theoretical distribution of the estimated two-component Gaussian mixture distribution with the empirical distribution of the monthly (log) Bitcoin returns.</p>
<figure>
<a href="/assets/images/blog/beyond-modified-var-bitcoin-cdf-vs-gmm-cdf.png"><img src="/assets/images/blog/beyond-modified-var-bitcoin-cdf-vs-gmm-cdf.png" alt="Bitcoin monthly returns, empirical c.d.f. v.s. two-component Gaussian mixture c.d.f., 31 August 2010 - 31 December 2021." /></a>
<figcaption>Figure 6. Bitcoin monthly returns, empirical c.d.f. v.s. two-component Gaussian mixture c.d.f., 31 August 2010 - 31 December 2021.</figcaption>
</figure>
<figure>
<a href="/assets/images/blog/beyond-modified-var-bitcoin-cdf-vs-gmm-cdf-left-tail.png"><img src="/assets/images/blog/beyond-modified-var-bitcoin-cdf-vs-gmm-cdf-left-tail.png" alt="Bitcoin monthly returns, empirical c.d.f. v.s. two-component Gaussian mixture c.d.f., left tail, 31 August 2010 - 31 December 2021." /></a>
<figcaption>Figure 7. Bitcoin monthly returns, empirical c.d.f. v.s. two-component Gaussian mixture c.d.f., left tail, 31 August 2010 - 31 December 2021.</figcaption>
</figure>
<p>From these figures, the estimated mixture model seems to adequately capture the main characteristics of the Bitcoin returns, except for part of their left tail behavior; this visual impression is confirmed
more quantitatively by the Kolmogorov-Smirnov goodness-of-fit test<sup id="fnref:45" role="doc-noteref"><a href="#fn:45" class="footnote">41</a></sup>.</p>
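<p>As an aside, a back-of-the-envelope version of this kind of comparison only requires the standard library: the mixture c.d.f. can be evaluated through the error function, and the Kolmogorov-Smirnov statistic is the largest gap between the empirical and theoretical c.d.f.s. The sample and the mixture parameters below are illustrative placeholders, not the fitted Bitcoin values:</p>

```python
import math

def norm_cdf(x):
    # Standard normal c.d.f., expressed with the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mixture_cdf(x, components):
    # components is a list of (p_i, mu_i, sigma_i) triplets.
    return sum(p * norm_cdf((x - mu) / sigma) for p, mu, sigma in components)

def ks_statistic(sample, components):
    """Kolmogorov-Smirnov statistic D = sup_x |F_empirical(x) - F_mixture(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = mixture_cdf(x, components)
        # The empirical c.d.f. jumps from i/n to (i+1)/n at x.
        d = max(d, abs((i + 1) / n - f), abs(i / n - f))
    return d

components = [(0.95, 0.01, 0.05), (0.05, 0.30, 0.10)]
sample = [-0.08, -0.03, -0.01, 0.00, 0.02, 0.04, 0.06, 0.28]
print(f"KS statistic: {ks_statistic(sample, components):.4f}")
```

<p>Turning such a statistic into a $p$-value is more delicate when the mixture parameters have themselves been estimated from the sample.</p>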
<h3 id="bitcoin-returns-value-at-risk">Bitcoin returns Value-at-Risk</h3>
<p>I finally propose to compute the Value-at-Risk of Bitcoin monthly (log) returns over the period 31 August 2010 - 31 December 2021, using two methods:</p>
<ul>
<li>The <em>historical Value-at-Risk</em> (HVaR), introduced in <a href="/blog/the-turbulence-index-regime-based-partitioning-of-asset-returns/">another post</a></li>
<li>The Gaussian mixture Value-at-Risk, introduced in this blog post</li>
</ul>
<p>Results for various confidence levels, all visible in Figure 7, are provided below:</p>
<table>
<thead>
<tr>
<th>Confidence level $\alpha$</th>
<th>$\text{HVaR}_{\alpha}$</th>
<th>$\text{GmVaR}_{\alpha}$</th>
</tr>
</thead>
<tbody>
<tr>
<td>95%</td>
<td>42.02%</td>
<td>34.21%</td>
</tr>
<tr>
<td>97.5%</td>
<td>45.84%</td>
<td>41.97%</td>
</tr>
<tr>
<td>99%</td>
<td>47.22%</td>
<td>50.96%</td>
</tr>
<tr>
<td>99.5%</td>
<td>49.86%</td>
<td>57.08%</td>
</tr>
<tr>
<td>99.9%</td>
<td>49.86%</td>
<td>69.69%</td>
</tr>
</tbody>
</table>
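<p>As a minimal sketch of how such figures can be obtained, the historical VaR is a negated empirical quantile of the returns, while the Gaussian mixture VaR solves the implicit equation $\sum_{i=1}^k p_i \Phi \left( - \frac{ v + \mu_i }{\sigma_i} \right) = 1 - \alpha$ by bisection. The returns and the two-regime mixture parameters below are hypothetical values chosen for illustration only:</p>

```python
import math

def norm_cdf(x):
    # Standard normal c.d.f., expressed with the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def historical_var(returns, alpha):
    """Historical VaR: the negated empirical (1 - alpha)-quantile (lower convention)."""
    xs = sorted(returns)
    idx = max(0, math.ceil((1.0 - alpha) * len(xs)) - 1)
    return -xs[idx]

def gaussian_mixture_var(components, alpha, lo=-10.0, hi=10.0, tol=1e-10):
    """Solve sum_i p_i * Phi(-(v + mu_i) / sigma_i) = 1 - alpha for v by bisection."""
    def f(v):
        return sum(p * norm_cdf(-(v + mu) / sigma) for p, mu, sigma in components) - (1.0 - alpha)
    # f is strictly decreasing in v, with f(lo) > 0 > f(hi) for a wide enough bracket.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical (p_i, mu_i, sigma_i) parameters for a two-regime mixture.
components = [(0.96, 0.055, 0.24), (0.04, 1.25, 0.26)]
for alpha in (0.95, 0.99):
    print(f"GmVaR at {alpha:.1%} confidence: {gaussian_mixture_var(components, alpha):.2%}")
```

<p>Because the mixture c.d.f. is continuous and strictly increasing, the bisection converges to the unique solution of the implicit equation.</p>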
<h2 id="conclusion">Conclusion</h2>
<p>The aim of this post was to describe an extension of Gaussian Value-at-Risk, second in order of complexity after modified Value-at-Risk, for use when the latter cannot be computed.</p>
<p>This being done, I wish you Season’s Greetings and a Happy New Year!</p>
<p>Waiting for 2024, feel free to <a href="https://www.linkedin.com/in/roman-rubsamen/">connect with me on LinkedIn</a> or <a href="https://twitter.com/portfoliooptim">follow me on Twitter</a>.</p>
<p>–</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>See <a href="">Zangari, P. (1996). A VaR methodology for portfolios that include options. RiskMetrics Monitor First Quarter, 4–12</a>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>Or <em>Cornish-Fisher VaR</em>. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:9" role="doc-endnote">
<p>In this post, all returns are assumed to be logarithmic returns. <a href="#fnref:9" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>See <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1997178">Maillard, Didier, A User’s Guide to the Cornish Fisher Expansion</a>. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>All portfolio return distribution for which the skewness and the (excess) kurtosis are outside of the domain of validity of the Cornish-Fisher expansion underlying modified VaR, c.f. Maillard<sup id="fnref:2:1" role="doc-noteref"><a href="#fn:2" class="footnote">4</a></sup>. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>See <a href="https://www.pm-research.com/content/iijaltinv/25/4/7">Ang, Andrew, Morris, Tom, Savi, Raffaele, Asset Allocation with Crypto: Application of Preferences for Positive Skewness, The Journal of Alternative Investments, Spring 2023, 25 (4) 7-28</a>. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:6:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:6:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:6:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a> <a href="#fnref:6:4" class="reversefootnote" role="doc-backlink">↩<sup>5</sup></a> <a href="#fnref:6:5" class="reversefootnote" role="doc-backlink">↩<sup>6</sup></a> <a href="#fnref:6:6" class="reversefootnote" role="doc-backlink">↩<sup>7</sup></a> <a href="#fnref:6:7" class="reversefootnote" role="doc-backlink">↩<sup>8</sup></a></p>
</li>
<li id="fn:8" role="doc-endnote">
<p>See <a href="https://doi.org/10.1007/s11081-023-09814-y">Luxenberg, E., Boyd, S. Portfolio construction with Gaussian mixture returns and exponential utility via convex optimization. Optim Eng (2023).</a>. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:8:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:7" role="doc-endnote">
<p>Some theoretical subtleties are detailed in <a href="/blog/corrected-cornish-fisher-expansion-improving-the-accuracy-of-modified-value-at-risk/">a previous post</a>. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:10" role="doc-endnote">
<p>See <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3766511">Saissi Hassani, Samir and Dionne, Georges, The New International Regulation of Market Risk: Roles of VaR and CVaR in Model Validation (January 12, 2021)</a>. <a href="#fnref:10" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:10:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:10:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a></p>
</li>
<li id="fn:11" role="doc-endnote">
<p>See <a href="https://arxiv.org/abs/2202.10721">Benjamin Bruder, Nazar Kostyuchyk, Thierry Roncalli, Risk Parity Portfolios with Skewness Risk: An Application to Factor Investing and Alternative Risk Premia, arXiv</a>. <a href="#fnref:11" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:12" role="doc-endnote">
<p>See <a href="http://www.jstor.org/stable/2330751">Joyce A. Hall, Wade Brorsen and Scott H. Irwin, The Distribution of Futures Prices: A Test of the Stable Paretian and Mixture of Normals Hypotheses, The Journal of Financial and Quantitative Analysis, Vol. 24, No. 1 (Mar., 1989), pp.105-116</a>. <a href="#fnref:12" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:13" role="doc-endnote">
<p>See <a href="https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1540-6261.1984.tb03865.x">Kon S (1984) Models of stock returns-a comparison. J Financ 39(1):147–16</a>. <a href="#fnref:13" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:17" role="doc-endnote">
<p>See <a href="https://www.discuss.wmie.uz.zgora.pl/ps/source/publish/view_pdf.php?doi=10.7151/dmps.1190">C. Cuevas-Covarrubias, J. Inigo-Martinez and R. Jimenez-Padilla, Gaussian mixtures and financial returns, Discussiones Mathematicae, Probability and Statistics 37 (2017) 101–122</a>. <a href="#fnref:17" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:17:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:17:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a></p>
</li>
<li id="fn:16" role="doc-endnote">
<p>See <a href="https://www.sciencedirect.com/science/article/abs/pii/S0377221706006242">Ian Buckley, David Saunders, Luis Seco, Portfolio optimization when asset returns have the Gaussian mixture distribution, European Journal of Operational Research, Volume 185, Issue 3, 2008, Pages 1434-1461</a>. <a href="#fnref:16" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:16:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:16:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:16:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a></p>
</li>
<li id="fn:15" role="doc-endnote">
<p>See <a href="https://www.pm-research.com/content/iijderiv/5/3/9">J. Hull, A. White, Value at risk when daily changes in market variables are not normally distributed, Journal of Derivatives 5 (3) (1998) 9–19</a>. <a href="#fnref:15" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:19" role="doc-endnote">
<p>See <a href="https://arxiv.org/abs/2006.11383">Yuantong Li, Qi Ma, Sujit K. Ghosh, A Non-Iterative Quantile Change Detection Method in Mixture Model with Heavy-Tailed Components, arXiv</a>. <a href="#fnref:19" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:23" role="doc-endnote">
<p>The selected number of mixture components sometimes also takes into account both the quality of the approximation of the portfolio return distribution and the potential for overfitting. <a href="#fnref:23" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:18" role="doc-endnote">
<p>See <a href="https://www.sciencedirect.com/science/article/pii/S0895717703900227">Ming-Heng Zhang, Qian-Sheng Cheng, Gaussian mixture modelling to detect random walks in capital markets, Mathematical and Computer Modelling, Volume 38, Issues 5–6, 2003, Pages 503-508</a>. <a href="#fnref:18" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:25" role="doc-endnote">
<p>See <a href="https://ieeexplore.ieee.org/abstract/document/8046036/">J. Worms and S. Touati, “Modelling Program’s Performance with Gaussian Mixtures for Parametric Statistics,” in IEEE Transactions on Multi-Scale Computing Systems, vol. 4, no. 3, pp. 383-395, 1 July-Sept. 2018</a>. <a href="#fnref:25" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:25:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:24" role="doc-endnote">
<p>See <a href="https://www3.stat.sinica.edu.tw/statistica/J27N1/J27N17/J27N17.html">Tao Huang, Heng Peng and Kun Zhang, Model selection for Gaussian mixture models, Statistica Sinica 27 (2017), 147-169</a>. <a href="#fnref:24" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:26" role="doc-endnote">
<p>See <a href="https://dl.acm.org/doi/10.1145/3394486.3403240">Yuantong Li, Qi Ma, and Sujit K. Ghosh. 2020. A Non-Iterative Quantile Change Detection Method in Mixture Model with Heavy-Tailed Components. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD ‘20). Association for Computing Machinery, New York, NY, USA, 1888–1898.</a>. <a href="#fnref:26" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:30" role="doc-endnote">
<p>Typically, because it is determined independently of the remaining parameters $\left(p_i, \mu_i, \sigma_i \right)$, $i=1..k$. <a href="#fnref:30" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:28" role="doc-endnote">
<p>To avoid theoretical issues, some authors prefer to work with non-unique local maximizers, c.f. for example Peters and Walker<sup id="fnref:29:2" role="doc-noteref"><a href="#fn:29" class="footnote">24</a></sup>. <a href="#fnref:28" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:29" role="doc-endnote">
<p>See <a href="http://www.jstor.org/stable/2100676">B. Charles Peters, Jr. and Homer F. Walker, An Iterative Procedure for Obtaining Maximum-Likelihood Estimates of the Parameters for a Mixture of Normal Distributions, SIAM Journal on Applied Mathematics, Vol. 35, No. 2 (Sep., 1978), pp. 362-378</a>. <a href="#fnref:29" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:29:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:29:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a></p>
</li>
<li id="fn:27" role="doc-endnote">
<p>See <a href="https://arxiv.org/abs/1301.3850">Sanjoy Dasgupta, Leonard Schulman, A Two-round Variant of EM for Gaussian Mixtures, arXiv</a>. <a href="#fnref:27" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:34" role="doc-endnote">
<p>This is because the estimation of the parameters of a Gaussian mixture distribution can be formulated as a missing data problem<sup id="fnref:31" role="doc-noteref"><a href="#fn:31" class="footnote">42</a></sup>, for which the EM algorithm has been specifically designed. <a href="#fnref:34" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:33" role="doc-endnote">
<p>In particular, it is non-convex. <a href="#fnref:33" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:32" role="doc-endnote">
<p>See <a href="https://inria.hal.science/hal-01113242/document">Jean-Patrick Baudry, Gilles Celeux. EM for mixtures - Initialization requires special care. 2015. hal-01113242</a>. <a href="#fnref:32" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:32:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:35" role="doc-endnote">
<p>See <a href="https://www.sciencedirect.com/science/article/abs/pii/S0167947302001639">Christophe Biernacki, Gilles Celeux, Gérard Govaert, Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models, Computational Statistics & Data Analysis, Volume 41, Issues 3–4, 2003, Pages 561-575</a>. <a href="#fnref:35" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:35:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:35:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:35:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a></p>
</li>
<li id="fn:41" role="doc-endnote">
<p>See <a href="https://www.tandfonline.com/doi/abs/10.1080/01621459.1984.10478085">Wayne A. Woodward , William C. Parr , William R. Schucany & Hildegard Lindsey (1984) A Comparison of Minimum Distance and Maximum Likelihood Estimation of a Mixture Proportion, Journal of the American Statistical Association, 79:387, 590-598</a>. <a href="#fnref:41" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:41:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:36" role="doc-endnote">
<p>See <a href="https://www.tandfonline.com/doi/abs/10.2469/faj.v66.n5.3">M. Kritzman, Y. Li, Skulls, Financial Turbulence, and Risk Management, Financial Analysts Journal, Volume 66, Number 5, Pages 30-41, Year 2010</a>. <a href="#fnref:36" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:37" role="doc-endnote">
<p>Or the underlying multivariate asset returns, if available. <a href="#fnref:37" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:20" role="doc-endnote">
<p>See <a href="https://www.jstor.org/stable/4480169">George Chow, Jacquier, E., Kritzman, M., & Kenneth Lowry. (1999). Optimal Portfolios in Good Times and Bad. Financial Analysts Journal, 55(3), 65–73.</a>. <a href="#fnref:20" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:38" role="doc-endnote">
<p>To be noted that in 1D, the $k$-means algorithm can be computed exactly, see for example <a href="https://arxiv.org/abs/1701.07204">Allan Gronlund, Kasper Green Larsen, Alexander Mathiasen, Jesper Sindahl Nielsen, Stefan Schneider, Mingzhou Song, Fast Exact k-Means, k-Medians and Bregman Divergence Clustering in 1D, arXiv</a>. <a href="#fnref:38" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:46" role="doc-endnote">
<p>A proprietary methodology is used for that, except when the turbulence partitioning procedure is used together with a percentage-based thresholding of turbulence index values, in which case the number of Gaussian mixture components is equal<sup id="fnref:40" role="doc-noteref"><a href="#fn:40" class="footnote">43</a></sup> to the number of provided thresholds plus one. <a href="#fnref:46" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:39" role="doc-endnote">
<p>See <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4217841">Sepp, Artur, Optimal Allocation to Cryptocurrencies in Diversified Portfolios (December 31, 2022). Risk Magazine, October 2023, 1-6</a>. <a href="#fnref:39" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:42" role="doc-endnote">
<p>Adapted from Ang et al.<sup id="fnref:6:8" role="doc-noteref"><a href="#fn:6" class="footnote">6</a></sup>. <a href="#fnref:42" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:42:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:47" role="doc-endnote">
<p>Annualized. <a href="#fnref:47" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:43" role="doc-endnote">
<p>You will need to trust me on this one, as long as <strong>Portfolio Optimizer</strong> does not expose an endpoint to fit a Gaussian mixture distribution to an empirical return distribution. <a href="#fnref:43" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:44" role="doc-endnote">
<p>See <a href="https://digitalcommons.unl.edu/usnavyresearch/34">Greenwood, Joseph A. and Sandomire, Marion M., “Sample Size Required For Estimating The Standard Deviation as a Percent of Its True Value” (1950). U.S. Navy Research. 34.</a>. <a href="#fnref:44" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:45" role="doc-endnote">
<p>The $p$-values are so ridiculously high (> 0.95) that even <a href="https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test#Test_with_estimated_parameters">taking into account the fact that the parameters of the two-component Gaussian mixture distribution have been estimated</a>, the conclusion remains the same. <a href="#fnref:45" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:31" role="doc-endnote">
<p>See <a href="https://www.econstor.eu/bitstream/10419/22198/1/24_tk_gm_skn.pdf">McLachlan, Geoffrey J.; Krishnan, Thriyambakam; Ng, See Ket (2004) : The EM Algorithm, Papers, No. 2004,24, Humboldt-Universität zu Berlin, Center for Applied Statistics and Economics (CASE), Berlin</a>. <a href="#fnref:31" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:40" role="doc-endnote">
<p>Except when there are empty partitions. <a href="#fnref:40" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Roman R.In a previous post, I described a parametric approach to computing Value-at-Risk (VaR) - called modified VaR12 - that adjusts Gaussian VaR for asymmetry and fat tails present in financial asset returns3 thanks to the usage of a Cornish–Fisher expansion. Modified VaR, when properly used4, provides accurate estimates of the VaR for a wide range of non-normal portfolio return distributions. Unfortunately, for mathematical reasons, it cannot be computed for an even wider range of such distributions5, which poses an issue in practice. In this blog post, I will describe another parametric approach to computing VaR - called Gaussian mixture Value-at-Risk for the lack of a better name - which this time adjusts Gaussian VaR for higher moments thanks to the usage of a Gaussian mixture distribution and is free of any computational restriction. After providing the formula for the Gaussian mixture VaR and explaining how to fit a Gaussian mixture distribution to an empirical return distribution, I will show how to compute the Value-at-Risk of Bitcoin under the Gaussian mixture model described in the BlackRock paper Asset Allocation with Crypto: Application of Preferences for Positive Skewness from Ang et al.6 Mathematical preliminaries Univariate Gaussian mixture distribution A random variable $X$ is said to follow a univariate Gaussian mixture distribution, written as $X \sim \mathcal{GM} \left( \left( \mu_i, \sigma_i, p_i \right)_{i=1}^k \right) $, if its cumulative distribution function (c.d.f.) $F_X$ is of the form7 \[F_X(x) = \sum_{i=1}^k p_i \Phi \left( \frac{x - \mu_j}{\sigma_j} \right)\] , where: $k \geq 1$ is the (integer) number of mixture components $p_i \in \left[ 0,1 \right]$, $i=1..k$, are the probabilities of the mixture components, with $\sum_{i=1}^kp_i = 1$ $\mu_i \in \mathbb{R}$, $i=1..k$, are the means of the mixture components $\sigma_i \gt 0$, $i=1..k$, are the standard deviations of the mixture components $\Phi$ is the c.d.f. 
of the standard normal distribution Reminder on Value-at-Risk The (percentage) VaR of a portfolio of financial assets corresponds to the percentage of portfolio wealth that can be lost over a certain time horizon and with a certain probability. Formally, the VaR $VaR_{\alpha}$ of a portfolio over a time horizon $T$ (1 day, 10 days…) and at a confidence level $\alpha$% $\in ]0,1[$ (95%, 97.5%, 99%…) can be defined as8 \[\text{VaR}_{\alpha} (X) = - F_X^{-1}(1 - \alpha)\] , where: $X$ is a random variable representing the portfolio return over the time horizon $T$ $F_X^{-1}$ is the inverse c.d.f. - also called the quantile function - of the random variable $X$ Gaussian mixture Value-at-Risk Definition When the return distribution of a portfolio is approximated by a Gaussian mixture distribution, that is, when $X \sim \mathcal{GM} \left( \left( \mu_i, \sigma_i, p_i \right)_{i=1}^k \right) $ with $X$ a random variable representing the portfolio return over a given time horizon $T$, the resulting parametric VaR can be called Gaussian mixture Value-at-Risk (GmVaR). Mathematically, the Gaussian mixture VaR over the chosen time horizon $T$ and at a confidence level $\alpha$% $GmVaR_{\alpha}$ is implicitly defined by the following equation9, obtained by mixing together the formula for the VaR of a portfolio and the formula for the c.d.f. of a Gaussian mixture distribution \[\sum_{i=1}^k p_i \Phi \left( - \frac{ GmVaR_{\alpha} + \mu_i }{\sigma_i} \right) = 1 - \alpha\] Computation Because the formula of the previous subsection is implicit in $GmVaR_{\alpha}$, the effective computation of the Gaussian mixture VaR must involve a numerical procedure such as: A non-linear least-squares minimization procedure9 A bisection procedure10 Or more generally, a procedure to compute a quantile function from a closed-form c.d.f. 
Rationale It has been observed since the 1970s11 that Gaussian mixture distributions adequately approximate the unconditional distributions of financial asset returns. For example, Kon12 shows that two, three and four-component Gaussian mixture distributions can be used to model the daily returns of 3 U.S. stock market indexes and of 30 Dow Jones stocks. Similarly, Cuevas-Covarrubias et al.13 shows how two-component Gaussian mixture distributions exhibit a natural capacity to fit leptokurtic distributions13 using the daily returns of 3 Mexican stocks. Compared to alternative distributions, Gaussian mixture distributions offer a versatile combination of precision and simplicity13 for modeling asset returns. Indeed, Gaussian mixture distributions are both flexible enough to handle arbitrary asset return distributions14 and simple enough to be numerically tractable14. In contrast, [alternative distributions] tend to suffer from one of two evils: either they are too restrictive in the variety of p.d.f. shapes that can be achieved, or not restrictive enough, in the sense that they have too many degrees of freedom for calibration to be feasible14. As for applications of Gaussian mixture distributions to VaR, Hull and White15 used a two-component Gaussian mixture distribution more than 25 years ago in order to illustrate their non-normal VaR computation methodology. More recent papers include for example Saissi Hassani and Dionne9 who performs VaR backtests for several parametric models, among which two and three-component Gaussian mixture models. The Gaussian mixture VaR is thus a well-known extension of the Gaussian VaR when dealing with real-life asset returns. Fitting a univariate Gaussian mixture distribution to an empirical portfolio return distribution Let $r_1,…,r_T \in \mathbb{R}$ be the returns of a portfolio observed over $T$ time periods. 
In order to compute the Gaussian mixture VaR of this portfolio using the formula of the previous section, it is first required to approximate the empirical distribution of the returns $r_1,…,r_T$ by a Gaussian mixture distribution. This is usually done by determining the “best” statistical estimators for the unknown parameters $k$ and $\left(p_i, \mu_i, \sigma_i \right)$, $i=1..k$ of that Gaussian mixture distribution, with “best” to be defined. How to determine the number of mixture components? Manually Because it is possible to interpret the components of a Gaussian mixture return distribution as market regimes, with a latent variable that represents the active regime and a return distribution that is Gaussian, given the regime7, the number of mixture components $k$ can be chosen manually. For example: Choosing $k = 2$ means defining two market regimes in the portfolio returns - a normal regime and a rare events regime, sometimes called a distressed regime14 or a bliss regime6 Choosing $k = 3$ means defining three market regimes in the portfolio returns - the typical bear, neutral and bull market regimes16 Automatically The number of mixture components $k$ can also be chosen automatically through two general families of methods: Methods that determine the number of mixture components independently of the remaining mixture parameters Methods from this family typically iterate over a number of potential values for $k$, determine the remaining parameters $\left(p_i, \mu_i, \sigma_i \right)$, $i=1..k$, and evaluate how well the resulting $k$-component Gaussian mixture distribution approximates the portfolio return distribution. The chosen number of mixture components is then the value $k^*$, among these potential values for $k$, that allows to best approximate the portfolio return distribution17. Examples from this family include the Kolmogorov-Smirnov test18 or the more common Bayesian Information Criterion (BIC)19. 
Methods that determine the number of mixture components together with the remaining mixture parameters Methods from this family determine all the parameters of the Gaussian mixture distribution at once. Examples from this family include modified Expectation-Maximization algorithms20 or ad-hoc procedures21. How to determine the remaining mixture parameters? In this subsection, the number of mixture components $k$ is supposed to be known22. Likelihood maximization Sensible values for the unknown mixture parameters $\left(p_i, \mu_i, \sigma_i \right)$, $i=1..k$, are maximum-likelihood estimates, defined as the choice of parameters $\left(p_i^*, \mu_i^*, \sigma_i^* \right)$, $i=1..k$, that globally23 maximizes the log-likelihood function $L$ associated to the returns $r_1,…,r_T$ and given by \[L \left( p_1, \mu_1, \sigma_1,..., p_k, \mu_k, \sigma_k \right) = \sum_{i=1}^T \ln \left[ \sum_{j=1}^k p_j \frac{1}{\sqrt{2 \pi} \sigma_j} \exp \left( - \frac{1}{2} \left( \frac{ x_i - \mu_j}{ \sigma_j} \right )^2 \right) \right]\] Numerically, several procedures can be used to compute these maximum-likelihood estimates, like Newton’s method24, Fisher’s method of scoring24 or the Expectation-Maximization (EM) algorithm19 which is actually the method of choice for learning mixtures of Gaussians2526. Unfortunately, due to the complexity27 of the log-likelihood function $L$, none of these procedures is guaranteed to produce “true” maximum-likelihood estimates28. Even worse, the behavior of all these procedures is known to be highly dependent from [the initial guess of the unknown parameters] and it may fail as a result of degeneracies28, which led practitioners to implement misc. numerical tweaks, like running these procedures several times with different, randomly chosen starting points29, in order to avoid [obtaining] a poor approximation to the true maximum29. 
Turbulence partitioning of returns Although in most applications, parameters of a [Gaussian] mixture model are estimated by maximizing the likelihood29, there are other procedures to determine the remaining parameters of a Gaussian mixture distribution (minimum distance estimation30…) which have often been show to be more robust to departures from underlying assumptions than is maximum likelihood30. While an exhaustive review of these alternative procedures is out of scope of this post, I would still like to mention a procedure described in another post in the the context of a two-component Gaussian mixture distribution. This procedure is based on a measure of statistical unusualness of asset returns popularized by Kritzman and Li31 called the Turbulence Index, and can easily be generalized to a $k$-component Gaussian mixture distribution as follows: Compute the turbulence index values $d(r_t)$, $t=1..T$, associated to the portfolio returns32 Partition the portfolio returns into $k$ partitions based on their turbulence index values, either through manual percentage-based thresholding33 or through automated clustering (1D $k$-means34…). 
For each of these partitions $P_i$,$i=1..k$ Define the estimate of the $i$-th Gaussian mixture component probability $p_i^{**}$ as the proportion of the portfolio returns belonging to $P_i$ Define the estimate of the $i$-th Gaussian mixture component mean $\mu_i^{**}$ as the mean of the portfolio returns belonging to $P_i$ Define the estimate of the $i$-th Gaussian mixture component standard deviation $\sigma_i^{**}$ as the standard deviation of the portfolio returns belonging to $P_i$ Compared to likelihood maximization, this procedure is: Easy to implement, with no local optima and no convergence issue to worry about Easy to interpret, with a one-to-one relationship between partitions and market regimes In addition, as illustrated in the aforementioned post, the quality of the resulting parameters estimates is on par with that of the maximum-likelihood parameters estimates, at least in the case of a two-asset universe made of U.S. stocks and U.S. Treasuries. Implementation in Portfolio Optimizer Portfolio Optimizer implements the computation of a portfolio Gaussian mixture Value-at-Risk through the endpoint /portfolios/analysis/value-at-risk/gaussian/mixture. This endpoint: Fits a univariate Gaussian mixture distribution to an empirical portfolio (logarithmic) return distribution, using either A variation of the Expectation-Maximization algorithm described in the previous section The turbulence partitioning procedure also described in the previous section, relying on either a percentage-based thresholding of turbulence index values or a 1D $k$-means clustering of turbulence index values Computes the associated Gaussian mixture Value-at-Risk, using a variation of the bisection procedure To be noted that it is possible to let Portfolio Optimizer automatically determine the best number of Gaussian mixture components35. 
Example of usage - Computing the Value-at-Risk of Bitcoin In their paper, Ang et al.6 empirically demonstrate that the very positively skewed returns [of Bitcoin] are well captured by a [two-component] mixture of Normals distribution6. They then use this result to suggest that given a small probability regime where Bitcoin may generate high returns, investors with preferences for positive skewness of portfolio returns should have portfolio exposure to Bitcoin36. As an example of usage of the Gaussian mixture VaR, I propose to compute the VaR of Bitcoin under such a two-component Gaussian mixture model. Bitcoin returns analysis I propose to start by comparing the Bitcoin price data used in Ang et al.6 to the Bitcoin price data used in this post: Figures 137 and 2 compare the cumulative returns of Bitcoin over the period 31 August 2010 - 31 December 2021 Figure 1. Cumulative Bitcoin returns over the period 31 August 2010 - 31 December 2021, Ang et al.'s data. Source: Ang et al. Figure 2. Cumulative Bitcoin returns over the period 31 August 2010 - 31 December 2021, this post's data. Figures 337 and 4 compare the annualized mean, standard deviation and Sharpe ratio of Bitcoin monthly (log) returns over the same period Figure 3. Summary statistics of Bitcoin monthly returns over the period 31 August 2010 - 31 December 2021, Ang et al.'s data. Source: Ang et al. Figure 4. Summary statistics of Bitcoin monthly returns over the period 31 August 2010 - 31 December 2021, this post's data. From these figures, it is clear that the Bitcoin data used in Ang et al.6 differ from the Bitcoin data used in this post (Figure 3 v.s. Figure 4), although not that much (Figure 1 v.s. Figure 2). To be noted that this is not surprising, because there is no official price for Bitcoin due to its decentralized nature. 
<h3 id="fitting-a-gaussian-mixture-model-to-bitcoin-returns">Fitting a Gaussian mixture model to Bitcoin returns</h3>
<p>Similar to Ang et al.<sup>6</sup>, I now propose to fit a two-component Gaussian mixture model to the Bitcoin monthly (log) returns over the period 31 August 2010 - 31 December 2021.</p>
<h4 id="likelihood-maximization">Likelihood maximization</h4>
<p>The EM algorithm, as implemented by the scikit-learn method <code class="language-plaintext highlighter-rouge">sklearn.mixture.GaussianMixture</code> with default parameters, leads to three different maximum likelihood estimates<sup>38</sup>:</p>
<ul>
<li>$\left(p_1^1, \mu_1^1, \sigma_1^1 \right) = \left( 0.96, 0.66, 0.84 \right)$ and $\left(p_2^1, \mu_2^1, \sigma_2^1 \right) = \left( 0.04, 14.98, 0.89 \right)$</li>
<li>$\left(p_1^2, \mu_1^2, \sigma_1^2 \right) = \left( 0.91, 0.54, 0.82 \right)$ and $\left(p_2^2, \mu_2^2, \sigma_2^2 \right) = \left( 0.09, 7.45, 2.03 \right)$</li>
<li>$\left(p_1^3, \mu_1^3, \sigma_1^3 \right) = \left( 0.91, 0.55, 0.82 \right)$ and $\left(p_2^3, \mu_2^3, \sigma_2^3 \right) = \left( 0.09, 7.81, 2.03 \right)$</li>
</ul>
<p>This situation confirms that an important drawback of EM is that its solution can highly depend on its starting position and consequently produce sub-optimal maximum likelihood estimates<sup>29</sup>!</p>
<p>Fortunately, increasing the number of random initializations performed in the <code class="language-plaintext highlighter-rouge">sklearn.mixture.GaussianMixture</code> method leads to the optimal maximum likelihood estimates, but <em>somebody unaware of the numerical properties of the EM algorithm might conclude too quickly that the mixture of Normals can be estimated easily by maximum likelihood or EM algorithms</em><sup>6</sup>.</p>
<p>These “true” maximum likelihood estimates are $\left(p_1^*, \mu_1^*, \sigma_1^* \right) = \left( 0.96, 0.66, 0.84 \right)$ and $\left(p_2^*, \mu_2^*, \sigma_2^* \right) = \left( 0.04, 14.98, 0.89 \right)$.</p>
<h4 id="turbulence-partitioning">Turbulence partitioning</h4>
<p>The turbulence partitioning procedure used with an exact 1D $k$-means clustering leads to the same<sup>39</sup> estimates as the “true” maximum likelihood estimates, which empirically demonstrates the validity of this procedure to fit the parameters of a univariate Gaussian mixture distribution.</p>
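The sensitivity of EM to its starting position, and the remedy of re-running it from several starting points, can be reproduced with scikit-learn. The series below is a synthetic stand-in for the Bitcoin returns, with hypothetical parameters loosely mimicking the two fitted regimes; it is not the actual data.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for the monthly (log) returns: a dominant "normal"
# regime plus a small high-mean regime (hypothetical numbers)
rng = np.random.default_rng(0)
returns = np.concatenate([rng.normal(0.6, 0.8, 131),
                          rng.normal(15.0, 0.9, 6)]).reshape(-1, 1)

# A single EM run may converge to a sub-optimal local maximum...
gm1 = GaussianMixture(n_components=2, n_init=1, random_state=0).fit(returns)
# ...while n_init > 1 re-runs EM from several starting points and keeps
# the fit with the highest log-likelihood
gm10 = GaussianMixture(n_components=2, n_init=10, random_state=0).fit(returns)

print(gm10.weights_.round(2))                       # component probabilities
print(gm10.means_.ravel().round(2))                 # component means
print(np.sqrt(gm10.covariances_.ravel()).round(2))  # component standard deviations
```

On well-separated data like this, a single initialization may already succeed; `n_init` simply guarantees the kept fit is never worse than the single-run one.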
<p>As a reminder, these estimates are $\left(p_1^{**}, \mu_1^{**}, \sigma_1^{**} \right) = \left( 0.96, 0.66, 0.84 \right)$ and $\left(p_2^{**}, \mu_2^{**}, \sigma_2^{**} \right) = \left( 0.04, 14.98, 0.89 \right)$.</p>
<p>One of the advantages of the turbulence partitioning procedure is that it allows to easily explain how the Gaussian mixture parameters have been estimated:</p>
<ul>
<li>$p_1^{**}$ (resp. $p_2^{**}$) is the proportion of Bitcoin returns whose turbulence index values are classified as “normal” (resp. “outliers”) by the 1D $k$-means clustering method
<p>Graphically, thanks to Figure 5, which displays the 137 turbulence index values associated to the Bitcoin monthly (log) returns over the period 31 August 2010 - 31 December 2021:</p>
<ul>
<li>$p_1^{**}$ is the proportion of Bitcoin returns whose turbulence index values are below the red horizontal line, that is, $p_1^{**} = \frac{132}{137} \approx 0.96$</li>
<li>$p_2^{**}$ is the proportion of Bitcoin returns whose turbulence index values are above the red horizontal line, that is, $p_2^{**} = \frac{5}{137} \approx 0.04$</li>
</ul>
<figure>
<figcaption>Figure 5. Bitcoin monthly turbulence index values, 31 August 2010 - 31 December 2021.</figcaption>
</figure>
</li>
<li>$\mu_1^{**}$ (resp. $\mu_2^{**}$) is the mean of the Bitcoin returns whose turbulence index values are classified as “normal” (resp. “outliers”) by the 1D $k$-means clustering method</li>
<li>$\sigma_1^{**}$ (resp. $\sigma_2^{**}$) is the standard deviation of the Bitcoin returns whose turbulence index values are classified as “normal” (resp. “outliers”) by the 1D $k$-means clustering method</li>
</ul>
<p>In terms of market regimes, the first (resp. second) Gaussian mixture component models the behavior of Bitcoin during a normal (resp. rare events) market regime.</p>
<p>As a side note, the turbulence partitioning procedure highlights that only 5 returns have been used to compute the parameter estimates $\left(p_2^{**}, \mu_2^{**}, \sigma_2^{**} \right)$, which raises the question of the statistical significance of both these estimates<sup>40</sup> and the maximum likelihood estimates $\left(p_2^*, \mu_2^*, \sigma_2^* \right)$, since they are equal…</p>
<h4 id="evaluation">Evaluation</h4>
<p>Figure 6 and Figure 7 compare the theoretical distribution of the estimated two-component Gaussian mixture distribution with the empirical distribution of the monthly (log) Bitcoin returns.</p>
<figure>
<figcaption>Figure 6. Bitcoin monthly returns, empirical c.d.f. v.s. two-component Gaussian mixture c.d.f., 31 August 2010 - 31 December 2021.</figcaption>
</figure>
<figure>
<figcaption>Figure 7. Bitcoin monthly returns, empirical c.d.f. v.s. two-component Gaussian mixture c.d.f., left tail, 31 August 2010 - 31 December 2021.</figcaption>
</figure>
<p>From these figures, the estimated mixture model seems to adequately capture the main characteristics of the Bitcoin returns, except part of their left tail behavior, which is confirmed more quantitatively by the Kolmogorov-Smirnov goodness of fit test<sup>41</sup>.</p>
<h3 id="bitcoin-returns-value-at-risk">Bitcoin returns Value-at-Risk</h3>
<p>I finally propose to compute the Value-at-Risk of Bitcoin monthly (log) returns over the period 31 August 2010 - 31 December 2021, using two methods:</p>
<ul>
<li>The historical Value-at-Risk (HVaR), introduced in another post</li>
<li>The Gaussian mixture Value-at-Risk, introduced in this blog post</li>
</ul>
<p>Results for various confidence levels, all visible on Figure 7, are provided below:</p>
<table>
<thead>
<tr><th>Confidence level $\alpha$</th><th>$\text{HVaR}_{\alpha}$</th><th>$\text{GmVaR}_{\alpha}$</th></tr>
</thead>
<tbody>
<tr><td>95%</td><td>42.02%</td><td>34.21%</td></tr>
<tr><td>97.5%</td><td>45.84%</td><td>41.97%</td></tr>
<tr><td>99%</td><td>47.22%</td><td>50.96%</td></tr>
<tr><td>99.5%</td><td>49.86%</td><td>57.08%</td></tr>
<tr><td>99.9%</td><td>49.86%</td><td>69.69%</td></tr>
</tbody>
</table>
<h2 id="conclusion">Conclusion</h2>
<p>The aim of this post was to describe an extension of the Gaussian Value-at-Risk, second in order of complexity after the modified Value-at-Risk, in case the latter cannot be computed.</p>
<p>This being done, I wish you Season’s Greetings and a Happy New Year!</p>
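The bisection idea behind the Gaussian mixture VaR is simple to sketch: since the mixture c.d.f. is a monotone weighted sum of Gaussian c.d.f.s, its $(1-\alpha)$-quantile can be bracketed and halved until convergence. This is only a sketch of the idea, not Portfolio Optimizer's (proprietary) implementation, and the `(p, mu, sigma)` input format and sign convention (VaR reported as a positive loss) are assumptions of this example.

```python
from statistics import NormalDist

def gaussian_mixture_var(components, alpha, lo=-10.0, hi=10.0, tol=1e-10):
    """Value-at-Risk at confidence level alpha under a univariate Gaussian
    mixture, i.e. minus the (1 - alpha)-quantile of the mixture return
    distribution, obtained by bisection on the (monotone) mixture c.d.f.

    components: list of (p_i, mu_i, sigma_i), with the p_i summing to one.
    """
    def mixture_cdf(x):
        return sum(p * NormalDist(mu, sigma).cdf(x)
                   for p, mu, sigma in components)
    target = 1.0 - alpha
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return -0.5 * (lo + hi)

# Sanity check with a single standard normal component:
# the 95% VaR must match the usual Gaussian quantile
print(round(gaussian_mixture_var([(1.0, 0.0, 1.0)], 0.95), 3))  # 1.645
```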
<p>Waiting for 2024, feel free to connect with me on LinkedIn or follow me on Twitter.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li>See Zangari, P. (1996). A VaR methodology for portfolios that include options. RiskMetrics Monitor, First Quarter, 4-12.</li>
<li>Or Cornish-Fisher VaR.</li>
<li>In this post, all returns are assumed to be logarithmic returns.</li>
<li>See Maillard, Didier, A User’s Guide to the Cornish Fisher Expansion.</li>
<li>All portfolio return distributions for which the skewness and the (excess) kurtosis are outside of the domain of validity of the Cornish-Fisher expansion underlying the modified VaR, c.f. Maillard<sup>4</sup>.</li>
<li>See Ang, Andrew, Morris, Tom, Savi, Raffaele, Asset Allocation with Crypto: Application of Preferences for Positive Skewness, The Journal of Alternative Investments, Spring 2023, 25 (4), 7-28.</li>
<li>See Luxenberg, E., Boyd, S., Portfolio construction with Gaussian mixture returns and exponential utility via convex optimization, Optim Eng (2023).</li>
<li>Some theoretical subtleties are detailed in a previous post.</li>
<li>See Saissi Hassani, Samir and Dionne, Georges, The New International Regulation of Market Risk: Roles of VaR and CVaR in Model Validation (January 12, 2021).</li>
<li>See Benjamin Bruder, Nazar Kostyuchyk, Thierry Roncalli, Risk Parity Portfolios with Skewness Risk: An Application to Factor Investing and Alternative Risk Premia, arXiv.</li>
<li>See Joyce A. Hall, Wade Brorsen and Scott H. Irwin, The Distribution of Futures Prices: A Test of the Stable Paretian and Mixture of Normals Hypotheses, The Journal of Financial and Quantitative Analysis, Vol. 24, No. 1 (Mar., 1989), pp. 105-116.</li>
<li>See Kon, S. (1984), Models of stock returns - a comparison, J Financ 39(1), 147-16.</li>
<li>See C. Cuevas-Covarrubias, J. Inigo-Martinez and R. Jimenez-Padilla, Gaussian mixtures and financial returns, Discussiones Mathematicae, Probability and Statistics 37 (2017), 101-122.</li>
<li>See Ian Buckley, David Saunders, Luis Seco, Portfolio optimization when asset returns have the Gaussian mixture distribution, European Journal of Operational Research, Volume 185, Issue 3, 2008, Pages 1434-1461.</li>
<li>See J. Hull, A. White, Value at risk when daily changes in market variables are not normally distributed, Journal of Derivatives 5 (3) (1998), 9-19.</li>
<li>See Yuantong Li, Qi Ma, Sujit K. Ghosh, A Non-Iterative Quantile Change Detection Method in Mixture Model with Heavy-Tailed Components, arXiv.</li>
<li>The selected number of mixture components sometimes also takes into account both the quality of the approximation of the portfolio return distribution and the potential for overfitting.</li>
<li>See Ming-Heng Zhang, Qian-Sheng Cheng, Gaussian mixture modelling to detect random walks in capital markets, Mathematical and Computer Modelling, Volume 38, Issues 5-6, 2003, Pages 503-508.</li>
<li>See J. Worms and S. Touati, Modelling Program’s Performance with Gaussian Mixtures for Parametric Statistics, IEEE Transactions on Multi-Scale Computing Systems, vol. 4, no. 3, pp. 383-395, July-Sept. 2018.</li>
<li>See Tao Huang, Heng Peng and Kun Zhang, Model selection for Gaussian mixture models, Statistica Sinica 27 (2017), 147-169.</li>
<li>See Yuantong Li, Qi Ma, and Sujit K. Ghosh (2020), A Non-Iterative Quantile Change Detection Method in Mixture Model with Heavy-Tailed Components, in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining (KDD ’20), Association for Computing Machinery, New York, NY, USA, 1888-1898.</li>
<li>Typically, because it is determined independently of the remaining parameters $\left(p_i, \mu_i, \sigma_i \right)$, $i=1..k$.</li>
<li>To avoid theoretical issues, some authors prefer to work with non-unique local maximizers, c.f. for example Peters and Walker<sup>24</sup>.</li>
<li>See B. Charles Peters, Jr. and Homer F. Walker, An Iterative Procedure for Obtaining Maximum-Likelihood Estimates of the Parameters for a Mixture of Normal Distributions, SIAM Journal on Applied Mathematics, Vol. 35, No. 2 (Sep., 1978), pp. 362-378.</li>
<li>See Sanjoy Dasgupta, Leonard Schulman, A Two-round Variant of EM for Gaussian Mixtures, arXiv.</li>
<li>This is because the estimation of the parameters of a Gaussian mixture distribution can be formulated as a missing data problem<sup>42</sup>, for which the EM algorithm has been specifically designed.</li>
<li>In particular, it is non-convex.</li>
<li>See Jean-Patrick Baudry, Gilles Celeux, EM for mixtures - Initialization requires special care, 2015, hal-01113242.</li>
<li>See Christophe Biernacki, Gilles Celeux, Gérard Govaert, Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models, Computational Statistics &amp; Data Analysis, Volume 41, Issues 3-4, 2003, Pages 561-575.</li>
<li>See Wayne A. Woodward, William C. Parr, William R. Schucany and Hildegard Lindsey (1984), A Comparison of Minimum Distance and Maximum Likelihood Estimation of a Mixture Proportion, Journal of the American Statistical Association, 79:387, 590-598.</li>
<li>See M. Kritzman, Y. Li, Skulls, Financial Turbulence, and Risk Management, Financial Analysts Journal, Volume 66, Number 5, Pages 30-41, 2010.</li>
<li>Or the underlying multivariate asset returns, if available.</li>
<li>See George Chow, Jacquier, E., Kritzman, M., and Kenneth Lowry (1999), Optimal Portfolios in Good Times and Bad, Financial Analysts Journal, 55(3), 65-73.</li>
<li>To be noted that in 1D, the $k$-means algorithm can be computed exactly, see for example Allan Gronlund, Kasper Green Larsen, Alexander Mathiasen, Jesper Sindahl Nielsen, Stefan Schneider, Mingzhou Song, Fast Exact k-Means, k-Medians and Bregman Divergence Clustering in 1D, arXiv.</li>
<li>A proprietary methodology is used for that, except when the turbulence partitioning procedure is used together with a percentage-based thresholding of turbulence index values, in which case the number of Gaussian mixture components is equal<sup>43</sup> to the number of provided thresholds plus one.</li>
<li>See Sepp, Artur, Optimal Allocation to Cryptocurrencies in Diversified Portfolios (December 31, 2022), Risk Magazine, October 2023, 1-6.</li>
<li>Adapted from Ang et al.<sup>6</sup></li>
<li>Annualized.</li>
<li>You will need to trust me on this one, as long as Portfolio Optimizer does not expose an endpoint to fit a Gaussian mixture distribution to an empirical return distribution.</li>
<li>See Greenwood, Joseph A. and Sandomire, Marion M., Sample Size Required For Estimating The Standard Deviation as a Percent of Its True Value (1950), U.S. Navy Research, 34.</li>
<li>The $p$-values are so ridiculously high (&gt; 0.95) that even taking into account the fact that the parameters of the two-component Gaussian mixture distribution have been estimated, the conclusion remains the same.</li>
<li>See McLachlan, Geoffrey J., Krishnan, Thriyambakam, Ng, See Ket (2004), The EM Algorithm, Papers, No. 2004,24, Humboldt-Universität zu Berlin, Center for Applied Statistics and Economics (CASE), Berlin.</li>
<li>Except when there are empty partitions.</li>
</ol>
</div>
Index Tracking: Reproducing the Performance of a Financial Market Index (and more)2023-10-31T00:00:00-05:002023-10-31T00:00:00-05:00https://portfoliooptimizer.io/blog/index-tracking-reproducing-the-performance-of-a-financial-market-index-and-more<p>An <em>index tracking portfolio</em><sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote">1</a></sup> is a portfolio designed to track as closely<sup id="fnref:28" role="doc-noteref"><a href="#fn:28" class="footnote">2</a></sup> as possible a financial market index when its exact replication<sup id="fnref:29" role="doc-noteref"><a href="#fn:29" class="footnote">3</a></sup> is either impractical or impossible due to various reasons<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote">4</a></sup>
(transaction costs, liquidity issues, licensing requirements…).</p>
<p>In this blog post, after reviewing the underlying mathematics described in Hallerbach<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote">5</a></sup>, I will present a couple of applications of index tracking beyond “pure” index tracking, like helping financial advisors to
reduce their customers’ investment fees or minimizing the sensitivity of mean-variance optimization to estimation error.</p>
<blockquote>
<p><strong><em>Notes:</em></strong></p>
<ul>
<li>A Google Sheet corresponding to this post is available <a href="https://docs.google.com/spreadsheets/d/1U431Zp3viXzJurK0lexV3Oi1iT4P3MrVpHgKRq5IUeU/edit?usp=sharing">here</a></li>
</ul>
</blockquote>
<h2 id="mathematical-preliminaries">Mathematical preliminaries</h2>
<h3 id="the-general-index-tracking-optimization-problem">The general index tracking optimization problem</h3>
<p>Let be:</p>
<ul>
<li>$T$, a number of time periods</li>
<li>$r_{idx} = \left( r_{idx, 1}, …, r_{idx, T} \right) \in \mathcal{R}^{T}$, the vector of the index arithmetic returns over each of the $T$ time periods</li>
<li>$r_{tracking} = \left( r_{tracking, 1}, …, r_{tracking, T} \right) \in \mathcal{R}^{T}$, the vector of the (yet to be computed) index tracking portfolio arithmetic returns over each of the $T$ time periods</li>
</ul>
<p>Then:</p>
<ul>
<li>
<p>For a given time period $t$, the difference between the index return $r_{idx, t}$ and the index tracking portfolio return $r_{tracking, t}$ is defined as the <em>tracking error</em> $TE_t$:</p>
\[TE_t = r_{tracking, t} - r_{idx, t}\]
</li>
<li>
<p>For all time periods $t=1..T$, the vector of the tracking errors $TE_t$ is defined as the <em>tracking error vector</em> $TE \in \mathcal{R}^{T}$:</p>
\[TE = \left( r_{tracking, 1} - r_{idx, 1}, ..., r_{tracking, T} - r_{idx, T}\right)\]
</li>
</ul>
<p>Now, let be:</p>
<ul>
<li>$n$, the number of assets in the universe of the index tracking portfolio</li>
<li>$w = \left( w_1,…,w_n \right) \in \mathcal{R}^{n} $ the vector of the weights of a portfolio in each of the $n$ assets</li>
<li>$r_i = \left( r_{i, 1}, …, r_{i, T} \right) \in \mathcal{R}^{T}$, the vector of the asset $i$ arithmetic returns over each of the $T$ time periods, $i = 1..n$</li>
</ul>
<p>Because arithmetic returns can be aggregated across assets, we have for each time period $t$:</p>
\[r_{tracking, t} \left( w \right) = \sum_{i=1}^n w_i r_{i, t}\]
<p>So that:</p>
\[TE_t \left( w \right) = \sum_{i=1}^n w_i r_{i, t} - r_{idx, t}\]
<p>and:</p>
\[TE \left( w \right) = X w - r_{idx}\]
<p>, with $X \in \mathcal{R}^{T \times n}$ the matrix of the $n$ assets arithmetic returns over each of the $T$ time periods.</p>
<p>The vector of the index tracking portfolio weights $w^*$ is then the<sup id="fnref:31" role="doc-noteref"><a href="#fn:31" class="footnote">6</a></sup> solution to the general index tracking optimization problem, which consists in minimizing some <a href="https://en.wikipedia.org/wiki/Loss_function">loss function</a>
$f$ of the tracking error vector $TE$ - called <em>tracking error measure</em> - subject to additional constraints like full investment constraint, no short sale constraint, etc.<sup id="fnref:2:1" role="doc-noteref"><a href="#fn:2" class="footnote">5</a></sup></p>
\[w^* = \operatorname{argmin} f(TE \left( w \right) ) \newline \textrm{s.t. } \begin{cases} \sum_{i=1}^{n} w_i = 1 \newline 0 \leqslant w_i \leqslant 1, i = 1..n \newline ... \end{cases}\]
<h3 id="tracking-error-measures">Tracking error measures</h3>
<p>A common tracking error measure is the <em>empirical tracking error (ETE)</em><sup id="fnref:4:1" role="doc-noteref"><a href="#fn:4" class="footnote">4</a></sup>, also called the <em>mean squared tracking error (MSTE)</em>, defined as the average of the squared tracking errors $TE_t$, $t=1..T$</p>
\[ETE(w) = \frac{1}{T} \lVert X w - r_{idx} \rVert_2^2\]
<p>or, equivalently, the <em>root mean square tracking error (RMSTE)</em><sup id="fnref:2:2" role="doc-noteref"><a href="#fn:2" class="footnote">5</a></sup>, defined by</p>
\[RMSTE(w) = \sqrt{\frac{1}{T} \lVert X w - r_{idx} \rVert_2^2}\]
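These two measures can be sketched numerically in a few lines (the toy universe below is hypothetical, with the "index" constructed as an exact equal-weighted mix of the two assets so that perfect tracking is achievable):

```python
import numpy as np

def empirical_tracking_error(X, r_idx, w):
    """ETE/MSTE: the average of the squared tracking errors."""
    te = X @ w - r_idx                 # tracking error vector TE(w)
    return float(np.mean(te ** 2))

def rmste(X, r_idx, w):
    return float(np.sqrt(empirical_tracking_error(X, r_idx, w)))

# Toy universe: 3 periods, 2 assets, index = equal-weighted mix of both
X = np.array([[0.01, 0.03],
              [0.02, 0.00],
              [-0.01, 0.02]])
r_idx = np.array([0.02, 0.01, 0.005])

print(empirical_tracking_error(X, r_idx, np.array([0.5, 0.5])))  # exact tracking
print(empirical_tracking_error(X, r_idx, np.array([1.0, 0.0])))  # imperfect tracking
```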
<p>Another common tracking error measure is the <em>tracking error variance (TEV)</em><sup id="fnref:2:3" role="doc-noteref"><a href="#fn:2" class="footnote">5</a></sup>, defined as the variance of the tracking errors $TE_t$, $t=1..T$</p>
\[TEV(w) = Var( X w - r_{idx} )\]
<p>However, as noted in Beasley et al.<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote">7</a></sup>, the usage of the tracking error variance is problematic due to its insensitivity to bias in the tracking errors $TE_t$. Indeed, if the index tracking portfolio
returns $r_{tracking}$ are systematically higher or lower than the index returns $r_{idx}$<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote">8</a></sup>, the tracking error variance is null although the index tracking portfolio systematically deviates
from its index!</p>
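This insensitivity to bias is easy to reproduce numerically on hypothetical return series: a portfolio with a constant deviation from its index has a null tracking error variance, while its empirical tracking error correctly flags the deviation.

```python
import numpy as np

r_idx = np.array([0.010, -0.005, 0.020, 0.000, 0.015])  # hypothetical index returns
r_p1 = r_idx + 0.01   # portfolio 1: a constant +1% bias each period
r_p2 = r_idx + np.array([0.002, -0.003, 0.001, -0.002, 0.002])  # portfolio 2: unbiased noise

def tev(r_p, r_idx):
    return float(np.var(r_p - r_idx))            # tracking error variance

def ete(r_p, r_idx):
    return float(np.mean((r_p - r_idx) ** 2))    # empirical tracking error

# TEV is blind to the systematic 1% deviation of portfolio 1,
# while ETE penalizes it
print(tev(r_p1, r_idx), ete(r_p1, r_idx))
print(tev(r_p2, r_idx), ete(r_p2, r_idx))
```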
<p>This limitation is illustrated in Figure 1, adapted from Rossbach and Karlow<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote">9</a></sup>, in which it is clearly visible that <em>portfolio 1 has a worse tracking quality than portfolio 2 but the TEV of portfolio 1
is equal to zero</em><sup id="fnref:6:1" role="doc-noteref"><a href="#fn:6" class="footnote">9</a></sup>.</p>
<figure>
<a href="/assets/images/blog/index-tracking-tev-limitation.png"><img src="/assets/images/blog/index-tracking-tev-limitation.png" alt="Insensitivity of the tracking error variance to bias. Source: Rossbach and Karlow." /></a>
<figcaption>Figure 1. Insensitivity of the tracking error variance to bias. Source: Rossbach and Karlow.</figcaption>
</figure>
<p>In addition to the empirical tracking error and to the tracking error variance, other tracking error measures also exist like the mean absolute deviation of the tracking errors<sup id="fnref:6:2" role="doc-noteref"><a href="#fn:6" class="footnote">9</a></sup>,
the correlation between the index tracking portfolio returns $r_{tracking}$ and the index returns $r_{idx}$<sup id="fnref:2:4" role="doc-noteref"><a href="#fn:2" class="footnote">5</a></sup>, the downside risk<sup id="fnref:4:2" role="doc-noteref"><a href="#fn:4" class="footnote">4</a></sup>, etc., but they will not be detailed in this blog post.</p>
<p>In all cases, it is important to note that different tracking error measures lead to different solutions to the general index tracking optimization problem - that is, different index tracking portfolios -
with different properties<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote">10</a></sup>.</p>
<h3 id="the-index-trackingempirical-tracking-error-optimization-problem">The index tracking/empirical tracking error optimization problem</h3>
<p>Due to the limitation of the tracking error variance highlighted in the previous sub-section, this blog post will use the empirical tracking error as the preferred tracking error measure<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote">11</a></sup>.</p>
<p>In this case, the general index tracking optimization problem becomes a constrained regression problem in which the tracking errors $TE_t$, $t = 1..T$ represent the error terms:</p>
\[w^* = \operatorname{argmin} \frac{1}{T} \lVert X w - r_{idx} \rVert_2^2 \newline \textrm{s.t. } \begin{cases} \sum_{i=1}^{n} w_i = 1 \newline 0 \leqslant w_i \leqslant 1, i = 1..n \newline ... \end{cases}\]
<p>From this perspective, and <em>although index tracking is driven from the financial industry, it is in fact a pure signal processing problem</em><sup id="fnref:4:3" role="doc-noteref"><a href="#fn:4" class="footnote">4</a></sup> that can be solved with standard quadratic optimization algorithms.</p>
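As an illustration, this constrained regression can be sketched with SciPy's generic SLSQP solver; this is only a sketch under the full investment and no short sale constraints, not Portfolio Optimizer's implementation, and the toy "index" is constructed as a known 70/30 mix of two hypothetical assets so that the solver's answer can be checked.

```python
import numpy as np
from scipy.optimize import minimize

def index_tracking_weights(X, r_idx):
    """Minimize the empirical tracking error subject to the full investment
    and no short sale constraints (a sketch using a generic solver)."""
    n = X.shape[1]
    ete = lambda w: np.mean((X @ w - r_idx) ** 2)
    constraints = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]
    bounds = [(0.0, 1.0)] * n
    w0 = np.full(n, 1.0 / n)                       # equal-weighted start
    res = minimize(ete, w0, method="SLSQP", bounds=bounds,
                   constraints=constraints, tol=1e-12)
    return res.x

# Sanity check on a toy universe: an "index" built as a fixed 70/30 mix
# of the two assets should be recovered almost exactly
rng = np.random.default_rng(42)
X = rng.normal(0.01, 0.05, size=(60, 2))           # 60 months, 2 assets
r_idx = X @ np.array([0.7, 0.3])
print(index_tracking_weights(X, r_idx).round(2))   # ≈ [0.7, 0.3]
```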
<h3 id="caveats">Caveats</h3>
<p>The index tracking optimization problem as presented in the previous sub-sections relies on a couple of assumptions that are best to keep in mind:</p>
<ul>
<li>
<p>The<sup id="fnref:31:1" role="doc-noteref"><a href="#fn:31" class="footnote">6</a></sup> solution to the index tracking optimization problem is the portfolio with the lowest in-sample tracking error measure, while out-of-sample performance is ultimately what matters</p>
<p>Rossbach and Karlow<sup id="fnref:6:3" role="doc-noteref"><a href="#fn:6" class="footnote">9</a></sup> study the stability of standard tracking error measures, including the empirical tracking error and the tracking error variance, and conclude that<sup id="fnref:6:4" role="doc-noteref"><a href="#fn:6" class="footnote">9</a></sup></p>
<blockquote>
<p>The results indicate a poor stability for every of the […] traditional measures. Furthermore, the relation to the ex post tracking quality is weak.</p>
</blockquote>
<p>Nevertheless, this problem can be attenuated by regularly re-computing an index tracking portfolio thanks to a rolling window of recent past returns, c.f. Benidis<sup id="fnref:4:4" role="doc-noteref"><a href="#fn:4" class="footnote">4</a></sup>.</p>
</li>
<li>
<p>The<sup id="fnref:31:2" role="doc-noteref"><a href="#fn:31" class="footnote">6</a></sup> solution to the index tracking optimization problem has fixed asset weights, while a certain level of drift is always present in practice</p>
<p>To make the underlying optimization problem tractable, the index tracking portfolio is supposed to be constantly rebalanced at each in-sample time period $t=1..T$.</p>
<p>This assumption is likely to be violated in practice, so that evaluating the <em>ex-post</em> impact of other rebalancing strategies (buy and hold, etc.) is important.</p>
</li>
</ul>
<h3 id="relationship-with-sharpes-returns-based-style-analysis">Relationship with Sharpe’s returns-based style analysis</h3>
<p><a href="https://en.wikipedia.org/wiki/Returns-based_style_analysis"><em>Returns-based style analysis (RBSA)</em></a> is a methodology initially introduced by Sharpe<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote">12</a></sup><sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote">13</a></sup> to evaluate the composition
of a mutual fund in absence of detailed holdings information.</p>
<p>In details, returns-based style analysis consists in determining the combination of major asset classes - represented by market indices - which most closely replicate the actual performances
of a mutual fund over a given time period.</p>
<p>Unsurprisingly, the underlying mathematical formulation is actually equivalent to that of the index tracking/tracking error variance optimization problem<sup id="fnref:3:1" role="doc-noteref"><a href="#fn:3" class="footnote">13</a></sup>, with a specific vocabulary:</p>
<ul>
<li>The index tracking portfolio is called the mutual fund <em>style benchmark</em></li>
<li>The index tracking portfolio weights vector $w$ is called the mutual fund <em>effective asset mix</em></li>
<li>For a given time period $t$
<ul>
<li>The index tracking portfolio return $r_{tracking, t}$ is interpreted as the mutual fund return attributable to <em>style</em>, that is, the mutual fund return originating from its passive exposure to the different asset classes</li>
<li>The tracking error $TE_t$ is interpreted as the mutual fund return attributable to <em>selection</em>, that is, the mutual fund return originating from the skill of the portfolio manager(s)</li>
</ul>
</li>
</ul>
<h3 id="relationship-with-factor-analysis">Relationship with factor analysis</h3>
<p>Sharpe<sup id="fnref:3:2" role="doc-noteref"><a href="#fn:3" class="footnote">13</a></sup> highlights that returns-based style analysis is a specific type of <a href="https://en.wikipedia.org/wiki/Factor_analysis">factor analysis</a></p>
<blockquote>
<p>An asset class factor model can be considered a special case of the generic type. In such a model each factor represents the return on an asset class, and the sensitivities ([beta_i] values) are required to sum to 1 (100%).</p>
</blockquote>
<p>Similarly, from the previous sub-section, index tracking can also be considered as a member of the more general family of methods for portfolio factor analysis.</p>
<h2 id="implementation-in-portfolio-optimizer">Implementation in Portfolio Optimizer</h2>
<p><strong>Portfolio Optimizer</strong> implements three functionalities related to index tracking as presented in the previous section:</p>
<ul>
<li>The computation of the empirical tracking error between a portfolio and an index, through the endpoint <a href="https://docs.portfoliooptimizer.io/"><code class="language-plaintext highlighter-rouge">/portfolios/analysis/empirical-tracking-error</code></a></li>
<li>The computation of the tracking error variance between a portfolio and an index, through the endpoint <a href="https://docs.portfoliooptimizer.io/"><code class="language-plaintext highlighter-rouge">/portfolios/analysis/tracking-error-variance</code></a></li>
<li>The computation of the index tracking portfolio solution to the index tracking optimization problem under the empirical tracking error measure, through the endpoint <a href="https://docs.portfoliooptimizer.io/"><code class="language-plaintext highlighter-rouge">/portfolios/replication/index-tracking</code></a></li>
</ul>
<h2 id="examples-of-usage">Examples of usage</h2>
<h3 id="tracking-an-index-when-no-etf-is-available">Tracking an index when no ETF is available</h3>
<p>I propose to illustrate the “vanilla” usage of an index tracking portfolio when no ETF associated with an index is available, for example due to regulatory reasons.</p>
<p>For the context, residents of France have access to a tax-efficient investment wrapper called <em><a href="https://www.hsbc.fr/en-fr/investissement/bourse-opcvm/produits/plan-epargne-actions/">Plan d’Epargne en Actions (PEA)</a></em>.</p>
<p>A limitation of this investment wrapper, though, is that non-European financial assets cannot be sheltered under it - like global ETFs (MSCI World ETFs, etc.) - unless they have specifically been issued as PEA-compatible.</p>
<p>For example:</p>
<ul>
<li>Investing in an ETF tracking <a href="https://www.msci.com/documents/10199/178e6643-6ae6-47b9-82be-e1fc565ededb">the MSCI World Index</a> is allowed within a PEA thanks for example to the <a href="https://www.amundietf.fr/fr/professionnels/produits/equity/lyxor-pea-monde-msci-world-ucits-etf-capi/fr0011869353"><em>Lyxor PEA Monde (MSCI World) ETF Capi</em> ETF</a><sup id="fnref:32" role="doc-noteref"><a href="#fn:32" class="footnote">14</a></sup></li>
<li>Similarly, investing in an ETF tracking <a href="https://www.msci.com/www/fact-sheet/msci-emerging-markets-index/07149641">the MSCI Emerging Markets (EM) Index</a> is allowed within a PEA thanks for example to the <a href="https://www.amundietf.fr/fr/professionnels/produits/equity/amundi-pea-msci-emerging-emea-esg-leaders-ucits-etf-acc/fr0011440478"><em>Amundi PEA MSCI Emg EMEA ESG Lead UE Acc</em> ETF</a><sup id="fnref:33" role="doc-noteref"><a href="#fn:33" class="footnote">15</a></sup></li>
<li>Yet, investing in an ETF tracking <a href="https://www.msci.com/www/fact-sheet/msci-acwi/06846097">the MSCI All Country World (ACWI) Index</a> is NOT allowed within a PEA, because no PEA-compatible ETF which tracks that index has been issued<sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote">16</a></sup></li>
</ul>
<p>Now, for the sake of the exercise, let’s suppose a French DIY investor would like to track the MSCI ACWI Index within her PEA.</p>
<p>As mentioned above, this cannot be done directly since there is no PEA-compatible ETF which tracks this index.</p>
<p>Nevertheless, this can be done indirectly<sup id="fnref:35" role="doc-noteref"><a href="#fn:35" class="footnote">17</a></sup> through an adequate combination of a PEA-compatible MSCI World ETF and a PEA-compatible MSCI EM ETF!</p>
<p>Indeed, Figure 2 compares the evolution, over the period 31 December 2019 - 30 September 2023, of:</p>
<ul>
<li>The PEA-incompatible <a href="https://www.amundietf.fr/fr/professionnels/produits/equity/lyxor-msci-all-country-world-ucits-etf-acc-eur/lu1829220216"><em>Lyxor MSCI All Country World UCITS ETF - Acc</em></a><sup id="fnref:34" role="doc-noteref"><a href="#fn:34" class="footnote">18</a></sup></li>
<li>The PEA-compatible portfolio invested 86% in the <em>Lyxor PEA Monde (MSCI World) ETF Capi</em> ETF and 14% in the <em>Amundi PEA MSCI Emg EMEA ESG Lead UE Acc</em> ETF</li>
</ul>
<figure>
<a href="/assets/images/blog/index-tracking-pea-etfs.png"><img src="/assets/images/blog/index-tracking-pea-etfs.png" alt="PEA-incompatible MSCI ACWI ETF v.s. PEA-compatible MSCI ACWI Index tracking portfolio, 31 December 2019 - 30 September 2023" /></a>
<figcaption>Figure 2. PEA-incompatible MSCI ACWI ETF v.s. PEA-compatible MSCI ACWI Index tracking portfolio, 31 December 2019 - 30 September 2023</figcaption>
</figure>
<p>On this figure, it is visible that the tracking error between the two portfolios is almost null, which confirms that building a custom MSCI ACWI ETF inside a PEA is perfectly feasible<sup id="fnref:16" role="doc-noteref"><a href="#fn:16" class="footnote">19</a></sup>!</p>
<p>Question is, how have the weights of this PEA-compatible MSCI ACWI Index tracking portfolio been computed?</p>
<p>Well, it turns out that the problem at hand is an index tracking problem, with:</p>
<ul>
<li><strong>Index</strong> - The PEA-incompatible <a href="https://www.amundietf.fr/fr/professionnels/produits/equity/lyxor-msci-all-country-world-ucits-etf-acc-eur/lu1829220216"><em>Lyxor MSCI All Country World UCITS ETF - Acc</em></a><sup id="fnref:34:1" role="doc-noteref"><a href="#fn:34" class="footnote">18</a></sup></li>
<li><strong>Tracking assets</strong> - The two PEA-compatible ETFs <em>Lyxor PEA Monde (MSCI World) ETF Capi</em> and <em>Amundi PEA MSCI Emg EMEA ESG Lead UE Acc</em></li>
</ul>
<p>Thus, using the monthly returns of these ETFs over the period 31 December 2019 - 30 September 2023, it is possible to compute the associated index tracking portfolio, which results in the PEA-compatible portfolio above.</p>
<h3 id="reducing-the-fees-of-a-mutual-funds-portfolio">Reducing the fees of a mutual funds portfolio</h3>
<p>In the book <em>Cloning Wall Street</em><sup id="fnref:14" role="doc-noteref"><a href="#fn:14" class="footnote">20</a></sup>, <a href="https://www.jonathanwallentine.com/">Jonathan Wallentine</a> details how financial advisors can use index tracking portfolios - called <em>replicating portfolios</em><sup id="fnref:14:1" role="doc-noteref"><a href="#fn:14" class="footnote">20</a></sup> -
to reduce portfolio fees for their clients.</p>
<p>As an example from that book<sup id="fnref:23" role="doc-noteref"><a href="#fn:23" class="footnote">21</a></sup>, let’s suppose you are a financial advisor living in 2016 and a prospective client named Jessica comes to see you in order to decrease her mutual funds portfolio annual fees.</p>
<p>Jessica’s portfolio is made of the mutual funds listed<sup id="fnref:15" role="doc-noteref"><a href="#fn:15" class="footnote">22</a></sup> in Figure 3, taken from Wallentine<sup id="fnref:14:2" role="doc-noteref"><a href="#fn:14" class="footnote">20</a></sup>.</p>
<figure>
<a href="/assets/images/blog/index-tracking-cloning-wall-street-mutual-funds.png"><img src="/assets/images/blog/index-tracking-cloning-wall-street-mutual-funds.png" alt="Jessica's portfolio constituents. Source: Wallentine." /></a>
<figcaption>Figure 3. Jessica's portfolio constituents. Source: Wallentine.</figcaption>
</figure>
<p>After some thinking<sup id="fnref:14:3" role="doc-noteref"><a href="#fn:14" class="footnote">20</a></sup>, you conclude that Jessica’s mutual funds portfolio could probably be replaced by a portfolio made of the following low-cost ETFs:</p>
<ul>
<li>U.S. short term Treasuries (SHY ETF)</li>
<li>U.S. aggregate bonds (AGG ETF)</li>
<li>U.S. large-cap stocks (SPY ETF)</li>
<li>U.S. small-cap stocks (IWM ETF)</li>
<li>International stocks (EFA ETF)</li>
</ul>
<p>Problem is, you don’t have a clue about the proportion to invest in each of these ETFs to properly replace Jessica’s portfolio…</p>
<p>What to do?</p>
<p>Here again, the problem at hand is actually an index tracking problem in disguise, with:</p>
<ul>
<li><strong>Index</strong> - Jessica’s mutual funds portfolio</li>
<li><strong>Tracking assets</strong> - The five selected ETFs SHY, AGG, SPY, IWM and EFA</li>
</ul>
<p>Thus, using the monthly returns of Jessica’s portfolio and of these five ETFs over the past period 2011 - 2015, it is possible to compute<sup id="fnref:14:4" role="doc-noteref"><a href="#fn:14" class="footnote">20</a></sup> Jessica’s tracking portfolio weights:</p>
<ul>
<li>5% in the SHY ETF</li>
<li>13% in the AGG ETF</li>
<li>54% in the SPY ETF</li>
<li>6% in the IWM ETF</li>
<li>22% in the EFA ETF</li>
</ul>
<p>Figure 4, adapted from Wallentine<sup id="fnref:14:5" role="doc-noteref"><a href="#fn:14" class="footnote">20</a></sup>, depicts the evolution of Jessica’s original portfolio and of Jessica’s tracking portfolio.</p>
<figure>
<a href="/assets/images/blog/index-tracking-cloning-wall-street-mutual-funds-replication.png"><img src="/assets/images/blog/index-tracking-cloning-wall-street-mutual-funds-replication.png" alt="Jessica's original portfolio v.s. Jessica's tracking portfolio, 2011 - 2015. Source: Wallentine." /></a>
<figcaption>Figure 4. Jessica's original portfolio v.s. Jessica's tracking portfolio, 2011 - 2015. Source: Wallentine.</figcaption>
</figure>
<p>In addition to confirming a very small tracking error between the two portfolios, Figure 4 also highlights that Jessica’s tracking portfolio actually <em>outperformed Jessica’s portfolio by 70bps annually</em><sup id="fnref:14:6" role="doc-noteref"><a href="#fn:14" class="footnote">20</a></sup><sup id="fnref:17" role="doc-noteref"><a href="#fn:17" class="footnote">23</a></sup>.</p>
<p>Now, to come back on Jessica’s original request:</p>
<ul>
<li>Jessica’s mutual funds portfolio has an average expense ratio of 1.13%, which leads to total annual fees of around $11,623<sup id="fnref:14:7" role="doc-noteref"><a href="#fn:14" class="footnote">20</a></sup></li>
<li>Your alternative low-cost ETFs portfolio has an average expense ratio of 0.06%, which leads to total annual fees of around $617<sup id="fnref:14:8" role="doc-noteref"><a href="#fn:14" class="footnote">20</a></sup></li>
</ul>
<p>In other words, your alternative portfolio - which is for all intents and purposes equivalent to Jessica’s current portfolio - leads to a 95% decrease in annual fees<sup id="fnref:18" role="doc-noteref"><a href="#fn:18" class="footnote">24</a></sup>.</p>
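<p>The fee arithmetic above is easy to verify; the short sketch below back-solves the portfolio value implied by the stated numbers (that implied value, roughly $1.03M, is an inference of this sketch, not a figure taken from the book):</p>

```python
# Back-of-the-envelope check of the fee figures quoted above.
portfolio_value = 11_623 / 0.0113            # value implied by 1.13% -> $11,623
fees_mutual_funds = portfolio_value * 0.0113 # Jessica's current annual fees
fees_etfs = portfolio_value * 0.0006         # annual fees of the ETF portfolio
reduction = 1 - fees_etfs / fees_mutual_funds

print(round(fees_etfs))        # ~ 617
print(f"{reduction:.0%}")      # ~ 95%
```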
<p>Jessica is delighted!</p>
<p>To conclude on this example, <em>the results outlined in this [sub-]section should give readers confidence that [an index tracking portfolio] can successfully
replicate a variety of different mutual fund portfolios from large fund companies</em><sup id="fnref:14:9" role="doc-noteref"><a href="#fn:14" class="footnote">20</a></sup>.</p>
<h3 id="automatically-determining-the-proper-benchmark-for-a-mutual-fund">Automatically determining the proper benchmark for a mutual fund</h3>
<p>In his 1992 paper, Sharpe notes that<sup id="fnref:3:3" role="doc-noteref"><a href="#fn:3" class="footnote">13</a></sup></p>
<blockquote>
<p>Style analysis provides a natural method for constructing benchmarks [for performance measurement]. The return obtained by a fund in each month can be compared with the return on a mix of asset classes with the same estimated style, where the style is estimated prior to the month in question.</p>
</blockquote>
<p>Due to the one-to-one relationship between Sharpe’s style analysis and index tracking, the index tracking machinery makes it possible to easily and automatically construct a benchmark for any mutual fund<sup id="fnref:19" role="doc-noteref"><a href="#fn:19" class="footnote">25</a></sup>.</p>
<p>As an illustration, I propose to revisit <a href="https://twitter.com/Finominally/status/1694360191527977299">a tweet from Nicolas Rabener</a> - the founder and CEO of <a href="https://finominal.com/">Finominal</a> -
in which an interesting discrepancy is reported between:</p>
<ul>
<li>The <a href="https://www.seic.com/asset-management/enhanced-factor-etfs/sei-enhanced-low-volatility-us-large-cap-etf-selv">SEI Enhanced Low Volatility U.S. Large Cap (SELV) ETF</a></li>
<li>The self-reported benchmark<sup id="fnref:21" role="doc-noteref"><a href="#fn:21" class="footnote">26</a></sup> for this ETF - the Russell 1000 Index</li>
</ul>
<p>For this, Figure 5 compares, over the period 01 January 2023 - 26 October 2023, the evolution of:</p>
<ul>
<li>The SELV ETF</li>
<li>The <a href="https://www.ishares.com/us/products/239707/ishares-russell-1000-etf">iShares Russell 1000 (IWB) ETF</a>, representative of the self-reported benchmark for the SELV ETF - the Russell 1000 Index</li>
<li>The <a href="https://www.ishares.com/us/products/239695/">iShares MSCI USA Min Vol Factor (USMV) ETF</a>, representative of the benchmark for the SELV ETF automatically determined by <a href="https://finominal.com/">Finominal</a> - the MSCI USA Minimum Volatility Index</li>
<li>A portfolio made of 13% IWB ETF / 87% USMV ETF, representative of the benchmark for the SELV ETF automatically determined by <strong>Portfolio Optimizer</strong> using
<ul>
<li><strong>Index</strong> - The SELV ETF</li>
<li><strong>Tracking assets</strong> - The IWB and the USMV ETFs</li>
</ul>
</li>
</ul>
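<p>This kind of automatic benchmark determination boils down to computing, for each candidate benchmark, its empirical tracking error against the fund and keeping the candidate with the lowest one. Below is a minimal sketch of that selection on synthetic return series - the SELV/IWB/USMV names and the 13%/87% mix are reused purely for illustration, not real market data:</p>

```python
# Benchmark selection by minimum empirical tracking error (synthetic data).
import numpy as np

def rmste(r_fund, r_bench):
    """Root mean squared tracking error between a fund and a candidate benchmark."""
    return np.sqrt(np.mean((np.asarray(r_fund) - np.asarray(r_bench)) ** 2))

rng = np.random.default_rng(1)
r_usmv = rng.normal(0.004, 0.03, 200)             # low-volatility index proxy
r_iwb = r_usmv + rng.normal(0.001, 0.02, 200)     # broad-market index proxy
# A "fund" built mostly from the low-volatility proxy, plus a little noise
r_fund = 0.13 * r_iwb + 0.87 * r_usmv + rng.normal(0, 0.002, 200)

candidates = {
    "IWB": r_iwb,
    "USMV": r_usmv,
    "13/87 composite": 0.13 * r_iwb + 0.87 * r_usmv,
}
best = min(candidates, key=lambda k: rmste(r_fund, candidates[k]))
print(best)   # the composite benchmark has the lowest tracking error here
```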
<figure>
<a href="/assets/images/blog/index-tracking-selv.png"><img src="/assets/images/blog/index-tracking-selv.png" alt="SELV ETF v.s. misc. other ETFs and optimal benchmark, 01 January 2023 - 26 October 2023." /></a>
<figcaption>Figure 5. SELV ETF v.s. misc. other ETFs and optimal benchmark, 01 January 2023 - 26 October 2023.</figcaption>
</figure>
<p>Figure 5 empirically demonstrates that the Russell 1000 Index is a very bad choice of benchmark for the SELV ETF and that a more realistic benchmark would consist either of:</p>
<ul>
<li>The MSCI USA Minimum Volatility Index, if only one index can be chosen</li>
<li>The portfolio made of 13% Russell 1000 Index / 87% MSCI USA Minimum Volatility Index, if a composite index can be chosen<sup id="fnref:22" role="doc-noteref"><a href="#fn:22" class="footnote">27</a></sup></li>
</ul>
<p>These two alternative benchmarks are consistent with the SELV ETF fact sheet, which states that<sup id="fnref:21:1" role="doc-noteref"><a href="#fn:21" class="footnote">26</a></sup></p>
<blockquote>
<p>The SEI Enhanced Low Volatility U.S. Large Cap ETF seeks to provide long-term capital appreciation by investing primarily in U.S. common stocks, while aiming to experience lower volatility compared to the broad U.S. large cap equity market.</p>
</blockquote>
<p>Coincidentally, Nicolas Rabener recently<sup id="fnref:11:1" role="doc-noteref"><a href="#fn:11" class="footnote">16</a></sup> blogged about different metrics that can help select an optimal benchmark for a fund in his article
<a href="https://insights.finominal.com/research-determining-the-optimal-benchmark-for-funds/"><em>Determining the Optimal Benchmark for Funds</em></a>. Among these metrics, the tracking error is found to be the best performing one and
<em>results in nine selections [out of ten] that seem reasonable</em><sup id="fnref:20" role="doc-noteref"><a href="#fn:20" class="footnote">28</a></sup>.</p>
<p>As a side note, I wholeheartedly recommend readers interested in the topic of automatic benchmark selection to dig into the associated <a href="https://insights.finominal.com/">Finominal’s research</a> blog posts.</p>
<h3 id="minimizing-the-sensitivity-of-mean-variance-optimization-to-estimation-error">Minimizing the sensitivity of mean-variance optimization to estimation error</h3>
<p>A stylized fact of Markowitz’s mean-variance framework is the sensitivity of efficient portfolios to estimation error in asset returns and (co)variances<sup id="fnref:25" role="doc-noteref"><a href="#fn:25" class="footnote">29</a></sup>,
which explains why efficient portfolios are sometimes labeled <em>estimation-error maximizers</em><sup id="fnref:24" role="doc-noteref"><a href="#fn:24" class="footnote">30</a></sup>.</p>
<p>This issue is thoroughly analyzed, both theoretically and empirically, by Kinlaw et al.<sup id="fnref:26" role="doc-noteref"><a href="#fn:26" class="footnote">31</a></sup> who conclude that</p>
<blockquote>
<p>[mean-variance optimization] only appears to be hypersensitive when asset classes are close substitutes for each other, because the optimal weights may shift significantly in response to input errors.</p>
</blockquote>
<p>In other words, Kinlaw et al.<sup id="fnref:26:1" role="doc-noteref"><a href="#fn:26" class="footnote">31</a></sup> establish that it is possible to minimize the sensitivity of mean-variance optimization to estimation error by ensuring that assets are as “distinct” as possible.</p>
<p>A practical method for doing so is proposed in David Berns’ book <em>Modern Asset Allocation for Wealth Management</em><sup id="fnref:13" role="doc-noteref"><a href="#fn:13" class="footnote">32</a></sup>.</p>
<p>In detail, Berns’ methodology is a two-step procedure for determining whether an asset in a portfolio is redundant with the other assets also present in that portfolio:</p>
<ul>
<li>Step 1 - Compute an index tracking portfolio - called the <em>mimicking portfolio</em><sup id="fnref:13:1" role="doc-noteref"><a href="#fn:13" class="footnote">32</a></sup> - using:
<ul>
<li><strong>Index</strong> - The selected asset</li>
<li><strong>Tracking assets</strong> - The other assets present in the portfolio</li>
</ul>
</li>
<li>
<p>Step 2 - Assess the redundancy of the selected asset by <em>looking at the tracking error [variance] between the mimicking portfolio and the asset</em><sup id="fnref:13:2" role="doc-noteref"><a href="#fn:13" class="footnote">32</a></sup>, because<sup id="fnref:13:3" role="doc-noteref"><a href="#fn:13" class="footnote">32</a></sup></p>
<blockquote>
<p>If the mimicking portfolio closely resembles [the selected] asset, then the [tracking error variance] will be low, and [the selected] asset is redundant and should be avoided to avoid estimation error sensitivity. However, if the [tracking error variance] for [the selected] asset is high, then [that] asset clearly showcases distinctness in one or more moments and can potentially be a valuable asset to consider for the portfolio.</p>
</blockquote>
</li>
</ul>
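<p>The two steps above translate almost directly into code. The sketch below runs on synthetic monthly returns; note that while the book speaks of a tracking error variance threshold of around 2%, this sketch flags assets on the annualized standard deviation of the tracking errors instead - an interpretation of mine, not a verbatim reproduction of Berns’ computation:</p>

```python
# Sketch of Berns' redundancy check on synthetic data (NOT the book's data).
import numpy as np
from scipy.optimize import minimize

def mimicking_weights(X, r_asset):
    """Step 1: long-only, fully invested tracking portfolio of the other assets."""
    n = X.shape[1]
    obj = lambda w: np.mean((X @ w - r_asset) ** 2)
    res = minimize(obj, np.full(n, 1 / n), bounds=[(0, 1)] * n,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1}])
    return res.x

def mpte(R, i, periods_per_year=12):
    """Step 2: annualized tracking error of asset i vs. its mimicking portfolio."""
    X = np.delete(R, i, axis=1)
    w = mimicking_weights(X, R[:, i])
    te = X @ w - R[:, i]
    return np.std(te, ddof=1) * np.sqrt(periods_per_year)

# Toy universe: asset 2 is nearly a 50/50 mix of assets 0 and 1 -> low MPTE
rng = np.random.default_rng(2)
R = rng.normal(0.003, 0.02, size=(120, 2))
R = np.column_stack([R, 0.5 * R[:, 0] + 0.5 * R[:, 1] + rng.normal(0, 0.001, 120)])

for i in range(3):
    value = mpte(R, i)
    flag = "redundant?" if value < 0.02 else "distinct"
    print(i, round(value, 3), flag)
```

<p>As expected, only the asset that is a near-combination of the two others is flagged for potential removal.</p>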
<p>In practice, Berns<sup id="fnref:13:4" role="doc-noteref"><a href="#fn:13" class="footnote">32</a></sup> suggests using a tracking error variance of around 2% as <em>a reasonable level below which to start flagging assets for potential removal due to redundancy</em><sup id="fnref:13:5" role="doc-noteref"><a href="#fn:13" class="footnote">32</a></sup>.</p>
<p>Berns<sup id="fnref:13:6" role="doc-noteref"><a href="#fn:13" class="footnote">32</a></sup> illustrates his proposed methodology using a portfolio made of intermediate-term Treasuries, intermediate-term investment grade bonds, and high-yield bonds.
The mimicking portfolio for each asset is computed from the two other assets and the tracking error variance - called <em>mimicking
portfolio tracking error (MPTE)</em> - between the mimicking portfolio and the asset under consideration is computed as in Figure 6, adapted from Berns<sup id="fnref:13:7" role="doc-noteref"><a href="#fn:13" class="footnote">32</a></sup>:</p>
<figure>
<a href="/assets/images/blog/index-tracking-bonds-mpte.png"><img src="/assets/images/blog/index-tracking-bonds-mpte.png" alt="Tracking error variance of fixed income mutual funds v.s. associated tracking portfolios, January 1994 - February 2019. Source: Berns." /></a>
<figcaption>Figure 6. Tracking error variance of fixed income mutual funds v.s. associated tracking portfolios, January 1994 - February 2019. Source: Berns.</figcaption>
</figure>
<p>For this specific portfolio, intermediate-term investment grade bonds are flagged as the most redundant asset for reasons detailed in Berns<sup id="fnref:13:8" role="doc-noteref"><a href="#fn:13" class="footnote">32</a></sup>, but there is no real redundancy between the three assets<sup id="fnref:36" role="doc-noteref"><a href="#fn:36" class="footnote">33</a></sup>.</p>
<p>Note that other approaches to the problem of mean-variance optimization’s sensitivity to estimation error exist - like the computation of
<a href="/blog/mean-variance-optimization-in-practice-subset-resampling-based-efficient-portfolios/">subset resampling-based efficient portfolios</a> or the usage of the <em>nested clustered optimization algorithm (NCO)</em><sup id="fnref:27" role="doc-noteref"><a href="#fn:27" class="footnote">34</a></sup> -
but these alternatives are usually much more mathematically involved than Berns’ <em>modern yet practical</em><sup id="fnref:13:9" role="doc-noteref"><a href="#fn:13" class="footnote">32</a></sup> approach.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Index tracking is a framework which deserves a place in one’s quantitative arsenal thanks to its versatility, as hopefully shown in this blog post.</p>
<p>Next in this series, I will discuss extensions of the basic index tracking problem, like for example methods to construct an index tracking portfolio with a limit on the number of assets it contains.</p>
<p>In the meantime, feel free to track me <a href="https://www.linkedin.com/in/roman-rubsamen/">on LinkedIn</a> or <a href="https://twitter.com/portfoliooptim">on Twitter</a>.</p>
<p>–</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Also called a <em>benchmark replicating portfolio</em>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:28" role="doc-endnote">
<p>The term <em>closely</em> is loosely defined at this point. <a href="#fnref:28" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:29" role="doc-endnote">
<p>From its constituent weights. <a href="#fnref:29" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>See <a href="https://ieeexplore.ieee.org/document/8384194">Konstantinos Benidis; Yiyong Feng; Daniel P. Palomar, Optimization Methods for Financial Index Tracking: From Theory to Practice , now, 2018.</a>. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:4:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:4:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:4:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a> <a href="#fnref:4:4" class="reversefootnote" role="doc-backlink">↩<sup>5</sup></a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>See <a href="https://link.springer.com/chapter/10.1007/978-3-642-86706-4_7">Hallerbach, W.G. (1994). Index Tracking: Some Techniques and Results. In: Peccati, L., Virén, M. (eds) Financial Modelling. Contributions to Management Science. Physica-Verlag HD</a>. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:2:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:2:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:2:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a> <a href="#fnref:2:4" class="reversefootnote" role="doc-backlink">↩<sup>5</sup></a></p>
</li>
<li id="fn:31" role="doc-endnote">
<p>Or a solution, in case multiple solutions exist. <a href="#fnref:31" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:31:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:31:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>See <a href="https://www.sciencedirect.com/science/article/abs/pii/S0377221702004253">J.E. Beasley, N. Meade, T.-J. Chang, An evolutionary heuristic for the index tracking problem, European Journal of Operational Research, Volume 148, Issue 3, 2003, Pages 621-643</a>. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:7" role="doc-endnote">
<p>That is, if $r_{tracking, t} = r_{idx, t} + c$, $t=1..T$, with $c$ a constant. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>See <a href="https://econpapers.repec.org/paper/zbwfsfmwp/164.htm">Rossbach, Peter and Karlow, Denis, (2011), The stability of traditional measures of index tracking quality, No 164, Frankfurt School - Working Paper Series, Frankfurt School of Finance and Management.</a>. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:6:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:6:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:6:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a> <a href="#fnref:6:4" class="reversefootnote" role="doc-backlink">↩<sup>5</sup></a></p>
</li>
<li id="fn:9" role="doc-endnote">
<p>See <a href="https://www.sciencedirect.com/science/article/abs/pii/S0378426698000764">Markus Rudolf, Hans-Jurgen Wolter, Heinz Zimmermann, A linear model for tracking error minimization, Journal of Banking & Finance, Volume 23, Issue 1, 1999, Pages 85-103</a>. <a href="#fnref:9" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:8" role="doc-endnote">
<p>The empirical tracking error is not without issues, though; for example, it penalizes similarly both positive and negative deviations from the index. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:10" role="doc-endnote">
<p>See Sharpe, William F, Determining a Fund’s Effective Asset Mix. Investment Management Review, December 1988, pp. 59-69. <a href="#fnref:10" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>See <a href="https://www.pm-research.com/content/iijpormgmt/18/2/7">William F. Sharpe, Asset allocation, Management style and performance measurement, The Journal of Portfolio Management, Winter 1992, 18 (2) 7-19</a>. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:3:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:3:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:3:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a></p>
</li>
<li id="fn:32" role="doc-endnote">
<p>FR0011869353 <a href="#fnref:32" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:33" role="doc-endnote">
<p>FR0011440478 <a href="#fnref:33" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:11" role="doc-endnote">
<p>At the date of the initial publication of this blog post. <a href="#fnref:11" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:11:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:35" role="doc-endnote">
<p>The rationale is that by construction, all constituents of the MSCI ACWI Index belong either to the MSCI World Index or to the MSCI EM Index. <a href="#fnref:35" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:34" role="doc-endnote">
<p>LU1829220216 <a href="#fnref:34" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:34:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:16" role="doc-endnote">
<p>Of course, this example of usage is very close to a toy example, but it already illustrates the potential of index tracking based on returns data. <a href="#fnref:16" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:14" role="doc-endnote">
<p>See <a href="https://www.jonathanwallentine.com/">Jonathan Wallentine, Cloning Wall Street, Actuarial Management Company</a>. <a href="#fnref:14" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:14:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:14:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:14:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a> <a href="#fnref:14:4" class="reversefootnote" role="doc-backlink">↩<sup>5</sup></a> <a href="#fnref:14:5" class="reversefootnote" role="doc-backlink">↩<sup>6</sup></a> <a href="#fnref:14:6" class="reversefootnote" role="doc-backlink">↩<sup>7</sup></a> <a href="#fnref:14:7" class="reversefootnote" role="doc-backlink">↩<sup>8</sup></a> <a href="#fnref:14:8" class="reversefootnote" role="doc-backlink">↩<sup>9</sup></a> <a href="#fnref:14:9" class="reversefootnote" role="doc-backlink">↩<sup>10</sup></a></p>
</li>
<li id="fn:23" role="doc-endnote">
<p>Further details are provided in Wallentine<sup id="fnref:14:10" role="doc-noteref"><a href="#fn:14" class="footnote">20</a></sup>. <a href="#fnref:23" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:15" role="doc-endnote">
<p>Some of these mutual funds do not exist anymore at the date of publication of this post. <a href="#fnref:15" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:17" role="doc-endnote">
<p>As noted in Wallentine<sup id="fnref:14:11" role="doc-noteref"><a href="#fn:14" class="footnote">20</a></sup>, this outperformance <em>is likely the result of high mutual fund expense ratios in her portfolio (i.e. it is not due to any secret alpha generated by [the index tracking portfolio itself])</em><sup id="fnref:14:12" role="doc-noteref"><a href="#fn:14" class="footnote">20</a></sup>. <a href="#fnref:17" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:18" role="doc-endnote">
<p>One caveat, though, is that the index tracking portfolio needs to be purchased and then regularly re-balanced; also, as mentioned in Wallentine<sup id="fnref:14:13" role="doc-noteref"><a href="#fn:14" class="footnote">20</a></sup>, your management fees as an advisor should of course be taken into account. <a href="#fnref:18" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:19" role="doc-endnote">
<p>Or more generally, a benchmark for any given asset. <a href="#fnref:19" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:21" role="doc-endnote">
<p>C.f. <a href="https://seietfs.filepoint.live/assets/pdfs/SELV_FactSheet.pdf">SELV Fact Sheet / 31 July 2023</a>. <a href="#fnref:21" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:21:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:22" role="doc-endnote">
<p>While not visible on Figure 5, the portfolio made of 13% IWB ETF / 87% USMV ETF has a lower empirical tracking error against the SELV ETF than the IWB ETF. <a href="#fnref:22" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:20" role="doc-endnote">
<p>See <a href="https://insights.finominal.com/research-determining-the-optimal-benchmark-for-funds/">Determining the Optimal Benchmark for Funds</a>. <a href="#fnref:20" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:25" role="doc-endnote">
<p>See <a href="https://link.springer.com/article/10.1007/BF02282040">Broadie, M. Computing efficient frontiers using estimated parameters. Ann Oper Res 45, 21–58 (1993)</a>. <a href="#fnref:25" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:24" role="doc-endnote">
<p>See <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2387669">Michaud, Richard O., The Markowitz Optimization Enigma: Is ‘Optimized’ Optimal? (1989). Financial Analysts Journal, 1989</a>. <a href="#fnref:24" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:26" role="doc-endnote">
<p>See <a href="https://www.wiley.com/en-br/Asset+Allocation:+From+Theory+to+Practice+and+Beyond-p-9781119817710">William Kinlaw, Mark P. Kritzman, David Turkington, Asset Allocation: From Theory to Practice and Beyond, Wiley Finance</a>. <a href="#fnref:26" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:26:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:13" role="doc-endnote">
<p>See <a href="https://www.wiley.com/en-us/Modern+Asset+Allocation+for+Wealth+Management-p-9781119566946">David M. Berns, Modern Asset Allocation for Wealth Management, Wiley Finance, 2020</a>. <a href="#fnref:13" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:13:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:13:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:13:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a> <a href="#fnref:13:4" class="reversefootnote" role="doc-backlink">↩<sup>5</sup></a> <a href="#fnref:13:5" class="reversefootnote" role="doc-backlink">↩<sup>6</sup></a> <a href="#fnref:13:6" class="reversefootnote" role="doc-backlink">↩<sup>7</sup></a> <a href="#fnref:13:7" class="reversefootnote" role="doc-backlink">↩<sup>8</sup></a> <a href="#fnref:13:8" class="reversefootnote" role="doc-backlink">↩<sup>9</sup></a> <a href="#fnref:13:9" class="reversefootnote" role="doc-backlink">↩<sup>10</sup></a></p>
</li>
<li id="fn:36" role="doc-endnote">
<p>And no asset has a mimicking portfolio tracking error greater than 2%. <a href="#fnref:36" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:27" role="doc-endnote">
<p>See <a href="https://ssrn.com/abstract=3469961">Lopez de Prado, Marcos, A Robust Estimator of the Efficient Frontier (October 15, 2016)</a>. <a href="#fnref:27" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Roman R.An index tracking portfolio1 is a portfolio designed to track as closely2 as possible a financial market index when its exact replication3 is either impractical or impossible due to various reasons4 (transaction costs, liquidity issues, licensing requirements…). In this blog post, after reviewing the underlying mathematics described in Hallerbach5, I will review a couple of applications of index tracking beyond “pure” index tracking, like helping financial advisors to reduce their customers investment fees or minimizing the sensitivity of mean-variance optimization to estimation error. Notes: A Google Sheet corresponding to this post is available here Mathematical preliminaries The general index tracking optimization problem Let be: $T$, a number of time periods $r_{idx} = \left( r_{idx, 1}, …, r_{idx, T} \right) \in \mathcal{R}^{T}$, the vector of the index arithmetic returns over each of the $T$ time periods $r_{tracking} = \left( r_{tracking, 1}, …, r_{tracking, T} \right) \in \mathcal{R}^{T}$, the vector of the (yet to be computed) index tracking portfolio arithmetic returns over each of the $T$ time periods Then: For a given time period $t$, the difference between the index return $r_{idx, t}$ and the index tracking portfolio return $r_{tracking, t}$ is defined as the tracking error $TE_t$: \[TE_t = r_{tracking, t} - r_{idx, t}\] For all time periods $t=1..T$, the vector of the tracking errors $TE_t$ is defined as the tracking error vector $TE \in \mathcal{R}^{T}$: \[TE = \left( r_{tracking, 1} - r_{idx, 1}, ..., r_{tracking, T} - r_{idx, T}\right)\] Now, let be: $n$, the number of assets in the universe of the index tracking portfolio $w = \left( w_1,…,w_n \right) \in \mathcal{R}^{n} $ the vector of the weights of a portfolio in each of the $n$ assets $r_i = \left( r_{i, 1}, …, r_{i, T} \right) \in \mathcal{R}^{T}$, the vector of the asset $i$ arithmetic returns over each of the $T$ time periods, $i = 1..n$ Because arithmetic returns can be 
aggregated across assets, we have for each time period $t$: \[r_{tracking, t} \left( w \right) = \sum_{i=1}^n w_i r_{i, t}\] So that: \[TE_t \left( w \right) = \sum_{i=1}^n w_i r_{i, t} - r_{idx, t}\] and: \[TE \left( w \right) = X w - r_{idx}\] , with $X \in \mathcal{R}^{T \times n}$ the matrix of the $n$ assets arithmetic returns over each of the $T$ time periods. The vector of the index tracking portfolio weights $w^$ is then the6 solution to the general index tracking optimization problem which consists in minimizing some loss function $f$ of the tracking error vector $TE$ - called *tracking error measure - subject to additional constraints like full investment constraint, no short sale constraint, etc.5 \[w^* = \operatorname{argmin} f(TE \left( w \right) ) \newline \textrm{s.t. } \begin{cases} \sum_{i=1}^{n} w_i = 1 \newline 0 \leqslant w_i \leqslant 1, i = 1..n \newline ... \end{cases}\] Tracking error measures A common tracking error measure is the empirical tracking error (ETE)4, also called the mean squared tracking error (MSTE), defined as the sum of the squared tracking errors $TE_t$, $t=1..T$ \[ETE(w) = \frac{1}{T} \lVert X w - r_{idx} \rVert_2^2\] or, equivalently, the root mean square tracking error (RMSTE)5, defined by \[RMSTE(w) = \sqrt{\frac{1}{T} \lVert X w - r_{idx} \rVert_2^2}\] Another common tracking error measure is the tracking error variance (TEV)5, defined as the variance of the tracking errors $TE_t$, $t=1..T$ \[TEV(w) = Var( X w - r_b )\] However, as noted in Beasley et al.7, the usage of the tracking error variance is problematic due to its insensitivity to bias in the tracking errors $TE_t$. Indeed, if the index tracking portfolio returns $r_{tracking}$ are systematically higher or lower than the index returns $r_{idx}$8, the tracking error variance is null although the index tracking portfolio systematically deviates from its index! 
This limitation is illustrated in Figure 1, adapted from Rossbach and Karlow9, in which it is clearly visible that portfolio 1 has a worse tracking quality than portfolio 2 but the TEV of portfolio 1 is equal to zero9. Figure 1. Insensitivity of the tracking error variance to bias. Source: Rossbach and Karlow. In addition to the empirical tracking error and to the tracking error variance, other tracking error measures also exist like the mean absolute deviation of the tracking errors9, the correlation between the index tracking portfolio returns $r_{tracking}$ and the index returns $r_{idx}$5, the downside risk4, etc., but they will not be detailed in this blog post. In all cases, it is important to note that different tracking error measures lead to different solutions to the general index tracking optimization problem - that is, different index tracking portfolios - with different properties10. The index tracking/empirical tracking error optimization problem Due to the limitation of the tracking error variance highlighted in the previous sub-section, this blog post will use the empirical tracking error as the preferred tracking error measure11. In this case, the general index tracking optimization problem becomes a constrained regression problem in which the tracking errors $TE_t$, $t = 1..T$ represent the error terms: \[w^* = \operatorname{argmin} \frac{1}{T} \lVert X w - r_{idx} \rVert_2^2 \newline \textrm{s.t. } \begin{cases} \sum_{i=1}^{n} w_i = 1 \newline 0 \leqslant w_i \leqslant 1, i = 1..n \newline ... \end{cases}\] From this perspective, and although index tracking is driven from the financial industry, it is in fact a pure signal processing problem4 that can be solved with standard quadratic optimization algorithms. 
Caveats

The index tracking optimization problem as presented in the previous sub-sections relies on a couple of assumptions that are best kept in mind:

The solution[6] to the index tracking optimization problem is the portfolio with the lowest in-sample tracking error measure, while out-of-sample performances are ultimately what matters. Rossbach and Karlow[9] study the stability of standard tracking error measures, including the empirical tracking error and the tracking error variance, and conclude that[9] "The results indicate a poor stability for every of the [...] traditional measures. Furthermore, the relation to the ex post tracking quality is weak." Nevertheless, this problem can be attenuated by regularly re-computing an index tracking portfolio thanks to a rolling window of recent past returns, c.f. Benidis[4].

The solution[6] to the index tracking optimization problem has fixed asset weights, while a certain level of drift is always present in practice. To make the underlying optimization problem tractable, the index tracking portfolio is supposed to be constantly rebalanced at each in-sample time period $t=1..T$. This assumption is likely to be violated in practice, so that evaluating the ex-post impact of other rebalancing strategies (buy and hold, etc.) is important.

Relationship with Sharpe's returns-based style analysis

Returns-based style analysis (RBSA) is a methodology initially introduced by Sharpe[12][13] to evaluate the composition of a mutual fund in the absence of detailed holdings information. In detail, returns-based style analysis consists in determining the combination of major asset classes - represented by market indices - which most closely replicates the actual performances of a mutual fund over a given time period.
Unsurprisingly, the underlying mathematical formulation is actually equivalent to that of the index tracking/tracking error variance optimization problem[13], with a specific vocabulary: the index tracking portfolio is called the mutual fund style benchmark; the index tracking portfolio weights vector $w$ is called the mutual fund effective asset mix; and, for a given time period $t$, the index tracking portfolio return $r_{tracking, t}$ is interpreted as the mutual fund return attributable to style, that is, the mutual fund return originating from its passive exposure to the different asset classes, while the tracking error $TE_t$ is interpreted as the mutual fund return attributable to selection, that is, the mutual fund return originating from the skill of the portfolio manager(s).

Relationship with factor analysis

Sharpe[13] highlights that returns-based style analysis is a specific type of factor analysis: "An asset class factor model can be considered a special case of the generic type. In such a model each factor represents the return on an asset class, and the sensitivities ($\beta_i$ values) are required to sum to 1 (100%)." Similarly, from the previous sub-section, index tracking can also be considered a member of the more general family of methods for portfolio factor analysis.
Implementation in Portfolio Optimizer

Portfolio Optimizer implements three functionalities related to index tracking as presented in the previous section: the computation of the empirical tracking error between a portfolio and an index, through the endpoint /portfolios/analysis/empirical-tracking-error; the computation of the tracking error variance between a portfolio and an index, through the endpoint /portfolios/analysis/tracking-error-variance; and the computation of the index tracking portfolio solution to the index tracking optimization problem under the empirical tracking error measure, through the endpoint /portfolios/replication/index-tracking.

Examples of usage

Tracking an index when no ETF is available

I propose to illustrate the "vanilla" usage of an index tracking portfolio when no ETF associated with an index is available due to regulatory reasons. For context, residents of France have access to a tax-efficient investment wrapper called the Plan d'Epargne en Actions (PEA). A limitation of this investment wrapper, though, is that non-European financial assets cannot be sheltered under it - like global ETFs (MSCI World ETFs, etc.) - unless they have specifically been issued as PEA-compatible. For example: investing in an ETF tracking the MSCI World Index is allowed within a PEA thanks, for example, to the Lyxor PEA Monde (MSCI World) ETF Capi ETF[14]; similarly, investing in an ETF tracking the MSCI Emerging Markets (EM) Index is allowed within a PEA thanks, for example, to the Amundi PEA MSCI Emg EMEA ESG Lead UE Acc ETF[15]; yet, investing in an ETF tracking the MSCI All Country World (ACWI) Index is NOT allowed within a PEA, because no PEA-compatible ETF which tracks that index has been issued[16].

Now, for the sake of the exercise, let's suppose a French DIY investor would like to track the MSCI ACWI Index within her PEA. As mentioned above, this cannot be done directly since there is no PEA-compatible ETF which tracks this index.
Nevertheless, this can be done indirectly[17] through an adequate combination of a PEA-compatible MSCI World ETF and a PEA-compatible MSCI EM ETF! Indeed, Figure 2 compares the evolution, over the period 31 December 2019 - 30 September 2023, of: the PEA-incompatible Lyxor MSCI All Country World UCITS ETF - Acc[18]; and the PEA-compatible portfolio invested 86% in the Lyxor PEA Monde (MSCI World) ETF Capi ETF and 14% in the Amundi PEA MSCI Emg EMEA ESG Lead UE Acc ETF.

Figure 2. PEA-incompatible MSCI ACWI ETF v.s. PEA-compatible MSCI ACWI Index tracking portfolio, 31 December 2019 - 30 September 2023.

On this figure, it is visible that the tracking error between the two portfolios is almost null, which confirms that building a custom MSCI ACWI ETF inside a PEA is perfectly feasible[19]! The question is, how have the weights of this PEA-compatible MSCI ACWI Index tracking portfolio been computed? Well, it turns out that the problem at hand is an index tracking problem, with: Index - the PEA-incompatible Lyxor MSCI All Country World UCITS ETF - Acc[18]; Tracking assets - the two PEA-compatible ETFs Lyxor PEA Monde (MSCI World) ETF Capi and Amundi PEA MSCI Emg EMEA ESG Lead UE Acc. Thus, using the monthly returns of these ETFs over the period 31 December 2019 - 30 September 2023, it is possible to compute the associated index tracking portfolio, which results in the PEA-compatible portfolio above.

Reducing the fees of a mutual funds portfolio

In the book Cloning Wall Street[20], Jonathan Wallentine details how financial advisors can use index tracking portfolios - called replicating portfolios[20] - to reduce portfolio fees for their clients. As an example from that book[21], let's suppose you are a financial advisor living in 2016 and a prospective client named Jessica comes to see you in order to decrease her mutual funds portfolio's annual fees. Jessica's portfolio is made of the mutual funds listed[22] in Figure 3, taken from Wallentine[20].

Figure 3. Jessica's portfolio constituents.
Source: Wallentine.

After some thinking[20], you conclude that Jessica's mutual funds portfolio could probably be replaced by a portfolio made of the following low-cost ETFs: U.S. short term Treasuries (SHY ETF), U.S. aggregate bonds (AGG ETF), U.S. big caps stocks (SPY ETF), U.S. small caps stocks (IWM ETF) and international stocks (EFA ETF). The problem is, you don't have a clue about the proportion to invest in each of these ETFs to properly replace Jessica's portfolio... What to do? Here again, the problem at hand is actually an index tracking problem in disguise, with: Index - Jessica's mutual funds portfolio; Tracking assets - the five selected ETFs SHY, AGG, SPY, IWM and EFA. Thus, using the monthly returns of Jessica's portfolio and of these five ETFs over the past period 2011 - 2015, it is possible to compute[20] Jessica's tracking portfolio weights: 5% in the SHY ETF, 13% in the AGG ETF, 54% in the SPY ETF, 6% in the IWM ETF and 22% in the EFA ETF.

Figure 4, adapted from Wallentine[20], depicts the evolution of Jessica's original portfolio and of Jessica's tracking portfolio.

Figure 4. Jessica's original portfolio v.s. Jessica's tracking portfolio, 2011 - 2015. Source: Wallentine.

In addition to confirming a very small tracking error between the two portfolios, Figure 4 also highlights that Jessica's tracking portfolio actually outperformed Jessica's portfolio by 70bps annually[20][23]. Now, to come back to Jessica's original request: Jessica's mutual funds portfolio has an average expense ratio of 1.13%, which leads to total annual fees of around $11,623[20]; your alternative low-cost ETFs portfolio has an average expense ratio of 0.06%, which leads to total annual fees of around $617[20]. In other words, your alternative portfolio - which is for all intents and purposes equivalent to Jessica's current portfolio - leads to a 95% decrease in annual fees[24]. Jessica is delighted!
To conclude on this example, "the results outlined in this [sub-]section should give readers confidence that [an index tracking portfolio] can successfully replicate a variety of different mutual fund portfolios from large fund companies"[20].

Automatically determining the proper benchmark for a mutual fund

In his 1992 paper, Sharpe notes that[13] "Style analysis provides a natural method for constructing benchmarks [for performance measurement]. The return obtained by a fund in each month can be compared with the return on a mix of asset classes with the same estimated style, where the style is estimated prior to the month in question." Due to the one-to-one relationship between Sharpe's style analysis and index tracking, the index tracking machinery then allows to easily and automatically construct a benchmark for any mutual fund[25].

As an illustration, I propose to revisit a tweet from Nicolas Rabener - the founder and CEO of Finominal - in which an interesting discrepancy is reported between: the SEI Enhanced Low Volatility U.S. Large Cap (SELV) ETF; and the self-reported benchmark[26] for this ETF - the Russell 1000 Index.

For this, Figure 5 compares, over the period 01 January 2023 - 26 October 2023, the evolution of: the SELV ETF; the iShares Russell 1000 (IWB) ETF, representative of the self-reported benchmark for the SELV ETF - the Russell 1000 Index; the iShares MSCI USA Min Vol Factor (USMV) ETF, representative of the benchmark for the SELV ETF automatically determined by Finominal - the MSCI USA Minimum Volatility Index; and a portfolio made of 13% IWB ETF / 87% USMV ETF, representative of the benchmark for the SELV ETF automatically determined by Portfolio Optimizer, using: Index - the SELV ETF; Tracking assets - the IWB and the USMV ETFs.

Figure 5. SELV ETF v.s. misc. other ETFs and optimal benchmark, 01 January 2023 - 26 October 2023.
Figure 5 empirically demonstrates that the Russell 1000 Index is a very bad choice of benchmark for the SELV ETF and that a more realistic benchmark would consist either of: the MSCI USA Minimum Volatility Index, if only one index can be chosen; or the portfolio made of 13% Russell 1000 Index / 87% MSCI USA Minimum Volatility Index, if a composite index can be chosen[27]. These two alternative benchmarks are consistent with the SELV ETF fact sheet, which states that[26] "The SEI Enhanced Low Volatility U.S. Large Cap ETF seeks to provide long-term capital appreciation by investing primarily in U.S. common stocks, while aiming to experience lower volatility compared to the broad U.S. large cap equity market."

Coincidentally, Nicolas Rabener recently[16] blogged about different metrics that can help select an optimal benchmark for a fund in his article Determining the Optimal Benchmark for Funds. Among these metrics, the tracking error is found to be the best performing one and "results in nine selections [out of ten] that seem reasonable"[28]. As a side note, I wholeheartedly recommend readers interested in the topic of automatic benchmark selection to dig into the associated Finominal research blog posts.

Minimizing the sensitivity of mean-variance optimization to estimation error

A stylized fact of Markowitz's mean-variance framework is the sensitivity of efficient portfolios to estimation error in asset returns and (co)variances[29], which explains why efficient portfolios are sometimes labeled estimation-error maximizers[30]. This issue is thoroughly analyzed, both theoretically and empirically, by Kinlaw et al.[31], who conclude that "[mean-variance optimization] only appears to be hypersensitive when asset classes are close substitutes for each other, because the optimal weights may shift significantly in response to input errors."
In other words, Kinlaw et al.[31] establish that it is possible to minimize the sensitivity of mean-variance optimization to estimation error by ensuring that assets are as "distinct" as possible. A practical method for doing so is proposed in David Berns' book Modern Asset Allocation for Wealth Management[32]. In detail, Berns' methodology is a two-step procedure allowing to determine whether an asset in a portfolio is redundant with the other assets also present in that portfolio:

Step 1 - Compute an index tracking portfolio - called the mimicking portfolio[32] - using: Index - the selected asset; Tracking assets - the other assets present in the portfolio.

Step 2 - Assess the redundancy of the selected asset by looking at the tracking error [variance] between the mimicking portfolio and the asset[32], because[32] "If the mimicking portfolio closely resembles [the selected] asset, then the [tracking error variance] will be low, and [the selected] asset is redundant and should be avoided to avoid estimation error sensitivity. However, if the [tracking error variance] for [the selected] asset is high, then [that] asset clearly showcases distinctness in one or more moments and can potentially be a valuable asset to consider for the portfolio."

In practice, Berns[32] suggests using a tracking error variance of around 2% as "a reasonable level below which to start flagging assets for potential removal" due to redundancy[32]. Berns[32] illustrates his proposed methodology using a portfolio made of intermediate-term Treasuries, intermediate-term investment grade bonds, and high-yield bonds. The mimicking portfolio for each asset is computed from the two other assets, and the tracking error variance - called the mimicking portfolio tracking error (MPTE) - between the mimicking portfolio and the asset under consideration is computed as in Figure 6, adapted from Berns[32]:

Figure 6. Tracking error variance of fixed income mutual funds v.s.
associated tracking portfolios, January 1994 - February 2019. Source: Berns.

For this specific portfolio, intermediate-term investment grade bonds are flagged as the most redundant asset for reasons detailed in Berns[32], but there is no real redundancy between the three assets[33]. To be noted that other approaches to the problem of mean-variance optimization sensitivity to estimation error exist - like the computation of subset resampling-based efficient portfolios or the usage of the nested clustered optimization algorithm (NCO)[34] - but these alternatives are usually much more mathematically involved than Berns' "modern yet practical"[32] approach.

Conclusion

Index tracking is a framework which deserves a place in one's quantitative arsenal thanks to its versatility, as hopefully shown in this blog post. Next in this series, I will discuss extensions of the basic index tracking problem, like for example methods to construct an index tracking portfolio with a limit on the number of assets it contains. In the meantime, feel free to track me on LinkedIn or on Twitter.

1. Also called a benchmark replicating portfolio.
2. The term closely is loosely defined at this point.
3. From its constituent weights.
4. See Konstantinos Benidis, Yiyong Feng, Daniel P. Palomar, Optimization Methods for Financial Index Tracking: From Theory to Practice, now, 2018.
5. See Hallerbach, W.G. (1994). Index Tracking: Some Techniques and Results. In: Peccati, L., Virén, M. (eds) Financial Modelling. Contributions to Management Science. Physica-Verlag HD.
6. Or a solution, in case multiple solutions exist.
7. See J.E. Beasley, N. Meade, T.-J. Chang, An evolutionary heuristic for the index tracking problem, European Journal of Operational Research, Volume 148, Issue 3, 2003, Pages 621-643.
8. That is, if $r_{tracking, t} = r_{idx, t} + c$, $t=1..T$, with $c$ a constant.
9. See Rossbach, Peter and Karlow, Denis (2011), The stability of traditional measures of index tracking quality, No 164, Frankfurt School - Working Paper Series, Frankfurt School of Finance and Management.
10. See Markus Rudolf, Hans-Jurgen Wolter, Heinz Zimmermann, A linear model for tracking error minimization, Journal of Banking & Finance, Volume 23, Issue 1, 1999, Pages 85-103.
11. The empirical tracking error is not without issues, though; for example, it penalizes similarly both positive and negative deviations from the index.
12. See Sharpe, William F., Determining a Fund's Effective Asset Mix, Investment Management Review, December 1988, pp. 59-69.
13. See William F. Sharpe, Asset allocation: Management style and performance measurement, The Journal of Portfolio Management, Winter 1992, 18 (2) 7-19.
14. FR0011869353
15. FR0011440478
16. At the date of the initial publication of this blog post.
17. The rationale is that, by construction, all constituents of the MSCI ACWI Index belong either to the MSCI World Index or to the MSCI EM Index.
18. LU1829220216
19. Of course, this example of usage is very close to a toy example, but it already illustrates the potential of index tracking based on returns data.
20. See Jonathan Wallentine, Cloning Wall Street, Actuarial Management Company.
21. Further details are provided in Wallentine[20].
22. Some of these mutual funds do not exist anymore at the date of publication of this post.
23. As noted in Wallentine[20], this is likely the result of high mutual fund expense ratios in her portfolio (i.e., it is not due to any secret alpha generated by [the index tracking portfolio itself])[20].
24. One caveat, though, is that the index tracking portfolio needs to be purchased and then regularly re-balanced; also, as mentioned in Wallentine[20], your management fees as advisor should of course be taken into account.
25. Or, more generally, a benchmark for any given asset.
26. C.f. SELV Fact Sheet / 31 July 2023.
27. While not visible on Figure 5, the portfolio made of 13% IWB ETF / 87% USMV ETF has a lower empirical tracking error against the SELV ETF than the IWB ETF.
28. See Determining the Optimal Benchmark for Funds.
29. See Broadie, M., Computing efficient frontiers using estimated parameters, Ann Oper Res 45, 21-58 (1993).
30. See Michaud, Richard O., The Markowitz Optimization Enigma: Is 'Optimized' Optimal?, Financial Analysts Journal, 1989.
31. See William Kinlaw, Mark P. Kritzman, David Turkington, Asset Allocation: From Theory to Practice and Beyond, Wiley Finance.
32. See David M. Berns, Modern Asset Allocation for Wealth Management, Wiley Finance, 2020.
33. And no asset has a mimicking portfolio tracking error greater than 2%.
34. See Lopez de Prado, Marcos, A Robust Estimator of the Efficient Frontier (October 15, 2016).

Volatility Forecasting: Simple and Exponentially Weighted Moving Average Models
2023-10-19T00:00:00-05:00
https://portfoliooptimizer.io/blog/volatility-forecasting-simple-and-exponentially-weighted-moving-average-models

<p>One of the simplest and most pragmatic approaches to volatility forecasting is to model the volatility of an asset as a <a href="https://en.wikipedia.org/wiki/Moving_average">weighted moving average</a>
of its past squared returns<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote">1</a></sup>.</p>
<p>Two weighting schemes widely used by practitioners<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote">2</a></sup><sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup> are the constant weighting scheme and the exponentially decreasing weighting scheme, leading respectively to the
<em>simple moving average volatility forecasting model</em> and to the <em>exponentially weighted moving average volatility forecasting model</em>.</p>
<p>In this blog post, I will detail these two models and I will illustrate how they can be used for monthly and daily volatility forecasting.</p>
<h2 id="mathematical-preliminaries">Mathematical preliminaries</h2>
<h3 id="volatility-modelling">Volatility modelling</h3>
<p>Let $r_t$ be the (<a href="https://en.wikipedia.org/wiki/Rate_of_return#Logarithmic_or_continuously_compounded_return">logarithmic</a>) return of an asset over a time period $t$.</p>
<p>In all generality, $r_t$ can be expressed as<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote">4</a></sup></p>
\[r_t = \mu_t + \epsilon_t\]
<p>, where:</p>
<ul>
<li>$\mu_t = \mathbb{E} \left[ r_t \right]$ is a predictable quantity representing the (conditional) asset mean return over the time period $t$</li>
<li>$\epsilon_t = r_t - \mathbb{E} \left[ r_t \right]$ is an unpredictable error term, often referred to as a “shock”, over the time period $t$</li>
</ul>
<p>The asset (conditional) variance, $\sigma_t^2$, is then defined by</p>
\[\begin{aligned}
\sigma_t^2 &= Var \left[ r_t \right] \\
&= \mathbb{E} \left[ r_t^2 \right] - \mathbb{E} \left[ r_t \right]^2 \\
&= \mathbb{E} \left[ r_t^2 \right] - \mu_t^2 \\
\end{aligned}\]
<p>, and the asset (conditional) volatility, $\sigma_t$, by the square root of the asset variance.</p>
<p>From this general model for asset returns, it is possible to derive different models for the asset volatility depending on working assumptions.</p>
<p>As an example, in a <a href="/blog/range-based-volatility-estimators-overview-and-examples-of-usage/">previous blog post on range-based volatility estimators</a>, the main working assumption
is that the prices of the asset follow a <a href="https://en.wikipedia.org/wiki/Geometric_Brownian_motion">geometric Brownian motion</a> with constant volatility coefficient $\sigma$ and constant drift coefficient $\mu$.
This assumption implies<sup id="fnref:13" role="doc-noteref"><a href="#fn:13" class="footnote">5</a></sup> that the asset returns $r_t$ are all <a href="https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables">i.i.d.</a> random variables of normal distribution
$\mathcal{N} \left( \mu - \frac{1}{2} \sigma^2, \sigma^2 \right)$ and that the asset volatility is equal to the volatility coefficient of the geometric Brownian motion.</p>
<p>In this blog post, the main working assumption will be that $\mu_t = 0$.</p>
<p>Such an assumption is standard when working with daily asset returns, with Fischer Black already using it in the early days of option pricing theory<sup id="fnref:12" role="doc-noteref"><a href="#fn:12" class="footnote">6</a></sup>:</p>
<blockquote>
<p>The new data is a set of volatility estimates on all the option stocks based on about one month of daily returns. One month of daily returns in a typical month is 21 data points. For each stock, I square the returns, take the average of the squares, and then take the square root. I don’t subtract the average return before squaring, because a monthly average return isn’t a good estimate of the long run average return. <strong>Zero is a better estimate.</strong></p>
</blockquote>
<p>But such an assumption is not standard when working with lower frequency data like weekly or monthly asset returns, even though it has
been empirically demonstrated to be justified<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote">7</a></sup>.</p>
<h3 id="variance-proxies">Variance proxies</h3>
<p>Under the working assumption of the previous sub-section, the asset variance $\sigma_t^2$ becomes equal to:</p>
\[\sigma_t^2 = \mathbb{E} \left[ r_t^2 \right]\]
<p>As a consequence, the squared return $r_t^2$ of an asset over a time period $t$ (a day, a week, a month..) is a <em>variance estimator</em><sup id="fnref:14" role="doc-noteref"><a href="#fn:14" class="footnote">8</a></sup> - or <em>variance proxy</em><sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote">2</a></sup> -
for that asset variance over the considered time period.</p>
<p>However, <em>it has long been known that squared returns are a rather noisy proxy for the true conditional variance</em><sup id="fnref:1:2" role="doc-noteref"><a href="#fn:1" class="footnote">2</a></sup>, so that more accurate estimates have been
proposed in the literature over the years.</p>
<p>Among them, <a href="/blog/range-based-volatility-estimators-overview-and-examples-of-usage/">the Parkinson volatility estimator</a>
has in particular been found to be <em>theoretically, numerically, and empirically</em><sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote">9</a></sup> superior as a variance proxy to squared returns, even if its usage theoretically requires
that asset prices follow a driftless geometric Brownian motion with constant volatility<sup id="fnref:2:1" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup> and not only that the asset mean return is zero<sup id="fnref:23" role="doc-noteref"><a href="#fn:23" class="footnote">10</a></sup>.</p>
<p>The Parkinson range of an asset, as a variance proxy, is defined over a time period $t$ by<sup id="fnref:1:3" role="doc-noteref"><a href="#fn:1" class="footnote">2</a></sup>:</p>
\[\tilde{\sigma}_{P,t}^2 = \frac{1}{4 \ln 2} \left( \ln \frac{H_t}{L_t} \right) ^2\]
<p>, where:</p>
<ul>
<li>$H_t$ is the asset highest price over the time period $t$</li>
<li>$L_t$ is the asset lowest price over the time period $t$</li>
</ul>
<p>Additionally, because the Parkinson range does not <em>account for the period during which the market is closed</em><sup id="fnref:27" role="doc-noteref"><a href="#fn:27" class="footnote">11</a></sup>, the jump-adjusted Parkinson range of an asset has also
been proposed as a variance proxy, and is defined over a time period $t$ by<sup id="fnref:27:1" role="doc-noteref"><a href="#fn:27" class="footnote">11</a></sup>:</p>
\[\tilde{\sigma}_{jaP,t}^2 = \frac{1}{4 \ln 2} \left( \ln \frac{H_t}{L_t} \right) ^2 + \left( \ln \frac{O_t}{C_{t-1}} \right) ^2\]
<p>, where:</p>
<ul>
<li>$O_t$ is the asset opening price over the time period $t$</li>
<li>$H_t$ is the asset highest price over the time period $t$</li>
<li>$L_t$ is the asset lowest price over the time period $t$</li>
<li>$C_{t-1}$ is the asset closing price over the previous time period $t-1$</li>
</ul>
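<p>As an illustration, the two range-based variance proxies above can be computed directly from their formulas. The OHLC prices below are hypothetical values used only for the example:</p>

```python
import numpy as np

def parkinson_proxy(high, low):
    """Parkinson range variance proxy for one period: (ln(H/L))^2 / (4 ln 2)."""
    return np.log(high / low) ** 2 / (4.0 * np.log(2.0))

def jump_adjusted_parkinson_proxy(high, low, open_, prev_close):
    """Jump-adjusted Parkinson proxy: adds the squared overnight jump (ln(O/C_prev))^2."""
    return parkinson_proxy(high, low) + np.log(open_ / prev_close) ** 2

# Hypothetical daily prices, for illustration only
p = parkinson_proxy(high=102.0, low=99.0)
jap = jump_adjusted_parkinson_proxy(high=102.0, low=99.0, open_=100.5, prev_close=100.0)
vol_proxy = np.sqrt(p)  # the corresponding volatility proxy
```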
<h3 id="volatility-proxies">Volatility proxies</h3>
<p>A <em>volatility estimator</em> - or <em>volatility proxy</em> - $\tilde{\sigma}_t$ for an asset volatility over a time period $t$ is defined as the square root of a variance proxy $\tilde{\sigma}_t^2$
for that asset over the same time period.</p>
<p>To be noted that in the financial literature, the term <em>volatility proxy</em> is frequently used instead of <em>variance proxy</em>, which warrants some caution.</p>
<h3 id="weighted-moving-average-volatility-forecasting-model">Weighted moving average volatility forecasting model</h3>
<p><em>Most empirical methods for predicting volatility on the basis of past data start with the premise that volatility clusters through time</em><sup id="fnref:2:2" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup><sup id="fnref:45" role="doc-noteref"><a href="#fn:45" class="footnote">12</a></sup>.</p>
<p>From this observation, it makes sense to model an asset next period’s<sup id="fnref:15" role="doc-noteref"><a href="#fn:15" class="footnote">13</a></sup> variance $\hat{\sigma}_{T+1}^2$ as a weighted moving average of that asset past periods’ variance proxies
$\tilde{\sigma}^2_t$, $t=1..T$.</p>
<p>This leads to the following formula:</p>
\[\hat{\sigma}_{T+1}^2 = w_0 + \sum_{i=1}^{k} w_i \tilde{\sigma}^2_{T+1-i}\]
<p>, where:</p>
<ul>
<li>$1 \leq k \leq T$ is the size of the moving average, possibly time-dependent</li>
<li>$w_i, i=0..k$ are the weights of the moving average, possibly time-dependent as well</li>
</ul>
<p>Although very simple, this family of volatility forecasting models encompasses <em>many of the empirical volatility forecasting [models …] used in finance</em><sup id="fnref:2:3" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup>:</p>
<ul>
<li>The random walk model<sup id="fnref:26" role="doc-noteref"><a href="#fn:26" class="footnote">14</a></sup></li>
<li>The historical average model<sup id="fnref:26:1" role="doc-noteref"><a href="#fn:26" class="footnote">14</a></sup></li>
<li>The exponentially smoothed model (a.k.a., the RiskMetrics model<sup id="fnref:19" role="doc-noteref"><a href="#fn:19" class="footnote">15</a></sup>)</li>
<li>The <a href="https://en.wikipedia.org/wiki/Autoregressive_conditional_heteroskedasticity">GARCH</a> model</li>
<li>The HAR model<sup id="fnref:17" role="doc-noteref"><a href="#fn:17" class="footnote">16</a></sup></li>
<li>…</li>
</ul>
<h2 id="simple-moving-average-volatility-forecasting-model">Simple moving average volatility forecasting model</h2>
<h3 id="relationship-with-the-generic-weighted-moving-average-model">Relationship with the generic weighted moving average model</h3>
<p>A simple moving average (SMA) volatility forecasting model is a specific kind of weighted moving average volatility forecasting model, with:</p>
<ul>
<li>$w_0 = 0$</li>
<li>$w_i = \frac{1}{k}$, $i = 1..k$, that is, equal weights giving each of the last $k$ past variance proxies the same importance in the model</li>
<li>$w_j = 0$, $j = k+1..T$, discarding all the past variance proxies beyond the $k$-th from the model</li>
</ul>
<h3 id="volatility-forecasting-formulas">Volatility forecasting formulas</h3>
<p>Under a simple moving average volatility forecasting model, the generic weighted moving average volatility forecasting formula becomes:</p>
<ul>
<li>
<p>To estimate an asset next period’s volatility:</p>
\[\hat{\sigma}_{T+1} = \sqrt{ \frac{\sum_{i=1}^{k} \tilde{\sigma}^2_{T+1-i}}{k} }\]
</li>
<li>
<p>To estimate an asset next $h$-period’s ahead volatility<sup id="fnref:16" role="doc-noteref"><a href="#fn:16" class="footnote">17</a></sup>, $h \geq 2$:</p>
\[\hat{\sigma}_{T+h} = \sqrt{ \frac{ \sum_{i=1}^{k-h+1} \tilde{\sigma}^2_{T+1-i} + \sum_{i=1}^{h-1} \hat{\sigma}^2_{T+h-i}}{k} }\]
</li>
<li>
<p>To estimate an asset aggregated volatility<sup id="fnref:16:1" role="doc-noteref"><a href="#fn:16" class="footnote">17</a></sup> over the next $h$ periods:</p>
\[\hat{\sigma}_{T+1:T+h} = \sqrt{ \sum_{i=1}^{h} \hat{\sigma}^2_{T+i} }\]
</li>
</ul>
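<p>The three forecasting formulas above can be sketched in a few lines of code, where forecasts beyond one period ahead feed earlier forecasts back into the moving average. The variance proxies below are hypothetical values used only for illustration:</p>

```python
import numpy as np

def sma_variance_forecasts(proxies, k, h):
    """h-step-ahead SMA variance forecasts: each forecast is the average of the
    k most recent values, with earlier forecasts fed back in for horizons >= 2."""
    window = list(proxies[-k:])
    forecasts = []
    for _ in range(h):
        sigma2_hat = sum(window) / k
        forecasts.append(sigma2_hat)
        window = window[1:] + [sigma2_hat]  # roll the window forward
    return np.array(forecasts)

# Hypothetical past variance proxies, for illustration only
proxies = [0.04, 0.02, 0.03, 0.05]
f = sma_variance_forecasts(proxies, k=4, h=3)
vol_next = np.sqrt(f[0])           # next period's volatility forecast
vol_aggregated = np.sqrt(f.sum())  # aggregated volatility over the next 3 periods
```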
<h3 id="specific-cases">Specific cases</h3>
<p>The simple moving average volatility forecasting model encompasses two specific models:</p>
<ul>
<li>
<p>The random walk model, which corresponds to $k = 1$</p>
<p>Under this volatility model, introduced in <a href="/blog/range-based-volatility-estimators-overview-and-examples-of-usage/">a previous blog post</a>, the forecast for an asset next period’s
volatility is that asset current period’s volatility.</p>
</li>
<li>
<p>The historical average model, which corresponds to $k = T$</p>
<p>Under this volatility model, the forecast for an asset next period’s volatility is the long term average of that asset past periods’ volatility.</p>
</li>
</ul>
<h3 id="how-to-choose-the-window-size">How to choose the window size?</h3>
<p>There are two common procedures to choose the window size $k$ of a simple moving average volatility forecasting model:</p>
<ul>
<li>
<p>Using sensible ad-hoc values</p>
<p>For example, in order to forecast an asset volatility for each day over the next month, it makes sense to use that asset past volatility for each day over the last month.</p>
</li>
<li>
<p>Determining the optimal window size w.r.t. the forecast horizon $h$</p>
<p>Because the window size best suited to a given forecast horizon (e.g. 1 day) is possibly different from the window size best suited to another forecast horizon (e.g., 1 month), some
authors like Figlewski<sup id="fnref:10:1" role="doc-noteref"><a href="#fn:10" class="footnote">4</a></sup> propose to select the window size as the value minimizing the <a href="https://en.wikipedia.org/wiki/Root-mean-square_deviation">root mean square error (RMSE)</a> between:</p>
<ul>
<li>The volatility forecasted over the desired horizon</li>
<li>The volatility effectively observed over that horizon</li>
</ul>
<p>Such a procedure has the benefit of rigor versus using an ad-hoc window size, but comes with its own issues, such as the need to capture the time variation
in volatility (e.g., by using an expanding window or a rolling window to compute the RMSE).</p>
</li>
</ul>
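<p>As an illustration, the second procedure can be sketched as follows; this is a simplified version of Figlewski’s idea, using 1-period ahead variance forecasts and evaluating all candidate window sizes over the same periods (the <code class="language-plaintext highlighter-rouge">select_window_size</code> helper is hypothetical):</p>

```python
import math

def select_window_size(variance_proxies, candidate_ks):
    """Pick the SMA window size minimizing the RMSE between the 1-period
    ahead variance forecast and the realized variance proxy; a simplified
    sketch of the procedure proposed by Figlewski."""
    start = max(candidate_ks)  # same evaluation periods for all candidates
    best_k, best_rmse = None, float("inf")
    for k in candidate_ks:
        errors = [sum(variance_proxies[t - k:t]) / k - variance_proxies[t]
                  for t in range(start, len(variance_proxies))]
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        if rmse < best_rmse:
            best_k, best_rmse = k, rmse
    return best_k, best_rmse
```

<p>A more faithful implementation would also match the forecast horizon $h$ (e.g., comparing monthly forecasts with monthly realized volatility) rather than working at the 1-period level.</p>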
<h2 id="exponentially-weighted-moving-average-volatility-forecasting-model">Exponentially weighted moving average volatility forecasting model</h2>
<h3 id="relationship-with-the-generic-weighted-moving-average-model-1">Relationship with the generic weighted moving average model</h3>
<p>An exponentially weighted moving average (EWMA) volatility forecasting model is defined by<sup id="fnref:19:1" role="doc-noteref"><a href="#fn:19" class="footnote">15</a></sup>:</p>
<ul>
<li>A decay factor $\lambda \in [0, 1]$</li>
<li>An initial moving average value<sup id="fnref:20" role="doc-noteref"><a href="#fn:20" class="footnote">18</a></sup>, for example $\hat{\sigma}_{1}^2 = \tilde{\sigma}^2_1$</li>
<li>A recursive computation formula $\hat{\sigma}^2_{t+1} = \lambda \hat{\sigma}_t^2 + \left( 1 - \lambda \right) \tilde{\sigma}^2_t$, $t \geq 1$</li>
</ul>
<p>By expanding the recursion, it is easy to see that an exponentially weighted moving average volatility forecasting model is a specific kind of
weighted moving average volatility forecasting model, with:</p>
<ul>
<li>$k = T$</li>
<li>$w_0 = 0$</li>
<li>$w_1 = \left( 1 - \lambda \right)$, $w_2 = \lambda \left( 1 - \lambda \right)$, …, $w_{T-1} = \lambda^{T-1} \left( 1 - \lambda \right)$, $w_T = \lambda^T$, that is, exponentially decreasing weights emphasizing recent past variance proxies v.s. more distant ones in the model</li>
</ul>
<h3 id="volatility-forecasting-formulas-1">Volatility forecasting formulas</h3>
<p>Under an exponentially weighted moving average volatility forecasting model, the generic weighted moving average volatility forecasting formula becomes:</p>
<ul>
<li>
<p>To estimate an asset’s next-period volatility:</p>
\[\hat{\sigma}_{T+1} = \sqrt{ \lambda \hat{\sigma}_{T}^2 + \left( 1 - \lambda \right) \tilde{\sigma}^2_{T} }\]
</li>
<li>
<p>To estimate an asset’s volatility $h$ periods ahead<sup id="fnref:16:2" role="doc-noteref"><a href="#fn:16" class="footnote">17</a></sup>, $h \geq 2$:</p>
\[\hat{\sigma}_{T+h} = \hat{\sigma}_{T+1}\]
<p>This result means that volatility forecasts beyond the next period are all equal to the volatility forecast for that next period, much as in a random walk model; this is
a known limitation of the EWMA model when multi-period ahead forecasts are required.</p>
</li>
<li>
<p>To estimate an asset’s aggregated volatility<sup id="fnref:16:3" role="doc-noteref"><a href="#fn:16" class="footnote">17</a></sup> over the next $h$ periods:</p>
\[\hat{\sigma}_{T+1:T+h} = \sqrt{h} \hat{\sigma}_{T+1}\]
</li>
</ul>
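<p>Under the initialisation $\hat{\sigma}_{1}^2 = \tilde{\sigma}^2_1$, the EWMA recursion and the resulting forecasts can be sketched in a few lines of Python (the <code class="language-plaintext highlighter-rouge">ewma_forecasts</code> helper is hypothetical):</p>

```python
import math

def ewma_forecasts(variance_proxies, lam, h):
    """Exponentially weighted moving average volatility forecasts.

    variance_proxies: past variance proxies (oldest first); lam: decay
    factor; h: forecast horizon. Returns sigma_hat_{T+1} and the
    aggregated forecast sigma_hat_{T+1:T+h}.
    """
    var_hat = variance_proxies[0]                      # sigma_hat_1^2 = proxy_1
    for proxy in variance_proxies:
        var_hat = lam * var_hat + (1.0 - lam) * proxy  # recursion up to sigma_hat_{T+1}^2
    next_sigma = math.sqrt(var_hat)
    # Forecasts beyond T+1 are all flat at sigma_hat_{T+1}, so the
    # aggregated forecast reduces to sqrt(h) * sigma_hat_{T+1}
    return next_sigma, math.sqrt(h) * next_sigma
```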
<h3 id="how-to-choose-the-decay-factor">How to choose the decay factor?</h3>
<p>As with the simple moving average volatility forecasting model, there are two<sup id="fnref:21" role="doc-noteref"><a href="#fn:21" class="footnote">19</a></sup> common procedures to choose the decay factor $\lambda$ of an exponentially weighted moving average
volatility forecasting model:</p>
<ul>
<li>
<p>Using recommended values from the literature</p>
<p>For example, for variance proxies represented by daily squared returns, these are:</p>
<ul>
<li>$\lambda = 0.94$<sup id="fnref:19:2" role="doc-noteref"><a href="#fn:19" class="footnote">15</a></sup> or $\lambda = 0.89$<sup id="fnref:22" role="doc-noteref"><a href="#fn:22" class="footnote">20</a></sup>, for 1-day ahead volatility forecast</li>
<li>$\lambda = 0.92$<sup id="fnref:22:1" role="doc-noteref"><a href="#fn:22" class="footnote">20</a></sup>, for 1-week ahead volatility forecast</li>
<li>$\lambda = 0.95$<sup id="fnref:22:2" role="doc-noteref"><a href="#fn:22" class="footnote">20</a></sup>, for 2-week ahead volatility forecast</li>
<li>$\lambda = 0.97$<sup id="fnref:19:3" role="doc-noteref"><a href="#fn:19" class="footnote">15</a></sup> or $\lambda = 0.98$<sup id="fnref:22:3" role="doc-noteref"><a href="#fn:22" class="footnote">20</a></sup>, for 1-month ahead volatility forecast</li>
</ul>
</li>
<li>
<p>Determining the optimal value w.r.t. the forecast horizon $h$</p>
<p>Here again, it is possible to select the decay factor as the value minimizing the RMSE between the volatility forecasted over the desired horizon and the volatility
effectively observed over that horizon.</p>
<p>A good reference for this is the RiskMetrics technical document<sup id="fnref:19:4" role="doc-noteref"><a href="#fn:19" class="footnote">15</a></sup>.</p>
</li>
</ul>
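<p>The second procedure can be sketched along the same lines as for the window size, here with a grid of candidate decay factors (a simplified illustration in the spirit of the RiskMetrics technical document; the <code class="language-plaintext highlighter-rouge">select_decay_factor</code> helper is hypothetical):</p>

```python
import math

def select_decay_factor(variance_proxies, candidate_lambdas):
    """Pick the EWMA decay factor minimizing the RMSE between the 1-period
    ahead variance forecast and the realized variance proxy."""
    best_lam, best_rmse = None, float("inf")
    for lam in candidate_lambdas:
        var_hat, errors = variance_proxies[0], []  # sigma_hat_1^2 = proxy_1
        for t in range(1, len(variance_proxies)):
            var_hat = lam * var_hat + (1.0 - lam) * variance_proxies[t - 1]
            errors.append(var_hat - variance_proxies[t])  # forecast error at period t
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        if rmse < best_rmse:
            best_lam, best_rmse = lam, rmse
    return best_lam, best_rmse
```

<p>Note that when volatility shifts quickly, a lower $\lambda$ (faster decay) wins; for a very persistent volatility regime, a higher $\lambda$ would.</p>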
<h2 id="performance-of-the-simple-and-exponentially-weighted-moving-average-volatility-forecasting-models">Performance of the simple and exponentially weighted moving average volatility forecasting models</h2>
<p>The simple and exponentially weighted moving average volatility forecasting models are studied in a couple of papers (Boudoukh<sup id="fnref:2:4" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup>, Figlewski<sup id="fnref:10:2" role="doc-noteref"><a href="#fn:10" class="footnote">4</a></sup>…)
and are found to be competitive with more complex models.</p>
<p>Of course, <em>the predictive ability of a model depends largely on the asset class and the frequency of the observations</em><sup id="fnref:26:2" role="doc-noteref"><a href="#fn:26" class="footnote">14</a></sup>, so that the “best” volatility forecasting model ultimately
depends on the context, but it is quite remarkable that these two simple models are already quite good!</p>
<h2 id="implementation-in-portfolio-optimizer">Implementation in Portfolio Optimizer</h2>
<p><strong>Portfolio Optimizer</strong> implements:</p>
<ul>
<li>The simple moving average volatility forecasting model through the endpoint <a href="https://docs.portfoliooptimizer.io/"><code class="language-plaintext highlighter-rouge">/assets/volatility/forecast/sma</code></a></li>
<li>The exponentially weighted moving average volatility forecasting model through the endpoint <a href="https://docs.portfoliooptimizer.io/"><code class="language-plaintext highlighter-rouge">/assets/volatility/forecast/ewma</code></a></li>
</ul>
<p>Both of these endpoints support the 4 variance proxies below:</p>
<ul>
<li>Squared close-to-close returns</li>
<li>Demeaned squared close-to-close returns</li>
<li>The Parkinson range</li>
<li>The jump-adjusted Parkinson range</li>
</ul>
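<p>For illustration, these proxies could be computed from daily open/high/low/close prices along the lines below; the exact conventions used by Portfolio Optimizer are not documented here, so in particular the jump adjustment shown (adding the squared overnight return to the Parkinson range) is an assumption:</p>

```python
import math

LN4 = 4.0 * math.log(2.0)  # Parkinson normalization constant, 4 ln 2

def daily_variance_proxies(opens, highs, lows, closes):
    """Four daily variance proxies, aligned with close-to-close returns
    (the first day only provides the previous close)."""
    n = len(closes)
    cc = [math.log(closes[t] / closes[t - 1]) for t in range(1, n)]  # close-to-close log returns
    mean_cc = sum(cc) / len(cc)
    squared = [r * r for r in cc]                                    # squared returns
    demeaned = [(r - mean_cc) ** 2 for r in cc]                      # demeaned squared returns
    parkinson = [math.log(highs[t] / lows[t]) ** 2 / LN4 for t in range(1, n)]
    jump_adjusted = [p + math.log(opens[t] / closes[t - 1]) ** 2     # add squared overnight return
                     for t, p in zip(range(1, n), parkinson)]
    return squared, demeaned, parkinson, jump_adjusted
```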
<p>Both of these endpoints can automatically determine the optimal value of their parameter (window size $k$ or decay factor $\lambda$) using a proprietary variation of the procedures described in
Figlewski<sup id="fnref:10:3" role="doc-noteref"><a href="#fn:10" class="footnote">4</a></sup> and in the RiskMetrics technical document<sup id="fnref:19:5" role="doc-noteref"><a href="#fn:19" class="footnote">15</a></sup>.</p>
<h2 id="examples-of-usage">Examples of usage</h2>
<h3 id="volatility-forecasting-at-monthly-level-for-various-etfs">Volatility forecasting at monthly level for various ETFs</h3>
<p>As a first example of usage, I propose to complement the results of <a href="/blog/range-based-volatility-estimators-overview-and-examples-of-usage/">a previous blog post</a>, in which
monthly forecasts produced by a random walk volatility model are compared to the next month’s close-to-close observed volatility for 10 ETFs representative<sup id="fnref:29" role="doc-noteref"><a href="#fn:29" class="footnote">21</a></sup> of misc. asset classes:</p>
<ul>
<li>U.S. stocks (SPY ETF)</li>
<li>European stocks (EZU ETF)</li>
<li>Japanese stocks (EWJ ETF)</li>
<li>Emerging markets stocks (EEM ETF)</li>
<li>U.S. REITs (VNQ ETF)</li>
<li>International REITs (RWX ETF)</li>
<li>U.S. 7-10 year Treasuries (IEF ETF)</li>
<li>U.S. 20+ year Treasuries (TLT ETF)</li>
<li>Commodities (DBC ETF)</li>
<li>Gold (GLD ETF)</li>
</ul>
<p>In detail, I propose to include the simple and exponentially weighted moving average models as additional volatility forecasting models to be evaluated using Mincer-Zarnowitz<sup id="fnref:31" role="doc-noteref"><a href="#fn:31" class="footnote">22</a></sup> regressions.</p>
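<p>For reference, a Mincer-Zarnowitz regression simply regresses realized volatility on forecasted volatility, $\sigma_t = \alpha + \beta \hat{\sigma}_t + \epsilon_t$; an unbiased forecasting model should have $\alpha \approx 0$ and $\beta \approx 1$, with $R^2$ measuring how informative the forecasts are. A minimal sketch:</p>

```python
import numpy as np

def mincer_zarnowitz(forecasts, realized):
    """Mincer-Zarnowitz regression: realized = alpha + beta * forecast + eps.
    Returns (alpha, beta, R^2); an unbiased forecast has alpha = 0, beta = 1."""
    x = np.asarray(forecasts, dtype=float)
    y = np.asarray(realized, dtype=float)
    beta, alpha = np.polyfit(x, y, 1)        # OLS slope and intercept
    residuals = y - (alpha + beta * x)
    r2 = 1.0 - residuals.var() / y.var()     # fraction of variance explained
    return alpha, beta, r2
```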
<p>Averaged results for all ETFs/regression models over each ETF price history<sup id="fnref:33" role="doc-noteref"><a href="#fn:33" class="footnote">23</a></sup> are the following<sup id="fnref:34" role="doc-noteref"><a href="#fn:34" class="footnote">24</a></sup>:</p>
<table>
<thead>
<tr>
<th>Volatility model</th>
<th>Variance proxy</th>
<th>$\bar{\alpha}$</th>
<th>$\bar{\beta}$</th>
<th>$\bar{R^2}$</th>
</tr>
</thead>
<tbody>
<tr>
<td>Random walk <em>(previous blog post)</em></td>
<td>Squared close-to-close returns</td>
<td>5.8%</td>
<td>0.66</td>
<td>44%</td>
</tr>
<tr>
<td>Random walk <em>(previous blog post)</em></td>
<td>Parkinson range</td>
<td>5.6%</td>
<td>0.94</td>
<td>44%</td>
</tr>
<tr>
<td>Random walk <em>(previous blog post)</em></td>
<td>Jump-adjusted Parkinson range</td>
<td>4.9%</td>
<td>0.70</td>
<td>45%</td>
</tr>
<tr>
<td>SMA, $k$ = 1 month</td>
<td>Squared close-to-close returns</td>
<td>5.7%</td>
<td>0.68</td>
<td>46%</td>
</tr>
<tr>
<td>SMA, $k$ = 1 month</td>
<td>Parkinson range</td>
<td>5.5%</td>
<td>0.95</td>
<td>46%</td>
</tr>
<tr>
<td>SMA, $k$ = 1 month</td>
<td>Jump-adjusted Parkinson range</td>
<td>5.1%</td>
<td>0.71</td>
<td>47%</td>
</tr>
<tr>
<td>EWMA, $\lambda = 0.94$</td>
<td>Squared close-to-close returns</td>
<td>4.4%</td>
<td>0.74</td>
<td><strong>48%</strong></td>
</tr>
<tr>
<td>EWMA, $\lambda = 0.94$</td>
<td>Parkinson range</td>
<td>4.6%</td>
<td>1.02</td>
<td>47%</td>
</tr>
<tr>
<td>EWMA, $\lambda = 0.94$</td>
<td>Jump-adjusted Parkinson range</td>
<td>3.9%</td>
<td>0.76</td>
<td>47%</td>
</tr>
<tr>
<td>EWMA, $\lambda = 0.97$</td>
<td>Squared close-to-close returns</td>
<td>3.8%</td>
<td>0.76</td>
<td>45%</td>
</tr>
<tr>
<td>EWMA, $\lambda = 0.97$</td>
<td>Parkinson range</td>
<td>4.2%</td>
<td>1.03</td>
<td>44%</td>
</tr>
<tr>
<td>EWMA, $\lambda = 0.97$</td>
<td>Jump-adjusted Parkinson range</td>
<td><strong>3.3%</strong></td>
<td>0.78</td>
<td>44%</td>
</tr>
<tr>
<td>SMA, optimal $k \in \left[ 1, 5, 10, 15, 20 \right]$ days</td>
<td>Squared close-to-close returns</td>
<td>5.8%</td>
<td>0.68</td>
<td>46%</td>
</tr>
<tr>
<td>SMA, optimal $k \in \left[ 1, 5, 10, 15, 20 \right]$ days</td>
<td>Parkinson range</td>
<td>5.1%</td>
<td><strong>1.00</strong></td>
<td>47%</td>
</tr>
<tr>
<td>SMA, optimal $k \in \left[ 1, 5, 10, 15, 20 \right]$ days</td>
<td>Jump-adjusted Parkinson range</td>
<td>5.1%</td>
<td>0.71</td>
<td>47%</td>
</tr>
<tr>
<td>EWMA, optimal $\lambda$</td>
<td>Squared close-to-close returns</td>
<td>4.7%</td>
<td>0.73</td>
<td>45%</td>
</tr>
<tr>
<td>EWMA, optimal $\lambda$</td>
<td>Parkinson range</td>
<td>4.3%</td>
<td>1.06</td>
<td><strong>48%</strong></td>
</tr>
<tr>
<td>EWMA, optimal $\lambda$</td>
<td>Jump-adjusted Parkinson range</td>
<td>4.0%</td>
<td>0.76</td>
<td>45%</td>
</tr>
</tbody>
</table>
<p>A couple of general remarks:</p>
<ul>
<li>Whatever the volatility model, forecasts produced using the Parkinson range as a variance proxy are much less biased than those produced using either the squared close-to-close returns or the jump-adjusted Parkinson range</li>
<li>The SMA model produces better<sup id="fnref:36" role="doc-noteref"><a href="#fn:36" class="footnote">25</a></sup> forecasts than the random walk model (lines #1-#3 v.s. lines #4-#6)</li>
<li>The EWMA model produces better<sup id="fnref:36:1" role="doc-noteref"><a href="#fn:36" class="footnote">25</a></sup> forecasts than the SMA model (lines #4-#6 v.s. lines #7-#9)</li>
<li>The “optimal” SMA model produces better<sup id="fnref:36:2" role="doc-noteref"><a href="#fn:36" class="footnote">25</a></sup> forecasts than the “fixed window size” SMA model (lines #4-#6 v.s. lines #13-#15)</li>
<li>The “optimal” EWMA model is comparable<sup id="fnref:36:3" role="doc-noteref"><a href="#fn:36" class="footnote">25</a></sup> to the best “fixed decay factor” EWMA model (lines #7-#12 v.s. lines #16-#18)</li>
</ul>
<p>In conclusion, using the Parkinson range as variance proxy seems to generate the most accurate volatility forecasts under
both the simple and exponentially weighted moving average models, the two models being roughly comparable provided a proper window size $k$ and a proper decay factor $\lambda$ are chosen<sup id="fnref:46" role="doc-noteref"><a href="#fn:46" class="footnote">26</a></sup>.</p>
<h3 id="volatility-forecasting-at-daily-level-for-the-spy-etf">Volatility forecasting at daily level for the SPY ETF</h3>
<p>As a second example of usage, I propose to revisit the research note <em>Risk Before Return: Targeting Volatility with Higher Frequency Data</em><sup id="fnref:35" role="doc-noteref"><a href="#fn:35" class="footnote">27</a></sup> from <a href="https://saltfinancial.com/">Salt Financial</a>,
in which it is shown that using 15-minute intraday data allows <em>to boost performance in [daily] volatility targeting strategies</em><sup id="fnref:35:1" role="doc-noteref"><a href="#fn:35" class="footnote">27</a></sup> for the SPY ETF.</p>
<p>Given the nature of this blog post, though, I will use the daily jump-adjusted Parkinson range instead of the scaled<sup id="fnref:47" role="doc-noteref"><a href="#fn:47" class="footnote">28</a></sup> intraday high frequency realized volatility measure used
by Salt Financial.</p>
<p>The underlying rationale is the following:</p>
<ul>
<li>The Parkinson range theoretically possesses the same informational content as 2-hour or 3-hour intraday data<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote">29</a></sup>, so that it should be a poor man’s substitute for the Salt Financial intraday high frequency realized volatility measure</li>
<li>The jump-adjusted Parkinson range takes into account overnight jumps, so that it should be a poor man’s substitute for the Salt Financial scaled intraday high frequency realized volatility measure</li>
</ul>
<h4 id="volatility-targeting">Volatility targeting</h4>
<p>Volatility targeting is a portfolio risk-management strategy that scales the exposure of a portfolio to risky assets in order to target a constant level of
portfolio volatility<sup id="fnref:38" role="doc-noteref"><a href="#fn:38" class="footnote">30</a></sup>.</p>
<p>In order to achieve this, the proportion of the portfolio allocated to risky assets is regularly adjusted w.r.t. the forecasted portfolio volatility.</p>
<p>More formally, a (non-leveraged) portfolio targeting a constant level of volatility $\sigma_{target}$ is regularly adjusted at rebalancing times $t_1,t_2,…$ so that:</p>
<ul>
<li>A proportion $w_{t_i} = \frac{\sigma_{target}}{\hat{\sigma}_{t_i}}$ of the portfolio is invested into risky assets, with $w_{t_i}$ capped so that it varies from 0% to 100%</li>
<li>A proportion $1 - w_{t_i}$ of the portfolio is invested into a risk-free asset</li>
</ul>
<p>where $\hat{\sigma}_{t_i}$ is the forecasted portfolio volatility at time $t_i$ over the period until the next rebalancing time $t_{i+1}$.</p>
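<p>This rebalancing rule can be sketched in a few lines of Python, with the cap at 100% reflecting the non-leveraged constraint:</p>

```python
def volatility_targeting_weights(sigma_target, sigma_forecasts):
    """Risky-asset weights w_{t_i} = sigma_target / sigma_hat_{t_i} of a
    non-leveraged volatility targeting portfolio, capped at 100%; the
    remaining proportion 1 - w_{t_i} is invested in the risk-free asset."""
    return [min(1.0, sigma_target / s) for s in sigma_forecasts]
```

<p>For example, with a 10% volatility target, a 20% portfolio volatility forecast leads to a 50% risky-asset allocation, while any forecast below 10% leads to a full risky-asset allocation.</p>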
<p>Volatility targeting exhibits several interesting characteristics from a risk-management perspective, c.f. Harvey et al<sup id="fnref:38:1" role="doc-noteref"><a href="#fn:38" class="footnote">30</a></sup>:</p>
<ul>
<li>It improves the Sharpe ratios (equity and credit assets)</li>
<li>It reduces the likelihood of extreme returns (all assets)</li>
<li>It reduces the volatility of returns volatility (all assets)</li>
<li>It reduces maximum drawdowns (all assets)</li>
</ul>
<h4 id="daily-volatility-targeting-for-the-spy-etf-using-high-frequency-intraday-data">Daily volatility targeting for the SPY ETF using high frequency intraday data</h4>
<p>Figure 1, directly taken from the Salt Financial research note<sup id="fnref:35:2" role="doc-noteref"><a href="#fn:35" class="footnote">27</a></sup>, illustrates a daily volatility targeting strategy<sup id="fnref:41" role="doc-noteref"><a href="#fn:41" class="footnote">31</a></sup> for the SPY ETF over the period December 2003 - March 2020,
using three different volatility forecasting models:</p>
<ul>
<li><em>30D HV</em> - the 30-day simple moving average of the square root of the daily squared close-to-close returns<sup id="fnref:40" role="doc-noteref"><a href="#fn:40" class="footnote">32</a></sup></li>
<li><em>VIX</em> - the daily value of the VIX index</li>
<li><em>2D RV</em> - the 2-day simple moving average of a scaled realized volatility measure based on intraday returns sampled at a 15-minute frequency</li>
</ul>
<figure>
<a href="/assets/images/blog/volatility-forecasting-volatility-targeting-salt.png"><img src="/assets/images/blog/volatility-forecasting-volatility-targeting-salt.png" alt="Daily volatility targeting strategy for misc. volatility proxies, SPY ETF, December 2003 - March 2020. Source: Salt Financial." /></a>
<figcaption>Figure 1. Daily volatility targeting strategy for misc. volatility proxies, SPY ETF, December 2003 - March 2020. Source: Salt Financial.</figcaption>
</figure>
<p>On this figure, the added value of high frequency data, in terms of improved performance of the associated volatility targeting strategy, is clear.</p>
<h4 id="daily-volatility-targeting-for-the-spy-etf-using-openhighlowclose-intraday-data">Daily volatility targeting for the SPY ETF using open/high/low/close intraday data</h4>
<p>Figure 2 illustrates my reproduction of Figure 1, using two different volatility forecasting models:</p>
<ul>
<li><em>30d HV</em> - the volatility forecast<sup id="fnref:42" role="doc-noteref"><a href="#fn:42" class="footnote">33</a></sup> produced by the 30-day simple moving average of the square root of the daily squared close-to-close returns</li>
<li><em>2d JAPR</em> - the volatility forecast<sup id="fnref:42:1" role="doc-noteref"><a href="#fn:42" class="footnote">33</a></sup> produced by the 2-day simple moving average of the daily jump-adjusted Parkinson range</li>
</ul>
<figure>
<a href="/assets/images/blog/volatility-forecasting-volatility-targeting-bh-hv-parkinson.png"><img src="/assets/images/blog/volatility-forecasting-volatility-targeting-bh-hv-parkinson.png" alt="Daily volatility targeting strategy for misc. volatility proxies, SPY ETF, December 2003 - March 2020." /></a>
<figcaption>Figure 2. Daily volatility targeting strategy for misc. volatility proxies, SPY ETF, December 2003 - March 2020.</figcaption>
</figure>
<p>Comparing Figure 1 and Figure 2, it seems that performance metrics differ for buy and hold<sup id="fnref:43" role="doc-noteref"><a href="#fn:43" class="footnote">34</a></sup>, so that an absolute comparison between the different volatility targeting strategies will not be possible.</p>
<p>Nevertheless, it is visible in Figure 2 that the volatility forecasting model based on the jump-adjusted Parkinson range exhibits better volatility control and a slightly
higher average return than the volatility forecasting model based on squared close-to-close returns.</p>
<p>While the relative improvement is not as dramatic as in Figure 1<sup id="fnref:48" role="doc-noteref"><a href="#fn:48" class="footnote">35</a></sup>, this empirically demonstrates that using open/high/low/close intraday
data can definitely be beneficial to a volatility targeting strategy.</p>
<h2 id="conclusion">Conclusion</h2>
<p>In this blog post, I showed that simple volatility forecasting models can be useful in practice, especially when paired with range-based volatility estimators like the Parkinson estimator.</p>
<p>Next in this series dedicated to volatility forecasting, I will detail the reference model when it comes to volatility forecasting<sup id="fnref:44" role="doc-noteref"><a href="#fn:44" class="footnote">36</a></sup> - the GARCH model - and as usual, I will add my own twist to it.</p>
<p>Meanwhile, feel free to <a href="https://www.linkedin.com/in/roman-rubsamen/">connect with me on LinkedIn</a> or to <a href="https://twitter.com/portfoliooptim">follow me on Twitter</a>.</p>
<p>–</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:4" role="doc-endnote">
<p>The term <em>returns</em> is loosely defined here, but think for example daily close-to-close returns for daily volatility forecasting. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:1" role="doc-endnote">
<p>See <a href="https://www.sciencedirect.com/science/article/abs/pii/S030440761000076X">Andrew J. Patton, Volatility forecast comparison using imperfect volatility proxies, Journal of Econometrics, Volume 160, Issue 1, 2011, Pages 246-256</a>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:1:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:1:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:1:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>See <a href="https://www.pm-research.com/content/iijderiv/4/3/63">Boudoukh, J., Richardson, M., & Whitelaw, R.F. (1997). Investigation of a class of volatility estimators, Journal of Derivatives, 4 Spring, 63-71</a>. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:2:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:2:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:2:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a> <a href="#fnref:2:4" class="reversefootnote" role="doc-backlink">↩<sup>5</sup></a></p>
</li>
<li id="fn:10" role="doc-endnote">
<p>See <a href="https://onlinelibrary.wiley.com/doi/abs/10.1111/1468-0416.00009">Figlewski, S. (1997), Forecasting Volatility. Financial Markets, Institutions & Instruments, 6: 1-88</a>. <a href="#fnref:10" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:10:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:10:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:10:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a></p>
</li>
<li id="fn:13" role="doc-endnote">
<p>Assuming the time period $t$ is of unit length. <a href="#fnref:13" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:12" role="doc-endnote">
<p>See <a href="https://www.rasmusen.org/special/black/fbpage.htm">Fischer Black, One Way to Estimate Volatility, Black on Options, Vol 1, No. 8, May 17, 1976</a>. <a href="#fnref:12" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>Indeed, <em>it is extremely difficult to obtain an accurate mean estimate from the data</em><sup id="fnref:10:4" role="doc-noteref"><a href="#fn:10" class="footnote">4</a></sup> and <em>the real problem as far as volatility calculation is concerned is to avoid using extreme sample mean returns that will periodically be produced from short data samples</em><sup id="fnref:10:5" role="doc-noteref"><a href="#fn:10" class="footnote">4</a></sup>. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:14" role="doc-endnote">
<p>In the statistical sense, c.f. <a href="https://en.wikipedia.org/wiki/Estimator">Wikipedia</a>. <a href="#fnref:14" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>See <a href="https://onlinelibrary.wiley.com/doi/abs/10.1111/1540-6261.00454">Alizadeh, S., Brandt, M.W. and Diebold, F.X. (2002), Range-Based Estimation of Stochastic Volatility Models. The Journal of Finance, 57: 1047-1091</a>. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:23" role="doc-endnote">
<p>That being said, Brunetti and Lildholdt<sup id="fnref:24" role="doc-noteref"><a href="#fn:24" class="footnote">37</a></sup> showed that the conditions of validity of the Parkinson volatility estimator could be relaxed, allowing the process driving the asset returns to exhibit time-varying volatility and fat tails! <a href="#fnref:23" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:27" role="doc-endnote">
<p>See <a href="https://www.sciencedirect.com/science/article/abs/pii/S0169207009000466">Richard D.F. Harris, Fatih Yilmaz, Estimation of the conditional variance-covariance matrix of returns using the intraday range, International Journal of Forecasting, Volume 26, Issue 1, 2010, Pages 180-194</a>. <a href="#fnref:27" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:27:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:45" role="doc-endnote">
<p>Which, by the way, is a stylized fact of volatility<sup id="fnref:25" role="doc-noteref"><a href="#fn:25" class="footnote">38</a></sup>. <a href="#fnref:45" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:15" role="doc-endnote">
<p>By recursive substitution, an estimate of the asset next $h$-period’s ahead volatility $\hat{\sigma}_{T+h}$, $h \geq 2$, is a weighted moving average of that asset past periods variance proxies plus that asset past periods variance forecasts. <a href="#fnref:15" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:26" role="doc-endnote">
<p>See <a href="https://www.lazardassetmanagement.com/us/en_us/research-insights/investment-research/22430-predicting-volatility">Lazard Asset Management, Predicting Volatility, December 2015</a>. <a href="#fnref:26" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:26:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:26:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a></p>
</li>
<li id="fn:19" role="doc-endnote">
<p>See <a href="https://www.msci.com/documents/10199/5915b101-4206-4ba0-aee2-3449d5c7e95a">RiskMetrics. Technical Document, J.P.Morgan/Reuters, New York, 1996. Fourth Edition</a>. <a href="#fnref:19" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:19:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:19:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:19:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a> <a href="#fnref:19:4" class="reversefootnote" role="doc-backlink">↩<sup>5</sup></a> <a href="#fnref:19:5" class="reversefootnote" role="doc-backlink">↩<sup>6</sup></a></p>
</li>
<li id="fn:17" role="doc-endnote">
<p>See <a href="https://academic.oup.com/jfec/article-abstract/7/2/174/856522?redirectedFrom=fulltext">Fulvio Corsi, A Simple Approximate Long-Memory Model of Realized Volatility, Journal of Financial Econometrics, Volume 7, Issue 2, Spring 2009, Pages 174-196</a>. <a href="#fnref:17" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:16" role="doc-endnote">
<p>See <a href="https://centaur.reading.ac.uk/21316/">Brooks, Chris and Persand, Gitanjali (2003) Volatility forecasting for risk management. Journal of Forecasting, 22(1). pp. 1-22</a>. <a href="#fnref:16" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:16:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:16:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:16:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a></p>
</li>
<li id="fn:20" role="doc-endnote">
<p>There are other possible choices for the initial value $\hat{\sigma}_{1}^2$. <a href="#fnref:20" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:21" role="doc-endnote">
<p>Other procedures are described in the RiskMetrics technical document<sup id="fnref:19:6" role="doc-noteref"><a href="#fn:19" class="footnote">15</a></sup>. <a href="#fnref:21" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:22" role="doc-endnote">
<p>See <a href="https://arxiv.org/abs/2105.14382">Axel A. Araneda, Asset volatility forecasting: The optimal decay parameter in the EWMA model, arXiv</a>. <a href="#fnref:22" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:22:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:22:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:22:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a></p>
</li>
<li id="fn:29" role="doc-endnote">
<p>These ETFs are used in the <em>Adaptive Asset Allocation</em> strategy from <a href="https://investresolve.com/">ReSolve Asset Management</a>, described in the paper <em>Adaptive Asset Allocation: A Primer</em><sup id="fnref:30" role="doc-noteref"><a href="#fn:30" class="footnote">39</a></sup>. <a href="#fnref:29" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:31" role="doc-endnote">
<p>See <a href="https://econpapers.repec.org/bookchap/nbrnberch/1214.htm">Mincer, J. and V. Zarnowitz (1969). The evaluation of economic forecasts. In J. Mincer (Ed.), Economic Forecasts and Expectations</a>. <a href="#fnref:31" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:33" role="doc-endnote">
<p>The common ending price history of all the ETFs is 31 August 2023, but there is no common starting price history, as all ETFs started trading on different dates. <a href="#fnref:33" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:34" role="doc-endnote">
<p>For the exponentially weighted moving average, I used an expanding window for the volatility forecast computation. <a href="#fnref:34" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:36" role="doc-endnote">
<p>In terms of lower $\alpha$, $\beta$ closer to 1 and higher $R^2$. <a href="#fnref:36" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:36:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:36:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:36:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a></p>
</li>
<li id="fn:46" role="doc-endnote">
<p>Personally, I would recommend to use the exponentially weighted moving average model with the optimal decay factor $\lambda$ automatically determined. <a href="#fnref:46" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:35" role="doc-endnote">
<p>See <a href="https://saltfinancial.com/static/uploads/2020/06/Risk%20Before%20Return%20-%20Targeting%20Volatility%20with%20Higher%20Frequency%20Data-FINAL.pdf">Salt Financial, Risk Before Return: Targeting Volatility with Higher Frequency Data, Research Note</a>. <a href="#fnref:35" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:35:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:35:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a></p>
</li>
<li id="fn:47" role="doc-endnote">
<p>Salt Financial scales its proposed intraday high frequency realized volatility measure in order to account for overnight returns. <a href="#fnref:47" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:8" role="doc-endnote">
<p>See <a href="https://www.jstor.org/stable/2527343">Andersen, T. G. and Bollerslev, T.: 1998, Answering the skeptics: Yes, standard volatility models do provide accurate forecasts, International Economic Review 39, 885-905</a>. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:38" role="doc-endnote">
<p>See <a href="https://ssrn.com/abstract=3175538">Harvey, Campbell R. and Hoyle, Edward and Korgaonkar, Russell and Rattray, Sandy and Sargaison, Matthew and van Hemert, Otto, The Impact of Volatility Targeting</a>. <a href="#fnref:38" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:38:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:41" role="doc-endnote">
<p>Implemented with a one-day lag. <a href="#fnref:41" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:40" role="doc-endnote">
<p>That’s my understanding of the term “historical volatility” in the Salt Financial research note<sup id="fnref:35:3" role="doc-noteref"><a href="#fn:35" class="footnote">27</a></sup>. <a href="#fnref:40" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:42" role="doc-endnote">
<p>At a 2-day horizon, due to the Salt Financial one-day lag implementation<sup id="fnref:35:4" role="doc-noteref"><a href="#fn:35" class="footnote">27</a></sup>. <a href="#fnref:42" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:42:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:43" role="doc-endnote">
<p>Probably because the research note from Salt Financial<sup id="fnref:35:5" role="doc-noteref"><a href="#fn:35" class="footnote">27</a></sup> uses excess returns whereas my reproduction uses raw returns. <a href="#fnref:43" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:48" role="doc-endnote">
<p>In Figure 1, the increase in terms of Sharpe ratio is 0.09, while in Figure 2, it is only 0.05. <a href="#fnref:48" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:44" role="doc-endnote">
<p>See <a href="https://onlinelibrary.wiley.com/doi/full/10.1002/jae.800">Hansen, P.R. and Lunde, A. (2005), A forecast comparison of volatility models: does anything beat a GARCH(1,1)?. J. Appl. Econ., 20: 873-889</a>. <a href="#fnref:44" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:24" role="doc-endnote">
<p>See <a href="https://ssrn.com/abstract=296875">Brunetti, Celso and Lildholdt, Peter M., Return-Based and Range-Based (Co)Variance Estimation - with an Application to Foreign Exchange Markets (March 2002)</a>. <a href="#fnref:24" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:25" role="doc-endnote">
<p>See <a href="https://www.tandfonline.com/doi/abs/10.1080/713665670">R. Cont (2001) Empirical properties of asset returns: stylized facts and statistical issues, Quantitative Finance, 1:2, 223-236</a>. <a href="#fnref:25" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:30" role="doc-endnote">
<p>See <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2328254">Butler, Adam and Philbrick, Mike and Gordillo, Rodrigo and Varadi, David, Adaptive Asset Allocation: A Primer</a>. <a href="#fnref:30" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Roman R.One of the simplest and most pragmatic approaches to volatility forecasting is to model the volatility of an asset as a weighted moving average of its past squared returns1. Two weighting schemes widely used by practitioners23 are the constant weighting scheme and the exponentially decreasing weighting scheme, leading respectively to the simple moving average volatility forecasting model and to the exponentially weighted moving average volatility forecasting model. In this blog post, I will detail these two models and illustrate how they can be used for monthly and daily volatility forecasting. Mathematical preliminaries Volatility modelling Let $r_t$ be the (logarithmic) return of an asset over a time period $t$. In all generality, $r_t$ can be expressed as4 \[r_t = \mu_t + \epsilon_t\] , where: $\mu_t = \mathbb{E} \left[ r_t \right]$ is a predictable quantity representing the (conditional) asset mean return over the time period $t$ $\epsilon_t = r_t - \mathbb{E} \left[ r_t \right]$ is an unpredictable error term, often referred to as a “shock”, over the time period $t$ The asset (conditional) variance, $\sigma_t^2$, is then defined by \[\begin{aligned} \sigma_t^2 &= Var \left[ r_t \right] \\ &= \mathbb{E} \left[ r_t^2 \right] - \mathbb{E} \left[ r_t \right]^2 \\ &= \mathbb{E} \left[ r_t^2 \right] - \mu_t^2 \\ \end{aligned}\] , and the asset (conditional) volatility, $\sigma_t$, by the square root of the asset variance. From this general model for asset returns, it is possible to derive different models for the asset volatility depending on working assumptions. As an example, in a previous blog post on range-based volatility estimators, the main working assumption is that the prices of the asset follow a geometric Brownian motion with constant volatility coefficient $\sigma$ and constant drift coefficient $\mu$. This assumption implies5 that the asset returns $r_t$ are all i.i.d.
random variables of normal distribution $\mathcal{N} \left( \mu - \frac{1}{2} \sigma^2, \sigma^2 \right)$ and that the asset volatility is equal to the volatility coefficient of the geometric Brownian motion. In this blog post, the main working assumption will be that $\mu_t = 0$. Such an assumption is standard when working with daily asset returns, with Fischer Black already using it in the early days of option pricing theory6: The new data is a set of volatility estimates On all the option stocks based on about one month of daily returns. One month of daily returns in a typical month is 21 data points. For each stock, I square the returns, take the average of the squares, and then take the square root. I don’t subtract the average return before squaring, because a monthly average return isn’t a good estimate of the long run average return. Zero is a better estimate. But such an assumption is not standard when working with lower frequency data like weekly or monthly asset returns, even though it has been empirically demonstrated to be justified7. Variance proxies Under the working assumption of the previous sub-section, the asset variance $\sigma_t^2$ becomes equal to: \[\sigma_t^2 = \mathbb{E} \left[ r_t^2 \right]\] As a consequence, the squared return $r_t^2$ of an asset over a time period $t$ (a day, a week, a month..) is a variance estimator8 - or variance proxy2 - for that asset variance over the considered time period. However, it has long been known that squared returns are a rather noisy proxy for the true conditional variance2, so that more accurate estimates have been proposed in the literature over the years. 
Among them, the Parkinson volatility estimator has in particular been found to be theoretically, numerically, and empirically9 superior as a variance proxy to squared returns, even if its usage theoretically requires that asset prices follow a driftless geometric Brownian motion with constant volatility3 and not only that the asset mean return is zero10. The Parkinson range of an asset, as a variance proxy, is defined over a time period $t$ by2: \[\tilde{\sigma}_{P,t}^2 = \frac{1}{4 \ln 2} \left( \ln \frac{H_t}{L_t} \right) ^2\] , where: $H_t$ is the asset highest price over the time period $t$ $L_t$ is the asset lowest price over the time period $t$ Additionally, because the Parkinson range does not account for the period during which the market is closed11, the jump-adjusted Parkinson range of an asset has also been proposed as a variance proxy, and is defined over a time period $t$ by11: \[\tilde{\sigma}_{jaP,t}^2 = \frac{1}{4 \ln 2} \left( \ln \frac{H_t}{L_t} \right) ^2 + \left( \ln \frac{O_t}{C_{t-1}} \right) ^2\] , where: $O_t$ is the asset opening price over the time period $t$ $H_t$ is the asset highest price over the time period $t$ $L_t$ is the asset lowest price over the time period $t$ $C_{t-1}$ is the asset closing price over the previous time period $t-1$ Volatility proxies A volatility estimator - or volatility proxy - $\tilde{\sigma}_t$ for an asset volatility over a time period $t$ is defined as the square root of a variance proxy $\tilde{\sigma}_t^2$ for that asset over the same time period. To be noted that in the financial literature, the term volatility proxy is frequently used instead of variance proxy, which warrants some caution. Weighted moving average volatility forecasting model Most empirical methods for predicting volatility on the basis of past data start with the premise that volatility clusters through time312. 
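For concreteness, the two range-based variance proxies just defined can be computed as follows; this is a minimal Python sketch with hypothetical function names, not code from Portfolio Optimizer:

```python
import math

def parkinson_range(high, low):
    # Parkinson variance proxy over one period:
    # (1 / (4 ln 2)) * ln(H_t / L_t)^2
    return math.log(high / low) ** 2 / (4.0 * math.log(2.0))

def jump_adjusted_parkinson_range(open_, high, low, prev_close):
    # Adds the squared overnight jump ln(O_t / C_{t-1})^2 to account
    # for the period during which the market is closed
    return parkinson_range(high, low) + math.log(open_ / prev_close) ** 2
```

For a period where the market gaps at the open but then trades in a narrow range, the jump-adjusted proxy is markedly larger than the plain Parkinson range, which is exactly the overnight information it is designed to capture.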
From this observation, it makes sense to model an asset next period’s13 variance $\hat{\sigma}_{T+1}^2$ as a weighted moving average of that asset past periods’ variance proxies $\tilde{\sigma}^2_t$, $t=1..T$. This leads to the following formula: \[\hat{\sigma}_{T+1}^2 = w_0 + \sum_{i=1}^{k} w_i \tilde{\sigma}^2_{T+1-i}\] , where: $1 \leq k \leq T$ is the size of the moving average, possibly time-dependent $w_i, i=0..k$ are the weights of the moving average, possibly time-dependent as well Although very simple, this family of volatility forecasting models encompasses many of the empirical volatility forecasting [models …] used in finance3: The random walk model14 The historical average model14 The exponentially smoothed model (a.k.a., the RiskMetrics model15) The GARCH model The HAR model16 … Simple moving average volatility forecasting model Relationship with the generic weighted moving average model A simple moving average (SMA) volatility forecasting model is a specific kind of weighted moving average volatility forecasting model, with: $w_0 = 0$ $w_i = \frac{1}{k}$, $i = 1..k$, that is, equal weights giving each of the last $k$ past variance proxies the same importance in the model $w_j = 0$, $j = k+1..T$, discarding all the past variance proxies beyond the $k$-th from the model Volatility forecasting formulas Under a simple moving average volatility forecasting model, the generic weighted moving average volatility forecasting formula becomes: To estimate an asset next period’s volatility: \[\hat{\sigma}_{T+1} = \sqrt{ \frac{\sum_{i=1}^{k} \tilde{\sigma}^2_{T+1-i}}{k} }\] To estimate an asset next $h$-period’s ahead volatility17, $h \geq 2$: \[\hat{\sigma}_{T+h} = \sqrt{ \frac{ \sum_{i=1}^{k-h+1} \tilde{\sigma}^2_{T+1-i} + \sum_{i=1}^{h-1} \hat{\sigma}^2_{T+h-i}}{k} }\] To estimate an asset aggregated volatility17 over the next $h$ periods: \[\hat{\sigma}_{T+1:T+h} = \sqrt{ \sum_{i=1}^{h} \hat{\sigma}^2_{T+i} }\] Specific cases The simple moving average 
volatility forecasting model encompasses two specific models: The random walk model, which corresponds to $k = 1$ Under this volatility model, introduced in a previous blog post, the forecast for an asset next period’s volatility is that asset current period’s volatility. The historical average model, which corresponds to $k = T$ Under this volatility model, the forecast for an asset next period’s volatility is the long term average of that asset past periods’ volatility. How to choose the window size? There are two common procedures to choose the window size $k$ of a simple moving average volatility forecasting model: Using sensible ad-hoc values For example, in order to forecast an asset volatility for each day over the next month, it makes sense to use that asset past volatility for each day over the last month. Determining the optimal window size w.r.t. the forecast horizon $h$ Because the window size best suited to a given forecast horizon (e.g., 1 day) is possibly different from the window size best suited to another forecast horizon (e.g., 1 month), some authors like Figlewski4 propose to select the window size as the value minimizing the root mean square error (RMSE) between: The volatility forecasted over the desired horizon The volatility effectively observed over that horizon Such a procedure has the benefit of rigor v.s. using an ad-hoc window size, but comes with its own issues, like the need to capture the time variation in volatility (e.g., using an expanding window or a rolling window to compute the RMSE).
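The next period's SMA forecast formula can be sketched in a few lines of Python (hypothetical helper name, not a Portfolio Optimizer endpoint); the random walk and historical average models fall out as special cases of $k$:

```python
import math

def sma_volatility_forecast(variance_proxies, k):
    # Next period's volatility: square root of the equally weighted
    # average of the last k variance proxies (w_i = 1/k).
    # k = 1 gives the random walk model; k = len(variance_proxies)
    # gives the historical average model.
    window = variance_proxies[-k:]
    return math.sqrt(sum(window) / k)
```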
Exponentially weighted moving average volatility forecasting model Relationship with the generic weighted moving average model An exponentially weighted moving average (EWMA) volatility forecasting model is defined by15: A decay factor $\lambda \in [0, 1]$ An initial moving average value18, for example $\hat{\sigma}_{1}^2 = \tilde{\sigma}^2_1$ A recursive computation formula $\hat{\sigma}^2_{t+1} = \lambda \hat{\sigma}_t^2 + \left( 1 - \lambda \right) \tilde{\sigma}^2_t$, $t \geq 1$ By developing the recursion, it is easy to see that an exponentially weighted moving average volatility forecasting model is a specific kind of weighted moving average volatility forecasting model, with: $k = T$ $w_0 = 0$ $w_1 = \left( 1 - \lambda \right)$, $w_2 = \lambda \left( 1 - \lambda \right)$, …, $w_{T-1} = \lambda^{T-1} \left( 1 - \lambda \right)$, $w_T = \lambda^T$, that is, exponentially decreasing weights emphasizing recent past variance proxies v.s. more distant ones in the model Volatility forecasting formulas Under an exponentially weighted moving average volatility forecasting model, the generic weighted moving average volatility forecasting formula becomes: To estimate an asset next period’s volatility: \[\hat{\sigma}_{T+1} = \sqrt{ \lambda \hat{\sigma}_{T}^2 + \left( 1 - \lambda \right) \tilde{\sigma}^2_{T} }\] To estimate an asset next $h$-period’s ahead volatility17, $h \geq 2$: \[\hat{\sigma}_{T+h} = \hat{\sigma}_{T+1}\] This result means that volatility forecasts beyond the next period are all equal to the volatility forecast for that next period, in a kind of random walk model way, and is a known limitation of this model when multi-period ahead forecasts are required. To estimate an asset aggregated volatility17 over the next $h$ periods: \[\hat{\sigma}_{T+1:T+h} = \sqrt{h} \hat{\sigma}_{T+1}\] How to choose the decay factor? 
Similar to the simple moving average volatility forecasting model, there are two19 common procedures to choose the decay factor $\lambda$ of an exponentially weighted moving average volatility forecasting model: Using recommended values from the literature For example, for variance proxies represented by daily squared returns, these are: $\lambda = 0.94$15 or $\lambda = 0.89$20, for 1-day ahead volatility forecast $\lambda = 0.92$20, for 1-week ahead volatility forecast $\lambda = 0.95$20, for 2-week ahead volatility forecast $\lambda = 0.97$15 or $\lambda = 0.98$20, for 1-month ahead volatility forecast Determining the optimal value w.r.t. the forecast horizon $h$ Here again, it is possible to select the decay factor as the value minimizing the RMSE between the volatility forecasted over the desired horizon and the volatility effectively observed over that horizon. A good reference for this is the RiskMetrics technical document15. Performance of the simple and exponentially weighted moving average volatility forecasting models The simple and exponentially weighted moving average volatility forecasting models are studied in a couple of papers (Boudoukh3, Figlewski4…) and are found to be competitive with more complex models. Of course, the predictive ability of a model depends largely on the asset class and the frequency of the observations14, so that the “best” volatility forecasting model ultimately depends on the context, but it is quite remarkable that these two simple models are already quite good! 
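The EWMA recursion can be sketched as follows, assuming the moving average is initialized with the first variance proxy (one of several possible choices, as noted above); the function name is mine, not from Portfolio Optimizer:

```python
import math

def ewma_volatility_forecast(variance_proxies, lam=0.94):
    # Recursion: sigma2_{t+1} = lam * sigma2_t + (1 - lam) * proxy_t,
    # initialized with sigma2_1 = proxy_1
    sigma2 = variance_proxies[0]
    for proxy in variance_proxies:
        sigma2 = lam * sigma2 + (1.0 - lam) * proxy
    return math.sqrt(sigma2)
```

Since forecasts beyond the next period are flat under this model, the aggregated volatility over the next $h$ periods is simply $\sqrt{h}$ times this value.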
Implementation in Portfolio Optimizer Portfolio Optimizer implements: The simple moving average volatility forecasting model through the endpoint /assets/volatility/forecast/sma The exponentially weighted moving average volatility forecasting model through the endpoint /assets/volatility/forecast/ewma Both of these endpoints support the 4 variance proxies below: Squared close-to-close returns Demeaned squared close-to-close returns The Parkinson range The jump-adjusted Parkinson range Both of these endpoints allow to automatically determine the optimal value of their parameter (window size $k$ or decay factor $\lambda$) using a proprietary variation of the procedures described in Figlewski4 and in the RiskMetrics technical document15. Examples of usage Volatility forecasting at monthly level for various ETFs As a first example of usage, I propose to complement the results of a previous blog post, in which monthly forecasts produced by a random walk volatility model are compared to the next month’s close-to-close observed volatility for 10 ETFs representative21 of misc. asset classes: U.S. stocks (SPY ETF) European stocks (EZU ETF) Japanese stocks (EWJ ETF) Emerging markets stocks (EEM ETF) U.S. REITs (VNQ ETF) International REITs (RWX ETF) U.S. 7-10 year Treasuries (IEF ETF) U.S. 20+ year Treasuries (TLT ETF) Commodities (DBC ETF) Gold (GLD ETF) In details, I propose to include the simple and exponentially weighted moving average models as additional volatility forecasting models to be evaluated using Mincer-Zarnowitz22 regressions. 
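For intuition, a Mincer-Zarnowitz regression is simply an OLS fit of realized volatility on forecast volatility, with $\alpha = 0$, $\beta = 1$ and a high $R^2$ indicating good forecasts. A plain-Python sketch (an illustration, not the code used to produce the results in this post):

```python
def mincer_zarnowitz(forecasts, realized):
    # OLS fit of realized = alpha + beta * forecast + error;
    # returns (alpha, beta, R^2)
    n = len(forecasts)
    mean_f = sum(forecasts) / n
    mean_r = sum(realized) / n
    cov = sum((f - mean_f) * (r - mean_r) for f, r in zip(forecasts, realized))
    var_f = sum((f - mean_f) ** 2 for f in forecasts)
    beta = cov / var_f
    alpha = mean_r - beta * mean_f
    ss_res = sum((r - alpha - beta * f) ** 2 for f, r in zip(forecasts, realized))
    ss_tot = sum((r - mean_r) ** 2 for r in realized)
    return alpha, beta, 1.0 - ss_res / ss_tot
```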
Averaged results for all ETFs/regression models over each ETF price history23 are the following24: Volatility model Variance proxy $\bar{\alpha}$ $\bar{\beta}$ $\bar{R^2}$ Random walk (previous blog post) Squared close-to-close returns 5.8% 0.66 44% Random walk (previous blog post) Parkinson range 5.6% 0.94 44% Random walk (previous blog post) Jump-adjusted Parkinson range 4.9% 0.70 45% SMA, $k$ = 1 month Squared close-to-close returns 5.7% 0.68 46% SMA, $k$ = 1 month Parkinson range 5.5% 0.95 46% SMA, $k$ = 1 month Jump-adjusted Parkinson range 5.1% 0.71 47% EWMA, $\lambda = 0.94$ Squared close-to-close returns 4.4% 0.74 48% EWMA, $\lambda = 0.94$ Parkinson range 4.6% 1.02 47% EWMA, $\lambda = 0.94$ Jump-adjusted Parkinson range 3.9% 0.76 47% EWMA, $\lambda = 0.97$ Squared close-to-close returns 3.8% 0.76 45% EWMA, $\lambda = 0.97$ Parkinson range 4.2% 1.03 44% EWMA, $\lambda = 0.97$ Jump-adjusted Parkinson range 3.3% 0.78 44% SMA, optimal $k \in \left[ 1, 5, 10, 15, 20 \right]$ days Squared close-to-close returns 5.8% 0.68 46% SMA, optimal $k \in \left[ 1, 5, 10, 15, 20 \right]$ days Parkinson range 5.1% 1.00 47% SMA, optimal $k \in \left[ 1, 5, 10, 15, 20 \right]$ days Jump-adjusted Parkinson range 5.1% 0.71 47% EWMA, optimal $\lambda$ Squared close-to-close returns 4.7% 0.73 45% EWMA, optimal $\lambda$ Parkinson range 4.3% 1.06 48% EWMA, optimal $\lambda$ Jump-adjusted Parkinson range 4.0% 0.76 45% A couple of general remarks: Whatever the volatility model, forecasts produced using the Parkinson range as a variance proxy are much less biased than those produced using either the squared close-to-close returns or the jump-adjusted Parkinson range The SMA model produces better25 forecasts than the random walk model (lines #1-#3 v.s. lines #4-#6) The EWMA model produces better25 forecasts than the SMA model (lines #4-#6 v.s. lines #7-#9) The “optimal” SMA model produces better25 forecasts than the “fixed window size” SMA model (lines #4-#6 v.s. 
lines #13-#15) The “optimal” EWMA model is comparable25 to the best “fixed decay factor” EWMA model (lines #7-#12 v.s. lines #16-#18) As a conclusion, using the Parkinson range as variance proxy seems to generate the most accurate volatility forecasts under both the simple and exponentially weighted moving average models, these two models being roughly comparable provided a proper window size $k$ and a proper decay factor $\lambda$ are chosen26. Volatility forecasting at daily level for the SPY ETF As a second example of usage, I propose to revisit the research note Risk Before Return: Targeting Volatility with Higher Frequency Data27 from Salt Financial, in which it is shown that using 15-minute intraday data allows to boost performance in [daily] volatility targeting strategies27 for the SPY ETF. Given the nature of this blog post, though, I will use the daily jump-adjusted Parkinson range instead of the scaled28 intraday high frequency realized volatility measure used by Salt Financial people. The underlying rationale is the following: The Parkinson range theoretically possesses the same informational content as 2-hour or 3-hour intraday data29, so that it should be a poor man’s substitute to the Salt Financial intraday high frequency realized volatility measure The jump-adjusted Parkinson range takes into account overnight jumps, so that it should be a poor man’s substitute to the Salt Financial scaled intraday high frequency realized volatility measure Volatility targeting Volatility targeting is a portfolio risk-management strategy which consists in scaling the exposure of a portfolio to risky assets in order to target a constant level of portfolio volatility30. In order to achieve this, the proportion of the portfolio allocated to risky assets is regularly adjusted w.r.t. the forecasted portfolio volatility. 
More formally, a (non-leveraged) portfolio targeting a constant level of volatility $\sigma_{target}$ is regularly adjusted at rebalancing times $t_1,t_2,…$ so that: A proportion $w_{t_i} = \frac{\sigma_{target}}{\hat{\sigma}_{t_i}}$ % of the portfolio is invested into risky assets, with $w_{t_i}$ varying from 0% to 100% A proportion $1 - w_{t_i}$% of the portfolio is invested into a risk-free asset , where $\hat{\sigma}_{t_i}$ is the forecasted portfolio volatility at time $t_i$ over the next rebalancing time $t_{i+1}$. Volatility targeting exhibits several interesting characteristics from a risk-management perspective, c.f. Harvey et al30: It improves the Sharpe ratios (equity and credit assets) It reduces the likelihood of extreme returns (all assets) It reduces the volatility of returns volatility (all assets) It reduces maximum drawdowns (all assets) Daily volatility targeting for the SPY ETF using high frequency intraday data Figure 1, directly taken from the Salt Financial research note27, illustrates a daily volatility targeting strategy31 for the SPY ETF over the period December 2003 - March 2020, using three different volatility forecasting models: 30D HV - the 30-day simple moving average of the square root of the daily squared close-to-close returns32 VIX - the daily value of the VIX index 2D RV - the 2-day simple moving average of a scaled realized volatility measure based on intraday returns sampled at a 15-minute frequency Figure 1. Daily volatility targeting strategy for misc. volatility proxies, SPY ETF, December 2003 - March 2020. Source: Salt Financial. On this figure, the added value of high frequency data in terms of improved performances for the associated volatility targeting strategy is clear. 
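The scaling rule above amounts to a one-line helper (a sketch; the min caps the non-leveraged risky exposure at 100%):

```python
def volatility_target_weight(sigma_target, sigma_forecast):
    # Fraction w of the portfolio invested in risky assets;
    # the remaining 1 - w goes to the risk-free asset
    return min(sigma_target / sigma_forecast, 1.0)
```

For example, targeting 10% volatility while forecasting 20% portfolio volatility allocates 50% to risky assets; when the forecast drops below the target, the portfolio is fully invested.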
Daily volatility targeting for the SPY ETF using open/high/low/close intraday data Figure 2 illustrates my reproduction of Figure 1, using two different volatility forecasting models: 30d HV - the volatility forecast33 produced by the 30-day simple moving average of the square root of the daily squared close-to-close returns 2d JAPR - the volatility forecast33 produced by the 2-day simple moving average of the daily jump-adjusted Parkinson range Figure 2. Daily volatility targeting strategy for misc. volatility proxies, SPY ETF, December 2003 - March 2020. Comparing Figure 1 and Figure 2, it seems that performance metrics differ for buy and hold34, so that an absolute comparison between the different volatility targeting strategies is not possible. Nevertheless, it is visible in Figure 2 that the volatility forecasting model based on the jump-adjusted Parkinson range exhibits a better volatility control and a slightly higher average return than the volatility forecasting model based on squared close-to-close returns. While the relative improvement is not as dramatic as in Figure 135, this empirically demonstrates that using open/high/low/close intraday data can definitely be beneficial to a volatility targeting strategy. Conclusion In this blog post, I showed that simple volatility forecasting models can be useful in practice, especially when paired with range-based volatility estimators like the Parkinson estimator. Next in this series dedicated to volatility forecasting, I will detail the reference model when it comes to volatility forecasting36 - the GARCH model - and as usual, I will add my own twist to it. Meanwhile, feel free to connect with me on LinkedIn or to follow me on Twitter. – The term returns is loosely defined here, but think for example daily close-to-close returns for daily volatility forecasting. ↩ See Andrew J.
Patton, Volatility forecast comparison using imperfect volatility proxies, Journal of Econometrics, Volume 160, Issue 1, 2011, Pages 246-256. ↩ ↩2 ↩3 ↩4 See Boudoukh, J., Richardson, M., & Whitelaw, R.F. (1997). Investigation of a class of volatility estimators, Journal of Derivatives, 4 Spring, 63-71. ↩ ↩2 ↩3 ↩4 ↩5 See Figlewski, S. (1997), Forecasting Volatility. Financial Markets, Institutions & Instruments, 6: 1-88. ↩ ↩2 ↩3 ↩4 Assuming the time period $t$ is of unit length. ↩ See Fischer Black, One Way to Estimate Volatility, Black on Options, Vol 1, No. 8, May 17, 1976. ↩ Indeed, it is extremely difficult to obtain an accurate mean estimate from the data4 and the real problem as far as volatility calculation is concerned is to avoid using extreme sample mean returns that will periodically be produced from short data samples4. ↩ In the statistical sense, c.f. Wikipedia. ↩ See Alizadeh, S., Brandt, M.W. and Diebold, F.X. (2002), Range-Based Estimation of Stochastic Volatility Models. The Journal of Finance, 57: 1047-1091. ↩ That being said, Brunetti and Lildholdt37 showed that the conditions of validity of the Parkinson volatility estimator could be relaxed, allowing the process driving the asset returns to exhibit time-varying volatility and fat tails! ↩ See Richard D.F. Harris, Fatih Yilmaz, Estimation of the conditional variance-covariance matrix of returns using the intraday range, International Journal of Forecasting, Volume 26, Issue 1, 2010, Pages 180-194. ↩ ↩2 Which, by the way, is a stylized fact of volatility38. ↩ By recursive substitution, an estimate of the asset next $h$-period’s ahead volatility $\hat{\sigma}_{T+h}$, $h \geq 2$, is a weighted moving average of that asset past periods variance proxies plus that asset past periods variance forecasts. ↩ See Lazard Asset Management, Predicting Volatility, December 2015. ↩ ↩2 ↩3 See RiskMetrics. Technical Document, J.P.Morgan/Reuters, New York, 1996. Fourth Edition. 
↩ ↩2 ↩3 ↩4 ↩5 ↩6 See Fulvio Corsi, A Simple Approximate Long-Memory Model of Realized Volatility, Journal of Financial Econometrics, Volume 7, Issue 2, Spring 2009, Pages 174-196. ↩ See Brooks, Chris and Persand, Gitanjali (2003) Volatility forecasting for risk management. Journal of Forecasting, 22(1). pp. 1-22. ↩ ↩2 ↩3 ↩4 There are other possible choices for the initial value $\hat{\sigma}_{0}^2$. ↩ Other procedures are described in the RiskMetrics technical document15. ↩ See Axel A. Araneda, Asset volatility forecasting: The optimal decay parameter in the EWMA model, arXiv. ↩ ↩2 ↩3 ↩4 These ETFs are used in the Adaptive Asset Allocation strategy from ReSolve Asset Management, described in the paper Adaptive Asset Allocation: A Primer39. ↩ See Mincer, J. and V. Zarnowitz (1969). The evaluation of economic forecasts. In J. Mincer (Ed.), Economic Forecasts and Expectations. ↩ The common ending price history of all the ETFs is 31 August 2023, but there is no common starting price history, as all ETFs started trading on different dates. ↩ For the exponentially weighted moving average, I used an expanding window for the volatility forecast computation. ↩ In terms of lower $\alpha$, $\beta$ closer to 1 and higher $R^2$. ↩ ↩2 ↩3 ↩4 Personally, I would recommend using the exponentially weighted moving average model with the optimal decay factor $\lambda$ automatically determined. ↩ See Salt Financial, Risk Before Return: Targeting Volatility with Higher Frequency Data, Research Note. ↩ ↩2 ↩3 Salt Financial scales its proposed intraday high-frequency realized volatility measure to account for overnight returns. ↩ See Andersen, T. G. and Bollerslev, T.: 1998, Answering the skeptics: Yes, standard volatility models do provide accurate forecasts, International Economic Review 39, 885-905. ↩ See Harvey, Campbell R. and Hoyle, Edward and Korgaonkar, Russell and Rattray, Sandy and Sargaison, Matthew and van Hemert, Otto, The Impact of Volatility Targeting.
↩ ↩2 Implemented with a one-day lag. ↩ That’s my understanding of the term “historical volatility” in the Salt Financial research note27. ↩ At a 2-day horizon, due to the Salt Financial one-day lag implementation27. ↩ ↩2 Probably because the research note from Salt Financial27 uses excess returns whereas my reproduction uses raw returns. ↩ In Figure 1, the increase in terms of Sharpe ratio is 0.09, while in Figure 2, it is only 0.05. ↩ See Hansen, P.R. and Lunde, A. (2005), A forecast comparison of volatility models: does anything beat a GARCH(1,1)?. J. Appl. Econ., 20: 873-889. ↩ See Brunetti, Celso and Lildholdt, Peter M., Return-Based and Range-Based (Co)Variance Estimation - with an Application to Foreign Exchange Markets (March 2002). ↩ See R. Cont (2001) Empirical properties of asset returns: stylized facts and statistical issues, Quantitative Finance, 1:2, 223-236. ↩ See Butler, Adam and Philbrick, Mike and Gordillo, Rodrigo and Varadi, David, Adaptive Asset Allocation: A Primer. ↩Range-Based Volatility Estimators: Overview and Examples of Usage2023-09-20T00:00:00-05:002023-09-20T00:00:00-05:00https://portfoliooptimizer.io/blog/range-based-volatility-estimators-overview-and-examples-of-usage<p>Volatility estimation and forecasting plays a crucial role in many areas of finance.</p>
<p>For example, standard risk-based portfolio allocation methods (minimum variance, equal risk contributions,
<a href="/blog/hierarchical-risk-parity-introducing-graph-theory-and-machine-learning-in-portfolio-optimizer/">hierarchical risk parity</a>…) critically depend on the ability to build accurate volatility forecasts<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote">1</a></sup>.</p>
<p>Multiple methods for estimating volatility have been proposed over the past several decades, and in this blog post I will focus on range-based volatility estimators.</p>
<p>These estimators, the first of which was introduced by Parkinson<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote">2</a></sup> as a way to compute the <em>true variance of the rate of return of a common stock</em><sup id="fnref:2:1" role="doc-noteref"><a href="#fn:2" class="footnote">2</a></sup>, rely on the highest and lowest prices
of an asset over a given time period to estimate its volatility, hence their name<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote">3</a></sup>.</p>
<p>After describing the four most well known range-based volatility estimators, I will reproduce the analysis of <a href="https://artursepp.com/">Arthur Sepp</a> in his presentation <em>Volatility Modelling and Trading</em><sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote">4</a></sup>
given at the Global Derivatives Conference 2016, and test the predictive power of the naive volatility forecasts produced by these estimators for various ETFs.</p>
<blockquote>
<p><strong><em>Notes:</em></strong><br />
A very accessible series of papers about range-based volatility estimators has recently<sup id="fnref:32" role="doc-noteref"><a href="#fn:32" class="footnote">5</a></sup> been released by people at <a href="https://www.lombardodier.com/home.html">Lombard Odier</a>, c.f. <a href="https://am.lombardodier.com/fr/en/contents/news/investment-viewpoints/2023/february/1148-NA-NA-NA-volatility.html">here</a>, <a href="https://am.lombardodier.com/fr/en/contents/news/investment-viewpoints/2023/may/1148-NA-NA-NA-volatility.html">here</a> and <a href="https://am.lombardodier.com/fr/en/contents/news/investment-viewpoints/2023/august/1148-MARS-PROD-risk-based.html">here</a>.</p>
</blockquote>
<h2 id="mathematical-preliminaries">Mathematical preliminaries</h2>
<h3 id="volatility-modelling">Volatility modelling</h3>
<p>One of the main<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote">6</a></sup> assumptions made when working with range-based volatility estimators<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote">7</a></sup> is that the price movements $S_t$ of the asset under consideration follow a
<a href="https://en.wikipedia.org/wiki/Geometric_Brownian_motion">geometric Brownian motion</a> with unknown volatility coefficient<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote">8</a></sup> $\sigma$ and unknown drift coefficient $\mu$, that is</p>
\[d S_t = \mu S_t dt + \sigma S_t dW_t\]
<p>, where $W_t$ is a standard Brownian motion.</p>
<p>Under this working assumption, $\sigma$ represents the volatility of the asset.</p>
<h3 id="volatility-and-variance-estimators">Volatility and variance estimators</h3>
<p>Although anyone can empirically observe the impact of “volatility” on the prices of a given asset, the volatility coefficient $\sigma$ of this asset is not directly observable<sup id="fnref:36" role="doc-noteref"><a href="#fn:36" class="footnote">9</a></sup> and must be estimated
using stock market information.</p>
<p>A <a href="https://en.wikipedia.org/wiki/Estimator">statistical estimator</a> of $\sigma$ is then called a <em>volatility estimator</em>, and a statistical estimator of $\sigma^2$ is called a <em>variance estimator</em>.</p>
<h3 id="efficiency-of-a-volatility-estimator">Efficiency of a volatility estimator</h3>
<p>In order to determine the quality of a volatility estimator, two measures are commonly used:</p>
<ul>
<li>
<p><a href="https://en.wikipedia.org/wiki/Bias_(statistics)">Bias</a></p>
<p>The bias of a volatility estimator measures whether this estimator produces, on average, too high or too low volatility estimates.</p>
<p>More formally, a volatility estimator $\sigma_A$ is said to be unbiased when $\mathbb{E}[\sigma_A] = \sigma$ and biased otherwise.</p>
</li>
<li>
<p><a href="https://en.wikipedia.org/wiki/Efficiency_(statistics)">Efficiency</a></p>
<p>The efficiency of a volatility estimator measures the uncertainty of the volatility estimates produced by this estimator: the greater the efficiency, the more accurate the volatility estimates.</p>
<p>More formally, the relative efficiency $Eff \left( \sigma_A \right)$ of a volatility estimator $\sigma_A$ compared to a reference volatility estimator $\sigma_B$ is defined as the ratio
of the variance of $\sigma_B^2$ to the variance of $\sigma_A^2$, that is,</p>
\[Eff \left( \sigma_A \right) = \frac{Var \left( \sigma_B^2 \right)}{Var \left( \sigma_A^2 \right)}\]
</li>
</ul>
<p>It should be noted that bias and efficiency are sometimes conflicting objectives, which is more generally known in statistics as <a href="https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff">the bias-variance tradeoff</a>.</p>
<h3 id="close-to-close-volatility-estimators">Close-to-close volatility estimators</h3>
<p>Let $C_1,…,C_T$ be the closing prices of an asset for $T$ time periods $t=1..T$<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote">10</a></sup>.</p>
<p>Then,</p>
\[\sigma_{cc,0} \left( T \right) = \sqrt{ \frac{1}{T-1} \sum_{i=2}^T \left( \ln \frac{C_i}{C_{i-1}} \right)^2 }\]
<p>is a biased<sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote">11</a></sup> estimator of the asset volatility $\sigma$ over the $T$ time periods, assuming zero drift (i.e., $\mu = 0$), c.f. Parkinson<sup id="fnref:2:2" role="doc-noteref"><a href="#fn:2" class="footnote">2</a></sup>.</p>
<p>In addition,</p>
\[\sigma_{cc} \left( T \right) = \sqrt{ \frac{1}{T-2} \sum_{i=2}^T \left( \ln \frac{C_i}{C_{i-1}} - \mu_{cc} \right)^2 }\]
<p>, with $\mu_{cc} = \frac{1}{T-1} \sum_{i=2}^T \ln \frac{C_i}{C_{i-1}} $, is a biased<sup id="fnref:11:1" role="doc-noteref"><a href="#fn:11" class="footnote">11</a></sup> estimator of the asset volatility $\sigma$ over the $T$ time periods, assuming non-zero drift (i.e., $\mu \ne 0$), c.f. Yang and Zhang<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote">12</a></sup>.</p>
<p>These two estimators are known as <em>close-to-close volatility estimators</em>.</p>
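<p>For concreteness, these two close-to-close estimators can be sketched in a few lines of Python with NumPy; the function name and interface below are mine, for illustration only:</p>

```python
import numpy as np

def close_to_close_vol(closes, zero_drift=False):
    """Close-to-close volatility estimate over the T time periods covered by `closes`."""
    # T closing prices give T-1 log returns
    r = np.diff(np.log(np.asarray(closes, dtype=float)))
    if zero_drift:
        # sigma_cc,0: no drift adjustment, division by T-1
        return np.sqrt(np.sum(r ** 2) / len(r))
    # sigma_cc: subtract the sample mean return, division by T-2
    return np.sqrt(np.sum((r - r.mean()) ** 2) / (len(r) - 1))
```

<p>For example, closing prices growing at a constant logarithmic rate have a zero drift-adjusted volatility estimate, since all their log returns are equal.</p>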
<h2 id="range-based-volatility-estimators">Range-based volatility estimators</h2>
<p>Let be:</p>
<ul>
<li>$t=1..T$, $T$ time periods<sup id="fnref:10:1" role="doc-noteref"><a href="#fn:10" class="footnote">10</a></sup></li>
<li>$\left( O_1,H_1,L_1,C_1 \right), …, \left( O_T,H_T,L_T,C_T \right)$, the opening, highest, lowest and closing prices of an asset for time periods $t=1..T$</li>
</ul>
<p>As mentioned in the introduction, a volatility estimator fully or partially relying on the highest prices $H_t, t=1..T$ and on the lowest prices $L_t, t=1..T$ is called a <em>range-based volatility estimator</em>.</p>
<p>The underlying idea behind such estimators is that the information contained in the asset high-low price ranges $H_t - L_t, t=1..T$ should make it possible to build volatility estimators that are more efficient
than the close-to-close volatility estimators, which use only one price inside this range<sup id="fnref:17" role="doc-noteref"><a href="#fn:17" class="footnote">13</a></sup>.</p>
<p>This quest for efficiency is important because, contrary to one of the working assumptions<sup id="fnref:8:1" role="doc-noteref"><a href="#fn:8" class="footnote">6</a></sup>, the volatility of an asset is known to be time-varying<sup id="fnref:20" role="doc-noteref"><a href="#fn:20" class="footnote">14</a></sup>, so that the fewer the time periods required
to estimate its volatility, the better the chances that the volatility is constant(ish) over the time periods under consideration.</p>
<p>As Rogers et al.<sup id="fnref:16" role="doc-noteref"><a href="#fn:16" class="footnote">15</a></sup> put it:</p>
<blockquote>
<p>[…] volatility may change over long periods of time; a highly efficient procedure will allow researchers to estimate volatility with a small number of observations.</p>
</blockquote>
<h3 id="parkinson-volatility-estimator">Parkinson volatility estimator</h3>
<p>Parkinson<sup id="fnref:2:3" role="doc-noteref"><a href="#fn:2" class="footnote">2</a></sup> introduces an estimator for the diffusion coefficient of a Brownian motion without drift that relies on the highest and lowest observed values of this Brownian motion over a given time period.</p>
<p>When applied to the estimation of an asset volatility, this gives the <em>Parkinson volatility estimator</em> $\sigma_{P} \left( T \right)$ defined over $T$ time periods by</p>
\[\sigma_{P} \left( T \right) = \sqrt{\frac{1}{T}} \sqrt{\frac{1}{4 \ln 2} \sum_{i=1}^T \left( \ln \frac{H_i}{L_i} \right) ^2}\]
<p>Intuitively, the Parkinson estimator should be “better” than the close-to-close estimators because large price movements impacting the high-low price range $H_t - L_t$ but leaving the closing price
$C_t$ unchanged might occur within any time period $t$.</p>
<p>This is confirmed by the efficiency of this estimator, up to 5.2 times higher than the efficiency of the close-to-close estimators<sup id="fnref:14" role="doc-noteref"><a href="#fn:14" class="footnote">16</a></sup>.</p>
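<p>A minimal Python sketch of the Parkinson estimator, following the formula above (the function name is mine):</p>

```python
import numpy as np

def parkinson_vol(highs, lows):
    """Parkinson volatility estimate over the T time periods covered by the inputs."""
    hl = np.log(np.asarray(highs, dtype=float) / np.asarray(lows, dtype=float))
    return np.sqrt(np.sum(hl ** 2) / (4.0 * np.log(2.0) * len(hl)))
```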
<h3 id="garman-klass-volatility-estimator">Garman-Klass volatility estimator</h3>
<p>Garman and Klass<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote">17</a></sup> propose to improve the Parkinson estimator by taking into account the opening prices $O_t, t=1..T$ and the closing prices $C_t, t=1..T$.</p>
<p>This leads to the <em>Garman-Klass volatility estimator</em> $\sigma_{GK} \left( T \right)$, defined over $T$ time periods by</p>
\[\sigma_{GK} \left( T \right) = \sqrt{\frac{1}{T}} \sqrt{ \sum_{i=1}^T \frac{1}{2} \left( \ln\frac{H_i}{L_i} \right) ^2 - \left( 2 \ln2 - 1 \right) \left( \ln\frac{C_i}{O_i} \right )^2 }\]
<p>As a historical note, Garman and Klass<sup id="fnref:6:1" role="doc-noteref"><a href="#fn:6" class="footnote">17</a></sup> establish in their paper that $\sigma_{GK}$ is the “best reasonable”<sup id="fnref:18" role="doc-noteref"><a href="#fn:18" class="footnote">18</a></sup> volatility estimator that depends only on the high-open price range $H_t - O_t$,
the low-open price range $L_t - O_t$ and the close-open price range $C_t - O_t$, $t=1..T$.</p>
<p>The Garman-Klass estimator is up to 7.4 times more efficient than the close-to-close estimators<sup id="fnref:14:1" role="doc-noteref"><a href="#fn:14" class="footnote">16</a></sup>.</p>
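<p>Under the same zero-drift assumption, a possible Python sketch of the Garman-Klass estimator is (naming mine):</p>

```python
import numpy as np

def garman_klass_vol(opens, highs, lows, closes):
    """Garman-Klass volatility estimate over the T time periods covered by the inputs."""
    o, h, l, c = (np.asarray(x, dtype=float) for x in (opens, highs, lows, closes))
    hl = np.log(h / l)  # high-low log range
    co = np.log(c / o)  # close-open log return
    per_period = 0.5 * hl ** 2 - (2.0 * np.log(2.0) - 1.0) * co ** 2
    return np.sqrt(np.sum(per_period) / len(o))
```

<p>Note that when the closing price equals the opening price, the estimate reduces to a pure high-low range term, as in the Parkinson estimator but with a different constant.</p>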
<h3 id="rogers-satchell-volatility-estimator">Rogers-Satchell volatility estimator</h3>
<p>The Parkinson and the Garman-Klass estimators have both been derived under a zero drift assumption.</p>
<p>When this assumption is not verified for an asset, for example because of a strong upward or downward trend in the asset prices or because of the usage of large time periods (monthly, yearly…),
these estimators should in theory not be used because the quality of their volatility estimates is negatively impacted by the presence of a non-zero drift<sup id="fnref:15" role="doc-noteref"><a href="#fn:15" class="footnote">19</a></sup><sup id="fnref:16:1" role="doc-noteref"><a href="#fn:16" class="footnote">15</a></sup>.</p>
<p>In order to solve this problem, Rogers and Satchell<sup id="fnref:15:1" role="doc-noteref"><a href="#fn:15" class="footnote">19</a></sup> devise the <em>Rogers-Satchell volatility estimator</em> $\sigma_{RS} \left( T \right)$, defined over $T$ time periods by</p>
\[\sigma_{RS} \left( T \right) = \sqrt{\frac{1}{T}} \sqrt{ \sum_{i=1}^T \ln\frac{H_i}{C_i} \ln\frac{H_i}{O_i} + \ln\frac{L_i}{C_i} \ln\frac{L_i}{O_i} }\]
<p>The Rogers-Satchell estimator is up to 6 times more efficient than the close-to-close estimators<sup id="fnref:15:2" role="doc-noteref"><a href="#fn:15" class="footnote">19</a></sup>, which is less than the Garman-Klass estimator<sup id="fnref:21" role="doc-noteref"><a href="#fn:21" class="footnote">20</a></sup>.</p>
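<p>A Python sketch of the Rogers-Satchell estimator (the function name is mine): each per-period term sums $\ln\frac{H_i}{C_i} \ln\frac{H_i}{O_i}$ and $\ln\frac{L_i}{C_i} \ln\frac{L_i}{O_i}$, both of which are non-negative whenever the high and low prices bracket the opening and closing prices.</p>

```python
import numpy as np

def rogers_satchell_vol(opens, highs, lows, closes):
    """Rogers-Satchell volatility estimate over the T time periods covered by the inputs."""
    o, h, l, c = (np.asarray(x, dtype=float) for x in (opens, highs, lows, closes))
    per_period = np.log(h / c) * np.log(h / o) + np.log(l / c) * np.log(l / o)
    return np.sqrt(np.sum(per_period) / len(o))
```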
<h3 id="yang-zhang-volatility-estimator">Yang-Zhang volatility estimator</h3>
<p>The range-based volatility estimators discussed so far do not take into account opening jumps in an asset’s prices<sup id="fnref:25" role="doc-noteref"><a href="#fn:25" class="footnote">21</a></sup>, that is, the potential difference between an asset’s opening price $O_t$
and its closing price $C_{t-1}$ for a time period $t$<sup id="fnref:22" role="doc-noteref"><a href="#fn:22" class="footnote">22</a></sup>.</p>
<p>This limitation causes a systematic underestimation of the true volatility<sup id="fnref:9:1" role="doc-noteref"><a href="#fn:9" class="footnote">12</a></sup>.</p>
<p>When trying to integrate opening jumps into the Parkinson, the Garman-Klass and the Rogers-Satchell estimators, Yang and Zhang<sup id="fnref:9:2" role="doc-noteref"><a href="#fn:9" class="footnote">12</a></sup> discover that it is unfortunately not possible for any “reasonable”
<em>single-period</em><sup id="fnref:23" role="doc-noteref"><a href="#fn:23" class="footnote">23</a></sup> volatility estimator to properly handle both a non-zero drift and opening jumps.</p>
<p>This leads them to introduce the <em>multi-period</em><sup id="fnref:23:1" role="doc-noteref"><a href="#fn:23" class="footnote">23</a></sup> <em>Yang-Zhang volatility estimator</em> $\sigma_{YZ} \left( T \right)$, defined over $T$ time periods by</p>
\[\sigma_{YZ} \left( T \right) = \sqrt{ \sigma_{co}^2 + k \sigma_{oc}^2 + \left( 1-k \right) \sigma_{RS}^2 }\]
<p>, where:</p>
<ul>
<li>
<p>$\sigma_{co} \left( T \right)$ is the close-to-open volatility, defined as</p>
\[\sigma_{co} \left( T \right) = \sqrt{\frac{1}{T-2} \sum_{i=2}^T \left( \ln \frac{O_i}{C_{i-1}} - \mu_{co} \right)^2}\]
<p>, with $\mu_{co} = \frac{1}{T-1} \sum_{i=2}^T \ln \frac{O_i}{C_{i-1}}$</p>
</li>
<li>
<p>$\sigma_{oc} \left( T \right)$ is the open-to-close volatility, defined as</p>
\[\sigma_{oc} \left( T \right) = \sqrt{\frac{1}{T-2} \sum_{i=2}^T \left( \ln \frac{C_i}{O_{i}} - \mu_{oc} \right)^2}\]
<p>, with $\mu_{oc} = \frac{1}{T-1} \sum_{i=2}^T \ln \frac{C_i}{O_{i}}$</p>
</li>
<li>
<p>$\sigma_{RS}$ is the Rogers-Satchell volatility estimator over the time periods $t=2..T$</p>
</li>
<li>
<p>$k = \frac{0.34}{1.34 + \frac{T}{T-2}}$</p>
</li>
</ul>
<p>In addition to the new estimator $\sigma_{YZ}$, Yang and Zhang<sup id="fnref:9:3" role="doc-noteref"><a href="#fn:9" class="footnote">12</a></sup> also provide multi-period versions of the Parkinson, the Garman-Klass and the Rogers-Satchell estimators that support opening jumps<sup id="fnref:24" role="doc-noteref"><a href="#fn:24" class="footnote">24</a></sup>.</p>
<p>The Yang-Zhang estimator is up to 14 times more efficient than the close-to-close estimators<sup id="fnref:9:4" role="doc-noteref"><a href="#fn:9" class="footnote">12</a></sup>, a result that Yang and Zhang<sup id="fnref:9:5" role="doc-noteref"><a href="#fn:9" class="footnote">12</a></sup> comment as follows</p>
<blockquote>
<p>The improvement of accuracy over the classical close-to-close estimator is dramatic for real-life time series</p>
</blockquote>
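<p>Putting the pieces together, here is a hedged Python sketch of the Yang-Zhang estimator as defined above (the function name and interface are mine, for illustration only):</p>

```python
import numpy as np

def yang_zhang_vol(opens, highs, lows, closes):
    """Yang-Zhang volatility estimate over the T time periods covered by the inputs."""
    o, h, l, c = (np.asarray(x, dtype=float) for x in (opens, highs, lows, closes))
    T = len(o)
    co = np.log(o[1:] / c[:-1])  # overnight (close-to-open) log returns, t = 2..T
    oc = np.log(c[1:] / o[1:])   # intraday (open-to-close) log returns, t = 2..T
    sigma2_co = np.sum((co - co.mean()) ** 2) / (T - 2)
    sigma2_oc = np.sum((oc - oc.mean()) ** 2) / (T - 2)
    # Rogers-Satchell variance over the time periods t = 2..T
    rs = (np.log(h[1:] / c[1:]) * np.log(h[1:] / o[1:])
          + np.log(l[1:] / c[1:]) * np.log(l[1:] / o[1:]))
    sigma2_rs = np.sum(rs) / (T - 1)
    k = 0.34 / (1.34 + T / (T - 2))
    return np.sqrt(sigma2_co + k * sigma2_oc + (1.0 - k) * sigma2_rs)
```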
<h3 id="other-estimators">Other estimators</h3>
<p>The family of range-based volatility estimators has many other members:</p>
<ul>
<li>The <em>Kunitomo<sup id="fnref:37" role="doc-noteref"><a href="#fn:37" class="footnote">25</a></sup> volatility estimator</em></li>
<li>The <em>Alizadeh-Brandt-Diebold<sup id="fnref:30" role="doc-noteref"><a href="#fn:30" class="footnote">26</a></sup> volatility estimator</em></li>
<li>The <em>Meilijson<sup id="fnref:26" role="doc-noteref"><a href="#fn:26" class="footnote">27</a></sup> volatility estimator</em></li>
<li>…</li>
</ul>
<p>Still, the Parkinson, the Garman-Klass, the Rogers-Satchell and the Yang-Zhang volatility estimators are representative of this family, so that I will not detail any other range-based volatility estimator in this blog post.</p>
<h2 id="from-volatility-estimation-to-volatility-forecasting">From volatility estimation to volatility forecasting</h2>
<p>Range-based volatility estimators are <em>based on the assumption of independent sample and observations within the sample</em><sup id="fnref:4:1" role="doc-noteref"><a href="#fn:4" class="footnote">4</a></sup>, so that the corresponding volatility forecasts are simply naive
forecasts under a random walk model.</p>
<p>In other words, with such volatility estimators, the “natural” forecast of an asset volatility over the next $T$ time periods is the (past) estimate of the asset volatility over the last $T$ time periods.</p>
<p>That being said, it is perfectly possible to use range-based volatility estimates together with any volatility forecasting model such as:</p>
<ul>
<li>A time series forecasting model (simple moving average, exponentially weighted moving average…), as detailed for example in Jacob and Vipul<sup id="fnref:28" role="doc-noteref"><a href="#fn:28" class="footnote">28</a></sup></li>
<li>An econometric forecasting model (<a href="https://en.wikipedia.org/wiki/Autoregressive_conditional_heteroskedasticity#GARCH">GARCH</a> model…), c.f. Mapa<sup id="fnref:35" role="doc-noteref"><a href="#fn:35" class="footnote">29</a></sup></li>
<li>A specific range-based forecasting model (Chou’s<sup id="fnref:38" role="doc-noteref"><a href="#fn:38" class="footnote">30</a></sup> Conditional AutoRegressive Range model, Harris and Yilmaz’s<sup id="fnref:39" role="doc-noteref"><a href="#fn:39" class="footnote">31</a></sup> hybrid multivariate exponentially weighted moving average model…)</li>
</ul>
<h2 id="performance-of-range-based-volatility-estimators">Performance of range-based volatility estimators</h2>
<p>Theoretical and practical performances of range-based volatility estimators are studied in several papers, for example Shu and Zhang<sup id="fnref:27" role="doc-noteref"><a href="#fn:27" class="footnote">32</a></sup>, Jacob and Vipul<sup id="fnref:28:1" role="doc-noteref"><a href="#fn:28" class="footnote">28</a></sup> and Brandt and Kinlay<sup id="fnref:29" role="doc-noteref"><a href="#fn:29" class="footnote">33</a></sup>, among others.</p>
<p>Most of these studies agree that range-based volatility estimators are biased<sup id="fnref:11:2" role="doc-noteref"><a href="#fn:11" class="footnote">11</a></sup>, but other conclusions differ depending on the exact methodology used.</p>
<p>In particular, as highlighted by Brandt and Kinlay<sup id="fnref:29:1" role="doc-noteref"><a href="#fn:29" class="footnote">33</a></sup>, <em>the results from empirical research differ significantly from those seen in simulation studies in a number of respects</em><sup id="fnref:29:2" role="doc-noteref"><a href="#fn:29" class="footnote">33</a></sup>.</p>
<p>One perfect example of these differences is Shu and Zhang<sup id="fnref:27:1" role="doc-noteref"><a href="#fn:27" class="footnote">32</a></sup> concluding, using a Monte Carlo simulation, that</p>
<blockquote>
<p>If the drift term is large, the Parkinson estimator and the [Garman-Klass] estimator will significantly overestimate the true variance […]</p>
</blockquote>
<p>, while Jacob and Vipul<sup id="fnref:28:2" role="doc-noteref"><a href="#fn:28" class="footnote">28</a></sup> concluding, using real stock market data, that</p>
<blockquote>
<p>Overall, the [Garman-Klass] estimator, which indirectly adjusts for the drift, performs better for the high-drift stocks.</p>
</blockquote>
<p>Motivated by such inconsistencies, Lyocsa et al.<sup id="fnref:42" role="doc-noteref"><a href="#fn:42" class="footnote">34</a></sup>, building on Patton and Sheppard<sup id="fnref:41" role="doc-noteref"><a href="#fn:41" class="footnote">35</a></sup>, introduced what I will call the <em>Lyocsa-Plihal-Vyrost volatility estimator</em> $\sigma_{LPV}$, defined as the arithmetic average
of the Parkinson, the Garman-Klass and the Rogers-Satchell volatility estimators<sup id="fnref:43" role="doc-noteref"><a href="#fn:43" class="footnote">36</a></sup></p>
\[\sigma_{LPV} = \frac{\sigma_{P} + \sigma_{GK} + \sigma_{RS}}{3}\]
<p>As Lyocsa et al.<sup id="fnref:42:1" role="doc-noteref"><a href="#fn:42" class="footnote">34</a></sup> explain, <em>the motivation behind using the (naive) equally weighted average is based on the assumption that we have no prior information on which estimator might be more accurate</em><sup id="fnref:42:2" role="doc-noteref"><a href="#fn:42" class="footnote">34</a></sup>.</p>
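<p>Given implementations of the three underlying estimators, the averaged estimator is a one-liner; below is a self-contained Python sketch (the naming is mine) that inlines all three:</p>

```python
import numpy as np

def lpv_vol(opens, highs, lows, closes):
    """Equally weighted average of the Parkinson, Garman-Klass and
    Rogers-Satchell volatility estimates over T time periods."""
    o, h, l, c = (np.asarray(x, dtype=float) for x in (opens, highs, lows, closes))
    T = len(o)
    hl, co = np.log(h / l), np.log(c / o)
    p = np.sqrt(np.sum(hl ** 2) / (4.0 * np.log(2.0) * T))
    gk = np.sqrt(np.sum(0.5 * hl ** 2 - (2.0 * np.log(2.0) - 1.0) * co ** 2) / T)
    rs = np.sqrt(np.sum(np.log(h / c) * np.log(h / o)
                        + np.log(l / c) * np.log(l / o)) / T)
    return (p + gk + rs) / 3.0
```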
<p>I personally like the idea of an averaged estimator, but at this point, I think it is safe to highlight that there is no “best” range-based volatility estimator…</p>
<h2 id="implementation-in-portfolio-optimizer">Implementation in <strong>Portfolio Optimizer</strong></h2>
<p><strong>Portfolio Optimizer</strong> implements all the volatility estimators discussed in this blog post:</p>
<ul>
<li>The close-to-close volatility estimators, through the endpoint <a href="https://docs.portfoliooptimizer.io/"><code class="language-plaintext highlighter-rouge">/assets/volatility/estimation/close-to-close</code></a></li>
<li>The Parkinson volatility estimator, through the endpoint <a href="https://docs.portfoliooptimizer.io/"><code class="language-plaintext highlighter-rouge">/assets/volatility/estimation/parkinson</code></a></li>
<li>The Garman-Klass volatility estimator, through the endpoint <a href="https://docs.portfoliooptimizer.io/"><code class="language-plaintext highlighter-rouge">/assets/volatility/estimation/garman-klass</code></a></li>
<li>The original Garman-Klass volatility estimator<sup id="fnref:18:1" role="doc-noteref"><a href="#fn:18" class="footnote">18</a></sup>, through the endpoint <a href="https://docs.portfoliooptimizer.io/"><code class="language-plaintext highlighter-rouge">/assets/volatility/estimation/garman-klass/original</code></a></li>
<li>The Rogers-Satchell volatility estimator, through the endpoint <a href="https://docs.portfoliooptimizer.io/"><code class="language-plaintext highlighter-rouge">/assets/volatility/estimation/rogers-satchell</code></a></li>
<li>The Yang-Zhang volatility estimator, through the endpoint <a href="https://docs.portfoliooptimizer.io/"><code class="language-plaintext highlighter-rouge">/assets/volatility/estimation/yang-zhang</code></a></li>
</ul>
<p>, as well as their jump-adjusted variations, whenever applicable.</p>
<h2 id="examples-of-usage">Examples of usage</h2>
<p>To illustrate possible uses of range-based volatility estimators, I propose to reproduce a couple of results from Sepp<sup id="fnref:4:2" role="doc-noteref"><a href="#fn:4" class="footnote">4</a></sup>:</p>
<ul>
<li>The estimation and the forecast of the SPY ETF monthly volatility</li>
<li>The forecast of the monthly volatility of misc. ETFs representative of different asset classes (U.S. treasuries, international stock market, gold…)</li>
</ul>
<p>Such examples will make it possible to compare the empirical behavior of the different volatility estimators and maybe reach a conclusion as to their relative performance in this specific setting.</p>
<h3 id="estimating-spy-etf-volatility">Estimating SPY ETF volatility</h3>
<p>I will estimate the SPY ETF monthly volatility using all the daily open/high/low/close prices<sup id="fnref:33" role="doc-noteref"><a href="#fn:33" class="footnote">37</a></sup> observed during that month<sup id="fnref:40" role="doc-noteref"><a href="#fn:40" class="footnote">38</a></sup>.</p>
<p>Figure 1, limited to 5 volatility estimators for readability purposes, illustrates the results obtained over the period 31 January 2005 - 29 February 2016<sup id="fnref:34" role="doc-noteref"><a href="#fn:34" class="footnote">39</a></sup>.</p>
<figure>
<a href="/assets/images/blog/range-based-volatility-estimators-spy-volatility.png"><img src="/assets/images/blog/range-based-volatility-estimators-spy-volatility.png" alt="SPY ETF monthly volatility estimates, using daily returns over the period 31 January 2005 - 29 February 2016." /></a>
<figcaption>Figure 1. SPY ETF monthly volatility estimates, using daily returns over the period 31 January 2005 - 29 February 2016.</figcaption>
</figure>
<p>Figure 1 is mostly identical to the figure on slide 22 from Sepp<sup id="fnref:4:3" role="doc-noteref"><a href="#fn:4" class="footnote">4</a></sup>, from which it appears in particular that the close-to-close and the Yang-Zhang volatility estimators
<em>provide higher estimates of volatility when the overall level of volatility is high</em><sup id="fnref:4:4" role="doc-noteref"><a href="#fn:4" class="footnote">4</a></sup>.</p>
<p>Overall, though, the behavior of the different volatility estimators is essentially the same on this specific example, which is confirmed by their correlations displayed in Figure 2.</p>
<figure>
<a href="/assets/images/blog/range-based-volatility-estimators-spy-estimates-correlations.png"><img src="/assets/images/blog/range-based-volatility-estimators-spy-estimates-correlations.png" alt="Correlations of SPY ETF monthly volatility estimates, using daily returns over the period 31 January 2005 - 29 February 2016." /></a>
<figcaption>Figure 2. Correlations of SPY ETF monthly volatility estimates, using daily returns over the period 31 January 2005 - 29 February 2016.</figcaption>
</figure>
<h3 id="forecasting-misc-etfs-volatility">Forecasting misc. ETFs volatility</h3>
<p>Using the same methodology as in Sepp<sup id="fnref:4:5" role="doc-noteref"><a href="#fn:4" class="footnote">4</a></sup>, I will now evaluate the quality of the naive forecasts produced by all the range-based volatility estimators implemented in <strong>Portfolio Optimizer</strong> against
the next month’s close-to-close observed volatility<sup id="fnref:44" role="doc-noteref"><a href="#fn:44" class="footnote">40</a></sup>, for 10 ETFs representative of misc. asset classes:</p>
<ul>
<li>U.S. stocks (SPY ETF)</li>
<li>European stocks (EZU ETF)</li>
<li>Japanese stocks (EWJ ETF)</li>
<li>Emerging markets stocks (EEM ETF)</li>
<li>U.S. REITs (VNQ ETF)</li>
<li>International REITs (RWX ETF)</li>
<li>U.S. 7-10 year Treasuries (IEF ETF)</li>
<li>U.S. 20+ year Treasuries (TLT ETF)</li>
<li>Commodities (DBC ETF)</li>
<li>Gold (GLD ETF)</li>
</ul>
<p>These ETFs are used in the <em>Adaptive Asset Allocation</em> strategy from <a href="https://investresolve.com/">ReSolve Asset Management</a>, described in the paper <em>Adaptive Asset Allocation: A Primer</em><sup id="fnref:31" role="doc-noteref"><a href="#fn:31" class="footnote">41</a></sup>.</p>
<p>For each ETF, Sepp’s methodology is as follows:</p>
<ul>
<li>
<p>At each month’s end, compute the volatility estimates $\sigma_{cc, t}$, $\sigma_{P, t}$, … using all the ETF daily open/high/low/close prices<sup id="fnref:33:1" role="doc-noteref"><a href="#fn:33" class="footnote">37</a></sup> observed during that month<sup id="fnref:40:1" role="doc-noteref"><a href="#fn:40" class="footnote">38</a></sup></p>
<p>Under a random walk volatility model, each of these estimates represents the next month’s volatility forecast $\hat{\sigma}_{t+1}$</p>
</li>
<li>
<p>At each month’s end, also compute the next month’s close-to-close volatility estimate $\sigma_{cc, t+1}$ using all the ETF daily close prices<sup id="fnref:33:2" role="doc-noteref"><a href="#fn:33" class="footnote">37</a></sup> observed during that month<sup id="fnref:40:2" role="doc-noteref"><a href="#fn:40" class="footnote">38</a></sup></p>
<p>This estimate is the volatility benchmark, which represents how the ETF “volatility” is perceived by an investor monitoring her portfolio daily.</p>
</li>
<li>
<p>Once all months have been processed that way, regress the volatility forecasts on the volatility benchmarks by applying the Mincer-Zarnowitz<sup id="fnref:49" role="doc-noteref"><a href="#fn:49" class="footnote">42</a></sup> regression model:</p>
\[\hat{\sigma}_{t+1} = \alpha + \beta \sigma_{cc, t+1} + \epsilon_{t+1}\]
<p>, where $\epsilon_{t+1}$ is an error term.</p>
</li>
</ul>
<p>Then, <em>the estimator producing [the best] volatility forecast is indicated by [a] high explanatory power $R^2$, [a] small intercept $\alpha$ and [a] $\beta$ coefficient close to one</em><sup id="fnref:4:6" role="doc-noteref"><a href="#fn:4" class="footnote">4</a></sup>.</p>
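<p>This regression step can be sketched in Python with a plain least-squares fit (the function name is mine; a dedicated statistics package would also work):</p>

```python
import numpy as np

def mincer_zarnowitz(forecasts, realized):
    """OLS fit of forecast = alpha + beta * realized + eps; returns (alpha, beta, R^2)."""
    y = np.asarray(forecasts, dtype=float)
    x = np.asarray(realized, dtype=float)
    X = np.column_stack([np.ones(len(x)), x])
    (alpha, beta), *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = alpha + beta * x
    r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
    return alpha, beta, r2
```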
<h4 id="forecasting-spy-etf-volatility">Forecasting SPY ETF volatility</h4>
<p>In the case of the SPY ETF, Figure 3 illustrates Sepp’s methodology for the Lyocsa-Plihal-Vyrost volatility estimator $\sigma_{LPV}$ over the period 31 January 2005 - 29 February 2016.</p>
<figure>
<a href="/assets/images/blog/range-based-volatility-estimators-spy-lpv-regression.png"><img src="/assets/images/blog/range-based-volatility-estimators-spy-lpv-regression.png" alt="" /></a>
<figcaption>Figure 3. SPY ETF Lyocsa-Plihal-Vyrost naive monthly volatility forecasts v.s. next month's close-to-close volatility estimates, using daily returns over the period 31 January 2005 - 29 February 2016.</figcaption>
</figure>
<p>Detailed results for all regression models over the period 31 January 2005 - 29 February 2016:</p>
<table>
<thead>
<tr>
<th>Volatility estimator</th>
<th>$\alpha$</th>
<th>$\beta$</th>
<th>$R^2$</th>
</tr>
</thead>
<tbody>
<tr>
<td>Close-to-close</td>
<td>4.1%</td>
<td>0.75</td>
<td>57%</td>
</tr>
<tr>
<td>Close-to-close (zero drift)</td>
<td>3.9%</td>
<td>0.77</td>
<td>57%</td>
</tr>
<tr>
<td>Parkinson</td>
<td>3.5%</td>
<td>0.95</td>
<td>58%</td>
</tr>
<tr>
<td>Parkinson (jump-adjusted)</td>
<td>3.4%</td>
<td>0.79</td>
<td>58%</td>
</tr>
<tr>
<td>Garman-Klass</td>
<td>3.7%</td>
<td>0.92</td>
<td>57%</td>
</tr>
<tr>
<td>Garman-Klass (jump-adjusted)</td>
<td>3.6%</td>
<td>0.77</td>
<td>58%</td>
</tr>
<tr>
<td>Garman-Klass (original)</td>
<td>3.7%</td>
<td>0.92</td>
<td>57%</td>
</tr>
<tr>
<td>Garman-Klass (original, jump-adjusted)</td>
<td>3.6%</td>
<td>0.77</td>
<td>58%</td>
</tr>
<tr>
<td>Rogers-Satchell</td>
<td>4.0%</td>
<td>0.88</td>
<td>56%</td>
</tr>
<tr>
<td>Rogers-Satchell (jump-adjusted)</td>
<td>3.9%</td>
<td>0.74</td>
<td>57%</td>
</tr>
<tr>
<td>Yang-Zhang</td>
<td>3.8%</td>
<td>0.75</td>
<td>58%</td>
</tr>
<tr>
<td>Lyocsa-Plihal-Vyrost</td>
<td>3.7%</td>
<td>0.92</td>
<td>57%</td>
</tr>
</tbody>
</table>
<p>While these figures are far<sup id="fnref:47" role="doc-noteref"><a href="#fn:47" class="footnote">43</a></sup> from those on slide 42 from Sepp<sup id="fnref:4:7" role="doc-noteref"><a href="#fn:4" class="footnote">4</a></sup>, with for example nearly no variation in terms of $R^2$ among the different volatility estimators, two observations are similar:</p>
<ul>
<li>All volatility estimators have comparable $\alpha$</li>
<li>The Parkinson, the Garman-Klass and the Rogers-Satchell volatility estimators have a $\beta$ much closer to 1 than the close-to-close volatility estimator</li>
</ul>
<h4 id="forecasting-the-other-etfs-volatility">Forecasting the other ETFs volatility</h4>
<p>Going beyond the SPY ETF, averaged results for all ETFs/regression models over each ETF price history<sup id="fnref:46" role="doc-noteref"><a href="#fn:46" class="footnote">44</a></sup> are the following:</p>
<table>
<thead>
<tr>
<th>Volatility estimator</th>
<th>$\bar{\alpha}$</th>
<th>$\bar{\beta}$</th>
<th>$\bar{R^2}$</th>
</tr>
</thead>
<tbody>
<tr>
<td>Close-to-close</td>
<td>5.8%</td>
<td>0.66</td>
<td>44%</td>
</tr>
<tr>
<td>Close-to-close (zero drift)</td>
<td>5.6%</td>
<td>0.67</td>
<td>45%</td>
</tr>
<tr>
<td>Parkinson</td>
<td>5.6%</td>
<td>0.94</td>
<td>44%</td>
</tr>
<tr>
<td>Parkinson (jump-adjusted)</td>
<td>4.9%</td>
<td>0.70</td>
<td>45%</td>
</tr>
<tr>
<td>Garman-Klass</td>
<td>5.7%</td>
<td>0.93</td>
<td>43%</td>
</tr>
<tr>
<td>Garman-Klass (jump-adjusted)</td>
<td>5.0%</td>
<td>0.70</td>
<td>44%</td>
</tr>
<tr>
<td>Garman-Klass (original)</td>
<td>5.7%</td>
<td>0.93</td>
<td>43%</td>
</tr>
<tr>
<td>Garman-Klass (original, jump-adjusted)</td>
<td>5.0%</td>
<td>0.70</td>
<td>44%</td>
</tr>
<tr>
<td>Rogers-Satchell</td>
<td>6.1%</td>
<td>0.88</td>
<td>42%</td>
</tr>
<tr>
<td>Rogers-Satchell (jump-adjusted)</td>
<td>5.2%</td>
<td>0.68</td>
<td>43%</td>
</tr>
<tr>
<td>Yang-Zhang</td>
<td>5.1%</td>
<td>0.69</td>
<td>44%</td>
</tr>
<tr>
<td>Lyocsa-Plihal-Vyrost</td>
<td>5.7%</td>
<td>0.92</td>
<td>43%</td>
</tr>
</tbody>
</table>
<p>A couple of remarks:</p>
<ul>
<li>Forecasts produced by all the volatility estimators explain on average only ~45% of the variability of the ETFs’ monthly volatility</li>
<li>Forecasts produced by the jump-adjusted volatility estimators seem to offer no improvement on average over the forecasts produced by the close-to-close volatility estimator</li>
<li>Forecasts produced by the Parkinson, the Garman-Klass and the Rogers-Satchell volatility estimators seem to be much less biased on average than the forecasts produced by the close-to-close volatility estimator, a property inherited by the Lyocsa-Plihal-Vyrost volatility estimator</li>
</ul>
<p>As an empirical conclusion, it is disappointing that the naive monthly volatility forecasts produced by range-based volatility estimators have about the same predictive power as the forecasts produced
by the close-to-close volatility estimator. Nevertheless, because these forecasts are much less biased than their close-to-close counterparts, they still represent an improvement for the many
investors who currently rely on close prices only<sup id="fnref:48" role="doc-noteref"><a href="#fn:48" class="footnote">45</a></sup>.</p>
<p>It should also be noted, in line with one of the conclusions of Lyocsa et al.<sup id="fnref:42:3" role="doc-noteref"><a href="#fn:42" class="footnote">34</a></sup>, that the Lyocsa-Plihal-Vyrost volatility estimator should probably be preferred to the Parkinson, the Garman-Klass
or the Rogers-Satchell volatility estimators because <em>using only one range-based estimators has occasionally led to very inaccurate forecasts, which could successfully be avoided by using the
average of the three range-based estimators</em><sup id="fnref:42:4" role="doc-noteref"><a href="#fn:42" class="footnote">34</a></sup>.</p>
<h2 id="conclusion">Conclusion</h2>
<p>One aspect of range-based volatility estimators not discussed in this blog post is their capability to capture important stylized facts about asset returns<sup id="fnref:13" role="doc-noteref"><a href="#fn:13" class="footnote">46</a></sup>.</p>
<p>This, together with possible ways to incorporate them in more predictive volatility models than the random walk model, will be the subject of future blog posts.</p>
<p>Meanwhile, for more volatile discussions, feel free to <a href="https://www.linkedin.com/in/roman-rubsamen/">connect with me on LinkedIn</a> or to <a href="https://twitter.com/portfoliooptim">follow me on Twitter</a>.</p>
<p>–</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>As well as correlation forecasts. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>See <a href="https://www.jstor.org/stable/2352357">Parkinson, Michael H., The Extreme Value Method for Estimating the Variance of the Rate of Return, The Journal of Business 53 (1980), 61-65</a>, which is the final version of the working paper <em>The random walk problem: extreme value method for estimating the variance of the displacement (diffusion constant)</em> started 4 years before. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:2:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:2:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:2:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>Because the range of prices of an asset over a given time period is contained, by definition, within its highest and its lowest price. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>See <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2810768">Sepp, Artur, Volatility Modelling and Trading. Global Derivatives Workshop Global Derivatives Trading & Risk Management, Budapest, 2016</a>. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:4:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:4:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:4:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a> <a href="#fnref:4:4" class="reversefootnote" role="doc-backlink">↩<sup>5</sup></a> <a href="#fnref:4:5" class="reversefootnote" role="doc-backlink">↩<sup>6</sup></a> <a href="#fnref:4:6" class="reversefootnote" role="doc-backlink">↩<sup>7</sup></a> <a href="#fnref:4:7" class="reversefootnote" role="doc-backlink">↩<sup>8</sup></a></p>
</li>
<li id="fn:32" role="doc-endnote">
<p>At the date of publication of this post. <a href="#fnref:32" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:8" role="doc-endnote">
<p>Other working assumptions are also commonly made, like assuming that the asset does not pay dividends, assuming that the volatility coefficient $\sigma$ remains constant, assuming that the geometric Brownian motion model also applies during time periods with no trading activity (e.g., stock market closure), etc. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:8:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>In detail, the geometric Brownian motion assumption slightly differs between authors; for example, Garman and Klass<sup id="fnref:6:2" role="doc-noteref"><a href="#fn:6" class="footnote">17</a></sup> assume that asset prices follow a more generic diffusion process, which includes the geometric Brownian motion as a specific case. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:7" role="doc-endnote">
<p>$\sigma$ is also called the diffusion coefficient of the geometric Brownian motion, but in the context of this blog post, I think it is clearer to explicitly call it the volatility coefficient. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:36" role="doc-endnote">
<p>See <a href="https://onlinelibrary.wiley.com/doi/10.1111/1468-0262.00418">Andersen, T., Bollerslev, T., Diebold, F., & Labys, P. (2003). Modeling and forecasting realized volatility. Econometrica, 71, 579–625</a>. <a href="#fnref:36" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:10" role="doc-endnote">
<p>In practice, a time period $t$ usually corresponds to a trading day, a week or a month, so that the closing prices $C_t, t=1..T$ are simply the daily, weekly or monthly closing prices of the asset. <a href="#fnref:10" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:10:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:11" role="doc-endnote">
<p>These estimators are biased, due to <a href="https://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation#Motivation">Jensen’s inequality</a>; c.f. also Molnar<sup id="fnref:13:1" role="doc-noteref"><a href="#fn:13" class="footnote">46</a></sup>. <a href="#fnref:11" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:11:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:11:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a></p>
</li>
<li id="fn:9" role="doc-endnote">
<p>See <a href="https://www.jstor.org/stable/10.1086/209650">Yang, D., and Q. Zhang, 2000, Drift-Independent Volatility Estimation Based on High, Low, Open, and Close Prices, Journal of Business 73:477–491</a>. <a href="#fnref:9" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:9:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:9:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:9:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a> <a href="#fnref:9:4" class="reversefootnote" role="doc-backlink">↩<sup>5</sup></a> <a href="#fnref:9:5" class="reversefootnote" role="doc-backlink">↩<sup>6</sup></a></p>
</li>
<li id="fn:17" role="doc-endnote">
<p>The asset closing price $C_t, t=1..T$. <a href="#fnref:17" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:20" role="doc-endnote">
<p>See <a href="https://www.sciencedirect.com/science/article/abs/pii/0304405X87900262">French, K. R., Schwert, G. W., & Stambaugh, R. F. (1987). Expected stock returns and volatility. Journal of Financial Economics, 19, 3–29</a>. <a href="#fnref:20" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:16" role="doc-endnote">
<p>See <a href="https://www.tandfonline.com/doi/abs/10.1080/758526905?journalCode=rafe20">L. C. G. Rogers, S. E. Satchell & Y. Yoon (1994) Estimating the volatility of stock prices: a comparison of methods that use high and low prices, Applied Financial Economics, 4:3, 241-247</a>. <a href="#fnref:16" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:16:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:14" role="doc-endnote">
<p>See <a href="https://www.trading-volatility.com/">Colin Bennett, Trading Volatility, Correlation, Term Structure and Skew</a>. <a href="#fnref:14" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:14:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>See <a href="https://www.jstor.org/stable/2352358">Garman, M. B., and M. J. Klass, 1980, On the Estimation of Security Price Volatilities from Historical Data, Journal of Business 53:67–78</a>. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:6:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:6:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a></p>
</li>
<li id="fn:18" role="doc-endnote">
<p>More precisely, Garman and Klass<sup id="fnref:6:3" role="doc-noteref"><a href="#fn:6" class="footnote">17</a></sup> establish that a variation of $\sigma_{GK}$ is the “best” reasonable estimator but note that $\sigma_{GK}$ is 1) more practical and 2) as efficient as this variation, which I will call the <em>original Garman-Klass volatility estimator</em> $\sigma_{GKo}$. <a href="#fnref:18" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:18:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:15" role="doc-endnote">
<p>See <a href="https://www.jstor.org/stable/2959703">L. C. G. Rogers and S. E. Satchell, Estimating Variance From High, Low and Closing Prices, The Annals of Applied Probability, Vol. 1, No. 4 (Nov., 1991), pp. 504-512</a>. <a href="#fnref:15" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:15:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:15:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a></p>
</li>
<li id="fn:21" role="doc-endnote">
<p>Such a decrease in efficiency cannot be avoided because the Rogers-Satchell estimator belongs to the class of estimators studied in Garman and Klass<sup id="fnref:6:4" role="doc-noteref"><a href="#fn:6" class="footnote">17</a></sup>, so that its efficiency is necessarily smaller than the efficiency of the Garman-Klass estimator (maximal by definition). <a href="#fnref:21" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:25" role="doc-endnote">
<p>Garman and Klass<sup id="fnref:6:5" role="doc-noteref"><a href="#fn:6" class="footnote">17</a></sup> provide a volatility estimator that takes into account opening jumps, but this estimator has a dependency on an unknown $f$ parameter which makes it unusable in practice; Yang and Zhang<sup id="fnref:9:6" role="doc-noteref"><a href="#fn:9" class="footnote">12</a></sup> show that this dependency is actually spurious and provide a usable form of this estimator. <a href="#fnref:25" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:22" role="doc-endnote">
<p>When the time periods $t$ are measured in trading days, opening jumps are called overnight jumps. <a href="#fnref:22" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:23" role="doc-endnote">
<p>A single-period volatility estimator is a volatility estimator that can be used to estimate the volatility of an asset over a single time period $t$ using price data for this time period only; for example, the Parkinson, the Garman-Klass and the Rogers-Satchell estimators are single-period estimators while the close-to-close estimators are multi-period estimators. <a href="#fnref:23" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:23:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:24" role="doc-endnote">
<p>C.f. also Molnar<sup id="fnref:13:2" role="doc-noteref"><a href="#fn:13" class="footnote">46</a></sup> on this subject. <a href="#fnref:24" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:37" role="doc-endnote">
<p>See <a href="https://www.jstor.org/stable/2353167">Kunitomo, N. (1992). Improving the Parkinson method of estimating security price volatilities. Journal of Business, 65, 295–302</a>. <a href="#fnref:37" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:30" role="doc-endnote">
<p>See <a href="https://onlinelibrary.wiley.com/doi/10.1111/1540-6261.00454">Alizadeh, S., Brandt, M. W., and Diebold, F. X., 2002. Range-based estimation of stochastic volatility models. Journal of Finance 57: 1047-1091</a>. <a href="#fnref:30" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:26" role="doc-endnote">
<p>See <a href="https://revstat.ine.pt/index.php/REVSTAT/article/view/104">Meilijson, I. (2011). The Garman–Klass Volatility Estimator Revisited. REVSTAT-Statistical Journal, 9(3), 199–212</a>. <a href="#fnref:26" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:28" role="doc-endnote">
<p>See <a href="https://onlinelibrary.wiley.com/doi/abs/10.1002/fut.20321">Jacob, J. and Vipul, (2008), Estimation and forecasting of stock volatility with range-based estimators. J. Fut. Mark., 28: 561-581</a>. <a href="#fnref:28" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:28:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:28:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a></p>
</li>
<li id="fn:35" role="doc-endnote">
<p>See <a href="https://ideas.repec.org/p/pra/mprapa/21323.html">Mapa, Dennis S., 2003. A Range-Based GARCH Model for Forecasting Volatility, MPRA Paper 21323, University Library of Munich, Germany</a>. <a href="#fnref:35" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:38" role="doc-endnote">
<p>See <a href="https://www.jstor.org/stable/3839168">Chou, R.Y. (2005). Forecasting Financial Volatilities with Extreme Values: The Conditional Autoregressive Range (CARR) Model. Journal of Money Credit and Banking, 37(3): 561-582</a>. <a href="#fnref:38" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:39" role="doc-endnote">
<p>See <a href="https://www.sciencedirect.com/science/article/abs/pii/S0169207009000466">Harris, R. D. F., & Yilmaz, F. (2010). Estimation of the conditional variance–covariance matrix of returns using the intraday range. International Journal of Forecasting, 26, 180–194</a>. <a href="#fnref:39" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:27" role="doc-endnote">
<p>See <a href="https://onlinelibrary.wiley.com/doi/10.1002/fut.20197">Shu, J. and Zhang, J.E. (2006), Testing range estimators of historical volatility. J. Fut. Mark., 26: 297-313</a>. <a href="#fnref:27" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:27:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:29" role="doc-endnote">
<p>See <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4384038">Brandt, Michael W. and Kinlay, J, Estimating Historical Volatility (March 10, 2005)</a>. <a href="#fnref:29" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:29:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:29:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a></p>
</li>
<li id="fn:42" role="doc-endnote">
<p>See <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7526631/#bib0024">Lyocsa S, Plihal T, Vyrost T. FX market volatility modelling: Can we use low-frequency data? Financ Res Lett. 2021 May;40:101776. doi: 10.1016/j.frl.2020.101776. Epub 2020 Sep 30</a>. <a href="#fnref:42" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:42:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:42:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:42:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a> <a href="#fnref:42:4" class="reversefootnote" role="doc-backlink">↩<sup>5</sup></a></p>
</li>
<li id="fn:41" role="doc-endnote">
<p>See <a href="https://www.sciencedirect.com/science/article/abs/pii/S0169207009000107">Patton A.J., Sheppard K. Optimal combinations of realised volatility estimators. Int. J. Forecast. 2009;25(2):218–238</a>. <a href="#fnref:41" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:43" role="doc-endnote">
<p>The Yang-Zhang volatility estimator is excluded to avoid mixing jump-adjusted volatility estimators with non-jump-adjusted ones. <a href="#fnref:43" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:33" role="doc-endnote">
<p>(Adjusted) prices have been retrieved using <a href="https://api.tiingo.com/">Tiingo</a>. <a href="#fnref:33" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:33:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:33:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a></p>
</li>
<li id="fn:40" role="doc-endnote">
<p>The jump-adjusted Yang-Zhang volatility estimator, as well as the close-to-close volatility estimators, require the closing price of the last day of the previous month as an additional price. <a href="#fnref:40" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:40:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:40:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a></p>
</li>
<li id="fn:34" role="doc-endnote">
<p>This period more or less matches the period used in Sepp<sup id="fnref:4:8" role="doc-noteref"><a href="#fn:4" class="footnote">4</a></sup>. <a href="#fnref:34" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:44" role="doc-endnote">
<p>The next month’s close-to-close volatility is then taken as a proxy for the next month’s <em>realized volatility</em>; this choice is important, because different proxies might result in different conclusions as to the out-of-sample forecast performances. <a href="#fnref:44" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:31" role="doc-endnote">
<p>See <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2328254">Butler, Adam and Philbrick, Mike and Gordillo, Rodrigo and Varadi, David, Adaptive Asset Allocation: A Primer</a>. <a href="#fnref:31" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:49" role="doc-endnote">
<p>See <a href="https://econpapers.repec.org/bookchap/nbrnberch/1214.htm">Mincer, J. and V. Zarnowitz (1969). The evaluation of economic forecasts. In J. Mincer (Ed.), Economic Forecasts and Expectations</a>. <a href="#fnref:49" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:47" role="doc-endnote">
<p>This is due to slight differences in methodology, with mainly 1) the definition of “monthly volatility” in Sepp<sup id="fnref:4:9" role="doc-noteref"><a href="#fn:4" class="footnote">4</a></sup> taken to be the volatility from the 3rd Friday of a month to the 3rd Friday of the next month and 2) the usage in Sepp<sup id="fnref:4:10" role="doc-noteref"><a href="#fn:4" class="footnote">4</a></sup> of a linear regression model robust to outliers. <a href="#fnref:47" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:46" role="doc-endnote">
<p>The common ending price history of all the ETFs is 31 August 2023, but there is no common starting price history, as all ETFs started trading on different dates. <a href="#fnref:46" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:48" role="doc-endnote">
<p>For example, for all investors running some kind of monthly tactical asset allocation strategy. <a href="#fnref:48" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:13" role="doc-endnote">
<p>See <a href="https://www.sciencedirect.com/science/article/abs/pii/S1057521911000731">Peter Molnar, Properties of range-based volatility estimators, International Review of Financial Analysis, Volume 23, 2012, Pages 20-29</a>. <a href="#fnref:13" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:13:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:13:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a></p>
</li>
</ol>
</div>Roman R.Volatility estimation and forecasting plays a crucial role in many areas of finance. For example, standard risk-based portfolio allocation methods (minimum variance, equal risk contributions, hierarchical risk parity…) critically depend on the ability to build accurate volatility forecasts1. Multiple methods for estimating volatility have been proposed over the past several decades, and in this blog post I will focus on range-based volatility estimators. These estimators, the first of which was introduced by Parkinson2 as a way to compute the true variance of the rate of return of a common stock2, rely on the highest and lowest prices of an asset over a given time period to estimate its volatility, hence their name3. After describing the four most well-known range-based volatility estimators, I will reproduce the analysis of Artur Sepp in his presentation Volatility Modelling and Trading4 made at Global Derivatives Conference 2016 and test the predictive power of the naive volatility forecasts produced by these estimators for various ETFs. Notes: A very accessible series of papers about range-based volatility estimators has recently5 been released by people at Lombard Odier, c.f. here, here and here. Mathematical preliminaries Volatility modelling One of the main6 assumptions made when working with range-based volatility estimators7 is that the price movements $S_t$ of the asset under consideration follow a geometric Brownian motion with unknown volatility coefficient8 $\sigma$ and unknown drift coefficient $\mu$, that is \[d S_t = \mu S_t dt + \sigma S_t dW_t\] , where $W_t$ is a standard Brownian motion. Under this working assumption, $\sigma$ represents the volatility of the asset. Volatility and variance estimators Although anyone can empirically observe the impact of “volatility” on the prices of a given asset, the volatility coefficient $\sigma$ of this asset is not directly observable9 and must be estimated using stock market information. 
A statistical estimator of $\sigma$ is then called a volatility estimator, and a statistical estimator of $\sigma^2$ is called a variance estimator. Efficiency of a volatility estimator In order to determine the quality of a volatility estimator, two measures are commonly used: Bias The bias of a volatility estimator measures whether this estimator produces, on average, too high or too low volatility estimates. More formally, a volatility estimator $\sigma_A$ is said to be unbiased when $\mathbb{E}[\sigma_A] = \sigma$ and biased otherwise. Efficiency The efficiency of a volatility estimator measures the uncertainty of the volatility estimates produced by this estimator: the greater the efficiency of the estimator, the more accurate the volatility estimates. More formally, the relative efficiency $Eff \left( \sigma_A \right)$ of a volatility estimator $\sigma_A$ compared to a reference volatility estimator $\sigma_B$ is defined as the ratio of the variance of the estimator $\sigma_B^2$ to the variance of the estimator $\sigma_A^2$, that is, \[Eff \left( \sigma_A \right) = \frac{Var \left( \sigma_B^2 \right)}{Var \left( \sigma_A^2 \right)}\] Note that bias and efficiency are sometimes in conflict, which is more generally known in statistics as the bias-variance tradeoff. Close-to-close volatility estimators Let $C_1,…,C_T$ be the closing prices of an asset for $T$ time periods $t=1..T$10. Then, \[\sigma_{cc,0} \left( T \right) = \sqrt{ \frac{1}{T-1} \sum_{i=2}^T \left( \ln \frac{C_i}{C_{i-1}} \right)^2 }\] is a biased11 estimator of the asset volatility $\sigma$ over the $T$ time periods, assuming zero drift (i.e., $\mu = 0$), c.f. Parkinson2. 
In addition, \[\sigma_{cc} \left( T \right) = \sqrt{ \frac{1}{T-2} \sum_{i=2}^T \left( \ln \frac{C_i}{C_{i-1}} - \mu_{cc} \right)^2 }\] , with $\mu_{cc} = \frac{1}{T-1} \sum_{i=2}^T \ln \frac{C_i}{C_{i-1}} $, is a biased11 estimator of the asset volatility $\sigma$ over the $T$ time periods, assuming non-zero drift (i.e., $\mu \ne 0$), c.f. Yang and Zhang12. These two estimators are known as close-to-close volatility estimators. Range-based volatility estimators Let: $t=1..T$, $T$ time periods10 $\left( O_1,H_1,L_1,C_1 \right), …, \left( O_T,H_T,L_T,C_T \right)$, the opening, highest, lowest and closing prices of an asset for time periods $t=1..T$ As mentioned in the introduction, a volatility estimator fully or partially relying on the highest prices $H_t, t=1..T$ and on the lowest prices $L_t, t=1..T$ is called a range-based volatility estimator. The underlying idea behind such estimators is that the information contained in the asset high-low price ranges $H_t - L_t, t=1..T$ should make it possible to build volatility estimators that are more efficient than the close-to-close volatility estimators, which use only one price inside this range13. This quest for efficiency is important because, contrary to one of the working assumptions6, the volatility of an asset is known to be time-varying14, so that the fewer the time periods required to estimate its volatility, the more likely it is that its volatility is constant(ish) over the time periods under consideration. As Rogers and Satchell15 put it: […] volatility may change over long periods of time; a highly efficient procedure will allow researchers to estimate volatility with a small number of observations. Parkinson volatility estimator Parkinson2 introduces an estimator for the diffusion coefficient of a Brownian motion without drift that relies on the highest and lowest observed values of this Brownian motion over a given time period. 
When applied to the estimation of an asset's volatility, this gives the Parkinson volatility estimator $\sigma_{P} \left( T \right)$ defined over $T$ time periods by \[\sigma_{P} \left( T \right) = \sqrt{\frac{1}{T}} \sqrt{\frac{1}{4 \ln 2} \sum_{i=1}^T \left( \ln \frac{H_i}{L_i} \right) ^2}\] Intuitively, the Parkinson estimator should be “better” than the close-to-close estimators because large price movements impacting the high-low price range $H_t - L_t$ but leaving the closing price $C_t$ unchanged might occur within any time period $t$. This is confirmed by the efficiency of this estimator, up to 5.2 times higher than the efficiency of the close-to-close estimators16. Garman-Klass volatility estimator Garman and Klass17 propose to improve the Parkinson estimator by taking into account the opening prices $O_t, t=1..T$ and the closing prices $C_t, t=1..T$. This leads to the Garman-Klass volatility estimator $\sigma_{GK} \left( T \right)$, defined over $T$ time periods by \[\sigma_{GK} \left( T \right) = \sqrt{\frac{1}{T}} \sqrt{ \sum_{i=1}^T \frac{1}{2} \left( \ln\frac{H_i}{L_i} \right) ^2 - \left( 2 \ln 2 - 1 \right) \left( \ln\frac{C_i}{O_i} \right)^2 }\] As a historical note, Garman and Klass17 establish in their paper that $\sigma_{GK}$ is the “best reasonable”18 volatility estimator that depends only on the high-open price range $H_t - O_t$, the low-open price range $L_t - O_t$ and the close-open price range $C_t - O_t$, $t=1..T$. The Garman-Klass estimator is up to 7.4 times more efficient than the close-to-close estimators16. Rogers-Satchell volatility estimator The Parkinson and the Garman-Klass estimators have both been derived under a zero drift assumption. 
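To make the formulas above concrete, here is a minimal Python sketch of the close-to-close, Parkinson and Garman-Klass estimators, in per-period units (no annualization); the function names and any prices used below are mine, not taken from Portfolio Optimizer.

```python
import math

def close_to_close_vol(closes, zero_drift=False):
    """Close-to-close volatility over T periods, in per-period units."""
    # Log-returns ln(C_i / C_{i-1}), i = 2..T
    r = [math.log(closes[i] / closes[i - 1]) for i in range(1, len(closes))]
    n = len(r)  # = T - 1
    if zero_drift:
        # sigma_cc0: zero-drift version, divisor T - 1
        return math.sqrt(sum(x * x for x in r) / n)
    # sigma_cc: drift-adjusted version, divisor T - 2
    mu = sum(r) / n
    return math.sqrt(sum((x - mu) ** 2 for x in r) / (n - 1))

def parkinson_vol(highs, lows):
    """Parkinson volatility: high-low ranges only, zero-drift assumption."""
    T = len(highs)
    s = sum(math.log(h / l) ** 2 for h, l in zip(highs, lows))
    return math.sqrt(s / (4 * math.log(2) * T))

def garman_klass_vol(opens, highs, lows, closes):
    """Garman-Klass volatility: adds the open and close prices to the high-low range."""
    T = len(opens)
    s = sum(0.5 * math.log(h / l) ** 2
            - (2 * math.log(2) - 1) * math.log(c / o) ** 2
            for o, h, l, c in zip(opens, highs, lows, closes))
    return math.sqrt(s / T)
```

For instance, on a single hypothetical OHLC bar (100, 110, 100, 105), `garman_klass_vol` returns roughly 0.06, i.e. a 6% per-period volatility estimate.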
When this assumption is not verified for an asset, for example because of a strong upward or downward trend in the asset prices or because of the usage of large time periods (monthly, yearly…), these estimators should in theory not be used because the quality of their volatility estimates is negatively impacted by the presence of a non-zero drift1915. In order to solve this problem, Rogers and Satchell19 devise the Rogers-Satchell volatility estimator $\sigma_{RS} \left( T \right)$, defined over $T$ time periods by \[\sigma_{RS} \left( T \right) = \sqrt{\frac{1}{T}} \sqrt{ \sum_{i=1}^T \ln\frac{H_i}{C_i} \ln\frac{H_i}{O_i} + \ln\frac{L_i}{C_i} \ln\frac{L_i}{O_i} }\] The Rogers-Satchell estimator is up to 6 times more efficient than the close-to-close estimators19, which is less than the Garman-Klass estimator20. Yang-Zhang volatility estimator The range-based volatility estimators discussed so far do not take into account opening jumps in an asset's prices21, that is, the potential difference between an asset's opening price $O_t$ and its closing price $C_{t-1}$ for a time period $t$22. This limitation causes a systematic underestimation of the true volatility12. When trying to integrate opening jumps into the Parkinson, the Garman-Klass and the Rogers-Satchell estimators, Yang and Zhang12 discover that it is unfortunately not possible for any “reasonable” single-period23 volatility estimator to properly handle both a non-zero drift and opening jumps. 
This leads them to introduce the multi-period23 Yang-Zhang volatility estimator $\sigma_{YZ} \left( T \right)$, defined over $T$ time periods by \[\sigma_{YZ} \left( T \right) = \sqrt{ \sigma_{co}^2 + k \sigma_{oc}^2 + (1-k) \sigma_{RS}^2 }\] , where: $\sigma_{co} \left( T \right)$ is the close-to-open volatility, defined as \[\sigma_{co} \left( T \right) = \sqrt{\frac{1}{T-2} \sum_{i=2}^T \left( \ln \frac{O_i}{C_{i-1}} - \mu_{co} \right)^2}\] , with $\mu_{co} = \frac{1}{T-1} \sum_{i=2}^T \ln \frac{O_i}{C_{i-1}}$ $\sigma_{oc} \left( T \right)$ is the open-to-close volatility, defined as \[\sigma_{oc} \left( T \right) = \sqrt{\frac{1}{T-2} \sum_{i=2}^T \left( \ln \frac{C_i}{O_{i}} - \mu_{oc} \right)^2}\] , with $\mu_{oc} = \frac{1}{T-1} \sum_{i=2}^T \ln \frac{C_i}{O_{i}}$ $\sigma_{RS}$ is the Rogers-Satchell volatility estimator over the time periods $t=2..T$ $k = \frac{0.34}{1.34 + \frac{T}{T-2}}$ In addition to the new estimator $\sigma_{YZ}$, Yang and Zhang12 also provide multi-period versions of the Parkinson, the Garman-Klass and the Rogers-Satchell estimators that support opening jumps24. The Yang-Zhang estimator is up to 14 times more efficient than the close-to-close estimators12, a result that Yang and Zhang12 comment as follows: The improvement of accuracy over the classical close-to-close estimator is dramatic for real-life time series Other estimators The family of range-based volatility estimators has many other members: The Kunitomo25 volatility estimator The Alizadeh-Brandt-Diebold26 volatility estimator The Meilijson27 volatility estimator … Still, the Parkinson, the Garman-Klass, the Rogers-Satchell and the Yang-Zhang volatility estimators are representative of this family, so that I will not detail any other range-based volatility estimator in this blog post. 
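The Rogers-Satchell and Yang-Zhang estimators described above can be sketched along the same lines as the previous ones, again in hedged form: per-period units, hypothetical function names, at least 3 time periods assumed, and the Rogers-Satchell component computed over the periods $t=2..T$ as in the definition.

```python
import math

def rogers_satchell_vol(opens, highs, lows, closes):
    """Rogers-Satchell volatility: a drift-independent range-based estimator."""
    T = len(opens)
    s = sum(math.log(h / c) * math.log(h / o) + math.log(l / c) * math.log(l / o)
            for o, h, l, c in zip(opens, highs, lows, closes))
    return math.sqrt(s / T)

def yang_zhang_vol(opens, highs, lows, closes):
    """Yang-Zhang volatility over T periods, in per-period units (T >= 3)."""
    T = len(opens)
    # Close-to-open log-returns (opening jumps) and open-to-close log-returns
    co = [math.log(opens[i] / closes[i - 1]) for i in range(1, T)]
    oc = [math.log(closes[i] / opens[i]) for i in range(1, T)]
    mu_co, mu_oc = sum(co) / (T - 1), sum(oc) / (T - 1)
    var_co = sum((x - mu_co) ** 2 for x in co) / (T - 2)
    var_oc = sum((x - mu_oc) ** 2 for x in oc) / (T - 2)
    # Rogers-Satchell variance over the periods t = 2..T
    var_rs = rogers_satchell_vol(opens[1:], highs[1:], lows[1:], closes[1:]) ** 2
    k = 0.34 / (1.34 + T / (T - 2))
    return math.sqrt(var_co + k * var_oc + (1 - k) * var_rs)
```

Note how the opening jumps enter only through the close-to-open variance term, which is why the other range-based estimators, lacking that term, systematically underestimate the volatility of assets with sizable jumps.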
From volatility estimation to volatility forecasting Range-based volatility estimators are based on the assumption of independent sample and observations within the sample4, so that the corresponding volatility forecasts are simply naive forecasts under a random walk model. In other words, with such volatility estimators, the “natural” forecast of an asset volatility over the next $T$ time periods is the (past) estimate of the asset volatility over the last $T$ time periods. That being said, it is perfectly possible to use range-based volatility estimates together with any volatility forecasting model such as: A time series forecasting model (simple moving average, exponentially weighted moving average…), as detailed for example in Jacob and Vipul28 An econometric forecasting model (GARCH model…), c.f. Mapa29 A specific range-based forecasting model (Chou’s30 Conditional AutoRegressive Range model, Harris and Yilmaz’s31 hybrid multivariate exponentially weighted moving average model…) Performance of range-based volatility estimators Theoretical and practical performances of range-based volatility estimators are studied in several papers, for example Shu and Zhang32, Jacob and Vipul28 and Brandt and Kinlay33, among others. Most of these studies agree that range-based volatility estimators are biased11, but other conclusions differ depending on the exact methodology used. In particular, as highlighted by Brandt and Kinlay33, the results from empirical research differ significantly from those seen in simulation studies in a number of respects33. 
One perfect example of these differences: Shu and Zhang32 conclude, using a Monte Carlo simulation, that If the drift term is large, the Parkinson estimator and the [Garman-Klass] estimator will significantly overestimate the true variance […] , while Jacob and Vipul28 conclude, using real stock market data, that Overall, the [Garman-Klass] estimator, which indirectly adjusts for the drift, performs better for the high-drift stocks. Motivated by such inconsistencies, Lyocsa et al.34, building on Patton and Sheppard35, introduced what I will call the Lyocsa-Plihal-Vyrost volatility estimator $\sigma_{LPV}$, defined as the arithmetic average of the Parkinson, the Garman-Klass and the Rogers-Satchell volatility estimators36 \[\sigma_{LPV} = \frac{\sigma_{P} + \sigma_{GK} + \sigma_{RS}}{3}\] As Lyocsa et al.34 explain, the motivation behind using the (naive) equally weighted average is based on the assumption that we have no prior information on which estimator might be more accurate34. I personally like the idea of an averaged estimator, but at this point, I think it is safe to highlight that there is no “best” range-based volatility estimator… Implementation in Portfolio Optimizer Portfolio Optimizer implements all the volatility estimators discussed in this blog post: The close-to-close volatility estimators, through the endpoint /assets/volatility/estimation/close-to-close The Parkinson volatility estimator, through the endpoint /assets/volatility/estimation/parkinson The Garman-Klass volatility estimator, through the endpoint /assets/volatility/estimation/garman-klass The original Garman-Klass volatility estimator18, through the endpoint /assets/volatility/estimation/garman-klass/original The Rogers-Satchell volatility estimator, through the endpoint /assets/volatility/estimation/rogers-satchell The Yang-Zhang volatility estimator, through the endpoint /assets/volatility/estimation/yang-zhang , as well as their jump-adjusted variations, whenever applicable. 
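As a rough illustration of the averaging idea behind $\sigma_{LPV}$ (hypothetical function names; Portfolio Optimizer's own implementations sit behind the endpoints listed above), the Lyocsa-Plihal-Vyrost estimate is simply the arithmetic mean of the three range-based estimates:

```python
import math

LN2 = math.log(2)

def parkinson(h, l, T):
    return math.sqrt(sum(math.log(hi / li) ** 2 for hi, li in zip(h, l)) / (4 * LN2 * T))

def garman_klass(o, h, l, c, T):
    return math.sqrt(sum(0.5 * math.log(hi / li) ** 2
                         - (2 * LN2 - 1) * math.log(ci / oi) ** 2
                         for oi, hi, li, ci in zip(o, h, l, c)) / T)

def rogers_satchell(o, h, l, c, T):
    return math.sqrt(sum(math.log(hi / ci) * math.log(hi / oi)
                         + math.log(li / ci) * math.log(li / oi)
                         for oi, hi, li, ci in zip(o, h, l, c)) / T)

def lpv(o, h, l, c):
    # Equally weighted average of the three range-based volatility estimates
    T = len(o)
    return (parkinson(h, l, T) + garman_klass(o, h, l, c, T)
            + rogers_satchell(o, h, l, c, T)) / 3
```

The equal weights reflect the stated assumption of no prior information on which of the three estimators is more accurate; any convex combination of the three would be an equally valid variation.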
Examples of usage

To illustrate possible uses of range-based volatility estimators, I propose to reproduce a couple of results from Sepp4:
- The estimation and the forecast of the SPY ETF monthly volatility
- The forecast of the monthly volatility of misc. ETFs representative of different asset classes (U.S. treasuries, international stock market, gold…)

Such examples will make it possible to compare the empirical behavior of the different volatility estimators and maybe reach a conclusion as to their relative performance in this specific setting.

Estimating SPY ETF volatility

I will estimate the SPY ETF monthly volatility using all the daily open/high/low/close prices37 observed during that month38. Figure 1, limited to 5 volatility estimators for readability purposes, illustrates the results obtained over the period 31 January 2005 - 29 February 201639.

Figure 1. SPY ETF monthly volatility estimates, using daily returns over the period 31 January 2005 - 29 February 2016.

Figure 1 is mostly identical to the figure on slide 22 from Sepp4, on which it seems in particular that the close-to-close and the Yang-Zhang volatility estimators provide higher estimates of volatility when the overall level of volatility is high4. Overall, though, the behavior of the different volatility estimators is essentially the same in this specific example, which is confirmed by their correlations displayed in Figure 2.

Figure 2. Correlations of SPY ETF monthly volatility estimates, using daily returns over the period 31 January 2005 - 29 February 2016.

Forecasting misc. ETFs volatility

Using the same methodology as in Sepp4, I will now evaluate the quality of the naive forecasts produced by all the range-based volatility estimators implemented in Portfolio Optimizer against the next month’s close-to-close observed volatility40, for 10 ETFs representative of misc. asset classes:
- U.S. stocks (SPY ETF)
- European stocks (EZU ETF)
- Japanese stocks (EWJ ETF)
- Emerging markets stocks (EEM ETF)
- U.S. REITs (VNQ ETF)
- International REITs (RWX ETF)
- U.S. 7-10 year Treasuries (IEF ETF)
- U.S. 20+ year Treasuries (TLT ETF)
- Commodities (DBC ETF)
- Gold (GLD ETF)

These ETFs are used in the Adaptive Asset Allocation strategy from ReSolve Asset Management, described in the paper Adaptive Asset Allocation: A Primer41.

For each ETF, Sepp’s methodology is as follows:
1. At each month’s end, compute the volatility estimates $\sigma_{cc, t}$, $\sigma_{P, t}$, … using all the ETF daily open/high/low/close prices37 observed during that month38; under a random walk volatility model, each of these estimates represents the next month’s volatility forecast $\hat{\sigma}_{t+1}$
2. At each month’s end, also compute the next month’s close-to-close volatility estimate $\sigma_{cc, t+1}$ using all the ETF daily close prices37 observed during that month38; this estimate is the volatility benchmark, which represents how the ETF “volatility” is perceived by an investor monitoring her portfolio daily
3. Once all months have been processed that way, regress the volatility forecasts on the volatility benchmarks by applying the Mincer-Zarnowitz42 regression model \[\hat{\sigma}_{t+1} = \alpha + \beta \sigma_{cc, t+1} + \epsilon_{t+1}\] , where $\epsilon_{t+1}$ is an error term

Then, the estimator producing “[the best] volatility forecast is indicated by [a] high explanatory power R^2, [a] small intercept $\alpha$ and [a] $\beta$ coefficient close to one”4.

Forecasting SPY ETF volatility

In the case of the SPY ETF, Figure 3 illustrates Sepp’s methodology for the Lyocsa-Plihal-Vyrost volatility estimator $\sigma_{LPV}$ over the period 31 January 2005 - 29 February 2016.

Figure 3. SPY ETF Lyocsa-Plihal-Vyrost naive monthly volatility forecasts vs. next month's close-to-close volatility estimates, using daily returns over the period 31 January 2005 - 29 February 2016.
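For reference, the Mincer-Zarnowitz regression described above can be sketched in a few lines of Python with NumPy. This is a plain ordinary least squares fit; Sepp4 actually uses a linear regression robust to outliers, so this is only an approximation of his setup:

```python
import numpy as np

def mincer_zarnowitz(forecasts, benchmarks):
    """Fit forecast = alpha + beta * benchmark + eps by ordinary
    least squares and return (alpha, beta, R^2)."""
    y = np.asarray(forecasts, dtype=float)
    x = np.asarray(benchmarks, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # [intercept, slope] design matrix
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return coef[0], coef[1], r2
```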
Detailed results for all regression models over the period 31 January 2005 - 29 February 2016 are the following:

| Volatility estimator | $\alpha$ | $\beta$ | $R^2$ |
| --- | --- | --- | --- |
| Close-to-close | 4.1% | 0.75 | 57% |
| Close-to-close (zero drift) | 3.9% | 0.77 | 57% |
| Parkinson | 3.5% | 0.95 | 58% |
| Parkinson (jump-adjusted) | 3.4% | 0.79 | 58% |
| Garman-Klass | 3.7% | 0.92 | 57% |
| Garman-Klass (jump-adjusted) | 3.6% | 0.77 | 58% |
| Garman-Klass (original) | 3.7% | 0.92 | 57% |
| Garman-Klass (original, jump-adjusted) | 3.6% | 0.77 | 58% |
| Rogers-Satchell | 4.0% | 0.88 | 56% |
| Rogers-Satchell (jump-adjusted) | 3.9% | 0.74 | 57% |
| Yang-Zhang | 3.8% | 0.75 | 58% |
| Lyocsa-Plihal-Vyrost | 3.7% | 0.92 | 57% |

While these figures are far43 from those on slide 42 from Sepp4, with for example nearly no variation in terms of $R^2$ among the different volatility estimators, two observations are similar:
- All volatility estimators have comparable $\alpha$
- The Parkinson, the Garman-Klass and the Rogers-Satchell volatility estimators have a $\beta$ much closer to 1 than the close-to-close volatility estimator

Forecasting the other ETFs volatility

Going beyond the SPY ETF, averaged results for all ETFs/regression models over each ETF price history44 are the following:

| Volatility estimator | $\bar{\alpha}$ | $\bar{\beta}$ | $\bar{R^2}$ |
| --- | --- | --- | --- |
| Close-to-close | 5.8% | 0.66 | 44% |
| Close-to-close (zero drift) | 5.6% | 0.67 | 45% |
| Parkinson | 5.6% | 0.94 | 44% |
| Parkinson (jump-adjusted) | 4.9% | 0.70 | 45% |
| Garman-Klass | 5.7% | 0.93 | 43% |
| Garman-Klass (jump-adjusted) | 5.0% | 0.70 | 44% |
| Garman-Klass (original) | 5.7% | 0.93 | 43% |
| Garman-Klass (original, jump-adjusted) | 5.0% | 0.70 | 44% |
| Rogers-Satchell | 6.1% | 0.88 | 42% |
| Rogers-Satchell (jump-adjusted) | 5.2% | 0.68 | 43% |
| Yang-Zhang | 5.1% | 0.69 | 44% |
| Lyocsa-Plihal-Vyrost | 5.7% | 0.92 | 43% |

A couple of remarks:
- Forecasts produced by all the volatility estimators explain on average only ~45% of the variability of the ETFs monthly volatility
- Forecasts produced by the jump-adjusted volatility estimators seem to offer no improvement on average over the forecasts produced by the close-to-close volatility estimator
- Forecasts produced by the Parkinson, the
Garman-Klass and the Rogers-Satchell volatility estimators seem to be much less biased on average than the forecasts produced by the close-to-close volatility estimator, a property inherited by the Lyocsa-Plihal-Vyrost volatility estimator

As an empirical conclusion, it is disappointing that the naive monthly volatility forecasts produced by range-based volatility estimators have about the same predictive power as the forecasts produced by the close-to-close volatility estimator. Nevertheless, because these forecasts are much less biased than their close-to-close counterparts, they still represent an improvement for the many investors who currently rely on close prices only45.

Note also that, similar to one of the conclusions of Lyocsa et al.34, the Lyocsa-Plihal-Vyrost volatility estimator should probably be preferred to the Parkinson, the Garman-Klass or the Rogers-Satchell volatility estimators, because “using only one range-based estimators has occasionally led to very inaccurate forecasts, which could successfully be avoided by using the average of the three range-based estimators”34.

Conclusion

One aspect of range-based volatility estimators not discussed in this blog post is their capability to capture important stylized facts about asset returns46. This, together with possible ways to incorporate them in more predictive volatility models than the random walk model, will be the subject of future blog posts.

Meanwhile, for more volatile discussions, feel free to connect with me on LinkedIn or to follow me on Twitter.

–

1. As well as correlation forecasts.
2. See Parkinson, Michael H., The Extreme Value Method for Estimating the Variance of the Rate of Return, The Journal of Business 53 (1980), 61-65, which is the final version of the working paper The random walk problem: extreme value method for estimating the variance of the displacement (diffusion constant) started 4 years before.
3. Because the range of prices of an asset over a given time period is contained, by definition, within its highest and its lowest price.
4. See Sepp, Artur, Volatility Modelling and Trading. Global Derivatives Workshop Global Derivatives Trading & Risk Management, Budapest, 2016.
5. At the date of publication of this post.
6. Other working assumptions are also commonly made, like assuming that the asset does not pay dividends, assuming that the volatility coefficient $\sigma$ remains constant, assuming that the geometric Brownian motion model also applies during time periods with no trading activity (e.g., stock market closure), etc.
7. In detail, the geometric Brownian motion assumption slightly differs between authors; for example, Garman and Klass17 assume that asset prices follow a more generic diffusion process, which includes the geometric Brownian motion as a specific case.
8. $\sigma$ is also called the diffusion coefficient of the geometric Brownian motion, but in the context of this blog post, I think it is clearer to explicitly call it the volatility coefficient.
9. See Andersen, T., Bollerslev, T., Diebold, F., & Labys, P. (2003). Modeling and forecasting realized volatility. Econometrica, 71, 579–625.
10. In practice, a time period $t$ usually corresponds to a trading day, a week or a month, so that the closing prices $C_t, t=1..T$ are simply the daily, weekly or monthly closing prices of the asset.
11. These estimators are biased, due to Jensen’s inequality; c.f. also Molnar46.
12. See Yang, D., and Q. Zhang, 2000, Drift-Independent Volatility Estimation Based on High, Low, Open, and Close Prices, Journal of Business 73:477–491.
13. The asset closing price $C_t, t=1..T$.
14. See French, K. R., Schwert, G. W., & Stambaugh, R. F. (1987). Expected stock returns and volatility. Journal of Financial Economics, 19, 3–29.
15. See L. C. G. Rogers, S. E. Satchell & Y.
Yoon (1994) Estimating the volatility of stock prices: a comparison of methods that use high and low prices, Applied Financial Economics, 4:3, 241-247.
16. See Colin Bennett, Trading Volatility, Correlation, Term Structure and Skew.
17. See Garman, M. B., and M. J. Klass, 1980, On the Estimation of Security Price Volatilities from Historical Data, Journal of Business 53:67–78.
18. More precisely, Garman and Klass17 establish that a variation of $\sigma_{GK}$ is the “best” reasonable estimator but note that $\sigma_{GK}$ is 1) more practical and 2) as efficient as this variation, which I will call the original Garman-Klass volatility estimator $\sigma_{GKo}$.
19. See L. C. G. Rogers and S. E. Satchell, Estimating Variance From High, Low and Closing Prices, The Annals of Applied Probability, Vol. 1, No. 4 (Nov., 1991), pp. 504-512.
20. Such a decrease in efficiency cannot be avoided because the Rogers-Satchell estimator belongs to the class of estimators studied in Garman and Klass17, so that its efficiency is necessarily smaller than the efficiency of the Garman-Klass estimator (maximal by definition).
21. Garman and Klass17 provide a volatility estimator that takes into account opening jumps, but this estimator has a dependency on an unknown $f$ parameter which makes it unusable in practice; Yang and Zhang12 show that this dependency is actually spurious and provide a usable form of this estimator.
22. When the time periods $t$ are measured in trading days, opening jumps are called overnight jumps.
23. A single-period volatility estimator is a volatility estimator that can be used to estimate the volatility of an asset over a single time period $t$ using price data for this time period only; for example, the Parkinson, the Garman-Klass and the Rogers-Satchell estimators are single-period estimators while the close-to-close estimators are multi-period estimators.
24. C.f. also Molnar46 on this subject.
25. See Kunitomo, N. (1992).
Improving the Parkinson method of estimating security price volatilities. Journal of Business, 65, 295–302.
26. See Alizadeh, S., Brandt, W. M., and Diebold, X. F., 2002. Range-based estimation of stochastic volatility models. Journal of Finance 57: 1047-1091.
27. See Meilijson, I. (2011). The Garman–Klass Volatility Estimator Revisited. REVSTAT-Statistical Journal, 9(3), 199–212.
28. See Jacob, J. and Vipul, (2008), Estimation and forecasting of stock volatility with range-based estimators. J. Fut. Mark., 28: 561-581.
29. See Mapa, Dennis S., 2003. A Range-Based GARCH Model for Forecasting Volatility, MPRA Paper 21323, University Library of Munich, Germany.
30. See Chou, R. Y. (2005). Forecasting Financial Volatilities with Extreme Values: The Conditional Autoregressive Range (CARR) Model. Journal of Money Credit and Banking, 37(3): 561-582.
31. See Harris, R. D. F., & Yilmaz, F. (2010). Estimation of the conditional variance–covariance matrix of returns using the intraday range. International Journal of Forecasting, 26, 180–194.
32. See Shu, J. and Zhang, J. E. (2006), Testing range estimators of historical volatility. J. Fut. Mark., 26: 297-313.
33. See Brandt, Michael W. and Kinlay, J, Estimating Historical Volatility (March 10, 2005).
34. See Lyocsa S, Plihal T, Vyrost T. FX market volatility modelling: Can we use low-frequency data? Financ Res Lett. 2021 May;40:101776. doi: 10.1016/j.frl.2020.101776. Epub 2020 Sep 30.
35. See Patton A. J., Sheppard K. Optimal combinations of realised volatility estimators. Int. J. Forecast. 2009;25(2):218–238.
36. The Yang-Zhang volatility estimator is excluded to avoid mixing jump-adjusted volatility estimators with non-jump-adjusted ones.
37. (Adjusted) prices have been retrieved using Tiingo.
38. The jump-adjusted Yang-Zhang volatility estimator, as well as the close-to-close volatility estimators, require the closing price of the last day of the previous month as an additional price.
39. This period more or less matches the period used in Sepp4.
40. The next month’s close-to-close volatility is then taken as a proxy for the next month’s realized volatility; this choice is important, because different proxies might result in different conclusions as to the out-of-sample forecast performances.
41. See Butler, Adam and Philbrick, Mike and Gordillo, Rodrigo and Varadi, David, Adaptive Asset Allocation: A Primer.
42. See Mincer, J. and V. Zarnowitz (1969). The evaluation of economic forecasts. In J. Mincer (Ed.), Economic Forecasts and Expectations.
43. This is due to slight differences in methodology, with mainly 1) the definition of “monthly volatility” in Sepp4 taken to be the volatility from the 3rd Friday of a month to the 3rd Friday of the next month and 2) the usage in Sepp4 of a linear regression model robust to outliers.
44. The common ending price history of all the ETFs is 31 August 2023, but there is no common starting price history, as all ETFs started trading on different dates.
45. For example, for all investors running some kind of monthly tactical asset allocation strategy.
46. See Peter Molnar, Properties of range-based volatility estimators, International Review of Financial Analysis, Volume 23, 2012, Pages 20-29.

Correlation Matrix Stress Testing: Random Perturbations of a Correlation Matrix2023-08-23T00:00:00-05:002023-08-23T00:00:00-05:00https://portfoliooptimizer.io/blog/correlation-matrix-stress-testing-random-perturbations-of-a-correlation-matrix<p>In the previous posts of this series, I detailed a methodology to perform stress tests on a correlation matrix by linearly shrinking a baseline correlation matrix
<a href="/blog/correlation-matrix-stress-testing-shrinkage-toward-an-equicorrelation-matrix/">toward an equicorrelation matrix</a> or, more generally,
<a href="/blog/correlation-matrix-stress-testing-shrinkage-toward-the-lower-and-upper-bounds-of-a-correlation-matrix/">toward the lower and upper bounds of its coefficients</a>.</p>
<p>This methodology makes it easy to model <em>known unknowns</em> when designing stress testing scenarios, but falls short with <em><a href="https://en.wikipedia.org/wiki/There_are_unknown_unknowns">unknown unknowns</a></em>,
that is, completely unanticipated correlation breakdowns. Indeed, by definition, these cannot be represented by an a priori correlation matrix toward which a baseline correlation matrix could be shrunk<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote">1</a></sup>…</p>
<p>In this blog post, I will describe another approach that can be used instead in this case, based on random perturbations of a baseline correlation matrix.</p>
<p>As an example of application, I will show how to identify extreme correlation stress scenarios through direct and reverse correlation stress testing.</p>
<blockquote>
<p><strong><em>Notes:</em></strong><br />
The main reference for this post is a presentation from Opdyke<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote">2</a></sup> at the <a href="https://informaconnect.com/quantminds-international/">QuantMinds International</a> 2020 event.</p>
</blockquote>
<h2 id="mathematical-preliminaries">Mathematical preliminaries</h2>
<p>As a general reminder, a square matrix $C \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ is a (valid) correlation matrix if and only if</p>
<ul>
<li>$C$ is symmetric: $C {}^t = C$</li>
<li>$C$ is unit diagonal: $C_{i,i} = 1$, $i=1..n$</li>
<li>$C$ is <a href="https://en.wikipedia.org/wiki/Positive_semidefinite_matrix">positive semi-definite</a>: $C \geqslant 0$</li>
</ul>
<h3 id="eigenvalue-decomposition-of-a-correlation-matrix">Eigenvalue decomposition of a correlation matrix</h3>
<p>A correlation matrix is a real symmetric matrix.</p>
<p>Thus, from standard linear algebra, any correlation matrix $C$ is <a href="https://en.wikipedia.org/wiki/Diagonalizable_matrix">diagonalizable by an orthogonal matrix</a> and can be decomposed as a product</p>
\[C = P \Lambda P^{-1}\]
<p>, where:</p>
<ul>
<li>$P \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ is <a href="https://en.wikipedia.org/wiki/Orthogonal_matrix">an orthogonal matrix</a></li>
<li>$\Lambda = Diag \left( \lambda_{1},…, \lambda_{n} \right)$ $\in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ is <a href="https://en.wikipedia.org/wiki/Diagonal_matrix">a diagonal matrix</a> made of the $n$ <a href="https://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors">eigenvalues</a> $\lambda_1 \geq \lambda_2 … \geq \lambda_n \geq 0$ of $C$ which satisfy $\sum_{i=1}^{n} \lambda_i = n$</li>
</ul>
<p>This decomposition is called <em><a href="https://en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix">the eigendecomposition</a></em> of the correlation matrix $C$.</p>
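<p>As a quick numerical check of these reminders, the sketch below uses NumPy and a small hypothetical 3-asset correlation matrix:</p>

```python
import numpy as np

# Hypothetical 3-asset correlation matrix
C = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.2],
              [0.3, 0.2, 1.0]])

# Eigendecomposition C = P Lambda P^-1, with P orthogonal
lambdas, P = np.linalg.eigh(C)  # eigenvalues, in ascending order

assert np.all(lambdas >= 0)                        # positive semi-definiteness
assert np.isclose(lambdas.sum(), C.shape[0])       # eigenvalues sum to n = trace(C)
assert np.allclose(P @ np.diag(lambdas) @ P.T, C)  # P^-1 = P^t (orthogonality)
```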
<h3 id="hypersphere-decomposition-of-a-correlation-matrix">Hypersphere decomposition of a correlation matrix</h3>
<p>Rapisarda et al.<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote">3</a></sup> establish that any correlation matrix $C \in \mathcal{M}(\mathbb{R}^{n \times n})$ can be decomposed as a product</p>
\[C = B B {}^t\]
<p>, where $B \in \mathcal{M}(\mathbb{R}^{n \times n})$ is a lower triangular matrix defined by</p>
\[b_{i,j} = \begin{cases} \cos \theta_{i,1}, \textrm{for } j = 1 \newline \cos \theta_{i,j} \prod_{k=1}^{j-1} \sin \theta_{i,k}, \textrm{for } 2 \leq j \leq i-1 \newline \prod_{k=1}^{i-1} \sin \theta_{i,k}, \textrm{for } j = i \newline 0, \textrm{for } i+1 \leq j \leq n \end{cases}\]
<p>with:</p>
<ul>
<li>$\theta_{1,1} = 0$, by convention</li>
<li>$\theta_{i,j}$, $i = 2..n, j = 1..i-1$, the $\frac{n (n-1)}{2}$ <em>correlative angles</em>, belonging to the interval $[0, \pi]$</li>
</ul>
<p>This decomposition is called <em>the hypersphere decomposition</em>, or the <em>triangular angles parametrization</em><sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote">4</a></sup>, of the correlation matrix $C$ and is detailed in <a href="/blog/correlation-matrix-stress-testing-shrinkage-toward-the-lower-and-upper-bounds-of-a-correlation-matrix/">the previous post of this series</a>.</p>
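<p>The hypersphere decomposition is straightforward to implement; the sketch below (NumPy, helper name mine) builds $B$ from a set of correlative angles and checks that $B B {}^t$ is a valid correlation matrix by construction:</p>

```python
import numpy as np

def angles_to_b(theta):
    """Lower triangular matrix B of the hypersphere decomposition C = B B^t.
    theta is a list of rows: row i (for i = 2..n) holds the i-1 angles
    theta_{i,1}, ..., theta_{i,i-1}, each in [0, pi]."""
    n = len(theta) + 1
    B = np.zeros((n, n))
    B[0, 0] = 1.0  # theta_{1,1} = 0 by convention
    for i in range(1, n):
        r = 1.0  # running product sin(theta_{i,1}) ... sin(theta_{i,j-1})
        for j, t in enumerate(theta[i - 1]):
            B[i, j] = np.cos(t) * r
            r *= np.sin(t)
        B[i, i] = r  # product of all the sines of row i
    return B

# Any choice of angles yields a valid correlation matrix
B = angles_to_b([[0.8], [1.2, 2.1]])
C = B @ B.T
assert np.allclose(np.diag(C), 1.0)        # unit diagonal (rows of B have unit norm)
assert np.linalg.eigvalsh(C)[0] >= -1e-12  # positive semi-definite (Gram matrix)
```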
<h2 id="random-perturbations-of-a-correlation-matrix">Random perturbations of a correlation matrix</h2>
<p>A random perturbation of a baseline correlation matrix $C$ can be loosely defined as a correlation matrix $\widetilde{C}$ generated “at random” whose correlation coefficients are more or less “close” to those of $C$.</p>
<p>From Opdyke’s<sup id="fnref:3:1" role="doc-noteref"><a href="#fn:3" class="footnote">2</a></sup> extensive literature review, there are three main<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote">5</a></sup> families of methods to generate random perturbations of a correlation matrix:</p>
<ul>
<li>Methods based on random perturbations of its correlation coefficients</li>
<li>Methods based on random perturbations of its eigenvalues</li>
<li>Methods based on random perturbations of its correlative angles</li>
</ul>
<h3 id="random-perturbations-of-the-coefficients-of-a-correlation-matrix">Random perturbations of the coefficients of a correlation matrix</h3>
<p>The first family of methods to randomly perturb a correlation matrix is based on random perturbations of its coefficients.</p>
<h4 id="naive-method">Naive method</h4>
<p>The most natural method to randomly perturb the coefficients of a correlation matrix simply consists in … randomly perturbing these coefficients!</p>
<p>Unfortunately, this method does not work in general, because the resulting randomly perturbed correlation matrix is almost never a valid correlation matrix due to the lack of positive semi-definiteness.</p>
<p>To illustrate this problem, let’s take <a href="https://investresolve.com/permanent-portfolio-shakedown-part-1/">a Harry Browne permanent portfolio <em>à la</em> ReSolve</a>, equally invested in:</p>
<ul>
<li>U.S. stocks, represented by the SPY ETF</li>
<li>U.S. treasuries, represented by the IEF ETF</li>
<li>Gold, represented by the GLD ETF</li>
<li>Cash, represented by the SHY ETF</li>
</ul>
<p>The correlations of these assets over the period 18 November 2004 - 11 August 2023 are displayed in Figure 1, adapted from <a href="https://www.portfoliovisualizer.com/asset-class-correlations">Portfolio Visualizer</a>.</p>
<figure>
<a href="/assets/images/blog/correlation-matrix-stress-testing-pp-correlations.png"><img src="/assets/images/blog/correlation-matrix-stress-testing-pp-correlations.png" alt="SPY, IEF, GLD, SHY correlations over the period 18 November 2004 - 11 August 2023, based on daily returns. Source: Portfolio Visualizer." /></a>
<figcaption>Figure 1. SPY, IEF, GLD, SHY correlations over the period 18 November 2004 - 11 August 2023, based on daily returns. Source: Portfolio Visualizer.</figcaption>
</figure>
<p>Before thinking about perturbing all these correlations, let’s assume that we would merely like to perturb the U.S. stock-bond correlation so as to bring it to a level representative
of the pre-2000 period, like 0.5 or above, c.f. Figure 2 reproduced from Brixton et al.<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote">6</a></sup>.</p>
<figure>
<a href="/assets/images/blog/correlation-matrix-stress-testing-us-stock-bond-correlation.png"><img src="/assets/images/blog/correlation-matrix-stress-testing-us-stock-bond-correlation.png" alt="Rolling Correlation between US Equity and US Treasury Returns, January 1, 1900–September 30, 2022. Source: Brixton et al." /></a>
<figcaption>Figure 2. Rolling Correlation between US Equity and US Treasury Returns, 01 January 1900 – 30 September 2022. Source: Brixton et al.</figcaption>
</figure>
<p>It turns out that this single perturbation already results in an invalid correlation matrix<sup id="fnref:12" role="doc-noteref"><a href="#fn:12" class="footnote">7</a></sup>!</p>
<p>As a consequence, trying to perturb the coefficients of a correlation matrix both simultaneously and at random has little chance of producing a valid correlation matrix in general, especially as the number of assets increases.</p>
<p>One solution to this issue is to replace the randomly perturbed correlation matrix by its nearest valid correlation matrix<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote">8</a></sup>, c.f. the post
<em><a href="/blog/when-a-correlation-matrix-is-not-a-correlation-matrix-the-nearest-correlation-matrix-problem/">When a Correlation Matrix is not a Correlation Matrix: the Nearest Correlation Matrix Problem</a></em>.</p>
<p>This leads to the following naive method to generate random perturbations of a correlation matrix $C \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$:</p>
<ol>
<li>Generate $\frac{n (n-1)}{2}$ randomly perturbed correlation coefficients $\widehat{C}_{i,j} \in [-1, 1]$ around the baseline correlation coefficients $C_{i,j}$, $i=1..n, j=i+1..n$</li>
<li>Compute the (potentially invalid) associated randomly perturbed correlation matrix $\widehat{C}$</li>
<li>Compute the randomly perturbed correlation matrix $\widetilde{C}$ as the nearest valid correlation matrix to $\widehat{C}$</li>
</ol>
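<p>The three steps above can be sketched as follows, with NumPy. Gaussian noise is just one possible choice of perturbation, and the nearest correlation matrix step uses a bare-bones version of Higham’s alternating projections with Dykstra’s correction (fixed iteration count, no convergence test):</p>

```python
import numpy as np

def nearest_correlation(A, n_iter=100):
    """Nearest correlation matrix to A, via alternating projections between
    the PSD cone and the unit-diagonal symmetric matrices, with Dykstra's
    correction (simplified sketch of Higham's algorithm)."""
    Y = np.asarray(A, dtype=float).copy()
    dS = np.zeros_like(Y)
    for _ in range(n_iter):
        R = Y - dS                                    # Dykstra's correction
        w, V = np.linalg.eigh((R + R.T) / 2.0)
        X = V @ np.diag(np.clip(w, 0.0, None)) @ V.T  # projection on the PSD cone
        dS = X - R
        Y = X.copy()
        np.fill_diagonal(Y, 1.0)                      # projection on unit-diagonal matrices
    return Y

def naive_perturbation(C, scale, rng):
    """Steps 1-3: randomly perturb the coefficients of C, then repair validity."""
    n = C.shape[0]
    noise = rng.normal(0.0, scale, size=(n, n))
    noise = (noise + noise.T) / 2.0         # keep the perturbation symmetric
    np.fill_diagonal(noise, 0.0)
    C_hat = np.clip(C + noise, -1.0, 1.0)   # steps 1-2: perturbed coefficients in [-1, 1]
    return nearest_correlation(C_hat)       # step 3: nearest valid correlation matrix
```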
<p>While straightforward to implement<sup id="fnref:13" role="doc-noteref"><a href="#fn:13" class="footnote">9</a></sup>, this method has several limitations:</p>
<ul>
<li>
<p>It requires the computation of the nearest correlation matrix to every randomly perturbed correlation matrix generated</p>
<p>Such systematic computation is expensive.</p>
</li>
<li>
<p>It usually<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote">10</a></sup> generates randomly perturbed correlation matrices that are singular</p>
<p>This is because standard algorithms to compute the nearest correlation matrix, like Higham’s alternating projections algorithm<sup id="fnref:5:1" role="doc-noteref"><a href="#fn:5" class="footnote">8</a></sup>, output a singular correlation matrix<sup id="fnref:14" role="doc-noteref"><a href="#fn:14" class="footnote">11</a></sup>.</p>
</li>
<li>
<p>It provides no guarantee on the magnitude or on the distribution of the perturbations</p>
<p>Due to the nearest correlation matrix step #3, it seems actually rather difficult to control either the magnitude or the probability distribution of the perturbations
$ \left | C_{i,j} - \widehat{C}_{i,j} \right | $, $i=1..n$, $j=i+1..n$.</p>
</li>
</ul>
<h4 id="hardin-et-als-method">Hardin et al.’s method</h4>
<p>Hardin et al.<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote">12</a></sup> introduce another method to randomly perturb the coefficients of a correlation matrix, relying on <em>the dot product of normalized [independent gaussian random vectors]</em><sup id="fnref:6:1" role="doc-noteref"><a href="#fn:6" class="footnote">12</a></sup> as random perturbations.</p>
<p>One of the many advantages of this method compared to the naive method previously described is that the resulting randomly perturbed correlation matrix is a valid correlation matrix by construction,
which makes it possible to bypass the nearest correlation matrix step #3.</p>
<p>In details, given $C \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ a baseline correlation matrix, Hardin et al.’s method to generate random perturbations of $C$ works as follows:</p>
<ol>
<li>
<p>Select a maximum noise level $\epsilon_{max}$ such that $0 < \epsilon_{max} < \lambda_{n}$, where $\lambda_{n}$ is the smallest eigenvalue of $C$</p>
<p>$\epsilon_{max}$ controls the magnitude of the generated perturbations.</p>
</li>
<li>
<p>Select the dimension $m \geq 1$ of what is called <em>the noise space</em> in Hardin et al.<sup id="fnref:6:2" role="doc-noteref"><a href="#fn:6" class="footnote">12</a></sup></p>
<p>$m$ influences the distributional characteristics of the random perturbations, as depicted in Figure 3 adapted from Hardin et al.<sup id="fnref:6:3" role="doc-noteref"><a href="#fn:6" class="footnote">12</a></sup> on which it is visible that:</p>
<ul>
<li>$m = 3$ produces uniform-like perturbations (<em>S3</em>)</li>
<li>$m = 25$ produces Gaussian-like perturbations (<em>S25</em>)</li>
</ul>
<figure>
<a href="/assets/images/blog/correlation-matrix-stress-testing-hardin-noise-space.png"><img src="/assets/images/blog/correlation-matrix-stress-testing-hardin-noise-space.png" alt="Impact of the dimension of the noise space on the distribution of the perturbations (entry-wise differences). Source: Hardin et al." /></a>
  <figcaption>Figure 3. Impact of the dimension $m$ of the noise space on the distribution of the perturbations (entry-wise differences). Source: Hardin et al.</figcaption>
</figure>
</li>
<li>Generate $n$ unit vectors $u_1,…,u_n$ belonging to $\mathbb{R}^{m}$ and construct the matrix $U \in \mathcal{M} \left( \mathbb{R}^{m \times n} \right)$ whose columns are the vectors $u_i, i=1..n$</li>
<li>
<p>Compute the randomly perturbed correlation matrix $\widetilde{C}$ as $\widetilde{C} = C + \epsilon_{max} \left( U{}^t U - I_n \right)$, where $I_n \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ is the identity matrix of order $n$</p>
<p>The definition of $\widetilde{C}$ ensures that the perturbations are bounded by the maximum noise level $\epsilon_{max}$, i.e., $\left | C_{i,j} - \widetilde{C}_{i,j} \right | \leq \epsilon_{max} $, $i=1..n$, $j=i+1..n$.</p>
</li>
</ol>
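<p>Hardin et al.’s method translates almost line by line into code; a sketch with NumPy (function name mine):</p>

```python
import numpy as np

def hardin_perturbation(C, eps_max, m, rng):
    """Hardin et al.-style perturbation: C + eps_max * (U^t U - I_n),
    where the columns of U are random unit vectors of the m-dimensional
    noise space. Requires 0 < eps_max < smallest eigenvalue of C."""
    n = C.shape[0]
    lambda_min = np.linalg.eigvalsh(C)[0]
    if not 0.0 < eps_max < lambda_min:
        raise ValueError("eps_max must lie in (0, smallest eigenvalue of C)")
    U = rng.normal(size=(m, n))
    U /= np.linalg.norm(U, axis=0)  # normalize each column to a unit vector of R^m
    E = U.T @ U - np.eye(n)         # zero diagonal, off-diagonal entries in [-1, 1]
    return C + eps_max * E
```

<p>By construction the diagonal of $U {}^t U - I_n$ is null and its off-diagonal entries are bounded by one in absolute value, so the perturbations are bounded by $\epsilon_{max}$; and since $\widetilde{C} = \left( C - \epsilon_{max} I_n \right) + \epsilon_{max} U {}^t U$ is a sum of two positive semi-definite matrices, it remains a valid correlation matrix.</p>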
<p>Whenever possible, Hardin et al.’s method should be used (it is computationally cheap and allows controlling both the magnitude and the distribution of the perturbations…), although it suffers from two major limitations:</p>
<ul>
<li>
<p>It is not applicable to correlation matrices singular or close to singular</p>
<p>This is due to the condition on the maximum noise level $\epsilon_{max}$ in step #1 and is regrettably a problem for applications in finance<sup id="fnref:15" role="doc-noteref"><a href="#fn:15" class="footnote">13</a></sup>, because as highlighted in Opdyke<sup id="fnref:3:2" role="doc-noteref"><a href="#fn:3" class="footnote">2</a></sup>:</p>
<blockquote>
<p>correlation matrices estimated on large portfolios often (perhaps usually) are not positive definite for a wide range of reasons, and once positive definiteness is enforced using reliable, proven methods […], the smallest eigenvalue of the resulting matrix is almost always virtually zero.</p>
</blockquote>
</li>
<li>
<p>It might not be applicable to a specific correlation matrix, even if not remotely close to singular</p>
<p>This is again due to the condition on the maximum noise level $\epsilon_{max}$ in step #1.</p>
<p>For instance, in the case of the Harry Browne’s permanent portfolio introduced in the previous sub-section, Hardin et al.’s method cannot be used to perturb the coefficients of the asset correlation matrix represented in Figure 1 by more than +/- 0.25<sup id="fnref:16" role="doc-noteref"><a href="#fn:16" class="footnote">14</a></sup>,
and in particular, cannot be used to generate perturbed U.S. stock-bond correlations higher than -0.08<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote">15</a></sup>!</p>
</li>
</ul>
<h3 id="random-perturbations-of-the-eigenvalues">Random perturbations of the eigenvalues</h3>
<p>The second family of methods to randomly perturb a correlation matrix is based on random perturbations of its eigenvalues.</p>
<p>A representative member of this family is the following method, with $C \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ a baseline correlation matrix to be perturbed:</p>
<ol>
<li>Compute the eigendecomposition of $C$, with $C = P \Lambda P^{-1}$</li>
<li>
<p>Generate $n$ randomly perturbed eigenvalues $\widetilde{\lambda}_i \geq 0$ satisfying $\sum_{i=1}^n \widetilde{\lambda}_i = n$ around the baseline eigenvalues $\lambda_i$, $i=1..n$</p>
<p>Galeeva et al.<sup id="fnref:11:1" role="doc-noteref"><a href="#fn:11" class="footnote">4</a></sup> describe several algorithms and associated probability distributions that can be used in this step.</p>
</li>
<li>Compute the associated randomly perturbed diagonal matrix $\widetilde{\Lambda} = Diag \left( \widetilde{\lambda}_1,…, \widetilde{\lambda}_n \right)$</li>
<li>Compute the randomly perturbed correlation matrix $\widetilde{C}$ as $\widetilde{C} = P \widetilde{\Lambda} P^{-1}$</li>
</ol>
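<p>A sketch of this method with NumPy, using Gaussian noise on the eigenvalues, clipped to stay non-negative and renormalized to sum to $n$. Note that rebuilding with perturbed eigenvalues need not leave the diagonal exactly equal to one, so the sketch ends with a rescaling step; this final normalization is an implementation choice of mine, and the algorithms described in Galeeva et al. differ in such details:</p>

```python
import numpy as np

def eigenvalue_perturbation(C, scale, rng):
    """Randomly perturb the spectrum of C, then rebuild with the same
    eigenvectors and rescale to unit diagonal."""
    n = C.shape[0]
    lambdas, P = np.linalg.eigh(C)                                 # step 1: eigendecomposition
    lam = np.clip(lambdas + rng.normal(0.0, scale, n), 0.0, None)  # step 2: perturb, keep >= 0
    lam *= n / lam.sum()                                           # renormalize: sum = n
    C_tilde = P @ np.diag(lam) @ P.T                               # steps 3-4: rebuild
    d = np.sqrt(np.diag(C_tilde))
    return C_tilde / np.outer(d, d)                                # enforce unit diagonal
```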
<p>Any method from this family guarantees, in theory, the validity of the resulting randomly perturbed correlation matrix.</p>
<p>Nevertheless, in practice, Opdyke<sup id="fnref:3:3" role="doc-noteref"><a href="#fn:3" class="footnote">2</a></sup> notes that:</p>
<blockquote>
<p>perturbing eigenvalues fails under challenging empirical conditions, e.g. when the positive definiteness of the matrix has to be enforced algorithmically […] and eigenvalues are virtually zero (or at least unreliably estimated)</p>
</blockquote>
<p>In addition, controlling the $\frac{n (n-1)}{2}$ perturbations $\left | C_{i,j} - \widetilde{C}_{i,j} \right |$, $i=1..n$, $j=i+1..n$, which is ultimately what matters, through the $n$ perturbations
$\left| \lambda_i - \widetilde{\lambda}_i \right|$, $i=1..n$, sounds rather difficult.</p>
<p>For these reasons, this family of methods might not be the first choice to generate random perturbations of a correlation matrix.</p>
<h3 id="random-perturbations-of-the-correlative-angles">Random perturbations of the correlative angles</h3>
<p>The third and last family of methods to randomly perturb a correlation matrix is based on random perturbations of its correlative angles.</p>
<p>Here, a representative member of this family is the following method, with $C \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ a baseline correlation matrix to be perturbed:</p>
<ol>
<li>Compute the<sup id="fnref:19" role="doc-noteref"><a href="#fn:19" class="footnote">16</a></sup> hypersphere decomposition of $C$, with $C = B B {}^t$</li>
<li>Generate $\frac{n (n-1)}{2}$ randomly perturbed correlative angles $\widetilde{\theta}_{i,j} \in [0, \pi]$ around the baseline correlative angles $\theta_{i,j}$, $i=1..n$, $j=1..i-1$</li>
<li>Compute the associated randomly perturbed lower triangular matrix $\widetilde{B}$</li>
<li>Compute the randomly perturbed correlation matrix $\widetilde{C}$ as $\widetilde{C} = \widetilde{B} \widetilde{B} {}^t$</li>
</ol>
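<p>The hypersphere machinery underlying steps #1, #3 and #4 can be sketched with NumPy, assuming a positive definite baseline matrix so that $B$ can be taken as the Cholesky factor of $C$:</p>

```python
import numpy as np

def angles_from_corr(C):
    """Correlative angles theta[i, j] (j < i) of the hypersphere
    decomposition C = B B^t, with B the Cholesky factor of C
    (assumes C positive definite)."""
    B = np.linalg.cholesky(C)
    n = C.shape[0]
    theta = np.zeros((n, n))
    for i in range(1, n):
        prod_sin = 1.0
        for j in range(i):
            theta[i, j] = np.arccos(np.clip(B[i, j] / prod_sin, -1.0, 1.0))
            prod_sin *= np.sin(theta[i, j])
    return theta

def corr_from_angles(theta):
    """Steps #3 and #4: rebuild the lower triangular matrix B from the
    (possibly perturbed) angles, then the correlation matrix as B B^t."""
    n = theta.shape[0]
    B = np.zeros((n, n))
    B[0, 0] = 1.0
    for i in range(1, n):
        prod_sin = 1.0
        for j in range(i):
            B[i, j] = np.cos(theta[i, j]) * prod_sin
            prod_sin *= np.sin(theta[i, j])
        B[i, i] = prod_sin
    return B @ B.T
```

A round trip <code class="language-plaintext highlighter-rouge">corr_from_angles(angles_from_corr(C))</code> recovers <code class="language-plaintext highlighter-rouge">C</code> up to floating-point error.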
<p>Any method from this family again guarantees, in theory, the validity of the resulting randomly perturbed correlation matrix.</p>
<p>This time, though, theory seems to be confirmed in practice:</p>
<ul>
<li>Galeeva et al.<sup id="fnref:11:2" role="doc-noteref"><a href="#fn:11" class="footnote">4</a></sup> highlight that perturbing the correlative angles is done <em>via a robust and efficient procedure which makes the whole approach very attractive</em><sup id="fnref:11:3" role="doc-noteref"><a href="#fn:11" class="footnote">4</a></sup></li>
<li>Opdyke<sup id="fnref:3:4" role="doc-noteref"><a href="#fn:3" class="footnote">2</a></sup> notes that perturbing the correlative angles <em>appears to be more robust [in practice] than competing methods […] at least under challenging empirical conditions</em><sup id="fnref:3:5" role="doc-noteref"><a href="#fn:3" class="footnote">2</a></sup></li>
</ul>
<p>One important remark at this stage is that the exact algorithms and associated probability distributions used in step #2 greatly influence the behavior of this family of methods.</p>
<p>For reference, Opdyke<sup id="fnref:3:6" role="doc-noteref"><a href="#fn:3" class="footnote">2</a></sup> proposes an algorithm called <em>Cosecant, Cotangent, Cotangent</em> (C3) able to generate a distribution of correlative angles median-centered on the baseline correlative angles and
satisfying many other desirable properties<sup id="fnref:20" role="doc-noteref"><a href="#fn:20" class="footnote">17</a></sup>.</p>
<p>This algorithm generates a randomly perturbed correlative angle $\widetilde{\theta}_{i,j}$ around a baseline correlative angle $\theta_{i,j}$, $i=1..n$, $j=1..i-1$, as follows:</p>
<ol>
<li>Generate a random variable $X$ whose probability density function is the p.d.f. of Makalic and Schmidt<sup id="fnref:21" role="doc-noteref"><a href="#fn:21" class="footnote">18</a></sup>, defined by $f_{X}(x) = c_k \sin^k (x)$, $x \in (0, \pi)$, $k \geq 1$, where $c_k$ is a normalization constant and $k = n - j$</li>
<li>Compute the perturbed correlative angle $\widetilde{\theta}_{i,j}$ as $ \widetilde{\theta}_{i,j} = \arctan \left( \tan \left( \theta_{i,j}-\frac{\pi}{2} \right) + \tan \left( X - \frac{\pi}{2} \right) \right) + \frac{\pi}{2}$</li>
</ol>
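<p>A minimal sketch of these two steps, assuming a Beta-transform sampler for the $\sin^k$ density (an exact scheme, though not necessarily the algorithm of Makalic and Schmidt) and hypothetical values for $\theta_{i,j}$ and $k$:</p>

```python
import numpy as np

def sample_sin_k(k, rng):
    """Draw X on (0, pi) with density proportional to sin(x)**k, via
    X = arccos(2V - 1) with V ~ Beta((k+1)/2, (k+1)/2)."""
    v = rng.beta((k + 1) / 2.0, (k + 1) / 2.0)
    return np.arccos(2.0 * v - 1.0)

def c3_perturb_angle(theta, k, rng):
    """One C3-style perturbation of a correlative angle theta in (0, pi);
    the resulting distribution is median-centered on theta."""
    x = sample_sin_k(k, rng)
    return np.arctan(np.tan(theta - np.pi / 2.0) + np.tan(x - np.pi / 2.0)) + np.pi / 2.0
```

Since $X$ has median $\frac{\pi}{2}$, the additive term $\tan \left( X - \frac{\pi}{2} \right)$ has median zero, and the monotone $\arctan$ transform then centers the median of the perturbed angle exactly on $\theta_{i,j}$.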
<p>This family of methods is particularly well-suited to what is called <em>generalized (correlation) stress testing</em> in Opdyke<sup id="fnref:3:7" role="doc-noteref"><a href="#fn:3" class="footnote">2</a></sup>.</p>
<p>More on this later.</p>
<p>Still, like the family of methods based on random perturbations of the eigenvalues of a correlation matrix, one limitation of this family of methods is that controlling the $\frac{n (n-1)}{2}$
perturbations $\left | C_{i,j} - \widetilde{C}_{i,j} \right |$, $i=1..n$, $j=i+1..n$ sounds once again rather difficult.</p>
<h2 id="implementation-in-portfolio-optimizer">Implementation in <strong>Portfolio Optimizer</strong></h2>
<p><strong>Portfolio Optimizer</strong> allows generating random perturbations of a baseline correlation matrix with:</p>
<ul>
<li>
<p>The naive method of randomly perturbing the coefficients of a correlation matrix</p>
<p>Once a (potentially invalid) randomly perturbed correlation matrix is generated on the client side, the endpoint <a href="https://docs.portfoliooptimizer.io/"><code class="language-plaintext highlighter-rouge">/assets/correlation/matrix/nearest</code></a> can be used to compute the nearest correlation matrix to this matrix.</p>
</li>
<li>
<p>The method of randomly perturbing the correlative angles of a correlation matrix</p>
<ul>
<li>
<p>Together with Opdyke’s C3 algorithm<sup id="fnref:3:8" role="doc-noteref"><a href="#fn:3" class="footnote">2</a></sup>, through the endpoint <a href="https://docs.portfoliooptimizer.io/"><code class="language-plaintext highlighter-rouge">/assets/correlation/matrix/perturbed</code></a></p>
</li>
<li>
<p>Together with a proprietary algorithm able to control the magnitude of the perturbations of the correlation coefficients, again through the endpoint <a href="https://docs.portfoliooptimizer.io/"><code class="language-plaintext highlighter-rouge">/assets/correlation/matrix/perturbed</code></a></p>
<p>In this case, the distribution of the randomly perturbed correlation matrices is asymptotically uniform over the space of positive definite correlation matrices whose distance in terms of
<a href="https://en.wikipedia.org/wiki/Matrix_norm#Max_norm">max norm</a> to the baseline correlation matrix is at most equal to (resp. exactly equal to) a given maximum noise level (resp. a given exact noise level),
similar in spirit to the method of Hardin et al.<sup id="fnref:6:4" role="doc-noteref"><a href="#fn:6" class="footnote">12</a></sup></p>
</li>
</ul>
</li>
</ul>
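<p>For the naive method, the client-side perturb-then-repair loop can be sketched as follows, with a simplified Higham-style alternating-projections routine standing in for the <code class="language-plaintext highlighter-rouge">/assets/correlation/matrix/nearest</code> endpoint (the endpoint itself may well use a more robust algorithm):</p>

```python
import numpy as np

def nearest_correlation(A, iters=500):
    """Simplified Higham-style alternating projections: alternate between
    the PSD cone and the unit-diagonal set, with a Dykstra correction."""
    Y = A.copy()
    dS = np.zeros_like(A)
    for _ in range(iters):
        R = Y - dS
        lam, P = np.linalg.eigh((R + R.T) / 2)
        X = P @ np.diag(np.maximum(lam, 0.0)) @ P.T  # project onto PSD cone
        dS = X - R
        Y = X.copy()
        np.fill_diagonal(Y, 1.0)                     # project onto unit diagonal
    return Y

def naive_perturb(C, eps=0.1, rng=None):
    """Naive method: jitter the off-diagonal coefficients by +/- eps,
    then repair the (almost surely invalid) result."""
    rng = np.random.default_rng(rng)
    n = C.shape[0]
    noise = np.triu(rng.uniform(-eps, eps, size=(n, n)), 1)
    C_t = np.clip(C + noise + noise.T, -1.0, 1.0)
    np.fill_diagonal(C_t, 1.0)
    return nearest_correlation(C_t)
```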
<h2 id="example-of-application---generalized-stress-testing">Example of application - Generalized stress testing</h2>
<p>Suppose that we are managing Harry Browne’s permanent portfolio introduced earlier.</p>
<p>Suppose also that on 18 February 2020, we feel something is off and would like to assess the impact of a potential correlation breakdown on this portfolio.</p>
<p>Because this potential correlation breakdown could manifest in many ways (increased correlations between certain ETFs, decreased correlations between other ETFs…), it would be a mistake
to impose any prior on how correlations should or should not behave<sup id="fnref:22" role="doc-noteref"><a href="#fn:22" class="footnote">19</a></sup>.</p>
<p>So, what could we do?</p>
<h3 id="direct-correlation-stress-testing">Direct correlation stress testing</h3>
<p>Following the previous sections, one possibility is to generate random perturbations around the current correlation matrix of the ETFs in the portfolio, which makes it possible to simulate many potential
correlation breakdowns in a prior-free way.</p>
<p>Once this is done, it will then be possible to evaluate the portfolio sensitivity to these random shocks.</p>
<p>Such a direct (correlation) stress testing procedure allows to <em>catch difficult-to-anticipate and/or difficult-to quantify second and third order effects of a large, multivariate, impactful scenario (e.g. pandemic +
economic upheaval)</em><sup id="fnref:3:9" role="doc-noteref"><a href="#fn:3" class="footnote">2</a></sup>.</p>
<p>In order to apply this procedure to the portfolio at hand, three prerequisites are necessary:</p>
<ul>
<li>
<p>Estimating the current correlation matrix $C_{PP}$ of the ETFs in the portfolio</p>
<p>I will estimate $C_{PP}$ as the correlation matrix of the four ETFs in the portfolio<sup id="fnref:23" role="doc-noteref"><a href="#fn:23" class="footnote">20</a></sup> over the 24-day period 14 January 2020 - 18 February 2020<sup id="fnref:24" role="doc-noteref"><a href="#fn:24" class="footnote">21</a></sup>, which gives</p>
\[C_{PP} \approx \begin{pmatrix} 1 & -0.81 & -0.82 & -0.65 \\ -0.81 & 1 & 0.84 & 0.70 \\ -0.82 & 0.84 & 1 & 0.75 \\ -0.65 & 0.70 & 0.75 & 1 \end{pmatrix}\]
</li>
<li>
<p>Selecting a method to randomly perturb the current correlation matrix $C_{PP}$</p>
<p>I will generate random perturbations of $C_{PP}$ thanks to Opdyke’s C3 algorithm<sup id="fnref:3:10" role="doc-noteref"><a href="#fn:3" class="footnote">2</a></sup> as implemented through the <strong>Portfolio Optimizer</strong> endpoint <a href="https://docs.portfoliooptimizer.io/"><code class="language-plaintext highlighter-rouge">/assets/correlation/matrix/perturbed</code></a>.</p>
</li>
<li>
<p>Determining how to evaluate the portfolio sensitivity to the random perturbations of the current correlation matrix $C_{PP}$</p>
<p>To keep things simple, I will evaluate the portfolio <a href="/blog/the-effective-number-of-bets-measuring-portfolio-diversification/">effective number of bets</a><sup id="fnref:25" role="doc-noteref"><a href="#fn:25" class="footnote">22</a></sup>
(ENB), using the <strong>Portfolio Optimizer</strong> endpoint <a href="https://docs.portfoliooptimizer.io/"><code class="language-plaintext highlighter-rouge">/portfolio/analysis/effective-number-of-bets</code></a>.</p>
</li>
</ul>
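<p>The PCA-based effective number of bets of footnote 22 can be sketched as follows. This is one common convention for the PCA-based ENB (variance contributions of the principal-component "bets", with the correlation matrix used in place of the covariance matrix); the endpoint's exact convention may differ in details:</p>

```python
import numpy as np

def effective_number_of_bets(w, C):
    """PCA-based Effective Number of Bets, using the correlation matrix
    in place of the covariance matrix (c.f. footnote 22)."""
    w = np.asarray(w, dtype=float)
    lam, P = np.linalg.eigh(C)       # principal-component "bets"
    v = lam * (P.T @ w) ** 2         # variance contribution of each bet
    p = v / v.sum()                  # diversification distribution
    p = p[p > 1e-12]
    return float(np.exp(-np.sum(p * np.log(p))))
```

For instance, an equally-weighted portfolio of four uncorrelated assets has an ENB of 4, and of four perfectly correlated assets an ENB of 1.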
<p>With these prerequisites met, it is possible to generate random perturbations around the current correlation matrix $C_{PP}$ and compute the corresponding ENB distribution.</p>
<p>An example of ENB distribution is provided in Figure 5, in the case of 10000 randomly perturbed correlation matrices.</p>
<figure>
<a href="/assets/images/blog/correlation-matrix-stress-testing-pre-covid-stressed-enb.png"><img src="/assets/images/blog/correlation-matrix-stress-testing-pre-covid-stressed-enb.png" alt="Distribution of the Effective Number of Bets (ENB), using 10000 randomly perturbed correlation matrices around the current correlation matrix $C_{PP}$" /></a>
<figcaption>Figure 5. Distribution of the Effective Number of Bets (ENB), 10000 randomly perturbed correlation matrices around the current correlation matrix $C_{PP}$.</figcaption>
</figure>
<p>Some associated summary statistics:</p>
<table>
<thead>
<tr>
<th>Statistic</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Mean</td>
<td>1.89</td>
</tr>
<tr>
<td>Standard deviation</td>
<td>0.42</td>
</tr>
<tr>
<td>Minimum</td>
<td>1.01</td>
</tr>
<tr>
<td>5% percentile</td>
<td>1.33</td>
</tr>
<tr>
<td>25% percentile</td>
<td>1.60</td>
</tr>
<tr>
<td>Median</td>
<td>1.81</td>
</tr>
<tr>
<td>75% percentile</td>
<td>2.12</td>
</tr>
<tr>
<td>95% percentile</td>
<td>2.72</td>
</tr>
<tr>
<td>Maximum</td>
<td>3.98</td>
</tr>
</tbody>
</table>
<p>And, for reference, the value of the current ENB of the portfolio, computed with the current correlation matrix $C_{PP}$: <em>1.87</em>.</p>
<p>A couple of comments:</p>
<ul>
<li>
<p>More than half of the ENB values are located very close<sup id="fnref:26" role="doc-noteref"><a href="#fn:26" class="footnote">23</a></sup> to the current ENB (<em>1.87</em>)</p>
<p>These values are not representative of any real correlation breakdown.</p>
</li>
<li>
<p>The 95% percentile of the ENB distribution (<em>2.72</em>) lies much further from the current ENB (<em>1.87</em>) than the 5% percentile (<em>1.33</em>) does</p>
<p>This means that a correlation breakdown with the biggest impact on the ENB would correspond, maybe counter-intuitively, to a scenario of de-correlation<sup id="fnref:30" role="doc-noteref"><a href="#fn:30" class="footnote">24</a></sup> of the four ETFs in the portfolio.</p>
<p>As a side note, and again maybe counter-intuitively, the impact of such a correlation breakdown would then be rather harmless, because an increase in ENB is usually desirable from a portfolio diversification perspective.</p>
</li>
<li>
<p>The minimum (<em>1.01</em>) and maximum (<em>3.98</em>) ENB essentially coincide with the theoretical minimum (<em>1</em>) and maximum (<em>4</em>) ENB</p>
<p>This shows that all possible (correlation) unknown unknowns have been covered by the stress testing procedure.</p>
</li>
</ul>
<h3 id="reverse-correlation-stress-testing">Reverse correlation stress testing</h3>
<p>In the previous sub-section, we have (empirically) established that the most impactful correlation breakdown scenario for the ENB of the portfolio corresponds to a de-correlation of the ETFs.</p>
<p>The next logical step is now to compute a correlation matrix that would somehow best illustrate this de-correlation scenario, a procedure known as reverse (correlation) stress testing.</p>
<p>For this, inspired by the concept of <em>market states</em> from Stepanov et al<sup id="fnref:27" role="doc-noteref"><a href="#fn:27" class="footnote">25</a></sup>, I propose to apply a <a href="https://en.wikipedia.org/wiki/K-means_clustering"><em>k</em>-means clustering algorithm</a><sup id="fnref:34" role="doc-noteref"><a href="#fn:34" class="footnote">26</a></sup>, with $k = 2$, to the
randomly perturbed correlation matrices generated during the direct stress testing procedure.</p>
<p>One output of this algorithm is a pair of “representative” correlation matrices<sup id="fnref:28" role="doc-noteref"><a href="#fn:28" class="footnote">27</a></sup>, which are, in the case of the 10000 randomly perturbed correlation matrices of the previous sub-section:</p>
<ul>
<li>
<p>A correlation matrix $\widetilde{C}_{PP,1}$ “representative” of all the randomly perturbed correlation matrices that are “maximally similar” to the current correlation matrix $C_{PP}$</p>
\[\widetilde{C}_{PP,1} \approx \begin{pmatrix} 1 & -0.76 & -0.78 & -0.67 \\ -0.76 & 1 & 0.75 & 0.66 \\ -0.78 & 0.75 & 1 & 0.67 \\ -0.67 & 0.66 & 0.67 & 1 \end{pmatrix}\]
</li>
<li>
<p>A correlation matrix $\widetilde{C}_{PP,2}$ “representative” of all the randomly perturbed correlation matrices that are “maximally dissimilar” from $\widetilde{C}_{PP,1}$</p>
\[\widetilde{C}_{PP,2} \approx \begin{pmatrix} 1 & -0.64 & -0.64 & -0.12 \\ -0.64 & 1 & 0.53 & 0.07 \\ -0.64 & 0.53 & 1 & 0.06 \\ -0.12 & 0.07 & 0.06 & 1 \end{pmatrix}\]
</li>
</ul>
<p>, with “representative”, “maximally similar” and “maximally dissimilar” loosely defined but usually corresponding to intuition<sup id="fnref:29" role="doc-noteref"><a href="#fn:29" class="footnote">28</a></sup>.</p>
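<p>This clustering step can be sketched as follows. The footnotes mention scikit-learn's $k$-means; a minimal NumPy-only Lloyd's iteration is used here instead so that the sketch is self-contained:</p>

```python
import numpy as np

def kmeans_representatives(perturbed, k=2, iters=100, seed=0):
    """Minimal Lloyd's k-means over vectorized correlation matrices,
    returning the k centroid matrices. (Centroids are not guaranteed
    to be valid correlation matrices, c.f. footnote 27.)"""
    rng = np.random.default_rng(seed)
    n = perturbed[0].shape[0]
    iu = np.triu_indices(n, 1)
    X = np.array([C[iu] for C in perturbed])      # one row per matrix
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    reps = []
    for c in centers:
        R = np.eye(n)
        R[iu] = c
        R.T[iu] = c                               # symmetrize
        reps.append(R)
    return reps
```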
<p>In terms of market states<sup id="fnref:27:1" role="doc-noteref"><a href="#fn:27" class="footnote">25</a></sup>:</p>
<ul>
<li>
<p>The correlation matrix $\widetilde{C}_{PP,1}$ embodies the <em>current market state</em></p>
<p>Indeed, $\widetilde{C}_{PP,1}$ is very close to $C_{PP}$, as confirmed by the small <a href="https://en.wikipedia.org/wiki/Matrix_norm#Frobenius_norm">Frobenius distance</a> between these two matrices (<em>0.21</em>).</p>
<p>In the <em>current market state</em>, the ENB is concentrated<sup id="fnref:32" role="doc-noteref"><a href="#fn:32" class="footnote">29</a></sup> around the current ENB of the portfolio (<em>1.87</em>).</p>
</li>
<li>
<p>The correlation matrix $\widetilde{C}_{PP,2}$ embodies a market state maximally distinct from the current market state, which I will call the <em>de-correlation market state</em></p>
<p>The rationale for this name is that a comparison between $C_{PP}$ and $\widetilde{C}_{PP,2}$ shows that this second market state corresponds to a de-correlation of the four ETFs in the portfolio<sup id="fnref:33" role="doc-noteref"><a href="#fn:33" class="footnote">30</a></sup>.</p>
<p>In the <em>de-correlation market state</em>, the ENB is much higher than in the <em>current market state</em>, with for example the ENB computed with the correlation matrix $\widetilde{C}_{PP,2}$ equal to <em>2.97</em>,
well above the 95% percentile of the ENB distribution (<em>2.72</em>).</p>
</li>
</ul>
<p>Thanks to these observations, it is possible to conclude that $\widetilde{C}_{PP,2}$ is the correlation matrix that best illustrates the most impactful correlation breakdown scenario for the ENB of the portfolio.</p>
<h3 id="reality-check">Reality check</h3>
<p>I will conclude this example on generalized stress testing by a reality check on the results obtained in the previous sub-sections.</p>
<p>The correlation matrix $C_{PP, COVID}$ below is the correlation matrix of the four ETFs in the portfolio<sup id="fnref:23:1" role="doc-noteref"><a href="#fn:23" class="footnote">20</a></sup> over the subsequent 24-day “full crisis” period 19 February 2020 - 23 March 2020<sup id="fnref:24:1" role="doc-noteref"><a href="#fn:24" class="footnote">21</a></sup>.</p>
\[C_{PP, COVID} \approx \begin{pmatrix} 1 & -0.50 & -0.40 & 0.00 \\ -0.50 & 1 & 0.71 & 0.25 \\ -0.40 & 0.71 & 1 & 0.19 \\ 0.00 & 0.25 & 0.19 & 1 \end{pmatrix}\]
<p>Of particular interest:</p>
<ul>
<li>The resemblance between $C_{PP, COVID}$ and $\widetilde{C}_{PP,2}$, confirmed by a relatively small Frobenius distance between these two matrices (<em>0.58</em>)</li>
<li>The close match between the ENB computed with $C_{PP, COVID}$ (<em>2.84</em>) and the ENB computed with $\widetilde{C}_{PP,2}$ (<em>2.97</em>)</li>
</ul>
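<p>Both Frobenius distances quoted in this post can be approximately reproduced from the rounded matrices displayed above (the quoted figures were presumably computed from the unrounded matrices, hence the small differences):</p>

```python
import numpy as np

C_pp = np.array([[1, -0.81, -0.82, -0.65],
                 [-0.81, 1, 0.84, 0.70],
                 [-0.82, 0.84, 1, 0.75],
                 [-0.65, 0.70, 0.75, 1]])
C_pp1 = np.array([[1, -0.76, -0.78, -0.67],
                  [-0.76, 1, 0.75, 0.66],
                  [-0.78, 0.75, 1, 0.67],
                  [-0.67, 0.66, 0.67, 1]])
C_pp2 = np.array([[1, -0.64, -0.64, -0.12],
                  [-0.64, 1, 0.53, 0.07],
                  [-0.64, 0.53, 1, 0.06],
                  [-0.12, 0.07, 0.06, 1]])
C_covid = np.array([[1, -0.50, -0.40, 0.00],
                    [-0.50, 1, 0.71, 0.25],
                    [-0.40, 0.71, 1, 0.19],
                    [0.00, 0.25, 0.19, 1]])

d_current = np.linalg.norm(C_pp1 - C_pp, "fro")   # ~0.20, quoted as 0.21
d_covid = np.linalg.norm(C_covid - C_pp2, "fro")  # ~0.59, quoted as 0.58
```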
<p>In other words:</p>
<ul>
<li>The most theoretically impactful correlation breakdown scenario for the ENB of the portfolio actually occurred in practice, with an associated asset correlation matrix relatively close to the forecast asset correlation matrix</li>
<li>The forecast of the impact on the ENB of the portfolio of this theoretical correlation breakdown scenario was nearly spot on!</li>
</ul>
<h2 id="conclusion">Conclusion</h2>
<p>The ability to generate random perturbations of a correlation matrix has many other applications in risk management, and even beyond.</p>
<p>As an example, in mean-variance optimization, the <a href="https://en.wikipedia.org/wiki/Resampled_efficient_frontier">resampled efficient frontier</a> is partially based on random perturbations of a baseline correlation matrix.</p>
<p>Also, as a last remark on Opdyke’s C3 algorithm<sup id="fnref:3:11" role="doc-noteref"><a href="#fn:3" class="footnote">2</a></sup>, a fully nonparametric version of it is described on <a href="http://www.datamineit.com/DMI_publications.htm">Opdyke’s website</a>.</p>
<p>This extended version, called <em>Nonparametric Angles-based Correlation</em> (NAbC), covers not only correlation matrices based on any underlying data distributions<sup id="fnref:35" role="doc-noteref"><a href="#fn:35" class="footnote">31</a></sup> but also correlation matrices beyond the standard
<a href="https://en.wikipedia.org/wiki/Pearson_correlation_coefficient">Pearson’s correlation matrix</a>, like <a href="https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient">Spearman’s Rho correlation matrix</a>
or <a href="https://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient">Kendall’s Tau correlation matrix</a>.</p>
<p>For more random quantitative discussions, feel free to <a href="https://www.linkedin.com/in/roman-rubsamen/">connect with me on LinkedIn</a> or to <a href="https://twitter.com/portfoliooptim">follow me on Twitter</a>.</p>
<p>–</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:2" role="doc-endnote">
<p>Otherwise, these unknown unknowns would become known unknowns! <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>See <a href="https://ssrn.com/abstract=3673362">Opdyke, JD, Full Probabilistic Control for Direct and Robust, Generalized and Targeted Stressing of the Correlation Matrix (Even When Eigenvalues are Empirically Challenging) (May 30, 2020). QuantMinds/RiskMinds September 22-23, 2020</a>. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:3:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:3:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:3:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a> <a href="#fnref:3:4" class="reversefootnote" role="doc-backlink">↩<sup>5</sup></a> <a href="#fnref:3:5" class="reversefootnote" role="doc-backlink">↩<sup>6</sup></a> <a href="#fnref:3:6" class="reversefootnote" role="doc-backlink">↩<sup>7</sup></a> <a href="#fnref:3:7" class="reversefootnote" role="doc-backlink">↩<sup>8</sup></a> <a href="#fnref:3:8" class="reversefootnote" role="doc-backlink">↩<sup>9</sup></a> <a href="#fnref:3:9" class="reversefootnote" role="doc-backlink">↩<sup>10</sup></a> <a href="#fnref:3:10" class="reversefootnote" role="doc-backlink">↩<sup>11</sup></a> <a href="#fnref:3:11" class="reversefootnote" role="doc-backlink">↩<sup>12</sup></a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>See <a href="https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2010JD015570">Rapisarda, F., Brigo, D. and Mercurio, F. (2007) Parameterizing correlations: a geometric interpretation, IMA Journal of Management Mathematics, 18(1), pp. 55–73</a>. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:11" role="doc-endnote">
<p>See <a href="https://www.oreilly.com/library/view/risk-management-in/9780470694251/13_chap06.html">Risk Management in Commodity Markets: From Shipping to Agriculturals and Energy, Chapter 6, Roza Galeeva, Jiri Hoogland, and Alexander Eydeland, Measuring Correlation Risk for Energy Derivatives</a>. <a href="#fnref:11" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:11:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:11:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:11:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a></p>
</li>
<li id="fn:10" role="doc-endnote">
<p>Of course, many other methods exist; for example, if the data generating process is known, it is possible to use <a href="https://en.wikipedia.org/wiki/Monte_Carlo_method">a Monte-Carlo method</a> to generate random samples from this process and compute their associated (sample) correlation matrix, which is then a perturbed version of the original correlation matrix. <a href="#fnref:10" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:1" role="doc-endnote">
<p>See <a href="https://www.pm-research.com/content/iijpormgmt/49/4/64">A Changing Stock–Bond Correlation: Drivers and Implications, Alfie Brixton, Jordan Brooks, Pete Hecht, Antti Ilmanen, Thomas Maloney, Nicholas McQuinn, The Journal of Portfolio Management, Multi-Asset Special Issue 2023, 49 (4) 64 - 80</a>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:12" role="doc-endnote">
<p>I’ll skip the math, but the interested reader can for example compute the eigenvalues of the asset correlation matrix represented in Figure 1 with the U.S. stock-bond correlation altered from -0.33 to 0.5. <a href="#fnref:12" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p><a href="http://www.maths.manchester.ac.uk/~higham/narep/narep369.pdf">Nicholas J. Higham, Computing the Nearest Correlation Matrix—A Problem from Finance, IMA J. Numer. Anal. 22, 329–343, 2002.</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:5:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:13" role="doc-endnote">
<p>Assuming that an algorithm to compute the nearest correlation matrix is available; otherwise, this method becomes immediately less straightforward to implement… <a href="#fnref:13" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:8" role="doc-endnote">
<p>Except if the initial randomly perturbed correlation matrices are actually valid, non-singular, correlation matrices. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:14" role="doc-endnote">
<p>It is sometimes possible, though, to integrate an additional constraint on the minimum eigenvalue of the computed nearest valid correlation matrix into these algorithms. <a href="#fnref:14" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>See <a href="https://projecteuclid.org/euclid.aoas/1380804814">Hardin, Johanna; Garcia, Stephan Ramon; Golan, David. A method for generating realistic correlation matrices. Ann. Appl. Stat. 7 (2013), no. 3, 1733–1762</a>. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:6:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:6:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:6:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a> <a href="#fnref:6:4" class="reversefootnote" role="doc-backlink">↩<sup>5</sup></a></p>
</li>
<li id="fn:15" role="doc-endnote">
<p>This is maybe less of a problem in other applications, like in biology. <a href="#fnref:15" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:16" role="doc-endnote">
<p>Because the smallest eigenvalue of the asset correlation matrix represented in Figure 1 is 0.25. <a href="#fnref:16" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:9" role="doc-endnote">
<p>Similarly, Hardin et al.’s method cannot be used to generate random perturbations of the U.S. stock-bond correlation that would bring this correlation to a level lower than -0.58. <a href="#fnref:9" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:19" role="doc-endnote">
<p>Strictly speaking, when the correlation matrix $C$ is positive semi-definite, its hypersphere decomposition is not unique. <a href="#fnref:19" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:20" role="doc-endnote">
<p>C.f. Opdyke<sup id="fnref:3:12" role="doc-noteref"><a href="#fn:3" class="footnote">2</a></sup> for the complete list of goals of his proposed approach. <a href="#fnref:20" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:21" role="doc-endnote">
<p>See <a href="https://www.tandfonline.com/doi/abs/10.1080/03610918.2019.1700277?journalCode=lssp20">Enes Makalic & Daniel F. Schmidt (2022) An efficient algorithm for sampling from sin^k(x) for generating random correlation matrices, Communications in Statistics - Simulation and Computation, 51:5, 2731-2735</a>. <a href="#fnref:21" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:22" role="doc-endnote">
<p>For example, assuming that all correlations would go to one in case of a correlation breakdown is a prior. <a href="#fnref:22" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:23" role="doc-endnote">
<p>More specifically, of the daily arithmetic total returns of the four ETFs in the portfolio, whose prices have been retrieved using <a href="https://api.tiingo.com/">Tiingo</a>. <a href="#fnref:23" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:23:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:24" role="doc-endnote">
<p>I used a 24-day period because the period 19 February 2020 - 23 March 2020, which corresponds to the peak of the COVID financial crisis - c.f. <a href="https://en.wikipedia.org/wiki/2020_stock_market_crash#Overall_size_of_the_falls">Wikipedia</a> - is also a 24-day period. <a href="#fnref:24" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:24:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:25" role="doc-endnote">
<p>Due to personal preferences, I will use the effective number of bets based on principal components analysis as the factors extraction method; in addition, I will use the asset correlation matrix as if it were the asset covariance matrix to not introduce any additional variables (volatilities). <a href="#fnref:25" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:26" role="doc-endnote">
<p>More precisely, within a +/- 0.30 interval around 1.87. <a href="#fnref:26" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:30" role="doc-endnote">
<p>Intuitively, the higher the ENB of an equally-weighted portfolio, the more uncorrelated its constituents. <a href="#fnref:30" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:27" role="doc-endnote">
<p>See <a href="https://iopscience.iop.org/article/10.1088/1742-5468/2015/08/P08011/meta">Stepanov, Y., Rinn, P., Guhr, T., Peinke, J., & Schafer, R. (2015). Stability and hierarchy of quasi-stationary states: financial markets as an example. Journal of Statistical Mechanics: Theory and Experiment, 2015(8), P08011</a>. <a href="#fnref:27" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:27:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:34" role="doc-endnote">
<p>I used the standard <a href="https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html">Scikit-Learn</a> $k$-means algorithm. <a href="#fnref:34" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:28" role="doc-endnote">
<p>The <em>k</em>-means algorithm does not guarantee that the cluster centroids are valid correlation matrices; if this is not the case, it is possible to use either the <a href="https://en.wikipedia.org/wiki/K-medoids">$k$-medoids</a> instead, or to compute the nearest correlation matrices to the cluster centroid. <a href="#fnref:28" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:29" role="doc-endnote">
<p>And more rigorously defined as per the $k$-means algorithm. <a href="#fnref:29" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:32" role="doc-endnote">
<p>To be noted that the ENB computed with the correlation matrix $\widetilde{C}_{PP,1}$ is nearly identical to the ENB computed with the current correlation matrix $C_{PP}$ (<em>1.87</em>). <a href="#fnref:32" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:33" role="doc-endnote">
<p>For example, U.S. stocks and Gold move from anti-correlated (<em>-0.65</em>) to nearly uncorrelated (<em>-0.12</em>). <a href="#fnref:33" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:35" role="doc-endnote">
<p>That is, data distributions characterized by any degree of serial correlation, asymmetry, non-stationarity, and/or heavy-tailedness. <a href="#fnref:35" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Roman R.In the previous posts of this series, I detailed a methodology to perform stress tests on a correlation matrix by linearly shrinking a baseline correlation matrix toward an equicorrelation matrix or, more generally, toward the lower and upper bounds of its coefficients. This methodology allows to easily model known unknowns when designing stress testing scenarios, but falls short with unknown unknows, that is, completely unanticipated correlation breakdowns. Indeed, by definition, these cannot be represented by an a-priori correlation matrix toward which a baseline correlation matrix could be shrunk1… In this blog post, I will describe another approach that can be used instead in this case, based on random perturbations of a baseline correlation matrix. As an example of application, I will show how to identify extreme correlation stress scenarios through direct and reverse correlation stress testing. Notes: The main reference for this post is a presentation from Opdyke2 at the QuantMinds International 2020 event. Mathematical preliminaries As a general reminder, a square matrix $C \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ is a (valid) correlation matrix if and only if $C$ is symmetric: $C {}^t = C$ $C$ is unit diagonal: $C_{i,i} = 1$, $i=1..n$ $C$ is positive semi-definite: $C \geqslant 0$ Eigenvalue decomposition of a correlation matrix A correlation matrix is a real symmetric matrix. 
Thus, from standard linear algebra, any correlation matrix $C$ is diagonalizable by an orthogonal matrix and can be decomposed as a product \[C = P \Lambda P^{-1}\] , where: $P \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ is an orthogonal matrix $\Lambda = Diag \left( \lambda_{1},…, \lambda_{n} \right)$ $\in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ is a diagonal matrix made of the $n$ eigenvalues $\lambda_1 \geq \lambda_2 … \geq \lambda_n \geq 0$ of $C$ which satisfy $\sum_{i=1}^{n} \lambda_i = n$ This decomposition is called the eigendecomposition of the correlation matrix $C$. Hypersphere decomposition of a correlation matrix Rapisarda et al.3 establish that any correlation matrix $C \in \mathcal{M}(\mathbb{R}^{n \times n})$ can be decomposed as a product \[C = B B {}^t\] , where $B \in \mathcal{M}(\mathbb{R}^{n \times n})$ is a lower triangular matrix defined by \[b_{i,j} = \begin{cases} \cos \theta_{i,1}, \textrm{for } j = 1 \newline \cos \theta_{i,j} \prod_{k=1}^{j-1} \sin \theta_{i,k}, \textrm{for } 2 \leq j \leq i-1 \newline \prod_{k=1}^{i-1} \sin \theta_{i,k}, \textrm{for } j = i \newline 0, \textrm{for } i+1 \leq j \leq n \end{cases}\] with: $\theta_{1,1} = 0$, by convention $\theta_{i,j}$, $i = 2..n, j = 1..i-1$ $\frac{n (n-1)}{2}$ correlative angles belonging to the interval $[0, \pi]$ This decomposition is called the hypersphere decomposition, or the triangular angles parametrization4, of the correlation matrix $C$ and is detailed in the previous post of this series. Random perturbations of a correlation matrix A random perturbation of a baseline correlation matrix $C$ can be loosely defined as a correlation matrix $\widetilde{C}$ generated “at random” whose correlation coefficients are more or less “close” to those of $C$. 
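Both decompositions can be verified with a few lines of NumPy; for a positive definite correlation matrix, the lower triangular factor $B$ of the hypersphere decomposition is simply the Cholesky factor (the matrix below is illustrative, not taken from the post):

```python
import numpy as np

# Illustrative 3x3 correlation matrix
C = np.array([[1.0, -0.8, -0.8],
              [-0.8, 1.0, 0.8],
              [-0.8, 0.8, 1.0]])
n = C.shape[0]

# Eigendecomposition C = P Lambda P^{-1}; for a real symmetric matrix,
# numpy's eigh returns orthonormal eigenvectors, so P^{-1} = P^t
lam, P = np.linalg.eigh(C)
C_from_eig = P @ np.diag(lam) @ P.T

# The eigenvalues are non-negative and sum to n (the trace of C).
# For a positive definite C, the Cholesky factor is the lower triangular
# matrix B of the hypersphere decomposition C = B B^t
B = np.linalg.cholesky(C)
C_from_chol = B @ B.T
```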
From Opdyke’s2 extensive literature review, there are three main5 families of methods to generate random perturbations of a correlation matrix: Methods based on random perturbations of its correlation coefficients Methods based on random perturbations of its eigenvalues Methods based on random perturbations of its correlative angles Random perturbations of the coefficients of a correlation matrix The first family of methods to randomly perturb a correlation matrix is based on random perturbations of its coefficients. Naive method The most natural method to randomly perturb the coefficients of a correlation matrix simply consists in … randomly perturbing these coefficients! Unfortunately, this method does not work in general, because the resulting randomly perturbed correlation matrix is almost never a valid correlation matrix due to the lack of positive semi-definiteness. To illustrate this problem, let’s take a Harry Browne permanent portfolio à la ReSolve, equally invested in: U.S. stocks, represented by the SPY ETF U.S. treasuries, represented by the IEF ETF Gold, represented by the GLD ETF Cash, represented by the SHY ETF The correlations of these assets over the period 18 November 2004 - 11 August 2023 are displayed in Figure 1, adapted from Portfolio Visualizer. Figure 1. SPY, IEF, GLD, SHY correlations over the period 18 November 2004 - 11 August 2023, based on daily returns. Source: Portfolio Visualizer. Before thinking about perturbing all these correlations, let’s assume that we would merely like to perturb the U.S. stock-bond correlation so as to bring it to a level representative of the pre-2000 period, like 0.5 or above, c.f. Figure 2 reproduced from Brixton et al.6. Figure 2. Rolling Correlation between US Equity and US Treasury Returns, 01 January 1900 – 30 September 2022. Source: Brixton et al. It turns out that this single perturbation already results in an invalid correlation matrix7! 
As a consequence, trying to perturb the coefficients of a correlation matrix both simultaneously and at random has little chance to produce a valid correlation matrix in general, especially as the number of assets increases. One solution to this issue is to replace the randomly perturbed correlation matrix by its nearest valid correlation matrix8, c.f. the post When a Correlation Matrix is not a Correlation Matrix: the Nearest Correlation Matrix Problem. This leads to the following naive method to generate random perturbations of a correlation matrix $C \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$: Generate $\frac{n (n-1)}{2}$ randomly perturbed correlation coefficients $\widehat{C}_{i,j} \in [-1, 1]$ around the baseline correlation coefficients $C_{i,j}$, $i=1..n, j=i+1..n$ Compute the (potentially invalid) associated randomly perturbed correlation matrix $\widehat{C}$ Compute the randomly perturbed correlation matrix $\widetilde{C}$ as the nearest valid correlation matrix to $\widehat{C}$ While straightforward to implement9, this method has several limitations: It requires the computation of the nearest correlation matrix to every randomly perturbed correlation matrix generated Such systematic computation is expensive. It usually10 generates randomly perturbed correlation matrices that are singular This is because standard algorithms to compute the nearest correlation matrix, like Higham’s alternating projections algorithm8, output a singular correlation matrix11. It provides no guarantee on the magnitude or on the distribution of the perturbations Due to the nearest correlation matrix step #3, it seems actually rather difficult to control either the magnitude or the probability distribution of the perturbations $ \left | C_{i,j} - \widehat{C}_{i,j} \right | $, $i=1..n$, $j=i+1..n$. 
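The three steps of the naive method can be sketched as follows; a single eigenvalue-clipping-plus-rescaling pass is used here as a crude stand-in for Higham's alternating projections algorithm, and the baseline matrix and noise range are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative baseline correlation matrix
C = np.array([[1.0, -0.3, 0.2],
              [-0.3, 1.0, 0.5],
              [0.2, 0.5, 1.0]])
n = C.shape[0]

# Step 1-2: randomly perturb the off-diagonal coefficients
noise = np.triu(rng.uniform(-0.3, 0.3, size=(n, n)), k=1)
C_hat = np.clip(C + noise + noise.T, -1.0, 1.0)
np.fill_diagonal(C_hat, 1.0)

# Step 3: project back toward the set of valid correlation matrices
# (eigenvalue clipping + diagonal rescaling; a crude substitute for
# Higham's alternating projections algorithm)
vals, vecs = np.linalg.eigh(C_hat)
vals = np.maximum(vals, 1e-8)
C_tilde = vecs @ np.diag(vals) @ vecs.T
d = np.sqrt(np.diag(C_tilde))
C_tilde = C_tilde / np.outer(d, d)
np.fill_diagonal(C_tilde, 1.0)
```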
Hardin et al.’s method Hardin et al.12 introduce another method to randomly perturb the coefficients of a correlation matrix, relying on the dot product of normalized [independent Gaussian random vectors]12 as random perturbations. One of the many advantages of this method compared to the naive method previously described is that the resulting randomly perturbed correlation matrix is a valid correlation matrix by construction, which allows to bypass the nearest correlation matrix step #3. In detail, given $C \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ a baseline correlation matrix, Hardin et al.’s method to generate random perturbations of $C$ works as follows: Select a maximum noise level $\epsilon_{max}$ such that $0 < \epsilon_{max} < \lambda_{n}$, where $\lambda_{n}$ is the smallest eigenvalue of $C$ $\epsilon_{max}$ controls the magnitude of the generated perturbations. Select the dimension $m \geq 1$ of what is called the noise space in Hardin et al.12 $m$ influences the distributional characteristics of the random perturbations, as depicted in Figure 3 adapted from Hardin et al.12 on which it is visible that: $m = 3$ produces uniform-like perturbations (S3) $m = 25$ produces Gaussian-like perturbations (S25) Figure 3. Impact of the noise space dimension $m$ on the distribution of the perturbations (entry-wise differences). Source: Hardin et al. 
Generate $n$ unit vectors $u_1,…,u_n$ belonging to $\mathbb{R}^{m}$ and construct the matrix $U \in \mathcal{M} \left( \mathbb{R}^{m \times n} \right)$ whose columns are the vectors $u_i, i=1..n$ Compute the randomly perturbed correlation matrix $\widetilde{C}$ as $\widetilde{C} = C + \epsilon_{max} \left( U{}^t U - I_n \right)$, where $I_n \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ is the identity matrix of order $n$ The definition of $\widetilde{C}$ ensures that the perturbations are bounded by the maximum noise level $\epsilon_{max}$, i.e., $\left | C_{i,j} - \widetilde{C}_{i,j} \right | \leq \epsilon_{max} $, $i=1..n$, $j=i+1..n$. Whenever possible, Hardin et al.’s method should be used (computationally cheap, possibility to control the perturbations in terms of magnitude and distribution…), although it suffers from two major limitations: It is not applicable to correlation matrices that are singular or close to singular This is due to the condition on the maximum noise level $\epsilon_{max}$ in step #1 and is regrettably a problem for applications in finance13, because as highlighted in Opdyke2: correlation matrices estimated on large portfolios often (perhaps usually) are not positive definite for a wide range of reasons, and once positive definiteness is enforced using reliable, proven methods […], the smallest eigenvalue of the resulting matrix is almost always virtually zero. It might not be applicable to a specific correlation matrix, even if it is not remotely close to singular This is again due to the condition on the maximum noise level $\epsilon_{max}$ in step #1. For instance, in the case of the Harry Browne’s permanent portfolio introduced in the previous sub-section, Hardin et al.’s method cannot be used to perturb the coefficients of the asset correlation matrix represented in Figure 1 by more than +/- 0.2514, and in particular, cannot be used to generate perturbed U.S. stock-bond correlations higher than -0.0815! 
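Hardin et al.'s four steps can be sketched as follows (the baseline matrix, noise level, and noise space dimension below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative baseline correlation matrix, comfortably positive definite
C = np.array([[1.0, 0.3, 0.1],
              [0.3, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
n = C.shape[0]

# Step 1: noise level strictly below the smallest eigenvalue of C
lambda_min = np.linalg.eigvalsh(C).min()
eps_max = 0.9 * lambda_min

# Step 2: noise space dimension (m = 25 gives Gaussian-like perturbations)
m = 25

# Step 3: n random unit vectors in R^m, stacked as the columns of U
U = rng.standard_normal((m, n))
U /= np.linalg.norm(U, axis=0)

# Step 4: perturbed correlation matrix; the diagonal of U^t U is 1, so the
# diagonal of C is untouched, and |C_ij - C_tilde_ij| <= eps_max off-diagonal
C_tilde = C + eps_max * (U.T @ U - np.eye(n))
```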
Random perturbations of the eigenvalues The second family of methods to randomly perturb a correlation matrix is based on random perturbations of its eigenvalues. A representative member of this family is the following method, with $C \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ a baseline correlation matrix to be perturbed: Compute the eigendecomposition of $C$, with $C = P \Lambda P^{-1}$ Generate $n$ randomly perturbed eigenvalues $\widetilde{\lambda}_i \geq 0$ satisfying $\sum_{i=1}^n \widetilde{\lambda}_i = n$ around the baseline eigenvalues $\lambda_i$, $i=1..n$ Galeeva et al.4 describe several algorithms and associated probability distributions that can be used in this step. Compute the associated randomly perturbed diagonal matrix $\widetilde{\Lambda} = Diag \left( \widetilde{\lambda}_1,…, \widetilde{\lambda}_n \right)$ Compute the randomly perturbed correlation matrix $\widetilde{C}$ as $\widetilde{C} = P \widetilde{\Lambda} P^{-1}$ Any method from this family guarantees, in theory, the validity of the resulting randomly perturbed correlation matrix. Nevertheless, in practice, Opdyke2 notes that: perturbing eigenvalues fails under challenging empirical conditions, e.g. when the positive definiteness of the matrix has to be enforced algorithmically […] and eigenvalues are virtually zero (or at least unreliably estimated) In addition, controlling the $\frac{n (n-1)}{2}$ perturbations $\left | C_{i,j} - \widetilde{C}_{i,j} \right |$, $i=1..n$, $j=i+1..n$, which is ultimately what matters, through the $n$ perturbations $\left| \lambda_i - \widetilde{\lambda}_i \right|$, $i=1..n$, sounds rather difficult. For these reasons, this family of methods might not be the first choice to generate random perturbations of a correlation matrix. Random perturbations of the correlative angles The third and last family of methods to randomly perturb a correlation matrix is based on random perturbations of its correlative angles. 
Here, a representative member of this family is the following method, with $C \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ a baseline correlation matrix to be perturbed: Compute the16 hypersphere decomposition of $C$, with $C = B B {}^t$ Generate $\frac{n (n-1)}{2}$ randomly perturbed correlative angles $\widetilde{\theta}_{i,j} \in [0, \pi]$ around the baseline correlative angles $\theta_{i,j}$, $i=2..n$, $j=1..i-1$ Compute the associated randomly perturbed lower triangular matrix $\widetilde{B}$ Compute the randomly perturbed correlation matrix $\widetilde{C}$ as $\widetilde{C} = \widetilde{B} \widetilde{B} {}^t$ Any method from this family again guarantees, in theory, the validity of the resulting randomly perturbed correlation matrix. This time, though, theory seems to be confirmed in practice: Galeeva et al.4 highlight that perturbing the correlative angles is done via a robust and efficient procedure which makes the whole approach very attractive4 Opdyke2 notes that perturbing the correlative angles appears to be more robust [in practice] than competing methods […] at least under challenging empirical conditions2 One important remark at this stage is that the exact algorithms and associated probability distributions used in step #2 greatly influence the behavior of this family of methods. For reference, Opdyke2 proposes an algorithm called Cosecant, Cotangent, Cotangent (C3) able to generate a distribution of correlative angles median-centered on the baseline correlative angles and satisfying many other desirable properties17. This algorithm generates a randomly perturbed correlative angle $\widetilde{\theta}_{i,j}$ around a baseline correlative angle $\theta_{i,j}$, $i=2..n$, $j=1..i-1$, as follows: Generate a random variable $X$ whose probability density function is the p.d.f. 
of Makalic and Schmidt18 defined by $f_{X}(x) = c_k \sin^k(x)$, $x \in (0, \pi)$, $k \geq 1$, where $c_k$ is a normalization constant and $k = n - j$ Compute the perturbed correlative angle $\widetilde{\theta}_{i,j}$ as $ \widetilde{\theta}_{i,j} = \arctan \left( \tan \left( \theta_{i,j}-\frac{\pi}{2} \right) + \tan \left( X - \frac{\pi}{2} \right) \right) + \frac{\pi}{2}$ This family of methods is particularly well-suited to what is called generalized (correlation) stress testing in Opdyke2. More on this later. Still, like the family of methods based on random perturbations of the eigenvalues of a correlation matrix, one limitation of this family of methods is that controlling the $\frac{n (n-1)}{2}$ perturbations $\left | C_{i,j} - \widetilde{C}_{i,j} \right |$, $i=1..n$, $j=i+1..n$ sounds once again rather difficult. Implementation in Portfolio Optimizer Portfolio Optimizer allows to generate random perturbations of a baseline correlation matrix with: The naive method of randomly perturbing the coefficients of a correlation matrix Once a (potentially invalid) randomly perturbed correlation matrix is generated on client side, the endpoint /assets/correlation/matrix/nearest can be used to compute the nearest correlation matrix to this matrix. The method of randomly perturbing the correlative angles of a correlation matrix Together with Opdyke’s C3 algorithm2, through the endpoint /assets/correlation/matrix/perturbed Together with a proprietary algorithm able to control the magnitude of the perturbations of the correlation coefficients, again through the endpoint /assets/correlation/matrix/perturbed In this case, the distribution of the randomly perturbed correlation matrices is asymptotically uniform over the space of positive definite correlation matrices whose distance in terms of max norm to the baseline correlation matrix is at most equal to (resp. exactly equal to) a given maximum noise level (resp. 
a given exact noise level), similar in spirit to the method of Hardin et al.12 Example of application - Generalized stress testing Suppose that we are managing the Harry Browne’s permanent portfolio introduced earlier. Suppose also that on 18 February 2020, we feel something is off and would like to assess the impact of a potential correlation breakdown on this portfolio. Because this potential correlation breakdown could manifest in many ways (increased correlations between certain ETFs, decreased correlation between other ETFs…), it would be a mistake to impose any prior on how correlations should behave or should not behave19. So, what could we do? Direct correlation stress testing Following the previous sections, one possibility is to generate random perturbations around the current correlation matrix of the ETFs in the portfolio, which will allow to simulate many potential correlation breakdowns in a prior-free way. Once this is done, it will then be possible to evaluate the portfolio sensitivity to these random shocks. Such a direct (correlation) stress testing procedure allows to catch difficult-to-anticipate and/or difficult-to-quantify second and third order effects of a large, multivariate, impactful scenario (e.g. pandemic + economic upheaval)2. 
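For illustration, here is a simplified end-to-end sketch of the angle-based family of methods described above; a clipped Gaussian perturbation of the angles is used in place of Opdyke's C3 sampling distribution, purely to keep the sketch short, and the baseline matrix is illustrative:

```python
import numpy as np

rng = np.random.default_rng(123)

# Illustrative positive definite baseline correlation matrix
C = np.array([[1.0, -0.5, 0.3],
              [-0.5, 1.0, 0.1],
              [0.3, 0.1, 1.0]])
n = C.shape[0]

# Step 1: hypersphere decomposition via the Cholesky factor (C = B B^t),
# recovering the correlative angles row by row
B = np.linalg.cholesky(C)
theta = np.zeros((n, n))
for i in range(1, n):
    sin_prod = 1.0
    for j in range(i):
        theta[i, j] = np.arccos(np.clip(B[i, j] / sin_prod, -1.0, 1.0))
        sin_prod *= np.sin(theta[i, j])

# Step 2: perturb the angles (clipped Gaussian instead of Opdyke's C3)
theta_tilde = theta.copy()
for i in range(1, n):
    for j in range(i):
        theta_tilde[i, j] = np.clip(theta[i, j] + rng.normal(0.0, 0.1),
                                    1e-6, np.pi - 1e-6)

# Steps 3-4: rebuild B_tilde and the perturbed correlation matrix; the rows
# of B_tilde have unit norm by construction, so C_tilde is unit diagonal
B_tilde = np.zeros((n, n))
for i in range(n):
    sin_prod = 1.0
    for j in range(i):
        B_tilde[i, j] = np.cos(theta_tilde[i, j]) * sin_prod
        sin_prod *= np.sin(theta_tilde[i, j])
    B_tilde[i, i] = sin_prod
C_tilde = B_tilde @ B_tilde.T
```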
In order to apply this procedure to the portfolio at hand, three prerequisites are necessary: Estimating the current correlation matrix $C_{PP}$ of the ETFs in the portfolio I will estimate $C_{PP}$ as the correlation matrix of the four ETFs in the portfolio20 over the 24-day period 14 January 2020 - 18 February 202021, which gives \[C_{PP} \approx \begin{pmatrix} 1 & -0.81 & -0.82 & -0.65 \\ -0.81 & 1 & 0.84 & 0.70 \\ -0.82 & 0.84 & 1 & 0.75 \\ -0.65 & 0.70 & 0.75 & 1 \end{pmatrix}\] Selecting a method to randomly perturb the current correlation matrix $C_{PP}$ I will generate random perturbations of $C_{PP}$ thanks to Opdyke’s C3 algorithm2 as implemented through the Portfolio Optimizer endpoint /assets/correlation/matrix/perturbed. Determining how to evaluate the portfolio sensitivity to the random perturbations of the current correlation matrix $C_{PP}$ To keep things simple, I will evaluate the portfolio effective number of bets22 (ENB), using the Portfolio Optimizer endpoint /portfolio/analysis/effective-number-of-bets. With these prerequisites met, it is possible to generate random perturbations around the current correlation matrix $C_{PP}$ and compute the corresponding ENB distribution. An example of ENB distribution is provided in Figure 5, in the case of 10000 randomly perturbed correlation matrices. Figure 5. Distribution of the Effective Number of Bets (ENB), 10000 randomly perturbed correlation matrices around the current correlation matrix $C_{PP}$. Some associated summary statistics: Mean 1.89 Standard deviation 0.42 Minimum 1.01 5% percentile 1.33 25% percentile 1.60 Median 1.81 75% percentile 2.12 95% percentile 2.72 Maximum 3.98 And, for reference, the value of the current ENB of the portfolio, computed with the current correlation matrix $C_{PP}$: 1.87. A couple of comments: More than half of the ENB are located very close23 to the current ENB (1.87) These ENB are not representative of any real correlation breakdown. 
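The post computes the ENB through the Portfolio Optimizer endpoint; as a rough stand-in, here is a sketch of a PCA-based effective number of bets in the spirit of footnote 22 (Meucci-style exponential entropy of the principal portfolios' variance contributions; the endpoint's exact convention may differ), applied to the current correlation matrix $C_{PP}$ and equal weights:

```python
import numpy as np

def effective_number_of_bets(w, sigma):
    """PCA-based ENB: exponential of the entropy of the principal
    portfolios' variance contributions (Meucci-style sketch)."""
    lam, P = np.linalg.eigh(sigma)
    lam = np.maximum(lam, 0.0)        # guard against tiny negative eigenvalues
    v = lam * (P.T @ w) ** 2          # variance contributed by each principal bet
    p = v / v.sum()
    p = p[p > 0]
    return float(np.exp(-np.sum(p * np.log(p))))

# Current correlation matrix C_PP from the post, used in place of the
# covariance matrix (as in footnote 22)
C_PP = np.array([[1.00, -0.81, -0.82, -0.65],
                 [-0.81, 1.00, 0.84, 0.70],
                 [-0.82, 0.84, 1.00, 0.75],
                 [-0.65, 0.70, 0.75, 1.00]])
w = np.full(4, 0.25)  # equally-weighted portfolio

enb = effective_number_of_bets(w, C_PP)
```

By construction, the ENB of a four-asset portfolio lies between 1 (a single effective bet) and 4 (four fully independent bets).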
The 95% percentile of the ENB distribution (2.72) is much further apart from the current ENB (1.87) than the 5% percentile of the ENB distribution (1.33) This means that a correlation breakdown with the biggest impact on the ENB would correspond, maybe counter-intuitively, to a scenario of de-correlation24 of the four ETFs in the portfolio. As a side note, and again maybe counter-intuitively, the impact of such a correlation breakdown would then be rather harmless, because an increase in ENB is usually desirable from a portfolio diversification perspective. The minimum (1.01) and maximum (3.98) ENB both correspond to the theoretical minimum (1) and maximum (4) ENB This shows that all possible (correlation) unknown unknowns have been covered by the stress testing procedure. Reverse correlation stress testing In the previous sub-section, we have (empirically) established that the most impactful correlation breakdown scenario for the ENB of the portfolio corresponds to a de-correlation of the ETFs. The next logical step is now to compute a correlation matrix that would somehow best illustrate this de-correlation scenario, a procedure known as reverse (correlation) stress testing. For this, inspired by the concept of market states from Stepanov et al25, I propose to apply a k-means clustering algorithm26, with $k = 2$, to the randomly perturbed correlation matrices generated during the direct stress testing procedure. 
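A minimal NumPy-only two-means (Lloyd's algorithm) on the vectorized upper-triangular coefficients, standing in for the scikit-learn $k$-means used in the post; the input matrices below are a synthetic stand-in for the 10000 perturbed matrices, and, per footnote 27, the resulting centroids are not guaranteed to be valid correlation matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
iu = np.triu_indices(n, k=1)

# Synthetic stand-in for the perturbed matrices: upper-triangular coefficient
# vectors drawn around two different hypothetical "market states"
state_a = np.array([-0.8, -0.8, -0.65, 0.84, 0.70, 0.75])
state_b = np.array([-0.6, -0.6, -0.10, 0.50, 0.05, 0.05])
X = np.vstack([state_a + 0.05 * rng.standard_normal((100, 6)),
               state_b + 0.05 * rng.standard_normal((100, 6))])

# Minimal 2-means (Lloyd's algorithm) on the vectorized coefficients
centroids = X[rng.choice(len(X), size=2, replace=False)]
for _ in range(50):
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    labels = np.argmin(dists, axis=1)
    new_centroids = np.array([X[labels == k].mean(axis=0)
                              if np.any(labels == k) else centroids[k]
                              for k in range(2)])
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids

# Reshape each centroid back into a "representative" matrix (not guaranteed
# to be a valid correlation matrix, c.f. footnote 27)
representatives = []
for c in centroids:
    M = np.eye(n)
    M[iu] = c
    M[(iu[1], iu[0])] = c
    representatives.append(M)
```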
One output of this algorithm is two “representative” correlation matrices27, which are, in the case of the 10000 randomly perturbed correlation matrices of the previous sub-section: A correlation matrix $\widetilde{C}_{PP,1}$ “representative” of all the randomly perturbed correlation matrices that are “maximally similar” to the current correlation matrix $C_{PP}$ \[\widetilde{C}_{PP,1} \approx \begin{pmatrix} 1 & -0.76 & -0.78 & -0.67 \\ -0.76 & 1 & 0.75 & 0.66 \\ -0.78 & 0.75 & 1 & 0.67 \\ -0.67 & 0.66 & 0.67 & 1 \end{pmatrix}\] A correlation matrix $\widetilde{C}_{PP,2}$ “representative” of all the randomly perturbed correlation matrices that are “maximally dissimilar” from $\widetilde{C}_{PP,1}$ \[\widetilde{C}_{PP,2} \approx \begin{pmatrix} 1 & -0.64 & -0.64 & -0.12 \\ -0.64 & 1 & 0.53 & 0.07 \\ -0.64 & 0.53 & 1 & 0.06 \\ -0.12 & 0.07 & 0.06 & 1 \end{pmatrix}\] , with “representative”, “maximally similar” and “maximally dissimilar” loosely defined but usually corresponding to intuition28. In terms of market states25: The correlation matrix $\widetilde{C}_{PP,1}$ embodies the current market state Indeed, $\widetilde{C}_{PP,1}$ is very close to $C_{PP}$, as confirmed by the small Frobenius distance between these two matrices (0.21). In the current market state, the ENB is concentrated29 around the current ENB of the portfolio (1.87). The correlation matrix $\widetilde{C}_{PP,2}$ embodies a market state maximally distinct from the current market state, which I will call the de-correlation market state The rationale for this name is that a comparison between $C_{PP}$ and $\widetilde{C}_{PP,2}$ shows that this second market state corresponds to a de-correlation of the four ETFs in the portfolio30. In the de-correlation market state, the ENB is much higher than in the current market state, with for example the ENB computed with the correlation matrix $\widetilde{C}_{PP,2}$ equal to 2.97, well above the 95% percentile of the ENB distribution (2.72). 
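The Frobenius distance quoted above is straightforward to reproduce from the two-decimal matrices shown in this post:

```python
import numpy as np

# Current correlation matrix and the first "representative" matrix, both
# rounded to two decimals as reported in the post
C_PP = np.array([[1.00, -0.81, -0.82, -0.65],
                 [-0.81, 1.00, 0.84, 0.70],
                 [-0.82, 0.84, 1.00, 0.75],
                 [-0.65, 0.70, 0.75, 1.00]])
C_tilde_PP_1 = np.array([[1.00, -0.76, -0.78, -0.67],
                         [-0.76, 1.00, 0.75, 0.66],
                         [-0.78, 0.75, 1.00, 0.67],
                         [-0.67, 0.66, 0.67, 1.00]])

# Frobenius distance; with these rounded coefficients it lands close to
# the 0.21 reported in the post
dist = np.linalg.norm(C_PP - C_tilde_PP_1, ord="fro")
```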
Thanks to these observations, it is possible to conclude that $\widetilde{C}_{PP,2}$ is the correlation matrix that best illustrates the most impactful correlation breakdown scenario for the ENB of the portfolio. Reality check I will conclude this example on generalized stress testing by a reality check on the results obtained in the previous sub-sections. The correlation matrix $C_{PP, COVID}$ below is the correlation matrix of the four ETFs in the portfolio20 over the subsequent 24-day “full crisis” period 19 February 2020 - 23 March 202021. \[C_{PP, COVID} \approx \begin{pmatrix} 1 & -0.50 & -0.40 & 0.00 \\ -0.50 & 1 & 0.71 & 0.25 \\ -0.40 & 0.71 & 1 & 0.19 \\ 0.00 & 0.25 & 0.19 & 1 \end{pmatrix}\] Of particular interest: The resemblance between $C_{PP, COVID}$ and $\widetilde{C}_{PP,2}$, confirmed by a relatively small Frobenius distance between these two matrices (0.58) The close match between the ENB computed with $C_{PP, COVID}$ (2.84) and the ENB computed with $\widetilde{C}_{PP,2}$ (2.97) In other words: The most theoretically impactful correlation breakdown scenario for the ENB of the portfolio actually occurred in practice, with an associated asset correlation matrix relatively close to the forecast asset correlation matrix The forecast of the impact on the ENB of the portfolio of this theoretical correlation breakdown scenario was nearly spot on! Conclusion The possibility to generate random perturbations of a correlation matrix has many other applications in risk management and even beyond. As an example, in mean-variance optimization, the resampled efficient frontier is partially based on random perturbations of a baseline correlation matrix. Also, as a last remark on Opdyke’s C3 algorithm2, a fully nonparametric version of it is described on Opdyke’s website. 
This extended version, called Nonparametric Angles-based Correlation (NAbC), covers not only correlation matrices based on any underlying data distributions31 but also correlation matrices beyond the standard Pearson’s correlation matrix, like Spearman’s Rho correlation matrix or Kendall’s Tau correlation matrix. For more random quantitative discussions, feel free to connect with me on LinkedIn or to follow me on Twitter. – Otherwise, these unknown unknowns would become known unknowns! ↩ See Opdyke, JD, Full Probabilistic Control for Direct and Robust, Generalized and Targeted Stressing of the Correlation Matrix (Even When Eigenvalues are Empirically Challenging) (May 30, 2020). QuantMinds/RiskMinds September 22-23, 2020. ↩ ↩2 ↩3 ↩4 ↩5 ↩6 ↩7 ↩8 ↩9 ↩10 ↩11 ↩12 See Rapisarda, F., Brigo, D. and Mercurio, F. (2007) Parameterizing correlations: a geometric interpretation, IMA Journal of Management Mathematics, 18(1), pp. 55–73. ↩ See Risk Management in Commodity Markets: From Shipping to Agriculturals and Energy, Chapter 6, Roza Galeeva, Jiri Hoogland, and Alexander Eydeland, Measuring Correlation Risk for Energy Derivatives. ↩ ↩2 ↩3 ↩4 Of course, many other methods exist; for example, if the data generating process is known, it is possible to use a Monte-Carlo method to generate random samples from this process and compute their associated (sample) correlation matrix, which is then a perturbed version of the original correlation matrix. ↩ See A Changing Stock–Bond Correlation: Drivers and Implications, Alfie Brixton, Jordan Brooks, Pete Hecht, Antti Ilmanen, Thomas Maloney, Nicholas McQuinn, The Journal of Portfolio Management, Multi-Asset Special Issue 2023, 49 (4) 64 - 80. ↩ I’ll skip the math, but the interested reader can for example compute the eigenvalues of the asset correlation matrix represented in Figure 1 with the U.S. stock-bond correlation altered from -0.33 to 0.5. ↩ Nicholas J. Higham, Computing the Nearest Correlation Matrix—A Problem from Finance, IMA J. 
Numer. Anal. 22, 329–343, 2002. ↩ ↩2 Assuming that an algorithm to compute the nearest correlation matrix is available; otherwise, this method becomes immediately less straightforward to implement… ↩ Except if the initial randomly perturbed correlation matrices are actually valid, non-singular, correlation matrices. ↩ It is sometimes possible, though, to integrate an additional constraint on the minimum eigenvalue of the computed nearest valid correlation matrix into these algorithms. ↩ See Hardin, Johanna; Garcia, Stephan Ramon; Golan, David. A method for generating realistic correlation matrices. Ann. Appl. Stat. 7 (2013), no. 3, 1733–1762. ↩ ↩2 ↩3 ↩4 ↩5 This is maybe less of a problem in other applications, like in biology. ↩ Because the smallest eigenvalue of the asset correlation matrix represented in Figure 1 is 0.25. ↩ Similarly, Hardin et al.’s method cannot be used to generate random perturbations of the U.S. stock-bond correlation that would bring this correlation to a level lower than -0.58. ↩ Strictly speaking, when the correlation matrix $C$ is positive semi-definite, its hypersphere decomposition is not unique. ↩ C.f. Opdyke2 for the complete list of goals of his proposed approach. ↩ See Enes Makalic & Daniel F. Schmidt (2022) An efficient algorithm for sampling from $\sin^k(x)$ for generating random correlation matrices, Communications in Statistics - Simulation and Computation, 51:5, 2731-2735. ↩ For example, assuming that all correlations would go to one in case of a correlation breakdown is a prior. ↩ More specifically, of the daily arithmetic total returns of the four ETFs in the portfolio, whose prices have been retrieved using Tiingo. ↩ ↩2 I used a 24-day period because the period 19 February 2020 - 23 March 2020, which corresponds to the peak of the COVID financial crisis - c.f. Wikipedia - is also a 24-day period. 
↩ ↩2 Due to personal preferences, I will use the effective number of bets based on principal components analysis as the factors extraction method; in addition, I will use the asset correlation matrix as if it were the asset covariance matrix to not introduce any additional variables (volatilities). ↩ More precisely, within a +/- 0.30 interval around 1.87. ↩ Intuitively, the higher the ENB of an equally-weighted portfolio, the more uncorrelated its constituents. ↩ See Stepanov, Y., Rinn, P., Guhr, T., Peinke, J., & Schafer, R. (2015). Stability and hierarchy of quasi-stationary states: financial markets as an example. Journal of Statistical Mechanics: Theory and Experiment, 2015(8), P08011. ↩ ↩2 I used the standard Scikit-Learn $k$-means algorithm. ↩ The k-means algorithm does not guarantee that the cluster centroids are valid correlation matrices; if this is not the case, it is possible either to use the $k$-medoids algorithm instead, or to compute the nearest correlation matrix to each cluster centroid. ↩ And more rigorously defined as per the $k$-means algorithm. ↩ To be noted that the ENB computed with the correlation matrix $\widetilde{C}_{PP,1}$ is nearly identical to the ENB computed with the current correlation matrix $C_{PP}$ (1.87). ↩ For example, U.S. stocks and Gold move from anti-correlated (-0.65) to nearly uncorrelated (-0.12). ↩ That is, data distributions characterized by any degree of serial correlation, asymmetry, non-stationarity, and/or heavy-tailedness. ↩Managing Missing Asset Returns in Portfolio Analysis and Optimization: Backfilling through Residuals Recycling2023-07-26T00:00:00-05:002023-07-26T00:00:00-05:00https://portfoliooptimizer.io/blog/managing-missing-asset-returns-in-portfolio-analysis-and-optimization-backfilling-through-residuals-recycling<p>In a multi-asset portfolio, it is usual that some assets have shorter return histories than others<sup id="fnref:0" role="doc-noteref"><a href="#fn:0" class="footnote">1</a></sup>.</p>
<p>The problem is that the presence of assets whose return histories differ in length makes it nearly impossible to use standard portfolio analysis and optimization methods…</p>
<p>Estimating the historical covariance matrix of a multi-asset portfolio, for example, is not possible when assets have unequal return histories, so that a typical workaround used in practice
is to consider only the common returns history. Unfortunately, this workaround has the side effect of discarding information contained
in the longer return histories, which might greatly impact the quality of the estimated covariance matrix<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote">2</a></sup>.</p>
<p>Sebastien Page proposes a solution to this problem in his paper <em>How to Combine Long and Short Return Histories Efficiently</em><sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup>. It consists in simulating missing asset returns
based on the relationships observed between all assets over their common returns history while accounting for the associated estimation error.</p>
<p>In this blog post, I will describe in detail Page’s method and analyze how it behaves empirically with a two-asset class portfolio made of U.S. and E.M. stocks.</p>
<blockquote>
<p><strong><em>Notes:</em></strong></p>
<ul>
<li>A Jupyter notebook corresponding to this post is available on Binder - <a href="https://mybinder.org/v2/gh/lequant40/portfolio-optimizer-notebooks/HEAD?labpath=managing_missing_asset_returns_backfilling_through_residuals_recycling.ipynb"><img src="https://mybinder.org/badge_logo.svg" alt="Binder" /></a></li>
</ul>
</blockquote>
<h2 id="pages-method-to-backfill-missing-asset-returns">Page’s method to backfill missing asset returns</h2>
<h3 id="single-starting-date">Single starting date</h3>
<p>Let $X$ and $Y$ be two groups of assets such that:</p>
<ul>
<li>The “long” group of assets $X = \left( X_1,…,X_n \right)$ is made of $n \geq 1$ assets, all sharing together a common returns history of length $L$</li>
<li>The “short” group of assets $Y = \left( Y_1,…,Y_m \right)$ is made of $m \geq 1$ assets, all sharing together a common returns history of length $S < L$ as well as sharing a common (ending) returns history of length $S$, starting at $t = L - S + 1$, with the group of assets $X$</li>
</ul>
<p>In such a situation, illustrated in Figure 1 adapted from Page<sup id="fnref:2:1" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup>, returns for the group of assets $Y$ are missing for the whole (beginning) returns history $t = 1..L - S$.</p>
<figure>
<a href="/assets/images/blog/backfilled-page-methodology.png"><img src="/assets/images/blog/backfilled-page-methodology.png" alt="Missing asset returns with a single starting date. Source: Page." /></a>
<figcaption>Figure 1. Missing asset returns with a single starting date. Source: Page.</figcaption>
</figure>
<p>Building on the <a href="https://en.wikipedia.org/wiki/Maximum_likelihood_estimation">maximum likelihood procedure</a><sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote">4</a></sup> described in Stambaugh<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote">5</a></sup>,
Page<sup id="fnref:2:2" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup> introduces a 3-step method in order to <em>combine [these] long and short return histories efficiently</em><sup id="fnref:2:3" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup> and backfill the $ m \times \left( L - S \right)$ missing asset returns.</p>
<h4 id="step-1---estimation-of-the-asset-long-mean-returns">Step 1 - Estimation of the asset “long” mean returns</h4>
<p>The vector $\hat{\mu}_{Y,L} \in \mathbb{R}^{m}$ of the mean returns of the assets belonging to the group $Y$ is estimated over the long returns history by:</p>
\[\hat{\mu}_{Y,L} = \mu_{Y,S} + \beta \left( \mu_{X,L} - \mu_{X,S} \right)\]
<p>, with:</p>
<ul>
<li>$\mu_{Y,S} = \left( \mu_{Y_1,S}, …, \mu_{Y_m,S} \right) {}^t \in \mathbb{R}^{m}$ the vector of the mean returns of the assets belonging to the group $Y$, computed over the short returns history</li>
<li>$\beta \in \mathcal{M}(\mathbb{R}^{m \times n})$, the matrix of <a href="https://en.wikipedia.org/wiki/Ordinary_least_squares">standard regression coefficients</a> defined by $\beta = \Sigma_{XY,S} \Sigma_{XX,S}^{-1}$, with:
<ul>
<li>$\Sigma_{XX,S} \in \mathcal{M}(\mathbb{R}^{n \times n})$ the covariance matrix of the assets belonging to the group $X$, computed over the short returns history</li>
<li>$\Sigma_{XY,S} \in \mathcal{M}(\mathbb{R}^{m \times n})$ the covariance matrix between the assets belonging to the group $X$ and the assets belonging to the group $Y$, computed over the short returns history</li>
</ul>
</li>
<li>$\mu_{X,L} = \left( \mu_{X_1,L}, …, \mu_{X_n,L} \right) {}^t \in \mathbb{R}^{n}$, the vector of the mean returns of the assets belonging to the group $X$, computed over the long returns history</li>
<li>$\mu_{X,S} = \left( \mu_{X_1,S}, …, \mu_{X_n,S} \right) {}^t\in \mathbb{R}^{n}$, the vector of the mean returns of the assets belonging to the group $X$, computed over the short returns history</li>
</ul>
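<p>As an illustration, step 1 can be sketched in Python with NumPy as below, assuming returns matrices with one row per period and one column per asset; the function and variable names are illustrative only, not Portfolio Optimizer's implementation:</p>

```python
import numpy as np

def estimate_long_means(X_long, X_short, Y_short):
    """Step 1 of Page's method (sketch).
    X_long: (L, n) long-history returns of group X.
    X_short: (S, n) short-history returns of group X (the overlap with Y).
    Y_short: (S, m) short-history returns of group Y.
    Returns the estimated long-history mean vector of Y and the (m, n)
    matrix of regression coefficients of Y on X."""
    mu_X_L = X_long.mean(axis=0)                   # (n,)
    mu_X_S = X_short.mean(axis=0)                  # (n,)
    mu_Y_S = Y_short.mean(axis=0)                  # (m,)
    S = X_short.shape[0]
    Sigma_XX_S = np.cov(X_short, rowvar=False)     # (n, n)
    # Cross-covariances between Y and X over the short history, shape (m, n)
    Sigma_XY_S = (Y_short - mu_Y_S).T @ (X_short - mu_X_S) / (S - 1)
    beta = Sigma_XY_S @ np.linalg.inv(Sigma_XX_S)  # (m, n)
    mu_Y_L_hat = mu_Y_S + beta @ (mu_X_L - mu_X_S) # (m,)
    return mu_Y_L_hat, beta
```

<p>Note that when the long and short histories coincide ($L = S$), the adjustment term vanishes and the estimate reduces to the short-history sample means.</p>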
<h4 id="step-2-estimation-of-the-asset-long-covariance-matrix">Step 2: Estimation of the asset long covariance matrix</h4>
<p>The covariance matrix $\hat{\Sigma}_{YY,L} \in \mathcal{M}(\mathbb{R}^{m \times m})$ of the assets belonging to the group $Y$ is estimated over the long returns history by:</p>
\[\hat{\Sigma}_{YY,L} = \Sigma_{YY,S} + \beta \left( \Sigma_{XX,L} - \Sigma_{XX,S} \right) \beta {}^t\]
<p>, with:</p>
<ul>
<li>$\Sigma_{YY,S} \in \mathcal{M}(\mathbb{R}^{m \times m})$ the covariance matrix of the assets belonging to the group $Y$, computed over the short returns history</li>
<li>$\Sigma_{XX,L} \in \mathcal{M}(\mathbb{R}^{n \times n})$ the covariance matrix of the assets belonging to the group $X$, computed over the long returns history</li>
</ul>
<p>Similarly, the covariance matrix $\hat{\Sigma}_{XY,L} \in \mathcal{M}(\mathbb{R}^{m \times n})$ between the assets belonging to the group $X$ and the assets belonging to the group $Y$
is estimated over the long returns history by:</p>
\[\hat{\Sigma}_{XY,L} = \Sigma_{XY,S} + \beta \left( \Sigma_{XX,L} - \Sigma_{XX,S} \right)\]
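<p>A minimal sketch of step 2, under the same row-per-period, column-per-asset convention as above and with illustrative names (not Portfolio Optimizer's implementation):</p>

```python
import numpy as np

def estimate_long_covariances(X_long, X_short, Y_short, beta):
    """Step 2 of Page's method (sketch): adjust the short-history covariances
    of group Y using the long- vs. short-history covariances of group X.
    beta is the (m, n) regression-coefficient matrix from step 1."""
    Sigma_XX_L = np.cov(X_long, rowvar=False)   # (n, n), long history
    Sigma_XX_S = np.cov(X_short, rowvar=False)  # (n, n), short history
    Sigma_YY_S = np.cov(Y_short, rowvar=False)  # (m, m), short history
    mu_X_S = X_short.mean(axis=0)
    mu_Y_S = Y_short.mean(axis=0)
    S = X_short.shape[0]
    Sigma_XY_S = (Y_short - mu_Y_S).T @ (X_short - mu_X_S) / (S - 1)
    D = Sigma_XX_L - Sigma_XX_S                 # long- vs. short-history adjustment
    Sigma_YY_L_hat = Sigma_YY_S + beta @ D @ beta.T
    Sigma_XY_L_hat = Sigma_XY_S + beta @ D
    return Sigma_YY_L_hat, Sigma_XY_L_hat
```

<p>Since $D$ is symmetric, $\beta D \beta {}^t$ is symmetric as well, so the estimated matrix $\hat{\Sigma}_{YY,L}$ stays a valid (symmetric) covariance matrix candidate.</p>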
<h4 id="step-3-backfilling-of-the-missing-long-asset-returns">Step 3: Backfilling of the missing long asset returns</h4>
<p>Once the long mean vectors and covariance matrix have been estimated thanks to step 1 and step 2, it is possible to simulate the missing (multivariate) asset returns
$Y_t = \left( Y_{1,t},…,Y_{m,t} \right) {}^t \in \mathbb{R}^{m}$ for $t = 1..L - S$.</p>
<p>Page<sup id="fnref:2:4" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup> mentions 3 backfilling procedures for doing so, all based on a transformation of the long (multivariate) asset returns $X_t = \left( X_{1,t},…,X_{n,t} \right) {}^t \in \mathbb{R}^{n}$:</p>
<ul>
<li>
<p>Beta adjustment</p>
<p>The <em>beta adjustment</em> backfilling procedure is based on the deterministic transformation:</p>
\[Y_t = \mu_{b_t}\]
<p>, with $\mu_{b_t} = \hat{\mu}_{Y,L} + \hat{\Sigma}_{XY,L} \Sigma_{XX,L}^{-1} \left( X_t - \mu_{X,L} \right) \in \mathbb{R}^{m}$</p>
<p>The main problem with this procedure is that it gives a false sense of uniqueness for the backfilled asset returns.</p>
<p>Indeed, as Page puts it<sup id="fnref:2:5" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup>:</p>
<blockquote>
<p>[…] the solution will not be unique: Many sets of simulated missing returns correspond to a given covariance matrix. This feature of the backfilling process is intuitive because, after all, the missing returns are unknown and so the model must recognize the uncertainty around the estimates.</p>
</blockquote>
<p>So, this backfilling procedure is probably best used only for benchmarking purposes.</p>
</li>
<li>
<p>Conditional sampling</p>
<p>In order to incorporate estimation error into the backfilled asset returns, the <em>conditional sampling</em> backfilling procedure models the missing asset returns as a (multivariate) Gaussian distribution:</p>
\[Y_t \sim \mathcal{N} \left(\mu_{b_t}, \Sigma_b \right)\]
<p>, with:</p>
<ul>
<li>$\mu_{b_t}$, defined in the beta adjustment backfilling procedure, the mean vector of the Gaussian distribution</li>
<li>$\Sigma_b = \hat{\Sigma}_{YY,L} - \hat{\Sigma}_{XY,L} \Sigma_{XX,L}^{-1} \hat{\Sigma}_{XY,L} {}^t \in \mathcal{M}(\mathbb{R}^{m \times m})$ the covariance matrix of the Gaussian distribution<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote">6</a></sup></li>
</ul>
<p>Here, Page<sup id="fnref:2:6" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup> notes that at the null noise limit (i.e., $\Sigma_b = 0$), this backfilling procedure becomes equivalent to the beta adjustment backfilling procedure.</p>
</li>
<li>
<p>Residuals recycling</p>
<p>Modeling the missing asset returns by a Gaussian distribution might be appropriate in some cases, depending on the assets and on the returns measurement frequency<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote">7</a></sup>,
but generally speaking, financial assets exhibit skewed and fat-tailed return distributions.</p>
<p>So, it would make sense if backfilled asset returns were to take into account these characteristics.</p>
<p>This is the aim of the <em>residuals recycling</em> backfilling procedure, which works as follows:</p>
<ul>
<li>For $t = L - S + 1 .. L$, the difference $R_t$ between the (non-missing) asset returns $Y_t$ and $\mu_{b_t}$, defined in the beta adjustment backfilling procedure, is computed</li>
</ul>
\[R_t = Y_t - \mu_{b_t}\]
<ul>
<li>For $t = 1..L - S$, the missing asset returns are backfilled as</li>
</ul>
\[Y_t = \mu_{b_t} + R_{t'}\]
<p>, with $t’ \in [L - S + 1..L]$ chosen uniformly at random.</p>
<p>Page<sup id="fnref:2:7" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup> highlights that this backfilling procedure <em>represents a hybrid between [maximum likelihood estimation] and bootstrapping</em><sup id="fnref:2:8" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup> and that it <em>provides a simple, relatively assumption-free approach to account for fat
tails and other features of the distribution beyond means and covariances in the backfilling process.</em><sup id="fnref:2:9" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup>.</p>
</li>
</ul>
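<p>The three backfilling procedures above can be sketched in a single function as below (Python with NumPy, illustrative names, not Portfolio Optimizer's implementation); note that, in practice, the covariance matrix $\Sigma_b$ may additionally require a positive semi-definite repair:</p>

```python
import numpy as np

def backfill_missing(X, Y_obs, mu_Y_L, mu_X_L, Sigma_YY_L, Sigma_XY_L,
                     Sigma_XX_L, method="residuals", rng=None):
    """Step 3 of Page's method (sketch). X: (L, n) long-history returns, whose
    first L - S rows are the dates where Y is missing; Y_obs: (S, m) observed
    returns of Y over the last S dates. Returns the (L - S, m) backfilled returns."""
    rng = np.random.default_rng() if rng is None else rng
    L, S = X.shape[0], Y_obs.shape[0]
    proj = Sigma_XY_L @ np.linalg.inv(Sigma_XX_L)   # (m, n)
    mu_b = mu_Y_L + (X - mu_X_L) @ proj.T           # (L, m) conditional means
    if method == "beta":                            # deterministic beta adjustment
        return mu_b[:L - S]
    if method == "conditional":                     # Gaussian conditional sampling
        Sigma_b = Sigma_YY_L - proj @ Sigma_XY_L.T  # may need a PSD repair
        noise = rng.multivariate_normal(np.zeros(Y_obs.shape[1]), Sigma_b,
                                        size=L - S)
        return mu_b[:L - S] + noise
    if method == "residuals":                       # residuals recycling
        residuals = Y_obs - mu_b[L - S:]            # observed-period residuals
        picks = rng.integers(0, S, size=L - S)      # t' drawn uniformly at random
        return mu_b[:L - S] + residuals[picks]
    raise ValueError("unknown method: " + method)
```

<p>The beta adjustment branch is deterministic, which makes it a convenient reference against which to compare the two stochastic branches.</p>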
<h3 id="multiple-starting-dates">Multiple starting dates</h3>
<p>Page’s method as described in the previous paragraph assumes that all the assets belonging to the short group of assets $Y$ share a common returns history, and in particular a common returns history starting date.</p>
<p>In practice, though, most assets do not usually share a common returns history starting date, as illustrated in Figure 2 adapted from Gramacy et al.<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote">8</a></sup>.</p>
<figure>
<a href="/assets/images/blog/backfilled-gramacy-multiple-dates.png"><img src="/assets/images/blog/backfilled-gramacy-multiple-dates.png" alt="Missing asset returns with several starting dates. Source: Gramacy et al." /></a>
<figcaption>Figure 2. Missing asset returns with several starting dates. Source: Gramacy et al.</figcaption>
</figure>
<p>In such a situation, a possible way to extend Page’s method is to apply it iteratively as proposed in Jiang and Martin<sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote">9</a></sup>.</p>
<p>For this, let be $G_1,…,G_J, J \geq 1$ groups of assets whose length of returns history $L = L_1 > L_2 > … > L_J \geq 1$ differ, but which share a common returns history ending date, as illustrated in Figure 2.</p>
<p>Then, Page’s method can be extended as follows:</p>
<ul>
<li>Apply Page’s method to the long group of assets $X = G_1$ and to the short group of assets $Y = G_2$</li>
<li>Once missing asset returns in the group $Y = G_2$ have been backfilled, apply Page’s method to the long group of assets $X = G_1 \cup G_2$ and to the short group of assets $Y = G_3$</li>
<li>…</li>
<li>Once missing asset returns in the group $Y = G_{J-1}$ have been backfilled, apply Page’s method to the long group of assets $X = G_1 \cup G_2 \cup … \cup G_{J-1}$ and to the short group of assets $Y = G_J$</li>
</ul>
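<p>The iterative extension above amounts to a simple loop; the sketch below assumes any single-starting-date implementation of Page's method is available as a function, with all names being illustrative, not Portfolio Optimizer's API:</p>

```python
import numpy as np

def backfill_all_groups(groups, page_backfill):
    """Iterative extension of Page's method (sketch).
    groups: list [G1, ..., GJ] of return matrices ordered by decreasing
    history length, all ending on the same date; page_backfill(X, Y) is
    assumed to return Y's history extended to X's length."""
    X = groups[0]
    for Y in groups[1:]:
        Y_full = page_backfill(X, Y)   # backfill Y back to X's start date
        X = np.hstack([X, Y_full])     # the backfilled assets join the long group
    return X
```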
<h3 id="practical-details">Practical details</h3>
<p>Some numerical subtleties need to be taken into account when implementing Page’s method, among which:</p>
<ul>
<li>The covariance matrix $\Sigma_{XX,S}$ of the assets belonging to the group $X$, computed over the short returns history, might not be invertible, c.f. for example Gramacy et al.<sup id="fnref:10:1" role="doc-noteref"><a href="#fn:10" class="footnote">8</a></sup></li>
<li>The covariance matrix $\Sigma_b$ of the Gaussian distribution appearing in the <em>conditional sampling</em> backfilling method might not be <a href="https://en.wikipedia.org/wiki/Positive_semidefinite_matrix">positive semi-definite</a></li>
</ul>
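<p>Two simple safeguards for these numerical subtleties are the Moore-Penrose pseudo-inverse and an eigenvalue-clipping repair; the sketch below shows one possible approach among others (illustrative names, not Portfolio Optimizer's implementation):</p>

```python
import numpy as np

def robust_beta(Sigma_XY_S, Sigma_XX_S):
    """Compute the regression coefficients with the Moore-Penrose
    pseudo-inverse, so that a singular short-history covariance
    matrix Sigma_XX_S does not break the method."""
    return Sigma_XY_S @ np.linalg.pinv(Sigma_XX_S)

def nearest_psd(Sigma):
    """Repair a matrix that is not positive semi-definite by symmetrizing
    it and clipping its negative eigenvalues at zero (one simple repair
    among others)."""
    sym = (Sigma + Sigma.T) / 2
    w, V = np.linalg.eigh(sym)
    return V @ np.diag(np.clip(w, 0.0, None)) @ V.T
```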
<h3 id="caveats">Caveats</h3>
<p>Page<sup id="fnref:2:10" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup> highlights that his method does not <em>magically transform missing data into additional information</em><sup id="fnref:2:11" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup> and lists several of its limitations.</p>
<p>I think one of the most important of these is that<sup id="fnref:2:12" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup></p>
<blockquote>
<p>The model assumes that betas between the existing [asset returns] and the missing [asset returns] do not change, which is not necessarily a realistic assumption.</p>
</blockquote>
<p>It should also be noted that, even with this method, backfilling missing returns for a completely new asset class might unfortunately remain elusive.</p>
<p>For example, in their piece <em><a href="https://www.twosigma.com/articles/risk-analysis-of-crypto-assets/">Risk Analysis of Crypto Assets</a></em>, people at <a href="https://www.twosigma.com/">Two Sigma</a>
conclude that <em>Bitcoin is not easily explained by the Two Sigma Factor Lens, nor is it substantially correlated to other currencies or any of the major commodities</em>, so that no long returns history
of any asset class seems to contain sufficient information to accurately backfill Bitcoin returns…</p>
<h2 id="implementation-in-porfolio-optimizer">Implementation in Portfolio Optimizer</h2>
<p><strong>Portfolio Optimizer</strong> implements the extension of Page’s method for multiple starting dates described in the previous section, together with specific care for the numerical subtleties also described in the previous section,
through the endpoint <a href="https://docs.portfoliooptimizer.io/"><code class="language-plaintext highlighter-rouge">/assets/returns/backfilled</code></a>.</p>
<h2 id="quality-of-backfilled-returns">Quality of backfilled returns</h2>
<p>Page’s residuals recycling backfilling procedure has been designed <em>to better account for non-normal distributions</em><sup id="fnref:2:13" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup>.</p>
<p>To what extent is this goal reached in practice?</p>
<p>Let’s check.</p>
<h3 id="theoretical-asset-returns-distribution">Theoretical asset returns distribution</h3>
<p>Page<sup id="fnref:2:14" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup> uses a simulation framework in order to compare backfilled vs. expected returns for a <a href="https://en.wikipedia.org/wiki/Multivariate_t-distribution">bivariate $t$-distribution</a> and obtains the results
displayed in Figure 3, taken from Page<sup id="fnref:2:15" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup>.</p>
<figure>
<a href="/assets/images/blog/backfilled-page-student-t.png"><img src="/assets/images/blog/backfilled-page-student-t.png" alt="Higher moments for backfilled simulated returns compared with known bivariate fat-tailed t-distribution. Source: Page." /></a>
<figcaption>Figure 3. Higher moments for backfilled simulated returns compared with known bivariate fat-tailed t-distribution. Source: Page.</figcaption>
</figure>
<p>Results from Figure 3 lead to the following conclusions:</p>
<ul>
<li>Missing returns backfilled with the conditional sampling backfilling procedure converge to a (univariate) Gaussian distribution</li>
<li>Missing returns backfilled with the residuals recycling backfilling procedure seem to have a sample kurtosis very close to the sample kurtosis of the theoretical bivariate $t$-distribution</li>
</ul>
<p>In other words, the residuals recycling backfilling procedure seems to reach its advertised goal, at least when applied to a known theoretical distribution.</p>
<h3 id="empirical-asset-returns-distribution">Empirical asset returns distribution</h3>
<p>More empirically, Page<sup id="fnref:2:16" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup> uses monthly returns on:</p>
<ul>
<li>U.S. stocks, represented by the <a href="https://fred.stlouisfed.org/series/WILL5000IND">Wilshire 5000 Total Market Index</a></li>
<li>E.M. stocks, represented by the <a href="https://www.msci.com/end-of-day-data-search">MSCI Emerging Markets Index (Total Return)</a></li>
</ul>
<p>in order to compare backfilled vs. actual returns for E.M. stocks.</p>
<p>In more detail:</p>
<ul>
<li>
<p>Returns on U.S. stocks from January 1988 to May 2011 (long returns history) and returns on E.M. stocks from February 1998 to May 2011 (short returns history) are used to backfill returns on E.M. stocks from January 1988 to January 1998<sup id="fnref:12" role="doc-noteref"><a href="#fn:12" class="footnote">10</a></sup></p>
<p>This process is repeated 10000 times to obtain 10000 different <em>backfilled paths for emerging-market stocks</em><sup id="fnref:2:17" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup>.</p>
</li>
<li>
<p>Moments are computed on each backfilled path, and the grand average of these moments is computed over all backfilled paths</p>
<p>The moments of interest are the mean, the variance, the skewness and the kurtosis of backfilled returns.</p>
</li>
<li>
<p>Moments are computed on E.M. stocks using actual returns data from January 1988 to January 1998<sup id="fnref:13" role="doc-noteref"><a href="#fn:13" class="footnote">11</a></sup></p>
</li>
</ul>
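<p>The four moments compared in this test can be computed as sketched below; note that conventions for skewness and kurtosis vary across software packages, and this sketch uses population central moments with non-excess kurtosis (so that a Gaussian distribution has kurtosis 3):</p>

```python
import numpy as np

def path_moments(r):
    """Mean, sample variance, skewness and (non-excess) kurtosis of a
    returns path, computed from population central moments."""
    r = np.asarray(r, dtype=float)
    mu = r.mean()
    d = r - mu
    m2, m3, m4 = (d ** 2).mean(), (d ** 3).mean(), (d ** 4).mean()
    return mu, r.var(ddof=1), m3 / m2 ** 1.5, m4 / m2 ** 2
```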
<p>Using <strong>Portfolio Optimizer</strong>, this test can easily be reproduced<sup id="fnref:14" role="doc-noteref"><a href="#fn:14" class="footnote">12</a></sup>, c.f. <a href="https://mybinder.org/v2/gh/lequant40/portfolio-optimizer-notebooks/HEAD?labpath=managing_missing_asset_returns_backfilling_through_residuals_recycling.ipynb">the Jupyter notebook corresponding to this post</a>, which gives for example the figures below:</p>
<table>
<thead>
<tr>
<th>Backfilling procedure</th>
<th>Mean</th>
<th>Variance</th>
<th>Skewness</th>
<th>Kurtosis</th>
</tr>
</thead>
<tbody>
<tr>
<td>None (actual returns)</td>
<td>1.5%</td>
<td>0.0039</td>
<td>-0.25</td>
<td>3.83</td>
</tr>
<tr>
<td>Conditional sampling</td>
<td>2.2%</td>
<td>0.0036</td>
<td>-0.07</td>
<td>3.11</td>
</tr>
<tr>
<td>Recycled residuals</td>
<td>2.2%</td>
<td>0.0036</td>
<td>-0.20</td>
<td>3.24</td>
</tr>
</tbody>
</table>
<p>These figures clearly show that the recycled residuals backfilling procedure generates asset returns that are closer, in terms of higher moments, to actual returns than those generated by the conditional sampling backfilling procedure.</p>
<p>From this perspective, and even though the mean of backfilled returns is quite far off the mean of actual returns, the recycled residuals backfilling procedure can definitely be considered to
properly <em>recover fat tails in the missing [returns] data</em><sup id="fnref:2:18" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup>.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Page’s method <em>provides a formal, plug-and-play solution</em><sup id="fnref:2:19" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup> to the problem of unequal return histories in portfolio analysis and optimization.</p>
<p>While other methods certainly exist, such as methods based on risk factors, they usually tend to be more complex, so that Page’s method is a very good choice for anyone requiring a simple way to manage
missing asset returns.</p>
<p>For more quantitative methods with just the right level of complexity, feel free to <a href="https://www.linkedin.com/in/roman-rubsamen/">connect with me on LinkedIn</a> or to <a href="https://twitter.com/portfoliooptim">follow me on Twitter</a>.</p>
<p>–</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:0" role="doc-endnote">
<p>For instance, historical returns of Emerging Markets (E.M.) stocks are available from the late 1980s<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote">13</a></sup> while historical returns of U.S. stocks are available from the late 1920s<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote">14</a></sup> or even earlier. <a href="#fnref:0" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:1" role="doc-endnote">
<p>See <a href="http://www.jstor.org/stable/4480761">Steven P. Peterson, John T. Grier, Covariance Misspecification in Asset Allocation, Financial Analysts Journal, Vol. 62, No. 4 (Jul. - Aug., 2006), pp. 76-85</a>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>See <a href="https://www.tandfonline.com/doi/abs/10.2469/faj.v69.n1.3">Sebastien Page (2013) How to Combine Long and Short Return Histories Efficiently, Financial Analysts Journal, 69:1, 45-52</a>. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:2:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:2:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:2:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a> <a href="#fnref:2:4" class="reversefootnote" role="doc-backlink">↩<sup>5</sup></a> <a href="#fnref:2:5" class="reversefootnote" role="doc-backlink">↩<sup>6</sup></a> <a href="#fnref:2:6" class="reversefootnote" role="doc-backlink">↩<sup>7</sup></a> <a href="#fnref:2:7" class="reversefootnote" role="doc-backlink">↩<sup>8</sup></a> <a href="#fnref:2:8" class="reversefootnote" role="doc-backlink">↩<sup>9</sup></a> <a href="#fnref:2:9" class="reversefootnote" role="doc-backlink">↩<sup>10</sup></a> <a href="#fnref:2:10" class="reversefootnote" role="doc-backlink">↩<sup>11</sup></a> <a href="#fnref:2:11" class="reversefootnote" role="doc-backlink">↩<sup>12</sup></a> <a href="#fnref:2:12" class="reversefootnote" role="doc-backlink">↩<sup>13</sup></a> <a href="#fnref:2:13" class="reversefootnote" role="doc-backlink">↩<sup>14</sup></a> <a href="#fnref:2:14" class="reversefootnote" role="doc-backlink">↩<sup>15</sup></a> <a href="#fnref:2:15" class="reversefootnote" role="doc-backlink">↩<sup>16</sup></a> <a href="#fnref:2:16" class="reversefootnote" role="doc-backlink">↩<sup>17</sup></a> <a href="#fnref:2:17" class="reversefootnote" role="doc-backlink">↩<sup>18</sup></a> <a href="#fnref:2:18" class="reversefootnote" role="doc-backlink">↩<sup>19</sup></a> <a href="#fnref:2:19" class="reversefootnote" role="doc-backlink">↩<sup>20</sup></a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>Page<sup id="fnref:2:20" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup> note that in the context of his paper, asset returns series are not assumed to be multivariate Gaussian, so that the maximum likelihood procedure of Stambaugh actually becomes a <a href="https://en.wikipedia.org/wiki/Quasi-maximum_likelihood_estimate">quasi-maximum likelihood</a> procedure. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>See <a href="https://www.sciencedirect.com/science/article/pii/S0304405X97000202">Stambaugh, Robert F. 1997. Analyzing Investments Whose Histories Differ in Length. Journal of Financial Economics, vol. 45, no. 3 (September):285–331.</a>. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:9" role="doc-endnote">
<p>Note that, contrary to $\mu_{b_t}$, $\Sigma_b$ is time-independent. <a href="#fnref:9" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:7" role="doc-endnote">
<p>Asset returns have a tendency to follow a distribution closer and closer to a Gaussian distribution the more the time period over which they are computed increases; this empirical property is called <em>aggregational Gaussianity</em>, c.f. Cont<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote">15</a></sup>. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:10" role="doc-endnote">
<p>See <a href="https://arxiv.org/abs/0710.5837">Robert B. Gramacy, Joo Hee Lee, Ricardo Silva, On estimating covariances between many assets with histories of highly variable length, arXiv</a>. <a href="#fnref:10" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:10:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:11" role="doc-endnote">
<p>See <a href="https://ssrn.com/abstract=2833057">Jiang, Yindeng and Martin, R. Douglas, Turning Long and Short Return Histories into Equal Histories: A Better Way to Backfill Returns (August 31, 2016)</a>. <a href="#fnref:11" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:12" role="doc-endnote">
<p>To be noted that there is a typo in the heading of Table 3 in Page<sup id="fnref:2:21" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup>, because known data is taken over January 1988 - January 1999, which is not 10 years but 20 years! <a href="#fnref:12" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:13" role="doc-endnote">
<p>Returns on E.M. stocks are indeed available from the full period January 1988 to May 2011. <a href="#fnref:13" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:14" role="doc-endnote">
<p>Results are not strictly identical to those of Page<sup id="fnref:2:22" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup>, due to the random nature of the test; in addition, the skewness of actual E.M. returns is -0.26 in Page<sup id="fnref:2:23" role="doc-noteref"><a href="#fn:2" class="footnote">3</a></sup> vs. -0.25 here, probably due to some slight difference in returns data. <a href="#fnref:14" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>C.f. <a href="https://www.msci.com/end-of-day-data-search">the MSCI website</a>. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>C.f. <a href="http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html">the Kenneth French’s website</a>. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:8" role="doc-endnote">
<p>See <a href="https://www.tandfonline.com/doi/abs/10.1080/713665670">R. Cont (2001) Empirical properties of asset returns: stylized facts and statistical issues, Quantitative Finance, 1:2, 223-236</a>. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Roman R.In a multi-asset portfolio, it is usual that some assets have shorter return histories than others1. Problem is, the presence of assets whose return histories differ in length makes it nearly impossible to use standard portfolio analysis and optimization methods… Estimating the historical covariance matrix of a multi-asset portfolio, for example, is not possible when assets have unequal return histories, so that a typical workaround used in practice is to consider only the common returns history. Unfortunately, this workaround has the side effect of discarding information contained in the longer return histories, which might greatly impact the quality of the estimated covariance matrix2. Sebastien Page proposes a solution to this problem in his paper How to Combine Long and Short Return Histories Efficiently3. It consists in simulating missing asset returns based on the relationships observed between all assets over their common returns history while accounting for the associated estimation error. In this blog post, I will describe in detail Page’s method and analyze how it behaves empirically with a two-asset class portfolio made of U.S. and E.M. stocks. Notes: A Jupyter notebook corresponding to this post is available on Binder - Page’s method to backfill missing asset returns Single starting date Let be two groups of assets $X$ and $Y$ such that: The “long” group of assets $X = \left( X_1,…,X_n \right)$ is made of $n \geq 1$ assets, all sharing together a common returns history of length $L$ The “short” group of assets $Y = \left( Y_1,…,X_m \right)$ is made of $m \geq 1$ assets, all sharing together a common returns history of length $S < L$ as well as sharing a common (ending) returns history of length $L - S + 1$ with the group of assets $X$ In such a situation, illustrated in Figure 1 adapted from Page3, returns for the group of assets $Y$ are missing for the whole (beginning) returns history $t = 1..L - S$. Figure 1. 
Missing asset returns with a single starting date. Source: Page. Building on the maximum likelihood procedure4 described in Stambaugh5, Page3 introduces a 3-step method in order to combine [these] long and short return histories efficiently3 and backfill the $ m \times \left( L - S \right)$ missing asset returns. Step 1 - Estimation of the asset “long” mean returns The vector $\hat{\mu}_{Y,L} \in \mathbb{R}^{m}$ of the mean returns of the assets belonging to the group $Y$ is estimated over the long returns history by: \[\hat{\mu}_{Y,L} = \mu_{Y,S} + \beta \left( \mu_{X,L} - \mu_{X,S} \right)\] , with: $\mu_{Y,S} = \left( \mu_{Y_1,S}, …, \mu_{Y_m,S} \right) {}^t \in \mathbb{R}^{m}$ the vector of the mean returns of the assets belonging to the group $Y$, computed over the short returns history $\beta \in \mathcal{M}(\mathbb{R}^{m \times n})$, the vector of standard regression coefficients defined by $\beta = \Sigma_{XX,S}^{-1} \Sigma_{XY,S}$, with: $\Sigma_{XX,S} \in \mathcal{M}(\mathbb{R}^{n \times n})$ the covariance matrix of the assets belonging to the group $X$, computed over the short returns history $\Sigma_{XY,S} \in \mathcal{M}(\mathbb{R}^{m \times n})$ the covariance matrix between the assets belonging to the group $X$ and the assets belonging to the group $Y$, computed over the short returns history $\mu_{X,L} = \left( \mu_{X_1,L}, …, \mu_{X_n,L} \right) {}^t \in \mathbb{R}^{n}$, the vector of the mean returns of the assets belonging to the group $X$, computed over the long returns history $\mu_{X,S} = \left( \mu_{X_1,S}, …, \mu_{X_n,S} \right) {}^t\in \mathbb{R}^{n}$, the vector of the mean returns of the assets belonging to the group $X$, computed over the short returns history Step 2: Estimation of the asset long covariance matrix The covariance matrix $\hat{\Sigma}_{YY,L} \in \mathcal{M}(\mathbb{R}^{m \times m})$ of the assets belonging to the group $Y$ is estimated over the long returns history by: \[\hat{\Sigma}_{YY,L} = \Sigma_{YY,S} + \beta \left( 
\Sigma_{XX,L} - \Sigma_{XX,S} \right) \beta {}^t\] , with: $\Sigma_{YY,S} \in \mathcal{M}(\mathbb{R}^{m \times m})$ the covariance matrix of the assets belonging to the group $Y$, computed over the short returns history $\Sigma_{XX,L} \in \mathcal{M}(\mathbb{R}^{n \times n})$ the covariance matrix of the assets belonging to the group $X$, computed over the long returns history Similarly, the covariance matrix $\hat{\Sigma}_{XY,L} \in \mathcal{M}(\mathbb{R}^{m \times n})$ between the assets belonging to the group $X$ and the assets belonging to the group $Y$ is estimated over the long returns history by: \[\hat{\Sigma}_{XY,L} = \Sigma_{XY,S} + \beta \left( \Sigma_{XX,L} - \Sigma_{XX,S} \right)\] Step 3: Backfilling of the missing long asset returns Once the long mean vectors and covariance matrix have been estimated thanks to step 1 and step 2, it is possible to simulate the missing (multivariate) asset returns $Y_t = \left( Y_{1,t},…,Y_{m,t} \right) {}^t \in \mathbb{R}^{m}$ for $t = 1..L - S$. Page3 mentions 3 backfilling procedures for doing so, all based on a transformation of the long (multivariate) asset returns $X_t = \left( X_{1,t},…,X_{n,t} \right) {}^t \in \mathbb{R}^{n}$: Beta adjustment The beta adjustment backfilling procedure is based on the deterministic transformation: \[Y_t = \mu_{b_t}\] , with $\mu_{b_t} = \hat{\mu}_{Y,L} + \hat{\Sigma}_{XY,L} \Sigma_{XX,L}^{-1} \left( X_t - \mu_{X,L} \right) \in \mathbb{R}^{m}$ The main problem with this procedure is that it gives a false sense of uniqueness for the backfilled asset returns. Indeed, as Page puts it3: […] the solution will not be unique: Many sets of simulated missing returns correspond to a given covariance matrix. This feature of the backfilling process is intuitive because, after all, the missing returns are unknown and so the model must recognize the uncertainty around the estimates. So, this backfilling procedure is probably best used only for bechmarking purposes. 
Conditional sampling In order to take into account estimation error into the backfilled asset returns, the conditional sampling backfilling procedure models the missing asset returns as a (multivariate) Gaussian distribution: \[Y_t \sim \mathcal{N} \left(\mu_{b_t}, \Sigma_b \right)\] , with: $\mu_{b_t}$, defined in the beta adjustment backfilling procedure, the mean vector of the Gaussian distribution $\Sigma_b = \hat{\Sigma}_{YY,L} - \hat{\Sigma}_{XY,L} \Sigma_{XX,L}^{-1} \hat{\Sigma}_{XY,L} {}^t \in \mathcal{M}(\mathbb{R}^{m \times m})$ the covariance matrix of the Gaussian distribution6 Here, Page3 notes that at the null noise limit (i.e., $\Sigma_b = 0$), this backfilling procedure becomes equivalent to the beta adjustment backfilling procedure. Residuals recycling Modeling the missing asset returns by a Gaussian distribution might be appropriate in some cases, depending on the assets and on the returns measurement frequency7, but generally speaking, financial assets exhibit skewed and fat-tailed return distributions. So, it would make sense if backfilled asset returns were to take into account these characteristics. This is the aim of the residuals recycling backfilling procedure, which works as follows: For $t = L - S + 1 .. T$, the difference $R_t$ between the (non-missing) asset returns $X_t$ and $\mu_{b_t}$, defined in the beta adjustment backfilling procedure, is computed \[R_t = X_t - \mu_{b_t}\] For $t = 1..L - S$, the missing asset returns are backfilled as \[Y_t = \mu_{b_t} + R_{t'}\] , with $t’ \in [L - S + 1..T]$ chosen uniformly at random. Page3 highlights that this backfilling procedure represents a hybrid between [maximum likelihood estimation] and bootstrapping3 and that it provides a simple, relatively assumption-free approach to account for fat tails and other features of the distribution beyond means and covariances in the backfilling process.3. 
Multiple starting dates Page’s method as described in the previous paragraph assumes that all the assets belonging to the short group of assets $Y$ share a common returns history, and in particular a common returns history starting date. In practice, though, most assets do not usually share a common returns history starting date, as illustrated in Figure 2 adapted from Gramacy et al.8. Figure 2. Missing asset returns with several starting dates. Source: Gramacy et al. In such a situation, a possible way to extend Page’s method is to apply it iteratively as proposed in Jiang and Martin9. For this, let $G_1,…,G_J, J \geq 1$ be groups of assets whose lengths of returns history $L = L_1 > L_2 > … > L_J \geq 1$ differ, but which share a common returns history ending date, as illustrated in Figure 2. Then, Page’s method can be extended as follows: Apply Page’s method to the long group of assets $X = G_1$ and to the short group of assets $Y = G_2$ Once missing asset returns in the group $Y = G_2$ have been backfilled, apply Page’s method to the long group of assets $X = G_1 \cup G_2$ and to the short group of assets $Y = G_3$ … Once missing asset returns in the group $Y = G_{J-1}$ have been backfilled, apply Page’s method to the long group of assets $X = G_1 \cup G_2 \cup … \cup G_{J-1}$ and to the short group of assets $Y = G_J$ Practical details Some numerical subtleties need to be taken into account when implementing Page’s method, among which: The covariance matrix $\Sigma_{XX,S}$ of the assets belonging to the group $X$, computed over the short returns history, might not be invertible, c.f. for example Gramacy et al.8 The covariance matrix $\Sigma_b$ of the Gaussian distribution appearing in the conditional sampling backfilling method might not be positive semi-definite Caveats Page3 highlights that his method does not magically transform missing data into additional information3 and lists several of its limitations. 
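The iterative extension above reduces to a simple loop, sketched below with a hypothetical callable backfill_one standing for any of the single-group backfilling procedures described previously:

```python
import numpy as np

def backfill_iteratively(groups, backfill_one):
    """Iteratively extend Page's method to groups G_1, ..., G_J of assets.

    groups: list of (L_j, n_j) return matrices, ordered by strictly
            decreasing history length L_1 > L_2 > ... > L_J, all sharing
            a common returns history ending date.
    backfill_one(x_long, y_short): any single-group backfilling procedure,
            returning a completed (L_1, n_j) return matrix.
    """
    x = groups[0]
    for y in groups[1:]:
        y_full = backfill_one(x, y)              # backfill group j using all
        x = np.concatenate([x, y_full], axis=1)  # previously completed groups
    return x
```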
I think one of the most important of these is that3 The model assumes that betas between the existing [asset returns] and the missing [asset returns] do not change, which is not necessarily a realistic assumption. Note also that even with this method, backfilling missing returns for a completely new asset class might unfortunately remain elusive. For example, in their piece Risk Analysis of Crypto Assets, people at Two Sigma conclude that Bitcoin is not easily explained by the Two Sigma Factor Lens, nor is it substantially correlated to other currencies or any of the major commodities, so that no long returns history of any asset class seems to contain sufficient information to accurately backfill Bitcoin returns… Implementation in Portfolio Optimizer Portfolio Optimizer implements the extension of Page’s method for multiple starting dates described in the previous section, together with specific care for the numerical subtleties also described in the previous section, through the endpoint /assets/returns/backfilled. Quality of backfilled returns Page’s residuals recycling backfilling procedure has been designed to better account for non-normal distributions3. To what extent is this goal reached in practice? Let’s check. Theoretical asset returns distribution Page3 uses a simulation framework in order to compare backfilled v.s. expected returns for a bivariate $t$-distribution and obtains the results displayed in Figure 3, taken from Page3. Figure 3. Higher moments for backfilled simulated returns compared with known bivariate fat-tailed t-distribution. Source: Page. 
Results from Figure 3 lead to the following conclusions: Missing returns backfilled with the conditional sampling backfilling procedure converge to a (univariate) Gaussian distribution Missing returns backfilled with the residuals recycling backfilling procedure seem to have a sample kurtosis very close to the sample kurtosis of the theoretical bivariate $t$-distribution In other words, the residuals recycling backfilling procedure seems to reach its advertised goal, at least when applied to a known theoretical distribution. Empirical asset returns distribution More empirically, Page3 uses monthly returns on: U.S. stocks, represented by the Wilshire 5000 Total Market Index E.M. stocks, represented by the MSCI Emerging Markets Index (Total Return) in order to compare backfilled v.s. actual returns for E.M. stocks. In more detail: Returns on U.S. stocks from January 1988 to May 2011 (long returns history) and returns on E.M. stocks from February 1998 to May 2011 (short returns history) are used to backfill returns on E.M. stocks from January 1988 to January 199810 This process is repeated 10000 times to obtain 10000 different backfilled paths for emerging-market stocks3. Moments are computed on each backfilled path, and the grand average of these moments is computed over all backfilled paths The moments of interest are the mean, the variance, the skewness and the kurtosis of backfilled returns. Moments are computed on E.M. stocks using actual returns data from January 1988 to January 199811 Using Portfolio Optimizer, this test can easily be reproduced12, c.f. 
the Jupyter notebook corresponding to this post, which gives for example the figures below:
Backfilling procedure: Mean, Variance, Skewness, Kurtosis
None (actual returns): 1.5%, 0.0039, -0.25, 3.83
Conditional sampling: 2.2%, 0.0036, -0.07, 3.11
Recycled residuals: 2.2%, 0.0036, -0.20, 3.24
These figures clearly show that the recycled residuals backfilling procedure generates asset returns that are closer, in terms of higher moments, to actual returns than the conditional sampling backfilling procedure. From this perspective, and even though the mean of backfilled returns is quite far off the mean of actual returns, the recycled residuals backfilling procedure can definitely be considered to properly recover fat tails in the missing [returns] data3. Conclusion Page’s method provides a formal, plug-and-play solution3 to the problem of unequal return histories in portfolio analysis and optimization. While other methods certainly exist, like methods based on risk factors, these other methods usually tend to be more complex, so that Page’s method is a very good choice for anyone requiring a simple way to manage missing asset returns. For more quantitative methods with just the right level of complexity, feel free to connect with me on LinkedIn or to follow me on Twitter. – For instance, historical returns of Emerging Markets (E.M.) stocks are available from the late 1980s13 while historical returns of U.S. stocks are available from the late 1920s14 or even earlier. ↩ See Steven P. Peterson, John T. Grier, Covariance Misspecification in Asset Allocation, Financial Analysts Journal, Vol. 62, No. 4 (Jul. - Aug., 2006), pp. 76-85. ↩ See Sebastien Page (2013) How to Combine Long and Short Return Histories Efficiently, Financial Analysts Journal, 69:1, 45-52. 
↩ Page3 notes that in the context of his paper, asset returns series are not assumed to be multivariate Gaussian, so that the maximum likelihood procedure of Stambaugh actually becomes a quasi-maximum likelihood procedure. ↩ See Stambaugh, Robert F. 1997. Analyzing Investments Whose Histories Differ in Length. Journal of Financial Economics, vol. 45, no. 3 (September):285–331. ↩ Note that, contrary to $\mu_{b_t}$, $\Sigma_b$ is time-independent. ↩ Asset returns have a tendency to follow a distribution closer and closer to a Gaussian distribution the more the time period over which they are computed increases; this empirical property is called aggregational Gaussianity, c.f. Cont15. ↩ See Robert B. Gramacy, Joo Hee Lee, Ricardo Silva, On estimating covariances between many assets with histories of highly variable length, arXiv. ↩ ↩2 See Jiang, Yindeng and Martin, R. Douglas, Turning Long and Short Return Histories into Equal Histories: A Better Way to Backfill Returns (August 31, 2016). ↩ Note that there is a typo in the heading of Table 3 in Page3, because known data is taken over January 1988 - January 1999, which is not 10 years but 20 years! ↩ Returns on E.M. stocks are indeed available from the full period January 1988 to May 2011. ↩ Results are not strictly identical to those of Page3, due to the random nature of the test; in addition, the skewness of actual E.M. returns is -0.26 in Page3 v.s. -0.25 here, probably due to some slight difference in returns data. ↩ C.f. the MSCI website. ↩ C.f. Kenneth French’s website. ↩ See R. Cont (2001) Empirical properties of asset returns: stylized facts and statistical issues, Quantitative Finance, 1:2, 223-236. 
↩Simulation from a Multivariate Normal Distribution with Exact Sample Mean Vector and Sample Covariance Matrix2023-07-06T00:00:00-05:002023-07-06T00:00:00-05:00https://portfoliooptimizer.io/blog/simulation-from-a-multivariate-normal-distribution-with-exact-sample-mean-vector-and-sample-covariance-matrix<p>In the research report <em>Random rotations and multivariate normal simulation</em><sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote">1</a></sup>, <a href="https://en.wikipedia.org/wiki/Robert_Wedderburn_(statistician)">Robert Wedderburn</a>
introduced an algorithm to simulate <a href="https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables">i.i.d. samples</a>
from a <a href="https://en.wikipedia.org/wiki/Multivariate_normal_distribution">multivariate normal (Gaussian) distribution</a> when the desired sample mean vector and sample covariance matrix are known
in advance<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote">2</a></sup>.</p>
<p>Wedderburn unfortunately never had the opportunity to publish his report<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote">3</a></sup> and his work was forgotten until Li<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote">4</a></sup> rediscovered it nearly 20 years later.</p>
<p>In this short blog post, I will first describe the standard algorithm used to simulate i.i.d. samples from a multivariate normal distribution and
I will then detail Wedderburn’s original algorithm as well as some of the modifications proposed by Li.</p>
<h2 id="mathematical-preliminaries">Mathematical preliminaries</h2>
<h3 id="affine-transformation-of-a-multivariate-normal-distribution">Affine transformation of a multivariate normal distribution</h3>
<p><a href="https://en.wikipedia.org/wiki/Multivariate_normal_distribution#Affine_transformation">A textbook result</a> related to the multivariate normal distribution is that any linear combination of
normally distributed random variables is also normally distributed.</p>
<p>More formally:</p>
<p><strong>Property 1</strong>: Let $X$ be a <em>n</em>-dimensional random variable following a multivariate normal distribution $\mathcal{N} \left( \mu, \Sigma \right)$ of mean vector $\mu \in \mathbb{R}^{n}$ and of covariance matrix $\Sigma \in \mathcal{M}(\mathbb{R}^{n \times n})$. Then, any affine transformation $Z = AX + b$ with $A \in \mathcal{M}(\mathbb{R}^{n \times m})$ and $b \in \mathbb{R}^{m}$, $m \ge 1$, follows a <em>m</em>-dimensional multivariate normal distribution $\mathcal{N} \left( A \mu + b, A \Sigma A {}^t \right)$.</p>
<h3 id="orthogonal-matrices">Orthogonal matrices</h3>
<p>An <em><a href="https://en.wikipedia.org/wiki/Orthogonal_matrix">orthogonal matrix</a></em> of order $n$ is a matrix $Q \in \mathcal{M}(\mathbb{R}^{n \times n})$ such that $Q {}^t Q = Q Q {}^t = \mathbb{I_n}$,
with $\mathbb{I_n}$ the identity matrix of order $n$.</p>
<p>By extension, a <em>rectangular orthogonal matrix</em> is a matrix $Q \in \mathcal{M}(\mathbb{R}^{m \times n}), m \geq n$ such that $Q {}^t Q = \mathbb{I_n}$.</p>
<h3 id="random-orthogonal-matrices">Random orthogonal matrices</h3>
<p>A <em>random orthogonal matrix</em> of order $n$ is a random matrix $Q \in \mathcal{M}(\mathbb{R}^{n \times n})$ distributed according to <a href="https://en.wikipedia.org/wiki/Haar_measure">the Haar measure</a>
over <a href="https://en.wikipedia.org/wiki/Orthogonal_group">the group of orthogonal matrices</a>, c.f. Anderson et al.<sup id="fnref:13" role="doc-noteref"><a href="#fn:13" class="footnote">5</a></sup>.</p>
<p>By extension, a <em>random rectangular orthogonal matrix</em> is a matrix $Q \in \mathcal{M}(\mathbb{R}^{m \times n}) , m \geq n$, whose columns are, for example, the first $n$ columns of
a random orthogonal matrix of order $m$, c.f. Li<sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote">4</a></sup>.</p>
<h3 id="helmert-orthogonal-matrices">Helmert orthogonal matrices</h3>
<p>A <em>Helmert matrix</em> of order $n$ is a square orthogonal matrix $H \in \mathcal{M}(\mathbb{R}^{n \times n})$ <em>having a prescribed first row and a triangle of zeroes above the diagonal</em><sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote">6</a></sup>.</p>
<p>For example, the matrix $H_n$ defined by</p>
\[H_n = \begin{pmatrix} \frac{1}{\sqrt n} &\frac{1}{\sqrt n} & \frac{1}{\sqrt n} & \dots & \frac{1}{\sqrt n} \\ \frac{1}{\sqrt 2} & -\frac{1}{\sqrt 2} & 0 & \dots & 0 \\ \frac{1}{\sqrt 6} & \frac{1}{\sqrt 6} & -\frac{2}{\sqrt 6} & \dots & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots\\ \frac{1}{\sqrt { n(n-1) }} & \frac{1}{\sqrt { n(n-1) }} & \frac{1}{\sqrt { n(n-1) }} &\dots & -\frac{n-1}{\sqrt { n(n-1) }} \end{pmatrix}\]
<p>is a Helmert matrix.</p>
<p>A <em>generalized Helmert matrix</em> of order $n$ is a square orthogonal matrix $G \in \mathcal{M}(\mathbb{R}^{n \times n})$ that can be <em>transformed by permutations of its rows and
columns and by transposition and by change of sign of rows, to a form of a [standard] Helmert matrix</em><sup id="fnref:5:1" role="doc-noteref"><a href="#fn:5" class="footnote">6</a></sup>.</p>
<p>For example, the matrix $G_n$ defined by</p>
\[G_n = \begin{pmatrix} \frac{1}{\sqrt n} &\frac{1}{\sqrt n} & \frac{1}{\sqrt n} & \dots & \frac{1}{\sqrt n} \\ -\frac{1}{\sqrt 2} & \frac{1}{\sqrt 2} & 0 & \dots & 0 \\ -\frac{1}{\sqrt 6} & -\frac{1}{\sqrt 6} & \frac{2}{\sqrt 6} & \dots & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots\\ -\frac{n-1}{\sqrt { n(n-1) }} & -\frac{1}{\sqrt { n(n-1) }} & -\frac{1}{\sqrt { n(n-1) }} &\dots & \frac{n-1}{\sqrt { n(n-1) }} \end{pmatrix}\]
<p>is a generalized Helmert matrix, obtained from the matrix $H_n$ by change of sign of rows $i=2..n$.</p>
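As a quick sanity check, the matrix $H_n$ above can be constructed and verified to be orthogonal in a few lines of NumPy (the function name is illustrative):

```python
import numpy as np

def helmert(n):
    """Helmert matrix H_n: first row constant, then a lower triangle."""
    h = np.zeros((n, n))
    h[0, :] = 1.0 / np.sqrt(n)                  # first row: 1/sqrt(n)
    for k in range(1, n):
        # row k + 1: k leading entries 1/sqrt(k(k+1)), then -k/sqrt(k(k+1))
        h[k, :k] = 1.0 / np.sqrt(k * (k + 1))
        h[k, k] = -k / np.sqrt(k * (k + 1))
    return h
```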
<h2 id="simulation-from-a-multivariate-normal-distribution">Simulation from a multivariate normal distribution</h2>
<p>Let be:</p>
<ul>
<li>$n$ a number of random variables<sup id="fnref:14" role="doc-noteref"><a href="#fn:14" class="footnote">7</a></sup></li>
<li>$\mu \in \mathbb{R}^{n}$ a vector</li>
<li>$\Sigma \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ a <a href="https://en.wikipedia.org/wiki/Positive_semidefinite_matrix">positive semi-definite matrix</a></li>
</ul>
<p><a href="https://en.wikipedia.org/wiki/Multivariate_normal_distribution#Drawing_values_from_the_distribution">One of the best-known algorithms</a> to generate $m \geq 1$ i.i.d. samples $X_1, …, X_m$
from the $n$-dimensional multivariate normal distribution $\mathcal{N}(\mu, \Sigma)$ relies on the <a href="https://en.wikipedia.org/wiki/Cholesky_decomposition">Cholesky decomposition</a> of the covariance matrix $\Sigma$.</p>
<h3 id="algorithm">Algorithm</h3>
<p>In detail, this algorithm is as follows:</p>
<ul>
<li>Compute the<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote">8</a></sup> Cholesky decomposition of $\Sigma$
<ul>
<li>This gives $\Sigma = L L {}^t$, with $L \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ a lower triangular matrix</li>
</ul>
</li>
<li>Generate $m$ i.i.d. samples $Z_1,…, Z_m$ from the standard $n$-dimensional multivariate normal distribution $\mathcal{N}(0, \mathbb{I_n})$
<ul>
<li>This is done by generating $m \times n$ i.i.d. samples $z_{11}, z_{21}, …, z_{n1}, …, z_{1m}, z_{2m}, …, z_{nm}$ from the standard univariate normal distribution $\mathcal{N}(0, 1)$ and re-organizing these samples in $m$ vectors of $n$ variables $Z_1 = \left( z_{11}, z_{21}, …, z_{n1} \right) {} ^t, …, Z_m = \left( z_{1m}, z_{2m}, …, z_{nm} \right) {} ^t$</li>
</ul>
</li>
<li>Transform the samples $Z_1,…, Z_m$ into the samples $X_1,…,X_m$ using the affine transformation $X_i = L Z_i + \mu, i = 1..m$
<ul>
<li>From <strong>Property 1</strong>, $X_1,…,X_m$ are then $m$ i.i.d. samples from the multivariate normal distribution $\mathcal{N}(\mu, \Sigma)$</li>
</ul>
</li>
</ul>
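The three steps above translate directly into NumPy; a minimal sketch, with an illustrative function name:

```python
import numpy as np

def simulate_mvn(mu, sigma, m, rng=None):
    """Generate m i.i.d. samples from N(mu, sigma) via the Cholesky decomposition."""
    rng = np.random.default_rng() if rng is None else rng
    lower = np.linalg.cholesky(sigma)        # sigma = L L^t, L lower triangular
    z = rng.standard_normal((m, len(mu)))    # m samples from N(0, I_n)
    return z @ lower.T + mu                  # affine transform X_i = L Z_i + mu
```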
<h3 id="theoretical-moments-vs-sample-moments">Theoretical moments v.s. sample moments</h3>
<p>When the previous algorithm is used to generate $m$ i.i.d. samples $X_1, …, X_m$ from the $n$-dimensional multivariate normal distribution $\mathcal{N}(\mu, \Sigma)$, the sample mean vector</p>
\[\bar{X} = \frac{1}{m} \sum_{i = 1}^m X_i\]
<p>and the (unbiased) sample covariance matrix</p>
\[Cov(X) = \frac{1}{m-1} \sum_{i = 1}^m \left(X_i - \bar{X} \right) \left(X_i - \bar{X} \right) {}^t\]
<p>will be different from their theoretical counterparts, as illustrated in Figure 1 with $\mu = \left( 0, 0 \right){}^t$, $\Sigma = \begin{bmatrix} 3 & 1 \newline 1 & 2 \end{bmatrix}$ and $m = 250$.</p>
<figure>
<a href="/assets/images/blog/wedderburn-multivariate-normal-distribution-simulation.png"><img src="/assets/images/blog/wedderburn-multivariate-normal-distribution-simulation.png" alt="Simulation from a multivariate normal distribution, first two sample moments v.s. first two theoretical moments." /></a>
<figcaption>Figure 1. Simulation from a multivariate normal distribution, first two sample moments v.s. first two theoretical moments.</figcaption>
</figure>
<p>While convergence of the first two sample moments toward the first two theoretical moments is guaranteed when $m \to +\infty$, their mismatch for finite $m$ is usually<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote">9</a></sup> an issue
in practical applications.</p>
<p>Indeed, a large number of samples is then usually required in order to reach a reasonable level of accuracy for whatever statistical estimator is being computed,
and generating such a large number of samples is costly in computation time.</p>
<h2 id="simulation-from-a-multivariate-normal-distribution-with-exact-sample-mean-vector-and-sample-covariance-matrix">Simulation from a multivariate normal distribution with exact sample mean vector and sample covariance matrix</h2>
<p>Let be:</p>
<ul>
<li>$n$ a number of random variables<sup id="fnref:14:1" role="doc-noteref"><a href="#fn:14" class="footnote">7</a></sup></li>
<li>$\bar{\mu} \in \mathbb{R}^{n}$ a vector</li>
<li>$\bar{\Sigma} \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ a <a href="https://en.wikipedia.org/wiki/Definite_matrix">positive definite matrix</a></li>
</ul>
<p>Wedderburn’s algorithm<sup id="fnref:4:1" role="doc-noteref"><a href="#fn:4" class="footnote">1</a></sup> is a conditional Monte Carlo algorithm <em>to generate multivariate normal samples conditional on a given mean and dispersion matrix</em><sup id="fnref:4:2" role="doc-noteref"><a href="#fn:4" class="footnote">1</a></sup>.</p>
<p>In other words, given a desired sample mean vector $\bar{\mu}$ and a desired (unbiased) sample covariance matrix $\bar{\Sigma}$, Wedderburn’s algorithm makes it possible to generate
$m \geq n + 1$ i.i.d. samples $X_1, …, X_m$ from a $n$-dimensional multivariate normal distribution satisfying the two relationships</p>
\[\bar{X} = \frac{1}{m} \sum_{i = 1}^m X_i = \bar{\mu}\]
<p>and</p>
\[Cov(X) = \frac{1}{m-1} \sum_{i = 1}^m \left(X_i - \bar{X} \right) \left(X_i - \bar{X} \right) {}^t = \bar{\Sigma}\]
<p>By enforcing an exact match for finite $m$ between the first two sample moments and the first two theoretical moments of a multivariate normal distribution, Wedderburn’s algorithm
reduces the number of samples required in order to reach a reasonable level of accuracy for whatever statistical estimator is being computed, and hence the total computation time<sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote">10</a></sup>.</p>
<p>From this perspective, Wedderburn’s algorithm can be considered as a <a href="https://en.wikipedia.org/wiki/Variance_reduction">Monte Carlo variance reduction technique</a>.</p>
<h3 id="wedderburns-algorithm">Wedderburn’s algorithm</h3>
<p>In detail, Wedderburn’s algorithm is as follows<sup id="fnref:4:3" role="doc-noteref"><a href="#fn:4" class="footnote">1</a></sup><sup id="fnref:1:2" role="doc-noteref"><a href="#fn:1" class="footnote">4</a></sup>:</p>
<ul>
<li>Generate a random rectangular orthonormal matrix $P \in \mathcal{M} \left( \mathbb{R}^{(m-1) \times n} \right)$, with $m \geq n + 1$</li>
<li>Compute the<sup id="fnref:9:1" role="doc-noteref"><a href="#fn:9" class="footnote">8</a></sup> Cholesky decomposition of the matrix $\bar{\Sigma}$
<ul>
<li>This gives $\bar{\Sigma} = L L {}^t$, with $L \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ a lower triangular matrix</li>
</ul>
</li>
<li>Define $X = \sqrt{m-1} T {}^t P L {}^t + \mathbb{1}_{m} \bar{\mu} {}^t$, with $T \in \mathcal{M} \left( \mathbb{R}^{(m-1) \times m} \right)$ made of the last $m-1$ rows of the $m \times m$ generalized Helmert matrix $G_m$ and $\mathbb{1}_{m} \in \mathbb{R}^{m}$ a vector made of ones
<ul>
<li>The rows $X_1, …, X_m$ of $X \in \mathcal{M} \left( \mathbb{R}^{m \times n} \right)$ are then $m$ i.i.d. samples from a multivariate normal distribution whose sample mean vector is equal to $\bar{\mu}$ and whose (unbiased) sample covariance matrix is equal to $\bar{\Sigma}$</li>
</ul>
</li>
</ul>
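A minimal NumPy sketch of these steps, with the last $m-1$ Helmert rows built explicitly and the random rectangular orthonormal matrix $P$ obtained from the QR decomposition of a Gaussian matrix; the function name is illustrative:

```python
import numpy as np

def wedderburn(mu_bar, sigma_bar, m, rng=None):
    """m samples with exact sample mean mu_bar and exact unbiased sample
    covariance matrix sigma_bar (requires m >= n + 1)."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(mu_bar)
    # Last m - 1 rows of an m x m Helmert matrix: orthonormal rows,
    # all orthogonal to the vector of ones
    t = np.zeros((m - 1, m))
    for k in range(1, m):
        t[k - 1, :k] = 1.0 / np.sqrt(k * (k + 1))
        t[k - 1, k] = -k / np.sqrt(k * (k + 1))
    # Random (m-1) x n matrix with orthonormal columns, via QR of a Gaussian
    q, r = np.linalg.qr(rng.standard_normal((m - 1, n)))
    p = q * np.sign(np.diag(r))              # sign fix for the Haar distribution
    lower = np.linalg.cholesky(sigma_bar)    # sigma_bar = L L^t
    return np.sqrt(m - 1) * t.T @ p @ lower.T + mu_bar
```

Since $T \mathbb{1}_m = 0$ and $T T {}^t = \mathbb{I}_{m-1}$, the sample mean and unbiased sample covariance of the returned rows match $\bar{\mu}$ and $\bar{\Sigma}$ exactly, up to floating-point error.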
<h3 id="lis-modifications-of-wedderburns-algorithm">Li’s modifications of Wedderburn’s algorithm</h3>
<p>Li<sup id="fnref:1:3" role="doc-noteref"><a href="#fn:1" class="footnote">4</a></sup> proposes several modifications to Wedderburn’s original algorithm and shows in particular how to handle a positive semi-definite covariance matrix $\bar{\Sigma}$.</p>
<p>In detail, Wedderburn-Li’s algorithm is as follows<sup id="fnref:1:4" role="doc-noteref"><a href="#fn:1" class="footnote">4</a></sup>:</p>
<ul>
<li>Let $1 \leq r \leq n$ be the rank of the desired (unbiased) sample covariance matrix $\bar{\Sigma}$</li>
<li>Generate a random rectangular orthonormal matrix $P \in \mathcal{M} \left( \mathbb{R}^{(m-1) \times r} \right)$, with $m \geq r + 1$</li>
<li>Compute the<sup id="fnref:9:2" role="doc-noteref"><a href="#fn:9" class="footnote">8</a></sup> reduced Cholesky decomposition of the matrix $\bar{\Sigma}$
<ul>
<li>This gives $\bar{\Sigma} = L L {}^t$, with $L \in \mathcal{M} \left( \mathbb{R}^{n \times r} \right)$ a “lower triangular” matrix</li>
</ul>
</li>
<li>Define $X = \sqrt{m-1} T {}^t P L {}^t + \mathbb{1}_{m} \bar{\mu} {}^t$, with $T \in \mathcal{M} \left( \mathbb{R}^{(m-1) \times m} \right)$ made of the last $m-1$ rows of the $m \times m$ generalized Helmert matrix $G_m$ and $\mathbb{1}_{m} \in \mathbb{R}^{m}$ a vector made of ones
<ul>
<li>The rows $X_1, …, X_m$ of $X \in \mathcal{M} \left( \mathbb{R}^{m \times n} \right)$ are then $m$ i.i.d. samples from a multivariate normal distribution whose sample mean vector is equal to $\bar{\mu}$ and whose (unbiased) sample covariance matrix is equal to $\bar{\Sigma}$</li>
</ul>
</li>
</ul>
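A sketch of the rank-deficient case follows. NumPy has no built-in reduced (pivoted) Cholesky decomposition, so this sketch substitutes an eigendecomposition-based factor $A \in \mathcal{M} \left( \mathbb{R}^{n \times r} \right)$ with $\bar{\Sigma} = A A {}^t$, which Li shows is equally valid (c.f. the remarks below); the function name and tolerance are illustrative:

```python
import numpy as np

def wedderburn_li(mu_bar, sigma_bar, m, rng=None, tol=1e-12):
    """Handles a positive semi-definite sigma_bar of rank r, requiring only
    m >= r + 1; uses an eigendecomposition-based factor A with
    sigma_bar = A A^t in place of a reduced Cholesky decomposition."""
    rng = np.random.default_rng() if rng is None else rng
    eigval, eigvec = np.linalg.eigh(sigma_bar)
    keep = eigval > tol * eigval.max()            # drop (numerically) zero modes
    a = eigvec[:, keep] * np.sqrt(eigval[keep])   # (n, r) factor
    r_ = a.shape[1]
    assert m >= r_ + 1
    t = np.zeros((m - 1, m))                      # last m - 1 Helmert rows
    for k in range(1, m):
        t[k - 1, :k] = 1.0 / np.sqrt(k * (k + 1))
        t[k - 1, k] = -k / np.sqrt(k * (k + 1))
    q, rr = np.linalg.qr(rng.standard_normal((m - 1, r_)))
    p = q * np.sign(np.diag(rr))                  # (m-1) x r, orthonormal columns
    return np.sqrt(m - 1) * t.T @ p @ a.T + mu_bar
```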
<h3 id="misc-remarks">Misc. remarks</h3>
<p>A couple of remarks:</p>
<ul>
<li>
<p>Contrary to the algorithm described in the previous section, it appears at first sight that no sample from the univariate standard normal distribution $\mathcal{N}(0, 1)$ needs to be generated when using Wedderburn’s algorithm.</p>
<p>This is actually not the case, because generating a random orthogonal matrix implicitly relies on the generation of such samples!</p>
</li>
<li>
<p>Wedderburn<sup id="fnref:4:4" role="doc-noteref"><a href="#fn:4" class="footnote">1</a></sup> uses <a href="https://en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix">the eigenvalue decomposition</a> of the covariance matrix $\bar{\Sigma}$ instead of its Cholesky decomposition, but Li<sup id="fnref:1:5" role="doc-noteref"><a href="#fn:1" class="footnote">4</a></sup> demonstrates that it is actually possible to use any decomposition of $\bar{\Sigma}$ such that $\bar{\Sigma} = A A {}^t$ with $A \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ and advocates for the usage of the Cholesky decomposition.</p>
</li>
<li>
<p>As highlighted in Wedderburn<sup id="fnref:4:5" role="doc-noteref"><a href="#fn:4" class="footnote">1</a></sup>, the theoretical mean vector and covariance matrix of the multivariate normal distribution are irrelevant.</p>
</li>
</ul>
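To make the first remark concrete, here is one standard recipe for generating a random (rectangular) orthogonal matrix, which indeed consumes $m \times n$ standard normal samples:

```python
import numpy as np

def random_orthogonal(m, n, rng=None):
    """Random m x n matrix with orthonormal columns (m >= n), obtained from
    the QR decomposition of an m x n standard Gaussian matrix."""
    rng = np.random.default_rng() if rng is None else rng
    q, r = np.linalg.qr(rng.standard_normal((m, n)))
    # Fix the column signs (equivalently, make the diagonal of R positive)
    # so that the resulting distribution is exactly the Haar measure
    return q * np.sign(np.diag(r))
```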
<h2 id="implementation-in-portfolio-optimizer">Implementation in Portfolio Optimizer</h2>
<p><strong>Portfolio Optimizer</strong> implements both the standard algorithm and Wedderburn-Li’s algorithm to simulate from a multivariate normal distribution through the endpoint
<a href="https://docs.portfoliooptimizer.io/"><code class="language-plaintext highlighter-rouge">/assets/returns/simulation/monte-carlo/gaussian/multivariate</code></a>.</p>
<p>Note, though, that for internal consistency reasons, the input covariance matrix when using Wedderburn-Li’s algorithm is assumed to be the desired biased sample covariance matrix
and not the desired unbiased sample covariance matrix.</p>
<h2 id="conclusion">Conclusion</h2>
<p>To conclude this post, a word about applications of Wedderburn’s algorithm.</p>
<p>These are of course numerous in finance, c.f. for example applications of similar Monte Carlo variance reduction techniques in asset pricing in Wang<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote">11</a></sup> or in risk management in Meucci<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote">12</a></sup>.</p>
<p>More generally, Wedderburn’s algorithm is applicable in any context requiring simulation from a multivariate normal distribution, which makes it a very interesting generic algorithm to
have in one’s toolbox!</p>
<p>For more analysis of forgotten research reports and algorithms, feel free to <a href="https://www.linkedin.com/in/roman-rubsamen/">connect with me on LinkedIn</a> or to <a href="https://twitter.com/portfoliooptim">follow me on Twitter</a>.</p>
<p>–</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:4" role="doc-endnote">
<p>See <a href="https://www.langsrud.com/stat/wedderburn.doc">Wedderburn, R.W.M. (1975), Random rotations and multivariate normal simulation. Research Report, Rothamsted Experimental Station</a>. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:4:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:4:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:4:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a> <a href="#fnref:4:4" class="reversefootnote" role="doc-backlink">↩<sup>5</sup></a> <a href="#fnref:4:5" class="reversefootnote" role="doc-backlink">↩<sup>6</sup></a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>That is, the samples are simulated so that they have the desired sample mean and sample covariance matrix. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:7" role="doc-endnote">
<p>Because he died suddenly in 1975… <a href="#fnref:7" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:1" role="doc-endnote">
<p>See <a href="https://www.tandfonline.com/doi/abs/10.1080/00949659208811424">K.-H. Li, Generation of random matrices with orthonormal columns and multivariate normal variates with given sample mean and covariance, J. Statist. Comput. Simulation 43 (1992) 11–18</a>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:1:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:1:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a> <a href="#fnref:1:3" class="reversefootnote" role="doc-backlink">↩<sup>4</sup></a> <a href="#fnref:1:4" class="reversefootnote" role="doc-backlink">↩<sup>5</sup></a> <a href="#fnref:1:5" class="reversefootnote" role="doc-backlink">↩<sup>6</sup></a></p>
</li>
<li id="fn:13" role="doc-endnote">
<p>See <a href="https://doi.org/10.1137/0908055">T. W. Anderson, I. Olkin, L. G. Underhill, Generation of Random Orthogonal Matrices, SIAM Journal on Scientific and Statistical Computing, Vol. 8, Iss. 4 (1987)</a>. <a href="#fnref:13" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>See <a href="https://www.tandfonline.com/doi/abs/10.1080/00029890.1965.12345681">H. O. Lancaster, The Helmert Matrices, The American Mathematical Monthly, 72(1965), no. 1, 4-12</a>. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:5:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:14" role="doc-endnote">
<p>On this blog such variables are typically assets, but they can also be genes or species in a biology context, etc. <a href="#fnref:14" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:14:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:9" role="doc-endnote">
<p>In case a matrix is positive definite, its Cholesky decomposition exists and is unique; in case a matrix is only positive semi-definite, its Cholesky decomposition exists but is not unique in general. <a href="#fnref:9" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:9:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:9:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a></p>
</li>
<li id="fn:10" role="doc-endnote">
<p>Not always, as variability in the sample mean and in the sample covariance matrix might be desired. <a href="#fnref:10" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:11" role="doc-endnote">
<p>Under the assumption that the total time taken to generate this reduced number of samples + to compute the associated estimator is (much) lower than the total time taken to generate the initial larger number of samples + to compute the associated estimator. <a href="#fnref:11" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:8" role="doc-endnote">
<p>See <a href="https://www.pm-research.com/content/iijderiv/16/1/7">Jr-Yan Wang, Variance Reduction for Multivariate Monte Carlo Simulation, The Journal of Derivatives Fall 2008, 16 (1) 7-28</a>. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>See <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1415699">Meucci, Attilio, Simulations with Exact Means and Covariances (June 7, 2009)</a>. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Roman R.
Random orthogonal matrices

A random orthogonal matrix of order $n$ is a random matrix $Q \in \mathcal{M}(\mathbb{R}^{n \times n})$ distributed according to the Haar measure over the group of orthogonal matrices, c.f. Anderson et al. [5]. By extension, a random rectangular orthogonal matrix is a matrix $Q \in \mathcal{M}(\mathbb{R}^{m \times n})$, $m \geq n$, whose columns are, for example, the first $n$ columns of a random orthogonal matrix of order $m$, c.f. Li [4].

Helmert orthogonal matrices

A Helmert matrix of order $n$ is a square orthogonal matrix $H \in \mathcal{M}(\mathbb{R}^{n \times n})$ having a prescribed first row and a triangle of zeroes above the diagonal [6]. For example, the matrix $H_n$ defined by

\[H_n = \begin{pmatrix} \frac{1}{\sqrt n} & \frac{1}{\sqrt n} & \frac{1}{\sqrt n} & \dots & \frac{1}{\sqrt n} \\ \frac{1}{\sqrt 2} & -\frac{1}{\sqrt 2} & 0 & \dots & 0 \\ \frac{1}{\sqrt 6} & \frac{1}{\sqrt 6} & -\frac{2}{\sqrt 6} & \dots & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ \frac{1}{\sqrt { n(n-1) }} & \frac{1}{\sqrt { n(n-1) }} & \frac{1}{\sqrt { n(n-1) }} & \dots & -\frac{n-1}{\sqrt { n(n-1) }} \end{pmatrix}\]

is a Helmert matrix.

A generalized Helmert matrix of order $n$ is a square orthogonal matrix $G \in \mathcal{M}(\mathbb{R}^{n \times n})$ that can be transformed, by permutations of its rows and columns, by transposition and by change of sign of rows, to the form of a [standard] Helmert matrix [6].
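A minimal sketch (my own, in Python with NumPy, following the row pattern of $H_n$ displayed above) constructs the Helmert matrix explicitly and checks its orthogonality, together with a sign-flipped generalized variant:

```python
import numpy as np

def helmert(n):
    """Build the n x n Helmert matrix H_n from the row pattern above."""
    H = np.zeros((n, n))
    H[0, :] = 1.0 / np.sqrt(n)  # prescribed first row
    for i in range(1, n):
        # Row i: 1/sqrt(i(i+1)) on the first i entries,
        # -i/sqrt(i(i+1)) on the diagonal, zeroes above it.
        H[i, :i] = 1.0 / np.sqrt(i * (i + 1))
        H[i, i] = -i / np.sqrt(i * (i + 1))
    return H

H4 = helmert(4)
# H_n is orthogonal: H^t H = I_n
assert np.allclose(H4.T @ H4, np.eye(4))

# A generalized Helmert matrix: flip the sign of rows 2..n;
# it is orthogonal as well.
G4 = H4.copy()
G4[1:, :] *= -1.0
assert np.allclose(G4.T @ G4, np.eye(4))
```

Each row $i \geq 2$ of $H_n$ sums to zero, a property that is used below by Wedderburn's algorithm to pin the sample mean exactly.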
For example, the matrix $G_n$ defined by

\[G_n = \begin{pmatrix} \frac{1}{\sqrt n} & \frac{1}{\sqrt n} & \frac{1}{\sqrt n} & \dots & \frac{1}{\sqrt n} \\ -\frac{1}{\sqrt 2} & \frac{1}{\sqrt 2} & 0 & \dots & 0 \\ -\frac{1}{\sqrt 6} & -\frac{1}{\sqrt 6} & \frac{2}{\sqrt 6} & \dots & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ -\frac{1}{\sqrt { n(n-1) }} & -\frac{1}{\sqrt { n(n-1) }} & -\frac{1}{\sqrt { n(n-1) }} & \dots & \frac{n-1}{\sqrt { n(n-1) }} \end{pmatrix}\]

is a generalized Helmert matrix, obtained from the matrix $H_n$ by change of sign of rows $i = 2..n$.

Simulation from a multivariate normal distribution

Let be:

- $n$ a number of random variables [7]
- $\mu \in \mathbb{R}^{n}$ a vector
- $\Sigma \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ a positive semi-definite matrix

One of the best-known algorithms to generate $m \geq 1$ i.i.d. samples $X_1, …, X_m$ from the $n$-dimensional multivariate normal distribution $\mathcal{N}(\mu, \Sigma)$ relies on the Cholesky decomposition of the covariance matrix $\Sigma$.

Algorithm

In detail, this algorithm is as follows:

1. Compute the Cholesky decomposition [8] of $\Sigma$, which gives $\Sigma = L L {}^t$, with $L \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ a lower triangular matrix
2. Generate $m$ i.i.d. samples $Z_1, …, Z_m$ from the standard $n$-dimensional multivariate normal distribution $\mathcal{N}(0, \mathbb{I_n})$; this is done by generating $m \times n$ i.i.d. samples $z_{11}, z_{21}, …, z_{n1}, …, z_{1m}, z_{2m}, …, z_{nm}$ from the standard univariate normal distribution $\mathcal{N}(0, 1)$ and re-organizing these samples in $m$ vectors of $n$ variables $Z_1 = \left( z_{11}, z_{21}, …, z_{n1} \right) {} ^t, …, Z_m = \left( z_{1m}, z_{2m}, …, z_{nm} \right) {} ^t$
3. Transform the samples $Z_1, …, Z_m$ into the samples $X_1, …, X_m$ using the affine transformation $X_i = L Z_i + \mu$, $i = 1..m$

From Property 1, $X_1, …, X_m$ are then $m$ i.i.d.
samples from the multivariate normal distribution $\mathcal{N}(\mu, \Sigma)$.

Theoretical moments v.s. sample moments

When the previous algorithm is used to generate $m$ i.i.d. samples $X_1, …, X_m$ from the $n$-dimensional multivariate normal distribution $\mathcal{N}(\mu, \Sigma)$, the sample mean vector

\[\bar{X} = \frac{1}{m} \sum_{i = 1}^m X_i\]

and the (unbiased) sample covariance matrix

\[Cov(X) = \frac{1}{m-1} \sum_{i = 1}^m \left(X_i - \bar{X} \right) \left(X_i - \bar{X} \right) {}^t\]

will be different from their theoretical counterparts, as illustrated in Figure 1 with $\mu = \left( 0, 0 \right){}^t$, $\Sigma = \begin{bmatrix} 3 & 1 \newline 1 & 2 \end{bmatrix}$ and $m = 250$.

Figure 1. Simulation from a multivariate normal distribution, first two sample moments v.s. first two theoretical moments.

While convergence of the first two sample moments toward the first two theoretical moments is guaranteed when $m \to +\infty$, their mismatch for finite $m$ is usually [9] an issue in practical applications. Indeed, a large number of samples is then usually required in order to reach a reasonable level of accuracy for whatever statistical estimator is being computed, and generating such a large number of samples is costly in computation time.

Simulation from a multivariate normal distribution with exact sample mean vector and sample covariance matrix

Let be:

- $n$ a number of random variables [7]
- $\bar{\mu} \in \mathbb{R}^{n}$ a vector
- $\bar{\Sigma} \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ a positive definite matrix

Wedderburn's algorithm [1] is a conditional Monte Carlo algorithm to generate multivariate normal samples conditional on a given mean and dispersion matrix [1]. In other words, given a desired sample mean vector $\bar{\mu}$ and a desired (unbiased) sample covariance matrix $\bar{\Sigma}$, Wedderburn's algorithm allows one to generate $m \geq n + 1$ i.i.d.
samples $X_1, …, X_m$ from an $n$-dimensional multivariate normal distribution satisfying the two relationships

\[\bar{X} = \frac{1}{m} \sum_{i = 1}^m X_i = \bar{\mu}\]

and

\[Cov(X) = \frac{1}{m-1} \sum_{i = 1}^m \left(X_i - \bar{X} \right) \left(X_i - \bar{X} \right) {}^t = \bar{\Sigma}\]

By enforcing an exact match for finite $m$ between the first two sample moments and the first two theoretical moments of a multivariate normal distribution, Wedderburn's algorithm reduces the number of samples required in order to reach a reasonable level of accuracy for whatever statistical estimator is being computed, and hence the total computation time [10]. From this perspective, Wedderburn's algorithm can be considered a Monte Carlo variance reduction technique.

Wedderburn's algorithm

In detail, Wedderburn's algorithm is as follows [1] [4]:

1. Generate a random rectangular orthogonal matrix $P \in \mathcal{M} \left( \mathbb{R}^{(m-1) \times n} \right)$, with $m \geq n + 1$
2. Compute the Cholesky decomposition [8] of the matrix $\bar{\Sigma}$, which gives $\bar{\Sigma} = L L {}^t$, with $L \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ a lower triangular matrix
3. Define $X = \sqrt{m-1} T {}^t P L {}^t + \mathbb{1}_{m} \bar{\mu} {}^t$, with $T \in \mathcal{M} \left( \mathbb{R}^{(m-1) \times m} \right)$ made of the last $m-1$ rows of the $m \times m$ generalized Helmert matrix $G_m$ and $\mathbb{1}_{m} \in \mathbb{R}^{m}$ a vector made of ones

The rows $X_1, …, X_m$ of $X \in \mathcal{M} \left( \mathbb{R}^{m \times n} \right)$ are then $m$ i.i.d. samples from a multivariate normal distribution whose sample mean vector is equal to $\bar{\mu}$ and whose (unbiased) sample covariance matrix is equal to $\bar{\Sigma}$.

Li's modifications of Wedderburn's algorithm

Li [4] proposes several modifications to the original Wedderburn algorithm and shows in particular how to manage a positive semi-definite covariance matrix $\bar{\Sigma}$.
In detail, Wedderburn-Li's algorithm is as follows [4]:

1. Let $1 \leq r \leq n$ be the rank of the desired (unbiased) sample covariance matrix $\bar{\Sigma}$
2. Generate a random rectangular orthogonal matrix $P \in \mathcal{M} \left( \mathbb{R}^{(m-1) \times r} \right)$, with $m \geq r + 1$
3. Compute the reduced Cholesky decomposition [8] of the matrix $\bar{\Sigma}$, which gives $\bar{\Sigma} = L L {}^t$, with $L \in \mathcal{M} \left( \mathbb{R}^{n \times r} \right)$ a “lower triangular” matrix
4. Define $X = \sqrt{m-1} T {}^t P L {}^t + \mathbb{1}_{m} \bar{\mu} {}^t$, with $T \in \mathcal{M} \left( \mathbb{R}^{(m-1) \times m} \right)$ made of the last $m-1$ rows of the $m \times m$ generalized Helmert matrix $G_m$ and $\mathbb{1}_{m} \in \mathbb{R}^{m}$ a vector made of ones

The rows $X_1, …, X_m$ of $X \in \mathcal{M} \left( \mathbb{R}^{m \times n} \right)$ are then $m$ i.i.d. samples from a multivariate normal distribution whose sample mean vector is equal to $\bar{\mu}$ and whose (unbiased) sample covariance matrix is equal to $\bar{\Sigma}$.

Misc. remarks

A couple of remarks:

- Contrary to the algorithm described in the previous section, it appears at first sight that no sample from the univariate standard normal distribution $\mathcal{N}(0, 1)$ needs to be generated when using Wedderburn's algorithm. This is actually not the case, because generating a random orthogonal matrix implicitly relies on the generation of such samples!
- Wedderburn [1] uses the eigenvalue decomposition of the covariance matrix $\bar{\Sigma}$ instead of its Cholesky decomposition, but Li [4] demonstrates that it is actually possible to use any decomposition of $\bar{\Sigma}$ such that $\bar{\Sigma} = A A {}^t$ with $A \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$, and advocates for the usage of the Cholesky decomposition.
- As highlighted in Wedderburn [1], the theoretical mean vector and covariance matrix of the multivariate normal distribution are irrelevant.
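To make the steps concrete, here is a hypothetical Python/NumPy sketch of the full-rank case ($r = n$), in which the random rectangular orthogonal matrix $P$ is taken as the first $n$ columns of the QR factor of a standard normal matrix; the helper names (`helmert_generalized`, `simulate_exact`) are my own, not from Li or from Portfolio Optimizer:

```python
import numpy as np

def helmert_generalized(m):
    """m x m generalized Helmert matrix G_m (signs of rows 2..m flipped)."""
    G = np.zeros((m, m))
    G[0, :] = 1.0 / np.sqrt(m)
    for i in range(1, m):
        G[i, :i] = -1.0 / np.sqrt(i * (i + 1))
        G[i, i] = i / np.sqrt(i * (i + 1))
    return G

def simulate_exact(mu_bar, sigma_bar, m, rng):
    """m i.i.d. multivariate normal samples (rows) with exact sample mean
    mu_bar and exact unbiased sample covariance sigma_bar (full-rank case)."""
    n = len(mu_bar)
    assert m >= n + 1
    # Random rectangular orthogonal (m-1) x n matrix: first n columns of
    # the QR factor of an (m-1) x (m-1) standard normal matrix.
    Q, _ = np.linalg.qr(rng.standard_normal((m - 1, m - 1)))
    P = Q[:, :n]
    L = np.linalg.cholesky(sigma_bar)       # sigma_bar = L L^t
    T = helmert_generalized(m)[1:, :]       # last m-1 rows of G_m
    # X = sqrt(m-1) T^t P L^t + 1_m mu_bar^t
    return np.sqrt(m - 1) * T.T @ P @ L.T + np.outer(np.ones(m), mu_bar)

rng = np.random.default_rng(42)
mu_bar = np.array([0.0, 0.0])
sigma_bar = np.array([[3.0, 1.0], [1.0, 2.0]])
X = simulate_exact(mu_bar, sigma_bar, m=250, rng=rng)
assert np.allclose(X.mean(axis=0), mu_bar)
assert np.allclose(np.cov(X, rowvar=False), sigma_bar)  # unbiased sample cov
```

The exact-moments property holds for any orthogonal $P$ (each row of $T$ sums to zero, so $T \mathbb{1}_{m} = 0$ pins the mean, while $T T {}^t = \mathbb{I}_{m-1}$ and $P {}^t P = \mathbb{I_n}$ pin the covariance); for $P$ to be Haar-distributed, and hence the samples genuinely i.i.d. normal, the QR factor's column signs should additionally be randomized, e.g. multiplied by the signs of the diagonal of the $R$ factor.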
Implementation in Portfolio Optimizer

Portfolio Optimizer implements both the standard algorithm and Wedderburn-Li's algorithm to simulate from a multivariate normal distribution through the endpoint /assets/returns/simulation/monte-carlo/gaussian/multivariate. Note, though, that for internal consistency reasons, the input covariance matrix when using Wedderburn-Li's algorithm is assumed to be the desired biased sample covariance matrix and not the desired unbiased sample covariance matrix.

Conclusion

To conclude this post, a word about applications of Wedderburn's algorithm. These are of course numerous in finance, c.f. for example applications of similar Monte Carlo variance reduction techniques in asset pricing in Wang [11] or in risk management in Meucci [12]. But Wedderburn's algorithm is more generally applicable to any context requiring simulation from a multivariate normal distribution, which makes it a very interesting generic algorithm to have in one's toolbox!

For more analysis of forgotten research reports and algorithms, feel free to connect with me on LinkedIn or to follow me on Twitter.

–

1. See Wedderburn, R.W.M. (1975), Random rotations and multivariate normal simulation. Research Report, Rothamsted Experimental Station.
2. That is, the samples are simulated so that they have the desired sample mean and sample covariance matrix.
3. Because he died suddenly in 1975…
4. See K.-H. Li, Generation of random matrices with orthonormal columns and multivariate normal variates with given sample mean and covariance, J. Statist. Comput. Simulation 43 (1992) 11–18.
5. See T. W. Anderson, I. Olkin, L. G. Underhill, Generation of Random Orthogonal Matrices, SIAM Journal on Scientific and Statistical Computing, Vol. 8, Iss. 4 (1987).
6. See H. O. Lancaster, The Helmert Matrices, The American Mathematical Monthly, 72 (1965), no. 1, 4-12.
7. On this blog such variables are typically assets, but they can also be genes or species in a biology context, etc.
8. In case a matrix is positive definite, its Cholesky decomposition exists and is unique; in case a matrix is only positive semi-definite, its Cholesky decomposition exists but is not unique in general.
9. Not always, as variability in the sample mean and in the sample covariance matrix might be desired.
10. Under the assumption that the total time taken to generate this reduced number of samples and to compute the associated estimator is (much) lower than the total time taken to generate the initial, larger number of samples and to compute the associated estimator.
11. See Jr-Yan Wang, Variance Reduction for Multivariate Monte Carlo Simulation, The Journal of Derivatives, Fall 2008, 16 (1) 7-28.
12. See Meucci, Attilio, Simulations with Exact Means and Covariances (June 7, 2009).