<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.1.1">Jekyll</generator><link href="https://portfoliooptimizer.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://portfoliooptimizer.io/" rel="alternate" type="text/html" /><updated>2026-03-04T06:09:30-06:00</updated><id>https://portfoliooptimizer.io/feed.xml</id><title type="html">Portfolio Optimizer</title><subtitle>Portfolio Optimizer is a Web API democratizing the access to the Nobel Prize-winning science of portfolio optimization.</subtitle><author><name>Roman R.</name></author><entry><title type="html">The Market Rank Indicator: Measuring Financial Risk, Part 3</title><link href="https://portfoliooptimizer.io/blog/the-market-rank-indicator-measuring-financial-risk-part-3/" rel="alternate" type="text/html" title="The Market Rank Indicator: Measuring Financial Risk, Part 3" /><published>2026-03-04T00:00:00-06:00</published><updated>2026-03-04T00:00:00-06:00</updated><id>https://portfoliooptimizer.io/blog/the-market-rank-indicator-measuring-financial-risk-part-3</id><content type="html" xml:base="https://portfoliooptimizer.io/blog/the-market-rank-indicator-measuring-financial-risk-part-3/">&lt;p&gt;In the &lt;a href=&quot;/blog/the-absorption-ratio-measuring-financial-risk/&quot;&gt;previous post&lt;/a&gt; of this series on measuring financial risk, I described the &lt;em&gt;absorption ratio&lt;/em&gt;, a measure of financial market fragility based on 
&lt;a href=&quot;https://en.wikipedia.org/wiki/Principal_component_analysis&quot;&gt;principal components analysis&lt;/a&gt;, introduced in Kritzman et al.&lt;sup id=&quot;fnref:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;In this new blog post, I will describe another measure of financial distress called the &lt;em&gt;market rank indicator (MRI)&lt;/em&gt;, this time &lt;em&gt;related to the notion of &lt;a href=&quot;https://en.wikipedia.org/wiki/Condition_number&quot;&gt;condition number&lt;/a&gt;&lt;/em&gt;&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; of a matrix, 
introduced in Figini et al.&lt;sup id=&quot;fnref:1:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;As an example of usage, I will show how to use the market rank indicator to dynamically scale the market exposure of a portfolio of U.S. equities.&lt;/p&gt;

&lt;h2 id=&quot;the-market-rank-indicator&quot;&gt;The market rank indicator&lt;/h2&gt;

&lt;h3 id=&quot;definition&quot;&gt;Definition&lt;/h3&gt;

&lt;p&gt;Let:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;$n$, the number of assets&lt;/li&gt;
  &lt;li&gt;$T$, the number of time periods, with $T &amp;gt; n$&lt;/li&gt;
  &lt;li&gt;$X \in \mathbb{R}^{T \times n}$, the matrix of the asset arithmetic or logarithmic returns for each of the $T$ time periods&lt;/li&gt;
  &lt;li&gt;$\sigma_1,…,\sigma_n$, the singular values of $X$ ordered such that $0 &amp;lt; \sigma_1 \leq … \leq \sigma_n$&lt;/li&gt;
  &lt;li&gt;$1\leq k \leq n$, the number of singular values $\sigma_1,…,\sigma_k$ to retain in the computation of the market rank indicator&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The market rank indicator $\text{MRI}$ of the assets is defined&lt;sup id=&quot;fnref:1:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; as &lt;em&gt;the ratio between the largest singular value and the geometric mean of the $k$ smallest singular values of the matrix $X$&lt;/em&gt;&lt;sup id=&quot;fnref:1:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, that is&lt;/p&gt;

\[\text{MRI}_k = \frac{ \sigma_n }{ \left( \prod_{i=1}^k \sigma_i \right)^{1/k} }\]
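As a concrete illustration, the formula above can be computed directly from the singular values of the returns matrix. The following minimal sketch (using NumPy, with synthetic returns standing in for real asset data) is mine, not from the original paper:

```python
import numpy as np

def market_rank_indicator(X, k):
    # Singular values of X, reordered so that sigma_1 <= ... <= sigma_n
    s = np.sort(np.linalg.svd(X, compute_uv=False))
    # Geometric mean of the k smallest singular values, computed in
    # log space for numerical stability
    geometric_mean = np.exp(np.mean(np.log(s[:k])))
    return s[-1] / geometric_mean

# Synthetic example: T = 252 daily returns for n = 11 assets
X = np.random.default_rng(0).standard_normal((252, 11))
print(market_rank_indicator(X, k=3))
```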

&lt;h3 id=&quot;alternative-definition&quot;&gt;Alternative definition&lt;/h3&gt;

&lt;p&gt;Let:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;$n$, the number of assets&lt;/li&gt;
  &lt;li&gt;$T$, the number of time periods, with $T &amp;gt; n$&lt;/li&gt;
  &lt;li&gt;$X \in \mathbb{R}^{T \times n}$, the matrix of the asset arithmetic or logarithmic returns for each of the $T$ time periods&lt;/li&gt;
  &lt;li&gt;$\Sigma = \frac{1}{T} {}^t X X \in \mathbb{R}^{n \times n}$, the asset returns covariance matrix&lt;/li&gt;
  &lt;li&gt;$\lambda_1,…,\lambda_n$, the eigenvalues of $\Sigma$ ordered such that $0 &amp;lt; \lambda_1 \leq … \leq \lambda_n$&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thanks to the relationship $\lambda_i = \frac{\sigma_i^2}{T}$, $i = 1, \ldots, n$, between the eigenvalues of $\Sigma$ and the singular values of $X$, the market rank indicator can alternatively be defined as the square root of 
the ratio between the largest eigenvalue and the geometric mean of the $k$ smallest eigenvalues of the matrix $\Sigma$, that is&lt;/p&gt;

\[\text{MRI}_k = \sqrt{ \frac{ \lambda_n  }{ \left( \prod_{i=1}^k \lambda_i \right)^{1/k} } }\]
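Since the two definitions only differ by the change of variable $\lambda_i = \sigma_i^2 / T$, they can be checked against each other numerically. A quick sanity check on synthetic data (my sketch, not from the paper):

```python
import numpy as np

T, n, k = 252, 11, 3
X = np.random.default_rng(1).standard_normal((T, n))

# Definition via the singular values of X
s = np.sort(np.linalg.svd(X, compute_uv=False))
mri_svd = s[-1] / np.exp(np.mean(np.log(s[:k])))

# Alternative definition via the eigenvalues of Sigma = (1/T) tX X
lam = np.sort(np.linalg.eigvalsh(X.T @ X / T))
mri_eig = np.sqrt(lam[-1] / np.exp(np.mean(np.log(lam[:k]))))

# The 1/T factors cancel in the ratio, so the two values coincide
print(np.isclose(mri_svd, mri_eig))  # → True
```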

&lt;h3 id=&quot;generalized-definition&quot;&gt;Generalized definition&lt;/h3&gt;

&lt;p&gt;In the two previous sub-sections, it is assumed that $T &amp;gt; n$.&lt;/p&gt;

&lt;p&gt;This assumption guarantees that the singular values of the matrix $X$, and equivalently the eigenvalues of the matrix $\Sigma$, are all strictly positive.&lt;/p&gt;

&lt;p&gt;In case $T \leq n$, the definition of the market rank indicator needs to be adapted, which is simply done by &lt;em&gt;redefining [it] on the first $T$ [non zero] singular values&lt;/em&gt;&lt;sup id=&quot;fnref:1:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
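Under that adapted definition, a sketch simply discards the numerically null singular values before applying the original formula; the tolerance below is my assumption, not a detail from the paper:

```python
import numpy as np

def mri_generalized(X, k, tol=1e-12):
    s = np.linalg.svd(X, compute_uv=False)  # min(T, n) singular values
    s = np.sort(s[s > tol * s.max()])       # keep the nonzero ones, ascending
    k = min(k, len(s))                      # k cannot exceed what remains
    return s[-1] / np.exp(np.mean(np.log(s[:k])))

# T = 5 observations of n = 11 assets: at most 5 nonzero singular values
X = np.random.default_rng(2).standard_normal((5, 11))
print(mri_generalized(X, k=3))
```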

&lt;h3 id=&quot;rationale&quot;&gt;Rationale&lt;/h3&gt;

&lt;p&gt;Figini et al.&lt;sup id=&quot;fnref:1:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; highlights that &lt;em&gt;what contains valuable information on market synchronization in a principal components analysis on asset returns is not only the strength of the 
first principal component, but also the weakness of the last components&lt;/em&gt;&lt;sup id=&quot;fnref:1:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;This observation leads to the proposal of the market rank indicator as &lt;em&gt;a generalization of the condition number $\kappa(X)$ of the matrix $X$&lt;/em&gt;&lt;sup id=&quot;fnref:1:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, with $\text{MRI}_1 = \kappa(X)$ as a special case.&lt;/p&gt;
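The special case $\text{MRI}_1 = \kappa(X)$ is easy to verify numerically, since the 2-norm condition number is exactly the ratio of the largest to the smallest singular value (a sketch on synthetic data):

```python
import numpy as np

X = np.random.default_rng(3).standard_normal((252, 11))
s = np.sort(np.linalg.svd(X, compute_uv=False))
mri_1 = s[-1] / s[0]  # MRI with k = 1
# np.linalg.cond defaults to the 2-norm condition number
print(np.isclose(mri_1, np.linalg.cond(X)))  # → True
```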

&lt;h3 id=&quot;interpretation&quot;&gt;Interpretation&lt;/h3&gt;

&lt;p&gt;Figini et al.&lt;sup id=&quot;fnref:1:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; proposes both a mathematical and a financial interpretation of the market rank indicator:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Mathematically, the market rank indicator &lt;em&gt;measures the difficulty to span $\mathbb{R}^n$ with the $n$ columns of $X$&lt;/em&gt;&lt;sup id=&quot;fnref:1:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

    &lt;p&gt;Indeed, for a given $1\leq k \leq n$, a large value of $\text{MRI}_k$ &lt;em&gt;indicates that […] the columns of $X$ are a basis for an $(n − k)$-dimensional space&lt;/em&gt;&lt;sup id=&quot;fnref:1:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Financially, the market rank indicator &lt;em&gt;may be interpreted as a measure of distance of the asset return series from a market where the number of independent assets is reduced&lt;/em&gt;&lt;sup id=&quot;fnref:1:11&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

    &lt;p&gt;Indeed, again for a given $1\leq k \leq n$, a large value of $\text{MRI}_k$ &lt;em&gt;indicates that $k$ dimensions/assets out of $n$ are not very useful in diversifying a portfolio&lt;/em&gt;&lt;sup id=&quot;fnref:1:12&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;theoretical-and-empirical-properties&quot;&gt;Theoretical and empirical properties&lt;/h3&gt;

&lt;p&gt;Figini et al.&lt;sup id=&quot;fnref:1:13&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; establishes several theoretical properties of the market rank indicator, among which:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Property 1&lt;/strong&gt;: For any $1 \leq k \leq n$, $ 1 \leq \text{MRI}_k \leq \text{MRI}_1 $.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Property 2&lt;/strong&gt;: For a given matrix $X$, $\text{MRI}_k$ is non-increasing in $1 \leq k \leq n$.&lt;/p&gt;
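Both properties follow from the fact that the geometric mean of the $k$ smallest singular values can only grow with $k$; they are easy to spot-check on random data (my sketch, not from the paper):

```python
import numpy as np

X = np.random.default_rng(4).standard_normal((252, 11))
s = np.sort(np.linalg.svd(X, compute_uv=False))
mri = [s[-1] / np.exp(np.mean(np.log(s[:k]))) for k in range(1, 12)]

# Property 1: 1 <= MRI_k <= MRI_1 for every k
assert all(1.0 <= m <= mri[0] for m in mri)
# Property 2: MRI_k is non-increasing in k
assert all(a >= b for a, b in zip(mri, mri[1:]))
```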

&lt;p&gt;In addition, Figini et al.&lt;sup id=&quot;fnref:1:14&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; empirically demonstrates that the market rank indicator is &lt;em&gt;a suitable early warning system for future turbulence on the market&lt;/em&gt;&lt;sup id=&quot;fnref:1:15&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; by showing&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; that 
&lt;em&gt;a large value of [that indicator] signals that the market will experience worse performances in the near future and that the probability of a large and negative return increases&lt;/em&gt;&lt;sup id=&quot;fnref:1:16&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h3 id=&quot;how-to-determine-the-number-of-singular-values-to-retain&quot;&gt;How to determine the number of singular values to retain?&lt;/h3&gt;

&lt;p&gt;When computing the market rank indicator, Figini et al.&lt;sup id=&quot;fnref:1:17&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; suggests retaining a number of singular values equal to about one third of the number of assets&lt;sup id=&quot;fnref:1:18&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;The value for $k$ is set after a careful sensitivity analysis: an extensive comparison of the results obtained under different parameter settings showed that the best choice for $k$ is around one third of $n$.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Note, though, that depending on the specific context at hand, the number of singular values to retain &lt;em&gt;could also be chosen using the numerical optimization method, assuming for example $k$ stochastic&lt;/em&gt;&lt;sup id=&quot;fnref:1:19&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h2 id=&quot;comparison-with-other-measures-of-financial-risk&quot;&gt;Comparison with other measures of financial risk&lt;/h2&gt;

&lt;h3 id=&quot;comparison-with-the-absorption-ratio&quot;&gt;Comparison with the absorption ratio&lt;/h3&gt;

&lt;p&gt;Figini et al.&lt;sup id=&quot;fnref:1:20&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; notes that &lt;em&gt;critical market conditions&lt;/em&gt;&lt;sup id=&quot;fnref:1:21&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; are characterized by two connected but &lt;em&gt;not perfectly equivalent&lt;/em&gt;&lt;sup id=&quot;fnref:1:22&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; phenomena:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;em&gt;An increase in the weight of the first principal components&lt;/em&gt;&lt;sup id=&quot;fnref:1:23&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;A reduction in the weight of the last components&lt;/em&gt;&lt;sup id=&quot;fnref:1:24&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The absorption ratio &lt;em&gt;focuses on the quantity of variability explained by the largest components&lt;/em&gt;&lt;sup id=&quot;fnref:1:25&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, that is, it focuses on the first phenomenon.&lt;/p&gt;

&lt;p&gt;The market rank indicator, on the other hand, focuses on the &lt;em&gt;second phenomenon&lt;/em&gt;&lt;sup id=&quot;fnref:1:26&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, which in theory makes it a natural complement to the absorption ratio as a measure of financial risk.&lt;/p&gt;

&lt;h3 id=&quot;comparison-with-the-covariance-matrix-effective-rank&quot;&gt;Comparison with the covariance matrix effective rank&lt;/h3&gt;

&lt;p&gt;Regular readers might remember that &lt;a href=&quot;/blog/the-matrix-effective-rank-measuring-the-dimensionality-of-a-universe-of-assets/&quot;&gt;in a previous blog post&lt;/a&gt;, I discussed the &lt;em&gt;effective rank&lt;/em&gt;&lt;sup id=&quot;fnref:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; of a covariance matrix, 
which is a real-valued extension of &lt;a href=&quot;https://en.wikipedia.org/wiki/Rank_(linear_algebra)&quot;&gt;its rank&lt;/a&gt; that makes it possible to measure the dimensionality of the associated universe of assets.&lt;/p&gt;

&lt;p&gt;It turns out - as its name suggests - that the market rank indicator is very similar to the effective rank.&lt;/p&gt;

&lt;p&gt;Indeed, when there is &lt;em&gt;an increase in the co-movement of the asset returns&lt;/em&gt;&lt;sup id=&quot;fnref:1:27&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, &lt;em&gt;the vectors of the return time series become closer&lt;/em&gt;&lt;sup id=&quot;fnref:1:28&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, which leads to 
&lt;em&gt;a reduction in the diversification opportunities or, from a numerical point of view, a reduction in the market “dimension”&lt;/em&gt;&lt;sup id=&quot;fnref:1:29&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, which is also captured by the effective rank.&lt;/p&gt;

&lt;p&gt;As an illustration, Figure 1 compares both indicators when applied to the 11 Vanguard U.S. sector ETFs&lt;sup id=&quot;fnref:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt; over the period 31st October 2005 - 30th January 2026, using the same methodology as described in &lt;a href=&quot;#example-of-usage---scaling-the-market-exposure-of-a-portfolio-of-us-equities&quot;&gt;the Example of usage section&lt;/a&gt;.&lt;/p&gt;

&lt;figure&gt;
	&lt;a href=&quot;/assets/images/blog/market-rank-indicator-comparison-with-effective-rank.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/market-rank-indicator-comparison-with-effective-rank-small.png&quot; alt=&quot;Figure 1. Market rank indicator vs. effective rank, 11 Vanguard U.S. sector ETFs, 31st October 2005 - 30th January 2026.&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Figure 1. Market rank indicator vs. effective rank, 11 Vanguard U.S. sector ETFs, 31st October 2005 - 30th January 2026.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;On Figure 1, it is clearly visible that the market rank indicator and the effective rank move in nearly opposite directions, which is confirmed numerically by a -84% correlation between these two indicators.&lt;/p&gt;

&lt;p&gt;Still, the inverse relationship between the market rank indicator and the effective rank is not perfect&lt;sup id=&quot;fnref:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:10&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;, so that they might be considered as two similar-but-different indicators.&lt;/p&gt;
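The relationship can be made concrete with a small experiment: raising the common correlation between the assets should push the market rank indicator up and the effective rank down. The sketch below uses the entropy-based effective rank from the earlier post; the correlation levels and sample size are arbitrary choices of mine:

```python
import numpy as np

def mri_from_cov(Sigma, k):
    lam = np.sort(np.linalg.eigvalsh(Sigma))
    return np.sqrt(lam[-1] / np.exp(np.mean(np.log(lam[:k]))))

def effective_rank(Sigma):
    # exp of the Shannon entropy of the normalized eigenvalues
    p = np.linalg.eigvalsh(Sigma)
    p = p / p.sum()
    return np.exp(-np.sum(p * np.log(p)))

T, n = 252, 11
rng = np.random.default_rng(5)
for rho in (0.1, 0.9):  # low vs. high co-movement
    C = np.full((n, n), rho) + (1.0 - rho) * np.eye(n)
    X = rng.standard_normal((T, n)) @ np.linalg.cholesky(C).T
    Sigma = X.T @ X / T
    print(rho, mri_from_cov(Sigma, 3), effective_rank(Sigma))
```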

&lt;h3 id=&quot;horse-race-for-value-at-risk-forecasting&quot;&gt;Horse race for Value-at-Risk forecasting&lt;/h3&gt;

&lt;p&gt;Figini et al.&lt;sup id=&quot;fnref:1:30&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; compares the empirical performances of the market rank indicator to those of three other measures of financial risk (the absorption ratio, &lt;a href=&quot;/blog/the-turbulence-index-measuring-financial-risk/&quot;&gt;the turbulence index&lt;/a&gt; and 
the average correlation) when forecasting the level of the 20-day ahead Value-at-Risk, and shows that it is &lt;em&gt;the indicator which displays the best performances&lt;/em&gt;&lt;sup id=&quot;fnref:1:31&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;In more detail, Figini et al.&lt;sup id=&quot;fnref:1:32&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;First notes that &lt;em&gt;in general, a good systemic risk indicator would allow to well discriminate between the regular behavior of the market and periods of distress&lt;/em&gt;&lt;sup id=&quot;fnref:1:33&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; because 
&lt;em&gt;intuitively, the VaR (or other risk indicators) should be low when the indicator is low and large when the indicator is large&lt;/em&gt;&lt;sup id=&quot;fnref:1:34&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Then proceeds to empirically test this assumption in the case of the S&amp;amp;P 500 index, &lt;a href=&quot;https://en.wikipedia.org/wiki/STOXX_Europe_600&quot;&gt;the STOXX Europe 600 index&lt;/a&gt; and &lt;a href=&quot;https://en.wikipedia.org/wiki/DAX&quot;&gt;the DAX index&lt;/a&gt; and concludes that:&lt;/p&gt;

    &lt;blockquote&gt;
      &lt;p&gt;Although each indicator presents a degree of forecasting power, the MRI is the indicator which displays the best performances for the three markets.&lt;/p&gt;

      &lt;p&gt;In fact, [as can be seen in Figure 2] not only the VaR associated to the highest percentile interval is the largest, but also the difference between the VaR associated to the highest interval and the one associated to the lowest interval present the largest outcomes as well […].&lt;/p&gt;

      &lt;p&gt;This shows that the MRI can best discriminate between regular and turbulent periods, with respect to other systemic risk indicators.&lt;/p&gt;
    &lt;/blockquote&gt;

    &lt;figure&gt;
      &lt;a href=&quot;/assets/images/blog/market-rank-indicator-comparison-with-other-measures-figini.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/market-rank-indicator-comparison-with-other-measures-figini-small.png&quot; alt=&quot;Figure 2. Future 20-day S&amp;amp;P 500 Value-at-Risk at 5% level v.s. quintiles of various systemic risk indicators, 02nd January 1992 - 01st July 2011. Source: Figini et al.&quot; /&gt;&lt;/a&gt;
      &lt;figcaption&gt;Figure 2. Future 20-day S&amp;amp;P 500 Value-at-Risk at 5% level v.s. quintiles of various systemic risk indicators, 02nd January 1992 - 01st July 2011. Source: Figini et al.&lt;/figcaption&gt;
  &lt;/figure&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;implementation-in-portfolio-optimizer&quot;&gt;Implementation in Portfolio Optimizer&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Portfolio Optimizer&lt;/strong&gt; implements the computation of the market rank indicator through the endpoint &lt;a href=&quot;https://docs.portfoliooptimizer.io/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/assets/analysis/market-rank-indicator&lt;/code&gt;&lt;/a&gt;, with the number of singular values to retain 
defaulting to the value suggested in Figini et al.&lt;sup id=&quot;fnref:1:35&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h2 id=&quot;example-of-usage---scaling-the-market-exposure-of-a-portfolio-of-us-equities&quot;&gt;Example of usage - Scaling the market exposure of a portfolio of U.S. equities&lt;/h2&gt;

&lt;p&gt;I will now illustrate the performances of the market rank indicator in a trading strategy designed to capture the idea that exposure to stocks should be decreased when the market rank indicator is increasing.&lt;/p&gt;

&lt;p&gt;For this:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;I will work within the universe of U.S. equities represented by the SPY ETF and the 11 Vanguard U.S. sector ETFs&lt;sup id=&quot;fnref:5:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;I will rely on daily return data&lt;sup id=&quot;fnref:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;7&lt;/a&gt;&lt;/sup&gt; over the period 31st October 2005 - 30th January 2026.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;I will use the following trading strategy at the end of each month:&lt;/p&gt;
    &lt;ul&gt;
      &lt;li&gt;Compute the sample covariance matrix of the 11 Vanguard U.S. sector ETFs over the past month, using the ETF daily arithmetic returns.&lt;/li&gt;
      &lt;li&gt;Compute the market rank indicator associated to that covariance matrix, retaining 3 eigenvalues.&lt;/li&gt;
      &lt;li&gt;Determine how elevated this market rank indicator is relative to its past 12-month history on a scale $s$ from 0% to 100%, thanks to &lt;a href=&quot;https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.percentileofscore.html&quot;&gt;a percentile rank&lt;/a&gt;.&lt;/li&gt;
      &lt;li&gt;If $s$ is greater than 75% (i.e., relatively elevated recent market rank indicator), allocate the portfolio to cash (with 0% interest) else allocate the portfolio to U.S. equities (SPY ETF); in both cases, hold that portfolio for the next month.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
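The monthly rules above can be sketched as follows, with synthetic daily returns standing in for the actual ETF data. The helper names, and the choice to include the current month in the 12-month percentile window, are my assumptions rather than details from the post:

```python
import numpy as np
from scipy.stats import percentileofscore

def mri_from_cov(Sigma, k=3):
    # Eigenvalue-based MRI, retaining the k smallest eigenvalues
    lam = np.sort(np.linalg.eigvalsh(Sigma))
    return np.sqrt(lam[-1] / np.exp(np.mean(np.log(lam[:k]))))

def monthly_exposures(monthly_returns, lookback=12, threshold=75.0):
    """monthly_returns: list of (days x n) arrays of daily sector returns.
    Returns, for each month after the lookback, 1.0 (hold SPY) or 0.0 (cash)."""
    mris = [mri_from_cov(np.cov(R, rowvar=False)) for R in monthly_returns]
    exposures = []
    for t in range(lookback, len(mris)):
        # Percentile rank of the current MRI within its recent history
        score = percentileofscore(mris[t - lookback:t + 1], mris[t])
        exposures.append(0.0 if score > threshold else 1.0)
    return exposures

# Synthetic example: 36 months of 21 daily returns for 11 sector ETFs
rng = np.random.default_rng(6)
months = [0.01 * rng.standard_normal((21, 11)) for _ in range(36)]
print(monthly_exposures(months))
```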

&lt;p&gt;The equity curve associated to that market rank indicator-based trading strategy is depicted in Figure 3.&lt;/p&gt;

&lt;figure&gt;
	&lt;a href=&quot;/assets/images/blog/market-rank-indicator-spy-strategy.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/market-rank-indicator-spy-strategy-small.png&quot; alt=&quot;Figure 3. MRI-based trading strategy vs. buy and hold, SPY ETF, 31st October 2005 - 30th January 2026.&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Figure 3. MRI-based trading strategy vs. buy and hold, SPY ETF, 31st October 2005 - 30th January 2026.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;From Figure 3, the market rank indicator-based trading strategy does a good job of avoiding some serious drawdowns when compared to the buy and hold strategy, but unfortunately lags in terms of total return.&lt;/p&gt;

&lt;p&gt;In figures:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Portfolio Management Strategy&lt;/th&gt;
      &lt;th&gt;Average Exposure&lt;/th&gt;
      &lt;th&gt;CAGR&lt;/th&gt;
      &lt;th&gt;Annualized Sharpe Ratio&lt;/th&gt;
      &lt;th&gt;Maximum (Monthly) Drawdown&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Buy and hold&lt;/td&gt;
      &lt;td&gt;100%&lt;/td&gt;
      &lt;td&gt;10.9%&lt;/td&gt;
      &lt;td&gt;0.78&lt;/td&gt;
      &lt;td&gt;51%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;MRI-based&lt;/td&gt;
      &lt;td&gt;68%&lt;/td&gt;
      &lt;td&gt;8.2%&lt;/td&gt;
      &lt;td&gt;0.79&lt;/td&gt;
      &lt;td&gt;33%&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;For comparison, Figure 4 additionally depicts the equity curve associated to a similar strategy, this time based on the absorption ratio of the 11 Vanguard U.S. sector ETFs&lt;sup id=&quot;fnref:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;8&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;figure&gt;
	&lt;a href=&quot;/assets/images/blog/market-rank-indicator-spy-strategy-extended.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/market-rank-indicator-spy-strategy-extended-small.png&quot; alt=&quot;Figure 4. MRI-based and AR-based trading strategies vs. buy and hold, SPY ETF, 31st October 2005 - 30th January 2026.&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Figure 4. MRI-based and AR-based trading strategies vs. buy and hold, SPY ETF, 31st October 2005 - 30th January 2026.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;With figures:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Portfolio Management Strategy&lt;/th&gt;
      &lt;th&gt;Average Exposure&lt;/th&gt;
      &lt;th&gt;CAGR&lt;/th&gt;
      &lt;th&gt;Annualized Sharpe Ratio&lt;/th&gt;
      &lt;th&gt;Maximum (Monthly) Drawdown&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Buy and hold&lt;/td&gt;
      &lt;td&gt;100%&lt;/td&gt;
      &lt;td&gt;10.9%&lt;/td&gt;
      &lt;td&gt;0.78&lt;/td&gt;
      &lt;td&gt;51%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;MRI-based&lt;/td&gt;
      &lt;td&gt;68%&lt;/td&gt;
      &lt;td&gt;8.2%&lt;/td&gt;
      &lt;td&gt;0.79&lt;/td&gt;
      &lt;td&gt;33%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;AR-based&lt;/td&gt;
      &lt;td&gt;69%&lt;/td&gt;
      &lt;td&gt;10.6%&lt;/td&gt;
      &lt;td&gt;1.03&lt;/td&gt;
      &lt;td&gt;21%&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;From Figure 4, it seems that the absorption ratio is superior&lt;sup id=&quot;fnref:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;9&lt;/a&gt;&lt;/sup&gt; to the market rank indicator in terms of predictive performances for the specific universe of assets and over the specific period considered here.&lt;/p&gt;

&lt;p&gt;Incidentally, that result is at odds with the findings of Figini et al.&lt;sup id=&quot;fnref:1:36&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, so that the interested reader might want to dig deeper…&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Together with the turbulence index and the absorption ratio&lt;sup id=&quot;fnref:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:9&quot; class=&quot;footnote&quot;&gt;10&lt;/a&gt;&lt;/sup&gt;, the market rank indicator is a simple &lt;em&gt;early warning indicator&lt;/em&gt;&lt;sup id=&quot;fnref:1:37&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; for dangerous periods on financial markets.&lt;/p&gt;

&lt;p&gt;Although it did not particularly shine versus the absorption ratio in the specific example studied in this blog post, the market rank indicator is definitely an &lt;em&gt;additional tool to integrate&lt;/em&gt;&lt;sup id=&quot;fnref:1:38&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; so as to &lt;em&gt;improve [one’s] warning system&lt;/em&gt;&lt;sup id=&quot;fnref:1:39&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Waiting for the analysis of the other measures of financial risk, feel free to &lt;a href=&quot;https://www.linkedin.com/in/roman-rubsamen/&quot;&gt;connect with me on LinkedIn&lt;/a&gt; or to &lt;a href=&quot;https://twitter.com/portfoliooptim&quot;&gt;follow me on Twitter&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;–&lt;/p&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:3&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://jpm.pm-research.com/content/37/4/112&quot;&gt;Mark Kritzman, Yuanzhen Li, Sebastien Page and Roberto Rigobon, Principal Components as a Measure of Systemic Risk, The Journal of Portfolio Management Summer 2011, 37 (4) 112-126&lt;/a&gt;. &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://doi.org/10.1016/j.ecosta.2017.12.001&quot;&gt;Silvia Figini, Mario Maggi, Pierpaolo Uberti, The market rank indicator to detect financial distress, Econometrics and Statistics, Volume 14, 2020, Pages 63-73&lt;/a&gt;. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:1:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;8&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;9&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;10&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:10&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;11&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:11&quot; class=&quot;reversefootnote&quot; 
role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;12&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:12&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;13&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:13&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;14&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:14&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;15&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:15&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;16&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:16&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;17&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:17&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;18&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:18&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;19&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:19&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;20&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:20&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;21&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:21&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;22&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:22&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;23&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:23&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;24&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:24&quot; class=&quot;reversefootnote&quot; 
role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;25&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:25&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;26&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:26&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;27&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:27&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;28&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:28&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;29&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:29&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;30&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:30&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;31&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:31&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;32&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:32&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;33&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:33&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;34&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:34&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;35&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:35&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;36&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:36&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;37&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:37&quot; class=&quot;reversefootnote&quot; 
role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;38&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:38&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;39&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:39&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;40&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Using returns of the 10 sectors composing the &lt;a href=&quot;https://en.wikipedia.org/wiki/S%26P_500&quot;&gt;S&amp;amp;P 500 index&lt;/a&gt;. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:4&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://ieeexplore.ieee.org/document/7098875&quot;&gt;Olivier Roy and Martin Vetterli, The effective rank: A measure of effective dimensionality, 15th European Signal Processing Conference, 2007&lt;/a&gt;. &lt;a href=&quot;#fnref:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:5&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;These are “VOX”, “VCR”, “VDC”, “VDE”, “VFH”, “VHT”, “VIS”, “VGT”, “VAW”, “VNQ”, “VPU”, cf. &lt;a href=&quot;https://investor.vanguard.com/investment-products/etfs/sector-etfs&quot;&gt;https://investor.vanguard.com/investment-products/etfs/sector-etfs&lt;/a&gt;. &lt;a href=&quot;#fnref:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:5:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:10&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Most probably because each indicator uses a different number of eigenvalues. &lt;a href=&quot;#fnref:10&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:6&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;(Adjusted) prices of the ETFs have been retrieved using &lt;a href=&quot;https://api.tiingo.com/&quot;&gt;Tiingo&lt;/a&gt;. &lt;a href=&quot;#fnref:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:8&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The absorption ratio-based trading strategy is identical to the market rank indicator-based trading strategy, except that the absorption ratio is computed in place of the market rank indicator. &lt;a href=&quot;#fnref:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:7&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The two indicators agree most of the time - their correlation is 85%; as a side note, it could be interesting to investigate what happens when they disagree… &lt;a href=&quot;#fnref:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:9&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;As well as the effective rank; I will not insist too much on that one, though, because - to date - I have not analyzed its usefulness as a measure of financial risk on the blog. &lt;a href=&quot;#fnref:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content><author><name>Roman R.</name></author><category term="covariance matrix" /><category term="systemic risk" /><category term="absorption ratio" /><category term="market rank indicator" /><category term="principal component analysis" /><summary type="html">In the previous post of this series on measuring financial risk, I described the absorption ratio, a measure of financial market fragility based on principal components analysis, introduced in Kritzman et al.1. In this new blog post, I will describe another measure of financial distress called the market rank indicator (MRI), this time related to the notion of condition number2 of a matrix, introduced in Figini et al.2. As an example of usage, I will show how to use the market rank indicator to dynamically scale the market exposure of a portfolio of U.S. equities. The market rank indicator Definition Let be: $n$, the number of assets $T$, the number of time periods, with $T &amp;gt; n$ $X \in \mathbb{R}^{T \times n}$, the matrix of the asset arithmetic or logarithmic returns for each of the $T$ time periods $\sigma_1,…,\sigma_n$, the singular values of $X$ ordered such that $0 &amp;lt; \sigma_1 \leq … \leq \sigma_n$ $1\leq k \leq n$, the number of singular values $\lambda_1,…,\lambda_k$ to retain in the computation of the market rank indicator The market rank indicator $\text{MRI}$ of the assets is defined2 as the ratio between the largest singular value and the geometric mean of the $k$ smallest singular values of the matrix $X$2, that is \[\text{MRI}_k = \frac{ \sigma_n }{ \left( \prod_{i=1}^k \sigma_i \right)^{1/k} }\] Alternative definition Let be: $n$, the number of assets $T$, the number of time periods, with $T &amp;gt; n$ $X \in \mathbb{R}^{T \times n}$, the matrix of the asset arithmetic or logarithmic returns for each of the $T$ time periods $\Sigma = \frac{1}{T} X {}^t X \in \mathcal{M}(\mathbb{R}^{n \times n})$, the asset returns covariance matrix $\lambda_1,…,\lambda_n$, the 
eigenvalues of $\Sigma$ ordered such that $0 &amp;lt; \lambda_1 \leq … \leq \lambda_n$ Thanks to the relationship $\lambda_i = \frac{\sigma_i^2}{T}$, $i=1..n$, between the eigenvalues of $\Sigma$ and the singular values of $X$, the market rank indicator can alternatively be defined as the square root of the ratio between the largest eigenvalue and the geometric mean of the $k$ smallest eigenvalues of the matrix $\Sigma$, that is \[\text{MRI}_k = \sqrt{ \frac{ \lambda_n }{ \left( \prod_{i=1}^k \lambda_i \right)^{1/k} } }\] Generalized definition In the two previous sub-sections, it is assumed that $T &amp;gt; n$. This is to guarantee that the singular values of the matrix $X$ or that the eigenvalues of the matrix $\Sigma$ are not null. In case $T \leq n$, the definition of the market rank indicator needs to be adapted, which is simply done by redefining [it] on the first $T$ [non zero] singular values2. Rationale Figini et al.2 highlights that what contains valuable information on market synchronization in a principal components analysis on asset returns is not only the strength of the first principal component, but also the weakness of the last components2. This observation leads to the proposal of the market rank indicator as a generalization of the condition number $\kappa(X)$ of the matrix $X$2, with $\text{MRI}_1 = \kappa(X)$ as a special case. Interpretation Figini et al.2 proposes both a mathematical and a financial interpretation of the market rank indicator: Mathematically, the market rank indicator measures the difficulty to span $\mathbb{R}^n$ with the $n$ columns of $X$2. Indeed, for a given $1\leq k \leq n$, a large value of $\text{MRI}_k$ indicates that […] the columns of $X$ are a basis for an $(n − k)$-dimensional space2. Financially, the market rank indicator may be interpreted as a measure of distance of the asset return series from a market where the number of independent assets is reduced2. 
Indeed, again for a given $1\leq k \leq n$, a large value of $\text{MRI}_k$ indicates that $k$ dimensions/assets out of $n$ are not very useful in diversifying a portfolio2. Theoretical and empirical properties Figini et al.2 establishes several theoretical properties of the market rank indicator, among which: Property 1: For any $1 \leq k \leq n$, $ 1 \leq \text{MRI}_k \leq \text{MRI}_1 $. Property 2: For a given matrix $X$, $\text{MRI}_k$ is non-increasing in $1 \leq k \leq n$. In addition, Figini et al.2 empirically demonstrates that the market rank indicator is a suitable early warning system for future turbulence on the market2 by showing3 that a large value of [that indicator] signals that the market will experience worse performances in the near future and that the probability of a large and negative return increases2. How to determine the number of singular values to retain? When computing the market rank indicator, Figini et al.2 suggests to use a number of singular values of about 1/3th the number of assets2: The value for $k$ is set after a careful sensitivity analysis: an extensive comparison of the results obtained under different parameter settings showed that the best choice for $k$ is around one third of $n$. To be noted, though, that depending on the specific context at hand, the number of singular values to retain could also be chosen using the numerical optimization method, assuming for example $k$ stochastic2. Comparison with other measures of financial risk Comparison with the absorption ratio Figini et al.2 notes that critical market conditions2 are characterized by two connected but not perfectly equivalent2 phenomena: An increase in the weight of the first principal components2 A reduction in the weight of the last components2 The absorption ration focuses on the quantity of variability explained by the largest components2, that is, it focuses on the first phenomenon. 
The market rank indicator, on the other hand, focuses on the second phenomenon2, which theoretically makes it a perfect complement to the absorption ratio as a measure of financial risk. Comparison with the covariance matrix effective rank Familiar readers might remember that in a previous blog post, I discussed the effective rank4 of a covariance matrix, which is a real-valued extension of its rank allowing to measure the dimensionality of the associated universe of assets. It turns out - as its name suggests - that the market rank indicator is very similar to the effective rank. Indeed, when there is an increase in the co-movement of the asset returns2, the vectors of the return time series become closer2, which leads to a reduction in the diversification opportunities or, from a numerical point of view, a reduction in the market “dimension”2 which is also captured by the effective rank. As an illustration, Figure 1 compares both indicators when applied to the 11 Vanguard U.S. sector ETFs5 over the period 31th October 2005 - 30th January 2026, using the same methodology as described in the Exemple of usage section. Figure 1. Market rank indicator v.s. effective rank, 11 Vanguard U.S. sector ETFs, 31th October 2005 - 30th January 2026. On Figure 1, it is clearly visible that the market rank indicator and the effective rank move in a near perfect opposite direction to each other, which is confirmed numerically by a -84% correlation between these two indicators. Still, the inverse relationship between the market rank indicator and the effective rank is not perfect6, so that they might be considered as two similar-but-different indicators. 
Horse race for Value-at-Risk forecasting Figini et al.2 compares the empirical performances of the market rank indicator to those of three other measures of financial risk (the absorption ratio, the turbulence index and the average correlation) when forecasting the level of the 20-day ahead Value-at-Risk, and shows that it is the indicator which displays the best performances2. In more details, Figini et al.2: First notes that in general, a good systemic risk indicator would allow to well discriminate between the regular behavior of the market and periods of distress2 because intuitively, the VaR (or other risk indicators) should be low when the indicator is low and large when the indicator is large2. Then proceeds to empirically test this assumption in the case of the S&amp;amp;P 500 index, the STOXX Europe 600 index and the DAX index and concludes that: Although each indicator presents a degree of forecasting power, the MRI is the indicator which displays the best performances for the three markets. In fact, [as can be seen in Figure 2] not only the VaR associated to the highest percentile interval is the largest, but also the difference between the VaR associated to the highest interval and the one associated to the lowest interval present the largest outcomes as well […]. This shows that the MRI can best discriminate between regular and turbulent periods, with respect to other systemic risk indicators. Figure 2. Future 20-day S&amp;amp;P 500 Value-at-Risk at 5% level v.s. quintiles of various systemic risk indicators, 02nd January 1992 - 01st July 2011. Source: Figini et al. Implementation in Portfolio Optimizer Portfolio Optimizer implements the computation of the market rank indicator through the endpoint /assets/analysis/market-rank-indicator, with the default number of singular values to retain suggested in Figini et al.2. Example of usage - Scaling the market exposure of a portfolio of U.S. 
equities I will now illustrate the performances of the market rank indicator in a trading strategy desgined to capture the idea that exposure to stocks should be decreased when the market rank indicator is increasing. For this: I will work within the universe of U.S. equities represented by the SPY ETF and the 11 Vanguard U.S. sector ETFs5. I will rely on daily return data7 over the period 31th October 2005 - 30th January 2026. I will use the following trading strategy at the end of each month: Compute the sample covariance matrix of the 11 Vanguard U.S. sector ETFs over the past month, using the ETF daily arithmetic returns. Compute the market rank indicator associated to that covariance matrix, retaining 3 eigenvalues. Determine how elevated is this market rank indicator relative to its past 12-month history on a scale $s$ from 0% to 100%, thanks to a percentile rank. If $s$ is greater than 75% (i.e., relatively elevated recent market rank indicator), allocate the portfolio to cash (with 0% interest) else allocate the portfolio to U.S. equities (SPY ETF); in both cases, hold that portfolio for the next month. The equity curve associated to that market rank indicator-based trading stategy is depicted in Figure 3. Figure 3. MRI-based trading strategy v.s. buy and hold, SPY ETF, 31th October 2005 - 30th January 2026. From Figure 3, the market rank indicator-based trading stategy does a good job in avoiding some serious drawdowns when compared to the buy and hold strategy, but unfortunately lags in terms of total return. Figures: Portfolio Management Strategy Average Exposure CAGR Annualized Sharpe Ratio Maximum (Monthly) Drawdown Buy and hold 100% 10.9% 0.78 51% MRI-based 68% 8.2% 0.79 33% For comparison, Figure 4 additionally depicts the equity curve associated to a similar strategy, this time based on the absorption ratio of the 11 Vanguard U.S. sector ETFs8. Figure 4. MRI-based and AR-based trading strategies v.s. 
buy and hold, SPY ETF, 31th October 2005 - 30th January 2026. With figures: Portfolio Management Strategy Average Exposure CAGR Annualized Sharpe Ratio Maximum (Monthly) Drawdown Buy and hold 100% 10.9% 0.78 51% MRI-based 68% 8.2% 0.79 33% AR-based 69% 10.6% 1.03 21% From Figure 4, it seems that that the absorption ratio is superior9 to the market rank indicator in terms of predictive performances for the specific universe of assets and over the specific period considered here. Incidentally, that result is at odds with the findings of Figini et al.2, so that the interested reader might want to dig deeper… Conclusion Together with the turbulence index and the absorption ratio10, the market rank indicator is a simple early warning indicator2 for dangerous periods on financial markets. Although it did not particularly shine v.s. the absorption ratio in the specific example studied in this blog post, the market rank indicator is definitely an additional tool to integrate2 so as to improve [one’s] warning system2. Waiting for the analysis of the other measures of financial risk, feel free to connect with me on LinkedIn or to follow me on Twitter. – See Mark Kritzman, Yuanzhen Li, Sebastien Page and Roberto Rigobon, Principal Components as a Measure of Systemic Risk, The Journal of Portfolio Management Summer 2011, 37 (4) 112-126. &amp;#8617; See Silvia Figini, Mario Maggi, Pierpaolo Uberti, The market rank indicator to detect financial distress, Econometrics and Statistics, Volume 14, 2020, Pages 63-73. 
&amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 &amp;#8617;6 &amp;#8617;7 &amp;#8617;8 &amp;#8617;9 &amp;#8617;10 &amp;#8617;11 &amp;#8617;12 &amp;#8617;13 &amp;#8617;14 &amp;#8617;15 &amp;#8617;16 &amp;#8617;17 &amp;#8617;18 &amp;#8617;19 &amp;#8617;20 &amp;#8617;21 &amp;#8617;22 &amp;#8617;23 &amp;#8617;24 &amp;#8617;25 &amp;#8617;26 &amp;#8617;27 &amp;#8617;28 &amp;#8617;29 &amp;#8617;30 &amp;#8617;31 &amp;#8617;32 &amp;#8617;33 &amp;#8617;34 &amp;#8617;35 &amp;#8617;36 &amp;#8617;37 &amp;#8617;38 &amp;#8617;39 &amp;#8617;40 Using returns of the 10 sectors composing the S&amp;amp;P 500 index. &amp;#8617; See Olivier Roy and Martin Vetterli, The effective rank: A measure of effective dimensionality, 15th European Signal Processing Conference, 2007. &amp;#8617; These are “VOX”, “VCR”, “VDC”, “VDE”, “VFH”, “VHT”, “VIS”, “VGT”, “VAW”, “VNQ”, “VPU”, c.f. https://investor.vanguard.com/investment-products/etfs/sector-etfs. &amp;#8617; &amp;#8617;2 Most probably due to the fact that the number of eigenvalues used by each indicator is different. &amp;#8617; (Adjusted) prices of the ETFs have have been retrieved using Tiingo. &amp;#8617; The absorption ratio-based trading strategy is exactly the same as the market rank indicator-based trading strategy, except that the absorption ratio is computed instead of the market rank indicator; that’s the only change. &amp;#8617; The two indicators are agreeing most of the time - their correlation is 85%; as a side note, it could be interesting to understand what happens when they are disagreeing… &amp;#8617; As well as the effective rank; I cannot insist too much on that one, though, because - to this date - I did not analyze its usefulness as a measure of financial risk on the blog. 
&amp;#8617;</summary></entry><entry><title type="html">More Bootstrap Simulations with Portfolio Optimizer: the Autoregressive Online Bootstrap</title><link href="https://portfoliooptimizer.io/blog/more-bootstrap-simulations-with-portfolio-optimizer-the-autoregressive-online-bootstrap/" rel="alternate" type="text/html" title="More Bootstrap Simulations with Portfolio Optimizer: the Autoregressive Online Bootstrap" /><published>2026-02-01T00:00:00-06:00</published><updated>2026-02-01T00:00:00-06:00</updated><id>https://portfoliooptimizer.io/blog/more-bootstrap-simulations-with-portfolio-optimizer-the-autoregressive-online-bootstrap</id><content type="html" xml:base="https://portfoliooptimizer.io/blog/more-bootstrap-simulations-with-portfolio-optimizer-the-autoregressive-online-bootstrap/">&lt;p&gt;In &lt;a href=&quot;/blog/bootstrap-simulation-with-portfolio-optimizer-usage-for-financial-planning&quot;&gt;a previous article&lt;/a&gt;, I described several classical bootstrap techniques — i.i.d. bootstrap, circular block bootstrap, and stationary block bootstrap — and 
showed how the stationary block bootstrap could be used to simulate future price paths for financial assets by following the methodology of Anarkulova et al.&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;In this blog post, I will detail another bootstrap technique called &lt;em&gt;the autoregressive online bootstrap&lt;/em&gt;&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, introduced in Palm and Nagler&lt;sup id=&quot;fnref:2:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, which is best described as a multiplier bootstrap&lt;sup id=&quot;fnref:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; coupled 
with an autoregressive sequence of weights specifically chosen to make it usable with streaming time series data.&lt;/p&gt;

&lt;p&gt;As an example of usage, I will simulate alternative price histories for the SPY and TLT ETFs, which are representative of the U.S. stock market and of long-term U.S. Treasury bonds, respectively.&lt;/p&gt;

&lt;h2 id=&quot;mathematical-preliminaries&quot;&gt;Mathematical preliminaries&lt;/h2&gt;

&lt;p&gt;Let $X_1, …, X_n$&lt;sup id=&quot;fnref:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;, $n \geq 1$ be a sample of data observed from a population.&lt;/p&gt;

&lt;h3 id=&quot;limitations-of-classical-bootstrap-techniques-in-an-online-setting&quot;&gt;Limitations of classical bootstrap techniques in an online setting&lt;/h3&gt;

&lt;p&gt;In a typical&lt;sup id=&quot;fnref:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt; online setting, the length $n$ of the sample of data $X_1, …, X_n$ grows without bound, with new &lt;em&gt;events [that] are observed at the moment they occur&lt;/em&gt;&lt;sup id=&quot;fnref:2:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;In such a setting, the classical bootstrap techniques described in &lt;a href=&quot;/blog/bootstrap-simulation-with-portfolio-optimizer-usage-for-financial-planning&quot;&gt;the previous blog post on bootstrap simulations&lt;/a&gt; require &lt;em&gt;all data observed so far&lt;/em&gt;&lt;sup id=&quot;fnref:2:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; at any given point in time.&lt;/p&gt;

&lt;p&gt;Indeed:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The i.i.d. bootstrap, by definition, &lt;em&gt;requires keeping track of the entire observed sample $ {X_1, …, X_n} $&lt;/em&gt;&lt;sup id=&quot;fnref:2:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/li&gt;
  &lt;li&gt;The circular and the stationary block bootstrap require &lt;em&gt;all blocks […] to increase in size with $n$&lt;/em&gt;&lt;sup id=&quot;fnref:2:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, so that &lt;em&gt;to compute the bootstrap in practice, the entire data set $ {X_1, …, X_n} $ needs to be kept in memory and processed fully, every time the block size changes&lt;/em&gt;&lt;sup id=&quot;fnref:2:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Depending on the domain of application, this can be a serious computational limitation when:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;n&lt;/em&gt; is large&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;n&lt;/em&gt; is moderately large but the underlying $n$ observations require a lot of computer memory to be stored&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;n&lt;/em&gt; is moderately large but the computation time is limited (e.g. real-time applications)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;the-multiplier-bootstrap&quot;&gt;The multiplier bootstrap&lt;/h3&gt;

&lt;p&gt;The multiplier bootstrap&lt;sup id=&quot;fnref:3:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; is &lt;em&gt;a general class of bootstrapping schemes based on perturbations of the original observations with suitable weights&lt;/em&gt;&lt;sup id=&quot;fnref:2:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;In other words, compared to classical bootstrap techniques, the multiplier bootstrap replaces the idea of “randomly resampling observations” with the idea of “randomly reweighting observations”, 
which enables it to be applied in an online setting.&lt;/p&gt;

&lt;p&gt;For i.i.d. data $X_1, …, X_n$, it is defined as follows:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Let $V_1, …, V_n$ be $n$ i.i.d. random variables with unit mean and unit variance.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Let $ \bar V_i = \frac{1}{i} \sum_{j=1}^{i} V_j $ be the running mean of these random variables, which can be computed recursively through the formula&lt;/p&gt;

\[\bar V_i = \frac{(i-1) \bar V_{i-1} + V_i }{i}\]
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The multiplier bootstrap samples $ X_1^*,…,X_n^* $ are then defined as&lt;/p&gt;

\[X_i^* = \frac{V_i}{\bar V_i} X_i\]
  &lt;/li&gt;
&lt;/ul&gt;
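&lt;p&gt;To make this definition concrete, here is a minimal Python sketch of the multiplier bootstrap - a hypothetical illustration of mine, not code from Portfolio Optimizer - using i.i.d. Exponential(1) weights, one convenient choice with unit mean and unit variance:&lt;/p&gt;

```python
import numpy as np

def multiplier_bootstrap(X, rng=None):
    """One multiplier bootstrap sample of the data X (shape (n,) or (n, d)).

    The weights V_i are i.i.d. Exponential(1), which has unit mean and
    unit variance; each observation X_i is rescaled by V_i / bar(V)_i.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    V = rng.exponential(scale=1.0, size=n)      # i.i.d. weights, unit mean and variance
    V_bar = np.cumsum(V) / np.arange(1, n + 1)  # running means bar(V)_i
    return (X.T * (V / V_bar)).T                # X_i^* = V_i / bar(V)_i * X_i
```

&lt;p&gt;Note that $\bar V_1 = V_1$, so that the first bootstrap sample always coincides with the first observation.&lt;/p&gt;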

&lt;h2 id=&quot;the-autoregressive-online-bootstrap&quot;&gt;The autoregressive online bootstrap&lt;/h2&gt;

&lt;h3 id=&quot;methodology&quot;&gt;Methodology&lt;/h3&gt;

&lt;p&gt;The autoregressive online bootstrap&lt;sup id=&quot;fnref:2:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; is a specific instance of the multiplier bootstrap that generates a sequence of random weights evolving according to an autoregressive process centered around 1:&lt;/p&gt;

\[V_i = 1 + \rho_i \left( V_{i-1} - 1 \right)  + \sqrt{1 - \rho_i^2} \zeta_i, i=1..n\]

&lt;p&gt;where:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;$V_0 = 0$&lt;/li&gt;
  &lt;li&gt;$\rho_i = 1 - i^{-\beta}$, $ 0 &amp;lt; \beta &amp;lt; \frac{1}{2}$&lt;/li&gt;
  &lt;li&gt;$\zeta_i \sim \mathcal N(0,1), i=1..n$&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The associated autoregressive online bootstrap samples $ X_1^*,…,X_n^* $ are then defined as&lt;/p&gt;

\[X_i^* = \frac{V_i}{\bar V_i} X_i\]
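&lt;p&gt;Putting the weight recursion and the final rescaling together, the autoregressive online bootstrap can be sketched in Python as follows - again a hypothetical illustration of mine, using the value $\beta = \sqrt{2} - 1$ as a default:&lt;/p&gt;

```python
import numpy as np

def ar_online_bootstrap(X, beta=np.sqrt(2) - 1, rng=None):
    """One autoregressive online bootstrap sample of the data X (shape (n,) or (n, d)).

    The weights follow the autoregressive recursion centered around 1,
    with rho_i = 1 - i**(-beta) and standard Gaussian innovations zeta_i.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    V = np.empty(n)
    v = 0.0                                      # V_0 = 0
    for i in range(1, n + 1):
        rho = 1.0 - i ** (-beta)
        v = 1.0 + rho * (v - 1.0) + np.sqrt(1.0 - rho * rho) * rng.standard_normal()
        V[i - 1] = v
    V_bar = np.cumsum(V) / np.arange(1, n + 1)   # running means bar(V)_i
    return (X.T * (V / V_bar)).T                 # X_i^* = V_i / bar(V)_i * X_i
```

&lt;p&gt;Since $\rho_1 = 1 - 1^{-\beta} = 0$, the initial value $V_0$ has no influence on the generated weights.&lt;/p&gt;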

&lt;h3 id=&quot;rationale&quot;&gt;Rationale&lt;/h3&gt;

&lt;p&gt;Palm and Nagler&lt;sup id=&quot;fnref:2:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; notes that for the multiplier bootstrap to remain valid for time series:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;The dependencies between weights $V_i$ and $V_j$ must increase with the sample size $n$, but at the same time remain almost independent when the time gap $|i-j|$ is sufficiently large compared to $n$&lt;/em&gt;&lt;sup id=&quot;fnref:2:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;A scaling of the weights by their arithmetic mean is also necessary&lt;/em&gt;&lt;sup id=&quot;fnref:2:11&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is what they propose with the autoregressive online bootstrap technique.&lt;/p&gt;

&lt;h3 id=&quot;properties&quot;&gt;Properties&lt;/h3&gt;

&lt;p&gt;The main result of Palm and Nagler&lt;sup id=&quot;fnref:2:12&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; is that under mild conditions on the original data $X_1, …, X_n$, the autoregressive online bootstrap is a &lt;a href=&quot;https://en.wikipedia.org/wiki/Consistency_(statistics)&quot;&gt;consistent&lt;/a&gt; resampling scheme 
for the mean and for any continuously differentiable transformation of the mean&lt;sup id=&quot;fnref:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt; of a univariate or multivariate time series.&lt;/p&gt;

&lt;h3 id=&quot;how-to-select-the-parameter-beta&quot;&gt;How to select the parameter $\beta$?&lt;/h3&gt;

&lt;p&gt;The parameter $\beta$ controls the behaviour of the autoregressive online bootstrap samples in a way similar to how the block length controls the behaviour of the block bootstrap samples in classical block bootstrap techniques.&lt;/p&gt;

&lt;p&gt;Intuitively:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;A small value of $\beta$ keeps the autocorrelations $\rho_i = 1 - i^{-\beta}$ away from 1, leading to quickly changing weights and thus to rapidly varying bootstrap samples.&lt;/p&gt;

    &lt;p&gt;This is similar in spirit to having a short block size in a block bootstrap technique.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;On the contrary, a large value of $\beta$ pushes $\rho_i$ closer to 1, leading to slowly changing weights and thus to long stretches of bootstrap samples that are consistent with the original observations, although up- or down-weighted.&lt;/p&gt;

    &lt;p&gt;This is similar in spirit to having a long block size in a block bootstrap technique.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Palm and Nagler&lt;sup id=&quot;fnref:2:13&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; demonstrates that $\beta_{opt} = \sqrt{2} - 1$ allows for &lt;em&gt;an optimal bias-variance trade-off&lt;/em&gt;&lt;sup id=&quot;fnref:2:14&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, so that unless there is a specific need to play with this parameter, using that value should be the default choice.&lt;/p&gt;
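&lt;p&gt;Since $\rho_i = 1 - i^{-\beta}$ approaches 1 faster when $\beta$ is larger, the persistence of the weights increases with $\beta$, which can be checked empirically. The following snippet - a hypothetical illustration, not part of Portfolio Optimizer - estimates the lag-1 autocorrelation of two weight sequences:&lt;/p&gt;

```python
import numpy as np

def ar_weights(n, beta, rng):
    """Generate n autoregressive online bootstrap weights for a given beta."""
    V = np.empty(n)
    v = 0.0                                   # V_0 = 0
    for i in range(1, n + 1):
        rho = 1.0 - i ** (-beta)
        v = 1.0 + rho * (v - 1.0) + np.sqrt(1.0 - rho * rho) * rng.standard_normal()
        V[i - 1] = v
    return V

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a series."""
    x = x - x.mean()
    return float(np.dot(x[1:], x[:-1]) / np.dot(x, x))

rng = np.random.default_rng(123)
n = 5000
ac_small_beta = lag1_autocorr(ar_weights(n, beta=0.05, rng=rng))
ac_large_beta = lag1_autocorr(ar_weights(n, beta=0.45, rng=rng))
# The larger beta yields the more persistent, i.e. more slowly changing, weights.
```

&lt;p&gt;With these settings, the weight sequence generated with $\beta = 0.45$ is markedly more autocorrelated than the one generated with $\beta = 0.05$, mirroring the long v.s. short block size analogy.&lt;/p&gt;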

&lt;h3 id=&quot;practical-performances&quot;&gt;Practical performances&lt;/h3&gt;

&lt;p&gt;Palm and Nagler&lt;sup id=&quot;fnref:2:15&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; compares the practical performances of the autoregressive online bootstrap to an i.i.d. multiplier bootstrap and to a block bootstrap technique called &lt;em&gt;the moving average block bootstrap&lt;/em&gt;&lt;sup id=&quot;fnref:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;7&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;In particular, Palm and Nagler&lt;sup id=&quot;fnref:2:16&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; show that:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The two time series bootstraps &lt;em&gt;achieve approximately correct coverage in all [studied] scenarios, even in the presence of nonlinear [AR(2)-GARCH(1,1)] dependencies&lt;/em&gt;&lt;sup id=&quot;fnref:2:17&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; corresponding to a stochastic volatility process.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The autoregressive online bootstrap allows for cheap online updates - in constant time, as illustrated in Figure 1 - contrary to the moving average block bootstrap.&lt;/p&gt;

    &lt;figure&gt;
    &lt;a href=&quot;/assets/images/blog/more-bootstrap-simulations-computation-time-palm-nagler.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/more-bootstrap-simulations-computation-time-palm-nagler-small.png&quot; alt=&quot;Figure 1. Computation time per online update of 200 bootstrap samples as the bootstrap techniques progress through a stream of 2000 samples. Source: Palm and Nagler.&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;Figure 1. Computation time per online update of 200 bootstrap samples as the bootstrap techniques progress through a stream of 2000 samples. Source: Palm and Nagler.&lt;/figcaption&gt;
  &lt;/figure&gt;
  &lt;/li&gt;
  &lt;li&gt;The autoregressive online bootstrap has a slightly higher variance compared to the moving average block bootstrap, which is &lt;em&gt;the cost [to] pay for its computational advantage&lt;/em&gt;&lt;sup id=&quot;fnref:2:18&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/li&gt;
&lt;/ul&gt;
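&lt;p&gt;To make the constant-time claim concrete, the following sketch maintains bootstrap replicas of a running sample mean with a fixed amount of work per new observation, independent of the number of observations already processed (the class is my own illustrative construction, built on the per-sample definition $X_i^* = \frac{V_i}{\bar V_i} X_i$):&lt;/p&gt;

```python
import numpy as np

class OnlineARBootstrap:
    """Maintain B bootstrap replicas of a running sample mean with a
    constant amount of work per new observation (O(B), independent of n)."""

    def __init__(self, n_replicas, beta=np.sqrt(2) - 1, seed=None):
        self.rng = np.random.default_rng(seed)
        self.beta = beta
        self.i = 0
        self.v = np.zeros(n_replicas)      # current weights V_i, with V_0 = 0
        self.v_sum = np.zeros(n_replicas)  # running sums of the weights
        self.acc = np.zeros(n_replicas)    # running sums of X_j* = (V_j / Vbar_j) X_j

    def update(self, x):
        self.i += 1
        rho = 1.0 - self.i ** (-self.beta)
        noise = self.rng.standard_normal(self.v.shape)
        self.v = 1.0 + rho * (self.v - 1.0) + np.sqrt(1.0 - rho ** 2) * noise
        self.v_sum += self.v
        v_bar = self.v_sum / self.i        # running means of the weights
        self.acc += self.v / v_bar * x     # accumulate the bootstrap samples

    def bootstrap_means(self):
        return self.acc / self.i           # bootstrap replicas of the sample mean
```

&lt;p&gt;Feeding a stream of observations through &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;update&lt;/code&gt; then provides, at any point in the stream, an approximate sampling distribution of the mean.&lt;/p&gt;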

&lt;h3 id=&quot;caveats&quot;&gt;Caveats&lt;/h3&gt;

&lt;p&gt;One important limitation of the autoregressive online bootstrap is that the weights $V_i$, $i=1..n$, can be negative.&lt;/p&gt;

&lt;p&gt;For this reason, in finance, logarithmic asset returns should be preferred to either asset prices (since bootstrapped prices could become negative) or arithmetic asset returns (since bootstrapped returns could become &amp;lt; -100%).&lt;/p&gt;
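&lt;p&gt;The following toy example, on made-up prices, illustrates the point: reweighted logarithmic returns always map back to strictly positive prices through the exponential, whereas a similarly reweighted arithmetic return can drop below -100%:&lt;/p&gt;

```python
import numpy as np

# Made-up daily prices, for illustration only
prices = np.array([100.0, 101.5, 99.8, 102.3, 103.0])
log_returns = np.diff(np.log(prices))

# However extreme the (possibly negative) reweighting of log returns,
# exponentiating their cumulative sum always yields positive prices
extreme = -25.0 * log_returns
rebuilt = prices[0] * np.exp(np.cumsum(extreme))
assert (rebuilt > 0.0).all()

# A similarly reweighted arithmetic return can fall below -100%,
# which would imply a negative price
arithmetic = np.diff(prices) / prices[:-1]
worst = (-60.0 * arithmetic).min()
assert -1.0 > worst   # i.e. worst is below -100%
```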

&lt;h2 id=&quot;implementations&quot;&gt;Implementations&lt;/h2&gt;

&lt;h3 id=&quot;implementation-in-portfolio-optimizer&quot;&gt;Implementation in Portfolio Optimizer&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Portfolio Optimizer&lt;/strong&gt; implements the autoregressive online bootstrap through the endpoint &lt;a href=&quot;https://docs.portfoliooptimizer.io/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/assets/returns/simulation/bootstrap/online&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;implementation-elsewhere&quot;&gt;Implementation elsewhere&lt;/h3&gt;

&lt;p&gt;A Python implementation of the autoregressive online bootstrap, by its authors, is available at &lt;a href=&quot;https://github.com/nicolaipalm/online-bootstrap-implementation&quot;&gt;https://github.com/nicolaipalm/online-bootstrap-implementation&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;example-of-usage---simulation-of-alternative-price-histories-for-etfs&quot;&gt;Example of usage - Simulation of alternative price histories for ETFs&lt;/h2&gt;

&lt;p&gt;As an example of usage, I propose to use the autoregressive online bootstrap to generate alternative price histories for the SPY and TLT ETFs and compute a couple of associated descriptive statistics&lt;sup id=&quot;fnref:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:9&quot; class=&quot;footnote&quot;&gt;8&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
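&lt;p&gt;In Python, such an experiment can be sketched as follows, here on a simulated price series standing in for an ETF’s adjusted closes (the figures below use actual SPY and TLT prices, which are not reproduced here):&lt;/p&gt;

```python
import numpy as np

def simulate_price_histories(prices, n_paths, beta=np.sqrt(2) - 1, seed=0):
    """Bootstrap the log returns of a price series and rebuild full price paths."""
    rng = np.random.default_rng(seed)
    prices = np.asarray(prices, dtype=float)
    log_r = np.diff(np.log(prices))
    n = len(log_r)
    idx = np.arange(1, n + 1)
    rho = 1.0 - idx ** (-beta)
    paths = np.empty((n_paths, n + 1))
    for p in range(n_paths):
        zeta = rng.standard_normal(n)
        v = np.empty(n)
        prev = 0.0                                 # V_0 = 0
        for i in range(n):
            prev = 1.0 + rho[i] * (prev - 1.0) + np.sqrt(1.0 - rho[i] ** 2) * zeta[i]
            v[i] = prev
        boot_r = v / (np.cumsum(v) / idx) * log_r  # X_i* = (V_i / Vbar_i) X_i
        paths[p] = prices[0] * np.exp(np.concatenate(([0.0], np.cumsum(boot_r))))
    return paths

# Simulated stand-in for one year of daily adjusted closes
rng = np.random.default_rng(42)
prices = 100.0 * np.exp(np.concatenate(([0.0], np.cumsum(rng.normal(0.0003, 0.01, 251)))))
paths = simulate_price_histories(prices, n_paths=10)
```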

&lt;h3 id=&quot;alternative-price-histories-for-the-spy-etf&quot;&gt;Alternative price histories for the SPY ETF&lt;/h3&gt;

&lt;p&gt;Figure 2 illustrates 10 synthetic price histories generated by applying the autoregressive online bootstrap to the logarithmic returns of the SPY ETF over the period 1st January 2025 - 31st December 2025&lt;sup id=&quot;fnref:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;9&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;figure&gt;
  &lt;a href=&quot;/assets/images/blog/more-bootstrap-simulations-spy-replicas.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/more-bootstrap-simulations-spy-replicas-small.png&quot; alt=&quot;Figure 2. Alternative price histories for the SPY ETF, autoregressive online bootstrap, 1st January 2025 - 31st December 2025.&quot; /&gt;&lt;/a&gt;
  &lt;figcaption&gt;Figure 2. Alternative price histories for the SPY ETF, autoregressive online bootstrap, 1st January 2025 - 31st December 2025.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;As is visible in Figure 2, the autoregressive online bootstrap allows a wide variety of scenarios to be generated.&lt;/p&gt;

&lt;p&gt;This is confirmed in Figure 3, which depicts the distributions of the first four moments of the logarithmic returns of 1000 synthetic price histories generated by applying the autoregressive online bootstrap to the logarithmic returns of the SPY ETF over the period 1st January 2025 - 31st December 2025&lt;sup id=&quot;fnref:8:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;9&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;figure&gt;
  &lt;a href=&quot;/assets/images/blog/more-bootstrap-simulations-spy-replicas-moments.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/more-bootstrap-simulations-spy-replicas-moments-small.png&quot; alt=&quot;Figure 3. Distributions of the first four moments of 1000 alternative price histories for the SPY ETF, autoregressive online bootstrap, 1st January 2025 - 31st December 2025.&quot; /&gt;&lt;/a&gt;
  &lt;figcaption&gt;Figure 3. Distributions of the first four moments of 1000 alternative price histories for the SPY ETF, autoregressive online bootstrap, 1st January 2025 - 31st December 2025.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Figure 3 empirically demonstrates two points:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;It confirms the mean-preservation property of the autoregressive online bootstrap.&lt;/p&gt;

    &lt;p&gt;As a side note, in case this is a problem for the usage at hand, it is always possible to alter the mean of the generated scenarios in a perfectly controlled way, cf. &lt;a href=&quot;/blog/bootstrap-simulations-with-exact-sample-mean-vector-and-sample-covariance-matrix&quot;&gt;a previous blog post&lt;/a&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;It shows that the autoregressive online bootstrap can generate scenarios with a wildly different standard deviation/skewness/kurtosis compared to the original one.&lt;/p&gt;

    &lt;p&gt;Note that some scenarios might seem implausible, with a kurtosis &amp;gt; 100 for example.&lt;/p&gt;

    &lt;p&gt;Nevertheless, implausible does not equal impossible&lt;sup id=&quot;fnref:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:10&quot; class=&quot;footnote&quot;&gt;10&lt;/a&gt;&lt;/sup&gt;, so that depending on the usage at hand, such scenarios could be eliminated or put aside for specific stress-testing.&lt;/p&gt;

    &lt;p&gt;Figure 4 illustrates the variety of scenarios in another way, by looking at the correlation between the logarithmic returns of the SPY ETF and those of its 1000 alternatives, which ranges approximately from -0.4 to 1.&lt;/p&gt;

    &lt;figure&gt;
    &lt;a href=&quot;/assets/images/blog/more-bootstrap-simulations-spy-replicas-correlation.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/more-bootstrap-simulations-spy-replicas-correlation-small.png&quot; alt=&quot;Figure 4. Distribution of the correlation between the SPY ETF price history and 1000 alternative price histories, autoregressive online bootstrap, 1st January 2025 - 31st December 2025.&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;Figure 4. Distribution of the correlation between the SPY ETF price history and 1000 alternative price histories, autoregressive online bootstrap, 1st January 2025 - 31st December 2025.&lt;/figcaption&gt;
  &lt;/figure&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;alternative-price-histories-for-the-spy-and-tlt-etfs&quot;&gt;Alternative price histories for the SPY and TLT ETFs&lt;/h3&gt;

&lt;p&gt;Figure 5 illustrates the behaviour of the autoregressive online bootstrap in terms of bivariate correlation when applied to the logarithmic returns of both the SPY and TLT ETFs over the period 1st January 2025 - 31st December 2025&lt;sup id=&quot;fnref:8:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;9&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;figure&gt;
  &lt;a href=&quot;/assets/images/blog/more-bootstrap-simulations-spy-tlt-replicas-correlation.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/more-bootstrap-simulations-spy-tlt-replicas-correlation-small.png&quot; alt=&quot;Figure 5. Distribution of the correlation between the two components of 1000 alternative price histories for the SPY and TLT ETFs, autoregressive online bootstrap, 1st January 2025 - 31st December 2025.&quot; /&gt;&lt;/a&gt;
  &lt;figcaption&gt;Figure 5. Distribution of the correlation between the two components of 1000 alternative price histories for the SPY and TLT ETFs, autoregressive online bootstrap, 1st January 2025 - 31st December 2025.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;From Figure 5, it appears that the whole range of admissible correlations $[-1,1]$ is achievable through the autoregressive online bootstrap, with the majority of scenarios falling in the interval $[-0.25, 0.50]$.&lt;/p&gt;
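&lt;p&gt;In the multivariate case, a single weight sequence is applied to the whole return vector $X_i$, so that each replica induces its own cross-correlation. The computation behind a figure like Figure 5 can be sketched as follows, on simulated bivariate returns (function name and data are mine, for illustration only):&lt;/p&gt;

```python
import numpy as np

def joint_bootstrap_correlations(returns, n_replicas, beta=np.sqrt(2) - 1, seed=0):
    """For each replica, apply one weight sequence to both columns of the
    bivariate returns (X_i is a vector) and record the cross-correlation."""
    rng = np.random.default_rng(seed)
    returns = np.asarray(returns, dtype=float)
    n = len(returns)
    idx = np.arange(1, n + 1)
    rho = 1.0 - idx ** (-beta)
    corrs = np.empty(n_replicas)
    for b in range(n_replicas):
        zeta = rng.standard_normal(n)
        v = np.empty(n)
        prev = 0.0                         # V_0 = 0
        for i in range(n):
            prev = 1.0 + rho[i] * (prev - 1.0) + np.sqrt(1.0 - rho[i] ** 2) * zeta[i]
            v[i] = prev
        w = v / (np.cumsum(v) / idx)       # V_i / Vbar_i
        boot = w[:, None] * returns        # same weights for both components
        corrs[b] = np.corrcoef(boot[:, 0], boot[:, 1])[0, 1]
    return corrs

# Simulated bivariate daily log returns with mild negative correlation
rng = np.random.default_rng(7)
cov = [[1e-4, -3e-5], [-3e-5, 1e-4]]
returns = rng.multivariate_normal([0.0, 0.0], cov, size=252)
corrs = joint_bootstrap_correlations(returns, n_replicas=200)
```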

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;The autoregressive online bootstrap introduced in Palm and Nagler&lt;sup id=&quot;fnref:2:19&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; and detailed in this blog post provides an alternative to classical block bootstrap techniques for time series that is specifically tailored to streaming data.&lt;/p&gt;

&lt;p&gt;Don’t hesitate to experiment with it and evaluate whether it could replace your current bootstrap methodology!&lt;/p&gt;

&lt;p&gt;As usual, feel also free to &lt;a href=&quot;https://www.linkedin.com/in/roman-rubsamen/&quot;&gt;connect with me on LinkedIn&lt;/a&gt; or to &lt;a href=&quot;https://twitter.com/portfoliooptim&quot;&gt;follow me on Twitter&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;–&lt;/p&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3964908&quot;&gt;Anarkulova, Aizhan and Cederburg, Scott and O’Doherty, Michael S., The Long-Horizon Returns of Stocks, Bonds, and Bills: Evidence from a Broad Sample of Developed Markets (November 15, 2021)&lt;/a&gt;. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://proceedings.mlr.press/v238/palm24a/palm24a.pdf&quot;&gt;Nicolai Palm, Thomas Nagler; An Online Bootstrap for Time Series; Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:190-198&lt;/a&gt;. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:2:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;8&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;9&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;10&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:10&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;11&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:11&quot; class=&quot;reversefootnote&quot; 
role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;12&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:12&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;13&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:13&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;14&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:14&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;15&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:15&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;16&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:16&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;17&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:17&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;18&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:18&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;19&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:19&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;20&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://link.springer.com/book/10.1007/978-1-4757-2545-2&quot;&gt;Van Der Vaart, A. W. and Wellner, J. A. Weak convergence. In Weak convergence and empirical processes, pp. 16–28. Springer, 1996&lt;/a&gt;. &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:3:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:4&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The observations $X_1, …, X_n$ can be observations of random variables or of random vectors. &lt;a href=&quot;#fnref:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:5&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;It is also possible that for computational reasons - like a huge $n$ - &lt;em&gt;a complete data set is available from the start&lt;/em&gt;&lt;sup id=&quot;fnref:2:20&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, but is processed &lt;em&gt;sequentially or in batches&lt;/em&gt;&lt;sup id=&quot;fnref:2:21&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:6&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Like the variance, c.f. Palm and Nagler&lt;sup id=&quot;fnref:2:22&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:7&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.research-collection.ethz.ch/entities/publication/17842c19-7fae-4937-b759-6c13e857d8eb&quot;&gt;Bühlmann, P. L. (1993). The blockwise bootstrap in time series and empirical processes. PhD thesis, ETH Zurich&lt;/a&gt;. &lt;a href=&quot;#fnref:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:9&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;In practice, the generation of such alternative price histories would typically be integrated into a backtesting engine. &lt;a href=&quot;#fnref:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:8&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;(Adjusted) prices of the SPY and TLT ETFs have been retrieved using &lt;a href=&quot;https://api.tiingo.com/&quot;&gt;Tiingo&lt;/a&gt;. &lt;a href=&quot;#fnref:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:8:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:8:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:10&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Who could have predicted the price action of gold and silver on 30th January 2026? &lt;a href=&quot;#fnref:10&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content><author><name>Roman R.</name></author><category term="bootstrap" /><category term="monte carlo" /><summary type="html">In a previous article, I described several classical bootstrap techniques — i.i.d. bootstrap, circular block bootstrap, and stationary block bootstrap — and showed how the stationary block bootstrap could be used to simulate future price paths for financial assets by following the methodology of Anarkulova et al.1. In this blog post, I will detail another bootstrap technique called the autoregressive online bootstrap2 and introduced in Palm and Nagler2, that is best described as a multiplier bootstrap3 coupled with an autoregressive sequence of weights specifically chosen to make it useable with streaming time series data. As an example of usage, I will simulate alternative price histories for the SPY and TLT ETFs, which are ETFs representative of the US stock market and of the long-term US Treasury bonds. Mathematical preliminaries Let $X_1, …, X_n$4, $n \geq 1$ be a sample of data observed from a population. Limitations of classical bootstrap techniques in an online setting In a typical5 online setting, the length $n$ of the sample of data $X_1, …, X_n$ is growing infinitely, with new events [that] are observed at the moment they occur2. In such a setting, the classical bootstrap techniques described in the previous blog post on bootstrap simulations require all data observed so far2 at any given point in time. Indeed: The i.i.d. bootstrap, by definition, requires keeping track of the entire observed sample $ {X_1, …, X_n} $2. The circular and the stationary block bootstrap require all blocks […] to increase in size with $n$2, so that to compute the bootstrap in practice, the entire data set $ {X_1, …, X_n} $ needs to be kept in memory and processed fully, every time the block size changes2. 
Depending on the domain of application, this can be a serious computational limitation when: n is large n is moderately large but the underlying $n$ observations require a lot of computer memory to be stored n is moderately large but the computation time is limited (e.g. real-time applications) The multiplier bootstrap The multiplier bootstrap3 is a general class of bootstrapping schemes based on perturbations of the original observations with suitable weights2. In other words, compared to classical bootstrap techniques, the multiplier bootstrap replaces the idea of “randomly resampling observations” with the idea of “randomly reweighting observations”, which enables it to be applied in an online setting. For i.i.d. data $X_1, …, X_n$, it is defined as follows: Let $V_1, …, V_n$ be $n$ i.i.d. random variables of unit mean/variance. Let $ \bar V_i = \frac{1}{i} \sum_{j=1}^{i} V_j $ the running mean of these random variables, which can be computed recursively through the formula \[\bar V_i = \frac{(i-1) \bar V_{i-1} + V_i }{i}\] The multiplier bootstrap samples $ X_1^*,…,X_n^* $ are then defined as \[X_i^* = \frac{V_i}{\bar V_i} X_i\] The autoregressive online bootstrap Methodology The autoregressive online bootstrap2 is a specific instance of the multiplier bootstrap that generates a sequence of random weights evolving according to an autoregressive process centered around 1: \[V_i = 1 + \rho_i \left( V_{i-1} - 1 \right) + \sqrt{1 - \rho_i^2} \zeta_i, i=1..n\] , with: $V_0 = 0$ $\rho_i = 1 - i^{-\beta}$, $ 0 &amp;lt; \beta &amp;lt; \frac{1}{2}$ $\zeta_i \sim \mathcal N(0,1), i=1..n$ The associated autoregressive online bootstrap samples $ X_1^*,…,X_n^* $ are then defined as \[X_i^* = \frac{V_i}{\bar V_i} X_i\] Rationale Palm and Nagler2 notes that for the multiplier bootstrap to remain valid for time series: The dependencies between weights $V_i$ and $V_j$ must increase with the sample size $n$, but at the same time remain almost independent when the time gap 
$|i-j|$ is sufficiently large compared to $n$2. A scaling of the weights by their arithmetic mean is also necessary2 This is what they propose with the autoregressive online bootstrap technique. Properties The main result of Palm and Nagler2 is that under mild conditions on the original data $X_1, …, X_n$, the autoregressive online bootstrap is a consistent resampling scheme for the mean and for any continously differentiable transformation of the mean6 of an univariate or a multivariate time series. How to select the parameter $\beta$? The parameter $\beta$ controls the behaviour of the autoregressive online bootstrap samples in a way similar to how the block length controls the behaviour of the block bootstrap samples in classical block bootstrap techniques. Intuitively: A small value of $\beta$ leads to slowly changing weights and thus to long stretches of bootstrap samples that are consistent with the original observations, although up- or down-weighted. This is similar in spirit to having a long block size in a block bootstrap technique. On the contrary, a large value of $\beta$ leads to quickly changing weights and thus to rapidly varying bootstrap samples. This is similar in spirit to having a short block size in a block bootstrap technique. Palm and Nagler2 demonstrates that $\beta_{opt} = \sqrt{2} - 1$ allows for an optimal bias-variance trade-off2, so that unless there is a specific need to play with this parameter, using that value should be the default choice. Practical performances Palm and Nagler2 compares the practical performances of the autoregressive online bootstrap to an i.i.d. multiplier bootstrap and to a block bootstrap technique called the moving average block bootstrap7. In particular, Palm and Nagler2 shows that: The two time series bootstraps achieve approximately correct coverage in all [studied] scenarios, even in the presence of nonlinear [AR(2)-GARCH(1,1)] dependencies2 corresponding to a stochastic volatility process. 
The autoregressive online bootstrap allows for cheap online updates - in constant time, as illustrated in Figure 1 - contrary to the moving average block bootstrap. Figure 1. Computation time per online update of 200 bootstrap samples as the bootstrap techniques progress through a stream of 2000 samples. Source: Palm and Nagler. The autoregressive online bootstrap has a slightly higher variance compared to the moving average block bootstrap, which is the cost [to] pay for its computational advantage2. Caveats One important limitation of the autoregressive online bootstrap is that the weights $V_i$,$i=1..n$ can be negative. For this reason, in finance, logarithmic asset returns should then be prefered to either asset prices (since they could become negative) or arithmetic asset returns (singe they could become &amp;lt; -100%). Implementations Implementation in Portfolio Optimizer Portfolio Optimizer implements the autoregressive online bootstrap through the endpoint /assets/returns/simulation/bootstrap/online. Implementation elsewhere An implementation of the autoregressive online bootstrap in Python is available at https://github.com/nicolaipalm/online-bootstrap-implementation by its authors. Example of usage - Simulation of alternative price histories for ETFs As an example of usage, I propose to use the autoregressive online bootstrap to generate alternative price histories for the SPY and TLT ETFs and compute a couple of associated descriptive statistics8. Alternative price histories for the SPY ETF Figure 2 illustrates 10 synthetic price histories generated by applying the autoregressive online bootstrap to the logarithmic returns of the SPY ETF over the period 1st January 2025 - 31th December 20259. Figure 2. Alternative price histories for the SPY ETF, autoregressive online bootstrap, 1st January 2025 - 31th December 2025. On Figure 2, it is visible that the autoregressive online bootstrap allows for a wide variety of scenarios to be generated. 
This is confirmed in Figure 3, which depicts the distributions of the first four moments of the logarithmic returns of 1000 synthetic price histories generated by applying the autoregressive online bootstrap to the logarithmic returns of the SPY ETF over the period 1st January 2025 - 31th December 20259. Figure 3. Distributions of the first four moments of 1000 alternative price histories for the SPY ETF, autoregressive online bootstrap, 1st January 2025 - 31th December 2025. Figure 3 empirically demonstrates two points: It confirms the mean-preservation property of the autoregressive online bootstrap. As a side note, in case this is a problem for the usage at hand, it is always possible to alter the mean of the generated scenarios in a perfectly controlled way, c.f. a previous blog post. It shows that the autoregressive online bootstrap can generate scenarios with a wildly different standard deviation/skewness/kurtosis compared to the original one. Here, to be noted that some scenarios might seem implausible, with a kurtosis &amp;gt; 100 for example. Nevertheless, implausible does not equal impossible10, so that depending on the usage at hand, such scenarios could be eliminated or put aside for specific stress-testing. Figure 4 illustrates the variety of scenarios in another way, by looking at the correlation between the logarithmic returns of the SPY ETF and of its 1000 alternatives, which approximatively ranges from -0.4 to 1. Figure 4. Distribution of the correlation between the SPY ETF price history and 1000 alternative price histories, autoregressive online bootstrap, 1st January 2025 - 31th December 2025. Alternative price histories for the SPY and TLT ETFs Figure 5 illustrates the behaviour of the autoregressive online boostrap in terms of bivariate correlation when applied to the logarithmic returns of both the SPY and TLT ETFs over the period 1st January 2025 - 31th December 20259. Figure 5. 
Distribution of the correlation between the two components of 1000 alternative price histories for the SPY and TLT ETFs, autoregressive online bootstrap, 1st January 2025 - 31th December 2025. From Figure 5, it appears that the whole range of admissible correlations $[-1,1]$ is achievable through the autoregressive online bootstrap, with the majority of scenarios falling in the interval $[-0.25, 0.50]$. Conclusion The autoregressive online bootstrap introduced in Palm and Nagler2 and detailled in this blog post provides an alternative to classical block bootstrap techniques for time series that is specifically taylored to streaming data. Don’t hesitate to experiment with it and evaluate whether it could replace your current boostrap methodology! As usual, feel also free to connect with me on LinkedIn or to follow me on Twitter. – See Anarkulova, Aizhan and Cederburg, Scott and O’Doherty, Michael S., The Long-Horizon Returns of Stocks, Bonds, and Bills: Evidence from a Broad Sample of Developed Markets (November 15, 2021). &amp;#8617; See Nicolai Palm, Thomas Nagler; An Online Bootstrap for Time Series; Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:190-198. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 &amp;#8617;6 &amp;#8617;7 &amp;#8617;8 &amp;#8617;9 &amp;#8617;10 &amp;#8617;11 &amp;#8617;12 &amp;#8617;13 &amp;#8617;14 &amp;#8617;15 &amp;#8617;16 &amp;#8617;17 &amp;#8617;18 &amp;#8617;19 &amp;#8617;20 See Van Der Vaart, A. W. and Wellner, J. A. Weak convergence. In Weak convergence and empirical processes, pp. 16–28. Springer, 1996. &amp;#8617; &amp;#8617;2 The observations $X_1, …, X_n$ can be observations of random variables or of random vectors. &amp;#8617; It is also possible that for computational reasons - like a huge $n$ - a complete data set is available from the start2, but is processed sequentially or in batches2. &amp;#8617; Like the variance, c.f. Palm and Nagler2. 
&amp;#8617; See Buhlmann, P. L. (1993). The blockwise bootstrap in time series and empirical processes. PhD thesis, ETH Zurich. &amp;#8617; In practice, the generation of such alternative price histories would typically be integrated into a backtesting engine. &amp;#8617; (Adjusted) prices of the SPY and TLT ETFs have have been retrieved using Tiingo. &amp;#8617; &amp;#8617;2 &amp;#8617;3 Who could have predicted the price action of gold and silver on 30th January 2026? &amp;#8617;</summary></entry><entry><title type="html">Completing a Correlation Matrix: Another Problem from Finance</title><link href="https://portfoliooptimizer.io/blog/completing-a-correlation-matrix-another-problem-from-finance/" rel="alternate" type="text/html" title="Completing a Correlation Matrix: Another Problem from Finance" /><published>2026-01-06T00:00:00-06:00</published><updated>2026-01-06T00:00:00-06:00</updated><id>https://portfoliooptimizer.io/blog/completing-a-correlation-matrix-another-problem-from-finance</id><content type="html" xml:base="https://portfoliooptimizer.io/blog/completing-a-correlation-matrix-another-problem-from-finance/">&lt;p&gt;&lt;a href=&quot;/blog/computing-the-nearest-correlation-matrix-a-problem-from-finance/&quot;&gt;The previous post&lt;/a&gt; of this series on mathematical problems related to &lt;a href=&quot;https://en.wikipedia.org/wiki/Correlation_and_dependence#Correlation_matrices&quot;&gt;correlation matrices&lt;/a&gt; introduced 
&lt;em&gt;the nearest correlation matrix problem&lt;/em&gt;&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, which consists in determining the closest&lt;sup id=&quot;fnref:18&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:18&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; valid correlation matrix to an approximate correlation matrix.&lt;/p&gt;

&lt;p&gt;In this blog post, I will now describe &lt;em&gt;the correlation matrix completion problem&lt;/em&gt;, which consists in filling in the missing coefficients of a partially specified correlation matrix in order to produce a valid correlation matrix.&lt;/p&gt;

&lt;p&gt;As noted in Dreyer et al.&lt;sup id=&quot;fnref:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;, &lt;em&gt;such matrices often arise in financial applications when the number of stochastic variables becomes large or when several smaller models are combined in a larger model&lt;/em&gt;&lt;sup id=&quot;fnref:6:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;After a couple of reminders on correlation matrices, I will detail the mathematical formulation of the correlation matrix completion problem, discuss its exact and approximate solutions, and finally illustrate how this problem 
naturally appears when working with &lt;a href=&quot;/blog/capital-market-assumptions-combining-institutions-forecasts-for-improved-accuracy/&quot;&gt;capital market assumptions from financial institutions&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;mathematical-preliminaries&quot;&gt;Mathematical preliminaries&lt;/h2&gt;

&lt;p&gt;Let $n$ be a number of assets.&lt;/p&gt;

&lt;h3 id=&quot;correlation-matrices&quot;&gt;Correlation matrices&lt;/h3&gt;

&lt;p&gt;Let $C \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ be a square matrix.&lt;/p&gt;

&lt;p&gt;$C$ is a correlation matrix if and only if&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;$C$ is symmetric: $C {}^t = C$&lt;/li&gt;
  &lt;li&gt;$C$ is unit diagonal: $C_{i,i} = 1$, $i = 1,…,n$&lt;/li&gt;
  &lt;li&gt;$C$ is &lt;a href=&quot;https://en.wikipedia.org/wiki/Positive_semidefinite_matrix&quot;&gt;positive semi-definite&lt;/a&gt;, that is, $x {}^t C x \geqslant 0, \forall x \in \mathbb{R}^n $&lt;/li&gt;
&lt;/ul&gt;
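&lt;p&gt;As a concrete illustration, the three defining properties above can be checked numerically; the following Python sketch (the function name is mine) tests positive semi-definiteness through the eigenvalues of the matrix:&lt;/p&gt;

```python
import numpy as np

def is_correlation_matrix(C, tol=1e-10):
    """Check the three defining properties of a correlation matrix."""
    C = np.asarray(C, dtype=float)
    symmetric = np.allclose(C, C.T)
    unit_diagonal = np.allclose(np.diag(C), 1.0)
    # Positive semi-definite iff all eigenvalues are non-negative, up to a tolerance
    psd = bool(np.greater_equal(np.linalg.eigvalsh((C + C.T) / 2), -tol).all())
    return symmetric and unit_diagonal and psd

C_valid = np.array([[1.0, 0.5],
                    [0.5, 1.0]])
C_invalid = np.array([[1.0, 2.0],
                      [2.0, 1.0]])  # a "correlation" of 2 violates positive semi-definiteness
```

&lt;p&gt;In floating-point arithmetic, a small tolerance is needed, because eigenvalues of perfectly valid correlation matrices are routinely computed as tiny negative numbers.&lt;/p&gt;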

&lt;h3 id=&quot;the-sets-of-correlation-matrices&quot;&gt;The sets of correlation matrices&lt;/h3&gt;

&lt;p&gt;Consider the following convex sets:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;$\mathcal{S}^n_{+} = \{ X \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ such that $X {}^t = X$ and $X$ is positive semi-definite $\}$&lt;/li&gt;
  &lt;li&gt;$\mathcal{S}^n_{++} = \{ X \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ such that $X {}^t = X$ and $X$ is positive definite $\}$&lt;/li&gt;
  &lt;li&gt;$\mathcal{E}^n = \{ X \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ such that $X {}^t = X$ and $x_{ii} = 1, i = 1,…,n \}$&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The set of correlation matrices is the convex compact set $\mathcal{S}^n_{+} \cap \mathcal{E}^n$.&lt;/li&gt;
  &lt;li&gt;The set of invertible correlation matrices is the open bounded convex set $\mathcal{S}^n_{++} \cap \mathcal{E}^n$.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As a geometrical side note, the set of vectorized correlation matrices defines a convex subset of the hypercube $[-1,1]^{n(n-1)/2}$ called &lt;em&gt;the elliptope&lt;/em&gt;, 
c.f. for example &lt;a href=&quot;https://www.convexoptimization.com/dattorro/elliptope_and_fantope.html&quot;&gt;the website Convex Optimization&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;partial-correlation-matrices-completion&quot;&gt;Partial correlation matrices completion&lt;/h3&gt;

&lt;p&gt;A partial correlation matrix $C \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ is a symmetric unit diagonal matrix&lt;sup id=&quot;fnref:19&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:19&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; whose coefficients $c_{ij} = c_{ji}$ are not all specified.&lt;/p&gt;

&lt;p&gt;A partial correlation matrix is said to be partial positive (semi-)definite when its fully specified &lt;a href=&quot;https://en.wikipedia.org/wiki/Minor_(linear_algebra)&quot;&gt;principal submatrices&lt;/a&gt; 
are positive (semi-)definite.&lt;/p&gt;

&lt;p&gt;A positive semi-definite completion - or simply a completion - of a partial correlation matrix $C \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ is a correlation matrix $C^* \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ 
such that $c_{ij}^{*} = c_{ij}$ whenever the coefficient $c_{ij}$ is specified in $C$.&lt;/p&gt;

&lt;p&gt;A positive definite completion of a partial correlation matrix $C \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ is a positive definite correlation matrix $C^* \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ 
such that $c_{ij}^{*} = c_{ij}$ whenever the coefficient $c_{ij}$ is specified in $C$.&lt;/p&gt;

&lt;h3 id=&quot;undirected-graph-associated-to-a-correlation-matrix&quot;&gt;Undirected graph associated to a correlation matrix&lt;/h3&gt;

&lt;p&gt;Let $C \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ be a partial correlation matrix.&lt;/p&gt;

&lt;p&gt;It is possible&lt;sup id=&quot;fnref:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt; to associate &lt;a href=&quot;https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)#Graph&quot;&gt;an undirected graph&lt;/a&gt; $G = \left( V, E \right)$ to $C$, defined as follows:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;$V = \{ 1, 2, …, n \}$ is the set of vertices&lt;/li&gt;
  &lt;li&gt;$E$ is the set of edges, with $(i,j) \in E$, $i \ne j$, whenever the coefficient $c_{ij}$ is specified.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The graph $G$ is then said to be &lt;em&gt;chordal&lt;/em&gt; if &lt;em&gt;every cycle of length greater than or equal to 4 has a chord, which is an edge that is not part of the cycle but connects two vertices of the cycle&lt;/em&gt;&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;As a reminder, &lt;a href=&quot;https://en.wikipedia.org/wiki/Cycle_(graph_theory)&quot;&gt;a cycle&lt;/a&gt; in $G$ is a sequence of $k \geq 3$ pairwise distinct vertices $\left( v_1, …, v_k \right)$ such that 
$\left( v_1, v_2 \right)$, $\left( v_2, v_3 \right)$, …, $\left( v_{k-1}, v_k \right)$, $\left( v_k, v_1 \right)$ $\in E$, with $k$ called the length of the cycle.&lt;/p&gt;

&lt;h2 id=&quot;the-correlation-matrix-completion-problem&quot;&gt;The correlation matrix completion problem&lt;/h2&gt;

&lt;h3 id=&quot;problem-formulation&quot;&gt;Problem formulation&lt;/h3&gt;

&lt;p&gt;Consider:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;$C \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ a partial correlation matrix.&lt;/li&gt;
  &lt;li&gt;$\mathcal{N}$ the index set of the specified off-diagonal elements of $C$.&lt;/li&gt;
  &lt;li&gt;$\mathcal{U}^n = \{ X \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ such that $x_{ij}=c_{ij}$ for $(i,j) \in \mathcal{N} \}$&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The correlation matrix completion problem is the problem of finding a correlation matrix $C^* \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ that completes the matrix $C$, that is, finding $C^* \in \mathcal{S}^n_{+} \cap \mathcal{U}^n$.&lt;/li&gt;
  &lt;li&gt;The positive definite correlation matrix completion problem is the problem of finding a positive definite correlation matrix $C^* \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ that completes the matrix $C$, that is, finding $C^* \in \mathcal{S}^n_{++} \cap \mathcal{U}^n$.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;existence-of-solutions&quot;&gt;Existence of solutions&lt;/h3&gt;

&lt;p&gt;Contrary to &lt;a href=&quot;/blog/computing-the-nearest-correlation-matrix-a-problem-from-finance/&quot;&gt;the nearest correlation matrix problem&lt;/a&gt;, the correlation matrix completion problem does not always admit a solution.&lt;/p&gt;

&lt;p&gt;Indeed, a first necessary condition for a completion to exist is that the partial correlation matrix be partial positive semi-definite&lt;sup id=&quot;fnref:8:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;So, for example, the following partial correlation matrix does not admit any completion:&lt;/p&gt;

\[C_1 = 
\begin{pmatrix}
    1 &amp;amp; -1 &amp;amp; -1 &amp;amp; -1 \\
    -1 &amp;amp; 1 &amp;amp; -1 &amp;amp; -1 \\
    -1 &amp;amp; -1 &amp;amp; 1 &amp;amp; . \\
    -1 &amp;amp; -1 &amp;amp; . &amp;amp; 1 
\end{pmatrix}\]
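&lt;p&gt;This can be checked numerically: the fully specified $3 \times 3$ principal submatrix of $C_1$ on rows and columns 1, 2 and 3 has a negative eigenvalue, so that $C_1$ is not partial positive semi-definite. A quick numpy verification:&lt;/p&gt;

```python
import numpy as np

# Fully specified principal submatrix of C_1 on rows and columns 1, 2, 3
M = np.array([[ 1.0, -1.0, -1.0],
              [-1.0,  1.0, -1.0],
              [-1.0, -1.0,  1.0]])

eigenvalues = np.linalg.eigvalsh(M)  # in ascending order: -1.0, 2.0, 2.0
min_eig = eigenvalues.min()          # -1.0, so M is not positive semi-definite
```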

&lt;p&gt;Beyond this necessary condition, Fiedler&lt;sup id=&quot;fnref:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:9&quot; class=&quot;footnote&quot;&gt;7&lt;/a&gt;&lt;/sup&gt;, Grone et al.&lt;sup id=&quot;fnref:8:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt; and Smith&lt;sup id=&quot;fnref:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;8&lt;/a&gt;&lt;/sup&gt; all establish the following result&lt;sup id=&quot;fnref:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:10&quot; class=&quot;footnote&quot;&gt;9&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;A partial positive semi-definite correlation matrix is completable regardless of the values of the specified correlations if and only if the undirected graph associated to these specified correlations is &lt;a href=&quot;https://en.wikipedia.org/wiki/Chordal_graph&quot;&gt;&lt;em&gt;chordal&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Note that when the graph is not chordal, nothing can be said in general, because the existence of a completion then depends on the exact values of the specified correlations.&lt;/p&gt;

&lt;p&gt;As an illustration:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Let $C_2 \in \mathcal{M} \left( \mathbb{R}^{4 \times 4} \right)$ be the following partial positive semi-definite correlation matrix:&lt;/p&gt;

\[C_2 = 
  \begin{pmatrix}
      1 &amp;amp; 1 &amp;amp; . &amp;amp; 1 \\
      1 &amp;amp; 1 &amp;amp; 1 &amp;amp; . \\
      . &amp;amp; 1 &amp;amp; 1 &amp;amp; . \\
      1 &amp;amp; . &amp;amp; . &amp;amp; 1 
  \end{pmatrix}\]

    &lt;p&gt;The undirected graph associated to that partial correlation matrix is the graph with the set of vertices $V = \{ 1, 2, 3, 4 \}$ and the set of edges $E = \{ (1,2), (1,4), (2,3) \}$, depicted in Figure 1.&lt;/p&gt;

    &lt;figure&gt;
    &lt;a href=&quot;/assets/images/blog/correlation-matrix-completion-chordal-graph-example.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/correlation-matrix-completion-chordal-graph-example-small.png&quot; alt=&quot;Figure 1. Undirected graph associated to the partial correlation matrix C2.&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;Figure 1. Undirected graph associated to the partial correlation matrix C2.&lt;/figcaption&gt;
  &lt;/figure&gt;

    &lt;p&gt;From Figure 1, that graph is chordal, because it contains no cycle of length greater than or equal to 4 (indeed, being a tree, it contains no cycle at all).&lt;/p&gt;

    &lt;p&gt;Consequently, the partial correlation matrix $C_2$ admits a completion; moreover, the existence of a completion does not depend on the values of the specified correlations (here, all 1s), only on their pattern.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Let now $C_3 \in \mathcal{M} \left( \mathbb{R}^{4 \times 4} \right)$ be the following partial positive semi-definite correlation matrix:&lt;/p&gt;

\[C_3 = 
  \begin{pmatrix}
      1 &amp;amp; 1 &amp;amp; . &amp;amp; 0 \\
      1 &amp;amp; 1 &amp;amp; 1 &amp;amp; . \\
      . &amp;amp; 1 &amp;amp; 1 &amp;amp; 1 \\
      0 &amp;amp; . &amp;amp; 1 &amp;amp; 1 
  \end{pmatrix}\]

    &lt;p&gt;The undirected graph associated to that partial correlation matrix is the graph with the set of vertices $V = \{ 1, 2, 3, 4 \}$ and the set of edges $E = \{ (1,2), (1,4), (2,3), (3,4) \}$, depicted in Figure 2.&lt;/p&gt;

    &lt;figure&gt;
    &lt;a href=&quot;/assets/images/blog/correlation-matrix-completion-chordal-graph-example-2.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/correlation-matrix-completion-chordal-graph-example-2-small.png&quot; alt=&quot;Figure 2. Undirected graph associated to the partial correlation matrix C3.&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;Figure 2. Undirected graph associated to the partial correlation matrix C3.&lt;/figcaption&gt;
  &lt;/figure&gt;

    &lt;p&gt;From Figure 2, that graph is not chordal, because its only cycle, of length 4, does not have any chord.&lt;/p&gt;

    &lt;p&gt;Consequently, the partial correlation matrix $C_3$ might or might not admit a completion; nothing can be said at this stage without examining the values of the specified correlations.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
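&lt;p&gt;The chordality check in both examples can be automated; assuming the networkx Python library is available, a short sketch confirms that the graph associated to $C_2$ is chordal while the graph associated to $C_3$ is not:&lt;/p&gt;

```python
import networkx as nx

# Undirected graphs associated to C_2 and C_3, as defined above
G2 = nx.Graph([(1, 2), (1, 4), (2, 3)])          # a tree: no cycle at all
G3 = nx.Graph([(1, 2), (1, 4), (2, 3), (3, 4)])  # a 4-cycle without any chord

chordal_2 = nx.is_chordal(G2)  # True
chordal_3 = nx.is_chordal(G3)  # False
```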

&lt;p&gt;For the interested reader, a couple of other theoretical results can be found in Fiedler&lt;sup id=&quot;fnref:9:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:9&quot; class=&quot;footnote&quot;&gt;7&lt;/a&gt;&lt;/sup&gt;, which also notes that &lt;em&gt;the general [completion] problem […] seems to be difficult&lt;/em&gt;&lt;sup id=&quot;fnref:9:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:9&quot; class=&quot;footnote&quot;&gt;7&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h3 id=&quot;unicity-of-solutions&quot;&gt;Unicity of solutions&lt;/h3&gt;

&lt;p&gt;Again contrary to the nearest correlation matrix problem, the correlation matrix completion problem does not generally admit a unique solution when one exists.&lt;/p&gt;

&lt;p&gt;Indeed, Grone et al.&lt;sup id=&quot;fnref:8:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt; establish that the set of all positive semi-definite completions of a partial correlation matrix is in general a convex compact set and not a singleton.&lt;/p&gt;

&lt;p&gt;This result can be illustrated with one and two missing correlations:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;One missing correlation&lt;/p&gt;

    &lt;p&gt;The partial correlation matrix $C_4 = \begin{pmatrix}  1 &amp;amp; . \\ . &amp;amp; 1 \end{pmatrix} $ has one missing correlation.&lt;/p&gt;

    &lt;p&gt;By introducing the variable $x$ representing that missing correlation, a candidate completion is valid if and only if its determinant is positive or null, that is, $\det(x) = 1 - x^2 \geq 0$.&lt;/p&gt;

    &lt;p&gt;That condition being equivalent to the condition $x \in [-1,1]$, all correlation matrices of the form $ C^*_4(x) = \begin{pmatrix} 1 &amp;amp; x \\ x &amp;amp; 1 \end{pmatrix} $ with $x \in [-1,1]$ are completions of $C_4$.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Two missing correlations&lt;/p&gt;

    &lt;p&gt;The partial correlation matrix $C_5 \in \mathcal{M} \left( \mathbb{R}^{5 \times 5} \right)$ represented in Figure 3 has two missing correlations.&lt;/p&gt;

    &lt;figure&gt;
    &lt;a href=&quot;/assets/images/blog/correlation-matrix-completion-example-matrix-georgescu.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/correlation-matrix-completion-example-matrix-georgescu-small.png&quot; alt=&quot;Figure 3. Partial correlation matrix with 2 missing correlations. Source: Georgescu et al.&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;Figure 3. Partial correlation matrix with 2 missing correlations. Source: Georgescu et al.&lt;/figcaption&gt;
  &lt;/figure&gt;

    &lt;p&gt;Because all the fully specified principal submatrices of $C_5$ are positive semi-definite, a candidate completion is again valid if and only if its determinant is positive or null.&lt;/p&gt;

    &lt;p&gt;By introducing the variables $x$ and $y$ representing the two missing correlations, the determinant of $C_5$ can be factored as $\det(x,y) \approx -0.18 \left( 0.67x^2 + 0.75y^2 - xy - 0.32x - 0.24y + 0.17 \right)$.&lt;/p&gt;

    &lt;p&gt;Non-negativity of that determinant is then equivalent to the condition&lt;/p&gt;

\[0.67x^2 + 0.75y^2 - xy - 0.32x - 0.24y + 0.17 \leq 0\]

    &lt;p&gt;, which defines the 2-dimensional ellipse (and its interior) depicted in Figure 4.&lt;/p&gt;

    &lt;figure&gt;
    &lt;a href=&quot;/assets/images/blog/correlation-matrix-completion-example-matrix-georgescu-feasible-region.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/correlation-matrix-completion-example-matrix-georgescu-feasible-region-small.png&quot; alt=&quot;Figure 4. Feasible region for Georgescu et al.'s partial correlation matrix completions.&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;Figure 4. Feasible region for Georgescu et al.'s partial correlation matrix completions.&lt;/figcaption&gt;
  &lt;/figure&gt;

    &lt;p&gt;All correlation matrices $C_5^{*}(x,y) \in \mathcal{M} \left( \mathbb{R}^{5 \times 5} \right)$ with the same specified correlations as $C_5$, a correlation $c^{*}_{34} = x$ and a correlation $c^{*}_{35} = y$ - with $x,y$ satisfying the above relationship - are thus completions of $C_5$.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
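&lt;p&gt;Using the approximate factorization above, the feasible region of Figure 4 can be reproduced with a simple numpy grid scan (a sketch relying on the approximate coefficients, not the method used to generate the figure). Incidentally, the origin $x = y = 0$ lies outside the ellipse, so that setting both missing correlations to zero would not produce a valid completion of $C_5$:&lt;/p&gt;

```python
import numpy as np

def q(x, y):
    # Quadratic form from the (approximate) factorization of det(x, y)
    return 0.67 * x**2 + 0.75 * y**2 - x * y - 0.32 * x - 0.24 * y + 0.17

# Scan the square [-1, 1]^2: the points with q(x, y) at most 0 form the feasible ellipse
grid = np.linspace(-1.0, 1.0, 201)
X, Y = np.meshgrid(grid, grid)
feasible = np.less_equal(q(X, Y), 0.0)

nonempty = bool(feasible.any())                          # True: completions of C_5 exist
origin_feasible = bool(np.less_equal(q(0.0, 0.0), 0.0))  # False: q(0, 0) = 0.17
```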

&lt;p&gt;The existence of infinitely many completions&lt;sup id=&quot;fnref:20&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:20&quot; class=&quot;footnote&quot;&gt;10&lt;/a&gt;&lt;/sup&gt; naturally leads to trying to find &lt;em&gt;a best-estimate completion in some sense&lt;/em&gt;&lt;sup id=&quot;fnref:2:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;, which does exist and is called the &lt;em&gt;maximum determinant completion&lt;/em&gt;&lt;sup id=&quot;fnref:2:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h2 id=&quot;the-maximum-determinant-correlation-matrix-completion-problem&quot;&gt;The maximum determinant correlation matrix completion problem&lt;/h2&gt;

&lt;p&gt;Fiedler&lt;sup id=&quot;fnref:9:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:9&quot; class=&quot;footnote&quot;&gt;7&lt;/a&gt;&lt;/sup&gt; originally demonstrated that if a positive definite partial correlation matrix admits a completion, then, &lt;em&gt;there is a unique matrix in the (nonempty) class of all positive definite completions […] 
that has maximum determinant&lt;/em&gt;&lt;sup id=&quot;fnref:3:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;8&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;More recently, by introducing &lt;em&gt;a generalized determinant that gives the determinant of the nonsingular part of [a] matrix&lt;/em&gt;&lt;sup id=&quot;fnref:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;11&lt;/a&gt;&lt;/sup&gt;, Dreyer&lt;sup id=&quot;fnref:5:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;11&lt;/a&gt;&lt;/sup&gt; established a similar result for positive semi-definite partial correlation matrices.&lt;/p&gt;

&lt;p&gt;That completion, called &lt;em&gt;the maximum determinant completion&lt;/em&gt; - or &lt;em&gt;the Max-Det completion&lt;/em&gt; - has several interesting theoretical properties, which make it an ideal candidate for guaranteeing the unicity of the completion of a correlation matrix.&lt;/p&gt;

&lt;h3 id=&quot;problem-formulation-1&quot;&gt;Problem formulation&lt;/h3&gt;

&lt;p&gt;Consider:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;$C \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ a partial correlation matrix.&lt;/li&gt;
  &lt;li&gt;$\mathcal{N}$ the index set of the specified off-diagonal elements of $C$.&lt;/li&gt;
  &lt;li&gt;$\mathcal{U}^n = \{ X \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ such that $x_{ij}=c_{ij}$ for $(i,j) \in \mathcal{N} \}$&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;The maximum determinant correlation matrix completion problem&lt;/em&gt; can be cast as a generalization of &lt;a href=&quot;https://en.wikipedia.org/wiki/Semidefinite_programming&quot;&gt;a semidefinite programming problem&lt;/a&gt;:&lt;/p&gt;

\[C^* = \operatorname{argmax}_{X} \log ( \det (X) ) \text{ s.t. } X \in \mathcal{S}^n_{+} \cap \mathcal{E}^n \cap \mathcal{U}^n\]

&lt;p&gt;Assuming that a solution exists, c.f. the previous section, Grone et al.&lt;sup id=&quot;fnref:8:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt; and Olvera Astivia&lt;sup id=&quot;fnref:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;12&lt;/a&gt;&lt;/sup&gt; show using standard results from &lt;a href=&quot;https://en.wikipedia.org/wiki/Mathematical_optimization&quot;&gt;mathematical optimization theory&lt;/a&gt; 
that it is necessarily unique.&lt;/p&gt;
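&lt;p&gt;On a toy instance, the solution can even be found by brute force. For a $3 \times 3$ partial correlation matrix with specified correlations $c_{12}$ and $c_{13}$ and a single missing correlation $x$, the determinant equals $1 - c_{12}^2 - c_{13}^2 - x^2 + 2 c_{12} c_{13} x$, which is maximized at $x = c_{12} c_{13}$, i.e. at a zero partial correlation between the two assets given the first one. The numpy sketch below, with illustrative values of my choosing, confirms this numerically:&lt;/p&gt;

```python
import numpy as np

c12, c13 = 0.6, -0.3  # specified correlations, illustrative values

def det_completion(x):
    C = np.array([[1.0, c12, c13],
                  [c12, 1.0,   x],
                  [c13,   x, 1.0]])
    return np.linalg.det(C)

# Brute-force scan of the single free coefficient over [-1, 1]
xs = np.linspace(-1.0, 1.0, 20001)
dets = np.array([det_completion(x) for x in xs])
x_star = xs[dets.argmax()]  # close to c12 * c13 = -0.18, the max-det completion
```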

&lt;h3 id=&quot;mathematical-properties-of-the-solution&quot;&gt;Mathematical properties of the solution&lt;/h3&gt;

&lt;p&gt;Georgescu et al.&lt;sup id=&quot;fnref:2:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt; summarize the main properties of the maximum determinant correlation matrix completion:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;For the multivariate normal model, it maximizes the entropy of the distribution described by the matrix&lt;/p&gt;

    &lt;p&gt;van der Schans and Boer&lt;sup id=&quot;fnref:12&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:12&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt; comment on this property as follows:&lt;/p&gt;

    &lt;blockquote&gt;
      &lt;p&gt;Since the already specified [correlations] imply a dependence between variables between which no dependence is specified and dependence reduces the amount of uncertainty in a system, the intuitive interpretation of entropy maximization is not introducing more dependence than is already implied by the already specified [correlations].&lt;/p&gt;
    &lt;/blockquote&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Again for the multivariate normal model, it maximizes the likelihood of the correlation matrix&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;It corresponds to &lt;em&gt;the analytic centre of the feasible region described by the positive semi-definiteness constraints&lt;/em&gt;&lt;sup id=&quot;fnref:2:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

    &lt;p&gt;In other words, for a given partial correlation matrix, its maximum determinant completion lies as “deep” as possible&lt;sup id=&quot;fnref:15&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:15&quot; class=&quot;footnote&quot;&gt;14&lt;/a&gt;&lt;/sup&gt; inside the set of all its positive definite completions.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;computation-of-the-solution&quot;&gt;Computation of the solution&lt;/h3&gt;

&lt;p&gt;The original algorithmic approach to solving determinant maximization problems with matrix constraints has been to use interior point methods&lt;sup id=&quot;fnref:11&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:11&quot; class=&quot;footnote&quot;&gt;15&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;More recently, dual projected gradient methods have been developed that are better able to scale with the dimensionality of the problem&lt;sup id=&quot;fnref:13&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:13&quot; class=&quot;footnote&quot;&gt;16&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Nevertheless, &lt;em&gt;the drawbacks of these algorithms, from an application point of view, are that they are difficult to implement […], do not always converge if inconsistent starting [correlations] are specified and, finally, global optimization is too slow 
and memory consuming for large matrices&lt;/em&gt;&lt;sup id=&quot;fnref:12:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:12&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;More on “inconsistent starting correlations” later.&lt;/p&gt;

&lt;p&gt;For these reasons, it is &lt;em&gt;tempting to [simply] set the missing [correlations] to zero&lt;/em&gt;&lt;sup id=&quot;fnref:6:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;, but &lt;em&gt;this approach has several shortcomings&lt;/em&gt;&lt;sup id=&quot;fnref:6:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;, among which the fact that forcing unspecified correlations to zero&lt;sup id=&quot;fnref:14&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:14&quot; class=&quot;footnote&quot;&gt;17&lt;/a&gt;&lt;/sup&gt; might not be ideal for financial applications 
where most assets are correlated to each other.&lt;/p&gt;

&lt;p&gt;To illustrate this point, Figure 5 represents a partial correlation matrix with a block of missing correlations.&lt;/p&gt;

&lt;figure&gt;
  &lt;a href=&quot;/assets/images/blog/correlation-matrix-completion-example-matrix-georgescu-2.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/correlation-matrix-completion-example-matrix-georgescu-2-small.png&quot; alt=&quot;Figure 5. Partial correlation matrix with a block of missing correlations. Source: Georgescu et al.&quot; /&gt;&lt;/a&gt;
  &lt;figcaption&gt;Figure 5. Partial correlation matrix with a block of missing correlations. Source: Georgescu et al.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;The maximum determinant completion of the block of missing correlations is&lt;sup id=&quot;fnref:2:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

\[E_1 = 
\begin{pmatrix}
0.1000 &amp;amp; 0.1500 &amp;amp; 0.0500 &amp;amp; 0.0750 \\
0.2400 &amp;amp; 0.3600 &amp;amp; 0.1200 &amp;amp; 0.1800 \\
0.2200 &amp;amp; 0.3300 &amp;amp; 0.1100 &amp;amp; 0.1650 \\
0.2600 &amp;amp; 0.3900 &amp;amp; 0.1300 &amp;amp; 0.1950 \\
0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0
\end{pmatrix}\]

&lt;p&gt;, while the completion of the same block of missing correlations obtained after setting them to zero and computing the nearest correlation matrix to the resulting full approximate correlation matrix is&lt;sup id=&quot;fnref:2:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

\[E_2 = 
\begin{pmatrix}
0.0022 &amp;amp; 0.0084 &amp;amp; 0.0004 &amp;amp; 0.0035 \\
0.0003 &amp;amp; 0.0011 &amp;amp; 0.0001 &amp;amp; 0.0005 \\
0.0025 &amp;amp; 0.0098 &amp;amp; 0.0005 &amp;amp; 0.0040 \\
0.0042 &amp;amp; 0.0164 &amp;amp; 0.0008 &amp;amp; 0.0067 \\
0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0
\end{pmatrix}\]

&lt;p&gt;Comparing the two sub-matrices $E_1$ and $E_2$, it is clear that the completed correlations of $E_2$ are much closer to zero than those of $E_1$, thus imposing an additional soft constraint 
of “closeness to zero” to the correlation matrix completion problem.&lt;/p&gt;

&lt;p&gt;Resorting to simple heuristics for the correlation matrix completion problem can thus be dangerous, depending on the context.&lt;/p&gt;

&lt;p&gt;Fortunately, van der Schans and Boer&lt;sup id=&quot;fnref:12:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:12&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt; propose a heuristic that does not suffer from these shortcomings and that:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Is fast, &lt;em&gt;which makes it suitable for applications in which the computation time is important&lt;/em&gt;&lt;sup id=&quot;fnref:12:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:12&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Tries to not introduce more dependence between variables than is implied by the initially specified [correlations]&lt;/em&gt;&lt;sup id=&quot;fnref:12:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:12&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt;, contrary to the “setting missing correlations to zero” heuristic&lt;/li&gt;
  &lt;li&gt;Additionally &lt;em&gt;corrects for inconsistencies in the already specified [correlations]&lt;/em&gt;&lt;sup id=&quot;fnref:12:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:12&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt;, but again, more on this later&lt;/li&gt;
  &lt;li&gt;Empirically produces completions with an &lt;em&gt;average correlation difference with the maxdet completion [that is] reasonable&lt;/em&gt;&lt;sup id=&quot;fnref:12:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:12&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A drawback of this heuristic, though, &lt;em&gt;is that it depends on the ordering of the rows and columns of the [correlation] matrix, i.e. the heuristic will yield a different result 
if first rows and columns of [the matrix] are interchanged before starting the procedure&lt;/em&gt;&lt;sup id=&quot;fnref:12:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:12&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt;, which might or might not be acceptable in practice.&lt;/p&gt;

&lt;p&gt;As a side note, explicit solutions of the maximum determinant correlation matrix completion problem are known in specific cases, for example for &lt;em&gt;correlation matrices that follow an L-shaped, 
block diagonal pattern which is a common structure in insurance problems&lt;/em&gt;&lt;sup id=&quot;fnref:4:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;12&lt;/a&gt;&lt;/sup&gt;, c.f. Georgescu et al.&lt;sup id=&quot;fnref:2:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;h2 id=&quot;the-infeasible-maximum-determinant-correlation-matrix-completion-problem&quot;&gt;The infeasible maximum determinant correlation matrix completion problem&lt;/h2&gt;

&lt;p&gt;In the previous section, a solution to the maximum determinant correlation matrix completion problem was assumed to exist.&lt;/p&gt;

&lt;p&gt;There are at least two practical problems with this assumption:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;A partial correlation matrix might theoretically admit a completion, but numerical round-off errors might prevent an algorithm from finding it&lt;/p&gt;

    &lt;p&gt;An example of such an ill-behaved matrix can be found in Glunt et al.&lt;sup id=&quot;fnref:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;18&lt;/a&gt;&lt;/sup&gt;, but it is a partial covariance matrix and not a partial correlation matrix…&lt;/p&gt;

    &lt;p&gt;Building a similar example using a partial correlation matrix is certainly doable, but I was not able to do so quickly, so no example is provided here.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;More typically, &lt;em&gt;the initially specified [correlations] can be inconsistent in the sense that no valid completion exists&lt;/em&gt;&lt;sup id=&quot;fnref:12:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:12&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

    &lt;p&gt;Here, the partial correlation matrix $C_1$ perfectly illustrates this point, even though it is a rather extreme example.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In both cases, simply failing to compute a solution might be unacceptable (e.g. for downstream pipelines), so that something must be done.&lt;/p&gt;

&lt;p&gt;Unfortunately, &lt;em&gt;the currently existing literature does not focus on algorithms that both complete partially specified matrices and also correct for inconsistencies&lt;/em&gt;&lt;sup id=&quot;fnref:12:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:12&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;In other words, there is no well-established formulation of what could be called &lt;em&gt;the infeasible maximum determinant correlation matrix completion problem&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;At a minimum, any solution to this new problem must involve a trade-off between&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The closeness to the initial partial correlation matrix, for example in terms of the Frobenius distance&lt;/li&gt;
  &lt;li&gt;The value of the determinant&lt;/li&gt;
&lt;/ul&gt;
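&lt;p&gt;As a toy illustration of this trade-off - not the algorithm of any of the references discussed in this post - the missing correlation of $C_1$ can be filled with -1, like the specified ones, and the resulting (invalid) matrix shrunk toward the identity matrix; the more it is shrunk, the larger its determinant becomes, but the further it moves away from the initially specified correlations:&lt;/p&gt;

```python
import numpy as np

# Partial correlation matrix C_1, with its single missing correlation
# (between assets 3 and 4) filled with -1 like all the specified ones.
C1_filled = np.full((4, 4), -1.0)
np.fill_diagonal(C1_filled, 1.0)

# Boolean mask of the initially specified off-diagonal entries of C_1.
specified = np.ones((4, 4), dtype=bool)
np.fill_diagonal(specified, False)
specified[2, 3] = specified[3, 2] = False

# Shrinkage toward the identity matrix: C(t) = (1 - t) * C1_filled + t * I,
# which is positive semi-definite for t large enough (here, t at least 2/3).
results = []
for t in (0.7, 0.8, 0.9):
    C_t = (1.0 - t) * C1_filled + t * np.eye(4)
    distance = np.sqrt(np.sum((C_t - C1_filled)[specified] ** 2))
    determinant = np.linalg.det(C_t)
    results.append((distance, determinant))
    print(f"t={t}: distance={distance:.3f}, determinant={determinant:.3f}")
```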

&lt;p&gt;Incidentally, this is exactly how the heuristic algorithm of van der Schans and Boer&lt;sup id=&quot;fnref:12:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:12&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt; works: &lt;em&gt;the specified [correlations] are adjusted as little as possible&lt;/em&gt;&lt;sup id=&quot;fnref:12:11&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:12&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt; and 
&lt;em&gt;the introduced extra (conditional) dependence between variables is as little as possible&lt;/em&gt;&lt;sup id=&quot;fnref:12:12&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:12&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;When applied to the partial correlation matrix $C_1$, van der Schans and Boer’s algorithm&lt;sup id=&quot;fnref:12:13&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:12&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt; gives&lt;/p&gt;

\[C^*_1 = 
\begin{pmatrix}
    1 &amp;amp; -0.99 &amp;amp; -0.07 &amp;amp; -0.07 \\
    -0.99 &amp;amp; 1 &amp;amp; -0.07 &amp;amp; -0.07 \\
    -0.07 &amp;amp; -0.07 &amp;amp; 1 &amp;amp; 0.98 \\
    -0.07 &amp;amp; -0.07 &amp;amp; 0.98 &amp;amp; 1 
\end{pmatrix}\]

&lt;p&gt;As expected, the completed correlation matrix $C^*_1$ is very far from the partial correlation matrix $C_1$, with only one correlation close to its initial value of -1 (-0.99).&lt;/p&gt;

&lt;p&gt;Again, this is an extreme infeasible example, but it empirically demonstrates that adjusting the initially specified correlations, even &lt;em&gt;as little as possible&lt;/em&gt;&lt;sup id=&quot;fnref:12:14&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:12&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt;, can lead to a completed correlation matrix that bears little resemblance to the initial partial correlation matrix.&lt;/p&gt;
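&lt;p&gt;This can be checked numerically: the completed matrix $C^*_1$, as displayed above, is indeed a valid correlation matrix, and its Frobenius distance to $C_1$, computed over the initially specified off-diagonal entries, is $\approx 2.63$. A quick verification with NumPy:&lt;/p&gt;

```python
import numpy as np

# Completed correlation matrix C*_1 produced by van der Schans and Boer's
# heuristic, as displayed above (rounded to two decimals).
C1_star = np.array([
    [ 1.00, -0.99, -0.07, -0.07],
    [-0.99,  1.00, -0.07, -0.07],
    [-0.07, -0.07,  1.00,  0.98],
    [-0.07, -0.07,  0.98,  1.00],
])

# C*_1 is a valid correlation matrix: symmetric, unit diagonal and
# positive semi-definite (its smallest eigenvalue is about 1e-4).
assert np.allclose(C1_star, C1_star.T)
assert np.allclose(np.diag(C1_star), 1.0)
assert np.linalg.eigvalsh(C1_star).min() > -1e-10

# Frobenius distance to C_1 over the initially specified off-diagonal
# entries, all equal to -1; the (3,4)/(4,3) entry of C_1 was unspecified.
specified = np.ones((4, 4), dtype=bool)
np.fill_diagonal(specified, False)
specified[2, 3] = specified[3, 2] = False
distance = np.sqrt(np.sum((C1_star[specified] + 1.0) ** 2))
print(round(distance, 3))  # -> 2.63
```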

&lt;h2 id=&quot;implementation-in-portfolio-optimizer&quot;&gt;Implementation in Portfolio Optimizer&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Portfolio Optimizer&lt;/strong&gt; implements two methods to complete a partial correlation matrix through the endpoint &lt;a href=&quot;https://docs.portfoliooptimizer.io/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/assets/correlation/matrix/completed&lt;/code&gt;&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;One proprietary method, which is guaranteed to find the maximum determinant completion when one exists, and to otherwise minimally adjust - in terms of Frobenius distance - the partially specified correlation matrix so that a maximum determinant completion exists&lt;/li&gt;
  &lt;li&gt;The heuristic method of van der Schans and Boer&lt;sup id=&quot;fnref:12:15&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:12&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt;, with additional tweaks to improve its numerical robustness&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For comparison with van der Schans and Boer’s algorithm&lt;sup id=&quot;fnref:12:16&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:12&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt;, the proprietary method of &lt;strong&gt;Portfolio Optimizer&lt;/strong&gt; applied to the partial correlation matrix $C_1$ gives:&lt;/p&gt;

\[C^{**}_1 = 
\begin{pmatrix}
    1 &amp;amp; -0.30 &amp;amp; -0.59 &amp;amp; -0.59 \\
    -0.30 &amp;amp; 1 &amp;amp; -0.59 &amp;amp; -0.59 \\
    -0.59 &amp;amp; -0.59 &amp;amp; 1 &amp;amp; 1 \\
    -0.59 &amp;amp; -0.59 &amp;amp; 1 &amp;amp; 1 
\end{pmatrix}\]

&lt;p&gt;Comparing the two completed matrices $C_1^{*}$ and $C_1^{**}$, it appears that the matrix $C_1^{**}$ is much&lt;sup id=&quot;fnref:16&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:16&quot; class=&quot;footnote&quot;&gt;19&lt;/a&gt;&lt;/sup&gt; closer to the partial correlation matrix $C_1$ than the matrix $C_1^{*}$, which is consistent with the expected behaviour of &lt;strong&gt;Portfolio Optimizer&lt;/strong&gt;.&lt;/p&gt;
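&lt;p&gt;This can be confirmed numerically from the two completed matrices as displayed above (because the displayed correlations are rounded to two decimals, the resulting distances differ slightly from the exact footnoted values, but their ordering is the same):&lt;/p&gt;

```python
import numpy as np

def distance_to_c1(completed):
    # Frobenius distance to the partial correlation matrix C_1, restricted
    # to its initially specified off-diagonal entries, all equal to -1;
    # the (3,4)/(4,3) entry of C_1 was unspecified.
    specified = np.ones((4, 4), dtype=bool)
    np.fill_diagonal(specified, False)
    specified[2, 3] = specified[3, 2] = False
    return np.sqrt(np.sum((completed[specified] + 1.0) ** 2))

# van der Schans and Boer's completion C*_1, rounded to two decimals
C1_star = np.array([[ 1.00, -0.99, -0.07, -0.07],
                    [-0.99,  1.00, -0.07, -0.07],
                    [-0.07, -0.07,  1.00,  0.98],
                    [-0.07, -0.07,  0.98,  1.00]])

# Portfolio Optimizer's completion C**_1, rounded to two decimals
C1_dstar = np.array([[ 1.00, -0.30, -0.59, -0.59],
                     [-0.30,  1.00, -0.59, -0.59],
                     [-0.59, -0.59,  1.00,  1.00],
                     [-0.59, -0.59,  1.00,  1.00]])

d_star, d_dstar = distance_to_c1(C1_star), distance_to_c1(C1_dstar)
print(f"distance of C*_1: {d_star:.3f}, distance of C**_1: {d_dstar:.3f}")
assert d_star > d_dstar  # C**_1 is indeed closer to C_1 than C*_1
```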

&lt;h2 id=&quot;example-of-usage---completing-partial-correlation-matrices-from-financial-institutions&quot;&gt;Example of usage - Completing partial correlation matrices from financial institutions&lt;/h2&gt;

&lt;p&gt;Major financial institutions regularly provide forecasts of future risk/return characteristics for broad asset classes over the next 5 to 20 years, called &lt;em&gt;(Long Term) Capital Market Assumptions (LTCMA)&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;In addition to the future expected volatility, the risk forecasts sometimes also include future expected correlation matrices, which is for example the case with &lt;a href=&quot;https://am.jpmorgan.com/lu/en/asset-management/institutional/insights/portfolio-insights/ltcma/interactive-assumptions-matrices/&quot;&gt;J.P. Morgan&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Unfortunately, these correlation matrices are typically partially specified.&lt;/p&gt;

&lt;p&gt;In addition, even if they were fully specified, &lt;a href=&quot;/blog/capital-market-assumptions-combining-institutions-forecasts-for-improved-accuracy/&quot;&gt;combining capital market assumptions from several financial institutions&lt;/a&gt; would render them partial, because institutions do not all cover the same asset classes…&lt;/p&gt;

&lt;p&gt;So, as an example of usage, I propose to compute the maximum determinant completion of the partial correlation matrix provided by &lt;a href=&quot;https://www.blackrock.com/institutions/en-global/institutional-insights/thought-leadership/capital-market-assumptions&quot;&gt;Blackrock&lt;/a&gt; 
as part of their November 2025 5-year capital market assumptions and represented in Figure 6.&lt;/p&gt;

&lt;figure&gt;
  &lt;a href=&quot;/assets/images/blog/correlation-matrix-completion-blackrock.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/correlation-matrix-completion-blackrock-small.png&quot; alt=&quot;Figure 6. Five-year expected future partial correlations between major asset classes, NZD currency, 13th November 2025. Source: Blackrock.&quot; /&gt;&lt;/a&gt;
  &lt;figcaption&gt;Figure 6. Five-year expected future partial correlations between major asset classes, NZD currency, 13th November 2025. Source: Blackrock.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;The resulting maximum determinant completed correlation matrix is displayed in Figure 7.&lt;/p&gt;

&lt;figure&gt;
  &lt;a href=&quot;/assets/images/blog/correlation-matrix-completion-blackrock-completed.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/correlation-matrix-completion-blackrock-completed-small.png&quot; alt=&quot;Figure 7. Five-year expected future Max-Det completed correlations between major asset classes, NZD currency, 13th November 2025.&quot; /&gt;&lt;/a&gt;
  &lt;figcaption&gt;Figure 7. Five-year expected future Max-Det completed correlations between major asset classes, NZD currency, 13th November 2025.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;I would like to mention that such an example of usage was inspired by &lt;a href=&quot;https://www.linkedin.com/posts/peterurbani_similarly-to-jp-morgan-httpsbitly3xbfmhb-activity-7397026488002224130-hRBZ?utm_source=share&amp;amp;utm_medium=member_desktop&amp;amp;rcm=ACoAADg4MJUBESNabpuXkl-TQnxmTK_DzOcFu0g&quot;&gt;a LinkedIn post from Peter Urbani&lt;/a&gt;, 
who proposes to complete Blackrock’s partial correlation matrix into the correlation matrix displayed in Figure 8 thanks to a 2-factor model built from the fully specified global equities/global bonds correlations.&lt;/p&gt;

&lt;figure&gt;
  &lt;a href=&quot;/assets/images/blog/correlation-matrix-completion-blackrock-completed-urbani.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/correlation-matrix-completion-blackrock-completed-urbani-small.png&quot; alt=&quot;Figure 8. Five-year expected future 2-factor model completed correlations between major asset classes, NZD currency, 13th November 2025. Source: Peter Urbani.&quot; /&gt;&lt;/a&gt;
  &lt;figcaption&gt;Figure 8. Five-year expected future 2-factor model completed correlations between major asset classes, NZD currency, 13th November 2025. Source: Peter Urbani.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Comparing the “Max-Det” completed correlation matrix with Peter Urbani’s “2-factor model” completed correlation matrix, it turns out that both matrices are perfectly identical&lt;sup id=&quot;fnref:17&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:17&quot; class=&quot;footnote&quot;&gt;20&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;As a closing (fun) remark, Olvera Astivia&lt;sup id=&quot;fnref:4:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;12&lt;/a&gt;&lt;/sup&gt; highlights that &lt;em&gt;having half of the entries in a correlation matrix missing [can be] considered a rather extreme condition&lt;/em&gt;&lt;sup id=&quot;fnref:4:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;12&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;What could then be said about Blackrock’s initial partial correlation matrix, in which around 82% of the entries are missing?!&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;With this first blog post of 2026, my hope is that you have added to your quantitative toolbox a useful methodology &lt;em&gt;to obtain, in a mathematically principled way, values implied in correlational structure of the data, even if said data has not (or cannot) be obtained&lt;/em&gt;&lt;sup id=&quot;fnref:4:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;12&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;In any case, for more mathematics of correlation matrices, feel free to &lt;a href=&quot;https://www.linkedin.com/in/roman-rubsamen/&quot;&gt;connect with me on LinkedIn&lt;/a&gt; or to &lt;a href=&quot;https://twitter.com/portfoliooptim&quot;&gt;follow me on Twitter&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;–&lt;/p&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;&lt;a href=&quot;https://academic.oup.com/imajna/article-abstract/22/3/329/708688&quot;&gt;Nicholas J. Higham, Computing the Nearest Correlation Matrix—A Problem from Finance, IMA J. Numer. Anal. 22, 329–343, 2002.&lt;/a&gt;. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:18&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;In terms of &lt;a href=&quot;https://en.wikipedia.org/wiki/Matrix_norm#Frobenius_norm&quot;&gt;the Frobenius distance&lt;/a&gt;. &lt;a href=&quot;#fnref:18&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:6&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;&lt;a href=&quot;https://arxiv.org/abs/2111.12640&quot;&gt;Olaf Dreyer, Horst Kohler, Thomas Streuer, Completing correlation matrices, arXiv&lt;/a&gt;. &lt;a href=&quot;#fnref:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:6:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:6:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:6:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:19&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;That is, belonging to the set $\mathcal{E}^n$. &lt;a href=&quot;#fnref:19&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:8&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;&lt;a href=&quot;https://doi.org/10.1016/0024-3795(84)90207-6&quot;&gt;R. Grone, C.R. Johnson, E. Sa, H. Wolkowicz, Positive definite completions of partial Hermitian matrices, Linear Algebra Appl. 58 (1984) 109–124&lt;/a&gt;. &lt;a href=&quot;#fnref:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:8:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:8:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:8:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:8:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;&lt;a href=&quot;https://royalsocietypublishing.org/rsos/article/5/3/172348/87513/Explicit-solutions-to-correlation-matrix&quot;&gt;Georgescu DI, Higham NJ, Peters GW. 2018 Explicit solutions to correlation matrix completion problems, with an application to risk management and insurance&lt;/a&gt;. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:2:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;8&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:9&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;&lt;a href=&quot;https://doi.org/10.1007/BF02166030&quot;&gt;Fiedler, M. Matrix Inequalities. Numer. Math. 9, 109–119 (1966)&lt;/a&gt;. &lt;a href=&quot;#fnref:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:9:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:9:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:9:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;&lt;a href=&quot;https://doi.org/10.1016/j.laa.2008.04.020&quot;&gt;Ronald L. Smith, The positive definite completion problem revisited, Linear Algebra and its Applications, Volume 429, Issue 7, 2008, Pages 1442-1452&lt;/a&gt;. &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:3:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:10&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;One interesting feature of that result is that the existence of a correlation matrix completion - which seems quite algebraic in nature - is actually equivalent to a “visual” condition. &lt;a href=&quot;#fnref:10&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:20&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;When one exists. &lt;a href=&quot;#fnref:20&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:5&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;&lt;a href=&quot;https://arxiv.org/abs/2112.03758&quot;&gt;Olaf Dreyer, Matrix completion and semidefinite matrices, arXiv&lt;/a&gt;. &lt;a href=&quot;#fnref:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:5:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:4&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;&lt;a href=&quot;https://doi.org/10.1080/15366367.2020.1827883&quot;&gt;Oscar L. Olvera Astivia (2021) A Note on the General Solution to Completing Partially Specified Correlation Matrices, Measurement: Interdisciplinary Research and Perspectives, 19:2, 115-123&lt;/a&gt;. &lt;a href=&quot;#fnref:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:4:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:4:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:4:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:4:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:12&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3748416&quot;&gt;van der Schans, Martin and Boer, Alex, A Heuristic for Completing Covariance And Correlation Matrices (March 14, 2013). Technical Working Paper 2014-01 November 2014&lt;/a&gt;. &lt;a href=&quot;#fnref:12&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:12:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:12:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:12:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:12:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:12:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:12:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:12:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;8&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:12:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;9&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:12:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;10&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:12:10&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;11&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:12:11&quot; class=&quot;reversefootnote&quot; 
role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;12&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:12:12&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;13&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:12:13&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;14&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:12:14&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;15&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:12:15&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;16&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:12:16&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;17&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:15&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The maximum determinant correlation matrix completion &lt;em&gt;maximizes the product of distances to the defining hyperplanes&lt;/em&gt;&lt;sup id=&quot;fnref:2:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:15&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:11&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://epubs.siam.org/doi/10.1137/S0895479896303430&quot;&gt;Vandenberghe, Lieven and Boyd, Stephen and Wu, Shao-Po, Determinant Maximization with Linear Matrix Inequality Constraints, SIAM Journal on Matrix Analysis and Applications, Volume 19, Number 2, Pages 499-533&lt;/a&gt;. &lt;a href=&quot;#fnref:11&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:13&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://link.springer.com/article/10.1007/s10589-020-00166-2&quot;&gt;Nakagaki, T., Fukuda, M., Kim, S. et al. A dual spectral projected gradient method for log-determinant semidefinite problems. Comput Optim Appl 76, 33–68 (2020)&lt;/a&gt;. &lt;a href=&quot;#fnref:13&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:14&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Or close to zero in case the nearest correlation matrix to the resulting approximate correlation matrix is computed. &lt;a href=&quot;#fnref:14&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:7&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;&lt;a href=&quot;https://doi.org/10.1016/S0024-3795(98)10211-2&quot;&gt;W. Glunt, T.L. Hayden, Charles R. Johnson, P. Tarazaga, Positive definite completions and determinant maximization, Linear Algebra and its Applications, Volume 288, 1999, Pages 1-10&lt;/a&gt;. &lt;a href=&quot;#fnref:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:16&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The Frobenius distance between $C_1$ and $C^*_1$ is $\approx 2.630$, while the Frobenius distance between $C_1$ and $C^{**}_1$ is $\approx 1.575$. &lt;a href=&quot;#fnref:16&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:17&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Incidentally, this allows one to conclude - due to the entropy maximization property of the maximum determinant correlation matrix completion under a multivariate normal model - that the 2-factor model used by Peter Urbani is a vanilla multivariate normal model. &lt;a href=&quot;#fnref:17&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content><author><name>Roman R.</name></author><category term="correlation matrix" /><summary type="html">The previous post of this series on mathematical problems related to correlation matrices introduced the nearest correlation matrix problem1, which consists in determining the closest2 valid correlation matrix to an approximate correlation matrix In this blog post, I will now describe the correlation matrix completion problem, which consist in filling in the missing coefficients of a partially specified correlation matrix in order to produce a valid correlation matrix. As noted in Dreyer et al.3, such matrices often arise in financial applications when the number of stochastic variables becomes large or when several smaller models are combined in a larger model3. After a couple of reminders on correlation matrices, I will detail the mathematical formulation of the correlation matrix completion problem, discuss its exact and approximate solutions and I will finally illustrate how this problem naturally appears when working with capital market assumptions from financial institutions. Mathematical preliminaries Let $n$ be a number of assets. Correlation matrices Let $C \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ be a square matrix. 
$C$ is a correlation matrix if and only if $C$ is symmetric: $C {}^t = C$ $C$ is unit diagonal: $C_{i,i} = 1$, $i=1..n$ $C$ is positive semi-definite, that is, $x {}^t C x \geqslant 0, \forall x \in \mathbb{R}^n $ The sets of correlation matrices Let be the following convex sets: $\mathcal{S}^n_{+} = \{ X \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ such that $X {}^t = X$ and $X$ is positive semi-definite $\}$ $\mathcal{S}^n_{++} = \{ X \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ such that $X {}^t = X$ and $X$ is positive definite $\}$ $\mathcal{E}^n = \{ X \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ such that $X {}^t = X$ and $x_{ii} = 1, i = 1,…,n \}$ Then: The set of correlation matrices is the convex compact set $\mathcal{S}^n_{+} \cap \mathcal{E}^n$. The set of invertible correlation matrices is the open bounded convex set $\mathcal{S}^n_{++} \cap \mathcal{E}^n$. As a geometrical side note, the set of vectorized correlation matrices defines a subset of the unit hypercube in $\mathbb{R}^n$ called the elliptope, c.f. for example the website Convex Optimization. Partial correlation matrices completion A partial correlation matrix $C \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ is a symmetric unit diagonal matrix4 whose coefficients $c_{ij} = c_{ji}$ are not all specified. A partial correlation matrix is said to be partial positive (semi-)definite when its specified principal minors are positive (semi-)definite. A positive semi-definite completion - or simply a completion - of a partial correlation matrix $C \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ is a correlation matrix $C^* \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ such that $c_{ij}^{*} = c_{ij}$ whenever the coefficient $c_{ij}$ is specified in $C$. 
A positive definite completion of a partial correlation matrix $C \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ is a positive definite correlation matrix $C^* \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ such that $c_{ij}^{*} = c_{ij}$ whenever the coefficient $c_{ij}$ is specified in $C$. Undirected graph associated to a correlation matrix Let $C \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ be a partial correlation matrix. It is possible5 to associate an undirected graph $G = \left( V, E \right)$ to $C$, defined as follows: $V = \{ 1, 2, …, n \}$ is the set of vertices $E$ is the set of edges, with $(i,j) \in E$ whenever the coefficient $c_{ij}$ is specified, with $i \ne j$. The graph $G$ is then said to be chordal if every cycle of length greater than or equal to 4 has a chord, which is an edge that is not part of the cycle but connects two vertices of the cycle6. As a reminder, a cycle in G is a sequence of $k \geq 3$ pairwise distinct vertices $\left( v_1, …, v_s \right)$ such that $\left( v_1, v_2 \right)$, $\left( v_2, v_3 \right)$, …, $\left( v_{k-1} v_k \right)$, $\left( v_k v_1 \right)$ $\in E$, with $k$ called the length of the cycle. The correlation matrix completion problem Problem formulation Let be: $C \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ a partial correlation matrix. $\mathcal{N}$ the index set of the specified off-diagonal elements of $C$. $\mathcal{U}^n = \{ X \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ such that $x_{ij}=c_{ij}$ for $(i,j) \in \mathcal{N} \}$ Then: The correlation matrix completion problem is the problem of finding a correlation matrix $C^* \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ that completes the matrix $C$, that is, finding $C^* \in \mathcal{S}^n_{+} \cap \mathcal{U}^n$. 
The positive definite correlation matrix completion problem is the problem of finding a positive definite correlation matrix $C^* \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ that completes the matrix $C$, that is, finding $C^* \in \mathcal{S}^n_{++} \cap \mathcal{U}^n$. Existence of solutions Contrary to the nearest correlation matrix problem, the correlation matrix completion problem does not always admit a solution. Indeed, an already necessary condition for a completion to exist is for the partial correlation matrix to be partial positive semi-definite5. So, for example, the following partial correlation matrix does not admit any completion: \[C_1 = \begin{pmatrix} 1 &amp;amp; -1 &amp;amp; -1 &amp;amp; -1 \\ -1 &amp;amp; 1 &amp;amp; -1 &amp;amp; -1 \\ -1 &amp;amp; -1 &amp;amp; 1 &amp;amp; . \\ -1 &amp;amp; -1 &amp;amp; . &amp;amp; 1 \end{pmatrix}\] Beyond this necessary condition, Fiedler7, Grone et al.5 and Smith8 all establish the following result9: A partial positive semi-definite correlation matrix is completable regardless of the values of the specified correlations if and only if the undirected graph associated to these specified correlations is chordal. To be noted that when that graph is not chordal, nothing can be said in general because the existence of a completion then depends on the exact values of the specified correlations. As an illustration: Let $C_2 \in \mathcal{M} \left( \mathbb{R}^{4 \times 4} \right)$ be the following partial positive definite correlation matrix: \[C_2 = \begin{pmatrix} 1 &amp;amp; 1 &amp;amp; . &amp;amp; 1 \\ 1 &amp;amp; 1 &amp;amp; 1 &amp;amp; . \\ . &amp;amp; 1 &amp;amp; 1 &amp;amp; . \\ 1 &amp;amp; . &amp;amp; . &amp;amp; 1 \end{pmatrix}\] The undirected graph associated to that partial correlation matrix is the graph with the set of vertives $V = \{ 1, 2, 3, 4 \}$ and the set of edges $G = \{ (1,2), (1,4), (2,3) \}$, depicted in Figure 1. Figure 1. 
Undirected graph associated to the partial correlation matrix C2. From Figure 1, that graph is chordal, because there are no cycles of length &amp;gt;= 4. Consequently, the partial correlation matrix $C_2$ admit a positive definite completion and the existence of that completion does actually even not depend on the values of the specified correlations (here, all 1s). Let now $C_3 \in \mathcal{M} \left( \mathbb{R}^{4 \times 4} \right)$ be the following partial positive semi-definite correlation matrix: \[C_3 = \begin{pmatrix} 1 &amp;amp; 1 &amp;amp; . &amp;amp; 0 \\ 1 &amp;amp; 1 &amp;amp; 1 &amp;amp; . \\ . &amp;amp; 1 &amp;amp; 1 &amp;amp; 1 \\ 0 &amp;amp; . &amp;amp; 1 &amp;amp; 1 \end{pmatrix}\] The undirected graph associated to that partial correlation matrix is the graph with the set of vertives $V = \{ 1, 2, 3, 4 \}$ and the set of edges $G = \{ (1,2), (1,4), (2,3), (3,4) \}$, depicted in Figure 2. Figure 2. Undirected graph associated to the partial correlation matrix C3. From Figure 2, that graph is not chordal, because there is only one cycle of length 4 and that cycle does not have any chord. Consequently, the partial correlation matrix $C_3$ might or might not admit a completion, nothing can be said at this stage. For the interested reader, a couple of other theoretical results can be found in Fiedler7, which also notes that the general [completion] problem […] seems to be difficult7. Unicity of solutions Again contrary to the nearest correlation matrix problem, the correlation matrix completion problem does not generally admit a unique solution when one exists. Indeed, Grone et al.5 establishes that the set of all positive semi-definite completions of a partial correlation matrix is in general a convex compact set and not a singleton. This result can be illustrated with one and two missing correlations: One missing correlation The partial correlation matrix $C_4 = \begin{pmatrix} 1 &amp;amp; . \\ . 
&amp;amp; 1 \end{pmatrix} $ has one missing correlation. By introducing the variable $x$ representing that missing correlation, a completion is a valid completion if and only if it has a positive or null determinant, that is, $\det(x) = 1 - x^2 \geq 0$. That condition being equivalent to the condition $x \in [-1,1]$, all correlation matrices of the form $ C^*_4(x) = \begin{pmatrix} 1 &amp;amp; x \\ x &amp;amp; 1 \end{pmatrix} $ with $x \in [-1,1]$ are completions of $C_4$. Two missing correlations The partial correlation matrix $C_5 \in \mathcal{M} \left( \mathbb{R}^{5 \times 5} \right)$ represented in Figure 3 has two missing correlations. Figure 3. Partial correlation matrix with 2 missing correlations. Source: Georgescu et al. Because all the specified sub-matrices of $C_5$ are positive semi-definite, a completion is again a valid completion if and only if it has a positive or null determinant. By introducing the variables $x$ and $y$ representing the two missing correlations, the determinant of $C_5$ can be factored as $ \det(x,y) \approx -0.18 \left( 0.67 x^2 + 0.75 y^2 - xy - 0.32 x - 0.24 y + 0.17 \right)$. Non-negativity of that determinant is then equivalent to the condition \[0.67 x^2 + 0.75 y^2 - xy - 0.32 x - 0.24 y + 0.17 \leq 0\] , which defines the 2-dimensional ellipse (and its interior) depicted in Figure 4. Figure 4. Feasible region for Georgescu et al.'s partial correlation matrix completions. All correlation matrices $C_5^{*}(x,y) \in \mathcal{M} \left( \mathbb{R}^{5 \times 5} \right)$ with the same correlations as $C_5$, a correlation $c^{*}_{34}$ $=$ $x$ and a correlation $c^{*}_{35}$ $=$ $y$ - with $x,y$ satisfying the above relationship - are thus completions of $C_5$. The existence of infinitely many completions10 naturally leads to trying to find a best-estimate completion in some sense6, which does exist and is called the maximum determinant completion6. 
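The one-missing-correlation case above can be checked numerically. A minimal sketch (the numerical tolerance is an assumption):

```python
import numpy as np

def is_valid_completion(x):
    # A completion C*_4(x) is a valid correlation matrix if and only if
    # it is positive semi-definite, i.e. det = 1 - x^2 is non-negative.
    C = np.array([[1.0, x], [x, 1.0]])
    return bool(np.linalg.eigvalsh(C).min() > -1e-12)

# Any x in [-1, 1] yields a valid completion; values outside do not.
print([is_valid_completion(x) for x in (-1.0, 0.3, 1.0, 1.5)])
```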
The maximum determinant correlation matrix completion problem Fiedler7 originally demonstrated that if a positive definite partial correlation matrix admits a completion, then, there is a unique matrix in the (nonempty) class of all positive definite completions […] that has maximum determinant8. More recently, by introducing a generalized determinant that gives the determinant of the nonsingular part of [a] matrix11, Dreyer11 established a similar result for positive semi-definite partial correlation matrices. That completion, called the maximum determinant completion - or the Max-Det completion - has several interesting theoretical properties, which make it an ideal candidate for guaranteeing the unicity of the completion of a correlation matrix. Problem formulation Let: $C \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ be a partial correlation matrix. $\mathcal{N}$ be the index set of the specified off-diagonal elements of $C$. $\mathcal{U}^n = \{ X \in \mathcal{M} \left( \mathbb{R}^{n \times n} \right)$ such that $x_{ij}=c_{ij}$ for $(i,j) \in \mathcal{N} \}$ The maximum determinant correlation matrix completion problem can be cast as a generalization of a semidefinite programming problem: \[C^* = \operatorname{argmax} \log ( \det (X) ) \text{ s.t. } X \in \mathcal{S}^n_{+} \cap \mathcal{E}^n \cap \mathcal{U}^n\] Assuming that a solution exists (c.f. the previous section), Grone et al.5 and Olvera Astivia12 show, using standard results from mathematical optimization theory, that it is necessarily unique. 
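For intuition only, the maximum determinant completion can be computed by brute force in a toy case. The following sketch (a hypothetical 3x3 example with one missing correlation, not the method used by Portfolio Optimizer) maximizes the log-determinant over the missing entry; for this pattern, the optimum is known to be the product of the two specified correlations, i.e. a zero partial correlation:

```python
import numpy as np
from scipy.optimize import minimize_scalar

a, b = 0.6, 0.3  # specified correlations c12 and c23; c13 is missing

def neg_log_det(x):
    # Minimize -log det(X) over the missing entry, with a large
    # penalty outside the positive definite region.
    C = np.array([[1.0, a, x], [a, 1.0, b], [x, b, 1.0]])
    if np.linalg.eigvalsh(C).min() > 1e-12:
        return -np.log(np.linalg.det(C))
    return 1e6

res = minimize_scalar(neg_log_det, bounds=(-1.0, 1.0), method="bounded")
print(round(res.x, 3))  # the max-det completion sets c13 = a * b = 0.18
```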
Mathematical properties of the solution Georgescu et al.6 summarizes the main properties of the maximum determinant correlation matrix completion: For the multivariate normal model, it maximizes the entropy of the distribution described by the matrix van der Schans and Boer13 comments on this property as follows: Since the already specified [correlations] imply a dependence between variables between which no dependence is specified and dependence reduces the amount of uncertainty in a system, the intuitive interpretation of entropy maximization is not introducing more dependence than is already implied by the already specified [correlations]. Again for the multivariate normal model, it maximizes the likelihood of the correlation matrix It corresponds to the analytic centre of the feasible region described by the positive semi-definiteness constraints6 In other words, for a given partial correlation matrix, its maximum determinant completion lies as “deep” as possible14 inside the set of all its positive definite completions. Computation of the solution The original algorithmic approach to solving determinant maximization problems with matrix constraints has been to use interior point methods15. More recently, dual projected gradient methods have been developed that are better able to scale with the dimensionality of the problem16. Nevertheless, the drawbacks of these algorithms, from an application point of view, are that they are difficult to implement […], do not always converge if inconsistent starting [correlations] are specified and, finally, global optimization is too slow and memory consuming for large matrices13. More on “inconsistent starting correlations” later. For these reasons, it is tempting to [simply] set the missing [correlations] to zero3, but this approach has several shortcomings3, among which the fact that forcing unspecified correlations to zero17 might not be ideal for financial applications where most assets are correlated to each other. 
To illustrate this point, Figure 5 represents a partial correlation matrix with a block of missing correlations. Figure 5. Partial correlation matrix with a block of missing correlations. Source: Georgescu et al. The maximum determinant completion of the block of missing correlations is6 \[E_1 = \begin{pmatrix} 0.1000 &amp;amp; 0.1500 &amp;amp; 0.0500 &amp;amp; 0.0750 \\ 0.2400 &amp;amp; 0.3600 &amp;amp; 0.1200 &amp;amp; 0.1800 \\ 0.2200 &amp;amp; 0.3300 &amp;amp; 0.1100 &amp;amp; 0.1650 \\ 0.2600 &amp;amp; 0.3900 &amp;amp; 0.1300 &amp;amp; 0.1950 \\ 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 \end{pmatrix}\] , while the completion of the same block of missing correlations obtained after setting them to zero and computing the nearest correlation matrix to the resulting full approximate correlation matrix is6 \[E_2 = \begin{pmatrix} 0.0022 &amp;amp; 0.0084 &amp;amp; 0.0004 &amp;amp; 0.0035 \\ 0.0003 &amp;amp; 0.0011 &amp;amp; 0.0001 &amp;amp; 0.0005 \\ 0.0025 &amp;amp; 0.0098 &amp;amp; 0.0005 &amp;amp; 0.0040 \\ 0.0042 &amp;amp; 0.0164 &amp;amp; 0.0008 &amp;amp; 0.0067 \\ 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 \end{pmatrix}\] Comparing the two sub-matrices $E_1$ and $E_2$, it is clear that the completed correlations of $E_2$ are much closer to zero than those of $E_1$, thus imposing an additional soft constraint of “closeness to zero” to the correlation matrix completion problem. Reverting to simple heuristics for the correlation matrix completion problem might thus be potentially dangerous, depending on the context. 
Fortunately, van der Schans and Boer13 propose a heuristic which does not suffer from these shortcomings and that: Is fast, which makes it suitable for applications in which the computation time is important13 Tries not to introduce more dependence between variables than is implied by the initially specified [correlations]13, contrary to the “setting missing correlations to zero” heuristic Additionally corrects for inconsistencies in the already specified [correlations]13, but again, more on this later Empirically produces completions with an average correlation difference with the maxdet completion [that is] reasonable13 A drawback of this heuristic, though, is that it depends on the ordering of the rows and columns of the [correlation] matrix, i.e. the heuristic will yield a different result if first rows and columns of [the matrix] are interchanged before starting the procedure13, which might or might not be acceptable in practice. As a side note, explicit solutions of the maximum determinant correlation matrix completion problem are known in specific cases, for example for correlation matrices that follow an L-shaped, block diagonal pattern which is a common structure in insurance problems12, c.f. Georgescu et al.6 The infeasible maximum determinant correlation matrix completion problem In the previous section, a solution to the maximum determinant correlation matrix completion problem was assumed to exist. There are at least two practical problems with this assumption: A partial correlation matrix might theoretically admit a completion, but numerical round-off errors might prevent an algorithm from finding it An example of such an ill-behaved matrix can be found in Glunt et al.18, but it is a partial covariance matrix and not a partial correlation matrix… Building a similar example using a partial correlation matrix is certainly doable, but I failed to do so quickly, so no example here. 
More typically, the initially specified [correlations] can be inconsistent in the sense that no valid completion exists13 Here, the partial correlation matrix $C_1$ perfectly illustrates this point, even though it is a rather extreme example. In both cases, simply failing to compute a solution might be unacceptable (e.g. for downstream pipelines), so something must be done. Unfortunately, the currently existing literature does not focus on algorithms that both complete partially specified matrices and also correct for inconsistencies13. In other words, there is no well-established formulation of what could be called the infeasible maximum determinant correlation matrix completion problem. At minimum, what can be said is that any solution to this new problem must involve a trade-off between the closeness to the initial partial correlation matrix (for example in terms of the Frobenius distance) and the value of the determinant. Incidentally, this is exactly how the heuristic algorithm of van der Schans and Boer13 works: the specified [correlations] are adjusted as little as possible13 and the introduced extra (conditional) dependence between variables is as little as possible13. When applied to the partial correlation matrix $C_1$, van der Schans and Boer’s algorithm13 gives \[C^*_1 = \begin{pmatrix} 1 &amp;amp; -0.99 &amp;amp; -0.07 &amp;amp; -0.07 \\ -0.99 &amp;amp; 1 &amp;amp; -0.07 &amp;amp; -0.07 \\ -0.07 &amp;amp; -0.07 &amp;amp; 1 &amp;amp; 0.98 \\ -0.07 &amp;amp; -0.07 &amp;amp; 0.98 &amp;amp; 1 \end{pmatrix}\] As expected, the completed correlation matrix $C^*_1$ is very far from the partial correlation matrix $C_1$, with only one correlation close to its initial value of -1 (-0.99). Again, this is an extreme infeasible example, but it empirically demonstrates that adjusting the initially specified correlations, even as little as possible13, can lead to a completed correlation matrix that does not resemble the initial partial correlation matrix at all. 
Implementation in Portfolio Optimizer Portfolio Optimizer implements two methods to complete a partial correlation matrix through the endpoint /assets/correlation/matrix/completed: One proprietary method that is guaranteed to find the maximum determinant completion if it exists, or otherwise to minimally adjust - in terms of Frobenius distance - the partially specified correlation matrix so that a maximum determinant completion exists The heuristic method of van der Schans and Boer13, with additional tweaks to improve its numerical robustness For comparison with van der Schans and Boer’s algorithm13, the proprietary method of Portfolio Optimizer applied to the partial correlation matrix $C_1$ gives: \[C^{**}_1 = \begin{pmatrix} 1 &amp;amp; -0.30 &amp;amp; -0.59 &amp;amp; -0.59 \\ -0.30 &amp;amp; 1 &amp;amp; -0.59 &amp;amp; -0.59 \\ -0.59 &amp;amp; -0.59 &amp;amp; 1 &amp;amp; 1 \\ -0.59 &amp;amp; -0.59 &amp;amp; 1 &amp;amp; 1 \end{pmatrix}\] Comparing the two completed matrices $C_1^{*}$ and $C_1^{**}$, it appears that the matrix $C_1^{**}$ is much19 closer to the partial correlation matrix $C_1$ than the matrix $C_1^{*}$, which is consistent with the expected behaviour of Portfolio Optimizer. Example of usage - Completing partial correlation matrices from financial institutions Major financial institutions regularly provide forecasts of future risk/return characteristics for broad asset classes over the next 5 to 20 years, called (Long Term) Capital Market Assumptions (LTCMA). In addition to the future expected volatility, the risk forecasts sometimes also include future expected correlation matrices, which is for example the case with J.P. Morgan. Unfortunately, these correlation matrices are typically partially specified. 
In addition, even if they were fully specified, combining capital market assumptions from several financial institutions would render them partial, because not all institutions cover the same asset classes… So, as an example of usage, I propose to compute the maximum determinant completion of the partial correlation matrix provided by Blackrock as part of their November 2025 5-year capital market assumptions and represented in Figure 6. Figure 6. Five-year expected future partial correlations between major asset classes, NZD currency, 13th November 2025. Source: Blackrock. The resulting maximum determinant completed correlation matrix is displayed in Figure 7. Figure 7. Five-year expected future Max-Det completed correlations between major asset classes, NZD currency, 13th November 2025. I would like to mention that such an example of usage was inspired by a LinkedIn post from Peter Urbani, who proposes to complete Blackrock’s partial correlation matrix into the correlation matrix displayed in Figure 8 thanks to a 2-factor model built from the fully specified global equities/global bonds correlations. Figure 8. Five-year expected future 2-factor model completed correlations between major asset classes, NZD currency, 13th November 2025. Source: Peter Urbani. Comparing the “Max-Det” completed correlation matrix with Peter Urbani’s “2-factor model” completed correlation matrix, the two matrices are perfectly identical20. As a closing (fun) remark, Olvera Astivia12 highlights that having half of the entries in a correlation matrix missing [can be] considered a rather extreme condition12. What could then be said about Blackrock’s initial partial correlation matrix, in which around 82% of the entries are missing?! 
Conclusion With this first blog post of 2026, my hope is that you have added to your quantitative toolbox a useful methodology to obtain, in a mathematically principled way, values implied in correlational structure of the data, even if said data has not (or cannot) be obtained12. In any case, for more mathematics of correlation matrices, feel free to connect with me on LinkedIn or to follow me on Twitter. – Nicholas J. Higham, Computing the Nearest Correlation Matrix—A Problem from Finance, IMA J. Numer. Anal. 22, 329–343, 2002. &amp;#8617; In terms of the Frobenius distance. &amp;#8617; Olaf Dreyer, Horst Kohler, Thomas Streuer, Completing correlation matrices, arXiv. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 That is, belonging to the set $\mathcal{E}^n$. &amp;#8617; R. Grone, C.R. Johnson, E. Sa, H. Wolkowicz, Positive definite completions of partial Hermitian matrices, Linear Algebra Appl. 58 (1984) 109–124. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 Georgescu DI, Higham NJ, Peters GW. 2018 Explicit solutions to correlation matrix completion problems, with an application to risk management and insurance. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 &amp;#8617;6 &amp;#8617;7 &amp;#8617;8 Fiedler, M. Matrix Inequalities. Numer. Math. 9, 109–119 (1966). &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 Ronald L. Smith, The positive definite completion problem revisited, Linear Algebra and its Applications, Volume 429, Issue 7, 2008, Pages 1442-1452. &amp;#8617; &amp;#8617;2 One interesting feature of that result is that the existence of a correlation matrix completion - which seems quite algebraic in nature - is actually equivalent to a “visual” condition. &amp;#8617; When one exists. &amp;#8617; Olaf Dreyer, Matrix completion and semidefinite matrices, arXiv. &amp;#8617; &amp;#8617;2 Oscar L. 
Olvera Astivia (2021) A Note on the General Solution to Completing Partially Specified Correlation Matrices, Measurement: Interdisciplinary Research and Perspectives, 19:2, 115-123. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 See van der Schans, Martin and Boer, Alex, A Heuristic for Completing Covariance And Correlation Matrices (March 14, 2013). Technical Working Paper 2014-01 November 2014. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 &amp;#8617;6 &amp;#8617;7 &amp;#8617;8 &amp;#8617;9 &amp;#8617;10 &amp;#8617;11 &amp;#8617;12 &amp;#8617;13 &amp;#8617;14 &amp;#8617;15 &amp;#8617;16 &amp;#8617;17 The maximum determinant correlation matrix completion maximizes the product of distances to the defining hyperplanes6. &amp;#8617; See Vandenberghe, Lieven and Boyd, Stephen and Wu, Shao-Po, Determinant Maximization with Linear Matrix Inequality Constraints, SIAM Journal on Matrix Analysis and Applications, Volume 19, Number 2, Pages 499-533. &amp;#8617; See Nakagaki, T., Fukuda, M., Kim, S. et al. A dual spectral projected gradient method for log-determinant semidefinite problems. Comput Optim Appl 76, 33–68 (2020). &amp;#8617; Or close to zero in case the nearest correlation matrix to the resulting approximate correlation matrix is computed. &amp;#8617; W. Glunt, T.L. Hayden, Charles R. Johnson, P. Tarazaga, Positive definite completions and determinant maximization, Linear Algebra and its Applications, Volume 288, 1999, Pages 1-10. &amp;#8617; The Frobenius distance between $C_1$ and $C^*_1$ is $\approx 2.630$, while the Frobenius distance between $C_1$ and $C^{**}_1$ is $\approx 1.575$. &amp;#8617; Incidentally, this allows one to conclude - due to the entropy maximization property of the maximum determinant correlation matrix completion under a multivariate normal model - that the 2-factor model used by Peter Urbani is a vanilla multivariate normal model. 
&amp;#8617;</summary></entry><entry><title type="html">Covariance Matrix Forecasting: Average Oracle Method</title><link href="https://portfoliooptimizer.io/blog/covariance-matrix-forecasting-average-oracle-method/" rel="alternate" type="text/html" title="Covariance Matrix Forecasting: Average Oracle Method" /><published>2025-12-10T00:00:00-06:00</published><updated>2025-12-10T00:00:00-06:00</updated><id>https://portfoliooptimizer.io/blog/covariance-matrix-forecasting-average-oracle-method</id><content type="html" xml:base="https://portfoliooptimizer.io/blog/covariance-matrix-forecasting-average-oracle-method/">&lt;p&gt;Continuing this series on covariance matrix forecasting (c.f. &lt;a href=&quot;/blog/from-volatility-forecasting-to-covariance-matrix-forecasting-the-return-of-simple-and-exponentially-weighted-moving-average-models&quot;&gt;here&lt;/a&gt; and &lt;a href=&quot;/blog/covariance-matrix-forecasting-iterated-exponentially-weighted-moving-average-model&quot;&gt;there&lt;/a&gt; for the previous posts), 
I will now describe a relatively recent&lt;sup id=&quot;fnref:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; &lt;em&gt;data-driven, model-free, way to [forecast] covariance [and correlation] matrices of time-varying systems&lt;/em&gt;&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; rooted in &lt;a href=&quot;https://en.wikipedia.org/wiki/Random_matrix&quot;&gt;random matrix theory&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This method - introduced in Bongiorno et al.&lt;sup id=&quot;fnref:1:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; and called &lt;em&gt;Average Oracle&lt;/em&gt; - consists in replacing the eigenvalues of a (noisy) estimate of a time-varying covariance matrix by time-independent 
&lt;em&gt;eigenvalues that encode the average influence of the future on present eigenvalues&lt;/em&gt;&lt;sup id=&quot;fnref:1:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;In this blog post, I will describe that method and, as now usual in this series, I will illustrate its empirical performance in the context of monthly covariance matrix forecasting for a multi-asset class ETF portfolio.&lt;/p&gt;

&lt;h2 id=&quot;mathematical-preliminaries&quot;&gt;Mathematical preliminaries&lt;/h2&gt;

&lt;p&gt;Some of these sub-sections contain reminders from a &lt;a href=&quot;/blog/from-volatility-forecasting-to-covariance-matrix-forecasting-the-return-of-simple-and-exponentially-weighted-moving-average-models/&quot;&gt;previous blog post&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;dynamic-covariance-and-correlation-matrices&quot;&gt;Dynamic covariance and correlation matrices&lt;/h3&gt;

&lt;p&gt;Let $n$ be the number of assets in a universe of assets and $r_t \in \mathbb{R}^n$ be the vector of the (&lt;a href=&quot;https://en.wikipedia.org/wiki/Rate_of_return#Logarithmic_or_continuously_compounded_return&quot;&gt;logarithmic&lt;/a&gt;) 
return process of these assets over a time period $t$ (a day, a week, a month…) over which their mean return vector $\mu_t \in \mathbb{R}^n$ is assumed to be null.&lt;/p&gt;

&lt;p&gt;Then:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;The asset covariance matrix $\Sigma_t \in \mathcal{M}(\mathbb{R}^{n \times n})$ over the time period $t$ is defined as $\Sigma_t = \mathbb{E} \left[ r_t r_t {}^t \right]$.&lt;/p&gt;

    &lt;p&gt;That matrix is called&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; the &lt;em&gt;population&lt;/em&gt; (or &lt;em&gt;true&lt;/em&gt;) covariance matrix of the asset returns over the time period $t$.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The asset correlation matrix $C_t \in \mathcal{M}(\mathbb{R}^{n \times n})$ over the time period $t$ is defined as the correlation matrix $ C_t = V_t^{-1} \Sigma_t V_t^{-1} $ associated to the covariance matrix $\Sigma_t$, where $V_t \in \mathcal{M}(\mathbb{R}^{n \times n})$ is the diagonal matrix of the asset standard deviations.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let now be $T$ time periods $t = 1..T$.&lt;/p&gt;

&lt;p&gt;Then:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;The averaged&lt;sup id=&quot;fnref:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; covariance matrix $\Sigma_{1:T} \in \mathcal{M}(\mathbb{R}^{n \times n})$ over the $T$ time periods $t = 1..T$ is defined as $ \Sigma_{1:T} = \frac{1}{T} \sum_{t=1}^{T} \Sigma_{t} $.&lt;/p&gt;

    &lt;p&gt;In case the return process is time-invariant, the averaged covariance matrix $\Sigma_{1:T}$ is equal to the (constant) population covariance matrix $\left( \Sigma = \right) \Sigma_t, t = 1..T$.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The averaged&lt;sup id=&quot;fnref:6:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; correlation matrix $C_{1:T} \in \mathcal{M}(\mathbb{R}^{n \times n})$ over the $T$ time periods $t = 1..T$ is defined as $ C_{1:T} = \frac{1}{T} \sum_{t=1}^{T} C_{t} $.&lt;/p&gt;

    &lt;p&gt;In case the return process is time-invariant, the averaged correlation matrix $C_{1:T}$ is equal to the (constant) correlation matrix $\left( C = \right) C_t, t = 1..T$.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The pseudo-averaged&lt;sup id=&quot;fnref:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt; correlation matrix $C_{p, 1:T} \in \mathcal{M}(\mathbb{R}^{n \times n})$ over the $T$ time periods $t = 1..T$ is defined as the correlation matrix associated to the averaged covariance matrix $\Sigma_{1:T}$.&lt;/p&gt;

    &lt;p&gt;In case the return process is time-invariant, the pseudo-averaged correlation matrix $C_{p, 1:T} $ is equal to the averaged correlation matrix $C_{1:T}$, but in general, due to time-varying asset standard deviations, the two correlation matrices are different.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
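&lt;p&gt;As a quick numerical illustration of the difference between the averaged and pseudo-averaged correlation matrices, here is a minimal sketch (two hypothetical periods sharing the same correlation matrix but with different volatilities):&lt;/p&gt;

```python
import numpy as np

C = np.array([[1.0, 0.5], [0.5, 1.0]])             # constant correlation
V1, V2 = np.diag([0.1, 0.2]), np.diag([0.4, 0.1])  # time-varying volatilities
Sigma1, Sigma2 = V1 @ C @ V1, V2 @ C @ V2

Sigma_avg = (Sigma1 + Sigma2) / 2                  # averaged covariance matrix
C_avg = (C + C) / 2                                # averaged correlation matrix
d = np.sqrt(np.diag(Sigma_avg))
C_pseudo = Sigma_avg / np.outer(d, d)              # pseudo-averaged correlation

print(C_avg[0, 1], round(C_pseudo[0, 1], 4))       # 0.5 versus roughly 0.33
```

&lt;p&gt;With time-invariant volatilities ($V_1 = V_2$), the two quantities would coincide, as stated above.&lt;/p&gt;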

&lt;h3 id=&quot;dynamic-covariance-and-correlation-matrices-sample-estimators&quot;&gt;Dynamic covariance and correlation matrices sample estimators&lt;/h3&gt;

&lt;p&gt;In practice, the asset return process $r_t$ is usually not known and the only available information is the vectors of realized asset returns $ \tilde{r}_1,…, \tilde{r}_T \in \mathbb{R}^n$ for $T$ time periods.&lt;/p&gt;

&lt;p&gt;From each of these vectors:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;The classical way to estimate the covariances is to compute the empirical (or sample) covariance matrix thanks to Pearson estimator&lt;/em&gt;&lt;sup id=&quot;fnref:2:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; defined as $\tilde{\Sigma}_t = \tilde{r}_t \tilde{r}_t {}^t $ over each time period $t = 1..T$.&lt;/p&gt;

    &lt;p&gt;Here, the &lt;a href=&quot;https://en.wikipedia.org/wiki/Outer_product&quot;&gt;outer product&lt;/a&gt; of the realized asset returns $ \tilde{r}_t \tilde{r}_t {}^t $ over the time period $t$ is called a covariance estimate $\tilde{\Sigma}_t$ - or covariance proxy&lt;sup id=&quot;fnref:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt; - for the (unobserved) asset returns covariance matrix over that time period.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The empirical (or sample) correlation matrix over each time period $t = 1..T$ is defined as the correlation matrix $ \tilde{C}_t $ associated to the covariance matrix $\tilde{\Sigma}_t$.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The empirical averaged covariance matrix over the $T$ time periods $t = 1..T$ is typically estimated from the realized asset returns $ \tilde{r}_1,…, \tilde{r}_T \in \mathbb{R}^n$ thanks to the averaged Pearson estimator defined as $\tilde{\Sigma}_{1:T} = \frac{1}{T} \sum_{t=1}^{T} \tilde{r}_t \tilde{r}_t {}^t $.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The empirical averaged correlation matrix over the $T$ time periods $t = 1..T$ is defined as $ \tilde{C}_{1:T} = \frac{1}{T} \sum_{t=1}^{T} \tilde{C}_{t} $.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The empirical pseudo-averaged correlation matrix over the $T$ time periods $t = 1..T$ is defined as the correlation matrix $ \tilde{C}_{p, 1:T}$ associated to the empirical averaged covariance matrix $\tilde{\Sigma}_{1:T}$.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
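&lt;p&gt;These estimators are straightforward to compute; a minimal sketch with simulated returns (the dimensions are arbitrary):&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 3, 500
R = rng.standard_normal((T, n))      # simulated zero-mean realized returns

# One-period covariance proxy: the outer product of realized returns.
Sigma_proxy = np.outer(R[0], R[0])

# Averaged Pearson estimator over the T time periods.
Sigma_avg = sum(np.outer(r, r) for r in R) / T

print(Sigma_proxy.shape, Sigma_avg.shape)
```

&lt;p&gt;Note that each single-period proxy has rank one and is therefore extremely noisy; it is the averaging over the $T$ time periods that makes the estimator usable.&lt;/p&gt;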

&lt;h3 id=&quot;covariance-and-correlation-matrices-rotationally-invariant-estimators&quot;&gt;Covariance and correlation matrices rotationally invariant estimators&lt;/h3&gt;

&lt;h4 id=&quot;rotationally-invariant-estimators&quot;&gt;Rotationally invariant estimators&lt;/h4&gt;

&lt;p&gt;As already mentioned in &lt;a href=&quot;/blog/correlation-matrices-denoising-results-from-random-matrix-theory&quot;&gt;a previous blog post on correlation matrices denoising&lt;/a&gt;, the estimation of empirical covariance and correlation matrices in finance is affected by noise, in the form of measurement error, 
due in part to the short length of the time series of asset returns typically used in their computation.&lt;/p&gt;

&lt;p&gt;Indeed:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Constructing a well-diversified portfolio requires many assets&lt;/em&gt;&lt;sup id=&quot;fnref:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:9&quot; class=&quot;footnote&quot;&gt;7&lt;/a&gt;&lt;/sup&gt;, that is, a big $n$.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;In contrast, rapid shifts in financial market dependencies can only be captured by short calibration windows&lt;/em&gt;&lt;sup id=&quot;fnref:9:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:9&quot; class=&quot;footnote&quot;&gt;7&lt;/a&gt;&lt;/sup&gt; for estimating asset correlations, that is, a small $T$.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This situation leads to an &lt;em&gt;aspect ratio&lt;/em&gt; $q = \frac{n}{T}$ of empirical matrices either close to 1 - or even worse, much greater than 1 -, which is catastrophic from an estimation perspective, 
c.f. the above blog post and references therein.&lt;/p&gt;

&lt;p&gt;Fortunately, &lt;em&gt;numerous techniques have been developed to improve the estimation of noisy covariance [or correlation] matrices&lt;/em&gt;&lt;sup id=&quot;fnref:9:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:9&quot; class=&quot;footnote&quot;&gt;7&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Several of these techniques, like &lt;a href=&quot;/blog/correlation-matrices-denoising-results-from-random-matrix-theory&quot;&gt;the eigenvalue clipping method&lt;/a&gt;&lt;sup id=&quot;fnref:15&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:15&quot; class=&quot;footnote&quot;&gt;8&lt;/a&gt;&lt;/sup&gt;, involve a specific class of matrix estimators - known as &lt;em&gt;Rotationally Invariant Estimators (RIE)&lt;/em&gt; or &lt;em&gt;Orthogonally Invariant Estimators (OIE)&lt;/em&gt; - that leaves the eigenvectors of the empirical matrices untouched while altering their eigenvalues.&lt;/p&gt;
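&lt;p&gt;For illustration, here is a minimal sketch of such an RIE; the eigenvalue clipping rule used (keeping eigenvalues above the Marchenko-Pastur upper edge and averaging the rest, preserving the trace) is one common variant among several:&lt;/p&gt;

```python
import numpy as np

def clip_rie(C_emp, q):
    # Eigenvalue clipping: keep eigenvalues above the Marchenko-Pastur
    # upper edge, replace the others by their average (trace preserved);
    # the eigenvectors are left untouched.
    w, V = np.linalg.eigh(C_emp)
    edge = (1.0 + np.sqrt(q)) ** 2
    signal = w > edge
    w_clipped = np.where(signal, w, w[~signal].mean())
    return V @ np.diag(w_clipped) @ V.T

# Hypothetical example: empirical correlation matrix of pure-noise returns.
rng = np.random.default_rng(1)
n, T = 50, 100
C_emp = np.corrcoef(rng.standard_normal((T, n)), rowvar=False)
C_rie = clip_rie(C_emp, n / T)
print(np.isclose(np.trace(C_rie), n))
```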

&lt;p&gt;In other words, an RIE estimator $\Xi \left( \tilde{\Sigma}_t \right) \in \mathcal{M}(\mathbb{R}^{n \times n})$ of a true covariance or correlation matrix $\Sigma_t \in \mathcal{M}(\mathbb{R}^{n \times n})$ obtained from its empirical counterpart $\tilde{\Sigma}_t \in \mathcal{M}(\mathbb{R}^{n \times n})$ has the general form&lt;/p&gt;

\[\Xi \left( \tilde{\Sigma}_t \right) = \tilde{V}_t \Lambda_t \tilde{V}_t {}^t\]

&lt;p&gt;, where:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;$\Lambda_t \in \mathcal{M}(\mathbb{R}^{n \times n})$ is a diagonal matrix of &lt;em&gt;well-chosen&lt;/em&gt;&lt;sup id=&quot;fnref:1:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; eigenvalues.&lt;/li&gt;
  &lt;li&gt;$\tilde{V}_t \in \mathcal{M}(\mathbb{R}^{n \times n})$ is the matrix of eigenvectors of $\tilde{\Sigma}_t$, defined through &lt;a href=&quot;https://en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix&quot;&gt;the spectral decomposition&lt;/a&gt; $ \tilde{\Sigma}_t = \tilde{V}_t \tilde{\Lambda}_t \tilde{V}_t {}^t $ with $ \tilde{\Lambda}_t \in \mathcal{M}(\mathbb{R}^{n \times n})$ the diagonal matrix of eigenvalues of $\tilde{\Sigma}_t$.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bun et al.&lt;sup id=&quot;fnref:2:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; explains the underlying rationale as follows:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;The true matrix [$\Sigma_t$] is unknown and we do not have any particular insights on its components (the eigenvectors).&lt;/p&gt;

  &lt;p&gt;Therefore we would like our estimator [$\Xi \left( \tilde{\Sigma}_t \right)$] to be constructed in a rotationally invariant way from the noisy observation [$\tilde{\Sigma}_t$] that we have.&lt;/p&gt;

  &lt;p&gt;In simple terms, this means that there is no privileged direction in the $n$-dimensional space that would allow one to bias the eigenvectors of the estimator [$\Xi \left( \tilde{\Sigma}_t \right)$] in some special directions.&lt;/p&gt;

  &lt;p&gt;More formally, the estimator construction must obey: $ \Omega \Xi \left( \tilde{\Sigma}_t \right) \Omega {}^t$ $=$ $\Xi \left( \Omega \tilde{\Sigma}_t \Omega {}^t \right)$ for any rotation matrix $\Omega \in \mathcal{M}(\mathbb{R}^{n \times n})$.&lt;/p&gt;

  &lt;p&gt;Any estimator satisfying [that equation] will be referred to as a Rotational Invariant Estimator (RIE).&lt;/p&gt;

  &lt;p&gt;In this case, it turns out that the eigenvectors of the estimator [$\Xi \left( \tilde{\Sigma}_t \right)$] have to be the same as those of the noisy matrix [$\tilde{\Sigma}_t$].&lt;/p&gt;
&lt;/blockquote&gt;
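&lt;p&gt;The defining property above is easy to verify numerically. Below is a minimal sketch, assuming NumPy, in which an arbitrary eigenvalue transformation stands in for a real cleaning scheme; the function name &lt;code&gt;rie&lt;/code&gt; is mine:&lt;/p&gt;

```python
import numpy as np

def rie(sigma_tilde, f=np.sqrt):
    # Rotational Invariant Estimator sketch: keep the eigenvectors of the
    # noisy matrix and transform only its eigenvalues (here by an arbitrary
    # function f, standing in for a real eigenvalue cleaning scheme).
    eigvals, eigvecs = np.linalg.eigh(sigma_tilde)
    return eigvecs @ np.diag(f(eigvals)) @ eigvecs.T

rng = np.random.default_rng(0)
n = 5
x = rng.standard_normal((n, 50))
sigma_tilde = x @ x.T / 50  # a noisy sample covariance matrix

# A random rotation matrix, from the QR decomposition of a Gaussian matrix
omega, _ = np.linalg.qr(rng.standard_normal((n, n)))

# The RIE property: rotating then estimating equals estimating then rotating
lhs = omega @ rie(sigma_tilde) @ omega.T
rhs = rie(omega @ sigma_tilde @ omega.T)
assert np.allclose(lhs, rhs)
```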

&lt;h4 id=&quot;rotationally-invariant-oracle-estimator&quot;&gt;Rotationally invariant oracle estimator&lt;/h4&gt;

&lt;p&gt;Bun et al.&lt;sup id=&quot;fnref:2:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; shows that the optimal RIE estimator of the unknown matrix $\Sigma_t$ in terms of &lt;a href=&quot;https://en.wikipedia.org/wiki/Matrix_norm&quot;&gt;Frobenius norm&lt;/a&gt; is the one whose diagonal matrix of eigenvalues $\Lambda_O$ satisfies&lt;/p&gt;

\[\Lambda_O = \text{diag} \left(  \tilde{V}_t {}^t \Sigma_t \tilde{V}_t \right)\]

&lt;p&gt;That estimator is &lt;em&gt;sometimes called the oracle estimator because it depends explicitly on the knowledge of the true signal [$\Sigma_t$]&lt;/em&gt;&lt;sup id=&quot;fnref:2:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; and so is not directly usable in practice.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Remarkably, [though], asymptotically optimal RIEs that converge to the oracle estimator can be obtained without the knowledge of the true covariance; however, such estimators require that: i) the ground truth does not change, ii) the data matrix is very large, and iii) the data has at least finite fourth moments&lt;/em&gt;&lt;sup id=&quot;fnref:1:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Those conditions, and especially the first one, are definitely not satisfied by asset returns, which leads to suboptimal estimators in practice…&lt;/p&gt;

&lt;p&gt;As a side note, and maybe contrary to intuition, the optimal eigenvalues $\Lambda_O$ are NOT equal to the eigenvalues of $\Sigma_t$, because this would result in &lt;em&gt;a spectrum that is too wide&lt;/em&gt;&lt;sup id=&quot;fnref:2:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
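&lt;p&gt;Numerically, with a synthetic &quot;true&quot; matrix of my own choosing (a hedged sketch, assuming NumPy), the oracle eigenvalues $\Lambda_O$ indeed bring the RIE closer to the true matrix, in Frobenius norm, than the sample eigenvalues do:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(1)
n, t_obs = 10, 40

# A "true" covariance matrix and a noisy sample estimate of it
a = rng.standard_normal((n, n))
sigma_true = a @ a.T / n + np.eye(n)
samples = rng.multivariate_normal(np.zeros(n), sigma_true, size=t_obs)
sigma_noisy = samples.T @ samples / t_obs

# Oracle eigenvalues: project the true matrix on the noisy eigenvectors
_, v_noisy = np.linalg.eigh(sigma_noisy)
lambda_oracle = np.diag(v_noisy.T @ sigma_true @ v_noisy)
xi_oracle = v_noisy @ np.diag(lambda_oracle) @ v_noisy.T

# The oracle RIE is closer to the truth (in Frobenius norm) than an RIE
# reusing the sample eigenvalues, i.e. the noisy matrix itself
err_oracle = np.linalg.norm(xi_oracle - sigma_true)
err_sample = np.linalg.norm(sigma_noisy - sigma_true)
assert err_oracle < err_sample
```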

&lt;h2 id=&quot;the-average-oracle-covariance-matrix-forecasting-method&quot;&gt;The Average Oracle covariance matrix forecasting method&lt;/h2&gt;

&lt;h3 id=&quot;forecasting-formulas&quot;&gt;Forecasting formulas&lt;/h3&gt;

&lt;p&gt;Let:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;$n$ be the number of assets in a universe of assets&lt;/li&gt;
  &lt;li&gt;$\tilde{r}_t \tilde{r}_t {}^t$, $t=1..T$ the outer products of the observed asset returns over each of $T$ past periods&lt;/li&gt;
  &lt;li&gt;$1 \leq h_{in} \ll T$ a chosen number of past periods&lt;/li&gt;
  &lt;li&gt;$\mathcal{I}_{cal} = [t_{cal}, T - h_{next}]$ a long calibration window, with $t_{cal}$ chosen so that $t_{cal} \geq h_{in}$ and $| \mathcal{I}_{cal} | \gg h_{in}$&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;asset-returns-averaged-covariance-matrix&quot;&gt;Asset returns averaged covariance matrix&lt;/h4&gt;

&lt;p&gt;The Average Oracle covariance matrix forecasting model estimates the asset returns averaged covariance matrix $\hat{\Sigma}_{T+1:T+h_{next}}$ over the next $h_{next} \geq 1$ periods as follows&lt;sup id=&quot;fnref:1:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Choose a number of random time periods $n_B \geq 1$ to generate.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;For $b = 1..n_B$ do
    &lt;ul&gt;
      &lt;li&gt;
        &lt;p&gt;Select uniformly at random with replacement a time period $t^{(b)} \in \mathcal{I}_{cal}$&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;Compute the “past” averaged asset returns covariance matrix $ \tilde{\Sigma}^{(b)}_{in} $ on the train window $\mathcal{I}^{(b)}_{in} = [t^{(b)} - h_{in} + 1, t^{(b)}]$, defined by $\tilde{\Sigma}^{(b)}_{in} = \tilde{\Sigma}_{t^{(b)} - h_{in} + 1:t^{(b)}} = \frac{1}{h_{in}} \sum_{t=t^{(b)} - h_{in} + 1}^{t^{(b)}} \tilde{r}_t \tilde{r}_t {}^t $ and its associated correlation matrix $\tilde{C}^{(b)}_{in} = \tilde{C}_{p, t^{(b)} - h_{in} + 1:t^{(b)}}$ whose spectral decomposition is given by $\tilde{C}^{(b)}_{in} = \tilde{V}^{(b)}_{in} \tilde{\Lambda}^{(b)}_{in} \tilde{V}^{(b)}_{in} {}^t $.&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;Compute the “future” averaged asset returns covariance matrix $ \tilde{\Sigma}^{(b)}_{next} $ on the test window $\mathcal{I}^{(b)}_{next} = [t^{(b)} + 1, t^{(b)} + h_{next}]$, defined by  $ \tilde{\Sigma}^{(b)}_{next} = \tilde{\Sigma}_{t^{(b)} + 1:t^{(b)} + h_{next}} = \frac{1}{h_{next}} \sum_{t=t^{(b)} + 1}^{t^{(b)} + h_{next}} \tilde{r}_t \tilde{r}_t {}^t $ and its associated correlation matrix $\tilde{C}^{(b)}_{next} = \tilde{C}_{p, t^{(b)} + 1:t^{(b)} + h_{next}}$.&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;Compute the diagonal matrix of oracle eigenvalues $\tilde{\Lambda}_O^{(b)} = \text{diag} \left( \tilde{V}^{(b)}_{in} {}^t \tilde{C}^{(b)}_{next} \tilde{V}^{(b)}_{in} \right) $.&lt;/p&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Compute the diagonal matrix of Average Oracle eigenvalues $\tilde{\Lambda}_{AO} = \frac{1}{n_B} \sum_{b=1}^{n_B} \tilde{\Lambda}_O^{(b)} $.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Compute the most recent “past” averaged asset returns covariance matrix $ \tilde{\Sigma}_{in} $  on the window $\mathcal{I}_{in} = [T - h_{in} + 1, T]$, defined by $ \tilde{\Sigma}_{in}  = \tilde{\Sigma}_{T - h_{in} + 1:T} = \frac{1}{h_{in}} \sum_{t=T - h_{in} + 1}^{T} \tilde{r}_t \tilde{r}_t {}^t $ and its associated correlation matrix $\tilde{C}_{in} = \tilde{C}_{p, T - h_{in} + 1:T}$ whose spectral decomposition is given by $\tilde{C}_{in} = \tilde{V}_{in} \tilde{\Lambda}_{in} \tilde{V}_{in} {}^t $.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;Compute $\hat{\Sigma}_{T+1:T+h_{next}} = D_{in} \tilde{V}_{in} \tilde{\Lambda}_{AO} \tilde{V}_{in} {}^t D_{in} $, where $D_{in} \in \mathcal{M}(\mathbb{R}^{n \times n})$ is the diagonal matrix of the standard deviations $\sqrt{ \left( \tilde{\Sigma}_{in} \right)_{ii} }, i=1..n$.&lt;/li&gt;
&lt;/ul&gt;
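&lt;p&gt;The forecasting steps above can be sketched in Python as follows. This is a minimal sketch, assuming NumPy and zero-mean returns, with a calibration window spanning all admissible periods for simplicity; the function name and defaults are illustrative, and real implementations would add refinements such as the eigenvalue ordering correction discussed further below:&lt;/p&gt;

```python
import numpy as np

def cov_to_corr(sigma):
    # Correlation matrix associated with a covariance matrix
    d_inv = 1.0 / np.sqrt(np.diag(sigma))
    return sigma * np.outer(d_inv, d_inv)

def average_oracle_forecast(returns, h_in, h_next, n_b=1000, seed=0):
    # `returns` is a (T, n) array of (assumed zero-mean) asset returns; the
    # covariance over a window is the average of the outer products r_t r_t^t.
    rng = np.random.default_rng(seed)
    t_total, n = returns.shape

    def window_cov(start, end):  # inclusive 0-based window [start, end]
        r = returns[start:end + 1]
        return r.T @ r / len(r)

    # Bootstrap the oracle eigenvalues over the calibration window
    lambda_sum = np.zeros(n)
    for _ in range(n_b):
        t_b = rng.integers(h_in - 1, t_total - h_next)  # period index t^(b)
        c_in = cov_to_corr(window_cov(t_b - h_in + 1, t_b))    # train window
        c_next = cov_to_corr(window_cov(t_b + 1, t_b + h_next))  # test window
        _, v_in = np.linalg.eigh(c_in)  # same ordering convention every time
        lambda_sum += np.diag(v_in.T @ c_next @ v_in)  # oracle eigenvalues
    lambda_ao = lambda_sum / n_b  # Average Oracle eigenvalues

    # Apply the Average Oracle eigenvalues to the most recent in-sample window
    sigma_in = window_cov(t_total - h_in, t_total - 1)
    _, v_in = np.linalg.eigh(cov_to_corr(sigma_in))
    c_hat = v_in @ np.diag(lambda_ao) @ v_in.T
    d_in = np.sqrt(np.diag(sigma_in))
    return np.outer(d_in, d_in) * c_hat  # forecast covariance matrix

# Toy usage on random data
rng = np.random.default_rng(42)
rets = rng.standard_normal((500, 4)) * 0.01
sigma_hat = average_oracle_forecast(rets, h_in=60, h_next=20, n_b=200)
assert sigma_hat.shape == (4, 4)
assert np.allclose(sigma_hat, sigma_hat.T)
```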

&lt;p&gt;For more visual clarity, Figure 1 illustrates that process.&lt;/p&gt;

&lt;figure&gt;
	&lt;a href=&quot;/assets/images/blog/covariance-matrix-forecasting-ao-bongiorno-method.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/covariance-matrix-forecasting-ao-bongiorno-method-small.png&quot; alt=&quot;Illustration of the average oracle eigenvalues computation process, which uses different windows included in a long calibration window. Source: Adapted from Bongiorno et al.&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Figure 1. Illustration of the Average Oracle eigenvalues computation process, which uses different windows included in a long calibration window. Source: Adapted from Bongiorno et al.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h4 id=&quot;asset-returns-averaged-and-pseudo-averaged-correlation-matrix&quot;&gt;Asset returns averaged and pseudo-averaged correlation matrix&lt;/h4&gt;

&lt;p&gt;The Average Oracle covariance matrix forecasting model does not easily&lt;sup id=&quot;fnref:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:10&quot; class=&quot;footnote&quot;&gt;9&lt;/a&gt;&lt;/sup&gt; allow the estimation of the asset returns averaged correlation matrix $\hat{C}_{T+1:T+h_{next}}$, because it does not rely on the estimation of the individual covariance matrices $\hat{\Sigma}_{T+1}$, $…$, $\hat{\Sigma}_{T+h_{next}}$.&lt;/p&gt;

&lt;p&gt;The asset returns pseudo-averaged correlation matrix $\hat{C}_{p, T+1:T+h_{next}}$ over the next $h_{next}$ periods, though, corresponds to the correlation matrix associated to the averaged covariance matrix $\hat{\Sigma}_{T+1:T+h_{next}}$.&lt;/p&gt;
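&lt;p&gt;In other words, the pseudo-averaged correlation matrix is obtained by normalizing the forecast averaged covariance matrix by its own volatilities. A minimal sketch, assuming NumPy, with illustrative numbers:&lt;/p&gt;

```python
import numpy as np

def pseudo_averaged_corr(sigma_hat):
    # Correlation matrix associated with the forecast averaged covariance
    # matrix: divide each entry by the product of the forecast volatilities.
    d_inv = 1.0 / np.sqrt(np.diag(sigma_hat))
    return sigma_hat * np.outer(d_inv, d_inv)

# A hypothetical 2-asset forecast covariance matrix
sigma_hat = np.array([[0.04, 0.006],
                      [0.006, 0.09]])
c_hat = pseudo_averaged_corr(sigma_hat)
assert np.allclose(np.diag(c_hat), 1.0)
assert np.isclose(c_hat[0, 1], 0.006 / (0.2 * 0.3))  # i.e. 0.1
```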

&lt;h3 id=&quot;rationale&quot;&gt;Rationale&lt;/h3&gt;

&lt;p&gt;The Average Oracle covariance matrix forecasting method &lt;em&gt;captures the average transition from two consecutive time windows&lt;/em&gt;&lt;sup id=&quot;fnref:1:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; by &lt;em&gt;averaging [oracle eigenvalues], rank-wise, over many randomly selected consecutive intervals taken from a long calibration window&lt;/em&gt;&lt;sup id=&quot;fnref:1:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;That covariance matrix forecasting method thus &lt;em&gt;tackles the evolution of [asset returns] dependencies with a time-invariant eigenvalue cleaning scheme&lt;/em&gt;&lt;sup id=&quot;fnref:1:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;As noted in Bongiorno et al.&lt;sup id=&quot;fnref:1:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;This is a zeroth order approximation, as the fluctuations of the optimal eigenvalue matrix around $\tilde{\Lambda}_{AO}$ most probably contain valuable additional information (as may do those of the eigenvectors). Nevertheless, this approximation is a powerful filtering tool and is easily computed from data without any modeling assumptions about the underlying system.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3 id=&quot;performances&quot;&gt;Performances&lt;/h3&gt;

&lt;p&gt;From a practical perspective, Bongiorno et al.&lt;sup id=&quot;fnref:1:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; and Bongiorno and Challet&lt;sup id=&quot;fnref:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;10&lt;/a&gt;&lt;/sup&gt; empirically demonstrate that the Average Oracle covariance matrix forecasting method outperforms the &lt;em&gt;current state-of-the-art (and complex) methods, Dynamic Conditional Covariance coupled to Non-Linear Shrinkage (DCC+NLS)&lt;/em&gt;&lt;sup id=&quot;fnref:3:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;10&lt;/a&gt;&lt;/sup&gt;, 
both in terms of Frobenius distance - as highlighted in Figure 2 - and in terms of &lt;em&gt;four key portfolio metrics: Sharpe ratio, turnover, gross leverage, and diversification&lt;/em&gt;&lt;sup id=&quot;fnref:3:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;10&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;figure&gt;
	&lt;a href=&quot;/assets/images/blog/covariance-matrix-forecasting-ao-bongiorno-frobenius-distance.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/covariance-matrix-forecasting-ao-bongiorno-frobenius-distance-small.png&quot; alt=&quot;Average Frobenius distance between the forecasted and the out-of-sample covariance matrices of n = 100 U.S. stocks as a function of the number of past periods $h_{in}$. Source: Adapted from Bongiorno et al.&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Figure 2. Average Frobenius distance between the forecasted and the out-of-sample covariance matrices of n = 100 U.S. stocks as a function of the number of past periods $h_{in}$. Source: Adapted from Bongiorno et al.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;These performances are commented on as follows in Bongiorno et al.&lt;sup id=&quot;fnref:1:11&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;The fact that the Average Oracle is a better estimator for time-evolving covariance matrices implies that the most recent information contained in the sample eigenvalues is less relevant (and more noisy) than the AO ones that focus on the average transition.&lt;/p&gt;

  &lt;p&gt;Thus, the advantage of the Average Oracle is precisely that it captures some part of the average dynamics that is discarded by the assumption of a constant true covariance matrix [made in the DCC+NLS method].&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3 id=&quot;implementation-details&quot;&gt;Implementation details&lt;/h3&gt;

&lt;h4 id=&quot;how-to-choose-the-number-of-past-periods-h_in&quot;&gt;How to choose the number of past periods $h_{in}$?&lt;/h4&gt;

&lt;p&gt;Through extensive simulations, Bongiorno et al.&lt;sup id=&quot;fnref:1:12&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; concludes that the Average Oracle eigenvalues $\tilde{\Lambda}_{AO}$ mainly depend on the number of assets $n$ and&lt;sup id=&quot;fnref:11&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:11&quot; class=&quot;footnote&quot;&gt;11&lt;/a&gt;&lt;/sup&gt; on the number of past periods $h_{in}$ over which to compute the averaged asset returns covariance matrix.&lt;/p&gt;

&lt;p&gt;A natural question - unfortunately neither answered in Bongiorno et al.&lt;sup id=&quot;fnref:1:13&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; nor in Bongiorno and Challet&lt;sup id=&quot;fnref:3:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;10&lt;/a&gt;&lt;/sup&gt; - is then how to choose the value of $h_{in}$ in order to obtain the best forecasting performances.&lt;/p&gt;

&lt;p&gt;One possible answer, which relies on the interpretation of the Average Oracle covariance matrix forecasting method as a covariance matrix &lt;em&gt;cleaning scheme&lt;/em&gt;&lt;sup id=&quot;fnref:1:14&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, is to select $h_{in}$ so as to maximize the forecasting performances of 
&lt;a href=&quot;/blog/from-volatility-forecasting-to-covariance-matrix-forecasting-the-return-of-simple-and-exponentially-weighted-moving-average-models/&quot;&gt;a simple moving average covariance matrix forecasting model&lt;/a&gt; with a window size equal to $h_{in}$ 
for the considered value of $h_{next}$.&lt;/p&gt;
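&lt;p&gt;That selection procedure can be sketched as follows - an illustrative implementation under my own choice of error measure (squared Frobenius distance between the SMA forecast and the realized covariance matrix over the next $h_{next}$ periods), not the proprietary procedure used by &lt;strong&gt;Portfolio Optimizer&lt;/strong&gt;:&lt;/p&gt;

```python
import numpy as np

def sma_h_in_selection(returns, candidates, h_next):
    # Pick the SMA window size h_in whose trailing-window covariance forecast
    # best matches the realized covariance over the next h_next periods.
    t_total, _ = returns.shape
    errors = {}
    for h_in in candidates:
        errs = []
        for t in range(h_in, t_total - h_next + 1, h_next):
            past = returns[t - h_in:t]
            future = returns[t:t + h_next]
            sigma_sma = past.T @ past / h_in          # SMA forecast
            sigma_real = future.T @ future / h_next   # realized covariance
            errs.append(np.linalg.norm(sigma_sma - sigma_real) ** 2)
        errors[h_in] = np.mean(errs)
    return min(errors, key=errors.get)

# Toy usage on random data
rng = np.random.default_rng(7)
rets = rng.standard_normal((600, 3)) * 0.01
best = sma_h_in_selection(rets, candidates=[20, 60, 120], h_next=20)
assert best in (20, 60, 120)
```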

&lt;h4 id=&quot;how-to-choose-the-number-of-random-time-periods-n_b&quot;&gt;How to choose the number of random time periods $n_B$?&lt;/h4&gt;

&lt;p&gt;The number of random time periods $n_B$ must be high enough to ensure that the Average Oracle eigenvalues are stable enough - in particular when $h_{next}$ is small - because &lt;em&gt;by reducing [$h_{next}$], 
the estimation becomes noisier and thus requires more train and test windows […] to yield average eigenvalues with the same level of precision&lt;/em&gt;&lt;sup id=&quot;fnref:1:15&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Two examples&lt;sup id=&quot;fnref:12&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:12&quot; class=&quot;footnote&quot;&gt;12&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Bongiorno et al.&lt;sup id=&quot;fnref:1:16&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; uses $n_B = 10,000$ together with $h_{in} = h_{next} = 252$.&lt;/li&gt;
  &lt;li&gt;Bongiorno and Challet&lt;sup id=&quot;fnref:3:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;10&lt;/a&gt;&lt;/sup&gt; uses $n_B = 10,000$ together with $h_{in} \in \lbrace 240, 1200 \rbrace$ and $h_{next} \in \lbrace 5, 20 \rbrace$.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note, though, that depending on the length of the calibration window $\mathcal{I}_{cal}$ and/or on the number of assets $n$, the time periods do not need to be generated at random - they can just as well be generated deterministically so as to cover the whole calibration window.&lt;/p&gt;

&lt;h4 id=&quot;how-to-enforce-a-proper-ordering-of-the-average-oracle-eigenvalues&quot;&gt;How to enforce a proper ordering of the Average Oracle eigenvalues?&lt;/h4&gt;

&lt;p&gt;Bongiorno et al.&lt;sup id=&quot;fnref:1:17&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; stresses &lt;em&gt;that the columns of the eigenvectors [$\tilde{V}^{(b)}_{in}$ and $\tilde{V}_{in}$] must always follow the same eigenvalue ordering convention&lt;/em&gt;&lt;sup id=&quot;fnref:1:18&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Nevertheless, despite enforcing such a convention, the order of the resulting Average Oracle eigenvalues $\tilde{\Lambda}_{AO}$ is not necessarily preserved &lt;em&gt;due to the finite size of the sample&lt;/em&gt;&lt;sup id=&quot;fnref:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;This &lt;em&gt;may be an unwanted feature within a rotational invariant assumption&lt;/em&gt;&lt;sup id=&quot;fnref:5:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt;, since &lt;em&gt;there is no reason a priori to expect that it is optimal to modify the order of the eigenvalues, that is to say, the variance associated with the principal components&lt;/em&gt;&lt;sup id=&quot;fnref:5:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Bun et al.&lt;sup id=&quot;fnref:5:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt; proposes two solutions to this problem:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Sort the resulting eigenvalues&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Perform &lt;a href=&quot;https://en.wikipedia.org/wiki/Isotonic_regression&quot;&gt;an isotonic regression&lt;/a&gt; on the resulting eigenvalues&lt;/p&gt;

    &lt;p&gt;In the context of the cross-validated eigenvalues cleaning scheme described in Reigneron et al.&lt;sup id=&quot;fnref:13&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:13&quot; class=&quot;footnote&quot;&gt;14&lt;/a&gt;&lt;/sup&gt;, the impact of using an isotonic regression is depicted in Figure 3.&lt;/p&gt;

    &lt;figure&gt;
    &lt;a href=&quot;/assets/images/blog/covariance-matrix-forecasting-ao-reigneron-isotonic-regression.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/covariance-matrix-forecasting-ao-reigneron-isotonic-regression-small.png&quot; alt=&quot;Raw, cross-validated and isotonic eigenvalues as a function of in-sample eigenvalues. Source: Reigneron et al.&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;Figure 3. Raw, cross-validated and isotonic eigenvalues as a function of in-sample eigenvalues. Source: Reigneron et al.&lt;/figcaption&gt;
  &lt;/figure&gt;
  &lt;/li&gt;
&lt;/ul&gt;
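&lt;p&gt;As an illustration of the second solution, here is a minimal sketch, assuming NumPy, that enforces a non-decreasing ordering on noisy averaged eigenvalues using the standard pool adjacent violators algorithm rather than any specific library:&lt;/p&gt;

```python
import numpy as np

def isotonic_increasing(y):
    # Pool Adjacent Violators: the closest (in least squares) non-decreasing
    # sequence to y, used here to restore the ordering of noisy averaged
    # eigenvalues without simply sorting them.
    blocks_v, blocks_w = [], []
    for v in y:
        blocks_v.append(float(v))
        blocks_w.append(1)
        # Merge blocks while the monotonicity constraint is violated
        while len(blocks_v) > 1 and blocks_v[-2] > blocks_v[-1]:
            w_tot = blocks_w[-2] + blocks_w[-1]
            v_avg = (blocks_v[-2] * blocks_w[-2] + blocks_v[-1] * blocks_w[-1]) / w_tot
            blocks_v[-2:] = [v_avg]
            blocks_w[-2:] = [w_tot]
    return np.repeat(blocks_v, blocks_w)

# Averaged eigenvalues whose ascending order was broken by sampling noise
lambda_ao = np.array([0.2, 0.5, 0.4, 1.1, 3.0])
lambda_iso = isotonic_increasing(lambda_ao)
assert np.all(np.diff(lambda_iso) >= 0)  # ordering is restored
```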

&lt;h2 id=&quot;implementation-in-portfolio-optimizer&quot;&gt;Implementation in Portfolio Optimizer&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Portfolio Optimizer&lt;/strong&gt; implements the Average Oracle covariance and correlation matrix forecasting models through the endpoints 
&lt;a href=&quot;https://docs.portfoliooptimizer.io/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/assets/covariance/matrix/forecast/average-oracle&lt;/code&gt;&lt;/a&gt; and &lt;a href=&quot;https://docs.portfoliooptimizer.io/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/assets/correlation/matrix/forecast/average-oracle&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;These endpoints support the two covariance proxies below:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Squared (close-to-close) returns&lt;/li&gt;
  &lt;li&gt;Demeaned squared (close-to-close) returns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These endpoints also:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Implement an isotonic regression correction step for the Average Oracle eigenvalues.&lt;/li&gt;
  &lt;li&gt;Allow the number of past periods $h_{in}$ to be determined automatically, using a proprietary procedure.&lt;/li&gt;
  &lt;li&gt;Allow either generating a given number of time periods $n_B$ uniformly at random within the calibration window $\mathcal{I}_{cal}$ or using all the time periods available within that window.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;example-of-usage---covariance-matrix-forecasting-at-monthly-level-for-a-portfolio-of-various-etfs&quot;&gt;Example of usage - Covariance matrix forecasting at monthly level for a portfolio of various ETFs&lt;/h2&gt;

&lt;p&gt;As an example of usage, I propose to evaluate the empirical performances of the Average Oracle covariance matrix forecasting model within the framework of &lt;a href=&quot;&quot;&gt;the previous blog post&lt;/a&gt;, whose aim is 
to forecast monthly covariance and correlation matrices for a portfolio of 10 ETFs representative&lt;sup id=&quot;fnref:17&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:17&quot; class=&quot;footnote&quot;&gt;15&lt;/a&gt;&lt;/sup&gt; of various asset classes:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;U.S. stocks (SPY ETF)&lt;/li&gt;
  &lt;li&gt;European stocks (EZU ETF)&lt;/li&gt;
  &lt;li&gt;Japanese stocks (EWJ ETF)&lt;/li&gt;
  &lt;li&gt;Emerging markets stocks (EEM ETF)&lt;/li&gt;
  &lt;li&gt;U.S. REITs (VNQ ETF)&lt;/li&gt;
  &lt;li&gt;International REITs (RWX ETF)&lt;/li&gt;
  &lt;li&gt;U.S. 7-10 year Treasuries (IEF ETF)&lt;/li&gt;
  &lt;li&gt;U.S. 20+ year Treasuries (TLT ETF)&lt;/li&gt;
  &lt;li&gt;Commodities (DBC ETF)&lt;/li&gt;
  &lt;li&gt;Gold (GLD ETF)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;results---covariance-matrix-forecasting&quot;&gt;Results - Covariance matrix forecasting&lt;/h3&gt;

&lt;p&gt;Results over the period 31st January 2008 - 31st July 2023&lt;sup id=&quot;fnref:22&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:22&quot; class=&quot;footnote&quot;&gt;16&lt;/a&gt;&lt;/sup&gt; for covariance matrices are the following&lt;sup id=&quot;fnref:23&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:23&quot; class=&quot;footnote&quot;&gt;17&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Covariance matrix model&lt;/th&gt;
      &lt;th&gt;Covariance matrix MSE&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;SMA, window size of all the previous months (historical average model)&lt;/td&gt;
      &lt;td&gt;$9.59 \times 10^{-6}$&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;SMA, window size of the previous year&lt;/td&gt;
      &lt;td&gt;$9.08 \times 10^{-6}$&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Average Oracle, optimal&lt;sup id=&quot;fnref:24&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:24&quot; class=&quot;footnote&quot;&gt;18&lt;/a&gt;&lt;/sup&gt; $h_{in}$&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;$6.77 \times 10^{-6}$&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;EWMA, optimal&lt;sup id=&quot;fnref:24:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:24&quot; class=&quot;footnote&quot;&gt;18&lt;/a&gt;&lt;/sup&gt; $\lambda$&lt;/td&gt;
      &lt;td&gt;$6.52 \times 10^{-6}$&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;IEWMA, optimal&lt;sup id=&quot;fnref:24:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:24&quot; class=&quot;footnote&quot;&gt;18&lt;/a&gt;&lt;/sup&gt; $\left(\lambda_{vol},\lambda_{cor}\right)$&lt;/td&gt;
      &lt;td&gt;$6.16 \times 10^{-6}$&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;SMA, window size of the previous month (random walk model)&lt;/td&gt;
      &lt;td&gt;$6.06 \times 10^{-6}$&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Within this specific evaluation framework, the Average Oracle covariance matrix forecasting model&lt;sup id=&quot;fnref:14&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:14&quot; class=&quot;footnote&quot;&gt;19&lt;/a&gt;&lt;/sup&gt; unfortunately does not seem to exhibit improved performances versus much simpler models, like the EWMA covariance matrix forecasting model.&lt;/p&gt;

&lt;h3 id=&quot;results---correlation-matrix-forecasting&quot;&gt;Results - Correlation matrix forecasting&lt;/h3&gt;

&lt;p&gt;Results over the period 31st January 2008 - 31st July 2023&lt;sup id=&quot;fnref:22:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:22&quot; class=&quot;footnote&quot;&gt;16&lt;/a&gt;&lt;/sup&gt; for the correlation matrices associated to the covariance matrices of the previous sub-section are the following&lt;sup id=&quot;fnref:23:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:23&quot; class=&quot;footnote&quot;&gt;17&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Covariance matrix model&lt;/th&gt;
      &lt;th&gt;Correlation matrix MSE&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;SMA, window size of the previous month (random walk model)&lt;/td&gt;
      &lt;td&gt;8.19&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;SMA, window size of all the previous months (historical average model)&lt;/td&gt;
      &lt;td&gt;8.10&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Average Oracle, optimal&lt;sup id=&quot;fnref:24:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:24&quot; class=&quot;footnote&quot;&gt;18&lt;/a&gt;&lt;/sup&gt; $h_{in}$&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;6.59&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;SMA, window size of the previous year&lt;/td&gt;
      &lt;td&gt;6.50&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;EWMA, optimal&lt;sup id=&quot;fnref:24:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:24&quot; class=&quot;footnote&quot;&gt;18&lt;/a&gt;&lt;/sup&gt; $\lambda$&lt;/td&gt;
      &lt;td&gt;5.87&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;IEWMA, optimal&lt;sup id=&quot;fnref:24:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:24&quot; class=&quot;footnote&quot;&gt;18&lt;/a&gt;&lt;/sup&gt; $\left(\lambda_{vol},\lambda_{cor}\right)$&lt;/td&gt;
      &lt;td&gt;5.70&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Here again, the Average Oracle model&lt;sup id=&quot;fnref:14:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:14&quot; class=&quot;footnote&quot;&gt;19&lt;/a&gt;&lt;/sup&gt; does not seem to particularly shine versus simpler models…&lt;/p&gt;

&lt;h3 id=&quot;comments&quot;&gt;Comments&lt;/h3&gt;

&lt;p&gt;Results from the previous sub-sections seem contradictory to those obtained in Bongiorno et al.&lt;sup id=&quot;fnref:1:19&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;However, as noted in Tan and Zohren&lt;sup id=&quot;fnref:4:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, &lt;em&gt;this is probably due to the fact that at relatively small dimensions, a covariance estimator may benefit more from picking up more recent time series variations&lt;/em&gt;&lt;sup id=&quot;fnref:4:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; like the EWMA and IEWMA covariance estimators.&lt;/p&gt;

&lt;p&gt;In other words, the Average Oracle is certainly a good choice &lt;em&gt;when the dimension of the problem becomes very large&lt;/em&gt;&lt;sup id=&quot;fnref:2:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; but is probably not competitive otherwise.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;The Average Oracle is a covariance and correlation matrix forecasting method very different in spirit from the moving average-based methods already described in the previous posts of this series.&lt;/p&gt;

&lt;p&gt;Unfortunately, the empirical performances of the Average Oracle method in terms of covariance and correlation matrix forecasting do not seem to improve over the much simpler EWMA and IEWMA methods, at least under the specific asset allocation context described in the previous section.&lt;/p&gt;

&lt;p&gt;Note, though, that this conclusion might be different with other Oracle-based covariance matrix forecasting methods, like those described in Tan and Zohren&lt;sup id=&quot;fnref:4:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; or in Reigneron et al.&lt;sup id=&quot;fnref:13:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:13&quot; class=&quot;footnote&quot;&gt;14&lt;/a&gt;&lt;/sup&gt;…&lt;/p&gt;

&lt;p&gt;Anyway, feel free to &lt;a href=&quot;https://www.linkedin.com/in/roman-rubsamen/&quot;&gt;connect with me on LinkedIn&lt;/a&gt; or to &lt;a href=&quot;https://twitter.com/portfoliooptim&quot;&gt;follow me on Twitter&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;–&lt;/p&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:4&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.pm-research.com/content/iijpormgmt/51/4/83&quot;&gt;Vincent Tan, Stefan Zohren, Estimation of Large Financial Covariances: A Cross-Validation Approach, The Journal of Portfolio Management  February 2025, 51 (4) 83-95&lt;/a&gt;. &lt;a href=&quot;#fnref:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:4:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:4:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:4:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://iopscience.iop.org/article/10.1088/1742-5468/acb7ed/meta&quot;&gt;Bongiorno, C., Challet, D. and Loeper, G., Filtering time-dependent covariance matrices using time-independent eigenvalues. J. Stat. Mech.: Theory and Experiment, 2023, 2, 023402&lt;/a&gt;. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:1:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;8&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;9&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;10&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:10&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;11&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:11&quot; class=&quot;reversefootnote&quot; 
role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;12&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:12&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;13&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:13&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;14&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:14&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;15&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:15&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;16&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:16&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;17&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:17&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;18&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:18&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;19&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:19&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;20&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://ieeexplore.ieee.org/document/7587390&quot;&gt;J. Bun, R. Allez, J. -P. Bouchaud and M. Potters, Rotational Invariant Estimator for General Noisy Matrices, IEEE Transactions on Information Theory, vol. 62, no. 12, pp. 7475-7490, Dec. 2016&lt;/a&gt;. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:2:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:6&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://doi.org/10.1016/j.jbankfin.2022.106426&quot;&gt;Gianluca De Nard, Robert F. Engle, Olivier Ledoit, Michael Wolf, Large dynamic covariance matrices: Enhancements based on intraday data, Journal of Banking &amp;amp; Finance, Volume 138, 2022, 106426&lt;/a&gt;. &lt;a href=&quot;#fnref:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:6:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:7&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;For lack of a better term. &lt;a href=&quot;#fnref:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:8&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://link.springer.com/chapter/10.1007/978-3-540-71297-8_36&quot;&gt;Patton, A.J., Sheppard, K. (2009). Evaluating Volatility and Correlation Forecasts. In: Mikosch, T., Kreiß, JP., Davis, R., Andersen, T. (eds) Handbook of Financial Time Series. Springer, Berlin, Heidelberg&lt;/a&gt;. &lt;a href=&quot;#fnref:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:9&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://doi.org/10.1016/j.physa.2024.130225&quot;&gt;Christian Bongiorno and Lamia Lamrani, Quantifying the information lost in optimal covariance matrix cleaning, Physica A: Statistical Mechanics and its Applications, 657, 130225, 2025&lt;/a&gt;. &lt;a href=&quot;#fnref:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:9:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:9:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:15&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.83.1467&quot;&gt;Laurent Laloux, Pierre Cizeau, Jean-Philippe Bouchaud, and Marc Potters, Noise Dressing of Financial Correlation Matrices, Phys. Rev. Lett. 83, 1467&lt;/a&gt;. &lt;a href=&quot;#fnref:15&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:10&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;At least directly; by varying $h_{next}$, it is possible - but cumbersome - to compute the individual covariance matrices $\hat{\Sigma}_{T+1}$, $…$, $\hat{\Sigma}_{T+h_{next}}$ and deduce the averaged correlation matrix $\hat{C}_{T+1:T+h_{next}}$. &lt;a href=&quot;#fnref:10&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.tandfonline.com/doi/pdf/10.1080/14697688.2024.2372053&quot;&gt;Bongiorno, C., &amp;amp; Challet, D. (2024). Covariance matrix filtering and portfolio optimisation: the average oracle vs non-linear shrinkage and all the variants of DCC-NLS. Quantitative Finance, 24(9), 1227–1234&lt;/a&gt;. &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:3:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:3:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:3:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:3:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:11&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;And interestingly, not so much on the value of $h_{next}$. &lt;a href=&quot;#fnref:11&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:12&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;In both cases, the random selection of 10 000 time periods also incorporates a random selection of U.S. stocks within each time window. &lt;a href=&quot;#fnref:12&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:5&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://journals.aps.org/pre/abstract/10.1103/PhysRevE.98.052145&quot;&gt;Bun, J., Bouchaud, J. P., &amp;amp; Potters, M., Overlaps between eigenvectors of correlated random matrices. Physical Review E, 98(5), 052145 (2018)&lt;/a&gt;. &lt;a href=&quot;#fnref:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:5:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:5:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:5:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:13&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.pm-research.com/content/iijpormgmt/46/4/22&quot;&gt;Pierre-Alain Reigneron, Vincent Nguyen, Stefano Ciliberti, Philip Seager, Jean-Philippe Bouchaud, Agnostic Allocation Portfolios: A Sweet Spot in the Risk-Based Jungle?, The Journal of Portfolio Management  March 2020, 46 (4) 22-38&lt;/a&gt;. &lt;a href=&quot;#fnref:13&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:13:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:17&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;These ETFs are used in the &lt;em&gt;Adaptive Asset Allocation&lt;/em&gt; strategy from &lt;a href=&quot;https://investresolve.com/&quot;&gt;ReSolve Asset Management&lt;/a&gt;, described in the paper &lt;em&gt;Adaptive Asset Allocation: A Primer&lt;/em&gt;&lt;sup id=&quot;fnref:18&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:18&quot; class=&quot;footnote&quot;&gt;20&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:17&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:22&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;(Adjusted) daily prices have been retrieved using &lt;a href=&quot;https://api.tiingo.com/&quot;&gt;Tiingo&lt;/a&gt;. &lt;a href=&quot;#fnref:22&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:22:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:23&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Using the outer product of asset returns - assuming a mean return of 0 - as covariance proxy, and using an expanding historical window of asset returns. &lt;a href=&quot;#fnref:23&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:23:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:24&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;As determined by &lt;strong&gt;Portfolio Optimizer&lt;/strong&gt; at the end of every month using all the available asset returns history up to that point in time; thus, there is no look-ahead bias. &lt;a href=&quot;#fnref:24&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:24:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:24:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:24:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:24:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:24:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:14&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Results using a couple of fixed values for $h_{in}$ (21,…) are worse and so are not presented here. &lt;a href=&quot;#fnref:14&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:14:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:18&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2328254&quot;&gt;Butler, Adam and Philbrick, Mike and Gordillo, Rodrigo and Varadi, David, Adaptive Asset Allocation: A Primer&lt;/a&gt;. &lt;a href=&quot;#fnref:18&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content><author><name>Roman R.</name></author><category term="covariance matrix" /><summary type="html">Continuing this series on covariance matrix forecasting (c.f. here and there for the previous posts), I will now describe a relatively recent1 data-driven, model-free, way to [forecast] covariance [and correlation] matrices of time-varying systems2 rooted in random matrix theory. This method - introduced in Bongiorno et al.2 and called Average Oracle - consists in replacing the eigenvalues of a (noisy) estimate of a time-varying covariance matrix by time-independent eigenvalues that encode the average influence of the future on present eigenvalues2. In this blog post, I will describe that method and, as now usual in this series, I will illustrate its empirical performances in the context of monthly covariance matrix forecasting for a multi-asset class ETF portfolio. Mathematical preliminaries Some of these sub-sections contain reminders from a previous blog post. Dynamic covariance and correlation matrices Let $n$ be the number of assets in a universe of assets and $r_t \in \mathbb{R}^n$ be the vector of the (logarithmic) return process of these assets over a time period $t$ (a day, a week, a month…) over which their mean return vector $\mu_t \in \mathbb{R}^n$ is assumed to be null. Then: The asset covariance matrix $\Sigma_t \in \mathcal{M}(\mathbb{R}^{n \times n})$ over the time period $t$ is defined as $\Sigma_t = \mathbb{E} \left[ r_t r_t {}^t \right]$. That matrix is called3 the population (or true) covariance matrix of the asset returns over the time period $t$. The asset correlation matrix $C_t \in \mathcal{M}(\mathbb{R}^{n \times n})$ over the time period $t$ is defined as the correlation matrix $ C_t = V_t^{-1} \Sigma_t V_t^{-1} $ associated to the covariance matrix $\Sigma_t$, where $V_t \in \mathcal{M}(\mathbb{R}^{n \times n})$ is the diagonal matrix of the asset standard deviations. Let there now be $T$ time periods $t = 1..T$. 
Then: The averaged4 covariance matrix $\Sigma_{1:T} \in \mathcal{M}(\mathbb{R}^{n \times n})$ over the $T$ time periods $t = 1..T$ is defined as $ \Sigma_{1:T} = \frac{1}{T} \sum_{t=1}^{T} \Sigma_{t} $. In case the return process is time-invariant, the averaged covariance matrix $\Sigma_{1:T}$ is equal to the (constant) population covariance matrix $\left( \Sigma = \right) \Sigma_t, t = 1..T$. The averaged4 correlation matrix $C_{1:T} \in \mathcal{M}(\mathbb{R}^{n \times n})$ over the $T$ time periods $t = 1..T$ is defined as $ C_{1:T} = \frac{1}{T} \sum_{t=1}^{T} C_{t} $. In case the return process is time-invariant, the averaged correlation matrix $C_{1:T}$ is equal to the (constant) correlation matrix $\left( C = \right) C_t, t = 1..T$. The pseudo-averaged5 correlation matrix $C_{p, 1:T} \in \mathcal{M}(\mathbb{R}^{n \times n})$ over the $T$ time periods $t = 1..T$ is defined as the correlation matrix associated to the averaged covariance matrix $\Sigma_{1:T}$. In case the return process is time-invariant, the pseudo-averaged correlation matrix $C_{p, 1:T} $ is equal to the averaged correlation matrix $C_{1:T}$, but in general, due to time-varying asset standard deviations, the two correlation matrices are different. Dynamic covariance and correlation matrices sample estimators In practice, the asset return process $r_t$ is usually not known and the only available information is the vectors of realized asset returns $ \tilde{r}_1,…, \tilde{r}_T \in \mathbb{R}^n$ for $T$ time periods. From each of these vectors: The classical way to estimate the covariances is to compute the empirical (or sample) covariance matrix thanks to the Pearson estimator3 defined as $\tilde{\Sigma}_t = \tilde{r}_t \tilde{r}_t {}^t $ over each time period $t = 1..T$. 
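To make the estimators above concrete, here is a minimal Python sketch (numpy assumed available; the returns are simulated toy data, not taken from this post): it builds the per-period covariance proxies as outer products of realized returns and averages them into the averaged Pearson estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 4, 60  # n assets, T time periods
returns = rng.standard_normal((T, n)) * 0.01  # simulated realized returns

# Per-period covariance proxy: outer product of realized returns (mean assumed null)
proxies = np.array([np.outer(r, r) for r in returns])

# Averaged Pearson estimator over the T time periods
sigma_avg = proxies.mean(axis=0)

# Each one-period proxy has rank 1, hence is a very noisy estimate on its own
print(np.linalg.matrix_rank(proxies[0]))  # 1
print(sigma_avg.shape)  # (4, 4)
```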
Here, the outer product of the realized asset returns $ \tilde{r}_t \tilde{r}_t {}^t $ over the time period $t$ is called a covariance estimate $\tilde{\Sigma}_t$ - or covariance proxy6 - for the (unobserved) asset returns covariance matrix over that time period. The empirical (or sample) correlation matrix over each time period $t = 1..T$ is defined as the correlation matrix $ \tilde{C}_t $ associated to the covariance matrix $\tilde{\Sigma}_t$. The empirical averaged covariance matrix over the $T$ time periods $t = 1..T$ is typically estimated from the realized asset returns $ \tilde{r}_1,…, \tilde{r}_T \in \mathbb{R}^n$ thanks to the averaged Pearson estimator defined as $\tilde{\Sigma}_{1:T} = \frac{1}{T} \sum_{t=1}^{T} \tilde{r}_t \tilde{r}_t {}^t $. The empirical averaged correlation matrix over the $T$ time periods $t = 1..T$ is defined as $ \tilde{C}_{1:T} = \frac{1}{T} \sum_{t=1}^{T} \tilde{C}_{t} $. The empirical pseudo-averaged correlation matrix over the $T$ time periods $t = 1..T$ is defined as the correlation matrix $ \tilde{C}_{p, 1:T}$ associated to the empirical averaged covariance matrix $\tilde{\Sigma}_{1:T}$. Covariance and correlation matrices rotationally invariant estimators Rotationally invariant estimators As already mentioned in a previous blog post on correlation matrices denoising, the estimation of empirical covariance and correlation matrices in finance is affected by noise, in the form of measurement error, due in part to the short length of the time series of asset returns typically used in their computation. Indeed: Constructing a well-diversified portfolio requires many assets7, that is, a big $n$. In contrast, rapid shifts in financial market dependencies can only be captured by short calibration windows7 for estimating asset correlations, that is, a small $T$. 
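The difference between the averaged and the pseudo-averaged correlation matrices can also be illustrated numerically; in this toy Python sketch (numpy assumed, hypothetical numbers), the two periods share the same correlation but have different volatilities, so the two notions disagree:

```python
import numpy as np

def corr_from_cov(sigma):
    # Correlation matrix associated to a covariance matrix: C = V^-1 Sigma V^-1
    v = np.sqrt(np.diag(sigma))
    return sigma / np.outer(v, v)

# Two periods with identical correlation (0.5) but time-varying volatilities
c = np.array([[1.0, 0.5], [0.5, 1.0]])
sigma1 = np.diag([1.0, 1.0]) @ c @ np.diag([1.0, 1.0])
sigma2 = np.diag([2.0, 0.5]) @ c @ np.diag([2.0, 0.5])

# Averaged correlation matrix: mean of the per-period correlation matrices
c_avg = (corr_from_cov(sigma1) + corr_from_cov(sigma2)) / 2
# Pseudo-averaged correlation matrix: correlation of the averaged covariance matrix
c_pseudo = corr_from_cov((sigma1 + sigma2) / 2)

print(c_avg[0, 1])     # 0.5
print(c_pseudo[0, 1])  # about 0.4: different as soon as volatilities vary over time
```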
This situation leads to an aspect ratio $q = \frac{n}{T}$ of empirical matrices either close to 1 - or even worse, much greater than 1 -, which is catastrophic from an estimation perspective, c.f. the above blog post and references therein. Fortunately, numerous techniques have been developed to improve the estimation of noisy covariance [or correlation] matrices7. Several of these techniques, like the eigenvalue clipping method8, involve a specific class of matrix estimators - known as Rotationally Invariant Estimators (RIE) or Orthogonally Invariant Estimators (OIE) - that leaves the eigenvectors of the empirical matrices untouched while altering their eigenvalues. In other words, an RIE estimator $\Xi \left( \tilde{\Sigma}_t \right) \in \mathcal{M}(\mathbb{R}^{n \times n})$ of a true covariance or correlation matrix $\Sigma_t \in \mathcal{M}(\mathbb{R}^{n \times n})$ obtained from its empirical counterpart $\tilde{\Sigma}_t \in \mathcal{M}(\mathbb{R}^{n \times n})$ has the general form \[\Xi \left( \tilde{\Sigma}_t \right) = \tilde{V}_t \Lambda_t \tilde{V}_t {}^t\], where: $\Lambda_t \in \mathcal{M}(\mathbb{R}^{n \times n})$ is a diagonal matrix of well-chosen2 eigenvalues. $\tilde{V}_t \in \mathcal{M}(\mathbb{R}^{n \times n})$ is the matrix of eigenvectors of $\tilde{\Sigma}_t$, defined through the spectral decomposition $ \tilde{\Sigma}_t = \tilde{V}_t \tilde{\Lambda}_t \tilde{V}_t {}^t $ with $ \tilde{\Lambda}_t \in \mathcal{M}(\mathbb{R}^{n \times n})$ the diagonal matrix of eigenvalues of $\tilde{\Sigma}_t$. Bun et al.3 explains the underlying rationale as follows: The true matrix [$\Sigma_t$] is unknown and we do not have any particular insights on its components (the eigenvectors). Therefore we would like our estimator [$\Xi \left( \tilde{\Sigma}_t \right)$] to be constructed in a rotationally invariant way from the noisy observation [$\tilde{\Sigma}_t$] that we have. 
In simple terms, this means that there is no privileged direction in the $n$-dimensional space that would allow one to bias the eigenvectors of the estimator [$\Xi \left( \tilde{\Sigma}_t \right)$] in some special directions. More formally, the estimator construction must obey: $ \Omega \Xi \left( \tilde{\Sigma}_t \right) \Omega {}^t$ $=$ $\Xi \left( \Omega \tilde{\Sigma}_t \Omega {}^t \right)$ for any rotation matrix $\Omega \in \mathcal{M}(\mathbb{R}^{n \times n})$. Any estimator satisfying [that equation] will be referred to as a Rotational Invariant Estimator (RIE). In this case, it turns out that the eigenvectors of the estimator [$\Xi \left( \tilde{\Sigma}_t \right)$] have to be the same as those of the noisy matrix [$\tilde{\Sigma}_t$]. Rotationally invariant oracle estimator Bun et al.3 shows that the optimal RIE estimator of the unknown matrix $\Sigma_t$ in terms of Frobenius norm is the RIE estimator whose diagonal matrix of eigenvalues $\Lambda_O$ satisfies \[\Lambda_O = \text{diag} \left( \tilde{V}_t {}^t \Sigma_t \tilde{V}_t \right)\] That estimator is sometimes called the oracle estimator because it depends explicitly on the knowledge of the true signal [$\Sigma_t$]3 and so is not directly usable in practice. Remarkably, [though], asymptotically optimal RIEs that converge to the oracle estimator can be obtained without the knowledge of the true covariance; however, such estimators require that: i) the ground truth does not change, ii) the data matrix is very large, and iii) the data has at least finite fourth moments2. Those conditions, and especially the first one, are definitely not satisfied by asset returns, which leads to suboptimal estimators in practice… As a side note, and maybe contrary to intuition, the optimal eigenvalues $\Lambda_O$ are NOT equal to the eigenvalues of $\Sigma_t$, because this would result in a spectrum that is too wide3.
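As an illustration of the oracle formula, the following Python sketch (numpy assumed; the "true" matrix is simulated, which is precisely what makes the oracle unusable on real data) keeps the empirical eigenvectors and replaces the empirical eigenvalues by the oracle ones:

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 5, 20

# A simulated "true" covariance matrix and a noisy empirical counterpart
a = rng.standard_normal((n, 2 * n))
sigma_true = a @ a.T / (2 * n)
r = rng.multivariate_normal(np.zeros(n), sigma_true, size=T)
sigma_emp = r.T @ r / T

# Eigenvectors of the empirical matrix, left untouched by any RIE
_, v_emp = np.linalg.eigh(sigma_emp)

# Oracle eigenvalues: diagonal of V~^t Sigma_true V~
lambda_oracle = np.diag(v_emp.T @ sigma_true @ v_emp)

# Oracle RIE: same eigenvectors as the empirical matrix, oracle eigenvalues
xi = v_emp @ np.diag(lambda_oracle) @ v_emp.T

# The oracle RIE is at least as close to the truth (Frobenius norm) as the
# empirical matrix, which is itself an RIE with unmodified eigenvalues
print(np.linalg.norm(sigma_emp - sigma_true) >= np.linalg.norm(xi - sigma_true))  # True
```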
The Average Oracle covariance matrix forecasting method Forecasting formulas Let: $n$ be the number of assets in a universe of assets; $\tilde{r}_t \tilde{r}_t {}^t$, $t=1..T$ be the outer products of the observed asset returns over each of $T$ past periods; $1 \leq h_{in} \ll T$ be a chosen number of past periods; and $\mathcal{I}_{cal} = [t_{cal}, T - h_{next}]$ be a long calibration window, with $t_{cal}$ chosen so that $t_{cal} \geq h_{in} - 1$ and $| \mathcal{I}_{cal} | \gg h_{in}$. Asset returns averaged covariance matrix The Average Oracle covariance matrix forecasting model estimates the asset returns averaged covariance matrix $\hat{\Sigma}_{T+1:T+h_{next}}$ over the next $h_{next} \geq 1$ periods as follows2: Choose a number of random time periods $n_B \geq 1$ to generate. For $b = 1..n_B$ do Select uniformly at random with replacement a time period $t^{(b)} \in \mathcal{I}_{cal}$. Compute the “past” averaged asset returns covariance matrix $ \tilde{\Sigma}^{(b)}_{in} $ on the train window $\mathcal{I}^{(b)}_{in} = [t^{(b)} - h_{in} + 1, t^{(b)}]$, defined by $\tilde{\Sigma}^{(b)}_{in} = \tilde{\Sigma}_{t^{(b)} - h_{in} + 1:t^{(b)}} = \frac{1}{h_{in}} \sum_{t=t^{(b)} - h_{in} + 1}^{t^{(b)}} \tilde{r}_t \tilde{r}_t {}^t $ and its associated correlation matrix $\tilde{C}^{(b)}_{in} = \tilde{C}_{p, t^{(b)} - h_{in} + 1:t^{(b)}}$ whose spectral decomposition is given by $\tilde{C}^{(b)}_{in} = \tilde{V}^{(b)}_{in} \tilde{\Lambda}^{(b)}_{in} \tilde{V}^{(b)}_{in} {}^t $. Compute the “future” averaged asset returns covariance matrix $ \tilde{\Sigma}^{(b)}_{next} $ on the test window $\mathcal{I}^{(b)}_{next} = [t^{(b)} + 1, t^{(b)} + h_{next}]$, defined by $ \tilde{\Sigma}^{(b)}_{next} = \tilde{\Sigma}_{t^{(b)} + 1:t^{(b)} + h_{next}} = \frac{1}{h_{next}} \sum_{t=t^{(b)} + 1}^{t^{(b)} + h_{next}} \tilde{r}_t \tilde{r}_t {}^t $ and its associated correlation matrix $\tilde{C}^{(b)}_{next} = \tilde{C}_{p, t^{(b)} + 1:t^{(b)} + h_{next}}$. 
Compute the diagonal matrix of oracle eigenvalues $\tilde{\Lambda}_O^{(b)} = \text{diag} \left( \tilde{V}^{(b)}_{in} {}^t \tilde{C}^{(b)}_{next} \tilde{V}^{(b)}_{in} \right) $. Compute the diagonal matrix of Average Oracle eigenvalues $\tilde{\Lambda}_{AO} = \frac{1}{n_B} \sum_{b=1}^{n_B} \tilde{\Lambda}_O^{(b)} $. Compute the most recent “past” averaged asset returns covariance matrix $ \tilde{\Sigma}_{in} $ on the window $\mathcal{I}_{in} = [T - h_{in} + 1, T]$, defined by $ \tilde{\Sigma}_{in} = \tilde{\Sigma}_{T - h_{in} + 1:T} = \frac{1}{h_{in}} \sum_{t=T - h_{in} + 1}^{T} \tilde{r}_t \tilde{r}_t {}^t $ and its associated correlation matrix $\tilde{C}_{in} = \tilde{C}_{p, T - h_{in} + 1:T}$ whose spectral decomposition is given by $\tilde{C}_{in} = \tilde{V}_{in} \tilde{\Lambda}_{in} \tilde{V}_{in} {}^t $. Compute $\hat{\Sigma}_{T+1:T+h_{next}} = D_{in} \tilde{V}_{in} \tilde{\Lambda}_{AO} \tilde{V}_{in} {}^t D_{in} $, where $D_{in} \in \mathcal{M}(\mathbb{R}^{n \times n})$ is the diagonal matrix of the standard deviations $\sqrt{ \left( \tilde{\Sigma}_{in} \right)_{ii} }, i=1..n$. For more visual clarity, Figure 1 illustrates that process. Figure 1. Illustration of the Average Oracle eigenvalues computation process, which uses different windows included in a long calibration window. Source: Adapted from Bongiorno et al. Asset returns averaged and pseudo-averaged correlation matrix The Average Oracle covariance matrix forecasting model does not easily9 allow one to estimate the asset returns averaged correlation matrix $\hat{C}_{T+1:T+h_{next}}$, because it does not rely on the estimation of the individual covariance matrices $\hat{\Sigma}_{T+1}$, $…$, $\hat{\Sigma}_{T+h_{next}}$. The asset returns pseudo-averaged correlation matrix $\hat{C}_{p, T+1:T+h_{next}}$ over the next $h_{next}$ periods, though, corresponds to the correlation matrix associated to the averaged covariance matrix $\hat{\Sigma}_{T+1:T+h_{next}}$. 
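The forecasting procedure above can be condensed into a short Python sketch (numpy assumed; the simulated returns, the 0-based window bookkeeping and the final sorting of the averaged eigenvalues are implementation choices of this illustration, not prescriptions of the original paper):

```python
import numpy as np

def corr_from_cov(sigma):
    # Correlation matrix associated to a covariance matrix
    v = np.sqrt(np.diag(sigma))
    return sigma / np.outer(v, v)

def average_oracle_forecast(returns, h_in, h_next, n_b, seed=0):
    # Average Oracle forecast of the averaged covariance matrix over the
    # next h_next periods, from a (T, n) array of realized asset returns
    rng = np.random.default_rng(seed)
    T, n = returns.shape
    # Per-period covariance proxies: outer products of the realized returns
    proxies = np.einsum("ti,tj->tij", returns, returns)

    def avg_cov(start, end):
        # Averaged Pearson estimator on the periods start..end (inclusive)
        return proxies[start:end + 1].mean(axis=0)

    # Calibration window for the last index t of each train window
    lo, hi = h_in - 1, T - h_next - 1
    lambda_ao = np.zeros(n)
    for _ in range(n_b):
        t = rng.integers(lo, hi + 1)
        # "Past" averaged correlation matrix and its eigenvectors; eigh
        # returns eigenvalues in ascending order, a fixed ordering convention
        _, v_in = np.linalg.eigh(corr_from_cov(avg_cov(t - h_in + 1, t)))
        # "Future" averaged (pseudo-)correlation matrix
        c_next = corr_from_cov(avg_cov(t + 1, t + h_next))
        # Oracle eigenvalues for this random draw
        lambda_ao += np.diag(v_in.T @ c_next @ v_in)
    lambda_ao /= n_b
    # Re-sort the averaged eigenvalues, one of the fixes for an ordering
    # broken by finite-sample noise
    lambda_ao = np.sort(lambda_ao)

    # Most recent "past" window and final forecast
    sigma_in = avg_cov(T - h_in, T - 1)
    _, v_in = np.linalg.eigh(corr_from_cov(sigma_in))
    d_in = np.diag(np.sqrt(np.diag(sigma_in)))
    return d_in @ v_in @ np.diag(lambda_ao) @ v_in.T @ d_in

rng = np.random.default_rng(42)
returns = rng.standard_normal((500, 4)) * 0.01  # toy data
forecast = average_oracle_forecast(returns, h_in=60, h_next=20, n_b=200)
print(forecast.shape)  # (4, 4)
```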
Rationale The Average Oracle covariance matrix forecasting method captures the average transition from two consecutive time windows2 by averaging [oracle eigenvalues], rank-wise, over many randomly selected consecutive intervals taken from a long calibration window2. That covariance matrix forecasting method thus tackles the evolution of [asset returns] dependencies with a time-invariant eigenvalue cleaning scheme2. As noted in Bongiorno et al.2: This is a zeroth order approximation, as the fluctuations of the optimal eigenvalue matrix around $\tilde{\Lambda}_{AO}$ sometimes most probably contain valuable additional information (as may do those of the eigenvectors). Nevertheless, this approximation is a powerful filtering tool and is easily computed from data without any modeling assumptions about the underlying system. Performances From a practical perspective, Bongiorno et al.2 and Bongiorno and Challet10 empirically demonstrate that the Average Oracle covariance matrix forecasting method outperforms the current state-of-the-art (and complex) method, Dynamic Conditional Correlation coupled to Non-Linear Shrinkage (DCC+NLS)10, both in terms of Frobenius distance - as highlighted in Figure 2 - and in terms of four key portfolio metrics: Sharpe ratio, turnover, gross leverage, and diversification10. Figure 2. Average Frobenius distance between the forecasted and the out-of-sample covariance matrices of n = 100 U.S. stocks as a function of the number of past periods $h_{in}$. Source: Adapted from Bongiorno et al. These performances are commented on as follows in Bongiorno et al.2: The fact that the Average Oracle is a better estimator for time-evolving covariance matrices most often implies that the most recent information contained in the sample eigenvalues is less relevant (and more noisy) than the AO ones that focus on the average transition. 
Thus, the advantage of the Average Oracle is precisely that it captures some part of the average dynamics that is discarded by the assumption of a constant true covariance matrix [made in the DCC+NLS method]. Implementation details How to choose the number of past periods $h_{in}$? Through extensive simulations, Bongiorno et al.2 concludes that the Average Oracle eigenvalues $\tilde{\Lambda}_{AO}$ mainly depend on the number of assets $n$ and11 on the number of past periods $h_{in}$ over which to compute the averaged asset returns covariance matrix. A natural question - unfortunately neither answered in Bongiorno et al.2 nor in Bongiorno and Challet10 - is then how to choose the value of $h_{in}$ in order to obtain the best forecasting performances. One possible answer, which relies on the interpretation of the Average Oracle covariance matrix forecasting method as a covariance matrix cleaning scheme2, is to select $h_{in}$ so as to maximize the forecasting performances of a simple moving average covariance matrix forecasting model with a window size equal to $h_{in}$ for the considered value of $h_{next}$. How to choose the number of random time periods $n_B$? The number of random time periods $n_B$ must be high enough to ensure that the Average Oracle eigenvalues are stable enough - in particular when $h_{next}$ is small - because by reducing [$h_{next}$], the estimation becomes noisier and thus requires more train and test windows […] to yield average eigenvalues with the same level of precision2. Two examples12: Bongiorno et al.2 uses $n_B = 10 000$ together with $h_{in} = h_{next} = 252$. Bongiorno and Challet10 uses $n_B = 10 000$ together with $h_{in} \in \lbrace 240, 1200 \rbrace$ and $h_{next} \in \lbrace 5, 20 \rbrace$. 
To be noted, though, that depending on the length of the calibration window $\mathcal{I}_{cal}$ and/or on the number of assets $n$, the time periods do not need to be generated at random - they can perfectly well be generated deterministically so as to cover the whole calibration window. How to enforce a proper ordering of the Average Oracle eigenvalues? Bongiorno et al.2 stresses that the columns of the eigenvectors [$\tilde{V}^{(b)}_{in}$ and $\tilde{V}_{in}$] must always follow the same eigenvalue ordering convention2. Nevertheless, despite enforcing such a convention, the order of the resulting Average Oracle eigenvalues $\tilde{\Lambda}_{AO}$ is not necessarily preserved due to the finite size of the sample13. This may be an unwanted feature within a rotational invariant assumption13, since there is no reason a priori to expect that it is optimal to modify the order of the eigenvalues, that is to say, the variance associated with the principal components13. Bun et al.13 proposes two solutions to this problem: Sort the resulting eigenvalues Perform an isotonic regression on the resulting eigenvalues In the context of the cross-validated eigenvalues cleaning scheme described in Reigneron et al.14, the impact of using an isotonic regression is depicted in Figure 3. Figure 3. Raw, cross-validated and isotonic eigenvalues as a function of in-sample eigenvalues. Source: Reigneron et al. Implementation in Portfolio Optimizer Portfolio Optimizer implements the Average Oracle covariance and correlation matrix forecasting models through the endpoints /assets/covariance/matrix/forecast/average-oracle and /assets/correlation/matrix/forecast/average-oracle. These endpoints support the two covariance proxies below: Squared (close-to-close) returns Demeaned squared (close-to-close) returns These endpoints also: Implement an isotonic regression correction step for the Average Oracle eigenvalues. 
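As a sketch of such an isotonic regression correction, here is a hand-rolled pool-adjacent-violators pass in Python (numpy assumed; scikit-learn's IsotonicRegression would be a production alternative), applied to hypothetical, noise-disordered eigenvalues:

```python
import numpy as np

def isotonic_pava(y):
    # Pool Adjacent Violators: nearest (least squares) non-decreasing sequence
    levels = []  # blocks of [pooled value, weight]
    for v in np.asarray(y, dtype=float):
        levels.append([v, 1.0])
        # Merge adjacent blocks while the non-decreasing order is violated
        while len(levels) > 1 and levels[-2][0] > levels[-1][0]:
            v2, w2 = levels.pop()
            v1, w1 = levels.pop()
            levels.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2])
    out = []
    for v, w in levels:
        out.extend([v] * int(w))
    return np.array(out)

# Hypothetical Average Oracle eigenvalues whose order was broken by noise
raw = np.array([0.2, 0.5, 0.4, 1.1, 2.8])
print(isotonic_pava(raw))  # non-decreasing: 0.2, 0.45, 0.45, 1.1, 2.8
```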
Allow to automatically determine the number of past periods $h_{in}$, using a proprietary procedure. Allow to either generate a given number of time periods $n_B$ uniformly at random within the calibration window $\mathcal{I}_{cal}$ or use all the time periods available within that window. Example of usage - Covariance matrix forecasting at monthly level for a portfolio of various ETFs As an example of usage, I propose to evaluate the empirical performances of the Average Oracle covariance matrix forecasting model within the framework of the previous blog post, whose aim is to forecast monthly covariance and correlation matrices for a portfolio of 10 ETFs representative15 of various asset classes: U.S. stocks (SPY ETF) European stocks (EZU ETF) Japanese stocks (EWJ ETF) Emerging markets stocks (EEM ETF) U.S. REITs (VNQ ETF) International REITs (RWX ETF) U.S. 7-10 year Treasuries (IEF ETF) U.S. 20+ year Treasuries (TLT ETF) Commodities (DBC ETF) Gold (GLD ETF) Results - Covariance matrix forecasting Results over the period 31st January 2008 - 31st July 202316 for covariance matrices are the following17: Covariance matrix model Covariance matrix MSE SMA, window size of all the previous months (historical average model) 9.59 $10^{-6}$ SMA, window size of the previous year 9.08 $10^{-6}$ Average Oracle, optimal18 $h_{in}$ 6.77 $10^{-6}$ EWMA, optimal18 $\lambda$ 6.52 $10^{-6}$ IEWMA, optimal18 $\left(\lambda_{vol},\lambda_{cor}\right)$ 6.16 $10^{-6}$ SMA, window size of the previous month (random walk model) 6.06 $10^{-6}$ Within this specific evaluation framework, the Average Oracle covariance matrix forecasting model19 unfortunately does not seem to exhibit improved performances vs. much simpler models, like the EWMA covariance matrix forecasting model. 
Results - Correlation matrix forecasting Results over the period 31st January 2008 - 31st July 202316 for the correlation matrices associated with the covariance matrices of the previous sub-section are the following17: Covariance matrix model Correlation matrix MSE SMA, window size of the previous month (random walk model) 8.19 SMA, window size of all the previous months (historical average model) 8.10 Average Oracle, optimal18 $h_{in}$ 6.59 SMA, window size of the previous year 6.50 EWMA, optimal18 $\lambda$ 5.87 IEWMA, optimal18 $\left(\lambda_{vol},\lambda_{cor}\right)$ 5.70 Here again, the Average Oracle model19 does not seem to particularly shine v.s. simpler models… Comments Results from the previous sub-sections seem contradictory to those obtained in Bongiorno et al.2. However, as noted in Tan and Zohren1, this is probably due to the fact that at relatively small dimensions, a covariance estimator may benefit more from picking up more recent time series variations1 like the EWMA and IEWMA covariance estimators. In other words, the Average Oracle is certainly a good choice when the dimension of the problem becomes very large3 but is probably not competitive otherwise. Conclusion The Average Oracle is a covariance and correlation matrix forecasting method very different in spirit from the moving average-based methods already described in the previous posts of this series. Unfortunately, the empirical performances of the Average Oracle method in terms of covariance and correlation matrix forecasting do not seem to improve over the much simpler EWMA and IEWMA methods, at least under the specific asset allocation context described in the previous section. To be noted, though, that this conclusion might be different with other Oracle-based covariance matrix forecasting methods, like those described in Tan and Zohren1 or in Reigneron et al.14… Anyway, feel free to connect with me on LinkedIn or to follow me on Twitter. 
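The isotonic regression correction of the Average Oracle eigenvalues mentioned earlier can be sketched with a generic pool-adjacent-violators pass - a minimal least-squares sketch of the idea, not the actual Portfolio Optimizer implementation:

```python
def isotonic_non_increasing(values):
    """Least-squares fit of the closest non-increasing sequence to `values`,
    via the pool-adjacent-violators algorithm - e.g. to restore a proper
    ordering of averaged eigenvalues without simply sorting them."""
    blocks = []  # list of [block mean, block size]
    for v in values:
        blocks.append([float(v), 1])
        # merge adjacent blocks while the sequence increases (a violation)
        while len(blocks) > 1 and blocks[-1][0] > blocks[-2][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    fitted = []
    for m, w in blocks:
        fitted.extend([m] * w)
    return fitted
```

For example, a mildly disordered sequence like [3, 1, 2, 0] is fitted to [3, 1.5, 1.5, 0], whereas an already non-increasing sequence is left unchanged.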
See Vincent Tan, Stefan Zohren, Estimation of Large Financial Covariances: A Cross-Validation Approach, The Journal of Portfolio Management February 2025, 51 (4) 83-95. 
See Bongiorno, C., Challet, D. and Loeper, G., Filtering time-dependent covariance matrices using time-independent eigenvalues. J. Stat. Mech.: Theory and Experiment, 2023, 2, 023402. 
See J. Bun, R. Allez, J.-P. Bouchaud and M. Potters, Rotational Invariant Estimator for General Noisy Matrices, IEEE Transactions on Information Theory, vol. 62, no. 12, pp. 7475-7490, Dec. 2016. 
See Gianluca De Nard, Robert F. Engle, Olivier Ledoit, Michael Wolf, Large dynamic covariance matrices: Enhancements based on intraday data, Journal of Banking &amp;amp; Finance, Volume 138, 2022, 106426. 
For the lack of a better term. 
See Patton, A.J., Sheppard, K. (2009). Evaluating Volatility and Correlation Forecasts. In: Mikosch, T., Kreiß, JP., Davis, R., Andersen, T. (eds) Handbook of Financial Time Series. Springer, Berlin, Heidelberg. 
See Christian Bongiorno and Lamia Lamrani, Quantifying the information lost in optimal covariance matrix cleaning, Physica A: Statistical Mechanics and its Applications, 657, 130225, 2025. 
See Laurent Laloux, Pierre Cizeau, Jean-Philippe Bouchaud, and Marc Potters, Noise Dressing of Financial Correlation Matrices, Phys. Rev. Lett. 83, 1467. 
At least directly; by varying $h_{next}$, it is possible - but cumbersome - to compute the individual covariance matrices $\hat{\Sigma}_{T+1}$, $…$, $\hat{\Sigma}_{T+h_{next}}$ and deduce the averaged correlation matrix $\hat{C}_{T+1:T+h_{next}}$. 
See Bongiorno, C., &amp;amp; Challet, D. (2024). Covariance matrix filtering and portfolio optimisation: the average oracle vs non-linear shrinkage and all the variants of DCC-NLS. Quantitative Finance, 24(9), 1227–1234. 
And interestingly, not so much on the value of $h_{next}$. 
In both cases, the 10 000 time periods random selection also incorporates a random selection of U.S. stocks within each time window. 
See Bun, J., Bouchaud, J. P., &amp;amp; Potters, M., Overlaps between eigenvectors of correlated random matrices. Physical Review E, 98(5), 052145 (2018). 
See Pierre-Alain Reigneron, Vincent Nguyen, Stefano Ciliberti, Philip Seager, Jean-Philippe Bouchaud, Agnostic Allocation Portfolios: A Sweet Spot in the Risk-Based Jungle?, The Journal of Portfolio Management March 2020, 46 (4) 22-38. 
These ETFs are used in the Adaptive Asset Allocation strategy from ReSolve Asset Management, described in the paper Adaptive Asset Allocation: A Primer20. 
(Adjusted) daily prices have been retrieved using Tiingo. 
Using the outer product of asset returns - assuming a mean return of 0 - as covariance proxy, and using an expanding historical window of asset returns. 
As determined by Portfolio Optimizer at the end of every month using all the available asset returns history up to that point in time; thus, there is no look-ahead bias. 
Results using a couple of fixed values for $h_{in}$ (21,…) are worse and so are not presented here. 
See Butler, Adam and Philbrick, Mike and Gordillo, Rodrigo and Varadi, David, Adaptive Asset Allocation: A Primer.</summary></entry><entry><title type="html">Value at Risk: Univariate Estimation Methods</title><link href="https://portfoliooptimizer.io/blog/value-at-risk-univariate-estimation-methods/" rel="alternate" type="text/html" title="Value at Risk: Univariate Estimation Methods" /><published>2025-10-28T00:00:00-05:00</published><updated>2025-10-28T00:00:00-05:00</updated><id>https://portfoliooptimizer.io/blog/value-at-risk-univariate-estimation-methods</id><content type="html" xml:base="https://portfoliooptimizer.io/blog/value-at-risk-univariate-estimation-methods/">&lt;p&gt;Value-at-Risk (&lt;em&gt;VaR&lt;/em&gt;) is &lt;em&gt;one of the most commonly used risk measures in the financial industry&lt;/em&gt;&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; in part thanks to its simplicity - because &lt;em&gt;VaR reduces the market risk associated with any portfolio to just one number&lt;/em&gt;&lt;sup id=&quot;fnref:15&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:15&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; - and in part due to regulatory 
requirements (&lt;a href=&quot;https://en.wikipedia.org/wiki/Basel_Accords&quot;&gt;Basel market risk frameworks&lt;/a&gt;&lt;sup id=&quot;fnref:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:9&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;&lt;sup id=&quot;fnref:12&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:12&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;, &lt;a href=&quot;https://www.sec.gov/resources-small-businesses/small-business-compliance-guides/use-derivatives-registered-investment-companies-business-development-companies-small-entity&quot;&gt;SEC Rule 18f-4&lt;/a&gt;&lt;sup id=&quot;fnref:14&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:14&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;…).&lt;/p&gt;

&lt;p&gt;Nevertheless, when it comes to actual computations, the above definition is &lt;em&gt;by no means constructive&lt;/em&gt;&lt;sup id=&quot;fnref:1:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; and accurately estimating VaR is &lt;em&gt;a very challenging statistical problem&lt;/em&gt;&lt;sup id=&quot;fnref:15:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:15&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; for which several methods have been developed.&lt;/p&gt;

&lt;p&gt;In this blog post, I will describe some of the most well-known univariate VaR estimation methods, ranging from non-parametric methods based on empirical quantiles to semi-parametric methods involving &lt;a href=&quot;https://en.wikipedia.org/wiki/Kernel_smoother&quot;&gt;kernel smoothing&lt;/a&gt; 
or &lt;a href=&quot;https://en.wikipedia.org/wiki/Extreme_value_theory&quot;&gt;extreme value theory&lt;/a&gt; and to parametric methods relying on distributional assumptions.&lt;/p&gt;

&lt;h2 id=&quot;value-at-risk&quot;&gt;Value-at-Risk&lt;/h2&gt;

&lt;h3 id=&quot;definition&quot;&gt;Definition&lt;/h3&gt;

&lt;p&gt;The Value-at-Risk of a portfolio of financial instruments corresponds to &lt;em&gt;the maximum potential change in value of [that portfolio] with a given probability over a certain horizon&lt;/em&gt;&lt;sup id=&quot;fnref:15:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:15&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;More formally, the Value-at-Risk $VaR_{\alpha}$ of a portfolio over a time horizon $T$ (1 day, 10 days, 20 days…) and at a confidence level $\alpha \in ]0,1[$ (95%, 97.5%, 99%…) can be defined&lt;sup id=&quot;fnref:82&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:82&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt; 
as the opposite&lt;sup id=&quot;fnref:20&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:20&quot; class=&quot;footnote&quot;&gt;7&lt;/a&gt;&lt;/sup&gt; of the $1 - \alpha$ quantile of the portfolio return distribution over the time horizon $T$&lt;/p&gt;

\[\text{VaR}_{\alpha} = - \inf_{x} \left\{x \in \mathbb{R}, P(X \leq x) \geq 1 - \alpha \right\}\]

&lt;p&gt;, where $X$ is a random variable representing the portfolio return over the time horizon $T$.&lt;/p&gt;

&lt;p&gt;This formula is also equivalent&lt;sup id=&quot;fnref:18&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:18&quot; class=&quot;footnote&quot;&gt;8&lt;/a&gt;&lt;/sup&gt; to&lt;/p&gt;

\[\text{VaR}_{\alpha} = - F_X^{-1}(1 - \alpha)\]

&lt;p&gt;, where $F_X^{-1}$ is the inverse cumulative distribution function, also called the &lt;a href=&quot;https://en.wikipedia.org/wiki/Quantile_function&quot;&gt;quantile function&lt;/a&gt;, of the random variable $X$.&lt;/p&gt;
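&lt;p&gt;As a quick illustration of this quantile-function form of the definition, here is a minimal Python sketch for a portfolio whose returns are assumed Gaussian - an assumption made purely for illustration, with hypothetical parameter values:&lt;/p&gt;

```python
from statistics import NormalDist

def gaussian_var(mu, sigma, alpha):
    """VaR at confidence level alpha, as the opposite of the (1 - alpha)
    quantile of a Gaussian return distribution with mean mu and volatility sigma."""
    return -NormalDist(mu, sigma).inv_cdf(1 - alpha)

# Hypothetical portfolio: 0.05% mean daily return, 1% daily volatility
var_95 = gaussian_var(0.0005, 0.01, 0.95)  # roughly a 1.6% daily VaR
```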

&lt;p&gt;Graphically, this definition is illustrated in:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Figure 1, for a continuous portfolio return distribution at a generic confidence level $\alpha$% and over a generic horizon.&lt;/p&gt;

    &lt;figure&gt;
  &lt;a href=&quot;/assets/images/blog/univariate-value-at-risk-estimation-methods-definition-yamai.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/univariate-value-at-risk-estimation-methods-definition-yamai-small.png&quot; alt=&quot;Graphical illustration of a portfolio VaR as a quantile of its continuous return distribution. Source: Adapted from Yamai and Yoshiba.&quot; /&gt;&lt;/a&gt;
  &lt;figcaption&gt;Figure 1. Graphical illustration of a portfolio VaR as a quantile of its continuous return distribution. Source: Adapted from Yamai and Yoshiba.&lt;/figcaption&gt;
&lt;/figure&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Figure 2, for a discrete portfolio return distribution at a confidence level $\alpha$ = 99% and over a 1-month horizon, which is commented as follows in Jorion&lt;sup id=&quot;fnref:16&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:16&quot; class=&quot;footnote&quot;&gt;9&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

    &lt;blockquote&gt;
      &lt;p&gt;We need to find the loss that will not be exceeded in 99% of cases, or such that 1% of [the 624] observations - that is, 6 out of 624 occurrences - are lower. From Figure [2], this number is about -3.6%, [resulting in a portfolio VaR of 3.6%].&lt;/p&gt;
    &lt;/blockquote&gt;

    &lt;figure&gt;
  &lt;a href=&quot;/assets/images/blog/univariate-value-at-risk-estimation-methods-definition-jorion.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/univariate-value-at-risk-estimation-methods-definition-jorion-small.png&quot; alt=&quot;Graphical illustration of a portfolio VaR as a quantile of its discrete monthly return distribution, $\alpha$ = 99%. Source: Jorion.&quot; /&gt;&lt;/a&gt;
  &lt;figcaption&gt;Figure 2. Graphical illustration of a portfolio VaR as a quantile of its discrete monthly return distribution, $\alpha$ = 99%. Source: Jorion.&lt;/figcaption&gt;
&lt;/figure&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;univariate-vs-multivariate-value-at-risk&quot;&gt;Univariate v.s. multivariate Value-at-Risk&lt;/h3&gt;

&lt;p&gt;A portfolio can be considered as both:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;An asset in itself, with its own return distribution.&lt;/li&gt;
  &lt;li&gt;A weighted collection of individual assets, each with their own return distribution.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This raises the question of whether &lt;em&gt;first to aggregate profit and loss data and proceed with a univariate [VaR] model for the aggregate, or to start with disaggregate data&lt;/em&gt;&lt;sup id=&quot;fnref:85&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:85&quot; class=&quot;footnote&quot;&gt;10&lt;/a&gt;&lt;/sup&gt; and proceed with a multivariate VaR model from the disaggregated data.&lt;/p&gt;

&lt;p&gt;In this blog post, I will only discuss univariate&lt;sup id=&quot;fnref:89&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:89&quot; class=&quot;footnote&quot;&gt;11&lt;/a&gt;&lt;/sup&gt; VaR models - originally suggested by Zangari&lt;sup id=&quot;fnref:87&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:87&quot; class=&quot;footnote&quot;&gt;12&lt;/a&gt;&lt;/sup&gt; as &lt;em&gt;simple and effective approach[es] for calculating Value-at-Risk&lt;/em&gt;&lt;sup id=&quot;fnref:87:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:87&quot; class=&quot;footnote&quot;&gt;12&lt;/a&gt;&lt;/sup&gt; - in which portfolio returns are considered as a univariate time series &lt;em&gt;without reference to the portfolio constituents&lt;/em&gt;&lt;sup id=&quot;fnref:22&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:22&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Indeed, &lt;em&gt;since the goal of VaR is to measure the market risk of a portfolio, it seems reasonable to model the portfolio return series directly&lt;/em&gt;&lt;sup id=&quot;fnref:87:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:87&quot; class=&quot;footnote&quot;&gt;12&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h3 id=&quot;arithmetic-returns-vs-logarithmic-returns-in-value-at-risk-calculations&quot;&gt;Arithmetic returns v.s. logarithmic returns in Value-at-Risk calculations&lt;/h3&gt;

&lt;p&gt;In VaR calculations, it is usually preferred, &lt;em&gt;for a variety of reasons, to work with logarithmic returns rather than arithmetic (simple, linear) ones&lt;/em&gt;&lt;sup id=&quot;fnref:22:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:22&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt;, c.f. Ballotta&lt;sup id=&quot;fnref:22:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:22&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt; and Jorion&lt;sup id=&quot;fnref:16:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:16&quot; class=&quot;footnote&quot;&gt;9&lt;/a&gt;&lt;/sup&gt; for more details.&lt;/p&gt;

&lt;p&gt;In that case, though, because &lt;em&gt;investors are primarily interested in simple returns&lt;/em&gt;&lt;sup id=&quot;fnref:68&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:68&quot; class=&quot;footnote&quot;&gt;14&lt;/a&gt;&lt;/sup&gt;, the logarithmic VaR $\text{VaR}_{\alpha}^{(l)} $ needs to be converted into an arithmetic VaR $\text{VaR}_{\alpha}^{(a)} $.&lt;/p&gt;

&lt;p&gt;Thanks to the definition of VaR as a quantile of the portfolio return distribution and the relationship between arithmetic and logarithmic returns, this is easily done through the formula&lt;/p&gt;

\[\text{VaR}_{\alpha}^{(a)}  = 1 - \exp \left( - \text{VaR}_{\alpha}^{(l)} \right)\]

&lt;p&gt;While not frequently mentioned in the literature&lt;sup id=&quot;fnref:17&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:17&quot; class=&quot;footnote&quot;&gt;15&lt;/a&gt;&lt;/sup&gt;, it is important to be aware of this subtlety.&lt;/p&gt;
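&lt;p&gt;This conversion, which follows from the relationship $r^{(a)} = e^{r^{(l)}} - 1$ between arithmetic and logarithmic returns, can be sketched as follows (the 5% figure below is purely hypothetical):&lt;/p&gt;

```python
import math

def log_to_arithmetic_var(var_log):
    """Convert a VaR on logarithmic returns into a VaR on arithmetic returns,
    using the relationship r_arithmetic = exp(r_logarithmic) - 1."""
    return 1.0 - math.exp(-var_log)

# A hypothetical 5% logarithmic VaR maps to a slightly lower arithmetic VaR
var_a = log_to_arithmetic_var(0.05)  # about 4.88%
```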

&lt;h3 id=&quot;history-of-value-at-risk&quot;&gt;History of Value-at-Risk&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Searching for the best means to represent the risk exposure of a financial institution’s trading portfolio in a single number&lt;/em&gt;&lt;sup id=&quot;fnref:28&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:28&quot; class=&quot;footnote&quot;&gt;16&lt;/a&gt;&lt;/sup&gt; is a quest that &lt;em&gt;folklore attributes the inception of to Dennis Weatherstone at J.P. Morgan [in the late 1980s], who was looking for a way to convey meaningful 
risk exposure information to the financial institution’s board without the need for significant technical expertise on the part of the board members&lt;/em&gt;&lt;sup id=&quot;fnref:28:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:28&quot; class=&quot;footnote&quot;&gt;16&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;It is then Till Guldimann, head of global research at J.P. Morgan at that time, who designed what would come to be known as the J.P. Morgan’s daily VaR report&lt;sup id=&quot;fnref:46&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:46&quot; class=&quot;footnote&quot;&gt;17&lt;/a&gt;&lt;/sup&gt; and who thus &lt;em&gt;can be viewed as the creator of the term Value-at-Risk&lt;/em&gt;&lt;sup id=&quot;fnref:16:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:16&quot; class=&quot;footnote&quot;&gt;9&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;The interested reader is referred to Holton&lt;sup id=&quot;fnref:21&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:21&quot; class=&quot;footnote&quot;&gt;18&lt;/a&gt;&lt;/sup&gt; for an historical perspective on Value-at-Risk, in which the origins of VaR as a measure of risk are even &lt;em&gt;traced back as far as 1922 to capital requirements the New York Stock Exchange imposed on member firms&lt;/em&gt;&lt;sup id=&quot;fnref:21:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:21&quot; class=&quot;footnote&quot;&gt;18&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h2 id=&quot;value-at-risk-estimation&quot;&gt;Value-at-Risk estimation&lt;/h2&gt;

&lt;p&gt;When a sample of portfolio returns over a given time horizon $r_1,…,r_n$ is available - like in ex post analysis -, computing the Value-at-Risk of that portfolio over the same horizon&lt;sup id=&quot;fnref:27&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:27&quot; class=&quot;footnote&quot;&gt;19&lt;/a&gt;&lt;/sup&gt; at a confidence level $\alpha \in ]0,1[$ is a textbook example of VaR calculation, as 
the opposite of the $(1 - \alpha)$% quantile of the empirical return distribution $r_1,…,r_n$.&lt;/p&gt;

&lt;p&gt;Problem is, the discrete nature of the extreme returns of interest makes it difficult to accurately compute that quantile, as explained in Danielsson and de Vries&lt;sup id=&quot;fnref:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;20&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;In the interior, the empirical sampling distribution is very dense, with adjacent observations very close to each other. As a result the sampling distribution is very smooth in the interior and is the mean squared error consistent estimate of the true distribution.&lt;/p&gt;

  &lt;p&gt;The closer one gets to the extremes, the longer the interval between adjacent returns becomes. This can be seen in [Figure 3] where the 7 largest and smallest returns on the stocks in the sample portfolio and SP-500 Index for 10 years are listed. These extreme observations are typically the most important for VaR analysis, however since these values are clearly discrete, the VaR will also be discrete, and hence be either underpredicted or overpredicted.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;figure&gt;
  &lt;a href=&quot;/assets/images/blog/univariate-value-at-risk-estimation-methods-extreme-returns-danielsson-devries.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/univariate-value-at-risk-estimation-methods-extreme-returns-danielsson-devries-small.png&quot; alt=&quot;Extreme daily returns for select U.S. stocks and S&amp;amp;P 500, 1987-1996. Source: Danielsson and de Vries.&quot; /&gt;&lt;/a&gt;
  &lt;figcaption&gt;Figure 3. Extreme daily returns for select U.S. stocks and S&amp;amp;P 500, 1987-1996. Source: Danielsson and de Vries.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;In other words, &lt;em&gt;the quantile corresponding to the estimation of the Value-at-Risk […] rather depends on the realizations of the [portfolio returns] than on their probability distribution&lt;/em&gt;&lt;sup id=&quot;fnref:1:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, so that 
&lt;em&gt;the Value-at-Risk calculated with a quantile of the empirical distribution will be highly unstable, especially when considering a Value-at-Risk with a high confidence level with only few available data&lt;/em&gt;&lt;sup id=&quot;fnref:1:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;This is why VaR estimation is &lt;em&gt;a very challenging statistical problem&lt;/em&gt;&lt;sup id=&quot;fnref:15:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:15&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, sharing many similarities with the problem of estimating the frequency and/or severity of extreme events in other domains, like flood frequency estimation&lt;sup id=&quot;fnref:66&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:66&quot; class=&quot;footnote&quot;&gt;21&lt;/a&gt;&lt;/sup&gt; in hydrology.&lt;/p&gt;

&lt;p&gt;In order to compute a &lt;a href=&quot;https://en.wikipedia.org/wiki/Estimator&quot;&gt;statistical estimator&lt;/a&gt; of a portfolio Value-at-Risk, three main approaches exist:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Non-parametric approaches, that do not make any specific distributional assumptions on the portfolio return distribution and whose VaR estimators do not depend on any auxiliary parameter.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Semi-parametric approaches, that do not make any specific distributional assumptions on the portfolio return distribution but whose VaR estimators depend on one or several auxiliary parameters.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Parametric approaches, that make a specific distributional assumption on the portfolio return distribution and whose VaR estimators depend on one or several auxiliary parameters.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To be noted that non-parametric and semi-parametric approaches might still make distributional assumptions, in particular for convergence proofs - like assuming that returns are &lt;a href=&quot;https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables&quot;&gt;independent and identically distributed (i.i.d.)&lt;/a&gt; -, 
but these assumptions are then generic in nature, contrary to parametric approaches which assume a very specific return distribution, like &lt;a href=&quot;https://en.wikipedia.org/wiki/Normal_distribution&quot;&gt;a Gaussian distribution&lt;/a&gt;, which is &lt;em&gt;one of the most widely applied parametric probability distribution&lt;/em&gt;&lt;sup id=&quot;fnref:83&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:83&quot; class=&quot;footnote&quot;&gt;22&lt;/a&gt;&lt;/sup&gt; in finance.&lt;/p&gt;

&lt;h2 id=&quot;non-parametric-and-semi-parametric-value-at-risk-estimation&quot;&gt;Non-parametric and semi-parametric Value-at-Risk estimation&lt;/h2&gt;

&lt;p&gt;Chen and Yong Tang&lt;sup id=&quot;fnref:29&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:29&quot; class=&quot;footnote&quot;&gt;23&lt;/a&gt;&lt;/sup&gt; notes that non-parametric and semi-parametric VaR estimators &lt;em&gt;have the advantages of (i) being free of distributional assumptions […] while being able to capture fat-tail and asymmetry distribution of returns automatically; 
and (ii) imposing much weaker assumptions on the dynamics of the return process and allowing data “speak for themselves”&lt;/em&gt;&lt;sup id=&quot;fnref:29:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:29&quot; class=&quot;footnote&quot;&gt;23&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h3 id=&quot;empirical-quantile-of-the-portfolio-return-distribution&quot;&gt;Empirical quantile of the portfolio return distribution&lt;/h3&gt;

&lt;p&gt;A well-known estimator of the $(1 - \alpha)$% quantile of any probability distribution is the empirical $(1 - \alpha)$% quantile of that distribution, which relies on &lt;a href=&quot;https://en.wikipedia.org/wiki/Order_statistic&quot;&gt;order statistics&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In the context of VaR estimation, the underlying idea is explained in Dowd&lt;sup id=&quot;fnref:23&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:23&quot; class=&quot;footnote&quot;&gt;24&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;If we have a sample of $n$ profit and loss (P/L) observations, we can regard each observation as giving an estimate of VaR at an implied probability level. For example, if $n$ = 100, we can take the 5% VaR as the negative of the sixth&lt;sup id=&quot;fnref:33&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:33&quot; class=&quot;footnote&quot;&gt;25&lt;/a&gt;&lt;/sup&gt; smallest P/L observation, the 1% VaR as the negative of the second-smallest, and so on.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This leads to the empirical portfolio VaR estimator, defined&lt;sup id=&quot;fnref:35&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:35&quot; class=&quot;footnote&quot;&gt;26&lt;/a&gt;&lt;/sup&gt; as the opposite of the $n (1 - \alpha) + 1$-th lowest portfolio return&lt;sup id=&quot;fnref:44&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:44&quot; class=&quot;footnote&quot;&gt;27&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

\[\text{VaR}_{\alpha} = -r_{\left( n (1 - \alpha) + 1 \right)}\]

&lt;p&gt;, where $r_{(1)} \leq r_{(2)} \leq … \leq r_{(n-1)} \leq r_{(n)}$ are the order statistics of the portfolio returns.&lt;/p&gt;

&lt;p&gt;Now, for an arbitrary confidence level $\alpha$, there is little chance that $n (1 - \alpha) + 1$ is an integer.&lt;/p&gt;

&lt;p&gt;In that case, two&lt;sup id=&quot;fnref:35:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:35&quot; class=&quot;footnote&quot;&gt;26&lt;/a&gt;&lt;/sup&gt; possible choices are:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
&lt;p&gt;Either to define the opposite of the $\lfloor n (1 - \alpha)  \rfloor + 1$-th lowest portfolio return&lt;sup id=&quot;fnref:44:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:44&quot; class=&quot;footnote&quot;&gt;27&lt;/a&gt;&lt;/sup&gt; as the empirical portfolio VaR estimator&lt;sup id=&quot;fnref:23:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:23&quot; class=&quot;footnote&quot;&gt;24&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

\[\text{VaR}_{\alpha} = - r_{\left(  \lfloor n (1 - \alpha)  \rfloor + 1 \right)}\]
  &lt;/li&gt;
  &lt;li&gt;
&lt;p&gt;Or to define a linear interpolation&lt;sup id=&quot;fnref:24&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:24&quot; class=&quot;footnote&quot;&gt;28&lt;/a&gt;&lt;/sup&gt; between the opposite of the $\lfloor (n+1) \left( 1 - \alpha \right) \rfloor $-th and $ \lfloor (n+1)  \left( 1 - \alpha \right) \rfloor + 1$-th lowest portfolio returns&lt;sup id=&quot;fnref:44:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:44&quot; class=&quot;footnote&quot;&gt;27&lt;/a&gt;&lt;/sup&gt; as the empirical portfolio VaR estimator&lt;sup id=&quot;fnref:26&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:26&quot; class=&quot;footnote&quot;&gt;29&lt;/a&gt;&lt;/sup&gt;&lt;sup id=&quot;fnref:43&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:43&quot; class=&quot;footnote&quot;&gt;30&lt;/a&gt;&lt;/sup&gt;&lt;sup id=&quot;fnref:22:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:22&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

\[\text{VaR}_{\alpha} = - \left( 1 - \gamma \right) r_{\left(  \lfloor  (n+1)  \left( 1 - \alpha \right) \rfloor \right)} - \gamma r_{\left( \lfloor (n+1) \left( 1 - \alpha \right) \rfloor + 1 \right)}\]

    &lt;p&gt;, with $\gamma$ $= (n+1) \left( 1 - \alpha \right)$  $- \lfloor (n+1) \left( 1 - \alpha \right) \rfloor$.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
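&lt;p&gt;The two choices above can be sketched in Python as follows - a minimal sketch that assumes the sample is large enough, relative to the requested confidence level, for the involved order statistics to exist:&lt;/p&gt;

```python
import math

def empirical_var(returns, alpha, interpolate=False):
    """Empirical VaR estimator at confidence level alpha, as the opposite
    of an empirical (1 - alpha) quantile of the sample of returns."""
    r = sorted(returns)  # order statistics r_(1), ..., r_(n)
    n = len(r)
    if not interpolate:
        # opposite of the (floor(n * (1 - alpha)) + 1)-th lowest return
        k = math.floor(n * (1 - alpha)) + 1
        return -r[k - 1]
    # linear interpolation between two adjacent order statistics
    g = (n + 1) * (1 - alpha)
    j = math.floor(g)  # assumed to lie strictly between 0 and n here
    gamma = g - j
    return -(1.0 - gamma) * r[j - 1] - gamma * r[j]
```

&lt;p&gt;On a hypothetical sample of $n$ = 100 returns with $\alpha$ = 95%, the first variant picks the sixth-smallest observation, consistently with Dowd's description quoted above.&lt;/p&gt;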

&lt;p&gt;An interesting property of the resulting portfolio VaR estimator is that it is &lt;a href=&quot;https://en.wikipedia.org/wiki/Consistent_estimator&quot;&gt;consistent&lt;/a&gt; in the presence of &lt;a href=&quot;https://en.wikipedia.org/wiki/Mixing_(mathematics)#Mixing_in_stochastic_processes&quot;&gt;weak dependence&lt;/a&gt;&lt;sup id=&quot;fnref:37&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:37&quot; class=&quot;footnote&quot;&gt;31&lt;/a&gt;&lt;/sup&gt; between portfolio returns, c.f. Chen and Yong Tang&lt;sup id=&quot;fnref:29:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:29&quot; class=&quot;footnote&quot;&gt;23&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;In terms of drawbacks, the two major limitations of the empirical portfolio VaR estimator are that:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;It &lt;em&gt;only takes into account a small part of the information contained in the [portfolio returns] distribution function&lt;/em&gt;&lt;sup id=&quot;fnref:1:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; - that is, at most two returns - which is highly &lt;a href=&quot;https://en.wikipedia.org/wiki/Efficiency_(statistics)&quot;&gt;inefficient&lt;/a&gt;, especially when the number of portfolio returns is already relatively small.&lt;/li&gt;
  &lt;li&gt;It &lt;em&gt;cannot generate any information about the tail of the return distribution beyond the smallest sample observation&lt;/em&gt;&lt;sup id=&quot;fnref:28:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:28&quot; class=&quot;footnote&quot;&gt;16&lt;/a&gt;&lt;/sup&gt;, which might lead to a severe underestimation of the true risk of the portfolio.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;kernel-smoothed-quantile-of-the-portfolio-return-distribution&quot;&gt;Kernel-smoothed quantile of the portfolio return distribution&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Another way to account for the information available in the empirical […] distribution&lt;/em&gt;&lt;sup id=&quot;fnref:1:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; than using the empirical quantile estimator discussed in the previous sub-section is to use the kernel-smoothed quantile estimator introduced in Gourieroux et al.&lt;sup id=&quot;fnref:31&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:31&quot; class=&quot;footnote&quot;&gt;32&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Kernel-smoothing is a methodology belonging to statistics and probability theory that &lt;em&gt;can be thought of as a way of generalizing a histogram constructed with the sample data&lt;/em&gt;&lt;sup id=&quot;fnref:28:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:28&quot; class=&quot;footnote&quot;&gt;16&lt;/a&gt;&lt;/sup&gt;, as illustrated in Figure 4.&lt;/p&gt;

&lt;figure&gt;
  &lt;a href=&quot;/assets/images/blog/univariate-value-at-risk-estimation-methods-kernel-smoothing-wikipedia.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/univariate-value-at-risk-estimation-methods-kernel-smoothing-wikipedia-small.png&quot; alt=&quot;Histogram vs. kernel-smoothed density for the same sample of data. Source: Wikipedia.&quot; /&gt;&lt;/a&gt;
  &lt;figcaption&gt;Figure 4. Histogram vs. kernel-smoothed density for the same sample of data. Source: Wikipedia.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;In Figure 4, &lt;em&gt;where a histogram results in a density that is piecewise constant, a kernel[-smoothed] approximation results in a smooth density&lt;/em&gt;&lt;sup id=&quot;fnref:29:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:29&quot; class=&quot;footnote&quot;&gt;23&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Coming back to the estimator of Gourieroux et al.&lt;sup id=&quot;fnref:31:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:31&quot; class=&quot;footnote&quot;&gt;32&lt;/a&gt;&lt;/sup&gt;, it is defined as the $(1 - \alpha)$% quantile of a kernel-smoothed approximation of the portfolio return distribution, which essentially results 
in &lt;em&gt;a weighted average of the order statistics around [$r_{\left( \lfloor n (1 - \alpha) \rfloor + 1 \right)}$] rather than […] a single order statistic&lt;/em&gt;&lt;sup id=&quot;fnref:29:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:29&quot; class=&quot;footnote&quot;&gt;23&lt;/a&gt;&lt;/sup&gt; or a linear interpolation between two order statistics.&lt;/p&gt;

&lt;p&gt;From a practical perspective, that VaR estimator is computed as follows:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Select &lt;a href=&quot;https://en.wikipedia.org/wiki/Kernel_(statistics)#Kernel_functions_in_common_use&quot;&gt;a kernel function&lt;/a&gt; $K$, usually&lt;sup id=&quot;fnref:29:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:29&quot; class=&quot;footnote&quot;&gt;23&lt;/a&gt;&lt;/sup&gt; taken as a symmetric&lt;sup id=&quot;fnref:40&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:40&quot; class=&quot;footnote&quot;&gt;33&lt;/a&gt;&lt;/sup&gt; probability density function.&lt;/p&gt;

    &lt;p&gt;The theoretically optimal&lt;sup id=&quot;fnref:39&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:39&quot; class=&quot;footnote&quot;&gt;34&lt;/a&gt;&lt;/sup&gt; choice for such a kernel function is the Epanechnikov kernel, defined as $ K(u) = \frac{3}{4} \left( 1 - u^2 \right) I_{|u| \leq 1}$.&lt;/p&gt;

    &lt;p&gt;However, the literature suggests that &lt;em&gt;the form of the kernel has little effect on the [accuracy] of the [kernel-smoothed approximation of the return distribution]&lt;/em&gt;&lt;sup id=&quot;fnref:31:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:31&quot; class=&quot;footnote&quot;&gt;32&lt;/a&gt;&lt;/sup&gt;, mainly because:&lt;/p&gt;
    &lt;ul&gt;
      &lt;li&gt;The theoretical framework used to establish the optimality of the Epanechnikov kernel relies on large-sample asymptotics, moreover in a debatable way&lt;sup id=&quot;fnref:32&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:32&quot; class=&quot;footnote&quot;&gt;35&lt;/a&gt;&lt;/sup&gt;.&lt;/li&gt;
      &lt;li&gt;The performance&lt;sup id=&quot;fnref:38&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:38&quot; class=&quot;footnote&quot;&gt;36&lt;/a&gt;&lt;/sup&gt; of other commonly used kernel functions is in any case very close to that of the Epanechnikov kernel&lt;sup id=&quot;fnref:30&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:30&quot; class=&quot;footnote&quot;&gt;37&lt;/a&gt;&lt;/sup&gt;.&lt;/li&gt;
    &lt;/ul&gt;

    &lt;p&gt;So, in applications, the most common&lt;sup id=&quot;fnref:29:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:29&quot; class=&quot;footnote&quot;&gt;23&lt;/a&gt;&lt;/sup&gt;&lt;sup id=&quot;fnref:31:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:31&quot; class=&quot;footnote&quot;&gt;32&lt;/a&gt;&lt;/sup&gt; choice for a kernel function is rather the Gaussian kernel, defined as $ K(u) = \frac{1}{\sqrt{2 \pi }} e^{- \frac{u^2}{2} }$.&lt;/p&gt;

    &lt;p&gt;As a side note, using a kernel function to approximate the portfolio return distribution might look like a parametric approach to VaR estimation in disguise, but Butler and Schachter&lt;sup id=&quot;fnref:28:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:28&quot; class=&quot;footnote&quot;&gt;16&lt;/a&gt;&lt;/sup&gt; explain why this is not the case:&lt;/p&gt;

    &lt;blockquote&gt;
      &lt;p&gt;Note that use of a normal or Gaussian kernel estimator does not make the ultimate estimation of the VaR parametric.&lt;/p&gt;

      &lt;p&gt;As the sample size grows, the net sum of all the smoothed points approaches the true [portfolio return distribution], whatever that may be, irrespective of the method of smoothing the data. This is because the influence of each point becomes arbitrarily small as the sample size grows, so the choice of kernel imposes no restrictions on the results.&lt;/p&gt;
    &lt;/blockquote&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Select a kernel bandwidth parameter $h &amp;gt; 0$ for the kernel function.&lt;/p&gt;

    &lt;p&gt;Gourieroux et al.&lt;sup id=&quot;fnref:31:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:31&quot; class=&quot;footnote&quot;&gt;32&lt;/a&gt;&lt;/sup&gt; describe that parameter as follows:&lt;/p&gt;

    &lt;blockquote&gt;
      &lt;p&gt;The bandwidth parameter controls the range of data points that will be used to estimate the distribution.&lt;/p&gt;

      &lt;p&gt;A small bandwidth results in a rough distribution that does not improve appreciably on the original data, while a large bandwidth over-smoothes the density curve and erases the underlying structure.&lt;/p&gt;
    &lt;/blockquote&gt;

    &lt;p&gt;This latter point is illustrated in Figure 5 and Figure 6.&lt;/p&gt;

    &lt;figure&gt;
    &lt;a href=&quot;/assets/images/blog/univariate-value-at-risk-estimation-methods-kernel-smoothing-wand-jones.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/univariate-value-at-risk-estimation-methods-kernel-smoothing-wand-jones-small.png&quot; alt=&quot;Influence of the bandwidth parameter on the kernel-smoothed approximation of a normal mixture distribution (dashed) from n = 1000 observations. Source: Adapted from Wand and Jones.&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;Figure 5. Influence of the bandwidth parameter on the kernel-smoothed approximation of a normal mixture distribution (dashed) from n = 1000 observations. Source: Adapted from Wand and Jones.&lt;/figcaption&gt;
  &lt;/figure&gt;

    &lt;figure&gt;
    &lt;a href=&quot;/assets/images/blog/univariate-value-at-risk-estimation-methods-kernel-smoothing-KDE_bw_animation.gif&quot;&gt;&lt;img src=&quot;/assets/images/blog/univariate-value-at-risk-estimation-methods-kernel-smoothing-KDE_bw_animation.gif&quot; alt=&quot;Dynamic influence of the bandwidth parameter on the kernel-smoothed approximation of a normal distribution. Source: KDEpy.&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;Figure 6. Dynamic influence of the bandwidth parameter on the kernel-smoothed approximation of a normal distribution. Source: &lt;a href=&quot;https://kdepy.readthedocs.io/en/latest/bandwidth.html&quot;&gt;KDEpy&lt;/a&gt;.&lt;/figcaption&gt;
  &lt;/figure&gt;

    &lt;p&gt;In Figure 5, it is clearly visible that:&lt;/p&gt;
    &lt;ul&gt;
      &lt;li&gt;
        &lt;p&gt;a) The estimate of the normal mixture distribution is &lt;em&gt;very rough&lt;/em&gt;&lt;sup id=&quot;fnref:30:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:30&quot; class=&quot;footnote&quot;&gt;37&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

        &lt;p&gt;This corresponds to a bandwidth parameter $h$ that is too small and undersmoothes the observations.&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;b) The estimate of the normal mixture distribution smoothes away its bimodality structure.&lt;/p&gt;

        &lt;p&gt;This corresponds to a bandwidth parameter $h$ that is too large and oversmoothes the observations.&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;c) The estimate of the normal mixture distribution &lt;em&gt;is not overly noisy, yet the essential structure of the underlying density has been recovered&lt;/em&gt;&lt;sup id=&quot;fnref:30:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:30&quot; class=&quot;footnote&quot;&gt;37&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

        &lt;p&gt;This corresponds to an adequate bandwidth parameter $h$.&lt;/p&gt;
      &lt;/li&gt;
    &lt;/ul&gt;

    &lt;p&gt;In Figure 6, the situation is the same as in Figure 5, except that the bandwidth parameter $h$ is being dynamically increased from 0 to ~20.&lt;/p&gt;

    &lt;p&gt;Figure 5 and Figure 6 empirically demonstrate that the choice of the bandwidth is &lt;em&gt;of crucial importance&lt;/em&gt;&lt;sup id=&quot;fnref:42&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:42&quot; class=&quot;footnote&quot;&gt;38&lt;/a&gt;&lt;/sup&gt;, although &lt;em&gt;a difficult task, especially when smoothing the tails of underlying distributions with possible data scarcity&lt;/em&gt;&lt;sup id=&quot;fnref:42:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:42&quot; class=&quot;footnote&quot;&gt;38&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

    &lt;p&gt;The interested reader is referred to Wand and Jones&lt;sup id=&quot;fnref:30:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:30&quot; class=&quot;footnote&quot;&gt;37&lt;/a&gt;&lt;/sup&gt;, Tsybakov&lt;sup id=&quot;fnref:32:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:32&quot; class=&quot;footnote&quot;&gt;35&lt;/a&gt;&lt;/sup&gt; and Cheng and Sun&lt;sup id=&quot;fnref:48&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:48&quot; class=&quot;footnote&quot;&gt;39&lt;/a&gt;&lt;/sup&gt; for the description of several methods to choose the optimal bandwidth for a kernel function.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Compute the kernel-smoothed portfolio VaR estimator as&lt;/p&gt;

\[\text{VaR}_{\alpha} = - \hat{F}^{-1}(1 - \alpha)\]

    &lt;p&gt;, where $ \hat{F}^{-1}(1 - \alpha) $ is the solution of the quantile equation&lt;/p&gt;

\[\hat{F}(x) = \int_{-\infty}^x \hat{f}(u) \, du = 1 - \alpha\]

    &lt;p&gt;with $ \hat{f}(x) = \frac{1}{n h} \sum_{i=1}^n K \left(  \frac{x - r_i}{h} \right) $.&lt;/p&gt;

    &lt;p&gt;That part is typically done with a numerical algorithm, like the &lt;a href=&quot;https://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm&quot;&gt;Gauss–Newton algorithm&lt;/a&gt; mentioned in Gourieroux et al.&lt;sup id=&quot;fnref:31:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:31&quot; class=&quot;footnote&quot;&gt;32&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
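&lt;p&gt;The steps above can be sketched in Python. This is a minimal illustration, not the exact procedure of Gourieroux et al.: it assumes a Gaussian kernel, uses Silverman's rule-of-thumb bandwidth $h = 1.06 \hat{\sigma} n^{-1/5}$ as a default (an assumption, since no bandwidth rule is prescribed above), and solves the quantile equation by simple bisection rather than the Gauss-Newton algorithm, which is valid because $\hat{F}$ is increasing.&lt;/p&gt;

```python
import math

def gaussian_cdf(z):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def kernel_smoothed_var(returns, alpha, h=None):
    """Kernel-smoothed VaR with a Gaussian kernel.

    Solves F_hat(x) = 1 - alpha by bisection, where F_hat is the
    kernel-smoothed c.d.f. of the returns, and returns -x.
    """
    n = len(returns)
    if h is None:
        # Silverman's rule-of-thumb bandwidth (an assumed default)
        mu = sum(returns) / n
        sd = math.sqrt(sum((r - mu) ** 2 for r in returns) / (n - 1))
        h = 1.06 * sd * n ** (-0.2)

    def f_cdf(x):
        # F_hat(x): average of Gaussian c.d.f.s centred on each observation
        return sum(gaussian_cdf((x - r) / h) for r in returns) / n

    # F_hat is strictly increasing, so bisection converges to the quantile
    lo, hi = min(returns) - 10.0 * h, max(returns) + 10.0 * h
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return -0.5 * (lo + hi)
```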

&lt;p&gt;Two important positive results on the kernel-smoothed VaR estimator are established in Chen and Yong Tang&lt;sup id=&quot;fnref:29:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:29&quot; class=&quot;footnote&quot;&gt;23&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Theoretically, it is consistent in the presence of weak dependence between portfolio returns.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Empirically, it produces more precise estimates&lt;sup id=&quot;fnref:49&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:49&quot; class=&quot;footnote&quot;&gt;40&lt;/a&gt;&lt;/sup&gt; than those obtained with the empirical VaR estimator - especially when the number of observations is small - which &lt;em&gt;can translate to a large amount in financial terms&lt;/em&gt;&lt;sup id=&quot;fnref:29:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:29&quot; class=&quot;footnote&quot;&gt;23&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

    &lt;p&gt;Similar results - in a non-financial context - are reported in Cheng and Sun&lt;sup id=&quot;fnref:48:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:48&quot; class=&quot;footnote&quot;&gt;39&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

    &lt;blockquote&gt;
      &lt;p&gt;It turns out that kernel smoothed quantile estimators, with no matter which bandwidth selection method used, are more efficient than the empirical quantile estimator in most situations.&lt;/p&gt;

      &lt;p&gt;And when sample size is relatively small, kernel smoothed estimators are especially more efficient than the empirical quantile estimator.&lt;/p&gt;
    &lt;/blockquote&gt;

    &lt;p&gt;In other words, &lt;em&gt;the extra effort of smoothing pays off at the end&lt;/em&gt;&lt;sup id=&quot;fnref:29:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:29&quot; class=&quot;footnote&quot;&gt;23&lt;/a&gt;&lt;/sup&gt;!&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The major limitation of the kernel-smoothed VaR estimator, though, is that if the selected kernel function does not reflect the tail features of the true portfolio return distribution, &lt;em&gt;some problems may arise when the quantile to be estimated requires an extrapolation […] 
far beyond the range of observed data&lt;/em&gt;&lt;sup id=&quot;fnref:45&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:45&quot; class=&quot;footnote&quot;&gt;41&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;In the words of Danielsson and de Vries&lt;sup id=&quot;fnref:7:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;20&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Almost all kernels are estimated with the entire data set, with interior observations dominating the kernel estimation.&lt;/p&gt;

  &lt;p&gt;While even the most careful kernel estimation will provide good estimates for the interior, there is no reason to believe that the kernel will describe the tails adequately.&lt;/p&gt;

  &lt;p&gt;Tail bumpiness is a common problem in kernel estimation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So, while the kernel-smoothed VaR estimator is capable of tail extrapolation - contrary to the empirical VaR estimator - that capability should be used with extreme caution&lt;sup id=&quot;fnref:75&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:75&quot; class=&quot;footnote&quot;&gt;42&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;A proper portfolio VaR estimator when tail extrapolation is needed thus remains elusive at this stage.&lt;/p&gt;

&lt;h3 id=&quot;extrapolated-empirical-quantile-of-the-portfolio-return-distribution&quot;&gt;Extrapolated empirical quantile of the portfolio return distribution&lt;/h3&gt;

&lt;p&gt;In order to solve the problem of tail extrapolation while retaining the simplicity of the empirical quantile estimator, Hutson&lt;sup id=&quot;fnref:43:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:43&quot; class=&quot;footnote&quot;&gt;30&lt;/a&gt;&lt;/sup&gt; proposes to extend the linearly interpolated quantile function 
into &lt;em&gt;a tail extrapolation quantile function&lt;/em&gt;&lt;sup id=&quot;fnref:32:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:32&quot; class=&quot;footnote&quot;&gt;35&lt;/a&gt;&lt;/sup&gt; that &lt;em&gt;allows for non-parametric extrapolation beyond the observed data&lt;/em&gt;&lt;sup id=&quot;fnref:43:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:43&quot; class=&quot;footnote&quot;&gt;30&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;In terms of VaR estimation, Hutson’s work translates into the following extrapolated empirical portfolio VaR estimator:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;For $0 &amp;lt; \left( 1 - \alpha \right) \leq \frac{1}{n+1}$&lt;/p&gt;

\[\text{VaR}_{\alpha} = - r_{(1)} - \left( r_{(2)} - r_{(1)} \right) \log \left( (n+1) \left( 1 - \alpha \right) \right)\]
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;For $\left( 1 - \alpha \right) \in ]\frac{1}{n+1} , \frac{n}{n+1}[$, $ \text{VaR}_{\alpha} $ is defined as the standard empirical portfolio VaR estimator&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;For $\frac{n}{n+1} &amp;lt; \left( 1 - \alpha \right) &amp;lt; 1$&lt;/p&gt;

\[\text{VaR}_{\alpha} = - r_{(n)} + \left(  r_{(n)} - r_{(n-1)} \right) \log \left( (n+1) \alpha \right)\]
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hutson&lt;sup id=&quot;fnref:43:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:43&quot; class=&quot;footnote&quot;&gt;30&lt;/a&gt;&lt;/sup&gt; establishes the consistency of his quantile estimator for i.i.d. observations and empirically demonstrates using various theoretical distributions that it &lt;em&gt;fits well to the ideal sample for all distributions for ideal samples as small as $n = 10$&lt;/em&gt;&lt;sup id=&quot;fnref:43:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:43&quot; class=&quot;footnote&quot;&gt;30&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Unfortunately for financial applications, Hutson&lt;sup id=&quot;fnref:43:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:43&quot; class=&quot;footnote&quot;&gt;30&lt;/a&gt;&lt;/sup&gt; also notes that his quantile estimator &lt;em&gt;appears to be [unable] to completely capture the tail behavior of &lt;a href=&quot;https://en.wikipedia.org/wiki/Heavy-tailed_distribution&quot;&gt;heavy-tailed&lt;/a&gt;&lt;sup id=&quot;fnref:57&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:57&quot; class=&quot;footnote&quot;&gt;43&lt;/a&gt;&lt;/sup&gt; distributions such as Cauchy&lt;/em&gt;&lt;sup id=&quot;fnref:43:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:43&quot; class=&quot;footnote&quot;&gt;30&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;This is confirmed by Banfi et al.&lt;sup id=&quot;fnref:45:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:45&quot; class=&quot;footnote&quot;&gt;41&lt;/a&gt;&lt;/sup&gt;, who find that Hutson’s method provides &lt;em&gt;competitive results when light-tailed distributions are of interest&lt;/em&gt;&lt;sup id=&quot;fnref:45:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:45&quot; class=&quot;footnote&quot;&gt;41&lt;/a&gt;&lt;/sup&gt; but &lt;em&gt;generates large biases in the case of heavy-tailed distributions&lt;/em&gt;&lt;sup id=&quot;fnref:45:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:45&quot; class=&quot;footnote&quot;&gt;41&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h3 id=&quot;extreme-value-theory-based-quantile-of-the-portfolio-return-distribution&quot;&gt;Extreme value theory-based quantile of the portfolio return distribution&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Another possibility to improve [the] tail extrapolation&lt;/em&gt;&lt;sup id=&quot;fnref:45:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:45&quot; class=&quot;footnote&quot;&gt;41&lt;/a&gt;&lt;/sup&gt; properties of the empirical quantile estimator is to rely on extreme value theory (EVT), which is a domain of statistics 
&lt;em&gt;concerned with the study of the asymptotical distribution of extreme events, that is to say events which are rare in frequency and huge with respect to the majority of observations&lt;/em&gt;&lt;sup id=&quot;fnref:60&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:60&quot; class=&quot;footnote&quot;&gt;44&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Indeed, because VaR &lt;em&gt;only deals with extreme quantiles of the distribution&lt;/em&gt;&lt;sup id=&quot;fnref:60:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:60&quot; class=&quot;footnote&quot;&gt;44&lt;/a&gt;&lt;/sup&gt;, EVT sounds like a natural framework for providing &lt;em&gt;more reliable VaR estimates than the usual ones, given that [it] directly concentrates on the tails of the distribution, thus avoiding a major flaw 
of [other] approaches whose estimates are somehow biased by the credit they give to the central part of the distribution, thus underestimating extremes and outliers, which are exactly what one is interested in when calculating VaR&lt;/em&gt;&lt;sup id=&quot;fnref:60:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:60&quot; class=&quot;footnote&quot;&gt;44&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Two preliminary remarks before proceeding:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;One notational remark&lt;/p&gt;

    &lt;p&gt;The EVT literature&lt;sup id=&quot;fnref:54&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:54&quot; class=&quot;footnote&quot;&gt;45&lt;/a&gt;&lt;/sup&gt; typically focuses on the upper tail of distributions, so that the portfolio returns $r_1,\ldots,r_n$ need to be replaced by their opposites $r^{'}_1 = -r_1,\ldots,r^{'}_n = -r_n$.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;One important remark for applying EVT results in finance&lt;/p&gt;

    &lt;p&gt;&lt;em&gt;Most existing EVT methods require [i.i.d.] observations, whereas financial time series exhibit obvious serial dependence features such as volatility clustering&lt;/em&gt;&lt;sup id=&quot;fnref:60:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:60&quot; class=&quot;footnote&quot;&gt;44&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

    &lt;p&gt;It turns out that &lt;em&gt;this issue has been addressed in works dealing with weak serial dependence [and] the main message from these studies is that usual EVT methods are still valid&lt;/em&gt;&lt;sup id=&quot;fnref:60:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:60&quot; class=&quot;footnote&quot;&gt;44&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

    &lt;p&gt;For example, the Hill estimator discussed below was originally derived under the assumption of i.i.d. observations&lt;sup id=&quot;fnref:65&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:65&quot; class=&quot;footnote&quot;&gt;46&lt;/a&gt;&lt;/sup&gt;, but it has also been proved to be usable with weakly dependent observations&lt;sup id=&quot;fnref:72&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:72&quot; class=&quot;footnote&quot;&gt;47&lt;/a&gt;&lt;/sup&gt;; the same applies&lt;sup id=&quot;fnref:71&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:71&quot; class=&quot;footnote&quot;&gt;48&lt;/a&gt;&lt;/sup&gt; to the Weissman quantile estimator also discussed below.&lt;/p&gt;

    &lt;p&gt;In addition, even though &lt;em&gt;the [EVT] estimators obtained may be less accurate and neglecting this fact could lead to inadequate resolutions in order to cope with the risk of occurrence of extreme events&lt;/em&gt;&lt;sup id=&quot;fnref:60:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:60&quot; class=&quot;footnote&quot;&gt;44&lt;/a&gt;&lt;/sup&gt;, they are &lt;em&gt;consistent and unbiased in the presence of higher moment dependence&lt;/em&gt;&lt;sup id=&quot;fnref:68:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:68&quot; class=&quot;footnote&quot;&gt;14&lt;/a&gt;&lt;/sup&gt; and it is even possible to &lt;em&gt;explicitly model extreme dependence using the [notion of] extremal index&lt;/em&gt;&lt;sup id=&quot;fnref:68:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:68&quot; class=&quot;footnote&quot;&gt;14&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With these remarks in mind, EVT offers two main methods&lt;sup id=&quot;fnref:68:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:68&quot; class=&quot;footnote&quot;&gt;14&lt;/a&gt;&lt;/sup&gt; to model the upper tail of the distribution of the negated portfolio returns $r^{'}_1,\ldots,r^{'}_n$ and compute an EVT-based portfolio VaR estimator:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;A fully parametric method based on the &lt;a href=&quot;https://en.wikipedia.org/wiki/Generalized_Pareto_distribution&quot;&gt;generalized Pareto distribution (GPD)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;A semi-parametric method based on the Hill (or similar) estimator&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;generalized-pareto-distribution-based-method&quot;&gt;Generalized Pareto distribution-based method&lt;/h4&gt;

&lt;p&gt;This method consists in fitting a generalized Pareto distribution to the upper tail of the portfolio return &lt;a href=&quot;https://en.wikipedia.org/wiki/Cumulative_distribution_function&quot;&gt;cumulative distribution function (c.d.f.)&lt;/a&gt; and computing its $\alpha$% quantile.&lt;/p&gt;

&lt;p&gt;This is for example done in the seminal paper of McNeil and Frey&lt;sup id=&quot;fnref:52&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:52&quot; class=&quot;footnote&quot;&gt;49&lt;/a&gt;&lt;/sup&gt;, in which such a distribution is fitted to the returns&lt;sup id=&quot;fnref:99&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:99&quot; class=&quot;footnote&quot;&gt;50&lt;/a&gt;&lt;/sup&gt; of different financial instruments.&lt;/p&gt;

&lt;p&gt;The theoretical justification for this method lies in &lt;a href=&quot;https://en.wikipedia.org/wiki/Pickands%E2%80%93Balkema%E2%80%93De_Haan_theorem&quot;&gt;the Pickands-Balkema-de Haan theorem&lt;/a&gt;, which states that &lt;em&gt;EVT holds sufficiently far out in the tails such 
that we can obtain the distribution not only of the maxima but also of other extremely large observations&lt;/em&gt;&lt;sup id=&quot;fnref:68:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:68&quot; class=&quot;footnote&quot;&gt;14&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;In practice, fitting a GPD to the upper tail of the portfolio return distribution is a two-step process:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;First, a threshold $r^{'}_{(n - k)}, k \geq 1$ beyond which returns should be considered as belonging to the upper tail needs to be selected.&lt;/p&gt;

    &lt;p&gt;This threshold corresponds to the &lt;em&gt;location parameter&lt;/em&gt; $u \in \mathbb{R}$ of the GPD.&lt;/p&gt;

    &lt;p&gt;Unfortunately, &lt;em&gt;the choice of how many of the $k$-largest observations should be considered extreme is not straightforward&lt;/em&gt;&lt;sup id=&quot;fnref:45:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:45&quot; class=&quot;footnote&quot;&gt;41&lt;/a&gt;&lt;/sup&gt; and is actually &lt;em&gt;a central issue to any application of EVT&lt;/em&gt;&lt;sup id=&quot;fnref:60:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:60&quot; class=&quot;footnote&quot;&gt;44&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

    &lt;p&gt;Indeed, as detailed in de Haan et al.&lt;sup id=&quot;fnref:50&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:50&quot; class=&quot;footnote&quot;&gt;51&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

    &lt;blockquote&gt;
      &lt;p&gt;Theoretically, the statistical properties of EVT-based estimators are established for $k$ such that $k \to \infty$ and $k/n \to 0$ as $n \to \infty$.&lt;/p&gt;

      &lt;p&gt;In applications with a finite sample size, it is necessary to investigate how to choose the number of high observations used in estimation.&lt;/p&gt;

      &lt;p&gt;For financial practitioners, two difficulties arise: firstly, there is no straightforward procedure for the selection; secondly, the performance of the EVT estimators is rather sensitive to this choice.&lt;/p&gt;

      &lt;p&gt;More specifically, there is a bias–variance tradeoff: with a low level of $k$, the estimation variance is at a high level which may not be acceptable for the application; by increasing $k$, i.e., using progressively more data, the variance is reduced, but at the cost of an increasing bias.&lt;/p&gt;
    &lt;/blockquote&gt;

    &lt;p&gt;The literature offers some guidance on how to choose an adequate cut-off between the central part and the upper tail of the return distribution but that choice remains notoriously difficult in general, cf. Benito et al.&lt;sup id=&quot;fnref:70&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:70&quot; class=&quot;footnote&quot;&gt;52&lt;/a&gt;&lt;/sup&gt; for a review.&lt;/p&gt;

    &lt;p&gt;Fortunately, in the specific context of VaR estimation, &lt;em&gt;there is a large set of thresholds that provide similar GPD quantiles estimators and as a consequence similar market risk measures&lt;/em&gt;&lt;sup id=&quot;fnref:103&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:103&quot; class=&quot;footnote&quot;&gt;53&lt;/a&gt;&lt;/sup&gt; - from about the 80th percentile of observations to the 95th percentile of observations&lt;sup id=&quot;fnref:103:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:103&quot; class=&quot;footnote&quot;&gt;53&lt;/a&gt;&lt;/sup&gt; - so that &lt;em&gt;the researchers and practitioners should not focus excessively on the threshold choice&lt;/em&gt;&lt;sup id=&quot;fnref:103:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:103&quot; class=&quot;footnote&quot;&gt;53&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

    &lt;p&gt;Figure 7 and Figure 8 illustrate this point with a GPD fitted to the lower tail of daily percentage returns of Deutsche mark/British pound (DEM/GBP) exchange rates from 3rd January 1984 to 31st December 1991&lt;sup id=&quot;fnref:107&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:107&quot; class=&quot;footnote&quot;&gt;54&lt;/a&gt;&lt;/sup&gt;, using two different thresholds:&lt;/p&gt;
    &lt;ul&gt;
      &lt;li&gt;
        &lt;p&gt;Figure 7 - A threshold $u \approx -1.2292$, corresponding to ~2% of the observations.&lt;/p&gt;

        &lt;figure&gt;
    &lt;a href=&quot;/assets/images/blog/univariate-value-at-risk-estimation-methods-bollerslev-three.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/univariate-value-at-risk-estimation-methods-bollerslev-three-small.png&quot; alt=&quot;Lower tail of daily percentage returns of Deutsche mark/British pound (DEM/GBP) exchange rates, GPD fit with threshold u = -1.2292, 3rd January 1984 to 31st December 1991.&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;Figure 7. Lower tail of daily percentage returns of Deutsche mark/British pound (DEM/GBP) exchange rates, GPD fit with threshold u = -1.2292, 3rd January 1984 to 31st December 1991.&lt;/figcaption&gt;
  &lt;/figure&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;Figure 8 - A threshold $u \approx -0.2683$, corresponding to ~20% of the observations.&lt;/p&gt;

        &lt;figure&gt;
    &lt;a href=&quot;/assets/images/blog/univariate-value-at-risk-estimation-methods-bollerslev-two.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/univariate-value-at-risk-estimation-methods-bollerslev-two-small.png&quot; alt=&quot;Lower tail of daily percentage returns of Deutsche mark/British pound (DEM/GBP) exchange rates, GPD fit with threshold u = -0.2683, 3rd January 1984 to 31st December 1991.&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;Figure 8. Lower tail of daily percentage returns of Deutsche mark/British pound (DEM/GBP) exchange rates, GPD fit with threshold u = -0.2683, 3rd January 1984 to 31st December 1991.&lt;/figcaption&gt;
  &lt;/figure&gt;
      &lt;/li&gt;
    &lt;/ul&gt;

    &lt;p&gt;From Figure 7 and Figure 8, both thresholds lead to an equally good GPD fit of the extreme lower tail, which is confirmed numerically by goodness-of-fit measures.&lt;/p&gt;

    &lt;p&gt;Consequently, in order to keep the threshold selection step as simple as possible for VaR estimation, the early suggestion of DuMouchel&lt;sup id=&quot;fnref:102&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:102&quot; class=&quot;footnote&quot;&gt;55&lt;/a&gt;&lt;/sup&gt; to use the 90th percentile of observations seems a very good starting point.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Second, the &lt;em&gt;shape parameter&lt;/em&gt; $\xi \in \mathbb{R}$ and the &lt;em&gt;scale parameter&lt;/em&gt; $\sigma &amp;gt; 0$ of the GPD need to be estimated.&lt;/p&gt;

    &lt;p&gt;This is usually done through &lt;a href=&quot;https://en.wikipedia.org/wiki/Maximum_likelihood_estimation&quot;&gt;likelihood maximization&lt;/a&gt;, but other procedures are described in the literature (method of moments&lt;sup id=&quot;fnref:100&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:100&quot; class=&quot;footnote&quot;&gt;56&lt;/a&gt;&lt;/sup&gt;, maximization of goodness-of-fit estimators&lt;sup id=&quot;fnref:101&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:101&quot; class=&quot;footnote&quot;&gt;57&lt;/a&gt;&lt;/sup&gt;, etc.).&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
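&lt;p&gt;As an illustration of the second step, below is a minimal Python sketch of the method of moments for the GPD, under the assumption that the threshold exceedances have already been extracted; the function name is mine, and the estimator is only valid when the shape parameter is below one half, so that the mean and the variance of the GPD exist:&lt;/p&gt;

```python
import statistics

def gpd_fit_moments(exceedances):
    # Method-of-moments estimates of the GPD shape xi and scale sigma,
    # from the sample mean m and sample variance v of the exceedances:
    #   xi    = (1 - m^2/v) / 2
    #   sigma = m * (1 + m^2/v) / 2
    # Only valid when the true shape parameter is below one half.
    m = statistics.mean(exceedances)
    v = statistics.variance(exceedances)
    r = m * m / v
    return (1.0 - r) / 2.0, m * (1.0 + r) / 2.0
```

&lt;p&gt;In practice, likelihood maximization is usually preferred, with the method-of-moments estimates serving for example as starting values for the numerical optimization.&lt;/p&gt;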

&lt;p&gt;Once the parameters of the GPD have been determined, the EVT/GPD-based portfolio VaR estimator&lt;sup id=&quot;fnref:52:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:52&quot; class=&quot;footnote&quot;&gt;49&lt;/a&gt;&lt;/sup&gt; is defined as the $\alpha$% quantile of that GPD through the formula&lt;/p&gt;

\[\text{VaR}^{\text{GPD}}_{\alpha} = 
      \begin{cases}
          r^{'}_{(n - k)} + \frac{\hat{\sigma}}{\hat{\xi}} \left( \left( \frac{k}{n ( 1 - \alpha)} \right)^{\hat{\xi}} -1 \right) &amp;amp;\text{if } \hat{\xi} \ne 0 \\
          r^{'}_{(n - k)} - \hat{\sigma} \ln \frac{n ( 1 - \alpha)}{k} &amp;amp;\text{if } \hat{\xi} = 0 \\ 
      \end{cases}\]

&lt;p&gt;, where:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;$\hat{\xi}$ is an estimator of the shape parameter $\xi$  of the GPD.&lt;/li&gt;
  &lt;li&gt;$\hat{\sigma}$ is an estimator of the scale parameter $\sigma$ of the GPD.&lt;/li&gt;
&lt;/ul&gt;
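&lt;p&gt;The formula above translates into a few lines of Python (a minimal sketch, with the function name and arguments mine; the estimates $\hat{\xi}$ and $\hat{\sigma}$ are assumed to have been computed beforehand, and the sample size is written n):&lt;/p&gt;

```python
import math

def gpd_var(u, xi_hat, sigma_hat, k, n, alpha):
    # EVT/GPD-based VaR estimator.
    # u: the threshold r'_(n-k), i.e. the (k+1)-th largest of the n losses
    # xi_hat, sigma_hat: estimated GPD shape and scale parameters
    # k: number of exceedances above u; alpha: confidence level, e.g. 0.99
    ratio = k / (n * (1.0 - alpha))
    if xi_hat != 0.0:
        return u + (sigma_hat / xi_hat) * (ratio ** xi_hat - 1.0)
    # xi_hat = 0 case: u - sigma_hat * ln(n * (1 - alpha) / k)
    return u + sigma_hat * math.log(ratio)
```

&lt;p&gt;Note how the two branches agree in the limit $\hat{\xi} \to 0$.&lt;/p&gt;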

&lt;p&gt;Figure 9, taken from Danielsson&lt;sup id=&quot;fnref:68:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:68&quot; class=&quot;footnote&quot;&gt;14&lt;/a&gt;&lt;/sup&gt;, illustrates the near-perfect fit that can be obtained when such a method is applied to the upper and lower tails of the daily S&amp;amp;P 500 returns over the period 1970-2009.&lt;/p&gt;

&lt;figure&gt;
  &lt;a href=&quot;/assets/images/blog/univariate-value-at-risk-estimation-methods-sp500-danielsson.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/univariate-value-at-risk-estimation-methods-sp500-danielsson-small.png&quot; alt=&quot;Upper and lower tails of daily S&amp;amp;P 500 returns fitted with an EVT-estimated distribution vs. a normal distribution, 1970-2009. Source: Danielsson.&quot; /&gt;&lt;/a&gt;
  &lt;figcaption&gt;Figure 9. Upper and lower tails of daily S&amp;amp;P 500 returns fitted with an EVT-estimated distribution vs. a normal distribution, 1970-2009. Source: Danielsson.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h4 id=&quot;hill-estimator-based-method&quot;&gt;Hill estimator-based method&lt;/h4&gt;

&lt;p&gt;Under the assumption that the portfolio return distribution belongs&lt;sup id=&quot;fnref:51&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:51&quot; class=&quot;footnote&quot;&gt;58&lt;/a&gt;&lt;/sup&gt; to the generic family of heavy-tailed distributions, this method consists in deriving an asymptotic estimator of its $\alpha$% quantile.&lt;/p&gt;

&lt;p&gt;This is for example done in Danielsson and de Vries&lt;sup id=&quot;fnref:55&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:55&quot; class=&quot;footnote&quot;&gt;59&lt;/a&gt;&lt;/sup&gt; and in Drees&lt;sup id=&quot;fnref:71:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:71&quot; class=&quot;footnote&quot;&gt;48&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;This method is justified by stylized facts&lt;sup id=&quot;fnref:56&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:56&quot; class=&quot;footnote&quot;&gt;60&lt;/a&gt;&lt;/sup&gt; of asset returns, as explained in Danielsson and de Vries&lt;sup id=&quot;fnref:7:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;20&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;[…] because we know that financial return data are heavy tailed distributed, one can rely on a limit expansion for the tail behavior that is shared by all heavy tailed distributions. The importance of &lt;a href=&quot;https://en.wikipedia.org/wiki/Fisher%E2%80%93Tippett%E2%80%93Gnedenko_theorem&quot;&gt;the central limit law for extremes&lt;/a&gt; is similar to the importance of the central limit law, i.e. one does not have to choose a particular parametric distribution.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Under that generic assumption, it can be demonstrated&lt;sup id=&quot;fnref:61&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:61&quot; class=&quot;footnote&quot;&gt;61&lt;/a&gt;&lt;/sup&gt; that the upper tail of the portfolio returns decays as a power function multiplied by a slowly varying function, that is&lt;/p&gt;

\[1 - F(x) = x^{-\gamma} L(x), x &amp;gt; 0\]

&lt;p&gt;, where:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;$F$ is the c.d.f. of the (opposite of the) portfolio returns.&lt;/li&gt;
  &lt;li&gt;$\gamma = \frac{1}{\xi} &amp;gt; 0$ is the &lt;em&gt;tail index&lt;/em&gt;&lt;sup id=&quot;fnref:59&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:59&quot; class=&quot;footnote&quot;&gt;62&lt;/a&gt;&lt;/sup&gt; of $F$, with $\xi$ the same shape parameter as in the GPD method.&lt;/li&gt;
  &lt;li&gt;$L$ is a slowly varying function in a sense defined in Rocco&lt;sup id=&quot;fnref:60:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:60&quot; class=&quot;footnote&quot;&gt;44&lt;/a&gt;&lt;/sup&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From this asymptotic behaviour, the EVT/Weissman-based portfolio VaR estimator is defined as the $\alpha$% Weissman&lt;sup id=&quot;fnref:64&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:64&quot; class=&quot;footnote&quot;&gt;63&lt;/a&gt;&lt;/sup&gt; quantile estimator of a heavy-tailed distribution through the formula&lt;/p&gt;

\[\text{VaR}^{\text{WM}}_{\alpha} = r^{'}_{(n - k)} \left( \frac{k}{n \left( 1 - \alpha \right) } \right)^{\frac{1}{\hat{\gamma}}}\]

&lt;p&gt;, where&lt;sup id=&quot;fnref:73&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:73&quot; class=&quot;footnote&quot;&gt;64&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;$\hat{\gamma}$ is an estimator of the tail index $\gamma$.&lt;/p&gt;

    &lt;p&gt;The most frequently employed estimator of the tail index is &lt;em&gt;by far the Hill estimator&lt;/em&gt;&lt;sup id=&quot;fnref:60:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:60&quot; class=&quot;footnote&quot;&gt;44&lt;/a&gt;&lt;/sup&gt; introduced in Hill&lt;sup id=&quot;fnref:65:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:65&quot; class=&quot;footnote&quot;&gt;46&lt;/a&gt;&lt;/sup&gt; and defined conditionally on $k$ as&lt;/p&gt;

\[\hat{\gamma}  = \left( \frac{1}{k} \sum_{i=1}^k \ln \frac{r^{'}_{(n - i + 1)}}{r^{'}_{(n - k)}} \right)^{-1}\]

    &lt;p&gt;The interested reader is referred to Fedotenkov&lt;sup id=&quot;fnref:67&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:67&quot; class=&quot;footnote&quot;&gt;65&lt;/a&gt;&lt;/sup&gt;, in which more than one hundred tail index estimators proposed in the literature are reviewed.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;$k$ is the number of observations $r^{'}_{(n - k + 1)}$, …, $r^{'}_{(n)}$ that should be considered extreme.&lt;/p&gt;

    &lt;p&gt;Here, and contrary to the GPD method, the threshold index $k$ has a huge influence on the Weissman quantile estimator and thus on the EVT/Weissman-based VaR estimator.&lt;/p&gt;

    &lt;p&gt;As an illustration, Figure 10 depicts the daily VaR of the Dow Jones Industrial Average index at a confidence level $\alpha = 99.9$%&lt;sup id=&quot;fnref:105&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:105&quot; class=&quot;footnote&quot;&gt;66&lt;/a&gt;&lt;/sup&gt; as a function of $k$, when estimated by the EVT/Weissman-based VaR estimator.&lt;/p&gt;

    &lt;figure&gt;
    &lt;a href=&quot;/assets/images/blog/univariate-value-at-risk-estimation-methods-weissman-dehaan.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/univariate-value-at-risk-estimation-methods-weissman-dehaan-small.png&quot; alt=&quot;Impact of the threshold index k between the central part and the upper tail of the Dow Jones Industrial Average daily return distribution, Weissman quantile estimator of VaR 99.9%, 1980-2010. Source: de Haan et al.&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;Figure 10. Impact of the threshold index k between the central part and the upper tail of the Dow Jones Industrial Average daily return distribution, Weissman quantile estimator of VaR 99.9%, 1980-2010. Source: de Haan et al.&lt;/figcaption&gt;
  &lt;/figure&gt;

    &lt;p&gt;In Figure 10, three distinct ranges of values for the index $k$ are visible:&lt;/p&gt;
    &lt;ul&gt;
      &lt;li&gt;
        &lt;p&gt;An initial range of values for $k \in [1, 150]$ that results in a bumpy VaR estimate, although in this specific figure the bumpiness is not that pronounced.&lt;/p&gt;

        &lt;p&gt;This is the “high variance” region of the estimator.&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;A second range of values for $k \in [150, 300]$ that results in a flat-ish VaR estimate.&lt;/p&gt;

        &lt;p&gt;This is the “optimal bias-variance” region of the estimator, in which the exact value of $k$ is not important.&lt;/p&gt;

        &lt;p&gt;Obtaining an estimate of $k$ belonging to that region is the ultimate goal.&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;A third range of values for $k \geq 300$ that results in a diverging VaR estimate.&lt;/p&gt;

        &lt;p&gt;This is the “high bias” region of the estimator.&lt;/p&gt;
      &lt;/li&gt;
    &lt;/ul&gt;

    &lt;p&gt;Fortunately, this problem can be solved by using a bias-corrected Weissman quantile estimator.&lt;/p&gt;

    &lt;p&gt;For example, Figure 11 depicts the daily VaR of the Dow Jones Industrial Average index at a confidence level $\alpha = 99.9$% as a function of $k$, when estimated by the EVT/unbiased Weissman-based VaR estimator introduced in de Haan et al.&lt;sup id=&quot;fnref:50:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:50&quot; class=&quot;footnote&quot;&gt;51&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

    &lt;figure&gt;
    &lt;a href=&quot;/assets/images/blog/univariate-value-at-risk-estimation-methods-weissman-corrected-dehaan.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/univariate-value-at-risk-estimation-methods-weissman-corrected-dehaan-small.png&quot; alt=&quot;Impact of the threshold index k between the central part and the upper tail of the Dow Jones Industrial Average daily return distribution, unbiased Weissman quantile estimator of VaR 99.9%, 1980-2010. Source: de Haan et al.&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;Figure 11. Impact of the threshold index k between the central part and the upper tail of the Dow Jones Industrial Average daily return distribution, unbiased Weissman quantile estimator of VaR 99.9%, 1980-2010. Source: de Haan et al.&lt;/figcaption&gt;
  &lt;/figure&gt;

    &lt;p&gt;Comparing Figure 11 to Figure 10, the improvement in stability of the quantile estimator is striking.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
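&lt;p&gt;Putting the Hill and Weissman estimators together gives the following minimal Python sketch (function name and return convention mine); the average log-exceedance ratio estimates the shape parameter $\xi$, whose reciprocal is the tail index estimate $\hat{\gamma}$, and the losses above the threshold are assumed positive so that the logarithms are well defined:&lt;/p&gt;

```python
import math

def hill_weissman_var(returns, k, alpha):
    # EVT/Weissman-based VaR estimator, with the tail index estimated
    # from the k largest losses via the Hill estimator.
    losses = sorted(-r for r in returns)   # losses in increasing order
    n = len(losses)
    u = losses[n - k - 1]                  # threshold r'_(n-k), assumed positive
    # Average log-exceedance ratio: estimates xi = 1/gamma
    xi_hat = sum(math.log(losses[n - i] / u) for i in range(1, k + 1)) / k
    gamma_hat = 1.0 / xi_hat               # tail index estimate
    # Weissman quantile estimator of the alpha% VaR
    var = u * (k / (n * (1.0 - alpha))) ** (1.0 / gamma_hat)
    return var, gamma_hat
```

&lt;p&gt;Re-running this sketch for different values of $k$ reproduces the sensitivity to the threshold index discussed above.&lt;/p&gt;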

&lt;p&gt;One important limitation of this method compared to the GPD-based method is that, depending on the exact financial data used (financial instrument, time period, returns frequency…), the stylized fact of heavy-tailed return distributions might be violated, 
with return distributions being thin-tailed or short-tailed&lt;sup id=&quot;fnref:63&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:63&quot; class=&quot;footnote&quot;&gt;67&lt;/a&gt;&lt;/sup&gt; instead of heavy-tailed, c.f. Longin and Solnik&lt;sup id=&quot;fnref:63:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:63&quot; class=&quot;footnote&quot;&gt;67&lt;/a&gt;&lt;/sup&gt; and Drees&lt;sup id=&quot;fnref:71:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:71&quot; class=&quot;footnote&quot;&gt;48&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Incidentally, the two lower-tail GPD fits depicted in Figure 7 and Figure 8 both correspond to a short-tailed distribution, since $\hat{\xi} \approx -0.2304$ in Figure 7 and $\hat{\xi} \approx -0.021$ in Figure 8.&lt;/p&gt;

&lt;h4 id=&quot;practical-performances&quot;&gt;Practical performance&lt;/h4&gt;

&lt;p&gt;In terms of practical performance, Danielsson&lt;sup id=&quot;fnref:68:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:68&quot; class=&quot;footnote&quot;&gt;14&lt;/a&gt;&lt;/sup&gt; highlights that EVT-based VaR estimation &lt;em&gt;delivers good probability–quantile estimates where EVT holds&lt;/em&gt;&lt;sup id=&quot;fnref:68:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:68&quot; class=&quot;footnote&quot;&gt;14&lt;/a&gt;&lt;/sup&gt;, but that &lt;em&gt;there are no rules that tell us when [EVT] becomes inaccurate&lt;/em&gt;&lt;sup id=&quot;fnref:68:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:68&quot; class=&quot;footnote&quot;&gt;14&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Indeed, &lt;em&gt;it depends on the underlying distribution of the data. In some cases, it may be accurate up to 1% or even 5%, while in other cases it is not reliable even up to 0.1%&lt;/em&gt;&lt;sup id=&quot;fnref:68:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:68&quot; class=&quot;footnote&quot;&gt;14&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;In addition, the accuracy of EVT also depends on the selected threshold.&lt;/p&gt;

&lt;p&gt;This is illustrated in Figure 12, which is identical to Figure 7 except that the GPD fit has been graphically extended to the right of the threshold $u \approx -1.2292$.&lt;/p&gt;

&lt;figure&gt;
  &lt;a href=&quot;/assets/images/blog/univariate-value-at-risk-estimation-methods-bollerslev-one.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/univariate-value-at-risk-estimation-methods-bollerslev-one-small.png&quot; alt=&quot;Lower tail of daily percentage returns of Deutsche mark/British pound (DEM/GBP) exchange rates, GPD fit with threshold u = -1.2292 extended, 3rd January 1984 to 31st December 1991.&quot; /&gt;&lt;/a&gt;
  &lt;figcaption&gt;Figure 12. Lower tail of daily percentage returns of Deutsche mark/British pound (DEM/GBP) exchange rates, GPD fit with threshold u = -1.2292 extended, 3rd January 1984 to 31st December 1991.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;From Figure 7 and Figure 8, both GPD fits seem usable below $u \approx -1.2292$.&lt;/p&gt;

&lt;p&gt;But while the GPD fit of Figure 8 is valid up to $u \approx -0.2683$, Figure 12 makes it clear that the GPD fit of Figure 7 is completely off above $u \approx -1.2292$.&lt;/p&gt;

&lt;h3 id=&quot;other-estimators&quot;&gt;Other estimators&lt;/h3&gt;

&lt;p&gt;It is out of the scope of this blog post to list all the non-parametric and semi-parametric portfolio VaR estimators discussed in the literature, but I would like to finish this section by mentioning estimators 
based on &lt;a href=&quot;https://en.wikipedia.org/wiki/Smoothing_spline&quot;&gt;smoothing splines&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;An example of such an estimator is described in Shaker-Akhtekhane and Poorabbas&lt;sup id=&quot;fnref:76&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:76&quot; class=&quot;footnote&quot;&gt;68&lt;/a&gt;&lt;/sup&gt;, in which it is empirically demonstrated to &lt;em&gt;outperform common historical, parametric, and kernel-based methods&lt;/em&gt;&lt;sup id=&quot;fnref:76:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:76&quot; class=&quot;footnote&quot;&gt;68&lt;/a&gt;&lt;/sup&gt; 
when applied to the S&amp;amp;P500 index at VaR confidence levels of $\alpha = 95$% and $\alpha = 99$%.&lt;/p&gt;

&lt;p&gt;Miscellaneous points of attention for such estimators:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Monotonicity constraints need to be imposed on smoothing splines in order to properly approximate the c.d.f. of the portfolio returns, as highlighted in Wood&lt;sup id=&quot;fnref:78&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:78&quot; class=&quot;footnote&quot;&gt;69&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

    &lt;p&gt;These are not mentioned in Shaker-Akhtekhane and Poorabbas&lt;sup id=&quot;fnref:76:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:76&quot; class=&quot;footnote&quot;&gt;68&lt;/a&gt;&lt;/sup&gt;, but are required in practice.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Similar to kernel-smoothing, a smoothing parameter needs to be selected.&lt;/p&gt;

    &lt;p&gt;The same kinds of problems arise, with solutions that are very close in spirit, like generalized cross-validation, c.f. Wood&lt;sup id=&quot;fnref:78:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:78&quot; class=&quot;footnote&quot;&gt;69&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Again similar to kernel-smoothing, special care must be taken when extrapolating beyond the range of observed values.&lt;/p&gt;

    &lt;p&gt;In particular, the estimator described in Shaker-Akhtekhane and Poorabbas&lt;sup id=&quot;fnref:76:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:76&quot; class=&quot;footnote&quot;&gt;68&lt;/a&gt;&lt;/sup&gt; cannot extrapolate beyond the smallest and the highest portfolio returns due to the splines constraints associated with these two points.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
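&lt;p&gt;To make these points of attention concrete, here is a dependency-light Python stand-in, which replaces the smoothing spline by a piecewise-linear - hence automatically monotone - interpolation of the empirical c.d.f.; like the spline-based estimator of Shaker-Akhtekhane and Poorabbas, it cannot extrapolate beyond the smallest and highest observed returns (the function name and the choice of plotting positions are mine):&lt;/p&gt;

```python
import numpy as np

def interpolated_cdf_var(returns, alpha):
    # Stand-in for a spline-smoothed c.d.f.: interpolate the empirical
    # c.d.f. with a monotone (here, piecewise-linear) function, then invert it.
    x = np.sort(np.asarray(returns, dtype=float))
    n = x.size
    p = (np.arange(1, n + 1) - 0.5) / n     # plotting positions
    # np.interp clamps outside [p[0], p[-1]]: like the spline-based
    # estimator, there is no extrapolation beyond the observed returns.
    q = np.interp(1.0 - alpha, p, x)        # (1 - alpha) quantile of returns
    return -q                               # VaR is the opposite of that quantile
```

&lt;p&gt;An actual spline-based estimator would additionally smooth the interpolated c.d.f., subject to the monotonicity constraints mentioned above.&lt;/p&gt;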

&lt;h2 id=&quot;parametric-value-at-risk-estimation&quot;&gt;Parametric Value-at-Risk estimation&lt;/h2&gt;

&lt;p&gt;Contrary to the non-parametric and semi-parametric approaches, parametric - also called analytical - approaches make the assumption&lt;sup id=&quot;fnref:91&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:91&quot; class=&quot;footnote&quot;&gt;70&lt;/a&gt;&lt;/sup&gt; that the whole portfolio return distribution 
can be described by a parametric distribution $F_{\theta}$ whose parameters $\theta$ need to be estimated from observations.&lt;/p&gt;

&lt;p&gt;The $(1 - \alpha)$% quantile of the portfolio return distribution is then simply obtained by inverting that parametric distribution, which leads to the definition of the parametric portfolio VaR estimator as&lt;/p&gt;

\[\text{VaR}_{\alpha} = - F_{\theta}^{-1} (1 - \alpha)\]

&lt;p&gt;Parametric approaches thus replace the problem of accurately computing the quantile of an empirical distribution of observations by the problem of choosing a parametric distribution that best fits these observations.&lt;/p&gt;

&lt;p&gt;A couple of examples:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;If the portfolio returns are assumed to be distributed according to a Gaussian distribution, the associated portfolio VaR is called &lt;em&gt;Gaussian Value-at-Risk&lt;/em&gt; (GVaR) and is computed through the formula&lt;sup id=&quot;fnref:81&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:81&quot; class=&quot;footnote&quot;&gt;71&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

\[\text{GVaR}_{\alpha} (X) = - \mu - \sigma z_{1 - \alpha}\]

    &lt;p&gt;, where:&lt;/p&gt;
    &lt;ul&gt;
      &lt;li&gt;The location parameter $\mu$ and the scale parameter $\sigma$ are usually&lt;sup id=&quot;fnref:80&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:80&quot; class=&quot;footnote&quot;&gt;72&lt;/a&gt;&lt;/sup&gt; estimated by their sample counterparts.&lt;/li&gt;
      &lt;li&gt;$z_{1 - \alpha}$ is the $1 - \alpha$ &lt;a href=&quot;https://en.wikipedia.org/wiki/Normal_distribution#Quantile_function&quot;&gt;quantile of the standard normal distribution&lt;/a&gt;.&lt;/li&gt;
    &lt;/ul&gt;

    &lt;p&gt;This is the assumption made in the RiskMetrics model&lt;sup id=&quot;fnref:84&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:84&quot; class=&quot;footnote&quot;&gt;73&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;If the portfolio returns are assumed to be distributed according to a heavy-tailed distribution, several distributions can be used:&lt;/p&gt;
    &lt;ul&gt;
      &lt;li&gt;A &lt;a href=&quot;/blog/corrected-cornish-fisher-expansion-improving-the-accuracy-of-modified-value-at-risk/&quot;&gt;Cornish-Fisher distribution&lt;/a&gt;, whose associated portfolio VaR is known as &lt;em&gt;Modified Value-at-Risk&lt;/em&gt;&lt;/li&gt;
      &lt;li&gt;A &lt;a href=&quot;/blog/beyond-modified-value-at-risk-application-of-gaussian-mixtures-to-the-computation-of-value-at-risk/&quot;&gt;Gaussian mixture distribution&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;…&lt;/p&gt;

        &lt;p&gt;Here, as a side note, even though &lt;em&gt;it is known that financial time series usually exhibit skewed and fat-tailed distributions&lt;sup id=&quot;fnref:56:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:56&quot; class=&quot;footnote&quot;&gt;60&lt;/a&gt;&lt;/sup&gt;, there is no complete agreement on what distribution could fit them best&lt;/em&gt;&lt;sup id=&quot;fnref:60:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:60&quot; class=&quot;footnote&quot;&gt;44&lt;/a&gt;&lt;/sup&gt;, so that finding the best distribution to use is as much art as science…&lt;/p&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
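&lt;p&gt;For instance, the Gaussian VaR formula above reduces to a few lines thanks to the Python standard library (the function name is mine):&lt;/p&gt;

```python
from statistics import NormalDist, mean, stdev

def gaussian_var(returns, alpha):
    # Gaussian VaR: mu and sigma estimated by their sample counterparts,
    # z_(1 - alpha) the standard normal quantile.
    mu = mean(returns)
    sigma = stdev(returns)
    z = NormalDist().inv_cdf(1.0 - alpha)
    return -mu - sigma * z
```

&lt;p&gt;For $\alpha = 95$%, $z_{1 - \alpha} \approx -1.645$, so that $\text{GVaR}_{95\%} \approx 1.645 \sigma - \mu$.&lt;/p&gt;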

&lt;p&gt;Parametric VaR estimation approaches might be very successful, especially with long risk horizons (months, years…) and/or with not too extreme confidence levels (90%, 95%…).&lt;/p&gt;

&lt;p&gt;Nevertheless, they might also &lt;em&gt;not be able to provide an adequate description of the whole range of data, resulting in a good fit of the body but a non accurate description of the tails&lt;/em&gt;&lt;sup id=&quot;fnref:45:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:45&quot; class=&quot;footnote&quot;&gt;41&lt;/a&gt;&lt;/sup&gt;, leading to biases in VaR estimates.&lt;/p&gt;

&lt;h2 id=&quot;implementation-in-portfolio-optimizer&quot;&gt;Implementation in Portfolio Optimizer&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Portfolio Optimizer&lt;/strong&gt; supports different portfolio Value-at-Risk estimation methods, c.f. &lt;a href=&quot;https://docs.portfoliooptimizer.io/&quot;&gt;the documentation&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Empirical portfolio VaR estimation&lt;/li&gt;
  &lt;li&gt;Extrapolated empirical portfolio VaR estimation&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Kernel-smoothed empirical portfolio VaR estimation&lt;/p&gt;

    &lt;p&gt;For that estimation method, the kernel bandwidth parameter $h$ is automatically computed using a proprietary variation of the improved Sheather and Jones rule described in Botev et al.&lt;sup id=&quot;fnref:108&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:108&quot; class=&quot;footnote&quot;&gt;74&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;EVT-based portfolio VaR estimation (both GPD-based and Hill estimator-based)&lt;/p&gt;

    &lt;p&gt;For these estimation methods:&lt;/p&gt;
    &lt;ul&gt;
      &lt;li&gt;The number of extreme observations $k$ is automatically computed using a proprietary variation of the goodness-of-fit procedure described in El-Aroui and Diebolt&lt;sup id=&quot;fnref:109&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:109&quot; class=&quot;footnote&quot;&gt;75&lt;/a&gt;&lt;/sup&gt;.&lt;/li&gt;
      &lt;li&gt;The estimated GPD parameters are unbiased through the formulas described in Giles et al.&lt;sup id=&quot;fnref:110&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:110&quot; class=&quot;footnote&quot;&gt;76&lt;/a&gt;&lt;/sup&gt;.&lt;/li&gt;
      &lt;li&gt;The estimated Hill and Weissman estimators are unbiased through the formulas described in de Haan et al.&lt;sup id=&quot;fnref:50:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:50&quot; class=&quot;footnote&quot;&gt;51&lt;/a&gt;&lt;/sup&gt; and corrected in Chavez-Demoulin and Guillou&lt;sup id=&quot;fnref:104&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:104&quot; class=&quot;footnote&quot;&gt;77&lt;/a&gt;&lt;/sup&gt;.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Parametric portfolio VaR estimation&lt;/p&gt;

    &lt;p&gt;The supported parametric distributions are:&lt;/p&gt;
    &lt;ul&gt;
      &lt;li&gt;The Gaussian distribution&lt;/li&gt;
      &lt;li&gt;The Gaussian mixture distribution&lt;/li&gt;
      &lt;li&gt;The Cornish-Fisher distribution&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;This blog post described some of the most well-known methods for univariate Value-at-Risk estimation.&lt;/p&gt;

&lt;p&gt;Thanks to these, it is possible to analyze the past behaviour of a financial portfolio, but their real interest lies in univariate Value-at-Risk forecasting, which will be the subject of the next blog post in this series.&lt;/p&gt;

&lt;p&gt;Stay tuned!&lt;/p&gt;

&lt;p&gt;Meanwhile, feel free to &lt;a href=&quot;https://www.linkedin.com/in/roman-rubsamen/&quot;&gt;connect with me on LinkedIn&lt;/a&gt; or to &lt;a href=&quot;https://twitter.com/portfoliooptim&quot;&gt;follow me on Twitter&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;–&lt;/p&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.ressources-actuarielles.net/C1256CFC001E6549/0/738BB4496121216AC1257A7C006702BA&quot;&gt;Gueant, O., Computing the Value at Risk of a Portfolio: Academic literature and Practionners’ response, EMMA, Working Paper&lt;/a&gt;. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:1:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:15&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://papers.ssrn.com/sol3/papers.cfm?abstract_id=356220&quot;&gt;Manganelli, Simone and Engle, Robert F., Value at Risk Models in Finance (August 2001). ECB Working Paper No. 75&lt;/a&gt;. &lt;a href=&quot;#fnref:15&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:15:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:15:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:15:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:9&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Basel II requires&lt;sup id=&quot;fnref:11&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:11&quot; class=&quot;footnote&quot;&gt;78&lt;/a&gt;&lt;/sup&gt; to calculate market risk capital requirements using VaR at a 99% confidence level over a 10-day horizon. &lt;a href=&quot;#fnref:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:12&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Basel III requires&lt;sup id=&quot;fnref:13&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:13&quot; class=&quot;footnote&quot;&gt;79&lt;/a&gt;&lt;/sup&gt; internal backtesting procedures based on VaR and Stress VaR (VaR applied to a market stress period). &lt;a href=&quot;#fnref:12&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:14&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The SEC Rule 18f-4 requires companies to calculate daily VaR at a 99% confidence level over a 20-day horizon and using at least 3 years of historical data; it also requires companies to backtest their VaR models daily over a 1-day horizon. &lt;a href=&quot;#fnref:14&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:82&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://climateimpact.edhec.edu/publications/sensitivity-portfolio-var-and-cvar-portfolio&quot;&gt;Stoyan V. Stoyanov, Svetlozar T. Rachev, Frank J. Fabozzi, Sensitivity of portfolio VaR and CVaR to portfolio return characteristics, Working paper&lt;/a&gt;. &lt;a href=&quot;#fnref:82&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:20&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;C.f. Dowd&lt;sup id=&quot;fnref:23:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:23&quot; class=&quot;footnote&quot;&gt;24&lt;/a&gt;&lt;/sup&gt;, &lt;em&gt;the VaR is the negative of the relevant P/L observation because P/L is positive for profitable outcomes and negative for losses, and the VaR is the maximum likely loss (rather than profit) at the specified probability&lt;/em&gt;&lt;sup id=&quot;fnref:23:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:23&quot; class=&quot;footnote&quot;&gt;24&lt;/a&gt;&lt;/sup&gt;, so that VaR is a positive percentage; to be noted, though, that VaR can be negative when no loss is incurred within the confidence level, in which case it is meaningless; c.f. Daníelsson&lt;sup id=&quot;fnref:68:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:68&quot; class=&quot;footnote&quot;&gt;14&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:20&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:18&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;This is the case when the portfolio return &lt;a href=&quot;https://en.wikipedia.org/wiki/Cumulative_distribution_function&quot;&gt;cumulative distribution function&lt;/a&gt; is strictly increasing and continuous; otherwise, a similar formula is still valid, with $F_X^{-1}$ the generalized inverse distribution function of $X$, but these subtleties - important in mathematical proofs and in numerical implementations - are out of scope of this blog post. &lt;a href=&quot;#fnref:18&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:16&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://merage.uci.edu/~jorion/var/&quot;&gt;Jorion, P. (2007). Value at risk: The new benchmark for managing financial risk. New York, NY: McGraw-Hill&lt;/a&gt;. &lt;a href=&quot;#fnref:16&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:16:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:16:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:85&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://academic.oup.com/jfec/article-abstract/4/1/53/833052?redirectedFrom=fulltext&quot;&gt;Keith Kuester, Stefan Mittnik, Marc S. Paolella, Value-at-Risk Prediction: A Comparison of Alternative Strategies, Journal of Financial Econometrics, Volume 4, Issue 1, Winter 2006, Pages 53–89&lt;/a&gt;. &lt;a href=&quot;#fnref:85&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:89&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Also called top-down VaR models&lt;sup id=&quot;fnref:22:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:22&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt; or portfolio aggregation-based models. &lt;a href=&quot;#fnref:89&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:87&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.msci.com/www/research-report/riskmetrics-monitor-riskmetrics/018920692&quot;&gt;Zangari, Peter, 1997, Streamlining the market risk measurement process, RiskMetrics Monitor, 1, 29–35&lt;/a&gt;. &lt;a href=&quot;#fnref:87&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:87:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:87:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:22&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;http://dx.doi.org/10.2139/ssrn.2942138&quot;&gt;Ballotta, L. ORCID: 0000-0002-2059-6281 and Fusai, G. ORCID: 0000-0001-9215-2586 (2017). A Gentle Introduction to Value at Risk&lt;/a&gt;. &lt;a href=&quot;#fnref:22&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:22:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:22:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:22:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:22:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:68&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://onlinelibrary.wiley.com/doi/book/10.1002/9781119205869&quot;&gt;Jon Danielsson, Financial Risk Forecasting: The Theory and Practice of Forecasting Market Risk, with Implementation in R and Matlab, Wiley 2011&lt;/a&gt;. &lt;a href=&quot;#fnref:68&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:68:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:68:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:68:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:68:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:68:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:68:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:68:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;8&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:68:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;9&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:68:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;10&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:68:10&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;11&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:17&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Danielsson&lt;sup id=&quot;fnref:68:11&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:68&quot; class=&quot;footnote&quot;&gt;14&lt;/a&gt;&lt;/sup&gt; comes to mind. &lt;a href=&quot;#fnref:17&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:28&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://econwpa.ub.uni-muenchen.de/econ-wp/fin/papers/9605/9605001.pdf&quot;&gt;J. S. Butler &amp;amp; Barry Schachter, 1996. Improving Value-At-Risk Estimates By Combining Kernel Estimation With Historical Simulation, Finance 9605001, University Library of Munich, Germany&lt;/a&gt;. &lt;a href=&quot;#fnref:28&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:28:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:28:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:28:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:28:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:46&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;That report was required by 4:15pm, which is why it originally became known as the 4.15 report. &lt;a href=&quot;#fnref:46&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:21&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://econpapers.repec.org/paper/wpawuwpmh/0207001.htm&quot;&gt;Glyn A. Holton, (2002), History of Value-at-Risk: 1922-1998, Method and Hist of Econ Thought, University Library of Munich, Germany&lt;/a&gt;. &lt;a href=&quot;#fnref:21&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:21:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:27&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The two horizons might be different, but this is out of scope of this blog post. &lt;a href=&quot;#fnref:27&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:7&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.jstor.org/stable/20076262&quot;&gt;Danielsson, Jon, and Casper G. De Vries. Value-at-Risk and Extreme Returns. Annales d’Économie et de Statistique, no. 60, 2000, pp. 239–70&lt;/a&gt;. &lt;a href=&quot;#fnref:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:7:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:7:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:66&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/92WR02466&quot;&gt;Lall, U., Y. Moon, and K. Bosworth (1993), Kernel flood frequency estimators: Bandwidth selection and kernel choice, Water Resour. Res.,29(4), 1003–1015&lt;/a&gt;. &lt;a href=&quot;#fnref:66&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:83&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.msci.com/documents/10199/5915b101-4206-4ba0-aee2-3449d5c7e95a&quot;&gt;RiskMetrics. Technical Document, J.P.Morgan/Reuters, New York, 1996. Fourth Edition&lt;/a&gt;. &lt;a href=&quot;#fnref:83&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:29&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://academic.oup.com/jfec/article-abstract/3/2/227/834153&quot;&gt;Song Xi Chen, Cheng Yong Tang, Nonparametric Inference of Value-at-Risk for Dependent Financial Returns, Journal of Financial Econometrics, Volume 3, Issue 2, Spring 2005, Pages 227–255&lt;/a&gt;. &lt;a href=&quot;#fnref:29&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:29:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:29:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:29:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:29:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:29:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:29:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:29:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;8&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:29:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;9&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:29:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;10&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:23&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.pm-research.com/content/iijderiv/8/3/23&quot;&gt;Kevin Dowd, Estimating VaR with Order Statistics, The Journal of Derivatives, Spring 2001, 8 (3) 23-30&lt;/a&gt;. &lt;a href=&quot;#fnref:23&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:23:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:23:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:23:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:33&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;To be noted that Dowd&lt;sup id=&quot;fnref:23:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:23&quot; class=&quot;footnote&quot;&gt;24&lt;/a&gt;&lt;/sup&gt; proposes to &lt;em&gt;take the sixth observation as [the] 5% VaR because we want 5% of the probability mass to lie to the left of [the] VaR&lt;/em&gt;&lt;sup id=&quot;fnref:23:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:23&quot; class=&quot;footnote&quot;&gt;24&lt;/a&gt;&lt;/sup&gt;, but other authors propose to use the fifth observation instead&lt;sup id=&quot;fnref:22:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:22&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt;&lt;sup id=&quot;fnref:26:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:26&quot; class=&quot;footnote&quot;&gt;29&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:33&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
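The two order-statistics conventions can be compared with a minimal sketch (simulated returns, numpy assumed; not the author's own code):

```python
import numpy as np

# 100 hypothetical daily portfolio returns, sorted in ascending order
rng = np.random.default_rng(42)
returns = np.sort(rng.normal(0.0, 0.01, 100))

# 5% VaR from the order statistics of n = 100 observations:
# - Dowd's convention: the sixth-lowest observation, so that 5% of the
#   probability mass lies strictly to the left of the VaR
# - alternative convention: the fifth-lowest observation
var_dowd = -returns[5]  # sixth observation (0-based index 5)
var_alt = -returns[4]   # fifth observation (0-based index 4)
```

Since the returns are sorted ascending, Dowd's choice is never larger than the alternative one.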
    &lt;li id=&quot;fn:35&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Many other choices - at least 9 - are possible though, c.f. Hyndman and Fan&lt;sup id=&quot;fnref:34&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:34&quot; class=&quot;footnote&quot;&gt;80&lt;/a&gt;&lt;/sup&gt;; ultimately, what is needed is an estimator of the $(1 - \alpha)$% quantile of the empirical portfolio return distribution. &lt;a href=&quot;#fnref:35&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:35:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
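For illustration, several of the Hyndman and Fan quantile estimator types are exposed through numpy's `method` parameter (illustrative data; not the author's own code):

```python
import numpy as np

# Hypothetical sample of 10 daily portfolio returns
x = np.array([-0.05, -0.03, -0.02, -0.01, 0.0, 0.01, 0.02, 0.03, 0.04, 0.06])

# A few of the Hyndman-Fan quantile definitions, as implemented by numpy;
# each one gives a (slightly) different estimate of the 5% quantile
for method in ("inverted_cdf", "averaged_inverted_cdf", "linear", "median_unbiased"):
    print(method, np.quantile(x, 0.05, method=method))
```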
    &lt;li id=&quot;fn:44&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;For $1 - \alpha \in ]\frac{1}{n+1}, \frac{n}{n+1}[$, that is, no extrapolation beyond the observed sample is possible. &lt;a href=&quot;#fnref:44&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:44:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:44:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:24&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Such a linearly interpolated quantile estimator has been known since at least Parzen&lt;sup id=&quot;fnref:25&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:25&quot; class=&quot;footnote&quot;&gt;81&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:24&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:26&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://academic.oup.com/jrsssb/article-abstract/63/4/717/7083398?redirectedFrom=fulltext&quot;&gt;Peter Hall and Andrew Rieck. (2001). Improving Coverage Accuracy of Nonparametric Prediction Intervals. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 63(4), 717–725&lt;/a&gt;. &lt;a href=&quot;#fnref:26&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:26:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:43&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://link.springer.com/article/10.1023/A:1020783911574&quot;&gt;Hutson, A.D. A Semi-Parametric Quantile Function Estimator for Use in Bootstrap Estimation Procedures. Statistics and Computing 12, 331–338 (2002)&lt;/a&gt;. &lt;a href=&quot;#fnref:43&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:43:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:43:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:43:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:43:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:43:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:43:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:37&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;To be noted that weak dependence is a kind of misnomer, because this type of dependence actually covers a very broad range of time series models! &lt;a href=&quot;#fnref:37&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:31&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://doi.org/10.1016/S0927-5398(00)00011-6&quot;&gt;Gourieroux, C., Scaillet, O. and Laurent, J.P. (2000). Sensitivity analysis of Values at Risk. Journal of Empirical Finance, 7, 225-245&lt;/a&gt;. &lt;a href=&quot;#fnref:31&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:31:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:31:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:31:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:31:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:31:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:40&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Asymmetric kernel functions also exist, c.f. for example Abadir and Lawford&lt;sup id=&quot;fnref:41&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:41&quot; class=&quot;footnote&quot;&gt;82&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:40&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:39&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;In a mean squared-error sense. &lt;a href=&quot;#fnref:39&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:32&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://link.springer.com/book/10.1007/b13794&quot;&gt;Alexandre B. Tsybakov. 2008. Introduction to Nonparametric Estimation (1st. ed.). Springer Publishing Company, Incorporated&lt;/a&gt;. &lt;a href=&quot;#fnref:32&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:32:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:32:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:38&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The performance of a kernel estimator $K$ is defined in terms of &lt;em&gt;the ratio of sample sizes necessary to obtain the same minimum asymptotic mean integrated squared error (for a given [function $f$ that is being kernel-smoothed]) when using $K$ as when using [the Epanechnikov kernel]&lt;/em&gt;&lt;sup id=&quot;fnref:30:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:30&quot; class=&quot;footnote&quot;&gt;37&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:38&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:30&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.taylorfrancis.com/books/mono/10.1201/b14876/kernel-smoothing-wand-jones&quot;&gt;Wand, M.P., &amp;amp; Jones, M.C. (1994). Kernel Smoothing (1st ed.). Chapman and Hall/CRC&lt;/a&gt;. &lt;a href=&quot;#fnref:30&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:30:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:30:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:30:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:30:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:42&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.risk.net/journal-risk/2161066/kernel-quantile-based-estimation-expected-shortfall&quot;&gt;Keming Yu &amp;amp; Abdallah K. Ally &amp;amp; Shanchao Yang &amp;amp; David J. Hand, Kernel quantile based estimation of expected shortfall, Journal of Risk&lt;/a&gt;. &lt;a href=&quot;#fnref:42&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:42:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:48&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.airitilibrary.com/Article/Detail/05296528-200609-44-3-271-295-a&quot;&gt;Cheng, M.-Y. and S. Sun (2006). Bandwidth selection for kernel quantile estimation. Journal of the Chinese Statistical Association 44 (3), 271–295&lt;/a&gt;. &lt;a href=&quot;#fnref:48&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:48:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:49&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Yet, Chen and Tang&lt;sup id=&quot;fnref:29:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:29&quot; class=&quot;footnote&quot;&gt;23&lt;/a&gt;&lt;/sup&gt; note that &lt;em&gt;the reduction in RMSE is not very large for large samples&lt;/em&gt;&lt;sup id=&quot;fnref:29:11&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:29&quot; class=&quot;footnote&quot;&gt;23&lt;/a&gt;&lt;/sup&gt;, confirming the theoretical results that &lt;em&gt;the reduction is of second order only&lt;/em&gt;&lt;sup id=&quot;fnref:29:12&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:29&quot; class=&quot;footnote&quot;&gt;23&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:49&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:45&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://link.springer.com/article/10.1007/s00477-021-02102-0&quot;&gt;Banfi, F., Cazzaniga, G. &amp;amp; De Michele, C. Nonparametric extrapolation of extreme quantiles: a comparison study. Stoch Environ Res Risk Assess 36, 1579–1596 (2022)&lt;/a&gt;. &lt;a href=&quot;#fnref:45&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:45:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:45:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:45:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:45:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:45:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:45:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:75&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Potential solutions to this problem might be to use 1) a data-driven mixture of a Gaussian and a Cauchy kernel as proposed in Banfi et al.&lt;sup id=&quot;fnref:45:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:45&quot; class=&quot;footnote&quot;&gt;41&lt;/a&gt;&lt;/sup&gt; or 2) a preliminary transformation of the data with a Champernowne distribution as proposed in Buch-Kromann et al.&lt;sup id=&quot;fnref:98&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:98&quot; class=&quot;footnote&quot;&gt;83&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:75&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:57&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The family of heavy-tailed distributions encompasses all the &lt;a href=&quot;https://en.wikipedia.org/wiki/Fat-tailed_distribution&quot;&gt;fat-tailed distributions&lt;/a&gt; encountered in finance, like &lt;a href=&quot;https://en.wikipedia.org/wiki/Student%27s_t-distribution&quot;&gt;the Student-t distribution&lt;/a&gt;, and more. &lt;a href=&quot;#fnref:57&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:60&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1998740&quot;&gt;Rocco, Marco, Extreme Value Theory for Finance: A Survey (February 3, 2012). Bank of Italy Occasional Paper No. 99&lt;/a&gt;. &lt;a href=&quot;#fnref:60&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:60:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:60:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:60:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:60:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:60:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:60:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:60:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;8&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:60:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;9&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:60:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;10&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:54&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://doi.org/10.1016/S0927-5398(97)00008-X&quot;&gt;Danielsson, J., de Vries, C., 1997. Tail index and quantile estimation with very high frequency data. Journal of Empirical Finance 4, 241–257&lt;/a&gt;. &lt;a href=&quot;#fnref:54&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:65&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.jstor.org/stable/2958370&quot;&gt;Hill, B.M. (1975) A Simple General Approach to Inference About the Tail of a Distribution. Annals of Statistics, 3, 1163-1174&lt;/a&gt;. &lt;a href=&quot;#fnref:65&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:65:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:72&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.intechopen.com/chapters/65787&quot;&gt;B. Karima and B. Youcef, Asymptotic Normality of Hill’s Estimator under Weak Dependence, Statistical Methodologies. IntechOpen, Feb. 26, 2020&lt;/a&gt;. &lt;a href=&quot;#fnref:72&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:71&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.jstor.org/stable/3318788&quot;&gt;Drees, Holger. Extreme Quantile Estimation for Dependent Data, with Applications to Finance. Bernoulli, vol. 9, no. 4, 2003, pp. 617–57&lt;/a&gt;. &lt;a href=&quot;#fnref:71&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:71:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:71:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:52&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://doi.org/10.1016/S0927-5398(00)00012-8&quot;&gt;Alexander J. McNeil, Rudiger Frey, Estimation of tail-related risk measures for heteroscedastic financial time series: an extreme value approach, Journal of Empirical Finance, Volume 7, Issues 3–4, 2000, Pages 271-300&lt;/a&gt;. &lt;a href=&quot;#fnref:52&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:52:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:99&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;More precisely, to the standardized residuals of an AR(1)-GARCH(1,1) model of these asset returns; the rationale is that modeling excesses over a threshold is better justified for AR(1)-GARCH(1,1) standardized residuals, which are approximately i.i.d., than for raw asset returns. &lt;a href=&quot;#fnref:99&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
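To make the construction concrete, here is a minimal sketch with fixed, purely illustrative AR(1)-GARCH(1,1) parameters; a real application would fit them by maximum likelihood (simulated returns, numpy assumed; not the author's own code):

```python
import numpy as np

rng = np.random.default_rng(0)
r = rng.normal(0.0, 0.01, 500)  # hypothetical raw asset returns

# Illustrative (not fitted) AR(1)-GARCH(1,1) parameters
phi, omega, alpha, beta = 0.05, 1e-6, 0.08, 0.90

mu = np.zeros_like(r)    # conditional means
sig2 = np.empty_like(r)  # conditional variances
sig2[0] = np.var(r)
for t in range(1, len(r)):
    mu[t] = phi * r[t - 1]                                                      # AR(1) mean
    sig2[t] = omega + alpha * (r[t - 1] - mu[t - 1]) ** 2 + beta * sig2[t - 1]  # GARCH(1,1)

# Standardized residuals: approximately i.i.d., hence better suited
# to a peaks-over-threshold (EVT) analysis than the raw returns
z = (r - mu) / np.sqrt(sig2)
```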
    &lt;li id=&quot;fn:50&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://link.springer.com/article/10.1007/s00780-015-0287-6&quot;&gt;de Haan, L., Mercadier, C. &amp;amp; Zhou, C. Adapting extreme value statistics to financial time series: dealing with bias and serial dependence. Finance Stoch 20, 321–354 (2016)&lt;/a&gt;. &lt;a href=&quot;#fnref:50&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:50:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:50:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:70&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://pmc.ncbi.nlm.nih.gov/articles/PMC9818059/&quot;&gt;Benito S, Lopez-Martín C, Navarro MA. Assessing the importance of the choice threshold in quantifying market risk under the POT approach (EVT). Risk Manag. 2023;25(1)&lt;/a&gt;. &lt;a href=&quot;#fnref:70&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:103&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://link.springer.com/article/10.1057/s41283-022-00106-w&quot;&gt;Benito, S., Lopez-Martín, C. &amp;amp; Navarro, M.A. Assessing the importance of the choice threshold in quantifying market risk under the POT approach (EVT). Risk Manag 25, 6 (2023)&lt;/a&gt;. &lt;a href=&quot;#fnref:103&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:103:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:103:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:107&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.tandfonline.com/doi/abs/10.1080/07350015.1996.10524640&quot;&gt;Bollerslev, T., and Ghysels, E. (1996). Periodic Autoregressive Conditional Heteroskedasticity. Journal of Business &amp;amp; Economic Statistics, 14, 139–151&lt;/a&gt;. &lt;a href=&quot;#fnref:107&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:102&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://projecteuclid.org/journals/annals-of-statistics/volume-11/issue-4/Estimating-the-Stable-Index-alpha-in-Order-to-Measure-Tail/10.1214/aos/1176346318.full&quot;&gt;William H. DuMouchel. “Estimating the Stable Index α in Order to Measure Tail Thickness: A Critique.” Ann. Statist. 11 (4) 1019 - 1031, December, 1983&lt;/a&gt;. &lt;a href=&quot;#fnref:102&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:100&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.jstor.org/stable/1269343&quot;&gt;J. R. M. Hosking and J. R. Wallis, Parameter and Quantile Estimation for the Generalized Pareto Distribution, Technometrics, Vol. 29, No. 3 (Aug., 1987), pp. 339-349&lt;/a&gt;. &lt;a href=&quot;#fnref:100&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:101&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://doi.org/10.1016/j.csda.2005.09.011&quot;&gt;Alberto Luceno, Fitting the generalized Pareto distribution to data using maximum goodness-of-fit estimators, Computational Statistics &amp;amp; Data Analysis, Volume 51, Issue 2, 2006, Pages 904-917&lt;/a&gt;. &lt;a href=&quot;#fnref:101&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:51&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;In EVT terms, the true portfolio return distribution is assumed to be in the Fréchet domain of attraction. &lt;a href=&quot;#fnref:51&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:55&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://tinbergen.nl/discussion-paper/4174/98-016-2-beyond-the-sample-extreme-quantile-and-probability-estimation&quot;&gt;Danielsson J., de Vries C. G. (1997). Beyond the Sample: Extreme Quantile and Probability Estimation, Mimeo, Tinbergen Institute Rotterdam&lt;/a&gt;. &lt;a href=&quot;#fnref:55&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:56&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.tandfonline.com/doi/abs/10.1080/713665670&quot;&gt;R. Cont (2001) Empirical properties of asset returns: stylized facts and statistical issues, Quantitative Finance, 1:2, 223-236&lt;/a&gt;. &lt;a href=&quot;#fnref:56&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:56:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:61&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The cumulative distribution function of the portfolio returns is in the maximum domain of attraction of a Fréchet-type extreme value distribution if and only if $1 - F$ has that form, c.f. Rocco&lt;sup id=&quot;fnref:60:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:60&quot; class=&quot;footnote&quot;&gt;44&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:61&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:59&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The tail index is also known as the &lt;em&gt;extreme value index&lt;/em&gt;. &lt;a href=&quot;#fnref:59&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:64&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.jstor.org/stable/2286285&quot;&gt;Weissman, I.: Estimation of parameters and large quantiles based on the k largest observations. J. Am. Stat. Assoc. 73, 812–815 (1978)&lt;/a&gt;. &lt;a href=&quot;#fnref:64&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:73&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The formula for the EVT/Weissman-based portfolio VaR estimator is the one in Nieto and Ruiz&lt;sup id=&quot;fnref:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;84&lt;/a&gt;&lt;/sup&gt; and is slightly different from the one in Danielsson and de Vries&lt;sup id=&quot;fnref:55:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:55&quot; class=&quot;footnote&quot;&gt;59&lt;/a&gt;&lt;/sup&gt;, c.f. Danielsson&lt;sup id=&quot;fnref:68:12&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:68&quot; class=&quot;footnote&quot;&gt;14&lt;/a&gt;&lt;/sup&gt;, in which a general threshold $u$ is used instead of $r'_{(\hat{k})}$ or $r'_{(\hat{k}+1)}$. &lt;a href=&quot;#fnref:73&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:67&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://rivista-statistica.unibo.it/article/view/9533&quot;&gt;Fedotenkov, I. (2020). A Review of More than One Hundred Pareto-Tail Index Estimators. Statistica, 80(3), 245–299&lt;/a&gt;. &lt;a href=&quot;#fnref:67&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:105&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;As required under Basel III framework&lt;sup id=&quot;fnref:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;85&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:105&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:63&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://ebslgwp.hhs.se/heccah/abs/heccah0646.htm&quot;&gt;Longin, F.M., and B. Solnik (1997). Correlation structure of international equity markets during extremely volatile periods. Working Paper 97-039, ESSEC, Cergy-Pontoise, France&lt;/a&gt;. &lt;a href=&quot;#fnref:63&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:63:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:76&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.scienpress.com/journal_focus.asp?main_id=56&amp;amp;Sub_id=IV&amp;amp;Issue=2530847&quot;&gt;Saeed Shaker-Akhtekhane &amp;amp; Solmaz Poorabbas, 2023. Value-at-Risk Estimation Using an Interpolated Distribution of Financial Returns Series,  Journal of Applied Finance &amp;amp; Banking, SCIENPRESS Ltd, vol. 13(1), pages 1-6&lt;/a&gt;. &lt;a href=&quot;#fnref:76&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:76:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:76:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:76:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:78&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://epubs.siam.org/doi/10.1137/0915069&quot;&gt;Wood, S. N., Monotonic Smoothing Splines Fitted by Cross Validation, 1994, SIAM Journal on Scientific Computing, 1126-1133, 15, 5&lt;/a&gt;. &lt;a href=&quot;#fnref:78&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:78:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:91&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;In addition, with multivariate VaR models, parametric methods also need to &lt;em&gt;us[e] approximations of the pricing formulas of each [non simple] asset in the portfolio&lt;/em&gt;&lt;sup id=&quot;fnref:1:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, leading to methods like Delta-Normal or Delta-Gamma-(Theta-)Normal based on Taylor expansions of the asset pricing formulas, c.f. Gueant&lt;sup id=&quot;fnref:1:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:91&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:81&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1024151&quot;&gt;Boudt, Kris and Peterson, Brian G. and Croux, Christophe, Estimation and Decomposition of Downside Risk for Portfolios with Non-Normal Returns (October 31, 2007). Journal of Risk, Vol. 11, No. 2, pp. 79-103, 2008&lt;/a&gt;. &lt;a href=&quot;#fnref:81&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:80&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2692543&quot;&gt;Martin, R. Douglas and Arora, Rohit, Inefficiency of Modified VaR and ES&lt;/a&gt;. &lt;a href=&quot;#fnref:80&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:84&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Gueant&lt;sup id=&quot;fnref:1:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; notes that &lt;em&gt;the RiskMetrics model for the distribution of the evolution of the risk factors is based on the assumption that log-returns of prices (or variations in the case of interest rates) are independent across time and normally distributed, when appropriately scaled by an appropriate measure of volatility&lt;/em&gt;&lt;sup id=&quot;fnref:1:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:84&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:108&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://projecteuclid.org/journals/annals-of-statistics/volume-38/issue-5/Kernel-density-estimation-via-diffusion/10.1214/10-AOS799.full&quot;&gt;Z. I. Botev. J. F. Grotowski. D. P. Kroese. “Kernel density estimation via diffusion.” Ann. Statist. 38 (5) 2916 - 2957, October 2010&lt;/a&gt;. &lt;a href=&quot;#fnref:108&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:109&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://doi.org/10.1016/S0167-9473(01)00087-1&quot;&gt;Mhamed-Ali El-Aroui, Jean Diebolt, On the use of the peaks over thresholds method for estimating out-of-sample quantiles, Computational Statistics &amp;amp; Data Analysis, Volume 39, Issue 4, 2002, Pages 453-475&lt;/a&gt;. &lt;a href=&quot;#fnref:109&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:110&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;http://dx.doi.org/10.1080/03610926.2014.887104&quot;&gt;David E. Giles, Hui Feng &amp;amp; Ryan T. Godwin (2016) Bias-corrected maximum likelihood estimation of the parameters of the generalized Pareto distribution, Communications in Statistics - Theory and Methods, 45:8, 2465-2483&lt;/a&gt;. &lt;a href=&quot;#fnref:110&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:104&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://doi.org/10.1016/j.insmatheco.2018.09.004&quot;&gt;Valerie Chavez-Demoulin, Armelle Guillou, Extreme quantile estimation for beta-mixing time series and applications, Insurance: Mathematics and Economics, Volume 83, 2018, Pages 59-74&lt;/a&gt;. &lt;a href=&quot;#fnref:104&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:11&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;http://www.bis.org/publ/bcbs193.pdf&quot;&gt;Basel Committee on Banking Supervision. Revisions to the Basel II market risk framework (Updated as of 31 December 2010). 2011&lt;/a&gt;. &lt;a href=&quot;#fnref:11&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:13&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.bis.org/publ/bcbs265.htm&quot;&gt;Basel Committee on Banking Supervision. Fundamental review of the trading book&lt;/a&gt;. &lt;a href=&quot;#fnref:13&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:34&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.jstor.org/stable/2684934&quot;&gt;Hyndman, R. J., &amp;amp; Fan, Y. (1996). Sample Quantiles in Statistical Packages. The American Statistician, 50(4), 361–365&lt;/a&gt;. &lt;a href=&quot;#fnref:34&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:25&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.jstor.org/stable/2286734&quot;&gt;Parzen E (1979), Nonparametric statistical data modeling, J Am Stat Assoc 74(365):105–121&lt;/a&gt;. &lt;a href=&quot;#fnref:25&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:41&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://doi.org/10.1016/j.econlet.2003.07.017&quot;&gt;Karim M Abadir, Steve Lawford, Optimal asymmetric kernels, Economics Letters, Volume 83, Issue 1, 2004, Pages 61-68&lt;/a&gt;. &lt;a href=&quot;#fnref:41&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:98&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://papers.ssrn.com/sol3/papers.cfm?abstract_id=704903&quot;&gt;Buch-Kromann, Tine and Nielsen, Jens Perch and Guillen, Montserrat and Bolancé, Catalina, Kernel Density Estimation for Heavy-Tailed Distributions Using the Champernowne Transformation (January 2005)&lt;/a&gt;. &lt;a href=&quot;#fnref:98&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:6&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://doi.org/10.1016/j.ijforecast.2015.08.003&quot;&gt;Maria Rosa Nieto, Esther Ruiz, Frontiers in VaR forecasting and backtesting, International Journal of Forecasting, Volume 32, Issue 2, 2016, Pages 475-501&lt;/a&gt;. &lt;a href=&quot;#fnref:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:5&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Basel III requires&lt;sup id=&quot;fnref:13:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:13&quot; class=&quot;footnote&quot;&gt;79&lt;/a&gt;&lt;/sup&gt; that the VaR measures for risks on both trading and banking books must be calculated at a 99.9% confidence level. &lt;a href=&quot;#fnref:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content><author><name>Roman R.</name></author><category term="value at risk" /><category term="extreme value theory" /><category term="kernel smoothing" /><summary type="html">Value-at-Risk (VaR) is one of the most commonly used risk measures in the financial industry1 in part thanks to its simplicity - because VaR reduces the market risk associated with any portfolio to just one number2 - and in part due to regulatory requirements (Basel market risk frameworks34, SEC Rule 18f-45…). Nevertheless, when it comes to actual computations, the above definition is by no means constructive1 and accurately estimating VaR is a very challenging statistical problem2 for which several methods have been developed. In this blog post, I will describe some of the most well-known univariate VaR estimation methods, ranging from non-parametric methods based on empirical quantiles to semi-parametric methods involving kernel smoothing or extreme value theory and to parametric methods relying on distributional assumptions. Value-at-Risk Definition The Value-at-Risk of a portfolio of financial instruments corresponds to the maximum potential change in value of [that portfolio] with a given probability over a certain horizon2. More formally, the Value-at-Risk $VaR_{\alpha}$ of a portfolio over a time horizon $T$ (1 day, 10 days, 20 days…) and at a confidence level $\alpha$% $\in ]0,1[$ (95%, 97.5%, 99%…) can be defined6 as the opposite7 of the $1 - \alpha$ quantile of the portfolio return distribution over the time horizon $T$ \[\text{VaR}_{\alpha} = - \inf_{x} \left\{x \in \mathbb{R}, P(X \leq x) \geq 1 - \alpha \right\}\] , where $X$ is a random variable representing the portfolio return over the time horizon $T$. This formula is also equivalent8 to \[\text{VaR}_{\alpha} = - F_X^{-1}(1 - \alpha)\] , where $F_X^{-1}$ is the inverse cumulative distribution function, also called the quantile function, of the random variable $X$. 
Graphically, this definition is illustrated in: Figure 1, for a continuous portfolio return distribution at a generic confidence level $\alpha$% and over a generic horizon. Figure 1. Graphical illustration of a portfolio VaR as a quantile of its continuous return distribution. Source: Adapted from Yamai and Yoshiba. Figure 2, for a discrete portfolio return distribution at a confidence level $\alpha$ = 99% and over a 1-month horizon, which Jorion9 comments as follows: We need to find the loss that will not be exceeded in 99% of cases, or such that 1% of [the 624] observations - that is, 6 out of 624 occurrences - are lower. From Figure [2], this number is about -3.6%, [resulting in a portfolio VaR of 3.6%]. Figure 2. Graphical illustration of a portfolio VaR as a quantile of its discrete monthly return distribution, $\alpha$ = 99%. Source: Jorion. Univariate vs. multivariate Value-at-Risk A portfolio can be considered as both: An asset in itself, with its own return distribution. A weighted collection of individual assets, each with their own return distribution. This raises the question of whether first to aggregate profit and loss data and proceed with a univariate [VaR] model for the aggregate, or to start with disaggregate data10 and proceed with a multivariate VaR model from the disaggregated data. In this blog post, I will only discuss univariate11 VaR models - originally suggested by Zangari12 as simple and effective approach[es] for calculating Value-at-Risk12 - in which portfolio returns are considered as a univariate time series without reference to the portfolio constituents13. Indeed, since the goal of VaR is to measure the market risk of a portfolio, it seems reasonable to model the portfolio return series directly12. Arithmetic returns vs. logarithmic returns in Value-at-Risk calculations In VaR calculations, it is usually preferred, for a variety of reasons, to work with logarithmic returns rather than arithmetic (simple, linear) ones13, c.f.
Ballotta13 and Jorion9 for more details. In that case, though, because investors are primarily interested in simple returns14, the logarithmic VaR $\text{VaR}_{\alpha}^{(l)} $ needs to be converted into an arithmetic VaR $\text{VaR}_{\alpha}^{(a)} $. Thanks to the definition of VaR as a quantile of the portfolio return distribution and the relationship between arithmetic and logarithmic returns, this is easily done through the formula \[\text{VaR}_{\alpha}^{(a)} = 1 - \exp \left( - \text{VaR}_{\alpha}^{(l)} \right)\] While not frequently mentioned in the literature15, it is important to be aware of this subtlety. History of Value-at-Risk Searching for the best means to represent the risk exposure of a financial institution’s trading portfolio in a single number16 is a quest whose inception folklore attributes to Dennis Weatherstone at J.P. Morgan [in the late 1980s], who was looking for a way to convey meaningful risk exposure information to the financial institution’s board without the need for significant technical expertise on the part of the board members16. It is then Till Guldimann, head of global research at J.P. Morgan at that time, who designed what would come to be known as J.P. Morgan’s daily VaR report17 and who thus can be viewed as the creator of the term Value-at-Risk9. The interested reader is referred to Holton18 for an historical perspective on Value-at-Risk, in which the origins of VaR as a measure of risk are even traced back as far as 1922 to capital requirements the New York Stock Exchange imposed on member firms18. Value-at-Risk estimation When a sample of portfolio returns over a given time horizon $r_1,…,r_n$ is available - like in ex post analysis -, a textbook way to calculate the Value-at-Risk of that portfolio over the same horizon19 at a confidence level $\alpha \in ]0,1[$ is as the opposite of the $(1 - \alpha)$% quantile of the empirical return distribution $r_1,…,r_n$.
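This conversion can be sketched in a few lines; a minimal illustration, relying only on the relationship $r_a = e^{r_l} - 1$ between arithmetic and logarithmic returns:

```python
import math

def arithmetic_var_from_log_var(log_var: float) -> float:
    """Convert a logarithmic VaR into an arithmetic (simple-return) VaR.

    Follows from r_a = exp(r_l) - 1 applied to the return quantile that
    defines the VaR: VaR_a = 1 - exp(-VaR_l).
    """
    return 1.0 - math.exp(-log_var)

# Example with a hypothetical 5% logarithmic VaR; the arithmetic VaR is
# slightly below 5%.
print(arithmetic_var_from_log_var(0.05))
```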
Problem is, the discrete nature of the extreme returns of interest makes it difficult to accurately compute that quantile, as explained in Danielsson and de Vries20: In the interior, the empirical sampling distribution is very dense, with adjacent observations very close to each other. As a result the sampling distribution is very smooth in the interior and is the mean squared error consistent estimate of the true distribution. The closer one gets to the extremes, the longer the interval between adjacent returns becomes. This can be seen in [Figure 3] where the 7 largest and smallest returns on the stocks in the sample portfolio and SP-500 Index for 10 years are listed. These extreme observations are typically the most important for VaR analysis, however since these values are clearly discrete, the VaR will also be discrete, and hence be either underpredicted or overpredicted. Figure 3. Extreme daily returns for select U.S. stocks and S&amp;amp;P 500, 1987-1996. Source: Danielsson and de Vries. In other words, the quantile corresponding to the estimation of the Value-at-Risk […] rather depends on the realizations of the [portfolio returns] than on their probability distribution1, so that the Value-at-Risk calculated with a quantile of the empirical distribution will be highly unstable, especially when considering a Value-at-Risk with a high confidence level with only few available data1. This is why VaR estimation is a very challenging statistical problem2, sharing many similarities with the problem of estimating the frequency and/or severity of extreme events in other domains, like floods frequency estimation21 in hydrology. In order to compute a statistical estimator of a portfolio Value-at-Risk, three main approaches exist: Non-parametric approaches, that do not make any specific distributional assumptions on the portfolio return distribution and whose VaR estimators do not depend on any auxiliary parameter. 
Semi-parametric approaches, that do not make any specific distributional assumptions on the portfolio return distribution but whose VaR estimators depend on one or several auxiliary parameters. Parametric approaches, that make a specific distributional assumption on the portfolio return distribution and whose VaR estimators depend on one or several auxiliary parameters. Note that non-parametric and semi-parametric approaches might still make distributional assumptions, in particular for convergence proofs - like assuming that returns are independent and identically distributed (i.i.d.) -, but these assumptions are then generic in nature, contrary to parametric approaches which assume a very specific return distribution, like a Gaussian distribution, which is one of the most widely applied parametric probability distributions22 in finance. Non-parametric and semi-parametric Value-at-Risk estimation Chen and Yong Tang23 note that non-parametric and semi-parametric VaR estimators have the advantages of (i) being free of distributional assumptions […] while being able to capture fat-tail and asymmetry distribution of returns automatically; and (ii) imposing much weaker assumptions on the dynamics of the return process and allowing data “speak for themselves”23. Empirical quantile of the portfolio return distribution A well-known estimator of the $(1 - \alpha)$% quantile of any probability distribution is the empirical $(1 - \alpha)$% quantile of that distribution, which relies on order statistics. In the context of VaR estimation, the underlying idea is explained in Dowd24: If we have a sample of $n$ profit and loss (P/L) observations, we can regard each observation as giving an estimate of VaR at an implied probability level. For example, if $n$ = 100, we can take the 5% VaR as the negative of the sixth25 smallest P/L observation, the 1% VaR as the negative of the second-smallest, and so on.
This leads to the empirical portfolio VaR estimator, defined26 as the opposite of the $n (1 - \alpha) + 1$-th lowest portfolio return27 \[\text{VaR}_{\alpha} = -r_{\left( n (1 - \alpha) + 1 \right)}\] , where $r_{(1)} \leq r_{(2)} \leq … \leq r_{(n-1)} \leq r_{(n)}$ are the order statistics of the portfolio returns. Now, for arbitrary values of $n$ and $\alpha$, there is little chance that $n (1 - \alpha) + 1$ is an integer. In that case, two26 possible choices are: Either to define the opposite of the $\lfloor n (1 - \alpha) \rfloor + 1$-th lowest portfolio return27 as the empirical portfolio VaR estimator24 \[\text{VaR}_{\alpha} = - r_{\left( \lfloor n (1 - \alpha) \rfloor + 1 \right)}\] Or to define a linear interpolation28 between the opposite of the $\lfloor (n+1) \left( 1 - \alpha \right) \rfloor $-th and $ \lfloor (n+1) \left( 1 - \alpha \right) \rfloor + 1$-th lowest portfolio returns27 as the empirical portfolio VaR estimator293013 \[\text{VaR}_{\alpha} = - \left( 1 - \gamma \right) r_{\left( \lfloor (n+1) \left( 1 - \alpha \right) \rfloor \right)} - \gamma r_{\left( \lfloor (n+1) \left( 1 - \alpha \right) \rfloor + 1 \right)}\] , with $\gamma$ $= (n+1) \left( 1 - \alpha \right)$ $- \lfloor (n+1) \left( 1 - \alpha \right) \rfloor$. An interesting property of the resulting portfolio VaR estimator is that it is consistent in the presence of weak dependence31 between portfolio returns, c.f. Chen and Yong Tang23. In terms of drawbacks, the two major limitations of the empirical portfolio VaR estimator are that: It only takes into account a small part of the information contained in the [portfolio returns] distribution function1 - that is, at most two returns - which is highly inefficient, especially when the number of portfolio returns is already relatively small.
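The two conventions above can be sketched as follows; this is a minimal illustration of the order-statistic formulas, not a production implementation:

```python
import math

def empirical_var(returns, alpha):
    """Empirical VaR as the opposite of the (floor(n*(1-alpha)) + 1)-th
    order statistic of the returns sorted in increasing order."""
    r = sorted(returns)
    n = len(r)
    k = math.floor(n * (1 - alpha))
    return -r[k]  # 0-based index k <=> (k+1)-th order statistic

def empirical_var_interpolated(returns, alpha):
    """Empirical VaR with linear interpolation between the two order
    statistics surrounding (n+1)*(1-alpha)."""
    r = sorted(returns)
    n = len(r)
    h = (n + 1) * (1 - alpha)
    j = math.floor(h)
    g = h - j  # the gamma weight of the formula above
    j = min(max(j, 1), n - 1)  # clamp when the index falls outside the sample
    return -((1 - g) * r[j - 1] + g * r[j])

# Dowd's example: with n = 100 returns, the 5% VaR is the negative of the
# sixth smallest observation.
sample = [i / 100 for i in range(1, 101)]
print(empirical_var(sample, 0.95), empirical_var_interpolated(sample, 0.95))
```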
It cannot generate any information about the tail of the return distribution beyond the smallest sample observation16, which might lead to severely underestimating the true risk of the portfolio. Kernel-smoothed quantile of the portfolio return distribution A way to make better use of the information available in the empirical […] distribution1 than the empirical quantile estimator discussed in the previous sub-section is the kernel-smoothed quantile estimator introduced in Gourieroux et al.32. Kernel-smoothing is a methodology belonging to statistics and probability theory that can be thought of as a way of generalizing a histogram constructed with the sample data16, as illustrated in Figure 4. Figure 4. Histogram vs. kernel-smoothed density for the same sample of data. Source: Wikipedia. On Figure 4, where a histogram results in a density that is piecewise constant, a kernel[-smoothed] approximation results in a smooth density23. Coming back to the estimator of Gourieroux et al.32, it is defined as the $(1 - \alpha)$% quantile of a kernel-smoothed approximation of the portfolio return distribution, which essentially results in a weighted average of the order statistics around [$r_{\left( \lfloor n (1 - \alpha) \rfloor + 1 \right)}$] rather than […] a single order statistic23 or a linear interpolation between two order statistics. From a practical perspective, that VaR estimator is computed as follows: Select a kernel function $K$, usually23 taken as a symmetric33 probability density function. The theoretically optimal34 choice for such a kernel function is the Epanechnikov kernel, defined as $ K(u) = \frac{3}{4} \left( 1 - u^2 \right) I_{|u| \leq 1}$.
However, the literature suggests that the form of the kernel has little effect on the [accuracy] of the [kernel-smoothed approximation of the return distribution]32, mainly because: The theoretical framework used to establish the optimality of the Epanechnikov kernel relies on large-sample asymptotics, moreover in a debatable way35. The performances36 of other commonly used kernel functions are in any case very close to those of the Epanechnikov kernel37. So, in applications, the most common2332 choice for a kernel function is rather the Gaussian kernel, defined as $ K(u) = \frac{1}{\sqrt{2 \pi }} e^{- \frac{u^2}{2} }$. As a side note, using a kernel function to approximate the portfolio return distribution might look like a parametric approach to VaR estimation in disguise, but Butler and Schachter16 explain why this is not the case: Note that use of a normal or Gaussian kernel estimator does not make the ultimate estimation of the VaR parametric. As the sample size grows, the net sum of all the smoothed points approaches the true [portfolio return distribution], whatever that may be, irrespective of the method of smoothing the data. This is because the influence of each point becomes arbitrarily small as the sample size grows, so the choice of kernel imposes no restrictions on the results. Select a kernel bandwidth parameter $h &amp;gt; 0$ for the kernel function. Gourieroux et al.32 describe that parameter as follows: The bandwidth parameter controls the range of data points that will be used to estimate the distribution. A small bandwidth results in a rough distribution that does not improve appreciably on the original data, while a large bandwidth over-smoothes the density curve and erases the underlying structure. This latter point is illustrated in Figure 5 and Figure 6. Figure 5. Influence of the bandwidth parameter on the kernel-smoothed approximation of a normal mixture distribution (dashed) from n = 1000 observations. Source: Adapted from Wand and Jones.
Figure 6. Dynamic influence of the bandwidth parameter on the kernel-smoothed approximation of a normal distribution. Source: KDEpy. On Figure 5, it is clearly visible that: a) The estimate of the normal mixture distribution is very rough37. This corresponds to too small a bandwidth parameter $h$, which undersmoothes the observations. b) The estimate of the normal mixture distribution smoothes away its bimodality structure. This corresponds to too large a bandwidth parameter $h$, which oversmoothes the observations. c) The estimate of the normal mixture distribution is not overly noisy, yet the essential structure of the underlying density has been recovered37. This corresponds to an adequate bandwidth parameter $h$. On Figure 6, the situation is the same as in Figure 5, except that the bandwidth parameter $h$ is being dynamically increased from 0 to ~20. Figure 5 and Figure 6 empirically demonstrate that the choice of the bandwidth is of crucial importance38, although it is a difficult task, especially when smoothing the tails of underlying distributions with possible data scarcity38. The interested reader is referred to Wand and Jones37, Tsybakov35 and Cheng and Sun39 for the description of several methods to choose the optimal bandwidth for a kernel function. Compute the kernel-smoothed portfolio VaR estimator as \[\text{VaR}_{\alpha} = - \hat{F}^{-1}(1 - \alpha)\] , where $ \hat{F}^{-1}(1 - \alpha) $ is the solution of the quantile equation \[\hat{F}(x) = \int_{-\infty}^x \hat{f}(u) \, du = 1 - \alpha\] with $ \hat{f}(x) = \frac{1}{n h} \sum_{i=1}^n K \left( \frac{x - r_i}{h} \right) $. That part is typically done with a numerical algorithm, like the Gauss–Newton algorithm mentioned in Gourieroux et al.32. Two important positive results on the kernel-smoothed VaR estimator are established in Chen and Tang23: Theoretically, it is consistent in the presence of weak dependence between portfolio returns. 
Empirically, it produces more precise estimates40 than those obtained with the empirical VaR estimator - especially when the number of observations is small - which can translate to a large amount in financial terms23. Similar results - in a non-financial context - are reported in Cheng and Sun39: It turns out that kernel smoothed quantile estimators, with no matter which bandwidth selection method used, are more efficient than the empirical quantile estimator in most situations. And when sample size is relatively small, kernel smoothed estimators are especially more efficient than the empirical quantile estimator. In other words, the extra effort of smoothing pays off in the end23! The major limitation of the kernel-smoothed VaR estimator, though, is that if the selected kernel function does not reflect the tail features of the true portfolio return distribution, some problems may arise when the quantile to be estimated requires an extrapolation […] far beyond the range of observed data41. In the words of Danielsson and de Vries20: Almost all kernels are estimated with the entire data set, with interior observations dominating the kernel estimation. While even the most careful kernel estimation will provide good estimates for the interior, there is no reason to believe that the kernel will describe the tails adequately. Tail bumpiness is a common problem in kernel estimation. So, while the kernel-smoothed VaR estimator is capable of tail extrapolation - contrary to the empirical VaR estimator - that capability should be used with extreme caution42. A proper portfolio VaR estimator when tail extrapolation is needed thus remains elusive at this stage. 
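To make the kernel-smoothing procedure concrete, here is a minimal, hypothetical sketch in Python of the kernel-smoothed VaR estimator with a Gaussian kernel. Two simplifying assumptions are made: the bandwidth defaults to Silverman's rule-of-thumb (a simple stand-in for the more refined selectors discussed above), and the quantile equation is solved by bisection rather than by the Gauss–Newton algorithm mentioned in Gourieroux et al.

```python
import math
import random

def kernel_smoothed_var(returns, alpha=0.95, h=None):
    """VaR at confidence level alpha from a Gaussian-kernel-smoothed
    approximation of the return distribution (illustrative sketch)."""
    n = len(returns)
    if h is None:
        # Silverman's rule-of-thumb bandwidth, used here as a simple
        # stand-in for the more refined selectors discussed in the text
        mu = sum(returns) / n
        sd = math.sqrt(sum((r - mu) ** 2 for r in returns) / (n - 1))
        h = 1.06 * sd * n ** (-0.2)

    def f_cdf(x):
        # smoothed c.d.f.: average of the Gaussian kernel c.d.f.s
        return sum(0.5 * (1.0 + math.erf((x - r) / (h * math.sqrt(2.0))))
                   for r in returns) / n

    # solve the quantile equation F_hat(x) = 1 - alpha by bisection
    lo, hi = min(returns) - 5 * h, max(returns) + 5 * h
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f_cdf(mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return -(lo + hi) / 2.0  # VaR is the negative of the quantile

random.seed(0)
sample = [random.gauss(0.0005, 0.01) for _ in range(500)]
var_95 = kernel_smoothed_var(sample, alpha=0.95)
```

Because the smoothed c.d.f. is monotone by construction, the bisection is guaranteed to converge; the design choice of Silverman's rule is only adequate for roughly unimodal samples.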
Extrapolated empirical quantile of the portfolio return distribution In order to solve the problem of tail extrapolation while retaining the simplicity of the empirical quantile estimator, Hutson30 proposes to extend the linearly interpolated quantile function into a tail extrapolation quantile function35 that allows for non-parametric extrapolation beyond the observed data30. In terms of VaR estimation, Hutson’s work translates into the following extrapolated empirical portfolio VaR estimator: For $0 &amp;lt; \left( 1 - \alpha \right) \leq \frac{1}{n+1}$ \[\text{VaR}_{\alpha} = - r_{(1)} - \left( r_{(2)} - r_{(1)} \right) \log \left( (n+1) \left( 1 - \alpha \right) \right)\] For $\left( 1 - \alpha \right) \in ]\frac{1}{n+1} , \frac{n}{n+1}[$, $ \text{VaR}_{\alpha} $ is defined as the standard empirical portfolio VaR estimator For $\frac{n}{n+1} &amp;lt; \left( 1 - \alpha \right) &amp;lt; 1$ \[\text{VaR}_{\alpha} = - r_{(n)} + \left( r_{(n)} - r_{(n-1)} \right) \log \left( (n+1) \alpha \right)\] Hutson30 establishes the consistency of his quantile estimator for i.i.d. observations and empirically demonstrates, using various theoretical distributions, that it fits well to the ideal sample for all distributions for ideal samples as small as $n = 10$30. Unfortunately for financial applications, Hutson30 also notes that his quantile estimator appears to be [unable] to completely capture the tail behavior of heavy-tailed43 distributions such as Cauchy30. This is confirmed by Banfi et al.41, who find that Hutson’s method provides competitive results when light-tailed distributions are of interest41 but generates large biases in the case of heavy-tailed distributions41. 
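The three-branch estimator above can be sketched in a few lines of Python. One assumption is made for the interior branch: the "standard empirical portfolio VaR estimator" is taken to be the linearly interpolated quantile with plotting positions $\frac{i}{n+1}$, consistent with Hutson's construction.

```python
import math

def hutson_var(returns, alpha=0.99):
    """Extrapolated empirical VaR following Hutson's tail-extrapolated
    quantile function (sketch). The interior branch assumes the linearly
    interpolated empirical quantile with plotting positions i/(n+1)."""
    r = sorted(returns)
    n = len(r)
    p = 1.0 - alpha  # quantile level of the return distribution
    if p <= 1.0 / (n + 1):
        # lower-tail extrapolation, below the smallest observation
        return -r[0] - (r[1] - r[0]) * math.log((n + 1) * p)
    if p >= n / (n + 1):
        # upper-tail extrapolation, above the largest observation
        return -r[-1] + (r[-1] - r[-2]) * math.log((n + 1) * alpha)
    # interior: linear interpolation between adjacent order statistics
    g, j = math.modf((n + 1) * p)
    j = int(j)
    q = r[j - 1] + g * (r[j] - r[j - 1])
    return -q

returns = [-0.05, -0.03, -0.01, 0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06]
```

Note how, for $\alpha$ large enough that $1 - \alpha \leq \frac{1}{n+1}$, the logarithmic term becomes negative and the estimator extrapolates a loss larger than the worst observed return, which is precisely the capability the empirical estimator lacks.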
Extreme value theory-based quantile of the portfolio return distribution Another possibility to improve [the] tail extrapolation41 properties of the empirical quantile estimator is to rely on extreme value theory (EVT), which is a domain of statistics concerned with the study of the asymptotic distribution of extreme events, that is to say events which are rare in frequency and huge with respect to the majority of observations44. Indeed, because VaR only deals with extreme quantiles of the distribution44, EVT sounds like a natural framework for providing more reliable VaR estimates than the usual ones, given that [it] directly concentrates on the tails of the distribution, thus avoiding a major flaw of [other] approaches whose estimates are somehow biased by the credit they give to the central part of the distribution, thus underestimating extremes and outliers, which are exactly what one is interested in when calculating VaR44. Two preliminary remarks before proceeding: The first is notational: the EVT literature45 typically focuses on the upper tail of distributions, so that the portfolio returns $r_1,…,r_n$ need to be replaced by their opposites $r^{'}_1 = -r_1$,…,$r^{'}_n = -r_n$. The second is important for applying EVT results in finance: most existing EVT methods require [i.i.d.] observations, whereas financial time series exhibit obvious serial dependence features such as volatility clustering44. It turns out that this issue has been addressed in works dealing with weak serial dependence [and] the main message from these studies is that usual EVT methods are still valid44. For example, the Hill estimator discussed below was originally derived under the assumptions of i.i.d. observations46, but it has also been proved to be usable with weakly dependent observations47; the same applies48 to the Weissman quantile estimator also discussed below. 
In addition, even though the [EVT] estimators obtained may be less accurate and neglecting this fact could lead to inadequate resolutions in order to cope with the risk of occurrence of extreme events44, they are consistent and unbiased in the presence of higher moment dependence14 and it is even possible to explicitly model extreme dependence using the [notion of] extremal index14. With these remarks in mind, EVT offers two main methods14 to model the upper tail of the portfolio return distribution $r^{'}_1$,…,$r^{'}_n$ and compute an EVT-based portfolio VaR estimator: A fully parametric method based on the generalized Pareto distribution (GPD) A semi-parametric method based on the Hill (or similar) estimator Generalized Pareto distribution-based method This method consists in fitting a generalized Pareto distribution to the upper tail of the portfolio return cumulative distribution function (c.d.f.) and computing its $\alpha$% quantile. This is for example done in the seminal paper of McNeil and Frey49, in which such a distribution is fitted to the returns50 of different financial instruments. The theoretical justification for this method lies in the Pickands–Balkema–de Haan theorem, which states that EVT holds sufficiently far out in the tails such that we can obtain the distribution not only of the maxima but also of other extremely large observations14. In practice, fitting a GPD to the upper tail of the portfolio return distribution is a two-step process: First, a threshold $r^{'}_{(n - k)}$, $k \geq 1$ beyond which returns should be considered as belonging to the upper tail needs to be selected. This threshold corresponds to the location parameter $u \in \mathbb{R}$ of the GPD. Unfortunately, the choice of the number $k$ of largest observations that should be considered extreme is not straightforward41 and is actually a central issue to any application of EVT44. 
Indeed, as detailed in de Haan et al.51: Theoretically, the statistical properties of EVT-based estimators are established for $k$ such that $k \to \infty$ and $k/n \to 0$ as $n \to \infty$. In applications with a finite sample size, it is necessary to investigate how to choose the number of high observations used in estimation. For financial practitioners, two difficulties arise: firstly, there is no straightforward procedure for the selection; secondly, the performance of the EVT estimators is rather sensitive to this choice. More specifically, there is a bias–variance tradeoff: with a low level of $k$, the estimation variance is at a high level which may not be acceptable for the application; by increasing $k$, i.e., using progressively more data, the variance is reduced, but at the cost of an increasing bias. The literature offers some guidance on how to choose an adequate cut-off between the central part and the upper tail of the return distribution but that choice remains notoriously difficult in general, c.f. Benito et al.52 for a review. Fortunately, in the specific context of VaR estimation, there is a large set of thresholds that provide similar GPD quantiles estimators and as a consequence similar market risk measures53 - from about the 80th percentile of observations to the 95th percentile of observations53 - so that the researchers and practitioners should not focus excessively on the threshold choice53. Figure 7 and Figure 8 illustrate this point with a GPD fitted to the lower tail of daily percentage returns of Deutsche mark/British pound (DEM/GBP) exchange rates from 3rd January 1984 to 31st December 199154, using two different thresholds: Figure 7 - A threshold $u \approx -1.2292$, corresponding to ~2% of the observations. Figure 7. Lower tail of daily percentage returns of Deutsche mark/British pound (DEM/GBP) exchange rates, GPD fit with threshold u = -1.2292, 3rd January 1984 to 31st December 1991. 
Figure 8 - A threshold $u \approx -0.2683$, corresponding to ~20% of the observations. Figure 8. Lower tail of daily percentage returns of Deutsche mark/British pound (DEM/GBP) exchange rates, GPD fit with threshold u = -0.2683, 3rd January 1984 to 31st December 1991. From Figure 7 and Figure 8, both thresholds lead to an equally good extreme lower tail GPD fit, which is confirmed numerically by goodness-of-fit measures. Consequently, in order to keep the threshold selection step as simple as possible for VaR estimation, the early suggestion of DuMouchel55 to use the 90th percentile of observations seems a very good starting point. Second, the shape parameter $\xi \in \mathbb{R}$ and the scale parameter $\sigma &amp;gt; 0$ of the GPD need to be estimated. This is usually done through likelihood maximization, but other procedures are described in the literature (method of moments56, maximization of goodness-of-fit estimators57, etc.). Once the parameters of the GPD have been determined, the EVT/GPD-based portfolio VaR estimator49 is defined as the $\alpha$% quantile of that GPD through the formula \[\text{VaR}^{\text{GPD}}_{\alpha} = \begin{cases} r^{'}_{(n - k)} + \frac{\hat{\sigma}}{\hat{\xi}} \left( \left( \frac{k}{n ( 1 - \alpha)} \right)^{\hat{\xi}} -1 \right) &amp;amp;\text{if } \hat{\xi} \ne 0 \\ r^{'}_{(n - k)} - \hat{\sigma} \ln \frac{n ( 1 - \alpha)}{k} &amp;amp;\text{if } \hat{\xi} = 0 \\ \end{cases}\] , where: $\hat{\xi}$ is an estimator of the shape parameter $\xi$ of the GPD. $\hat{\sigma}$ is an estimator of the scale parameter $\sigma$ of the GPD. Figure 9, taken from Danielsson14, illustrates the near-perfect fit that can be obtained when such a method is applied to the upper and lower tails of the daily S&amp;amp;P 500 returns over the period 1970-2009. Figure 9. Upper and lower tails of daily S&amp;amp;P 500 returns fitted with an EVT-estimated distribution vs. a normal distribution, 1970-2009. Source: Danielsson. 
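The two-step GPD procedure can be sketched as follows in Python. Two assumptions are made for simplicity: the threshold defaults to DuMouchel's 90th-percentile suggestion, and the GPD parameters are estimated by the method of moments (cf. the Hosking and Wallis reference) rather than by likelihood maximization, since the former has a closed form.

```python
import math
import random

def gpd_var(returns, alpha=0.99, k=None):
    """EVT/GPD-based VaR (sketch): fit a GPD to the k largest losses by
    the method of moments and invert its quantile. No bias correction."""
    losses = sorted(-r for r in returns)   # work on the upper tail of losses
    n = len(losses)
    if k is None:
        k = n // 10  # DuMouchel's 90th-percentile starting point
    u = losses[n - k - 1]                  # threshold r'_(n-k)
    excesses = [x - u for x in losses[n - k:]]
    # method of moments for the GPD: with m the mean and v the variance
    # of the excesses, m^2/v = 1 - 2*xi and sigma = m*(1 - xi)
    m = sum(excesses) / k
    v = sum((x - m) ** 2 for x in excesses) / (k - 1)
    xi = 0.5 * (1.0 - m * m / v)           # shape estimate
    sigma = 0.5 * m * (1.0 + m * m / v)    # scale estimate
    if abs(xi) < 1e-9:                     # exponential-tail limit
        return u + sigma * math.log(k / (n * (1.0 - alpha)))
    return u + sigma / xi * ((k / (n * (1.0 - alpha))) ** xi - 1.0)

random.seed(0)
sample = [random.gauss(0.0, 0.01) for _ in range(2000)]
var_99 = gpd_var(sample, alpha=0.99)
```

The method-of-moments estimates are only usable when the tail is not too heavy ($\xi$ below $\frac{1}{2}$, so that the variance of the excesses exists), which is one reason likelihood maximization is the usual choice in production code.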
Hill estimator-based method Under the assumption that the portfolio return distribution belongs58 to the generic family of heavy-tailed distributions, this method consists in deriving an asymptotic estimator of its $\alpha$% quantile. This is for example done in Danielsson and de Vries59 and in Drees48. This method is justified by stylized facts60 of asset returns, as explained in Danielsson and de Vries20: […] because we know that financial return data are heavy tailed distributed, one can rely on a limit expansion for the tail behavior that is shared by all heavy tailed distributions. The importance of the central limit law for extremes is similar to the importance of the central limit law, i.e. one does not have to choose a particular parametric distribution. Under that generic assumption, it can be demonstrated61 that the upper tail of the portfolio returns decays as a power function multiplied by a slowly varying function, that is \[1 - F(x) = x^{-\gamma} L(x), x &amp;gt; 0\] , where: $F$ is the c.d.f. of the (opposite of the) portfolio returns. $\gamma = \frac{1}{\xi} &amp;gt; 0$ is the tail index62 of $F$, with $\xi$ the same shape parameter as in the GPD method. $L$ is a slowly varying function in a sense defined in Rocco44. From this asymptotic behaviour, the EVT/Weissman-based portfolio VaR estimator is defined as the $\alpha$% Weissman63 quantile estimator of a heavy-tailed distribution through the formula \[\text{VaR}^{\text{WM}}_{\alpha} = r^{'}_{(n - k)} \left( \frac{k}{n \left( 1 - \alpha \right) } \right)^{\frac{1}{\hat{\gamma}}}\] , where64: $\hat{\gamma}$ is an estimator of the tail index $\gamma$. 
The most frequently employed estimator of the tail index is by far the Hill estimator44 introduced in Hill46, defined conditionally on $k$ - as the reciprocal of the mean log-excess over the threshold, since $\gamma = \frac{1}{\xi}$ - as \[\hat{\gamma} = \left( \frac{1}{k} \sum_{i=1}^k \ln \frac{r^{'}_{(n - i + 1)}}{r^{'}_{(n - k)}} \right)^{-1}\] The interested reader is referred to Fedotenkov65, in which more than one hundred tail index estimators proposed in the literature are reviewed. $k$ is the number of observations $r^{'}_{(n - k + 1)}$, …, $r^{'}_{(n)}$ that should be considered extreme. Here, and contrary to the GPD method, the threshold index $k$ has a huge influence on the Weissman quantile estimator and thus on the EVT/Weissman-based VaR estimator. As an illustration, Figure 10 depicts the daily VaR of the Dow Jones Industrial Average index at a confidence level $\alpha = 99.9$%66 as a function of $k$, when estimated by the EVT/Weissman-based VaR estimator. Figure 10. Impact of the threshold index k between the central part and the upper tail of the Dow Jones Industrial Average daily return distribution, Weissman quantile estimator of VaR 99.9%, 1980-2010. Source: de Haan et al. On Figure 10, three distinct ranges of values for the index $k$ are visible: An initial range of values for $k \in [1, 150]$ that results in a bumpy VaR estimate, although on this specific figure the bumpiness is not that pronounced. This is the “high variance” region of the estimator. A second range of values for $k \in [150, 300]$ that results in a flat-ish VaR estimate. This is the “optimal bias-variance” region of the estimator, in which the exact value of $k$ is not important. Obtaining an estimate of $k$ belonging to that region is the ultimate goal. A third range of values for $k \geq 300$ that results in a diverging VaR estimate. This is the “high bias” region of the estimator. Fortunately, this problem can be solved by using a bias-corrected Weissman quantile estimator. 
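Before turning to bias corrections, the plain (uncorrected) Hill/Weissman pipeline can be sketched in Python. The only assumption beyond the formulas above is that the losses beyond the threshold are strictly positive, as required for the logarithms in the Hill estimator.

```python
import math
import random

def weissman_var(returns, alpha=0.999, k=100):
    """EVT/Weissman-based VaR (sketch): plain Hill estimate of the tail
    index on the k largest losses, with no bias correction."""
    losses = sorted(-r for r in returns)
    n = len(losses)
    u = losses[n - k - 1]  # threshold r'_(n-k), assumed > 0
    # Hill estimate: gamma_hat is the reciprocal of the mean log-excess
    mean_log = sum(math.log(losses[n - i] / u) for i in range(1, k + 1)) / k
    gamma_hat = 1.0 / mean_log
    # Weissman quantile estimator, extrapolating beyond the sample
    return u * (k / (n * (1.0 - alpha))) ** (1.0 / gamma_hat)

# losses drawn from a Pareto distribution with tail index 3, whose
# true 99.9% loss quantile is 0.001 ** (-1/3) = 10
random.seed(0)
sample = [-random.paretovariate(3.0) for _ in range(2000)]
var_999 = weissman_var(sample, alpha=0.999)
```

Re-running this with different values of $k$ reproduces, in miniature, the three regions of Figure 10: small $k$ gives noisy estimates, intermediate $k$ a plateau, and large $k$ a diverging bias.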
For example, Figure 11 depicts the daily VaR of the Dow Jones Industrial Average index at a confidence level $\alpha = 99.9$% as a function of $k$, when estimated by the EVT/unbiased Weissman-based VaR estimator introduced in de Haan et al.51. Figure 11. Impact of the threshold index k between the central part and the upper tail of the Dow Jones Industrial Average daily return distribution, unbiased Weissman quantile estimator of VaR 99.9%, 1980-2010. Source: de Haan et al. Comparing Figure 11 to Figure 10, the improvement in stability of the quantile estimator is striking. One important limitation of this method compared to the GPD-based method is that depending on the exact financial data used (financial instrument, time period, returns frequency…), the stylized heavy-tailed nature of return distributions might be violated, with return distributions being thin-tailed or short-tailed67 instead of heavy-tailed, c.f. Longin and Solnik67 and Drees48. Incidentally, the two lower tail GPD fits depicted in Figure 7 and Figure 8 both correspond to a short-tailed distribution since $\hat{\xi} \approx -0.2304$ on Figure 7 and $\hat{\xi} \approx -0.021$ on Figure 8. Practical performance In terms of practical performance, Danielsson14 highlights that EVT-based VaR estimation delivers good probability–quantile estimates where EVT holds14, but that there are no rules that tell us when [EVT] becomes inaccurate14. Indeed, it depends on the underlying distribution of the data. In some cases, it may be accurate up to 1% or even 5%, while in other cases it is not reliable even up to 0.1%14. In addition, the accuracy of EVT also depends on the selected threshold. This is illustrated in Figure 12, which is identical to Figure 7 except that the GPD fit has been graphically extended to the right of the threshold $u \approx -1.2292$. Figure 12. 
Lower tail of daily percentage returns of Deutsche mark/British pound (DEM/GBP) exchange rates, GPD fit with threshold u = -1.2292 extended, 3rd January 1984 to 31st December 1991. From Figure 7 and Figure 8, both GPD fits seem usable below $u \approx -1.2292$. But while the GPD fit of Figure 8 is valid up to $u \approx -0.2683$, Figure 12 makes it clear that the GPD fit of Figure 7 is completely off above $u \approx -1.2292$. Other estimators It is out of the scope of this blog post to list all non-parametric and semi-parametric portfolio VaR estimators discussed in the literature, but I would like to finish this section by mentioning estimators based on smoothing splines. An example of such an estimator is described in Shaker-Akhtekhane and Poorabbas68, in which it is empirically demonstrated to outperform common historical, parametric, and kernel-based methods68 when applied to the S&amp;amp;P500 index at VaR confidence levels of $\alpha = 95$% and $\alpha = 99$%. A few points of attention for such estimators: Monotonicity constraints need to be imposed on smoothing splines in order to properly approximate the c.d.f. of the portfolio returns, as highlighted in Wood69. These are not mentioned in Shaker-Akhtekhane and Poorabbas68, but are required in practice. Similar to kernel-smoothing, a smoothing parameter needs to be selected. The same kind of problems arise, with solutions that are very close in spirit like generalized cross-validation, c.f. Wood69. Again similar to kernel-smoothing, special care must be taken when extrapolating beyond the range of observed values. In particular, the estimator described in Shaker-Akhtekhane and Poorabbas68 cannot extrapolate beyond the smallest and the highest portfolio returns due to the splines constraints associated with these two points. 
Parametric Value-at-Risk estimation Contrary to the non-parametric and semi-parametric approaches, parametric - also called analytical - approaches make the assumption70 that the whole portfolio return distribution can be described by a parametric distribution $F_{\theta}$ whose parameters $\theta$ need to be estimated from observations. The $(1 - \alpha)$% quantile of the portfolio return distribution is then simply obtained by inverting that parametric distribution, which leads to the definition of the parametric portfolio VaR estimator as \[\text{VaR}_{\alpha} = - F_{\theta}^{-1} (1 - \alpha)\] Parametric approaches thus replace the problem of accurately computing the quantile of an empirical distribution of observations by the problem of choosing a parametric distribution that best fits these observations. A couple of examples: If the portfolio returns are assumed to be distributed according to a Gaussian distribution, the associated portfolio VaR is called Gaussian Value-at-Risk (GVaR) and is computed through the formula71 \[\text{GVaR}_{\alpha} (X) = - \mu - \sigma z_{1 - \alpha}\] , where: The location parameter $\mu$ and the scale parameter $\sigma$ are usually72 estimated by their sample counterparts. $z_{1 - \alpha}$ is the $1 - \alpha$ quantile of the standard normal distribution. This is the assumption made in the RiskMetrics model73. 
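As a concrete illustration of the parametric approach in its simplest form, the GVaR formula above amounts to three lines of Python; the sample mean and standard deviation stand in for $\mu$ and $\sigma$, as in the RiskMetrics setting.

```python
import statistics

def gaussian_var(returns, alpha=0.95):
    """Parametric (Gaussian) VaR: -mu - sigma * z_{1-alpha}, with mu and
    sigma estimated by their sample counterparts."""
    mu = statistics.fmean(returns)
    sigma = statistics.stdev(returns)
    # z_{1-alpha} is negative for alpha > 0.5, so the VaR is positive
    z = statistics.NormalDist().inv_cdf(1.0 - alpha)
    return -mu - sigma * z

returns = [-0.02, -0.01, 0.0, 0.01, 0.02]
var_95 = gaussian_var(returns, alpha=0.95)
```

For this toy sample, $\mu = 0$ and $\sigma \approx 0.0158$, so the 95% GVaR is about $1.645 \times 0.0158 \approx 0.026$; the same skeleton applies to any other parametric distribution by swapping the quantile function.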
If the portfolio returns are assumed to be distributed according to a heavy-tailed distribution, several distributions can be used: A Cornish-Fisher distribution, whose associated portfolio VaR is known as Modified Value-at-Risk A Gaussian mixture distribution … Here, as a side note, even though it is known that financial time series usually exhibit skewed and fat-tailed distributions60, there is no complete agreement on what distribution could fit them best44, so that finding the best distribution to use is as much art as science… Parametric VaR estimation approaches might be very successful, especially with long risk horizons (months, years…) and/or with not too extreme confidence levels (90%, 95%…). Nevertheless, they might also not be able to provide an adequate description of the whole range of data, resulting in a good fit of the body but a non accurate description of the tails41, leading to biases in VaR estimates. Implementation in Portfolio Optimizer Portfolio Optimizer supports different portfolio Value-at-Risk estimation methods, c.f. the documentation: Empirical portfolio VaR estimation Extrapolated empirical portfolio VaR estimation Kernel-smoothed empirical portfolio VaR estimation For that estimation method, the kernel bandwidth parameter $h$ is automatically computed using a proprietary variation of the improved Sheather and Jones rule described in Botev et al.74 EVT-based portfolio VaR estimation (both GPD-based and Hill estimator-based) For these estimation methods: The number of extreme observations $k$ is automatically computed using a proprietary variation of the goodness-of-fit procedure described in El-Aroui and Diebolt75. The estimated GPD parameters are bias-corrected using the formulas described in Giles et al.76. The estimated Hill and Weissman estimators are bias-corrected using the formulas described in de Haan et al.51 and corrected in Chavez-Demoulin and Guillou77. 
Parametric portfolio VaR estimation The supported parametric distributions are: The Gaussian distribution The Gaussian mixture distribution The Cornish-Fisher distribution Conclusion This blog post described some of the most well-known methods for univariate Value-at-Risk estimation. Thanks to these, it is possible to analyze the past behaviour of a financial portfolio, but their real interest lies in univariate Value-at-Risk forecasting, which will be the subject of the next blog post in this series. Stay tuned! Meanwhile, feel free to connect with me on LinkedIn or to follow me on Twitter. – See Gueant, O., Computing the Value at Risk of a Portfolio: Academic literature and Practitioners’ response, EMMA, Working Paper. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 &amp;#8617;6 See Manganelli, Simone and Engle, Robert F., Value at Risk Models in Finance (August 2001). ECB Working Paper No. 75. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 Basel II requires78 banks to calculate market risk capital requirements using VaR at a 99% confidence level over a 10-day horizon. &amp;#8617; Basel III requires79 internal backtesting procedures based on VaR and Stress VaR (VaR applied to a market stress period). &amp;#8617; The SEC Rule 18f-4 requires companies to calculate daily VaR at a 99% confidence level over a 20-day horizon and using at least 3 years of historical data; it also requires companies to backtest their VaR models daily over a 1-day horizon. &amp;#8617; See Stoyan V. Stoyanov, Svetlozar T. Rachev, Frank J. Fabozzi, Sensitivity of portfolio VaR and CVaR to portfolio return characteristics, Working paper. &amp;#8617; C.f. 
Dowd24, the VaR is the negative of the relevant P/L observation because P/L is positive for profitable outcomes and negative for losses, and the VaR is the maximum likely loss (rather than profit) at the specified probability24, so that VaR is a positive percentage; to be noted, though, that VaR can be negative when no loss is incurred within the confidence level, in which case it is meaningless; c.f. Daníelsson14. &amp;#8617; This is the case when the portfolio return cumulative distribution function is strictly increasing and continuous; otherwise, a similar formula is still valid, with $F_X^{-1}$ the generalized inverse distribution function of $X$, but these subtleties - important in mathematical proofs and in numerical implementations - are out of scope of this blog post. &amp;#8617; See Jorion, P. (2007). Value at risk: The new benchmark for managing financial risk. New York, NY: McGraw-Hill. &amp;#8617; &amp;#8617;2 &amp;#8617;3 See Keith Kuester, Stefan Mittnik, Marc S. Paolella, Value-at-Risk Prediction: A Comparison of Alternative Strategies, Journal of Financial Econometrics, Volume 4, Issue 1, Winter 2006, Pages 53–89. &amp;#8617; Also called top-down VaR models13 or portfolio aggregation-based models. &amp;#8617; See Zangari, Peter, 1997, Streamlining the market risk measurement process, RiskMetrics Monitor, 1, 29–35. &amp;#8617; &amp;#8617;2 &amp;#8617;3 See Ballotta, L. ORCID: 0000-0002-2059-6281 and Fusai, G. ORCID: 0000-0001-9215-2586 (2017). A Gentle Introduction to Value at Risk. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 See Jon Danielsson, Financial Risk Forecasting: The Theory and Practice of Forecasting Market Risk, with Implementation in R and Matlab, Wiley 2011. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 &amp;#8617;6 &amp;#8617;7 &amp;#8617;8 &amp;#8617;9 &amp;#8617;10 &amp;#8617;11 Danielsson14 comes to my mind. &amp;#8617; See J. S. Butler &amp;amp; Barry Schachter, 1996. 
Improving Value-At-Risk Estimates By Combining Kernel Estimation With Historical Simulation, Finance 9605001, University Library of Munich, Germany. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 That report was required by 4.15pm and originally became known as the 4.15 report. &amp;#8617; See Glyn A. Holton, (2002), History of Value-at-Risk: 1922-1998, Method and Hist of Econ Thought, University Library of Munich, Germany. &amp;#8617; &amp;#8617;2 The two horizons might be different, but this is out of scope of this blog post. &amp;#8617; See Danielsson, Jon, and Casper G. De Vries. Value-at-Risk and Extreme Returns. Annales d’Économie et de Statistique, no. 60, 2000, pp. 239–70. &amp;#8617; &amp;#8617;2 &amp;#8617;3 See Lall, U., Y. Moon, and K. Bosworth (1993), Kernel flood frequency estimators: Bandwidth selection and kernel choice, Water Resour. Res.,29(4), 1003–1015. &amp;#8617; See RiskMetrics. Technical Document, J.P.Morgan/Reuters, New York, 1996. Fourth Edition. &amp;#8617; See Song Xi Chen, Cheng Yong Tang, Nonparametric Inference of Value-at-Risk for Dependent Financial Returns, Journal of Financial Econometrics, Volume 3, Issue 2, Spring 2005, Pages 227–255. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 &amp;#8617;6 &amp;#8617;7 &amp;#8617;8 &amp;#8617;9 &amp;#8617;10 See Kevin Dowd, Estimating VaR with Order Statistics, The Journal of Derivatives Spring 2001, 8 (3) 23-30 &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 To be noted that Dowd24 proposes to take the sixth observation as [the] 5% VaR because we want 5% of the probability mass to lie to the left of [the] VaR24, but other authors propose to use the fifth observation instead1329. &amp;#8617; Many other choices - at least 9 - are possible though, c.f. Hyndman and Fan80; ultimately, what is needed is an estimator of the $(1 - \alpha)$% quantile of the empirical portfolio return distribution. 
&amp;#8617; &amp;#8617;2 For $ 1 - \alpha \in ]\frac{1}{n+1} , \frac{n}{n+1}[$, that is, no extrapolation is possible. &amp;#8617; &amp;#8617;2 &amp;#8617;3 Such a linearly interpolated quantile estimator has been known since at least Parzen81. &amp;#8617; See Peter Hall and Andrew Rieck. (2001). Improving Coverage Accuracy of Nonparametric Prediction Intervals. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 63(4), 717–725. &amp;#8617; &amp;#8617;2 See Hutson, A.D. A Semi-Parametric Quantile Function Estimator for Use in Bootstrap Estimation Procedures. Statistics and Computing 12, 331–338 (2002). &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 &amp;#8617;6 &amp;#8617;7 To be noted that weak dependence is a kind of misnomer, because this type of dependence actually covers a very broad range of time series models! &amp;#8617; See Gourieroux, C., Scaillet, O. and Laurent, J.P. (2000). Sensitivity analysis of Values at Risk. Journal of Empirical Finance, 7, 225-245. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 &amp;#8617;6 Asymmetric kernel functions also exist, c.f. for example Abadir and Lawford82. &amp;#8617; In a mean squared-error sense. &amp;#8617; See Alexandre B. Tsybakov. 2008. Introduction to Nonparametric Estimation (1st. ed.). Springer Publishing Company, Incorporated. &amp;#8617; &amp;#8617;2 &amp;#8617;3 The performance of a kernel estimator $K$ is defined in terms of the ratio of sample sizes necessary to obtain the same minimum asymptotic mean integrated squared error (for a given [function $f$ that is being kernel-smoothed] when using $K$ as when using [the Epanechnikov kernel]37. &amp;#8617; See Wand, M.P., &amp;amp; Jones, M.C. (1994). Kernel Smoothing (1st ed.). Chapman and Hall/CRC. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 See Keming Yu &amp;amp; Abdallah K. Ally &amp;amp; Shanchao Yang &amp;amp; David J. 
Hand, Kernel quantile based estimation of expected shortfall, Journal of Risk. &amp;#8617; &amp;#8617;2 See Cheng, M.-Y. and S. Sun (2006). Bandwidth selection for kernel quantile estimation. Journal of the Chinese Statistical Association 44 (3), 271–295. &amp;#8617; &amp;#8617;2 Yet, Chen and Tang23 note that the reduction in RMSE is not very large for large samples23, confirming the theoretical results that the reduction is of second order only23. &amp;#8617; See Banfi, F., Cazzaniga, G. &amp;amp; De Michele, C. Nonparametric extrapolation of extreme quantiles: a comparison study. Stoch Environ Res Risk Assess 36, 1579–1596 (2022). &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 &amp;#8617;6 &amp;#8617;7 Potential solutions to this problem might be to use 1) a data-driven mixture of a Gaussian and a Cauchy kernel as proposed in Banfi et al.41 or 2) a preliminary transformation of the data with a Champernowne distribution as proposed in Buch-Kromann et al.83. &amp;#8617; The family of heavy-tailed distributions encompasses all the fat-tailed distributions encountered in finance, like the Student-t distribution, and more. &amp;#8617; See Rocco, Marco, Extreme Value Theory for Finance: A Survey (February 3, 2012). Bank of Italy Occasional Paper No. 99. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 &amp;#8617;6 &amp;#8617;7 &amp;#8617;8 &amp;#8617;9 &amp;#8617;10 See Danielsson, J., de Vries, C., 1997. Tail index and quantile estimation with very high frequency data. Journal of Empirical Finance 4, 241–257. &amp;#8617; See Hill, B.M. (1975) A Simple General Approach to Inference About the Tail of a Distribution. Annals of Statistics, 3, 1163-1174. &amp;#8617; &amp;#8617;2 See B. Karima and B. Youcef, Asymptotic Normality of Hill’s Estimator under Weak Dependence, Statistical Methodologies. IntechOpen, Feb. 26, 2020. &amp;#8617; See Drees, Holger. 
Extreme Quantile Estimation for Dependent Data, with Applications to Finance. Bernoulli, vol. 9, no. 4, 2003, pp. 617–57. &amp;#8617; &amp;#8617;2 &amp;#8617;3 See Alexander J. McNeil, Rudiger Frey, Estimation of tail-related risk measures for heteroscedastic financial time series: an extreme value approach, Journal of Empirical Finance, Volume 7, Issues 3–4, 2000, Pages 271-300. &amp;#8617; &amp;#8617;2 More precisely, to the standardized residuals of an AR(1)-GARCH(1,1) model of these asset returns; the rationale is that excesses over threshold should be more justified with AR(1)-GARCH(1,1) standardized residuals than with raw asset returns. &amp;#8617; See de Haan, L., Mercadier, C. &amp;amp; Zhou, C. Adapting extreme value statistics to financial time series: dealing with bias and serial dependence. Finance Stoch 20, 321–354 (2016). &amp;#8617; &amp;#8617;2 &amp;#8617;3 See Benito S, Lopez-Martín C, Navarro MA. Assessing the importance of the choice threshold in quantifying market risk under the POT approach (EVT). Risk Manag. 2023;25(1). &amp;#8617; See Benito, S., Lopez-Martín, C. &amp;amp; Navarro, M.A. Assessing the importance of the choice threshold in quantifying market risk under the POT approach (EVT). Risk Manag 25, 6 (2023). &amp;#8617; &amp;#8617;2 &amp;#8617;3 See Bollerslev, T., and Ghysels, E. (1996). Periodic Autoregressive Conditional Heteroskedasticity. Journal of Business &amp;amp; Economic Statistics, 14, 139–151. &amp;#8617; See William H. DuMouchel. “Estimating the Stable Index α in Order to Measure Tail Thickness: A Critique.” Ann. Statist. 11 (4) 1019 - 1031, December, 1983. &amp;#8617; See J. R. M. Hosking and J. R. Wallis, Parameter and Quantile Estimation for the Generalized Pareto Distribution, Technometrics, Vol. 29, No. 3 (Aug., 1987), pp. 339-349. 
&amp;#8617; See Alberto Luceno, Fitting the generalized Pareto distribution to data using maximum goodness-of-fit estimators, Computational Statistics &amp;amp; Data Analysis, Volume 51, Issue 2, 2006, Pages 904-917. &amp;#8617; In EVT terms, the true portfolio return distribution is assumed to be in the Fréchet domain of attraction. &amp;#8617; See Danielsson J., de Vries C. G. (1997). Beyond the Sample: Extreme Quantile and Probability Estimation, Mimeo, Tinbergen Institute Rotterdam. &amp;#8617; See R. Cont (2001) Empirical properties of asset returns: stylized facts and statistical issues, Quantitative Finance, 1:2, 223-236. &amp;#8617; &amp;#8617;2 The cumulative distribution function of the portfolio returns is in the maximum domain of attraction of a Fréchet-type extreme value distribution if and only if $1 - F$ has that form, c.f. Rocco44. &amp;#8617; The tail index is also known as the extreme value index. &amp;#8617; See Weissman, I.: Estimation of parameters and large quantiles based on the k largest observations. J. Am. Stat. Assoc. 73, 812–815 (1978). &amp;#8617; The formula for the EVT/Weissman-based portfolio VaR estimator is the one in Nieto and Ruiz84 and is slightly different from the one in Danielsson and de Vries59, c.f. Danielsson14 in which a general threshold $u$ is used instead of $r^’_{(\hat{k})}$ or $r^’_{(\hat{k}+1)}$. &amp;#8617; See Fedotenkov, I. (2020). A Review of More than One Hundred Pareto-Tail Index Estimators. Statistica, 80(3), 245–299. &amp;#8617; As required under Basel III framework85. &amp;#8617; See Longin, F.M., and B. Solnik (1997). Correlation structure of international equity markets during extremely volatile periods. Working Paper 97-039, ESSEC, Cergy-Pontoise, France. &amp;#8617; &amp;#8617;2 See Saeed Shaker-Akhtekhane &amp;amp; Solmaz Poorabbas, 2023. Value-at-Risk Estimation Using an Interpolated Distribution of Financial Returns Series, Journal of Applied Finance &amp;amp; Banking, SCIENPRESS Ltd, vol. 
13(1), pages 1-6. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 See Wood, S. N., Monotonic Smoothing Splines Fitted by Cross Validation, 1994, SIAM Journal on Scientific Computing, 1126-1133, 15, 5. &amp;#8617; &amp;#8617;2 In addition, with multivariate VaR models, parametric methods also needs to us[e] approximations of the pricing formulas of each [non simple] asset in the portfolio1, leading to methods like Delta-Normal or Delta-Gamma-(Theta-)Normal based on Taylor expansions of the assets pricing formulas, c.f. Gueant1. &amp;#8617; See Boudt, Kris and Peterson, Brian G. and Croux, Christophe, Estimation and Decomposition of Downside Risk for Portfolios with Non-Normal Returns (October 31, 2007). Journal of Risk, Vol. 11, No. 2, pp. 79-103, 2008. &amp;#8617; See Martin, R. Douglas and Arora, Rohit, Inefficiency of Modified VaR and ES. &amp;#8617; Gueant1 notes that the RiskMetrics model for the distribution of the evolution of the risk factors is based on the assumption that log-returns of prices (or variations in the case of interest rates) are independent across time and normally distributed, when appropriately scaled by an appropriate measure of volatility1. &amp;#8617; See Z. I. Botev. J. F. Grotowski. D. P. Kroese. “Kernel density estimation via diffusion.” Ann. Statist. 38 (5) 2916 - 2957, October 2010. &amp;#8617; See Mhamed-Ali El-Aroui, Jean Diebolt, On the use of the peaks over thresholds method for estimating out-of-sample quantiles, Computational Statistics &amp;amp; Data Analysis, Volume 39, Issue 4, 2002, Pages 453-475. &amp;#8617; See David E. Giles, Hui Feng &amp;amp; Ryan T. Godwin (2016) Bias-corrected maximum likelihood estimation of the parameters of the generalized Pareto distribution, Communications in Statistics - Theory and Methods, 45:8, 2465-2483. 
&amp;#8617; See Valerie Chavez-Demoulin, Armelle Guillou, Extreme quantile estimation for beta-mixing time series and applications, Insurance: Mathematics and Economics, Volume 83, 2018, Pages 59-74. &amp;#8617; See Basel Committee on Banking Supervision. Revisions to the Basel II market risk framework (Updated as of 31 December 2010). 2011. &amp;#8617; See Basel Committee on Banking Supervision. Fundamental review of the trading book. &amp;#8617; See Hyndman, R. J., &amp;amp; Fan, Y. (1996). Sample Quantiles in Statistical Packages. The American Statistician, 50(4), 361–365. &amp;#8617; See Parzen E (1979), Nonparametric statistical data modeling, J Am Stat Assoc 74(365):105–121. &amp;#8617; See Karim M Abadir, Steve Lawford, Optimal asymmetric kernels, Economics Letters, Volume 83, Issue 1, 2004, Pages 61-68. &amp;#8617; See Buch-Kromann, Tine and Nielsen, Jens Perch and Guillen, Montserrat and Bolancé, Catalina, Kernel Density Estimation for Heavy-Tailed Distributions Using the Champernowne Transformation (January 2005). &amp;#8617; See Maria Rosa Nieto, Esther Ruiz, Frontiers in VaR forecasting and backtesting, International Journal of Forecasting, Volume 32, Issue 2, 2016, Pages 475-501. &amp;#8617; Basel III requires79 that the VaR measures for risks on both trading and banking books must be calculated at a 99.9% confidence level. 
&amp;#8617;</summary></entry><entry><title type="html">Supervised Portfolios: A Supervised Machine Learning Approach to Portfolio Optimization</title><link href="https://portfoliooptimizer.io/blog/supervised-portfolios-a-supervised-machine-learning-approach-to-portfolio-optimization/" rel="alternate" type="text/html" title="Supervised Portfolios: A Supervised Machine Learning Approach to Portfolio Optimization" /><published>2025-06-07T00:00:00-05:00</published><updated>2025-06-07T00:00:00-05:00</updated><id>https://portfoliooptimizer.io/blog/supervised-portfolios-a-supervised-machine-learning-approach-to-portfolio-optimization</id><content type="html" xml:base="https://portfoliooptimizer.io/blog/supervised-portfolios-a-supervised-machine-learning-approach-to-portfolio-optimization/">&lt;p&gt;Standard portfolio allocation algorithms like &lt;a href=&quot;https://en.wikipedia.org/wiki/Modern_portfolio_theory&quot;&gt;Markowitz mean-variance optimization&lt;/a&gt; or &lt;a href=&quot;/blog/the-diversification-ratio-measuring-portfolio-diversification/&quot;&gt;Choueifaty diversification ratio optimization&lt;/a&gt; usually 
take as input asset information (expected returns, estimated covariance matrix…) as well as investor constraints and preferences (maximum asset weights, risk aversion…) to produce as output portfolio weights satisfying a selected mathematical objective like the maximization of 
the portfolio &lt;a href=&quot;https://en.wikipedia.org/wiki/Sharpe_ratio&quot;&gt;Sharpe ratio&lt;/a&gt; or &lt;a href=&quot;/blog/the-diversification-ratio-measuring-portfolio-diversification/&quot;&gt;Diversification ratio&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Chevalier et al.&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; introduces a non-standard portfolio allocation framework - represented in Figure 1 - under which the same input is first used to “learn” in-sample optimized portfolio weights in &lt;a href=&quot;https://en.wikipedia.org/wiki/Supervised_learning&quot;&gt;a supervised training phase&lt;/a&gt; 
and then used to produce out-of-sample optimized portfolio weights in an inference phase.&lt;/p&gt;

&lt;figure&gt;
	&lt;a href=&quot;/assets/images/blog/supervised-portfolios-comparative-diagram-chevalier.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/supervised-portfolios-comparative-diagram-chevalier-small.png&quot; alt=&quot;Standard vs. supervised portfolio allocation framework. Source: Adapted from Chevalier et al.&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Figure 1. Standard vs. supervised portfolio allocation framework. Source: Adapted from Chevalier et al.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;In this blog post, I will provide some details about that framework when used with the &lt;a href=&quot;https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm&quot;&gt;$k$-nearest neighbors&lt;/a&gt; supervised machine learning algorithm, which is an idea originally proposed in Varadi and Teed&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;sup id=&quot;fnref:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;As an example of usage, I will compare the performances of a $k$-nearest neighbors supervised portfolio with those of a “direct” mean-variance portfolio in the context of a monthly tactical asset allocation strategy for a 2-asset class portfolio made of U.S. equities and U.S. Treasury bonds.&lt;/p&gt;

&lt;h2 id=&quot;mathematical-preliminaries&quot;&gt;Mathematical preliminaries&lt;/h2&gt;

&lt;h3 id=&quot;supervised-machine-learning-algorithms&quot;&gt;Supervised machine learning algorithms&lt;/h3&gt;

&lt;p&gt;Let $\left( X_1, Y_1 \right)$, …, $\left( X_n, Y_n \right)$ be $n$ pairs of data points in&lt;sup id=&quot;fnref:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; $\mathbb{R}^m \times \mathbb{R}$, $m \geq 1$&lt;sup id=&quot;fnref:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;, where:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Each data point $X_1, …, X_n$ represents an object - like the pixels of an image - and is called a &lt;a href=&quot;https://en.wikipedia.org/wiki/Feature_vector&quot;&gt;&lt;em&gt;feature vector&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Each data point $Y_1,…,Y_n$ represents a characteristic of its associated object - like what kind of animal is depicted in an image (discrete characteristic) or the angle of the rotation between a rotated image and its original version (continuous characteristic) - and is called a &lt;em&gt;label&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Given a feature vector $x \in \mathbb{R}^m$, the aim of a supervised machine learning algorithm is then to estimate the “most appropriate” label associated to $x$ - $\hat{y} \in  \mathbb{R}$ - thanks to 
the information contained in the training dataset $\left( X_1, Y_1 \right)$, …, $\left( X_n, Y_n \right)$.&lt;/p&gt;

&lt;h3 id=&quot;k-nearest-neighbors-regression-algorithm&quot;&gt;$k$-nearest neighbors regression algorithm&lt;/h3&gt;

&lt;p&gt;Let $d$ be a distance metric&lt;sup id=&quot;fnref:43&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:43&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt; on $\mathbb{R}^m$, like the standard &lt;a href=&quot;https://en.wikipedia.org/wiki/Euclidean_distance&quot;&gt;Euclidean distance&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The $k$-nearest neighbor ($k$-NN) regression algorithm is &lt;em&gt;an early&lt;sup id=&quot;fnref:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:9&quot; class=&quot;footnote&quot;&gt;7&lt;/a&gt;&lt;/sup&gt; [supervised] machine learning algorithm&lt;/em&gt;&lt;sup id=&quot;fnref:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;8&lt;/a&gt;&lt;/sup&gt; that uses the “neighborhood” of a feature vector in order to estimate its label.&lt;/p&gt;

&lt;p&gt;In more detail, let $\left( X_{(i)}(x), Y_{(i)}(x) \right)$, $i=1..n$, denote the training data points $\left( X_1, Y_1 \right)$, …, $\left( X_n, Y_n \right)$ reordered by increasing distance to $x$, so that $d \left(x, X_{(1)}(x) \right)$ $\leq … \leq$ $d \left(x, X_{(n)}(x) \right)$.&lt;/p&gt;

&lt;p&gt;By definition, the $k$-NN estimate for the label associated to $x$ is then&lt;sup id=&quot;fnref:15&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:15&quot; class=&quot;footnote&quot;&gt;9&lt;/a&gt;&lt;/sup&gt; the uniformly or non-uniformly weighted average label of the $k \in \{ 1,…,n \}$ nearest neighbors $Y_{(1)}(x)$,…,$Y_{(k)}(x)$&lt;/p&gt;

\[\hat{y} = \frac{1}{k} \sum_{i=1}^k Y_{(i)}(x)\]

&lt;p&gt;or&lt;/p&gt;

\[\hat{y} = \sum_{i=1}^k w_i Y_{(i)}(x)\]

&lt;p&gt;, where $w_i \geq 0$ is the weight associated to the $i$-th nearest neighbor $Y_{(i)}(x)$ and all the weights $w_i$, $i=1..k$ sum to one, that is, $\sum_{i=1}^k w_i = 1$.&lt;/p&gt;
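As a minimal illustration of the two estimators above, here is a sketch in NumPy with the Euclidean distance; the function name and toy data are mine, purely for illustration.

```python
import numpy as np

def knn_regression(X_train, y_train, x, k, weights=None):
    """Estimate the label of the feature vector x as the (weighted)
    average label of its k nearest neighbors in the training set."""
    # Distances d(x, X_i) from x to every training feature vector
    distances = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k training points closest to x
    nearest = np.argsort(distances)[:k]
    if weights is None:
        # Uniform weighting: plain average of the k neighbor labels
        return np.mean(y_train[nearest])
    # Non-uniform weighting: weights are non-negative and sum to one
    w = np.asarray(weights, dtype=float)
    return np.dot(w, y_train[nearest])

# Toy training set in R^2 with scalar labels
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
y_train = np.array([1.0, 3.0, 10.0])

# Uniform 2-NN estimate at x = (0.4, 0): the two closest training
# points are the first two, so the estimate is (1 + 3) / 2
print(knn_regression(X_train, y_train, np.array([0.4, 0.0]), k=2))  # 2.0
```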

&lt;p&gt;For illustration purposes, the process of selecting the 2 nearest neighbors $X_{(1)}(x)$ and $X_{(2)}(x)$ of a data point $x$ in $\mathbb{R}^2$ is outlined in Figure 2.&lt;/p&gt;

&lt;figure&gt;
	&lt;a href=&quot;/assets/images/blog/supervised-portfolios-knn-regression-example.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/supervised-portfolios-knn-regression-example-small.png&quot; alt=&quot;Example of $k$-NN nearest neighbors selection process in m = 2 dimensions, with n = 3 training data points and k = 2 nearest neighbors.&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Figure 2. Example of $k$-NN nearest neighbors selection process in m = 2 dimensions, with n = 3 training data points and k = 2 nearest neighbors.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;&lt;em&gt;Notes:&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
  &lt;ul&gt;
    &lt;li&gt;There also exists a $k$-NN classification algorithm, which is a variant of the $k$-NN regression algorithm where the label space is not $\mathbb{R}$ but a finite subset of $\mathbb{N}$.&lt;/li&gt;
  &lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h4 id=&quot;theoretical-guarantees&quot;&gt;Theoretical guarantees&lt;/h4&gt;

&lt;p&gt;Since the seminal paper of Cover and Hart&lt;sup id=&quot;fnref:11&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:11&quot; class=&quot;footnote&quot;&gt;10&lt;/a&gt;&lt;/sup&gt; - proving under mild conditions that the $k$-NN classification algorithm &lt;em&gt;achieves an error rate that is at most twice the best error rate achievable&lt;/em&gt;&lt;sup id=&quot;fnref:7:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;8&lt;/a&gt;&lt;/sup&gt; -, 
several convergence results have been established for $k$-NN methods.&lt;/p&gt;

&lt;p&gt;For example, under an asymptotic regime where the number of training data points $n$ and the number of nearest neighbors $k$ both go to infinity, it has been demonstrated&lt;sup id=&quot;fnref:15:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:15&quot; class=&quot;footnote&quot;&gt;9&lt;/a&gt;&lt;/sup&gt; that the $k$-NN regression algorithm is able to learn any functional 
relationship of the form $Y_i = f \left( X_i \right) + \epsilon_i$, $i=1..n$, where $f$ is an unknown function and $\epsilon_i$ represents additive noise.&lt;/p&gt;

&lt;p&gt;As another example, this time under a finite&lt;sup id=&quot;fnref:17&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:17&quot; class=&quot;footnote&quot;&gt;11&lt;/a&gt;&lt;/sup&gt; sample regime, Jiang&lt;sup id=&quot;fnref:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;12&lt;/a&gt;&lt;/sup&gt; derives &lt;em&gt;the first sup-norm finite-sample [“convergence”] result&lt;/em&gt;&lt;sup id=&quot;fnref:6:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;12&lt;/a&gt;&lt;/sup&gt; for the $k$-NN regression algorithm and shows that it achieves a maximum error rate that is equal to the best maximum error rate achievable &lt;em&gt;up to logarithmic factors&lt;/em&gt;&lt;sup id=&quot;fnref:6:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;12&lt;/a&gt;&lt;/sup&gt;, with high probability.&lt;/p&gt;

&lt;p&gt;In addition to these convergence results, $k$-NN methods also exhibit interesting properties w.r.t. the dimensionality of the feature space $\mathbb{R}^m$.&lt;/p&gt;

&lt;p&gt;For example, while &lt;em&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Curse_of_dimensionality&quot;&gt;the curse of dimensionality&lt;/a&gt; forces &lt;a href=&quot;https://en.wikipedia.org/wiki/Nonparametric_statistics&quot;&gt;non-parametric methods&lt;/a&gt; such as $k$-NN to require an exponential-in-dimension sample complexity&lt;/em&gt;&lt;sup id=&quot;fnref:6:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;12&lt;/a&gt;&lt;/sup&gt;, the $k$-NN regression algorithm &lt;em&gt;actually adapts to the local 
&lt;a href=&quot;https://en.wikipedia.org/wiki/Intrinsic_dimension&quot;&gt;intrinsic dimension&lt;/a&gt; without any modifications to the procedure or data&lt;/em&gt;&lt;sup id=&quot;fnref:6:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;12&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;In other words, if the feature vectors belong to $\mathbb{R}^m$ but have a “true” dimensionality equal to $\mathbb{R}^p, p &amp;lt; m$, then the $k$-NN regression algorithm &lt;em&gt;will [behave] as if it were in the lower dimensional space [of dimension $p$] and independent of the ambient dimension [$m$]&lt;/em&gt;&lt;sup id=&quot;fnref:6:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;12&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Further properties of $k$-NN methods can be found in Chen and Shah&lt;sup id=&quot;fnref:7:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;8&lt;/a&gt;&lt;/sup&gt; and in Biau and Devroye&lt;sup id=&quot;fnref:15:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:15&quot; class=&quot;footnote&quot;&gt;9&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h4 id=&quot;practical-performances&quot;&gt;Practical performances&lt;/h4&gt;

&lt;p&gt;Like all supervised machine learning algorithms, the practical performances of the $k$-NN regression algorithm heavily depend on the problem at hand.&lt;/p&gt;

&lt;p&gt;Yet, in general, &lt;em&gt;it often yields competitive results [vs. other more complex algorithms like &lt;a href=&quot;https://en.wikipedia.org/wiki/Neural_network_(machine_learning)&quot;&gt;neural networks&lt;/a&gt;], and in certain domains, when cleverly combined with prior knowledge, it has significantly advanced the state-of-the-art&lt;/em&gt;&lt;sup id=&quot;fnref:18&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:18&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Beyond these competitive performances, Chen and Shah&lt;sup id=&quot;fnref:7:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;8&lt;/a&gt;&lt;/sup&gt; also highlights other important practical aspects of $k$-NN methods that contributed to &lt;em&gt;their empirical success over the years&lt;/em&gt;&lt;sup id=&quot;fnref:7:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;8&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Their flexibility in choosing a problem-specific definition of “near” through a custom distance metric&lt;sup id=&quot;fnref:19&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:19&quot; class=&quot;footnote&quot;&gt;14&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
  &lt;li&gt;Their computational efficiency, &lt;em&gt;which has enabled these methods to scale to massive datasets (“big data”)&lt;/em&gt;&lt;sup id=&quot;fnref:7:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;8&lt;/a&gt;&lt;/sup&gt; thanks to approaches like approximate nearest neighbor search&lt;sup id=&quot;fnref:25&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:25&quot; class=&quot;footnote&quot;&gt;15&lt;/a&gt;&lt;/sup&gt; or random projections&lt;sup id=&quot;fnref:26&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:26&quot; class=&quot;footnote&quot;&gt;16&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
  &lt;li&gt;Their non-parametric nature, in that &lt;em&gt;they make very few assumptions on the underlying model for the data&lt;/em&gt;&lt;sup id=&quot;fnref:7:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;8&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
  &lt;li&gt;Their ease of interpretability, since &lt;em&gt;they provide evidence for their predictions by exhibiting the nearest neighbors found&lt;/em&gt;&lt;sup id=&quot;fnref:7:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;8&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;k-nn-based-supervised-portfolios&quot;&gt;$k$-NN based supervised portfolios&lt;/h2&gt;

&lt;h3 id=&quot;supervised-portfolios&quot;&gt;Supervised portfolios&lt;/h3&gt;

&lt;p&gt;Chevalier et al.&lt;sup id=&quot;fnref:1:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; describes &lt;em&gt;an asset allocation strategy that engineers optimal weights before feeding them to a supervised learning algorithm&lt;/em&gt;&lt;sup id=&quot;fnref:1:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, represented in the lower part of Figure 1.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Given a training dataset of past [financial] observations&lt;/em&gt;&lt;sup id=&quot;fnref:1:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; like past asset returns, past macroeconomic indicators, etc., it proceeds as follows:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;For any relevant date&lt;sup id=&quot;fnref:23&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:23&quot; class=&quot;footnote&quot;&gt;17&lt;/a&gt;&lt;/sup&gt; $t=t_1,…$ in the training dataset
    &lt;ul&gt;
      &lt;li&gt;
        &lt;p&gt;Compute optimal (in-sample) future portfolio weights $w_{t+1}$ over a (also in-sample) desired future horizon&lt;sup id=&quot;fnref:30&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:30&quot; class=&quot;footnote&quot;&gt;18&lt;/a&gt;&lt;/sup&gt;, using a selected portfolio optimization algorithm with financial observations up to the time $t+1$&lt;/p&gt;

        &lt;p&gt;&lt;strong&gt;These optimal future portfolio weights are the labels $Y_t$, $t=t_1,…$, of the training data points.&lt;/strong&gt;&lt;/p&gt;

        &lt;p&gt;Note that &lt;em&gt;by lagging the data, we can use the in-sample future realized returns to compute all the [returns-based] estimates&lt;/em&gt;&lt;sup id=&quot;fnref:1:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; required by the portfolio optimization algorithm like the expected asset returns, the asset covariance matrix, etc. This &lt;em&gt;allows to be forward-looking in the training sample, while at the same time avoiding any look-ahead bias&lt;/em&gt;&lt;sup id=&quot;fnref:1:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

        &lt;p&gt;During this step, &lt;em&gt;constraints can of course be added in order to satisfy targets and policies&lt;/em&gt;&lt;sup id=&quot;fnref:1:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;Compute a chosen set of predictors supposed to be linked to the in-sample future portfolio weights $w_{t+1}$, using financial observations up to the time $t$&lt;/p&gt;

        &lt;p&gt;&lt;strong&gt;These predictors are the feature vectors $X_t$, $t=t_1,…$, of the training data points.&lt;/strong&gt;&lt;/p&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Train and tune a supervised machine learning algorithm using the training data points $\left( X_t, Y_t \right)$, $t=t_1,…$.&lt;/li&gt;
&lt;/ul&gt;
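The training-set construction above can be sketched in Python as follows. Everything here is an illustrative assumption of mine, not the specification of Chevalier et al.: the features are trailing per-asset mean returns and volatilities, and the label engineering uses a simple softmax over in-sample future returns as a stand-in for a real (constrained) portfolio optimizer.

```python
import numpy as np

def training_set(returns, lookback=12, horizon=1):
    """Build supervised training pairs (X_t, Y_t) from a (T, n_assets)
    array of asset returns; label and feature engineering are illustrative."""
    X, Y = [], []
    T = returns.shape[0]
    for t in range(lookback, T - horizon):
        past = returns[t - lookback:t]    # observations up to time t
        future = returns[t:t + horizon]   # in-sample future window
        # Feature vector X_t: trailing mean return and volatility per asset
        X.append(np.concatenate([past.mean(axis=0), past.std(axis=0)]))
        # Label Y_t: in-sample "optimal" future weights w_{t+1}; a softmax
        # over realized future returns stands in for a real optimizer here
        scores = np.exp(future.mean(axis=0) * 100.0)
        Y.append(scores / scores.sum())   # long-only, fully invested
    return np.array(X), np.array(Y)
```

These pairs can then be fed to any supervised machine learning algorithm, $k$-NN regression included.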

&lt;p&gt;Once the training phase is completed, the supervised portfolio allocation algorithm is ready to be used with test data&lt;sup id=&quot;fnref:27&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:27&quot; class=&quot;footnote&quot;&gt;19&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;For any relevant (out-of-sample) test date $t’=t’_1,…$
    &lt;ul&gt;
      &lt;li&gt;
        &lt;p&gt;Compute the set of predictors chosen during the training phase, using financial observations up to the time $t’$&lt;/p&gt;

        &lt;p&gt;&lt;strong&gt;These predictors are the test feature vectors $x_{t’}$, $t’=t’_1,…$.&lt;/strong&gt;&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;Provide that set of predictors as an input test feature vector to the supervised machine learning algorithm to receive in output the estimated optimal portfolio weights $\hat{w}_{t’+1}$ over the (out-of-sample) future horizon&lt;/p&gt;

        &lt;p&gt;&lt;strong&gt;These estimated optimal portfolio weights are the estimated labels $\hat{y}_{t’}$, $t’=t’_1,…$.&lt;/strong&gt;&lt;/p&gt;

        &lt;p&gt;Here, depending on the exact supervised machine learning algorithm, the estimated portfolio weights $\hat{w}_{t’+1}$ might not satisfy the portfolio constraints&lt;sup id=&quot;fnref:29&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:29&quot; class=&quot;footnote&quot;&gt;20&lt;/a&gt;&lt;/sup&gt; imposed in the training phase, in which case a post-processing phase would be required.&lt;/p&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
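The inference phase above, including a simple post-processing of the estimated weights, could be sketched as follows with a uniformly weighted $k$-NN regression; this is a hedged sketch of mine under long-only, fully-invested constraints, not the implementation of Chevalier et al.

```python
import numpy as np

def predict_weights(X_train, Y_train, x_test, k=5):
    """Estimate out-of-sample portfolio weights as the average of the k
    in-sample optimal weight vectors whose feature vectors are closest
    to the test feature vector (illustrative sketch)."""
    distances = np.linalg.norm(X_train - x_test, axis=1)
    nearest = np.argsort(distances)[:k]
    w_hat = Y_train[nearest].mean(axis=0)   # average of neighbor labels
    # Post-processing: re-impose long-only, fully-invested constraints,
    # since in general an estimated label need not satisfy them exactly
    w_hat = np.clip(w_hat, 0.0, None)
    return w_hat / w_hat.sum()
```

With $k$-NN regression and convex-combination labels, the averaged weights already satisfy these particular constraints, but the post-processing step matters for other algorithms or other constraint sets.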

&lt;p&gt;The portfolio allocation framework of Chevalier et al.&lt;sup id=&quot;fnref:1:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; described above &lt;em&gt;allows the algorithm to learn from past time series of in-sample optimal weights and to infer the best weights from variables such as past performance, risk, and proxies of the macro-economic outlook&lt;/em&gt;&lt;sup id=&quot;fnref:1:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;This contrasts with the standard practice of directly forecasting the input of a portfolio optimization algorithm, making that framework rather original.&lt;/p&gt;

&lt;p&gt;In terms of empirical performances, Chevalier et al.&lt;sup id=&quot;fnref:1:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; finds that &lt;em&gt;predicting the optimal weights directly instead of the traditional two step approach leads to more stable portfolios with statistically better risk-adjusted performance measures&lt;/em&gt;&lt;sup id=&quot;fnref:1:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; 
when using mean-variance optimization as the selected portfolio optimization algorithm and &lt;a href=&quot;https://en.wikipedia.org/wiki/Gradient_boosting#Gradient_tree_boosting&quot;&gt;gradient boosting decision trees&lt;/a&gt; as the selected supervised machine learning algorithm&lt;sup id=&quot;fnref:28&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:28&quot; class=&quot;footnote&quot;&gt;21&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Some of these risk-adjusted performance measures are displayed in Figure 3 in the case of 4 asset classes&lt;sup id=&quot;fnref:48&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:48&quot; class=&quot;footnote&quot;&gt;22&lt;/a&gt;&lt;/sup&gt;, for the 3 horizons of predicted returns and the 3 risk aversion levels used in Chevalier et al.&lt;sup id=&quot;fnref:1:11&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;figure&gt;
	&lt;a href=&quot;/assets/images/blog/supervised-portfolios-chevalier-results-summary.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/supervised-portfolios-chevalier-results-summary-small.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Figure 3. Performances of supervised portfolios vs. direct mean-variance optimized portfolios, 4 asset classes. Source: Adapted from Chevalier et al.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;&lt;em&gt;Notes:&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
  &lt;ul&gt;
    &lt;li&gt;Additional information can be found in the follow-up paper Chevalier et al.&lt;sup id=&quot;fnref:31&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:31&quot; class=&quot;footnote&quot;&gt;23&lt;/a&gt;&lt;/sup&gt; and in &lt;a href=&quot;https://www.youtube.com/watch?v=mGPR2s1-N4k&quot;&gt;a video of Thomas Raffinot for QuantMinds International&lt;/a&gt;.&lt;/li&gt;
  &lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h3 id=&quot;k-nn-based-supervised-portfolios-1&quot;&gt;$k$-NN-based supervised portfolios&lt;/h3&gt;

&lt;p&gt;Theoretically, the supervised machine learning model used in the portfolio allocation framework of Chevalier et al.&lt;sup id=&quot;fnref:1:12&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; is trained to learn the following model&lt;sup id=&quot;fnref:1:13&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

\[w_{t+1} = g_t \left(X_t \right) + \epsilon_{t+1}\]

&lt;p&gt;, where:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;$X_t$ is the feature vector made of the chosen set of predictors computed at time $t$&lt;/li&gt;
  &lt;li&gt;$w_{t+1}$ is the vector of optimal portfolio weights over the desired future horizon $t+1$&lt;/li&gt;
  &lt;li&gt;$g_t$ is an unknown function&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because such a model describes a functional relationship compatible with a $k$-NN regression algorithm, it is reasonable to think about using that algorithm as the supervised machine learning algorithm in the above framework.&lt;/p&gt;

&lt;p&gt;Enter $k$-NN-based supervised portfolios, a portfolio allocation framework originally introduced in Varadi and Teed&lt;sup id=&quot;fnref:2:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; as follows:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;This naturally leads us down the path of creating algorithms that can learn from past data and evolve over time to change the method for creating portfolio allocations.&lt;/p&gt;

  &lt;p&gt;The simplest and most intuitive machine-learning algorithm is the K-Nearest Neighbor method ($k$-NN) […, which] is a form of “case-based” reasoning. That is, it learns from examples that are similar to current situation by looking at the past [and says: “what happened historically when I saw patterns that are close to the current pattern?”].&lt;/p&gt;

  &lt;p&gt;It shares a lot in common with how human beings make decisions. When portfolio managers talk about having 20 years of experience, they are really saying that they have a large inventory of past “case studies” in memory to make superior decisions about the current environment.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As a side note, Varadi and Teed&lt;sup id=&quot;fnref:2:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; is not the first paper to apply a $k$-NN regression algorithm to the problem of portfolio allocation, c.f. for example Gyorfi et al.&lt;sup id=&quot;fnref:20&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:20&quot; class=&quot;footnote&quot;&gt;24&lt;/a&gt;&lt;/sup&gt; in the setting of &lt;a href=&quot;https://en.wikipedia.org/wiki/Online_portfolio_selection&quot;&gt;online portfolio selection&lt;/a&gt;, 
but Varadi and Teed&lt;sup id=&quot;fnref:2:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; is - to my knowledge - the first paper about the same “kind” of supervised portfolios as in Chevalier et al.&lt;sup id=&quot;fnref:1:14&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;A couple of practical advantages of $k$-NN-based supervised portfolios v.s. for example “gradient boosting decision trees”-based supervised portfolios as used in Chevalier et al.&lt;sup id=&quot;fnref:1:15&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; are:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;The simplicity of the training&lt;/p&gt;

    &lt;p&gt;Since nearest neighbor methods are &lt;a href=&quot;https://en.wikipedia.org/wiki/Lazy_learning&quot;&gt;lazy learners&lt;/a&gt;, there is strictly speaking no real training phase.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The simplicity of the tuning&lt;/p&gt;

    &lt;p&gt;There can be no tuning at all if no “advanced” technique (automated features selection, distance learning…) is used.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The guarantee that (convex) portfolio constraints learned during the training phase are satisfied during the test phase&lt;/p&gt;

    &lt;p&gt;In $k$-NN regression&lt;sup id=&quot;fnref:33&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:33&quot; class=&quot;footnote&quot;&gt;25&lt;/a&gt;&lt;/sup&gt;, the estimate for the label associated to a test point is a &lt;a href=&quot;https://en.wikipedia.org/wiki/Convex_combination&quot;&gt;convex combination&lt;/a&gt; of the labels of that point’s nearest neighbors.&lt;/p&gt;

    &lt;p&gt;As a consequence, the estimated portfolio weights $\hat{w}_{t’+1}$ are guaranteed&lt;sup id=&quot;fnref:33:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:33&quot; class=&quot;footnote&quot;&gt;25&lt;/a&gt;&lt;/sup&gt; to satisfy any learned convex portfolio constraints, thereby avoiding any post-processing that could degrade the “quality” of the estimated weights.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The ease of interpretability&lt;/p&gt;

    &lt;p&gt;Due to &lt;a href=&quot;https://en.wikipedia.org/wiki/Algorithm_aversion&quot;&gt;algorithm aversion&lt;/a&gt;, Chevalier et al.&lt;sup id=&quot;fnref:31:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:31&quot; class=&quot;footnote&quot;&gt;23&lt;/a&gt;&lt;/sup&gt; highlights the need to be able to &lt;em&gt;transform a black box nonlinear predictive algorithm [like gradient boosting decision trees] into a simple combination of rules&lt;/em&gt;&lt;sup id=&quot;fnref:31:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:31&quot; class=&quot;footnote&quot;&gt;23&lt;/a&gt;&lt;/sup&gt; in order to make it interpretable for humans.&lt;/p&gt;

    &lt;p&gt;With a $k$-NN regression algorithm, which is one of the most transparent supervised machine learning algorithms in existence, that step is probably not useful&lt;sup id=&quot;fnref:32&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:32&quot; class=&quot;footnote&quot;&gt;26&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
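&lt;p&gt;The constraint-preservation point can be checked numerically: since the $k$-NN estimate is a convex combination of training weight vectors, long-only and fully-invested training weights produce a long-only and fully-invested estimate. Below is a minimal sketch with made-up neighbor weights and uniform combination coefficients:&lt;/p&gt;

```python
# Hypothetical learned weight vectors (long-only, fully invested) of the
# k = 3 nearest neighbors of some test feature vector
neighbor_weights = [[0.6, 0.4, 0.0], [0.2, 0.5, 0.3], [0.1, 0.1, 0.8]]

# Convex combination coefficients (non-negative, summing to one);
# uniform coefficients correspond to a vanilla k-NN regression
coeffs = [1 / 3, 1 / 3, 1 / 3]

# The estimated weights inherit the convex constraints satisfied by
# the training weights: non-negativity and full investment
estimate = [sum(c * w[j] for c, w in zip(coeffs, neighbor_weights))
            for j in range(3)]
print(estimate, sum(estimate))
```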

&lt;p&gt;In terms of empirical performances, Varadi and Teed&lt;sup id=&quot;fnref:2:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; concludes that $k$-NN-based supervised portfolios &lt;em&gt;consistently outperformed [vanilla maximum Sharpe ratio portfolios] on both heterogeneous and homogenous data sets on a risk-adjusted basis&lt;/em&gt;&lt;sup id=&quot;fnref:2:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, 
with the $k$-NN-based &lt;em&gt;approach [exhibiting] a Sharpe ratio [… up to] over 30% higher than [the direct maximum Sharpe ratio approach]&lt;/em&gt;&lt;sup id=&quot;fnref:2:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Average performance measures for the $k$-NN-based supervised portfolios in Varadi and Teed&lt;sup id=&quot;fnref:2:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; are reported in Figure 4.&lt;/p&gt;

&lt;figure&gt;
	&lt;a href=&quot;/assets/images/blog/supervised-portfolios-varadi-results-summary.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/supervised-portfolios-varadi-results-summary-small.png&quot; alt=&quot;Performances of supervised portfolios v.s. direct mean-variance optimized portfolios. Source: Adapted from Varadi and Teed.&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Figure 4. Performances of $k$-NN-based supervised portfolios v.s. direct mean-variance optimized portfolios. Source: Adapted from Varadi and Teed.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h2 id=&quot;implementing-k-nn-based-supervised-portfolios&quot;&gt;Implementing $k$-NN-based supervised portfolios&lt;/h2&gt;

&lt;h3 id=&quot;features-selection&quot;&gt;Features selection&lt;/h3&gt;

&lt;p&gt;Biau and Devroye&lt;sup id=&quot;fnref:15:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:15&quot; class=&quot;footnote&quot;&gt;9&lt;/a&gt;&lt;/sup&gt; describes &lt;a href=&quot;https://en.wikipedia.org/wiki/Feature_selection&quot;&gt;features selection&lt;/a&gt; as:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;[…] the process of choosing relevant components of the [feature] vector $X$ for use in model construction.&lt;/p&gt;

  &lt;p&gt;There are many potential benefits of such an operation: facilitating data visualization and data understanding, reducing the measurement and storage requirements, decreasing training and utilization times, and defying the curse of dimensionality to improve prediction performance.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;, and provides some &lt;em&gt;rules of thumb that should be followed&lt;/em&gt;&lt;sup id=&quot;fnref:15:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:15&quot; class=&quot;footnote&quot;&gt;9&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Noisy measurements, that is, components that are independent of $Y$, should be avoided&lt;/em&gt;&lt;sup id=&quot;fnref:15:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:15&quot; class=&quot;footnote&quot;&gt;9&lt;/a&gt;&lt;/sup&gt;, especially because nearest neighbor methods &lt;em&gt;are extremely sensitive to the features used&lt;/em&gt;&lt;sup id=&quot;fnref:38&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:38&quot; class=&quot;footnote&quot;&gt;27&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Adding a component that is a function of other components is useless&lt;/em&gt;&lt;sup id=&quot;fnref:15:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:15&quot; class=&quot;footnote&quot;&gt;9&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Beyond these generic rules, and although it &lt;em&gt;has been an active research area in the statistics, machine learning, and data mining communities&lt;/em&gt;&lt;sup id=&quot;fnref:1:16&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, features selection is unfortunately strongly problem-dependent.&lt;/p&gt;

&lt;p&gt;In the context of supervised portfolios, Chevalier et al.&lt;sup id=&quot;fnref:1:17&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; and Varadi and Teed&lt;sup id=&quot;fnref:2:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; both propose to use:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Past asset returns over different horizons&lt;sup id=&quot;fnref:14&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:14&quot; class=&quot;footnote&quot;&gt;28&lt;/a&gt;&lt;/sup&gt; &lt;em&gt;so as to assess momentum and reversals&lt;/em&gt;&lt;sup id=&quot;fnref:1:18&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
  &lt;li&gt;Past asset volatilities over different horizons&lt;sup id=&quot;fnref:14:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:14&quot; class=&quot;footnote&quot;&gt;28&lt;/a&gt;&lt;/sup&gt;, to &lt;em&gt;approximate asset-specific risk&lt;/em&gt;&lt;sup id=&quot;fnref:1:19&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Varadi and Teed&lt;sup id=&quot;fnref:2:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; additionally proposes to include past asset correlations over different horizons&lt;sup id=&quot;fnref:14:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:14&quot; class=&quot;footnote&quot;&gt;28&lt;/a&gt;&lt;/sup&gt; &lt;em&gt;to ensure that [the] $k$-NN algorithm [doesn’t] have access to any information that the [direct mean-variance optimization] [doesn’t] have, but merely use it differently&lt;/em&gt;&lt;sup id=&quot;fnref:2:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Chevalier et al.&lt;sup id=&quot;fnref:1:20&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, building on the stock asset pricing literature, does not suggest including returns-based indicators other than past asset returns and volatilities, but instead suggests including various macroeconomic indicators (yield curve, VIX…).&lt;/p&gt;

&lt;h3 id=&quot;features-scaling&quot;&gt;Features scaling&lt;/h3&gt;

&lt;p&gt;Typical distance metrics&lt;sup id=&quot;fnref:39&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:39&quot; class=&quot;footnote&quot;&gt;29&lt;/a&gt;&lt;/sup&gt; used with nearest neighbor methods like the Euclidean distance are said to be &lt;em&gt;scale variant&lt;/em&gt;, meaning that the definition of a nearest neighbor is influenced by the relative and absolute scale of the different features.&lt;/p&gt;

&lt;p&gt;For example, when using the Euclidean distance with features such as a person’s height and a person’s age:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The height feature disproportionately influences the definition of a neighbor if the height feature is measured in millimeters and age in years&lt;/li&gt;
  &lt;li&gt;The age feature disproportionately influences the definition of a neighbor if the height feature is measured in meters and age in days&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For this reason, features are usually scaled to a similar range before being provided as input to a $k$-NN algorithm&lt;sup id=&quot;fnref:41&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:41&quot; class=&quot;footnote&quot;&gt;30&lt;/a&gt;&lt;/sup&gt;, a pre-processing step called &lt;a href=&quot;https://en.wikipedia.org/wiki/Feature_scaling&quot;&gt;features scaling&lt;/a&gt;.&lt;/p&gt;
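&lt;p&gt;The height/age example can be made concrete with a short computation (the individuals are made up): with heights in millimeters, the height feature dominates the Euclidean distance, while with heights in meters, the age feature dominates.&lt;/p&gt;

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# (height in millimeters, age in years): the height feature dominates,
# so Bob (40 years older) looks closer to Alice than Carol (1 year older)
alice, bob, carol = (1700, 30), (1710, 70), (1900, 31)
print(euclidean(alice, bob), euclidean(alice, carol))

# (height in meters, age in years): the age feature now dominates,
# and Carol becomes Alice's nearest neighbor instead
alice_m, bob_m, carol_m = (1.70, 30), (1.71, 70), (1.90, 31)
print(euclidean(alice_m, bob_m), euclidean(alice_m, carol_m))
```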

&lt;p&gt;A couple of techniques for features scaling are described in Arora et al.&lt;sup id=&quot;fnref:40&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:40&quot; class=&quot;footnote&quot;&gt;31&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Min-max scaling, which scales all the values of a feature $\left( X_i \right)_j$, $j \in \{ 1,…,m \}$, $i=1..n$ to a given interval - like $[0,1]$ -, based on the minimum and the maximum values of that feature:&lt;/p&gt;

\[\left( X_i \right)_j' = \frac{\left( X_i \right)_j - \min_{i=1..n} \left( X_i \right)_j }{\max_{i=1..n} \left( X_i \right)_j - \min_{i=1..n} \left( X_i \right)_j }, i=1..n\]
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Standardization, also called z-score normalization, which transforms all the values of a feature $\left( X_i \right)_j$, $j \in \{ 1,…,m \}$, $i=1..n$ into values that approximately follow a standard normal distribution:&lt;/p&gt;

\[\left( X_i \right)_j' = \frac{ \left( X_i \right)_j - \overline{\left( X_i \right)_j}}{ \sigma_{\left( X_i \right)_j} }\]
  &lt;/li&gt;
&lt;/ul&gt;
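&lt;p&gt;Both techniques can be sketched in plain Python, with the minimum, maximum, mean and standard deviation computed over the $n$ observed values of a feature (the feature values below are illustrative):&lt;/p&gt;

```python
import math

def min_max_scale(values, low=0.0, high=1.0):
    # Map the observed values of one feature to the interval [low, high]
    lo, hi = min(values), max(values)
    return [low + (high - low) * (v - lo) / (hi - lo) for v in values]

def standardize(values):
    # Z-score normalization: subtract the sample mean, divide by the
    # (population) sample standard deviation
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

returns = [0.02, -0.01, 0.05, 0.00]  # hypothetical values of one feature
print(min_max_scale(returns))
print(standardize(returns))
```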

&lt;p&gt;In the context of supervised portfolios, additional techniques are described in Chevalier et al.&lt;sup id=&quot;fnref:1:21&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Quantile normal transformation for a “time series”-like feature, which &lt;em&gt;standardizes the time-series into quantile and then map the values to a normal distribution&lt;/em&gt;&lt;sup id=&quot;fnref:1:22&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

    &lt;p&gt;It is important to note that at any given date, the quantiles should be computed using information up to that date only to avoid &lt;em&gt;forward looking leakage&lt;/em&gt;&lt;sup id=&quot;fnref:1:23&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

    &lt;p&gt;In addition, a lookback window over which to compute the quantiles should be chosen, with possible impacts on the performances of the supervised machine learning algorithm.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Cross sectional normalization for a regular feature, which &lt;em&gt;scales the cross sectional values between 0 and 1 using the empirical cumulative distribution function&lt;/em&gt;&lt;sup id=&quot;fnref:1:24&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

    &lt;p&gt;At any given date, this normalization can be performed either fully in the cross-section at that date (if there are enough assets), or in the cross-section at that date but using information up to that date to compute the empirical cumulative distribution function.&lt;/p&gt;

    &lt;p&gt;In the latter case, c.f. the previous point.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Hyperbolic tangent function ($\tanh$) scaling for labels, in order to &lt;em&gt;center [them] and make them more comparable by taming outliers&lt;/em&gt;&lt;sup id=&quot;fnref:1:25&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

\[Y' = 0.5 \tanh{\left( 0.01 \frac{Y - \overline{Y}}{ \sigma_Y } \right) }\]

    &lt;p&gt;Naturally, &lt;em&gt;the reverse transformation is performed after the prediction to transform back the labels into its original values&lt;/em&gt;&lt;sup id=&quot;fnref:1:26&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
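&lt;p&gt;As an illustration of the last technique, below is a sketch of the $\tanh$ label scaling and of its reverse transformation, using the sample mean and standard deviation of the labels; the labels themselves are made up:&lt;/p&gt;

```python
import math

def tanh_scale(y, mean, std):
    # Y' = 0.5 * tanh(0.01 * (Y - mean(Y)) / std(Y))
    return 0.5 * math.tanh(0.01 * (y - mean) / std)

def tanh_unscale(y_scaled, mean, std):
    # Reverse transformation, applied after prediction to map the
    # labels back to their original scale
    return mean + std * math.atanh(y_scaled / 0.5) / 0.01

labels = [0.10, 0.25, 0.40, 0.25]  # hypothetical labels
n = len(labels)
mean = sum(labels) / n
std = math.sqrt(sum((y - mean) ** 2 for y in labels) / n)

scaled = [tanh_scale(y, mean, std) for y in labels]
recovered = [tanh_unscale(s, mean, std) for s in scaled]
print(scaled)
print(recovered)
```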

&lt;p&gt;Finally, in the specific context of $k$-NN-based supervised portfolios, two additional techniques are described in Varadi and Teed&lt;sup id=&quot;fnref:2:11&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, which are variations of the techniques of Chevalier et al.&lt;sup id=&quot;fnref:1:27&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h3 id=&quot;distance-metric-selection&quot;&gt;Distance metric selection&lt;/h3&gt;

&lt;p&gt;As already mentioned in the previous sub-section, the distance metric used with a nearest neighbor method influences the definition of a nearest neighbor due to its scale variant or scale invariant nature.&lt;/p&gt;

&lt;p&gt;But that’s not all, because different distance metrics behave differently with regards to outliers, to noise, to the dimension of the feature space, etc. On top of that, the chosen distance metric is sometimes not a proper metric&lt;sup id=&quot;fnref:43:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:43&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;…&lt;/p&gt;

&lt;p&gt;So, what to do in the specific context of $k$-NN-based supervised portfolios?&lt;/p&gt;

&lt;p&gt;From the empirical results in Varadi and Teed&lt;sup id=&quot;fnref:2:12&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, the Euclidean distance seems to be a good choice as long as the chosen predictors are properly scaled.&lt;/p&gt;

&lt;p&gt;From the empirical results later in this blog post, a little-known distance metric called the &lt;em&gt;Hassanat distance&lt;/em&gt;&lt;sup id=&quot;fnref:44&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:44&quot; class=&quot;footnote&quot;&gt;32&lt;/a&gt;&lt;/sup&gt; also seems to be a good choice and additionally does not require&lt;sup id=&quot;fnref:47&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:47&quot; class=&quot;footnote&quot;&gt;33&lt;/a&gt;&lt;/sup&gt; the chosen predictors to be scaled because it is scale invariant&lt;sup id=&quot;fnref:46&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:46&quot; class=&quot;footnote&quot;&gt;34&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;That distance - noted $HasD(x,y)$ - is defined between two vectors $x = \left(x_1,…,x_m\right)$ and $y = \left(y_1,…,y_m\right)$ as follows:&lt;/p&gt;

\[HasD(x,y) = \sum_{i=1}^m D(x_i,y_i)\]

&lt;p&gt;, with&lt;/p&gt;

\[D(x_i,y_i) = \begin{cases}  1 - \frac{1 + \min(x_i,y_i)}{1 + \max(x_i,y_i)}, &amp;amp;\text{if } \min(x_i,y_i) \geq 0 \\ 1 - \frac{1}{1 + \max(x_i,y_i) - \min(x_i,y_i) }, &amp;amp;\text{if } \min(x_i,y_i) &amp;lt; 0 \end{cases}\]
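&lt;p&gt;A direct Python transcription of these formulas (a sketch, with inputs taken as plain sequences of numbers):&lt;/p&gt;

```python
def hassanat_component(xi, yi):
    # Per-dimension term D(x_i, y_i) of the Hassanat distance
    lo, hi = min(xi, yi), max(xi, yi)
    if lo >= 0:
        return 1 - (1 + lo) / (1 + hi)
    return 1 - 1 / (1 + hi - lo)

def hassanat_distance(x, y):
    # HasD(x, y) is the sum of the per-dimension terms
    return sum(hassanat_component(xi, yi) for xi, yi in zip(x, y))

print(hassanat_distance([0, 0], [10, -10]))
```

&lt;p&gt;Each per-dimension term lies in $[0,1)$, which caps the influence of any single feature whatever its scale.&lt;/p&gt;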

&lt;p&gt;Figure 5 illustrates the 1-dimensional Hassanat distance $HasD(0,n)$ with $n \in [-10,10]$.&lt;/p&gt;

&lt;figure&gt;
	&lt;a href=&quot;/assets/images/blog/supervised-portfolios-hassanat-distance.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/supervised-portfolios-hassanat-distance-small.png&quot; alt=&quot;Representation of the 1-dimensional Hassanat distance between the points 0 and n. Source: Abu Alfeilat et al.&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Figure 5. Representation of the 1-dimensional Hassanat distance between the points 0 and n. Source: Abu Alfeilat et al.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;As a side note, the Hassanat distance has been empirically demonstrated to &lt;em&gt;perform the best when applied on most data sets comparing with the other tested distances&lt;/em&gt;&lt;sup id=&quot;fnref:42&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:42&quot; class=&quot;footnote&quot;&gt;35&lt;/a&gt;&lt;/sup&gt; in Abu Alfeilat et al.&lt;sup id=&quot;fnref:42:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:42&quot; class=&quot;footnote&quot;&gt;35&lt;/a&gt;&lt;/sup&gt;, which compares the performances of 54 distance metrics used in $k$-NN classification.&lt;/p&gt;

&lt;h3 id=&quot;how-to-select-the-number-of-nearest-neighbors&quot;&gt;How to select the number of nearest neighbors?&lt;/h3&gt;

&lt;p&gt;Together with the distance metric $d$, the number of nearest neighbors $k$ is the other hyperparameter that has to be selected in nearest neighbor methods.&lt;/p&gt;

&lt;p&gt;Varadi and Teed&lt;sup id=&quot;fnref:2:13&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; explains:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;The choice of the number of nearest matches (or neighbors) is the $k$ in $k$-NN.&lt;/p&gt;

  &lt;p&gt;This is an important variable that allows [one] to trade off accuracy versus reliability. Choosing a value for $k$ that is too high will lead to matches that are not appropriate to the current case. Choosing a value that is too low will lead to exact matches but poor generalizability and high sensitivity to noise.&lt;/p&gt;

  &lt;p&gt;The optimal value for K that maximizes out-of-sample forecast accuracy will vary depending on the data and the features chosen.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In practice, &lt;em&gt;the number of nearest neighbors $k$ […] [is] usually selected via cross-validation or more simply data splitting&lt;/em&gt;&lt;sup id=&quot;fnref:7:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;8&lt;/a&gt;&lt;/sup&gt; and &lt;em&gt;the selected [value] minimizes an objective function which is often &lt;a href=&quot;https://en.wikipedia.org/wiki/Root_mean_square_deviation&quot;&gt;the Root Mean Square Error (RMSE)&lt;/a&gt; 
or sometimes &lt;a href=&quot;https://en.wikipedia.org/wiki/Mean_absolute_error&quot;&gt;the Mean Absolute Error (MAE)&lt;/a&gt;&lt;/em&gt;&lt;sup id=&quot;fnref:21&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:21&quot; class=&quot;footnote&quot;&gt;36&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
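&lt;p&gt;Such a selection by data splitting can be sketched as follows, with a univariate $k$-NN regression, hypothetical data and the RMSE as objective function:&lt;/p&gt;

```python
import math

def knn_predict(train_x, train_y, x, k):
    # Univariate k-NN regression: average the labels of the k nearest points
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return sum(train_y[i] for i in order[:k]) / k

def select_k(train_x, train_y, valid_x, valid_y, candidates):
    # Select the k minimizing the RMSE on the validation split
    def rmse(k):
        errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2
                  for x, y in zip(valid_x, valid_y)]
        return math.sqrt(sum(errors) / len(errors))
    return min(candidates, key=rmse)

# Hypothetical noiseless relationship y = 2x, split into train/validation
train_x, train_y = [0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 2.0, 4.0, 6.0, 8.0]
valid_x, valid_y = [0.5, 2.5], [1.0, 5.0]

print(select_k(train_x, train_y, valid_x, valid_y, candidates=[1, 2, 3]))
```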

&lt;p&gt;That being said, Guegan and Huck&lt;sup id=&quot;fnref:21:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:21&quot; class=&quot;footnote&quot;&gt;36&lt;/a&gt;&lt;/sup&gt; cautions about that practice by highlighting that:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;The estimation of $k$ via in sample predictions leads to choose high values, near or on the border of one has tabulated because the RMSE is a decreasing function of the number of neighbors&lt;/em&gt;&lt;sup id=&quot;fnref:21:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:21&quot; class=&quot;footnote&quot;&gt;36&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
  &lt;li&gt;A high value for the number of nearest neighbors is &lt;em&gt;an erroneous usage of the [method] because the neighbors are thus not near the pattern they should mimic&lt;/em&gt;&lt;sup id=&quot;fnref:21:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:21&quot; class=&quot;footnote&quot;&gt;36&lt;/a&gt;&lt;/sup&gt;, leading to (useless) forecasts &lt;em&gt;very close to the mean of the sample&lt;/em&gt;&lt;sup id=&quot;fnref:21:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:21&quot; class=&quot;footnote&quot;&gt;36&lt;/a&gt;&lt;/sup&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Another &lt;em&gt;direction is to adaptively choose the number of nearest neighbors $k$ […] depending on the test feature vector&lt;/em&gt;&lt;sup id=&quot;fnref:7:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;8&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;For example, Anava and Levy&lt;sup id=&quot;fnref:49&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:49&quot; class=&quot;footnote&quot;&gt;37&lt;/a&gt;&lt;/sup&gt; proposes &lt;em&gt;solving an optimization problem to adaptively choose what $k$ to use for [a given feature vector] in an approach called $k^*$-NN&lt;/em&gt;&lt;sup id=&quot;fnref:7:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;8&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;In the specific context of $k$-NN-based supervised portfolios, and again to avoid choosing an explicit number of nearest neighbors, Varadi and Teed&lt;sup id=&quot;fnref:2:14&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; suggests to select &lt;em&gt;a range&lt;sup id=&quot;fnref:54&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:54&quot; class=&quot;footnote&quot;&gt;38&lt;/a&gt;&lt;/sup&gt; of $k$’s to make [the] selection base more robust to potential changes in an “optimal” $k$ selection&lt;/em&gt;&lt;sup id=&quot;fnref:2:15&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Surprisingly, it turns out that this method is an ensemble method similar in spirit to the method described in Hassanat et al.&lt;sup id=&quot;fnref:50&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:50&quot; class=&quot;footnote&quot;&gt;39&lt;/a&gt;&lt;/sup&gt; for $k$-NN classification, which consists in using a base $k$-NN classifier with $k=1,2,…,\lfloor \sqrt{n} \rfloor$ and to combine the $\lfloor \sqrt{n} \rfloor$ classification results using inverse logarithmic weights.&lt;/p&gt;
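&lt;p&gt;Transposed to regression, this ensemble idea might be sketched as follows; the inverse logarithmic weights $1/\ln(1+k)$ are my own reading of the description above, and the base predictor and data are made up:&lt;/p&gt;

```python
import math

def knn_predict(train_x, train_y, x, k):
    # Univariate base k-NN regression
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return sum(train_y[i] for i in order[:k]) / k

def ensemble_knn_predict(train_x, train_y, x):
    # Run the base predictor for k = 1, ..., floor(sqrt(n)) and blend the
    # results with (assumed) inverse logarithmic weights 1 / ln(1 + k)
    n = len(train_x)
    ks = range(1, math.isqrt(n) + 1)
    blend_weights = [1 / math.log(1 + k) for k in ks]
    predictions = [knn_predict(train_x, train_y, x, k) for k in ks]
    return (sum(w * p for w, p in zip(blend_weights, predictions))
            / sum(blend_weights))

train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
train_y = [x * x for x in train_x]  # hypothetical labels
print(ensemble_knn_predict(train_x, train_y, 4.2))
```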

&lt;h3 id=&quot;misc-remarks&quot;&gt;Misc. remarks&lt;/h3&gt;

&lt;h4 id=&quot;importance-of-training-dataset-diversity&quot;&gt;Importance of training dataset diversity&lt;/h4&gt;

&lt;p&gt;Asymptotic convergence results for the $k$-NN regression algorithm guarantee that &lt;em&gt;by increasing the amount of [training] data, […] the error probability gets arbitrarily close to the optimum for every training sequence&lt;/em&gt;&lt;sup id=&quot;fnref:15:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:15&quot; class=&quot;footnote&quot;&gt;9&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;But the amount of data available for training $k$-NN-based supervised portfolios is not infinite and might even in some cases be extremely limited&lt;sup id=&quot;fnref:51&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:51&quot; class=&quot;footnote&quot;&gt;40&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;In that case, there is a high risk that the training data is “unevenly balanced” in the feature space, a situation illustrated in Figure 6 in the case of a univariate feature whose underlying distribution is Gaussian.&lt;/p&gt;

&lt;figure&gt;
	&lt;a href=&quot;/assets/images/blog/supervised-portfolios-importance-training-set-size-chen.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/supervised-portfolios-importance-training-set-size-chen-small.png&quot; alt=&quot;Univariate $k$-NN regression with low accuracy, Gaussian feature distribution, far away training data. Source: Chen and Shah.&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Figure 6. Univariate $k$-NN regression with low accuracy, Gaussian feature distribution, far away training data. Source: Chen and Shah.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;From Figure 6, it is clear that such a lack of training data - or more precisely, such a lack of diversity in the training data - would force the $k$-NN regression algorithm to use far away nearest neighbors, which would severely degrade the quality of the forecasted portfolio weights.&lt;/p&gt;

&lt;p&gt;So, particular attention must be paid to the size and the diversity of the training dataset when using $k$-NN-based supervised portfolios, with for example ad-hoc procedures used whenever needed to simulate past asset returns for assets without return histories (&lt;a href=&quot;/blog/the-mathematics-of-bonds-simulating-the-returns-of-constant-maturity-government-bond-etfs/&quot;&gt;here&lt;/a&gt;) 
or to extend return histories for assets with shorter return histories than others (&lt;a href=&quot;/blog/managing-missing-asset-returns-in-portfolio-analysis-and-optimization-backfilling-through-residuals-recycling/&quot;&gt;here&lt;/a&gt;).&lt;/p&gt;

&lt;h4 id=&quot;avoiding-the-curse-of-dimensionality&quot;&gt;Avoiding the curse of dimensionality&lt;/h4&gt;

&lt;p&gt;The number of features selected by Varadi and Teed&lt;sup id=&quot;fnref:2:16&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; grows quadratically with the number of assets.&lt;/p&gt;

&lt;p&gt;At some point&lt;sup id=&quot;fnref:35&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:35&quot; class=&quot;footnote&quot;&gt;41&lt;/a&gt;&lt;/sup&gt;, the underlying $k$-NN regression algorithm will then inevitably face issues due to:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Distance concentration, which is &lt;em&gt;the tendency of distances between all pairs of points in high-dimensional data to become almost equal&lt;/em&gt;&lt;sup id=&quot;fnref:24&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:24&quot; class=&quot;footnote&quot;&gt;42&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
  &lt;li&gt;Poor discrimination of the nearest and farthest points for a given test point, which is an issue on top of the distance concentration problem, c.f. Beyer et al.&lt;sup id=&quot;fnref:34&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:34&quot; class=&quot;footnote&quot;&gt;43&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
  &lt;li&gt;Hubness&lt;sup id=&quot;fnref:24:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:24&quot; class=&quot;footnote&quot;&gt;42&lt;/a&gt;&lt;/sup&gt;, defined as the emergence of points called hubs which appear overly similar to many others&lt;/li&gt;
  &lt;li&gt;…&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In addition, the higher the number of features selected, the more training data is required to learn enough combinations of these different features, which further compounds the problem mentioned in the previous sub-section…&lt;/p&gt;

&lt;p&gt;All in all, that approach is not scalable, but fortunately a solution is also proposed in Varadi and Teed&lt;sup id=&quot;fnref:2:17&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;[…] to explore multi-asset portfolios [without introducing the problem of dimensionality with too-large a feature space], we took the average weight of each security from a single-pair run, and averaged them across all pair runs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;While this proposal may look like an ad-hoc workaround, it actually corresponds to an ensemble method that has been empirically shown to be effective for $k$-NN classification in high dimension in both:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Domeniconi and Yan&lt;sup id=&quot;fnref:36&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:36&quot; class=&quot;footnote&quot;&gt;44&lt;/a&gt;&lt;/sup&gt;, with a deterministic selection of features as in Varadi and Teed&lt;sup id=&quot;fnref:2:18&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
  &lt;li&gt;Bay&lt;sup id=&quot;fnref:37&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:37&quot; class=&quot;footnote&quot;&gt;45&lt;/a&gt;&lt;/sup&gt;, with a random&lt;sup id=&quot;fnref:38:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:38&quot; class=&quot;footnote&quot;&gt;27&lt;/a&gt;&lt;/sup&gt; selection of features&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The underlying idea of that ensemble method is to &lt;em&gt;exploit [the] instability of $k$-NN classifiers with respect to different choices of features to generate an effective and diverse set of NN classifiers with possibly uncorrelated errors&lt;/em&gt;&lt;sup id=&quot;fnref:36:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:36&quot; class=&quot;footnote&quot;&gt;44&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
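To make this ensemble method concrete, below is a minimal Python sketch (the function names and toy data are mine, not taken from any of the cited papers): one $k$-NN prediction is computed per feature subset - for example, one subset per asset pair as in Varadi and Teed - and the predicted weight vectors are then averaged.

```python
import numpy as np

def knn_predict_weights(X_train, Y_train, x_test, k):
    """Predict a weight vector for x_test by averaging the label vectors
    (here, in-sample optimal portfolio weights) of its k nearest training
    points under the Euclidean distance."""
    d = np.linalg.norm(X_train - x_test, axis=1)
    nearest = np.argsort(d)[:k]
    return Y_train[nearest].mean(axis=0)

def ensemble_knn_predict_weights(X_train, Y_train, x_test, k, feature_subsets):
    """Ensemble k-NN: run one k-NN prediction per subset of features and
    average the resulting weight vectors, mirroring the averaging of
    single-pair runs in Varadi and Teed."""
    predictions = [knn_predict_weights(X_train[:, s], Y_train, x_test[s], k)
                   for s in feature_subsets]
    return np.mean(predictions, axis=0)
```

Because each prediction is a convex combination of valid weight vectors, the ensemble output remains a valid fully-invested, long-only weight vector.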

&lt;h2 id=&quot;implementations&quot;&gt;Implementations&lt;/h2&gt;

&lt;h3 id=&quot;implementation-in-portfolio-optimizer&quot;&gt;Implementation in Portfolio Optimizer&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Portfolio Optimizer&lt;/strong&gt; supports $k$-NN-based supervised portfolios through the endpoint &lt;a href=&quot;https://docs.portfoliooptimizer.io/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/portfolios/optimization/supervised/nearest-neighbors-based&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This endpoint supports 2 different distance metrics:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The Euclidean distance metric&lt;/li&gt;
  &lt;li&gt;The Hassanat distance metric (default)&lt;/li&gt;
&lt;/ul&gt;
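Portfolio Optimizer's exact implementation is not public, but the Hassanat distance itself is easy to sketch; the function below follows my reading of its original definition, summing over each coordinate a bounded dissimilarity term, which explains why no prior feature scaling is needed:

```python
import numpy as np

def hassanat_distance(a, b):
    """Hassanat distance: a bounded, scale-insensitive dissimilarity that
    sums, over each coordinate, 1 - (1 + min) / (1 + max), after shifting
    both values by |min| whenever min(a_i, b_i) is negative."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    lo, hi = np.minimum(a, b), np.maximum(a, b)
    shift = np.where(lo < 0, -lo, 0.0)  # shift so the smaller value is >= 0
    return float(np.sum(1.0 - (1.0 + lo + shift) / (1.0 + hi + shift)))
```

Identical vectors are at distance 0, and each coordinate contributes strictly less than 1 to the total, whatever the scale of the inputs.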

&lt;p&gt;As for the selection of the number of nearest neighbors, this endpoint supports:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;A manually-defined number of nearest neighbors&lt;/li&gt;
  &lt;li&gt;A dynamically-determined number of nearest neighbors together with their individual weights through:
    &lt;ul&gt;
      &lt;li&gt;The $k$-NN ensemble method of Hassanat et al.&lt;sup id=&quot;fnref:50:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:50&quot; class=&quot;footnote&quot;&gt;39&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
      &lt;li&gt;A proprietary variation of the $k^*$-NN method of Anava and Levy&lt;sup id=&quot;fnref:49:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:49&quot; class=&quot;footnote&quot;&gt;37&lt;/a&gt;&lt;/sup&gt; (default)&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
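The proprietary variation is, by definition, undocumented, but the published $k^*$-NN method can be sketched as follows, based on my reading of Anava and Levy's paper (in which the distances are additionally scaled by a Lipschitz constant, omitted here; that scaling influences how many neighbors end up selected):

```python
import numpy as np

def k_star_nn_weights(distances):
    """Greedy weight computation of the k*-NN method (Anava and Levy):
    neighbors are added in order of increasing distance while the running
    threshold lambda exceeds the next distance; selected neighbors then get
    weights proportional to (lambda - distance). Returned weights are
    ordered by increasing distance."""
    beta = np.sort(np.asarray(distances, dtype=float))
    n = len(beta)
    k, lam = 0, beta[0] + 1.0
    while k < n and lam > beta[k]:
        k += 1
        c, v = beta[:k].sum(), (beta[:k] ** 2).sum()
        lam = (c + np.sqrt(max(k + c * c - k * v, 0.0))) / k
    w = np.maximum(lam - beta, 0.0)
    return w / w.sum()
```

With one neighbor much closer than the rest, the method collapses to 1-NN; with many comparably close neighbors, it spreads the weights over all of them, which is the adaptive behavior discussed later in this post.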

&lt;h3 id=&quot;implementation-elsewhere&quot;&gt;Implementation elsewhere&lt;/h3&gt;

&lt;p&gt;Chevalier et al.&lt;sup id=&quot;fnref:1:28&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; kindly provide &lt;a href=&quot;https://colab.research.google.com/drive/1yrhuV5i_o2g0Ju-xJEyONR5s2PFqyk3k?usp=shari#scrollTo=nsZYmc5nLF75&quot;&gt;Python code&lt;/a&gt; to experiment with “gradient boosting decision trees”-based supervised portfolios.&lt;/p&gt;

&lt;h2 id=&quot;example-of-usage---learning-maximum-sharpe-ratio-portfolios&quot;&gt;Example of usage - Learning maximum Sharpe ratio portfolios&lt;/h2&gt;

&lt;p&gt;Because &lt;em&gt;most portfolio allocation decisions for active portfolio managers revolve around the optimal allocation between stocks and bonds&lt;/em&gt;&lt;sup id=&quot;fnref:2:19&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, I propose to reproduce the results of Varadi and Teed&lt;sup id=&quot;fnref:2:20&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; in the case of a 2-asset class portfolio made of:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;U.S. equities, represented by the SPY ETF&lt;/li&gt;
  &lt;li&gt;U.S. long-term Treasury bonds, represented by the TLT ETF&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;methodology&quot;&gt;Methodology&lt;/h3&gt;

&lt;p&gt;Varadi and Teed&lt;sup id=&quot;fnref:2:21&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; follow the general procedure of Chevalier et al.&lt;sup id=&quot;fnref:1:29&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; to train a $k$-NN-based supervised portfolio allocation algorithm for learning portfolio weights maximizing the Sharpe ratio.&lt;/p&gt;

&lt;p&gt;For this, and without entering into the details:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The selected features are asset returns, standard deviations and correlations over different&lt;sup id=&quot;fnref:14:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:14&quot; class=&quot;footnote&quot;&gt;28&lt;/a&gt;&lt;/sup&gt; past lookback periods, scaled through a specific normal distribution standardization&lt;/li&gt;
  &lt;li&gt;The selected distance metric is the standard Euclidean distance&lt;/li&gt;
  &lt;li&gt;The selected number of nearest neighbors is not a single value but a range of values related to the size of the training dataset&lt;sup id=&quot;fnref:54:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:54&quot; class=&quot;footnote&quot;&gt;38&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
  &lt;li&gt;The relevant initial training dates are the 2000 daily dates present in Varadi and Teed&lt;sup id=&quot;fnref:2:22&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;’s dataset from 4/13/1976 minus 2000 days to 4/12/1976&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The relevant subsequent training dates and test dates are all the daily dates present in Varadi and Teed&lt;sup id=&quot;fnref:2:23&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;’s dataset from 4/13/1976 to 12/31/2013&lt;/p&gt;

    &lt;p&gt;Note that the training data is used in a rolling window manner over a 2000-day lookback.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;The future horizon over which maximum Sharpe ratio portfolio weights are learned during the training phase and evaluated during the test phase is a 20-day horizon&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On my side:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The selected features will be:
    &lt;ul&gt;
      &lt;li&gt;Past 12-month asset arithmetic returns, cross-sectionally normalized using the procedure described in Almgren and Neil&lt;sup id=&quot;fnref:52&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:52&quot; class=&quot;footnote&quot;&gt;46&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
      &lt;li&gt;Future aggregated asset covariances forecasted over the next month using &lt;a href=&quot;/blog/from-volatility-forecasting-to-covariance-matrix-forecasting-the-return-of-simple-and-exponentially-weighted-moving-average-models/&quot;&gt;an exponentially weighted moving average covariance matrix forecasting model&lt;/a&gt; with daily squared (close-to-close) returns&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The selected distance metric will be the Hassanat distance&lt;/p&gt;

    &lt;p&gt;This avoids the need for further feature scaling.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;The selected number of nearest neighbors will be:
    &lt;ul&gt;
      &lt;li&gt;1&lt;/li&gt;
      &lt;li&gt;10&lt;/li&gt;
      &lt;li&gt;Dynamically determined with their individual weights by:
        &lt;ul&gt;
          &lt;li&gt;The $k$-NN ensemble method of Hassanat et al.&lt;sup id=&quot;fnref:50:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:50&quot; class=&quot;footnote&quot;&gt;39&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
          &lt;li&gt;The $k^*$-NN method of Anava and Levy&lt;sup id=&quot;fnref:49:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:49&quot; class=&quot;footnote&quot;&gt;37&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The relevant initial training dates will be all month-end dates present in a SPY/TLT ETFs-like training dataset from 1st January 1979 to 30th November 2003&lt;/p&gt;

    &lt;p&gt;Due to the relatively recent inception dates of both the SPY ETF (22nd January 1993) and the TLT ETF (22nd July 2002), it is required to use proxies to extend the returns history of these assets:&lt;/p&gt;
    &lt;ul&gt;
      &lt;li&gt;The daily U.S. market returns $Mkt$ provided in &lt;a href=&quot;https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html&quot;&gt;the Fama and French data library&lt;/a&gt;, as a proxy for the SPY ETF daily returns&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;/blog/the-mathematics-of-bonds-simulating-the-returns-of-constant-maturity-government-bond-etfs/&quot;&gt;The simulated daily returns&lt;/a&gt; associated to the daily FRED &lt;a href=&quot;https://fred.stlouisfed.org/series/DGS30&quot;&gt;30-Year Treasury Constant Maturity Rates&lt;/a&gt;, as a proxy for the TLT ETF daily returns&lt;/li&gt;
    &lt;/ul&gt;

    &lt;p&gt;With these, the earliest date for which daily SPY/TLT ETFs-like returns are available is 16th February 1977; adding 1 year of data for computing the past 12-month returns gives 16th February 1978; rounded to 1st January 1979.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The relevant subsequent training dates and test dates will be all month-end dates present in the SPY/TLT ETFs test dataset from 1st January 2004 to 28th February 2025&lt;sup id=&quot;fnref:53&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:53&quot; class=&quot;footnote&quot;&gt;47&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

    &lt;p&gt;The earliest date for which daily SPY/TLT ETFs returns are available is 29th July 2002; adding 1 year of data for computing the past 12-month returns gives 29th July 2003; rounded to 1st January 2004.&lt;/p&gt;

    &lt;p&gt;Note that the training data is used in an expanding window manner.&lt;/p&gt;

    &lt;p&gt;As a consequence, the training dataset is made of 299 data points on 1st January 2004, expanding up to 552 data points on 28th February 2025 when the last forecast is made.&lt;/p&gt;

    &lt;p&gt;This is in stark contrast with Varadi and Teed&lt;sup id=&quot;fnref:2:24&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;’s training dataset which 1) contains 2000 data points and 2) is not expanding but is being rolled forward to &lt;em&gt;keep the algorithm more robust to market changes in feature relevance&lt;/em&gt;&lt;sup id=&quot;fnref:2:25&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

    &lt;p&gt;As mentioned in a previous section, such a difference in quantity and in “local” diversity of the training dataset might impact my results v.s. those of Varadi and Teed&lt;sup id=&quot;fnref:2:26&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The future horizon over which maximum Sharpe ratio portfolio weights are learned during the training phase and evaluated during the test phase is a 1-month horizon at the daily level&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The risk free rate is set to 0% when computing maximum Sharpe ratio portfolio weights&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;The cash portion of the different SPY/TLT portfolios - if any - is allocated to U.S. short-term Treasury bonds, represented by the SHY ETF&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;results&quot;&gt;Results&lt;/h3&gt;

&lt;p&gt;Figure 7 compares the standard direct approach for maximizing the Sharpe ratio to the $k$-NN-based supervised portfolios approach, with the 4 choices of nearest neighbors proposed in the previous sub-section.&lt;/p&gt;

&lt;p&gt;In both cases, like in Varadi and Teed&lt;sup id=&quot;fnref:2:27&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, the same features are used as input to the two algorithms.&lt;/p&gt;

&lt;figure&gt;
	&lt;a href=&quot;/assets/images/blog/supervised-portfolios-reproduction.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/supervised-portfolios-reproduction-small.png&quot; alt=&quot;MSR portfolio v.s. $k$-NN-based learned MSR portfolios, SPY/TLT ETFs, 1st January 2004 - 31st March 2025.&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Figure 7. MSR portfolio v.s. $k$-NN-based learned MSR portfolios, SPY/TLT ETFs, 1st January 2004 - 31st March 2025.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Summary statistics:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Portfolio&lt;/th&gt;
      &lt;th&gt;CAGR&lt;/th&gt;
      &lt;th&gt;Average Exposure&lt;/th&gt;
      &lt;th&gt;Annualized Sharpe Ratio&lt;/th&gt;
      &lt;th&gt;Maximum (Monthly) Drawdown&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Maximum Sharpe ratio (MSR)&lt;/td&gt;
      &lt;td&gt;~5.6%&lt;/td&gt;
      &lt;td&gt;~61%&lt;/td&gt;
      &lt;td&gt;~0.73&lt;/td&gt;
      &lt;td&gt;~14.4%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;$k$-NN learned MSR, $k=1$&lt;/td&gt;
      &lt;td&gt;~6.4%&lt;/td&gt;
      &lt;td&gt;~49%&lt;/td&gt;
      &lt;td&gt;~0.86&lt;/td&gt;
      &lt;td&gt;~14.4%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;$k$-NN learned MSR, $k=10$&lt;/td&gt;
      &lt;td&gt;~5.2%&lt;/td&gt;
      &lt;td&gt;~51%&lt;/td&gt;
      &lt;td&gt;~0.99&lt;/td&gt;
      &lt;td&gt;~15.8%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;$k$-NN learned MSR, $k=kEnsemble$&lt;/td&gt;
      &lt;td&gt;~5.4%&lt;/td&gt;
      &lt;td&gt;~51%&lt;/td&gt;
      &lt;td&gt;~1.01&lt;/td&gt;
      &lt;td&gt;~15.1%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;$k$-NN learned MSR, $k=k^*$&lt;/td&gt;
      &lt;td&gt;~6.8%&lt;/td&gt;
      &lt;td&gt;~49%&lt;/td&gt;
      &lt;td&gt;~1.00&lt;/td&gt;
      &lt;td&gt;~14.0%&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
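For reference, the summary statistics reported in this table can be computed from a series of monthly portfolio returns along the following lines (a minimal sketch, with the risk-free rate taken as 0% in the Sharpe ratio, consistent with the methodology above):

```python
import numpy as np

def summary_statistics(monthly_returns):
    """CAGR, annualized Sharpe ratio (risk-free rate 0%) and maximum
    (monthly) drawdown, computed from monthly arithmetic returns."""
    r = np.asarray(monthly_returns, dtype=float)
    n_years = len(r) / 12.0
    equity = np.cumprod(1.0 + r)
    cagr = equity[-1] ** (1.0 / n_years) - 1.0
    sharpe = np.sqrt(12.0) * r.mean() / r.std(ddof=1)
    drawdowns = 1.0 - equity / np.maximum.accumulate(equity)
    return {"cagr": cagr, "sharpe": sharpe, "max_drawdown": drawdowns.max()}
```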

&lt;h3 id=&quot;comments&quot;&gt;Comments&lt;/h3&gt;

&lt;p&gt;A couple of comments are in order:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Consistent with Varadi and Teed&lt;sup id=&quot;fnref:2:28&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, &lt;em&gt;the results demonstrate that the [$k$-NN-based supervised portfolio allocation] approach tends to outperform [the direct] MVO portfolio allocation [approach] on a risk-adjusted basis&lt;/em&gt;&lt;sup id=&quot;fnref:2:29&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, with a Sharpe ratio ~18%-38% higher.&lt;/p&gt;

    &lt;p&gt;This is quite interesting to highlight since the objective of the direct approach is supposed to be the maximization of the portfolio Sharpe ratio!&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The average exposure of the MSR portfolio is ~61% v.s. a markedly lower exposure of ~50% for all the $k$-NN learned MSR portfolios&lt;/p&gt;

    &lt;p&gt;Since the Sharpe ratio of all the $k$-NN learned MSR portfolios is higher than that of the MSR portfolio, the changes in exposure must be pretty well “timed”.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The $k$-NN learned MSR portfolios with $k=10$ and $k=kEnsemble$ are nearly identical&lt;/p&gt;

    &lt;p&gt;This is confirmed by examining the underlying asset weights (not shown here).&lt;/p&gt;

    &lt;p&gt;The $k$-NN ensemble portfolio has the advantage of not requiring a specific value of $k$ to be chosen, though, and should definitely be preferred.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The $k$-NN learned MSR portfolios with $k=1$ and $k=k^*$ are close in terms of raw performance, but not in terms of Sharpe ratio&lt;/p&gt;

    &lt;p&gt;A closer look (not detailed here) reveals that this is because the $k$-NN learned MSR portfolio with $k=k^*$ regularly selects only 1 nearest neighbor when the other neighbors are too “far away”, but also regularly selects many more neighbors when they are “close enough”.&lt;/p&gt;

    &lt;p&gt;I interpret this as an empirical demonstration of the ability of the $k^*$-NN method of Anava and Levy&lt;sup id=&quot;fnref:49:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:49&quot; class=&quot;footnote&quot;&gt;37&lt;/a&gt;&lt;/sup&gt; &lt;em&gt;to adaptively choose the number of nearest neighbors $k$ […] depending on the test feature vector&lt;/em&gt;&lt;sup id=&quot;fnref:7:11&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;8&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The maximum drawdowns are comparable across all portfolios&lt;/p&gt;

    &lt;p&gt;This shows that the $k$-NN learned MSR portfolios, despite their attractive risk-adjusted performance, are not able to magically avoid “dramatic” events.&lt;/p&gt;

    &lt;p&gt;Another layer of risk management, better return predictors, or both, is probably needed for that.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The winner of this horse race is the $k$-NN learned MSR portfolio with $k=k^*$, but this comes at a price in terms of turnover v.s. the $k$-NN learned MSR portfolio with $k=kEnsemble$&lt;/p&gt;

    &lt;p&gt;Also consistent with Varadi and Teed&lt;sup id=&quot;fnref:2:30&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, the asset weights of the $k$-NN learned MSR portfolio with $k=kEnsemble$ (and with $k=10$) are relatively stable and on average similar to an equal weight portfolio, while those of the MSR portfolio &lt;em&gt;show considerable noise and turnover&lt;/em&gt;&lt;sup id=&quot;fnref:2:31&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

    &lt;p&gt;This is visible on the portfolio transition maps displayed in Figures 8 and 9.&lt;/p&gt;

    &lt;figure&gt;
      &lt;a href=&quot;/assets/images/blog/supervised-portfolios-reproduction-weights-knn-kensemble.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/supervised-portfolios-reproduction-weights-knn-kensemble-small.png&quot; alt=&quot;$k$-NN-based learned MSR portfolio, $k=kEnsemble$, SPY/TLT/SHY ETFs allocations through time, 1st January 2004 - 31st March 2025.&quot; /&gt;&lt;/a&gt;
      &lt;figcaption&gt;Figure 8. $k$-NN-based learned MSR portfolio, $k=kEnsemble$, SPY/TLT/SHY ETFs allocations through time, 1st January 2004 - 31st March 2025.&lt;/figcaption&gt;
  &lt;/figure&gt;

    &lt;figure&gt;
      &lt;a href=&quot;/assets/images/blog/supervised-portfolios-reproduction-weights-msr.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/supervised-portfolios-reproduction-weights-msr-small.png&quot; alt=&quot;MSR portfolio, SPY/TLT/SHY allocations through time, 1st January 2004 - 31st March 2025.&quot; /&gt;&lt;/a&gt;
      &lt;figcaption&gt;Figure 9. MSR portfolio, SPY/TLT/SHY allocations through time, 1st January 2004 - 31st March 2025.&lt;/figcaption&gt;
  &lt;/figure&gt;

    &lt;p&gt;For Varadi and Teed&lt;sup id=&quot;fnref:2:32&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, this demonstrates &lt;em&gt;the general uncertainty of the portfolio indicator inputs in aggregate&lt;/em&gt;&lt;sup id=&quot;fnref:2:33&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; and that $k$-NN learned MSR portfolio with $k=kEnsemble$ &lt;em&gt;manages to dynamically balance this uncertainty over time and shift more towards a probabilistic allocation that did not overweight or over-react to poor information&lt;/em&gt;&lt;sup id=&quot;fnref:2:34&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

    &lt;p&gt;This statement is slightly less applicable to the $k$-NN learned MSR portfolio with $k=k^*$, because its better raw performance is explained by a more aggressive allocation, resulting in a much higher turnover, as can be seen by comparing Figure 8 to Figure 10.&lt;/p&gt;

    &lt;figure&gt;
      &lt;a href=&quot;/assets/images/blog/supervised-portfolios-reproduction-weights-knn-kstar.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/supervised-portfolios-reproduction-weights-knn-kstar-small.png&quot; alt=&quot;$k$-NN-based learned MSR portfolio, $k=k^*$, SPY/TLT/SHY ETFs allocations through time, 1st January 2004 - 31st March 2025.&quot; /&gt;&lt;/a&gt;
      &lt;figcaption&gt;Figure 10. $k$-NN-based learned MSR portfolio, $k=k^*$, SPY/TLT/SHY ETFs allocations through time, 1st January 2004 - 31st March 2025.&lt;/figcaption&gt;
  &lt;/figure&gt;
  &lt;/li&gt;
&lt;/ul&gt;
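The turnover differences mentioned above can be quantified, for instance, as the average one-way turnover between consecutive rebalancing dates; a minimal sketch:

```python
import numpy as np

def average_turnover(weight_history):
    """Average one-way turnover: half the mean L1 distance between
    consecutive rows of portfolio weights (each row summing to 1)."""
    w = np.asarray(weight_history, dtype=float)
    return 0.5 * np.abs(np.diff(w, axis=0)).sum(axis=1).mean()
```

A constant allocation has a turnover of 0, while fully rotating between two assets at every rebalancing date has a turnover of 1 (100%) per period.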

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;Exactly like in Varadi and Teed&lt;sup id=&quot;fnref:2:35&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, and despite the differences in implementation and in the size of the training dataset&lt;sup id=&quot;fnref:12&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:12&quot; class=&quot;footnote&quot;&gt;48&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The results of this section show that &lt;em&gt;a traditional mean-variance/Markowitz/MPT framework under-performs [a $k$-NN-based supervised portfolio allocation] framework in terms of maximizing the Sharpe ratio&lt;/em&gt;&lt;sup id=&quot;fnref:2:36&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;The data further implies that traditional MPT makes far too many trades and takes on too many extreme positions as a function of how it is supposed to generate portfolio weights&lt;/em&gt;&lt;sup id=&quot;fnref:2:37&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Varadi and Teed&lt;sup id=&quot;fnref:2:38&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; provide the following explanation:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;This occurs because the inputs - especially the returns - are very noisy and may also demonstrate non-linear or counter-intuitive relationships. In contrast, by learning how the inputs map historically to optimal portfolios at the asset level, the resulting [$k$-NN-based supervised portfolios] allocations drift in a more stable manner over time.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&quot;final-conclusion&quot;&gt;Final conclusion&lt;/h2&gt;

&lt;p&gt;Supervised portfolios as introduced in Chevalier et al.&lt;sup id=&quot;fnref:1:30&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; are able to &lt;em&gt;learn from past time series of in-sample optimal weights&lt;/em&gt;&lt;sup id=&quot;fnref:1:31&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; and &lt;em&gt;to infer the best weights from variables such as past performance, risk, and proxies of the macro-economic outlook&lt;/em&gt;&lt;sup id=&quot;fnref:1:32&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;In this blog post, I empirically demonstrated that this capability allows one of their simplest embodiments - $k$-NN-based supervised portfolios - to outperform a traditional mean-variance framework that seeks to maximize the Sharpe ratio of a portfolio, which independently confirms the prior results of Varadi and Teed&lt;sup id=&quot;fnref:2:39&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;To keep discovering non-standard portfolio allocation frameworks, feel free to &lt;a href=&quot;https://www.linkedin.com/in/roman-rubsamen/&quot;&gt;connect with me on LinkedIn&lt;/a&gt; or to &lt;a href=&quot;https://twitter.com/portfoliooptim&quot;&gt;follow me on Twitter&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;–&lt;/p&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.tandfonline.com/doi/full/10.1080/14697688.2022.2122543&quot;&gt;Chevalier, G., Coqueret, G., &amp;amp; Raffinot, T. (2022). Supervised portfolios. Quantitative Finance, 22(12), 2275–2295&lt;/a&gt;. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:1:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;8&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;9&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;10&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:10&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;11&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:11&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;12&lt;/sup&gt;&lt;/a&gt; &lt;a 
href=&quot;#fnref:1:12&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;13&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:13&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;14&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:14&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;15&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:15&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;16&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:16&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;17&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:17&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;18&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:18&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;19&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:19&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;20&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:20&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;21&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:21&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;22&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:22&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;23&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:23&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;24&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:24&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;25&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:25&quot; class=&quot;reversefootnote&quot; 
role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;26&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:26&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;27&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:27&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;28&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:28&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;29&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:29&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;30&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:30&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;31&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:31&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;32&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:32&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;33&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://cssanalytics.wordpress.com/2014/05/06/adaptive-portfolio-allocations/&quot;&gt;David Varadi, Jason Teed, Adaptive Portfolio Allocations, NAAIM paper&lt;/a&gt;. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:2:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;8&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;9&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;10&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:10&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;11&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:11&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;12&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:12&quot; 
class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;13&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:13&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;14&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:14&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;15&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:15&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;16&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:16&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;17&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:17&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;18&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:18&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;19&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:19&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;20&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:20&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;21&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:21&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;22&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:22&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;23&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:23&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;24&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:24&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;25&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:25&quot; class=&quot;reversefootnote&quot; 
role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;26&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:26&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;27&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:27&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;28&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:28&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;29&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:29&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;30&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:30&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;31&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:31&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;32&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:32&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;33&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:33&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;34&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:34&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;35&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:35&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;36&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:36&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;37&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:37&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;38&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:38&quot; class=&quot;reversefootnote&quot; 
role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;39&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:39&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;40&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Varadi and Teed&lt;sup id=&quot;fnref:2:40&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; was submitted to the 2014 &lt;a href=&quot;https://naaim.org/&quot;&gt;NAAIM&lt;/a&gt; annual white paper competition known as the &lt;a href=&quot;https://naaim.org/programs/naaim-founders-award/&quot;&gt;NAAIM Founders Award&lt;/a&gt;. &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:5&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Note that the data points could belong to a more generic space than $\mathbb{R}^m \times \mathbb{R}$. &lt;a href=&quot;#fnref:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:4&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;$\mathbb{R}^m$ is usually called the &lt;em&gt;feature space&lt;/em&gt;. &lt;a href=&quot;#fnref:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:43&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;In practice, $d$ might not necessarily be a proper metric; for example, like &lt;a href=&quot;https://en.wikipedia.org/wiki/Cosine_similarity&quot;&gt;the cosine “distance”&lt;/a&gt;, it might not satisfy the triangle inequality. &lt;a href=&quot;#fnref:43&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:43:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:9&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;For a historical perspective on the $k$-NN algorithm going beyond the usual technical report from Fix and Hodges&lt;sup id=&quot;fnref:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:10&quot; class=&quot;footnote&quot;&gt;49&lt;/a&gt;&lt;/sup&gt; and the seminal paper from Cover and Hart&lt;sup id=&quot;fnref:11:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:11&quot; class=&quot;footnote&quot;&gt;10&lt;/a&gt;&lt;/sup&gt;, the interested reader is referred to Chen and Shah&lt;sup id=&quot;fnref:6:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;12&lt;/a&gt;&lt;/sup&gt;, which mentions that the $k$-NN classification algorithm was already described in a text from the early 11th century. &lt;a href=&quot;#fnref:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:7&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://ieeexplore.ieee.org/document/8384208&quot;&gt;George H. Chen; Devavrat Shah, Explaining the Success of Nearest Neighbor Methods in Prediction, now, 2018&lt;/a&gt;. &lt;a href=&quot;#fnref:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:7:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:7:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:7:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:7:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:7:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:7:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:7:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;8&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:7:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;9&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:7:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;10&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:7:10&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;11&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:7:11&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;12&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:15&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://link.springer.com/book/10.1007/978-3-319-25388-6&quot;&gt;Gerard Biau, Luc Devroye, Lectures on the Nearest Neighbor Method, Springer Series in the Data Sciences&lt;/a&gt;. &lt;a href=&quot;#fnref:15&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:15:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:15:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:15:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:15:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:15:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:15:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:15:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;8&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:11&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://ieeexplore.ieee.org/document/1053964&quot;&gt;Cover, T. M. and P. E. Hart (1967). “Nearest neighbor pattern classification”. IEEE Transactions on Information Theory&lt;/a&gt;. &lt;a href=&quot;#fnref:11&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:11:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:17&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Although &lt;em&gt;$n$ must be sufficiently large in order for there to exist a $k$ that satisfies the conditions&lt;/em&gt;&lt;sup id=&quot;fnref:6:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;12&lt;/a&gt;&lt;/sup&gt; required by the main theorem of Jiang&lt;sup id=&quot;fnref:6:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;12&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:17&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:6&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://ojs.aaai.org/index.php/AAAI/article/view/4292&quot;&gt;Jiang, H. (2019). Non-Asymptotic Uniform Rates of Consistency for $k$-NN Regression. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 3999-4006&lt;/a&gt;. &lt;a href=&quot;#fnref:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:6:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:6:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:6:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:6:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:6:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:6:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:6:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;8&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:6:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;9&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:18&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://ieeexplore.ieee.org/document/5569740&quot;&gt;S. Sun and R. Huang, “An adaptive k-nearest neighbor algorithm,” 2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery, Yantai, China, 2010, pp. 91-94&lt;/a&gt;. &lt;a href=&quot;#fnref:18&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:19&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See for example &lt;a href=&quot;https://proceedings.neurips.cc/paper/1992/hash/26408ffa703a72e8ac0117e74ad46f33-Abstract.html&quot;&gt;P.Y. Simard, Y. LeCun and J. Decker, “Efficient pattern recognition using a new transformation distance,” In Advances in Neural Information Processing Systems, vol. 6, 1993, pp. 50-58&lt;/a&gt;, in which the Euclidean distance between images of handwritten digits is replaced by an ad-hoc distance invariant with respect to geometric transformations of such images (rotation, translation, scaling, etc.). &lt;a href=&quot;#fnref:19&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:25&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://dl.acm.org/doi/10.1145/276698.276876&quot;&gt;Har-Peled, S., P. Indyk, and R. Motwani (2012). “Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality.” Theory of Computing&lt;/a&gt;. &lt;a href=&quot;#fnref:25&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:26&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://dl.acm.org/doi/10.1145/258533.258653&quot;&gt;Kleinberg, J. M. (1997). “Two algorithms for nearest-neighbor search in high dimensions”. In: Symposium on Theory of Computing&lt;/a&gt;. &lt;a href=&quot;#fnref:26&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:23&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;For example, the end of each month for learning a monthly asset allocation strategy. &lt;a href=&quot;#fnref:23&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:30&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;A day, a week, a month, etc. &lt;a href=&quot;#fnref:30&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:27&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Also called &lt;em&gt;inference data&lt;/em&gt;, that is, data not “seen” during the training phase. &lt;a href=&quot;#fnref:27&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:29&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Like budget constraints, asset weights constraints, asset group constraints, portfolio exposure constraints, etc. &lt;a href=&quot;#fnref:29&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:28&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Chevalier et al.&lt;sup id=&quot;fnref:1:33&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; note that these empirical results &lt;em&gt;still hold when replacing boosted trees by simple regressions&lt;/em&gt;&lt;sup id=&quot;fnref:1:34&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:28&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:48&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Developed equities, emerging equities, global corporate bonds, global government bonds. &lt;a href=&quot;#fnref:48&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:31&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.pm-research.com/content/iijjfds/6/2/10&quot;&gt;Chevalier, Guillaume, Coqueret, Guillaume, Raffinot, Thomas, Interpretable Supervised Portfolios, The Journal of Financial Data Science, Spring 2024, 6(2), 10-34&lt;/a&gt;. &lt;a href=&quot;#fnref:31&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:31:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:31:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:20&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.degruyterbrill.com/document/doi/10.1524/stnd.2008.0917/html?srsltid=AfmBOoqmKAyZ2Y3692M6LMwYdPB7o_nLBKPzW5Ma-JPmsizet-DGNvor&quot;&gt;L. Gyorfi, F. Udina, and H. Walk. Nonparametric nearest neighbor based empirical portfolio selection strategies. Statistics &amp;amp; Decisions, International Mathematical Journal for Stochastic Methods and Models, 26(2):145–157, 2008&lt;/a&gt;. &lt;a href=&quot;#fnref:20&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:33&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Unless a very specific variation of $k$-NN regression is used. &lt;a href=&quot;#fnref:33&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:33:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:32&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;At least from an algorithm aversion perspective. Nevertheless, there can be other benefits, cf. Chevalier et al.&lt;sup id=&quot;fnref:31:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:31&quot; class=&quot;footnote&quot;&gt;23&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:32&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:38&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Similar to &lt;a href=&quot;https://en.wikipedia.org/wiki/Random_subspace_method&quot;&gt;random subspace optimization&lt;/a&gt;. &lt;a href=&quot;#fnref:38&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:38:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:14&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;1 month, 2 months, 3 months, 6 months and 12 months. &lt;a href=&quot;#fnref:14&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:14:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:14:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:14:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:39&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://arxiv.org/abs/2408.07706v1&quot;&gt;Avivit Levy, B. Riva Shalom, Michal Chalamish, A Guide to Similarity Measures, arXiv&lt;/a&gt; for a very long list of distance metrics. &lt;a href=&quot;#fnref:39&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:41&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Or more generally, to most supervised machine learning algorithms. &lt;a href=&quot;#fnref:41&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:40&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://api.semanticscholar.org/CorpusID:251472012&quot;&gt;Ishan Arora and Namit Khanduja and Mayank Bansal, Effect of Distance Metric and Feature Scaling on KNN Algorithm while Classifying X-rays, RIF, 2022&lt;/a&gt;. &lt;a href=&quot;#fnref:40&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:44&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;http://www.dx.doi.org/10.7537/marsjas100814.31&quot;&gt;Hassanat, A.B., 2014. Dimensionality Invariant Similarity Measure. Journal of American Science, 10(8), pp.221-26&lt;/a&gt;. &lt;a href=&quot;#fnref:44&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:47&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Note that feature scaling might still be performed to try to improve the performance of the $k$-NN regression algorithm. &lt;a href=&quot;#fnref:47&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:46&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Other properties of the Hassanat distance include, for example, robustness to noise and linear growth with the dimension of the feature space, cf. Abu Alfeilat et al.&lt;sup id=&quot;fnref:42:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:42&quot; class=&quot;footnote&quot;&gt;35&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:46&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:42&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/31411491/&quot;&gt;Abu Alfeilat HA, Hassanat ABA, Lasassmeh O, Tarawneh AS, Alhasanat MB, Eyal Salman HS, Prasath VBS. Effects of Distance Measure Choice on K-Nearest Neighbor Classifier Performance: A Review. Big Data. 2019 Dec;7(4):221-248&lt;/a&gt;. &lt;a href=&quot;#fnref:42&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:42:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:42:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:21&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://shs.cairn.info/revue-finance-2005-2-page-67?lang=en&quot;&gt;Guegan, D. and Huck, N. (2005). On the Use of Nearest Neighbors in Finance. Finance, 26(2), 67-86&lt;/a&gt;. &lt;a href=&quot;#fnref:21&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:21:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:21:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:21:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:21:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:49&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://papers.nips.cc/paper_files/paper/2016/hash/2c6ae45a3e88aee548c0714fad7f8269-Abstract.html&quot;&gt;Oren Anava, Kfir Levy, k*-Nearest Neighbors: From Global to Local, Advances in Neural Information Processing Systems 29 (NIPS 2016)&lt;/a&gt;. &lt;a href=&quot;#fnref:49&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:49:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:49:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:49:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:54&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;In more detail, Varadi and Teed&lt;sup id=&quot;fnref:2:41&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; choose the $k$’s &lt;em&gt;in percentages of the size of the training space, which were 5%, 10%, 15% and 20% resulting essentially in a weighted average of the top instances&lt;/em&gt;&lt;sup id=&quot;fnref:2:42&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:54&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:54:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:50&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://sites.google.com/site/ijcsis/all-volumes-issues/vol-12-no-8-aug-2014&quot;&gt;Hassanat, A.B., Mohammad Ali Abbadi, Ghada Awad Altarawneh, Ahmad Ali Alhasanat, 2014. Solving the Problem of the K Parameter in the KNN Classifier Using an Ensemble Learning Approach. International Journal of Computer Science and Information Security, 12(8), pp.33-39&lt;/a&gt;. &lt;a href=&quot;#fnref:50&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:50:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:50:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:51&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;For example, due to the limited price history of some assets or due to the length of the desired horizon over which optimal portfolio weights need to be computed. &lt;a href=&quot;#fnref:51&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:35&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Possibly &lt;em&gt;for as few as 10-15 dimensions&lt;/em&gt;&lt;sup id=&quot;fnref:34:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:34&quot; class=&quot;footnote&quot;&gt;43&lt;/a&gt;&lt;/sup&gt;! &lt;a href=&quot;#fnref:35&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:24&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://jmlr.org/papers/v11/radovanovic10a.html&quot;&gt;Radovanovic, Milos and Nanopoulos, Alexandros and Ivanovic, Mirjana, Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data, Journal of Machine Learning Research 11 (2010) 2487-2531&lt;/a&gt;. &lt;a href=&quot;#fnref:24&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:24:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:34&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://link.springer.com/chapter/10.1007/3-540-49257-7_15&quot;&gt;Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U. (1999). When Is “Nearest Neighbor” Meaningful?. In: Beeri, C., Buneman, P. (eds) Database Theory — ICDT’99. ICDT 1999. Lecture Notes in Computer Science, vol 1540. Springer, Berlin, Heidelberg&lt;/a&gt;. &lt;a href=&quot;#fnref:34&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:34:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:36&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;http://doi.ieeecomputersociety.org/10.1109/ICPR.2004.1334065&quot;&gt;Domeniconi, C., &amp;amp; Yan, B. (2004). Nearest neighbor ensemble. In International Conference on Pattern Recognition, Vol. 1 (pp. 228–231). Los Alamitos, CA, USA: IEEE Computer Society&lt;/a&gt;. &lt;a href=&quot;#fnref:36&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:36:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:37&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.sciencedirect.com/science/article/abs/pii/S1088467X99000189&quot;&gt;Bay S.D. Nearest neighbor classification from multiple feature subsets Intelligent Data Analysis, 3 (1999), pp. 191-209&lt;/a&gt;. &lt;a href=&quot;#fnref:37&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:52&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://ssrn.com/abstract=633801&quot;&gt;Almgren, Robert and Chriss, Neil A., Optimal Portfolios from Ordering Information (December 2004)&lt;/a&gt;. &lt;a href=&quot;#fnref:52&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:53&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;(Adjusted) daily prices have been retrieved using &lt;a href=&quot;https://api.tiingo.com/&quot;&gt;Tiingo&lt;/a&gt;. &lt;a href=&quot;#fnref:53&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:12&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;This empirically confirms that, despite their dependency on the size of the training dataset, nearest neighbor methods &lt;em&gt;can learn from a small set of examples&lt;/em&gt;&lt;sup id=&quot;fnref:37:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:37&quot; class=&quot;footnote&quot;&gt;45&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:12&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:10&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.jstor.org/stable/1403797&quot;&gt;Fix, E. and Hodges, J. L., “Discriminatory analysis, nonparametric discrimination: Consistency properties”. Technical report, USAF School of Aviation Medicine&lt;/a&gt;. &lt;a href=&quot;#fnref:10&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content><author><name>Roman R.</name></author><category term="mean-variance optimization" /><category term="machine learning" /><summary type="html">Standard portfolio allocation algorithms like Markowitz mean-variance optimization or Choueifaty diversification ratio optimization usually take as input asset information (expected returns, estimated covariance matrix…) as well as investor constraints and preferences (maximum asset weights, risk aversion…) to produce as output portfolio weights satisfying a selected mathematical objective like the maximization of the portfolio Sharpe ratio or Diversification ratio. Chevalier et al.1 introduce a non-standard portfolio allocation framework - represented in Figure 1 - under which the same input is first used to “learn” in-sample optimized portfolio weights in a supervised training phase and then used to produce out-of-sample optimized portfolio weights in an inference phase. Figure 1. Standard vs. supervised portfolio allocation framework. Source: Adapted from Chevalier et al. In this blog post, I will provide some details about that framework when used with the $k$-nearest neighbors supervised machine learning algorithm, which is an idea originally proposed in Varadi and Teed23. As an example of usage, I will compare the performance of a $k$-nearest neighbors supervised portfolio with that of a “direct” mean-variance portfolio in the context of a monthly tactical asset allocation strategy for a 2-asset class portfolio made of U.S. equities and U.S. Treasury bonds. 
Mathematical preliminaries Supervised machine learning algorithms Let $\left( X_1, Y_1 \right)$, …, $\left( X_n, Y_n \right)$ be $n$ pairs of data points in4 $\mathbb{R}^m \times \mathbb{R}$, $m \geq 1$5, where: Each data point $X_1, …, X_n$ represents an object - like the pixels of an image - and is called a feature vector Each data point $Y_1,…,Y_n$ represents a characteristic of its associated object - like what kind of animal is depicted in an image (discrete characteristic) or the angle of the rotation between a rotated image and its original version (continuous characteristic) - and is called a label Given a feature vector $x \in \mathbb{R}^m$, the aim of a supervised machine learning algorithm is then to estimate the “most appropriate” label associated to $x$ - $\hat{y} \in \mathbb{R}$ - thanks to the information contained in the training dataset $\left( X_1, Y_1 \right)$, …, $\left( X_n, Y_n \right)$. $k$-nearest neighbors regression algorithm Let $d$ be a distance metric6 on $\mathbb{R}^m$, like the standard Euclidean distance. The $k$-nearest neighbor ($k$-NN) regression algorithm is an early7 [supervised] machine learning algorithm8 that uses the “neighborhood” of a feature vector in order to estimate its label. In more detail, let $\left( X_{(i)}(x), Y_{(i)}(x) \right)$, $i=1..n$ denote the $i$-th training data point closest to $x$ among all the training data points $\left( X_1, Y_1 \right)$, …, $\left( X_n, Y_n \right)$ such that the distance of each training data point to $x$ satisfies $d \left(x, X_{(1)}(x) \right)$ $\leq … \leq$ $d \left(x, X_{(n)}(x) \right)$. 
By definition, the $k$-NN estimate for the label associated to $x$ is then9 the uniformly or non-uniformly weighted average label of the $k \in \{ 1,…,n \}$ nearest neighbors $Y_{(1)}(x)$,…,$Y_{(k)}(x)$ \[\hat{y} = \frac{1}{k} \sum_{i=1}^k Y_{(i)}(x)\] or \[\hat{y} = \sum_{i=1}^k w_i Y_{(i)}(x)\] , where $w_i \geq 0$ is the weight associated to the $i$-th nearest neighbor $Y_{(i)}(x)$ and all the weights $w_i$, $i=1..k$ sum to one, that is, $\sum_{i=1}^k w_i = 1$. For illustration purposes, the process of selecting the 2 nearest neighbors $X_{(1)}(x)$ and $X_{(2)}(x)$ of a data point $x$ in $\mathbb{R}^2$ is outlined in Figure 2. Figure 2. Example of $k$-NN nearest neighbors selection process in m = 2 dimensions, with n = 3 training data points and k = 2 nearest neighbors. Notes: There additionally exists the $k$-NN classification algorithm, which is a variant of the $k$-NN regression algorithm where the label space is not $\mathbb{R}$ but a finite subset of $\mathbb{N}$. Theoretical guarantees Since the seminal paper of Cover and Hart10 - proving under mild conditions that the $k$-NN classification algorithm achieves an error rate that is at most twice the best error rate achievable8 -, several convergence results have been established for $k$-NN methods. For example, under an asymptotic regime where the number of training data points $n$ and the number of nearest neighbors $k$ both go to infinity, it has been demonstrated9 that the $k$-NN regression algorithm is able to learn any functional relationship of the form $Y_i = f \left( X_i \right) + \epsilon_i$, $i=1..n$, where $f$ is an unknown function and $\epsilon_i$ represents additive noise. As another example, this time under a finite11 sample regime, Jiang12 derives the first sup-norm finite-sample [“convergence”] result12 for the $k$-NN regression algorithm and shows that it achieves a maximum error rate that is equal to the best maximum error rate achievable up to logarithmic factors12, with high probability. 
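To make the two estimates above concrete, here is a minimal Python sketch of $k$-NN regression with the Euclidean distance (a toy illustration only, not the implementation used anywhere else in this post):

```python
import numpy as np

def knn_regress(x, X, Y, k, weights=None):
    """k-NN regression estimate of the label of x, given training pairs
    (X_i, Y_i): the uniformly (weights=None) or non-uniformly weighted
    average label of the k nearest neighbors of x."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    # Euclidean distances d(x, X_i), i = 1..n
    d = np.linalg.norm(X - np.asarray(x, dtype=float), axis=1)
    # Indices of the k training data points closest to x
    nearest = np.argsort(d)[:k]
    if weights is None:
        return Y[nearest].mean(axis=0)  # y_hat = (1/k) * sum of nearest labels
    w = np.asarray(weights, dtype=float)
    return w @ Y[nearest]  # weights w_i >= 0 assumed to sum to one

# Toy usage: n = 3 training data points in R^2, with scalar labels
X = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]
Y = [1.0, 3.0, 100.0]
print(knn_regress([0.4, 0.0], X, Y, k=2))  # average of labels 1.0 and 3.0 -> 2.0
```
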
In addition to these convergence results, $k$-NN methods also exhibit interesting properties w.r.t. the dimensionality of the feature space $\mathbb{R}^m$. For example, while the curse of dimensionality forces non-parametric methods such as $k$-NN to require an exponential-in-dimension sample complexity12, the $k$-NN regression algorithm actually adapts to the local intrinsic dimension without any modifications to the procedure or data12. In other words, if the feature vectors belong to $\mathbb{R}^m$ but have a “true” dimensionality equal to $\mathbb{R}^p, p &amp;lt; m$, then the $k$-NN regression algorithm will [behave] as if it were in the lower dimensional space [of dimension $p$] and independent of the ambient dimension [$m$]12. Further properties of $k$-NN methods can be found in Chen and Shah8 and in Biau and Devroye9. Practical performances Like all supervised machine learning algorithms, the practical performances of the $k$-NN regression algorithm heavily depend on the problem at hand. Yet, in general, it often yields competitive results [v.s. other more complex algorithms like neural networks], and in certain domains, when cleverly combined with prior knowledge, it has significantly advanced the state-of-the-art13. 
Beyond these competitive performances, Chen and Shah8 also highlights other important practical aspects of $k$-NN methods that contributed to their empirical success over the years8: Their flexibility in choosing a problem-specific definition of “near” through a custom distance metric14 Their computational efficiency, which has enabled these methods to scale to massive datasets (“big data”)8 thanks to approaches like approximate nearest neighbor search15 or random projections16 Their non-parametric nature, in that they make very few assumptions on the underlying model for the data8 Their ease of interpretability, since they provide evidence for their predictions by exhibiting the nearest neighbors found8 $k$-NN based supervised portfolios Supervised portfolios Chevalier et al.1 describes an asset allocation strategy that engineers optimal weights before feeding them to a supervised learning algorithm1, represented in the lower part of Figure 1. Given a training dataset of past [financial] observations1 like past asset returns, past macroeconomic indicators, etc., it proceeds as follows: For any relevant date17 $t=t_1,…$ in the training dataset Compute optimal (in-sample) future portfolio weights $w_{t+1}$ over a (also in-sample) desired future horizon18, using a selected portfolio optimization algorithm with financial observations up to the time $t+1$ These optimal future portfolio weights are the labels $Y_t$, $t=t_1,…$, of the training data points. To be noted that by lagging the data, we can use the in-sample future realized returns to compute all the [returns-based] estimates1 required by the portfolio optimization algorithm like the expected asset returns, the asset covariance matrix, etc. This allows to be forward-looking in the training sample, while at the same time avoiding any look-ahead bias1. During this step, constraints can of course be added in order to satisfy targets and policies1. 
Compute a chosen set of predictors supposed to be linked to the in-sample future portfolio weights $w_{t+1}$, using financial observations up to the time $t$ These predictors are the feature vectors $X_t$, $t=t_1,…$, of the training data points. Train and tune a supervised machine learning algorithm using the training data points $\left( X_t, Y_t \right)$, $t=t_1,…$. Once the training phase is completed, the supervised portfolio allocation algorithm is ready to be used with test data19. For any relevant (out-of-sample) test date $t’=t’_1,…$ Compute the set of predictors chosen during the training phase, using financial observations up to the time $t’$ These predictors are the test feature vectors $x_{t’}$, $t’=t’_1,…$. Provide that set of predictors as an input test feature vector to the supervised machine learning algorithm to receive in output the estimated optimal portfolio weights $\hat{w}_{t’+1}$ over the (out-of-sample) future horizon These estimated optimal portfolio weights are the estimated labels $\hat{y}_{t’}$, $t’=t’_1,…$. Here, depending on the exact supervised machine learning algorithm, the estimated portfolio weights $\hat{w}_{t’+1}$ might not satisfy the portfolio constraints20 imposed in the training phase, in which case a post-processing phase would be required. The portfolio allocation framework of Chevalier et al.1 described above allows the algorithm to learn from past time series of in-sample optimal weights and to infer the best weights from variables such as past performance, risk, and proxies of the macro-economic outlook1. This contrasts with the standard practice of directly forecasting the input of a portfolio optimization algorithm, making that framework rather original. 
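The training and inference phases described above can be sketched end-to-end with a toy $k$-NN learner. The data below (random stand-ins for the predictors $X_t$ and the in-sample optimal weights $Y_t = w_{t+1}$) is purely illustrative of the mechanics, not the actual pipeline of Chevalier et al.:

```python
import numpy as np

rng = np.random.default_rng(42)
n_train, m, n_assets = 250, 4, 2

# Training phase: one pair (X_t, Y_t) per relevant training date t, where
# X_t holds the predictors computed with data up to t and Y_t = w_{t+1}
# holds the in-sample optimal future portfolio weights. Random stand-ins
# are used here in place of real financial observations.
X_train = rng.normal(size=(n_train, m))                   # feature vectors
Y_train = rng.dirichlet(np.ones(n_assets), size=n_train)  # long-only weights

# Inference phase: given the predictors x_{t'} computed at an out-of-sample
# test date t', estimate w_{t'+1} as the average of the in-sample optimal
# weights of the k training dates whose predictors are closest to x_{t'}.
def infer_weights(x_test, k=10):
    d = np.linalg.norm(X_train - x_test, axis=1)
    return Y_train[np.argsort(d)[:k]].mean(axis=0)

w_hat = infer_weights(rng.normal(size=m))
# A uniform average of long-only weights summing to one is itself
# long-only and sums to one, so no post-processing is needed here.
print(w_hat, w_hat.sum())
```

With non-uniform neighbor weights that are non-negative and sum to one, the same convexity argument applies to the estimated weights.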
In terms of empirical performances, Chevalier et al.1 finds that predicting the optimal weights directly instead of the traditional two step approach leads to more stable portfolios with statistically better risk-adjusted performance measures1 when using mean-variance optimization as the selected portfolio optimization algorithm and gradient boosting decision trees as the selected supervised machine learning algorithm21. Some of these risk-adjusted performance measures are displayed in Figure 3 in the case of 4 asset classes22, for the 3 horizons of predicted returns and the 3 risk aversion levels used in Chevalier et al.1. Figure 3. Performances of supervised portfolios v.s. direct mean-variance optimized portfolios, 4 asset classes. Source: Adapted from Chevalier et al. Notes: Additional information can be found in the follow-up paper Chevalier et al.23 and in a video of Thomas Raffinot for QuantMinds International. $k$-NN-based supervised portfolios Theoretically, the supervised machine learning model used in the portfolio allocation framework of Chevalier et al.1 is trained to learn the following model1: \[w_{t+1} = g_t \left(X_t \right) + \epsilon_{t+1}\] , where: $X_t $ is the feature vector made of the chosen set of predictors computed at time $t$ $w_{t+1}$ is the vector of optimal portfolio weights over the desired future horizon $t+1$ $g$ is an unknown function Because such a model describes a functional relationship compatible with a $k$-NN regression algorithm, it is reasonable to think about using that algorithm as the supervised machine learning algorithm in the above framework. Enter $k$-NN-based supervised portfolios, a portfolio allocation framework originally introduced in Varadi and Teed2 as follows: This naturally leads us down the path of creating algorithms that can learn from past data and evolve over time to change the method for creating portfolio allocations. 
The simplest and most intuitive machine-learning algorithm is the K-Nearest Neighbor method ($k$-NN) […, which] is a form of “case-based” reasoning. That is, it learns from examples that are similar to current situation by looking at the past [and says: “what happened historically when I saw patterns that are close to the current pattern?”]. It shares a lot in common with how human beings make decisions. When portfolio managers talk about having 20 years of experience, they are really saying that they have a large inventory of past “case studies” in memory to make superior decisions about the current environment. As a side note, Varadi and Teed2 is not the first paper to apply a $k$-NN regression algorithm to the problem of portfolio allocation, c.f. for example Gyorfi et al.24 in the setting of online portfolio selection, but Varadi and Teed2 is - to my knowledge - the first paper about the same “kind” of supervised portfolios as in Chevalier et al.1. A couple of practical advantages of $k$-NN-based supervised portfolios v.s. for example “gradient boosting decision trees”-based supervised portfolios as used in Chevalier et al.1 are: The simplicity of the training Since nearest neighbor methods are lazy learners, there is strictly speaking no real training phase. The simplicity of the tuning There can be no tuning at all if no “advanced” technique (automated features selection, distance learning…) is used. The guarantee that (convex) portfolio constraints learned during the training phase are satisfied during the test phase In $k$-NN regression25, the estimate for the label associated to a test point is a convex combination of the labels of that point’s nearest neighbors. As a consequence, the estimated portfolio weights $\hat{w}_{t’+1}$ are guaranteed25 to satisfy any learned convex portfolio constraints, thereby avoiding any post-processing that could degrade the “quality” of the estimated weights. 
The ease of interpretability Due to algorithm aversion, Chevalier et al.23 highlights the need to be able to transform a black box nonlinear predictive algorithm [like gradient boosting decision trees] into a simple combination of rules23 in order to make it interpretable for humans. With a $k$-NN regression algorithm, which is one of the most transparent supervised machine learning algorithms in existence, that step is probably not useful26. In terms of empirical performances, Varadi and Teed2 concludes that $k$-NN-based supervised portfolios consistently outperformed [vanilla maximum Sharpe ratio portfolios] on both heterogeneous and homogeneous data sets on a risk-adjusted basis2, with the $k$-NN-based approach [exhibiting] a Sharpe ratio [… up to] over 30% higher than [the direct maximum Sharpe ratio approach]2. Average performance measures for the $k$-NN-based supervised portfolios in Varadi and Teed2 are reported in Figure 4. Figure 4. Performances of $k$-NN-based supervised portfolios v.s. direct mean-variance optimized portfolios. Source: Adapted from Varadi and Teed. Implementing $k$-NN-based supervised portfolios Features selection Biau and Devroye9 describes features selection as: […] the process of choosing relevant components of the [feature] vector $X$ for use in model construction. There are many potential benefits of such an operation: facilitating data visualization and data understanding, reducing the measurement and storage requirements, decreasing training and utilization times, and defying the curse of dimensionality to improve prediction performance. 
, and provides some rules of thumb that should be followed9: Noisy measurements, that is, components that are independent of $Y$, should be avoided9, especially because nearest neighbor methods are extremely sensitive to the features used27 Adding a component that is a function of other components is useless9 Beyond these generic rules, and although it has been an active research area in the statistics, machine learning, and data mining communities1, features selection is unfortunately strongly problem-dependent. In the context of supervised portfolios, Chevalier et al.1 and Varadi and Teed2 both propose to use: Past asset returns over different horizons28 so as to assess momentum and reversals1 Past asset volatilities over different horizons28, to approximate asset-specific risk1 Varadi and Teed2 additionally proposes to include past asset correlations over different horizons28 to ensure that [the] $k$-NN algorithm [doesn’t] have access to any information that the [direct mean-variance optimization] [doesn’t] have, but merely use it differently2. Chevalier et al.1, building on the stocks asset pricing literature, does not suggest including returns-based indicators other than past asset returns and volatilities but suggests instead including various macroeconomic indicators (yield curve, VIX…). Features scaling Typical distance metrics29 used with nearest neighbor methods like the Euclidean distance are said to be scale variant, meaning that the definition of a nearest neighbor is influenced by the relative and absolute scale of the different features. 
For example, when using the Euclidean distance with features such as a person’s height and a person’s age: The height feature disproportionately influences the definition of a neighbor if the height feature is measured in millimeters and age in years The age feature disproportionately influences the definition of a neighbor if the height feature is measured in meters and age in days For this reason, features are usually scaled to a similar range before being provided in input to a $k$-NN algorithm30, which is a pre-processing step called features scaling. A couple of techniques for features scaling are described in Arora et al.31: Min-max scaling, which scales all the values of a feature $\left( X_i \right)_j$, $j \in \{ 1,…,m \}$, $i=1..n$ to a given interval - like $[0,1]$ -, based on the minimum and the maximum values of that feature: \[\left( X_i \right)_j' = \frac{\left( X_i \right)_j - \min_{l=1..n} \left( X_l \right)_j }{\max_{l=1..n} \left( X_l \right)_j - \min_{l=1..n} \left( X_l \right)_j }, i=1..n\] Standardization, also called z-score normalization, which transforms all the values of a feature $\left( X_i \right)_j$, $j \in \{ 1,…,m \}$ , $i=1..n$ into values that are approximately standard normally distributed: \[\left( X_i \right)_j' = \frac{ \left( X_i \right)_j - \overline{\left( X_i \right)_j}}{ \sigma_{\left( X_i \right)_j} }\] In the context of supervised portfolios, additional techniques are described in Chevalier et al.1: Quantile normal transformation for a “time series”-like feature, which standardizes the time-series into quantile and then map the values to a normal distribution1 It is important to note that at any given date, the quantiles should be computed using information up to that date only to avoid forward looking leakage1. In addition, a lookback window over which to compute the quantiles should be chosen, with possible impacts on the performances of the supervised machine learning algorithm. 
Cross sectional normalization for a regular feature, which scales the cross sectional values between 0 and 1 using the empirical cumulative distribution function1 At any given date, this normalization can be performed fully in the cross-section at that date if there are enough assets or in the cross-section at that date using information up to that date to compute the empirical cumulative distribution function. In the latter case, c.f. the previous point. Hyperbolic tangent function ($\tanh$) scaling for labels, in order to center [them] and make them more comparable by taming outliers1: \[Y' = 0.5 \tanh{\left( 0.01 \frac{Y − \overline{Y}}{ \sigma_Y } \right) }\] Naturally, the reverse transformation is performed after the prediction to transform back the labels into its original values1. Finally, in the specific context of $k$-NN-based supervised portfolios, 2 additional techniques are described in Varadi and Teed2, which are variations of the techniques of Chevalier et al.1. Distance metric selection As already mentioned in the previous sub-section, the distance metric used with a nearest neighbor method influences the definition of a nearest neighbor due to its scale variant or scale invariant nature. But that’s not all, because different distance metrics behave differently with regards to outliers, to noise, to the dimension of the feature space, etc. On top of that, the chosen distance metric is sometimes not a proper metric6… So, what to do in the specific context of $k$-NN-based supervised portfolios? From the empirical results in Varadi and Teed2, the Euclidean distance seems to be a good choice as long as the chosen predictors are properly scaled. From the empirical results later in this blog post, a little-known distance metric called the Hassanat distance32 also seems to be a good choice and additionally does not require33 the chosen predictors to be scaled because it is scale invariant34. 
That distance - noted $HasD(x,y)$ - is defined between two vectors $x = \left(x_1,…,x_m\right)$ and $y = \left(y_1,…,y_m\right)$ as follows: \[HasD(x,y) = \sum_{i=1}^m D(x_i,y_i)\] , with \[D(x_i,y_i) = \begin{cases} 1 - \frac{1 + \min(x_i,y_i)}{1 + \max(x_i,y_i)}, &amp;amp;\text{if } \min(x_i,y_i) \geq 0 \\ 1 - \frac{1}{1 + \max(x_i,y_i) - \min(x_i,y_i) }, &amp;amp;\text{if } \min(x_i,y_i) &amp;lt; 0 \end{cases}\] Figure 5 illustrates the 1-dimensional Hassanat distance $HasD(0,n)$ with $n \in [-10,10]$. Figure 5. Representation of the 1-dimensional Hassanat distance between the points 0 and n. Source: Abu Alfeilat et al. As a side note, the Hassanat distance has been empirically demonstrated to perform the best when applied on most data sets comparing with the other tested distances35 in Abu Alfeilat et al.35, which compares the performances of 54 distance metrics used in $k$-NN classification. How to select the number of nearest neighbors? Together with the distance metric $d$, the number of nearest neighbors $k$ is the other hyperparameter that has to be selected in nearest neighbor methods. Varadi and Teed2 explains: The choice of the number of nearest matches (or neighbors) is the $k$ in $k$-NN. This is an important variable that allows [the user] the ability to trade-off accuracy versus reliability. Choosing a value for $k$ that is too high will lead to matches that are not appropriate to the current case. Choosing a value that is too low will lead to exact matches but poor generalizability and high sensitivity to noise. The optimal value for K that maximizes out-of-sample forecast accuracy will vary depending on the data and the features chosen. In practice, the number of nearest neighbors $k$ […] [is] usually selected via cross-validation or more simply data splitting8 and the selected [value] minimizes an objective function which is often the Root Mean Square Error (RMSE) or sometimes the Mean Absolute Error (MAE)36. 
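Coming back to the Hassanat distance defined earlier in this sub-section, its definition translates almost line by line into Python (a toy transcription, not Portfolio Optimizer’s implementation):

```python
def hassanat_distance(x, y):
    """Hassanat distance HasD(x, y) = sum over i of D(x_i, y_i), where
    each coordinate contributes a bounded amount in [0, 1) regardless of
    its scale, which is why no prior features scaling is required."""
    d = 0.0
    for xi, yi in zip(x, y):
        lo, hi = min(xi, yi), max(xi, yi)
        if lo >= 0:
            d += 1.0 - (1.0 + lo) / (1.0 + hi)
        else:
            d += 1.0 - 1.0 / (1.0 + hi - lo)
    return d

# 1-dimensional illustration, matching Figure 5
print(hassanat_distance([0.0], [0.0]))    # identical points -> 0.0
print(hassanat_distance([0.0], [10.0]))   # 1 - 1/11, i.e. ~0.909
print(hassanat_distance([0.0], [-10.0]))  # also 1 - 1/11, by symmetry
```
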
That being said, Guegan and Huck36 cautions about that practice by highlighting that: The estimation of $k$ via in sample predictions leads to choose high values, near or on the border of one has tabulated because the RMSE is a decreasing function of the number of neighbors36 A high value for the number of nearest neighbors is an erroneous usage of the [method] because the neighbors are thus not near the pattern they should mimic36, leading to (useless) forecasts very close to the mean of the sample36. Another direction is to adaptively choose the number of nearest neighbors $k$ […] depending on the test feature vector8. For example, Anava and Levy37 proposes solving an optimization problem to adaptively choose what $k$ to use for [a given feature vector] in an approach called $k^*$-NN8. In the specific context of $k$-NN-based supervised portfolios, and again to avoid choosing an explicit number of nearest neighbors, Varadi and Teed2 suggests to select a range38 of $k$’s to make [the] selection base more robust to potential changes in an “optimal” $k$ selection2. Surprisingly, it turns out that this method is an ensemble method similar in spirit to the method described in Hassanat et al.39 for $k$-NN classification, which consists in using a base $k$-NN classifier with $k=1,2,…,\lfloor \sqrt{n} \rfloor$ and to combine the $\lfloor \sqrt{n} \rfloor$ classification results using inverse logarithmic weights. Misc. remarks Importance of training dataset diversity Asymptotic convergence results for the $k$-NN regression algorithm guarantee that by increasing the amount of [training] data, […] the error probability gets arbitrarily close to the optimum for every training sequence9. But the amount of data available for training $k$-NN-based supervised portfolios is not infinite and might even in some cases be extremely limited40. 
In that case, there is a high risk that the training data is “unevenly balanced” in the feature space, a situation illustrated in Figure 6 in the case of a univariate feature whose underlying distribution is Gaussian. Figure 6. Univariate $k$-NN regression with low accuracy, Gaussian feature distribution, far away training data. Source: Chen and Shah. From Figure 6, it is clear that such a lack of training data - or more precisely, such a lack of diversity in the training data - would force the $k$-NN regression algorithm to use far away nearest neighbors, which would severely degrade the quality of the forecasted portfolio weights. So, particular attention must be paid to the size and the diversity of the training dataset when using $k$-NN-based supervised portfolios, with for example ad-hoc procedures used whenever needed to simulate past asset returns for assets without return histories (here) or to extend return histories for assets with shorter return histories than others (here). Avoiding the curse of dimensionality The number of features selected by Varadi and Teed2 grows quadratically with the number of assets. At some point41, the underlying $k$-NN regression algorithm will then inevitably face issues due to: Distance concentration, which is the tendency of distances between all pairs of points in high-dimensional data to become almost equal42 Poor discrimination of the nearest and farthest points for a given test point, which is an issue on top of the distance concentration problem, c.f. 
Beyer et al.43 Hubness42, defined as the emergence of points called hubs which appear overly similar to many others … In addition, the higher the number of features selected, the more training data is required to learn enough combinations of these different features, which further compounds the problem mentioned in the previous sub-section… All in all, that approach is not scalable but fortunately, a solution is also proposed in Varadi and Teed2: […] to explore multi-asset portfolios [without introducing the problem of dimensionality with too-large a feature space], we took the average weight of each security from a single-pair run, and averaged them across all pair runs. While this proposal may look like an ad-hoc workaround, it actually corresponds to an ensemble method that has been empirically shown to be effective for $k$-NN classification in high dimension in both: Domeniconi and Yan44, with a deterministic selection of features as in Varadi and Teed2 Bay45, with a random27 selection of features The underlying idea of that ensemble method is to exploit [the] instability of $k$-NN classifiers with respect to different choices of features to generate an effective and diverse set of NN classifiers with possibly uncorrelated errors44. Implementations Implementation in Portfolio Optimizer Portfolio Optimizer supports $k$-NN-based supervised portfolios through the endpoint /portfolios/optimization/supervised/nearest-neighbors-based. 
This endpoint supports 2 different distance metrics: The Euclidean distance metric The Hassanat distance metric (default) As for the selection of the number of nearest neighbors, this endpoint supports: A manually-defined number of nearest neighbors A dynamically-determined number of nearest neighbors together with their individual weights through: The $k$-NN ensemble method of Hassanat et al.39 A proprietary variation of the $k^*$-NN method of Anava and Levy37 (default) Implementation elsewhere Chevalier et al.1 kindly provides Python code to experiment with “gradient boosting decision trees”-based supervised portfolios. Example of usage - Learning maximum Sharpe ratio portfolios Because most portfolio allocation decisions for active portfolio managers revolve around the optimal allocation between stocks and bonds2, I propose to reproduce the results of Varadi and Teed2 in the case of a 2-asset class portfolio made of: U.S. equities, represented by the SPY ETF U.S. long-term Treasury bonds, represented by the TLT ETF Methodology Varadi and Teed2 follows the general procedure of Chevalier et al.1 to train a $k$-NN-based supervised portfolio allocation algorithm for learning portfolio weights maximizing the Sharpe ratio. 
For this, and without entering into the details: The selected features are asset returns, standard deviations and correlations over different28 past lookback periods, scaled through a specific normal distribution standardization The selected distance metric is the standard Euclidean distance The selected number of nearest neighbors is not a single value but a range of values related to the size of the training dataset38 The relevant initial training dates are the 2000 daily dates present in Varadi and Teed2’s dataset from 4/13/1976 minus 2000 days to 4/12/1976 The relevant subsequent training dates and test dates are all the daily dates present in Varadi and Teed2’s dataset from 4/13/1976 to 12/31/13 To be noted that the training data is used in a rolling window manner over a 2000-day lookback. The future horizon over which maximum Sharpe ratio portfolio weights are learned during the training phase and evaluated during the test phase is a 20-day horizon On my side: The selected features will be: Past 12-month asset arithmetic returns, cross-sectionally normalized using the procedure described in Almgren and Neil46 Future aggregated asset covariances forecasted over the next month using an exponentially weighted moving average covariance matrix forecasting model with daily squared (close-to-close) returns The selected distance metric will be the Hassanat distance This avoids the need for further features scaling. 
The selected number of nearest neighbors will be: 1 10 Dynamically determined with their individual weights by: The $k$-NN ensemble method of Hassanat et al.39 The $k^*$-NN method of Anava and Levy37 The relevant initial training dates will be all month-end dates present in a SPY/TLT ETFs-like training dataset from 1st January 1979 to 30th November 2003 Due to the relatively recent inception dates of both the SPY ETF (22nd January 1993) and the TLT ETF (22nd July 2002), it is required to use proxies to extend the returns history of these assets: The daily U.S. market returns $Mkt$ provided in the Fama and French data library, as a proxy for the SPY ETF daily returns The simulated daily returns associated to the daily FRED 30-Year Treasury Constant Maturity Rates, as a proxy for the TLT ETF daily returns With these, the earliest date for which daily SPY/TLT ETFs-like returns are available is 16th February 1977; adding 1 year of data for computing the past 12-month returns gives 16th February 1978; rounded to 1st January 1979. The relevant subsequent training dates and test dates will be all month-end dates present in the SPY/TLT ETFs test dataset from 1st January 2004 to 28th February 202547 The earliest date for which daily SPY/TLT ETFs returns are available is 29th July 2002; adding 1 year of data for computing the past 12-month returns gives 29th July 2003; rounded to 1st January 2004. To be noted that the training data is used in an expanding window manner. As a consequence, the training dataset is made of 299 data points on 1st January 2004, expanding up to 552 data points on 28th February 2025 when the last forecast is made. This is in stark contrast with Varadi and Teed2’s training dataset which 1) contains 2000 data points and 2) is not expanding but is being rolled forward to keep the algorithm more robust to market changes in feature relevance2. 
As mentioned in a previous section, such a difference in quantity and in “local” diversity of the training dataset might impact my results v.s. those of Varadi and Teed2. The future horizon over which maximum Sharpe ratio portfolio weights are learned during the training phase and evaluated during the test phase is a 1-month horizon at daily level The risk free rate is set to 0% when computing maximum Sharpe ratio portfolio weights The cash portion of the different SPY/TLT portfolios - if any - is allocated to U.S. short-term Treasury bonds, represented by the SHY ETF Results Figure 7 compares the standard direct approach for maximizing the Sharpe ratio to the $k$-NN-based supervised portfolios approach, with the 4 choices of nearest neighbors proposed in the previous sub-section. In both cases, like in Varadi and Teed2, the same features are used in input of the two algorithms. Figure 7. MSR portfolio v.s. $k$-NN-based learned MSR portfolios, SPY/TLT ETFs, 1st January 2004 - 31st March 2025. Summary statistics: 
Portfolio | CAGR | Average Exposure | Annualized Sharpe Ratio | Maximum (Monthly) Drawdown
Maximum Sharpe ratio (MSR) | ~5.6% | ~61% | ~0.73 | ~14.4%
$k$-NN learned MSR, $k=1$ | ~6.4% | ~49% | ~0.86 | ~14.4%
$k$-NN learned MSR, $k=10$ | ~5.2% | ~51% | ~0.99 | ~15.8%
$k$-NN learned MSR, $k=kEnsemble$ | ~5.4% | ~51% | ~1.01 | ~15.1%
$k$-NN learned MSR, $k=k^*$ | ~6.8% | ~49% | ~1.00 | ~14.0%
Comments A couple of comments are in order: Consistent with Varadi and Teed2, the results demonstrate that the [$k$-NN-based supervised portfolio allocation] approach tends to outperform [the direct] MVO portfolio allocation [approach] on a risk-adjusted basis2, with a Sharpe ratio ~18%-38% higher. This is quite interesting to highlight since the objective of the direct approach is supposed to be the maximization of the portfolio Sharpe ratio! The average exposure of the MSR portfolio is ~61% v.s. 
a relatively much lower exposure of ~50% for all the $k$-NN learned MSR portfolios. The Sharpe ratio of all the $k$-NN learned MSR portfolios being higher than that of the MSR portfolio, it implies that the changes in exposure are pretty well “timed”.
- The $k$-NN learned MSR portfolios with $k=10$ and $k=kEnsemble$ are nearly identical. This is confirmed by examining the underlying asset weights (not shown here). The $k$-NN ensemble portfolio has the advantage of not requiring a specific value of $k$ to be chosen, though, and should definitely be preferred.
- The $k$-NN learned MSR portfolios with $k=1$ and $k=k^*$ are close in terms of raw performances, but not in terms of Sharpe ratio. A closer look (not detailed here) reveals that this is because the $k$-NN learned MSR portfolio with $k=k^*$ regularly selects only 1 nearest neighbor when the other neighbors are too “far away”, but also regularly selects many more neighbors when the other neighbors are “close enough”. I interpret this as an empirical demonstration of the ability of the $k^*$-NN method of Anava and Levy[37] to adaptively choose the number of nearest neighbors $k$ […] depending on the test feature vector[8].
- The maximum drawdowns are comparable across all portfolios. This shows that the $k$-NN learned MSR portfolios, despite their attractive risk-adjusted performances, are not able to magically avoid “dramatic” events. Another layer of risk management, better return predictors, or both, is probably needed for that.
- The winner of this horse race is the $k$-NN learned MSR portfolio with $k=k^*$, but this comes at a price in terms of turnover vs. the $k$-NN learned MSR portfolio with $k=kEnsemble$.

Also consistent with Varadi and Teed[2], the asset weights of the $k$-NN learned MSR portfolio with $k=kEnsemble$ (and with $k=10$) are relatively stable and on average similar to an equal weight portfolio, while those of the MSR portfolio show considerable noise and turnover[2].
This is visible on the portfolio transition maps displayed in Figures 8 and 9.

Figure 8. $k$-NN-based learned MSR portfolio, $k=kEnsemble$, SPY/TLT/SHY ETFs allocations through time, 1st January 2004 - 31st March 2025.

Figure 9. MSR portfolio, SPY/TLT/SHY allocations through time, 1st January 2004 - 31st March 2025.

For Varadi and Teed[2], this demonstrates the general uncertainty of the portfolio indicator inputs in aggregate[2] and that the $k$-NN learned MSR portfolio with $k=kEnsemble$ manages to dynamically balance this uncertainty over time and shift more towards a probabilistic allocation that did not overweight or over-react to poor information[2]. This statement is slightly less applicable to the $k$-NN learned MSR portfolio with $k=k^*$, because its better raw performances are explained by a more aggressive allocation, resulting in a much higher turnover, as can be seen by comparing Figure 8 to Figure 10.

Figure 10. $k$-NN-based learned MSR portfolio, $k=k^*$, SPY/TLT/SHY ETFs allocations through time, 1st January 2004 - 31st March 2025.

Conclusion

Exactly like in Varadi and Teed[2], and despite the differences in implementation and in the size of the training dataset[48]:

- The results of this section show that a traditional mean-variance/Markowitz/MPT framework under-performs [a $k$-NN-based supervised portfolio allocation] framework in terms of maximizing the Sharpe ratio[2]
- The data further implies that traditional MPT makes far too many trades and takes on too many extreme positions as a function of how it is supposed to generate portfolio weights[2]

Varadi and Teed[2] provide the following explanation: This occurs because the inputs - especially the returns - are very noisy and may also demonstrate non-linear or counter-intuitive relationships. In contrast, by learning how the inputs map historically to optimal portfolios at the asset level, the resulting [$k$-NN-based supervised portfolios] allocations drift in a more stable manner over time.
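The core forecasting step of such a $k$-NN-based supervised portfolio can be sketched as follows (a minimal sketch, assuming Euclidean distance, uniform neighbor weights and pre-scaled features; function and variable names are illustrative, and the in-sample optimal weights are assumed to have been computed beforehand):

```python
import numpy as np

def knn_learned_weights(train_features, train_weights, test_feature, k=10):
    """Forecast portfolio weights at a test date as the average of the
    in-sample optimal (e.g. maximum Sharpe ratio) weights associated with
    the k historical dates whose feature vectors are nearest."""
    # Euclidean distances from the test feature vector to all training ones
    d = np.linalg.norm(train_features - test_feature, axis=1)
    nearest = np.argsort(d)[:k]
    # Average the optimal weights of the k nearest dates, renormalized
    w = train_weights[nearest].mean(axis=0)
    return w / w.sum()
```

With $k=1$ this reduces to replaying the optimal weights of the single most similar historical date, while larger $k$'s smooth the forecast over several similar market configurations.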
Final conclusion

Supervised portfolios as introduced in Chevalier et al.[1] are able to learn from past time series of in-sample optimal weights[1] and to infer the best weights from variables such as past performance, risk, and proxies of the macro-economic outlook[1]. In this blog post, I empirically demonstrated that this capability allows one of their simplest embodiments - $k$-NN-based supervised portfolios - to outperform a traditional mean-variance framework that seeks to maximize the Sharpe ratio of a portfolio, which independently confirms the prior results of Varadi and Teed[2].

To keep discovering non-standard portfolio allocation frameworks, feel free to connect with me on LinkedIn or to follow me on Twitter.

–

- See Chevalier, G., Coqueret, G., &amp;amp; Raffinot, T. (2022). Supervised portfolios. Quantitative Finance, 22(12), 2275–2295.
- See David Varadi, Jason Teed, Adaptive Portfolio Allocations, NAAIM paper.
- Varadi and Teed[2] has been submitted to the 2014 NAAIM annual white paper competition known as the NAAIM Founders Award.
- To be noted that the data points could belong to a more generic space than $\mathbb{R}^m \times \mathbb{R}$.
- $\mathbb{R}^m$ is usually called the feature space.
- In practice, $d$ might not necessarily be a proper metric; for example, it might not satisfy the triangle inequality property, like the cosine “distance”.
- For a historical perspective on the $k$-NN algorithm going beyond the usual technical report from Fix and Hodges[49] and the seminal paper from Cover and Hart[10], the interested reader is referred to Chen and Shah[12], which mentions that the $k$-NN classification algorithm was already described in a text from the early 11th century.
- See George H. Chen, Devavrat Shah, Explaining the Success of Nearest Neighbor Methods in Prediction, now, 2018.
- See Gerard Biau, Luc Devroye, Lectures on the Nearest Neighbor Method, Springer Series in the Data Sciences.
- See Cover, T. M. and P. E. Hart (1967). “Nearest neighbor pattern classification”. IEEE Transactions on Information Theory.
- Although $n$ must be sufficiently large in order for there to exist a $k$ that satisfies the conditions[12] required by the main theorem of Jiang[12].
- See Jiang, H. (2019). Non-Asymptotic Uniform Rates of Consistency for $k$-NN Regression. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 3999-4006.
- See S. Sun and R. Huang, “An adaptive k-nearest neighbor algorithm,” 2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery, Yantai, China, 2010, pp. 91-94.
- See for example P.Y. Simard, Y. LeCun and J. Denker, “Efficient pattern recognition using a new transformation distance,” in Advances in Neural Information Processing Systems, vol. 6, 1993, pp. 50-58, in which the Euclidean distance between images of handwritten digits is replaced by an ad hoc distance invariant with respect to geometric transformations of such images (rotation, translation, scaling, etc.).
- See Har-Peled, S., P. Indyk, and R. Motwani (2012). “Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality.” Theory of Computing.
- See Kleinberg, J. M. (1997). “Two algorithms for nearest-neighbor search in high dimensions”. In: Symposium on Theory of Computing.
- For example, the end of each month for learning a monthly asset allocation strategy.
- A day, a week, a month, etc.
- Also called inference data, that is, data not “seen” during the training phase.
- Like budget constraints, asset weights constraints, asset group constraints, portfolio exposure constraints, etc.
- Chevalier et al.[1] notes that these empirical results still hold when replacing boosted trees by simple regressions[1].
- Developed equities, emerging equities, global corporate bonds, global government bonds.
- See Chevalier, Guillaume, Coqueret, Guillaume, Raffinot, Thomas, Interpretable Supervised Portfolios, The Journal of Financial Data Science, Spring 2024, 6 (2), 10-34.
- See L. Gyorfi, F. Udina, and H. Walk. Nonparametric nearest neighbor based empirical portfolio selection strategies. Statistics &amp;amp; Decisions, International Mathematical Journal for Stochastic Methods and Models, 26(2):145–157, 2008.
- Unless a very specific variation of $k$-NN regression is used.
- At least from an algorithm aversion perspective. Nevertheless, there can be other benefits, c.f. Chevalier et al.[23].
- Similar to random subspace optimization.
- 1 month, 2 months, 3 months, 6 months and 12 months.
- See Avivit Levy, B. Riva Shalom, Michal Chalamish, A Guide to Similarity Measures, arXiv, for a very long list of distance metrics.
- Or more generally, to most supervised machine learning algorithms.
- See Ishan Arora, Namit Khanduja and Mayank Bansal, Effect of Distance Metric and Feature Scaling on KNN Algorithm while Classifying X-rays, RIF, 2022.
- See Hassanat, A.B., 2014. Dimensionality Invariant Similarity Measure. Journal of American Science, 10(8), pp.221-26.
- To be noted that feature scaling might still be performed to try to improve the performances of the $k$-NN regression algorithm.
- Other properties of the Hassanat distance are for example robustness to noise and linear growth with the dimension of the feature space, c.f. Abu Alfeilat et al.[35].
- See Abu Alfeilat HA, Hassanat ABA, Lasassmeh O, Tarawneh AS, Alhasanat MB, Eyal Salman HS, Prasath VBS. Effects of Distance Measure Choice on K-Nearest Neighbor Classifier Performance: A Review. Big Data. 2019 Dec;7(4):221-248.
- See Guegan, D. and Huck, N. (2005).
On the Use of Nearest Neighbors in Finance. Finance, 26(2), 67-86.
- See Oren Anava, Kfir Levy, k*-Nearest Neighbors: From Global to Local, Advances in Neural Information Processing Systems 29 (NIPS 2016).
- In more detail, Varadi and Teed[2] chooses the $k$’s in percentages of the size of the training space - 5%, 10%, 15% and 20% - resulting essentially in a weighted average of the top instances[2].
- See Hassanat, A.B., Mohammad Ali Abbadi, Ghada Awad Altarawneh, Ahmad Ali Alhasanat, 2014. Solving the Problem of the K Parameter in the KNN Classifier Using an Ensemble Learning Approach. International Journal of Computer Science and Information Security, 12(8), pp.33-39.
- For example, due to the limited price history of some assets or due to the length of the desired horizon over which optimal portfolio weights need to be computed.
- Possibly for as few as 10-15 dimensions[43]!
- See Radovanovic, Milos, Nanopoulos, Alexandros and Ivanovic, Mirjana, Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data, Journal of Machine Learning Research 11 (2010) 2487-2531.
- See Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U. (1999). When Is “Nearest Neighbor” Meaningful?. In: Beeri, C., Buneman, P. (eds) Database Theory — ICDT’99. ICDT 1999. Lecture Notes in Computer Science, vol 1540. Springer, Berlin, Heidelberg.
- See Domeniconi, C., &amp;amp; Yan, B. (2004). Nearest neighbor ensemble. In Pattern recognition, international conference on, Vol. 1 (pp. 228–231). Los Alamitos, CA, USA: IEEE Computer Society.
- See Bay S.D. Nearest neighbor classification from multiple feature subsets. Intelligent Data Analysis, 3 (1999), pp. 191-209.
- See Almgren, Robert and Chriss, Neil A., Optimal Portfolios from Ordering Information (December 2004).
- (Adjusted) daily prices have been retrieved using Tiingo.
- This empirically confirms that, despite their dependency on the size of the training dataset, nearest neighbor methods can learn from a small set of examples[45].
- See Fix, E. and Hodges, J.L. “Discriminatory analysis, nonparametric discrimination: Consistency properties”. Technical report, USAF School of Aviation Medicine.</summary></entry><entry><title type="html">Correlation-Based Clustering: Spectral Clustering Methods</title><link href="https://portfoliooptimizer.io/blog/correlation-based-clustering-spectral-clustering-methods/" rel="alternate" type="text/html" title="Correlation-Based Clustering: Spectral Clustering Methods" /><published>2025-05-01T00:00:00-05:00</published><updated>2025-05-01T00:00:00-05:00</updated><id>https://portfoliooptimizer.io/blog/correlation-based-clustering-spectral-clustering-methods</id><content type="html" xml:base="https://portfoliooptimizer.io/blog/correlation-based-clustering-spectral-clustering-methods/">&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Cluster_analysis&quot;&gt;Clustering&lt;/a&gt; consists in &lt;em&gt;trying to identify groups of “similar behavior”&lt;/em&gt;&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; - called clusters - from a dataset, according to some chosen characteristics.&lt;/p&gt;

&lt;p&gt;An example of such a characteristic in finance is the correlation coefficient between two time series of asset returns, whose usage to partition a universe of assets into groups of “close” and “distant” assets 
thanks to a &lt;a href=&quot;https://en.wikipedia.org/wiki/Hierarchical_clustering&quot;&gt;hierarchical clustering method&lt;/a&gt; was originally&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; proposed in Mantegna&lt;sup id=&quot;fnref:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;In this blog post, I will describe two correlation-based clustering methods belonging to the family of &lt;em&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Spectral_clustering&quot;&gt;spectral clustering methods&lt;/a&gt;&lt;/em&gt;: the Blockbuster method introduced in Brownlees et al.&lt;sup id=&quot;fnref:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; 
and the SPONGE method introduced in Cucuringu et al.&lt;sup id=&quot;fnref:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;As examples of usage, I will discuss 1) how to automatically group U.S. stocks together without relying on external information like industry classification and 2) how to identify risk-on and risk-off assets within a U.S.-centric universe of assets.&lt;/p&gt;

&lt;h2 id=&quot;mathematical-preliminaries&quot;&gt;Mathematical preliminaries&lt;/h2&gt;

&lt;p&gt;Let $x_1$, …, $x_n$ be $n$ points&lt;sup id=&quot;fnref:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:10&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt; in $\mathbb{R}^m, m \geq 1$ to be partitioned into $k \geq 2$ subsets.&lt;/p&gt;

&lt;h3 id=&quot;spectral-clustering&quot;&gt;Spectral clustering&lt;/h3&gt;

&lt;p&gt;Spectral clustering, like other approaches to clustering - geometric approaches such as &lt;a href=&quot;https://en.wikipedia.org/wiki/K-means_clustering&quot;&gt;$k$-means &lt;/a&gt;, density-based approaches such as &lt;a href=&quot;https://fr.wikipedia.org/wiki/DBSCAN&quot;&gt;DBSCAN&lt;/a&gt;… - initially relies on pairwise similarities 
$s(x_i, x_j)$ between points $i,j=1..n$ with $s$ &lt;em&gt;some similarity function which is symmetric and non-negative&lt;/em&gt;&lt;sup id=&quot;fnref:1:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Once the corresponding similarity matrix $S_{ij} = s(x_i, x_j)$, $i,j=1…n$, has been computed, a spectral clustering method then usually follows a three-step process:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Compute an affinity matrix $A \in \mathbb{R}^{n \times n}$ from the similarity matrix $S$&lt;/p&gt;

    &lt;p&gt;The affinity matrix $A$ corresponds to the &lt;a href=&quot;https://en.wikipedia.org/wiki/Adjacency_matrix&quot;&gt;adjacency matrix&lt;/a&gt; of an underlying &lt;a href=&quot;https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)&quot;&gt;graph&lt;/a&gt; whose vertices represent the points $x_1$, …, $x_n$ and whose edges &lt;em&gt;model the local&lt;sup id=&quot;fnref:54&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:54&quot; class=&quot;footnote&quot;&gt;7&lt;/a&gt;&lt;/sup&gt; neighborhood relationships between the data points&lt;/em&gt;&lt;sup id=&quot;fnref:1:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Compute a matrix $L \in \mathbb{R}^{n \times n}$ derived from the affinity matrix $A$&lt;/p&gt;

    &lt;p&gt;Due to the relationship between spectral clustering and graph theory&lt;sup id=&quot;fnref:1:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, the matrix $L$ is typically called &lt;a href=&quot;https://en.wikipedia.org/wiki/Laplacian_matrix&quot;&gt;a Laplacian matrix&lt;/a&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Compute a matrix $Y \in \mathbb{R}^{n \times k}$ derived from the eigenvectors corresponding to the $k$ smallest (or sometimes largest) eigenvalues of the matrix $L$ and cluster its $n$ rows $y_1$, …, $y_n$ using the $k$-means algorithm&lt;/p&gt;

    &lt;p&gt;The result of that clustering represents the desired clustering of the original $n$ points $x_1$, …, $x_n$.&lt;/p&gt;

    &lt;p&gt;It should be emphasized that &lt;em&gt;there is nothing principled about using the $k$-means algorithm in this step&lt;/em&gt;&lt;sup id=&quot;fnref:1:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, c.f. for example Huang et al.&lt;sup id=&quot;fnref:33&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:33&quot; class=&quot;footnote&quot;&gt;8&lt;/a&gt;&lt;/sup&gt; in which spectral rotations are used instead of $k$-means, but &lt;em&gt;one can argue that at least the Euclidean distance between the [rows of the matrix $Y$] is a meaningful quantity to look at&lt;/em&gt;&lt;sup id=&quot;fnref:1:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
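&lt;p&gt;As an illustration, the three-step process above can be sketched in Python (a minimal sketch only, assuming one possible set of choices: a Gaussian similarity, the symmetric normalized Laplacian, and a plain farthest-point-initialized $k$-means; function and parameter names are illustrative):&lt;/p&gt;

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0, n_iter=50):
    # Step 0: pairwise Gaussian similarities s(x_i, x_j)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # Step 1: affinity matrix A, with a zeroed diagonal
    A = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Step 2: symmetric normalized Laplacian L = I - D^(-1/2) A D^(-1/2)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    L = np.eye(len(X)) - d_inv_sqrt @ A @ d_inv_sqrt
    # Step 3: embed the points with the eigenvectors associated with the k
    # smallest eigenvalues of L, normalize the rows, and run k-means on them
    _, vecs = np.linalg.eigh(L)          # eigh returns ascending eigenvalues
    Y = vecs[:, :k]
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    centers = [Y[0]]                     # farthest-point initialization
    for _ in range(k - 1):
        dist = ((Y[:, None] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):              # standard k-means iterations
        labels = np.argmin(((Y[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels
```

&lt;p&gt;With a suitable $\sigma$, well-separated groups in the affinity graph become nearly orthogonal directions in the embedding, which is why a simple $k$-means suffices in the last step.&lt;/p&gt;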

&lt;h3 id=&quot;the-ng-jordan-weiss-spectral-clustering-method&quot;&gt;The Ng-Jordan-Weiss spectral clustering method&lt;/h3&gt;

&lt;p&gt;Different ways of computing the affinity matrix $A$, the Laplacian matrix $L$ or the matrix $Y$ lead to different spectral clustering methods&lt;sup id=&quot;fnref:11&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:11&quot; class=&quot;footnote&quot;&gt;9&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;One popular spectral clustering method is the Ng-Jordan-Weiss (NJW) method&lt;sup id=&quot;fnref:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:9&quot; class=&quot;footnote&quot;&gt;10&lt;/a&gt;&lt;/sup&gt;, detailed in Figure 1 taken from Ng et al.&lt;sup id=&quot;fnref:9:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:9&quot; class=&quot;footnote&quot;&gt;10&lt;/a&gt;&lt;/sup&gt; where $\sigma^2$ is a scaling parameter that &lt;em&gt;controls how rapidly the affinity $A_{ij}$ falls off with the distance between $s_i$ and $s_j$&lt;/em&gt;&lt;sup id=&quot;fnref:9:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:9&quot; class=&quot;footnote&quot;&gt;10&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;figure&gt;
	&lt;a href=&quot;/assets/images/blog/correlation-spectral-clustering-njw-method.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/correlation-spectral-clustering-njw-method-small.png&quot; alt=&quot;Ng-Jordan-Weiss spectral clustering method. Source: Ng et al.&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Figure 1. Ng-Jordan-Weiss spectral clustering method. Source: Ng et al.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h3 id=&quot;rationale-behind-spectral-clustering&quot;&gt;Rationale behind spectral clustering&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;At first sight, [spectral clustering] seem to make little sense. Since we run $k$-means [on points $y_1$, …, $y_n$], why not just apply $k$-means directly to the [original points $x_1$, …, $x_n$]&lt;/em&gt;&lt;sup id=&quot;fnref:9:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:9&quot; class=&quot;footnote&quot;&gt;10&lt;/a&gt;&lt;/sup&gt;?&lt;/p&gt;

&lt;p&gt;A visual justification of spectral clustering is provided in Figure 2 adapted from Ng et al.&lt;sup id=&quot;fnref:9:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:9&quot; class=&quot;footnote&quot;&gt;10&lt;/a&gt;&lt;/sup&gt;, which displays data points in $\mathbb{R}^2$ forming two circles (e) and their associated spectral representation in $\mathbb{R}^2$ using the NJW spectral clustering method (h).&lt;/p&gt;

&lt;figure&gt;
	&lt;a href=&quot;/assets/images/blog/correlation-spectral-clustering-njw-circles.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/correlation-spectral-clustering-njw-circles-small.png&quot; alt=&quot;Data points forming two circles, 2D plane and spectral plane. Source: Ng et al.&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Figure 2. Data points forming two circles, 2D plane and spectral plane. Source: Ng et al.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;In Figure 2, it is clearly visible that the transformed data points (h) form two well-separated convex clusters in the spectral plane, which is the ideal situation for the $k$-means algorithm.&lt;/p&gt;

&lt;p&gt;More generally&lt;sup id=&quot;fnref:55&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:55&quot; class=&quot;footnote&quot;&gt;11&lt;/a&gt;&lt;/sup&gt;, it can be demonstrated that the change of representation from the original data points $x_1$, …, $x_n$ to the embedded data points $y_1$, …, $y_n$ &lt;em&gt;enhances the cluster-properties in the data, 
so that clusters can be trivially detected [by the $k$-means clustering algorithm] in the new representation&lt;/em&gt;&lt;sup id=&quot;fnref:1:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h2 id=&quot;correlation-based-spectral-clustering&quot;&gt;Correlation-based spectral clustering&lt;/h2&gt;

&lt;p&gt;Let $x_1$, …, $x_n$ be $n$ variables whose pairwise correlations&lt;sup id=&quot;fnref:13&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:13&quot; class=&quot;footnote&quot;&gt;12&lt;/a&gt;&lt;/sup&gt; $\rho_{ij}$, $i,j=1…n$, have been assembled in a correlation matrix $C \in \mathbb{R}^{n \times n}$.&lt;/p&gt;

&lt;p&gt;Because correlation is a measure of dependency, a legitimate question is whether it is possible to use spectral clustering to partition these variables w.r.t. their correlations.
Ideally, we would like to have highly correlated variables grouped together in the same clusters, with low or even negative correlations between the clusters themselves.&lt;/p&gt;

&lt;p&gt;Problem is, as noted in Mantegna&lt;sup id=&quot;fnref:3:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;, &lt;em&gt;the correlation coefficient of a pair of [variables] cannot be used as a distance [or as a similarity] between the two [variables] because it does not fulfill the three axioms that define a metric&lt;/em&gt;&lt;sup id=&quot;fnref:3:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;, which a priori excludes 
its direct usage in a similarity matrix or in an affinity matrix.&lt;/p&gt;

&lt;p&gt;That being said, &lt;em&gt;a&lt;sup id=&quot;fnref:14&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:14&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt; metric&lt;sup id=&quot;fnref:56&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:56&quot; class=&quot;footnote&quot;&gt;14&lt;/a&gt;&lt;/sup&gt; can be defined using as distance a function of the correlation coefficient&lt;/em&gt;&lt;sup id=&quot;fnref:3:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; - like the distance $ d_{ij} = \sqrt{ 2 \left( 1 - \rho_{ij} \right) } $ -, which makes it possible to indirectly use any spectral clustering method&lt;sup id=&quot;fnref:16&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:16&quot; class=&quot;footnote&quot;&gt;15&lt;/a&gt;&lt;/sup&gt; with correlations.&lt;/p&gt;
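&lt;p&gt;As a short illustration, converting a correlation matrix into such a distance matrix is a one-liner (a minimal sketch; the clipping only guards against tiny negative values due to floating-point noise):&lt;/p&gt;

```python
import numpy as np

def correlation_distance(C):
    # d_ij = sqrt(2 * (1 - rho_ij)): 0 for perfectly correlated variables,
    # sqrt(2) for uncorrelated ones, 2 for perfectly anti-correlated ones
    return np.sqrt(np.clip(2.0 * (1.0 - np.asarray(C)), 0.0, None))
```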

&lt;p&gt;But what if we insist on working directly with correlations? In this case, a couple of specific spectral clustering methods have been developed, among which:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The &lt;em&gt;Blockbuster&lt;/em&gt; spectral clustering method of Brownlees et al.&lt;sup id=&quot;fnref:4:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
  &lt;li&gt;The &lt;em&gt;SPONGE&lt;/em&gt; spectral clustering method of Cucuringu et al.&lt;sup id=&quot;fnref:5:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;the-blockbuster-spectral-clustering-method&quot;&gt;The Blockbuster spectral clustering method&lt;/h3&gt;

&lt;p&gt;Brownlees et al.&lt;sup id=&quot;fnref:4:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; introduces a network model&lt;sup id=&quot;fnref:22&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:22&quot; class=&quot;footnote&quot;&gt;16&lt;/a&gt;&lt;/sup&gt; under which &lt;em&gt;large panels of time series 
 are partitioned into latent groups such that correlation is higher within groups than between them&lt;/em&gt;&lt;sup id=&quot;fnref:4:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; and proposes an algorithm relying on the eigenvectors of the sample covariance matrix&lt;sup id=&quot;fnref:21&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:21&quot; class=&quot;footnote&quot;&gt;17&lt;/a&gt;&lt;/sup&gt; $\Sigma \in \mathbb{R}^{n \times n}$ to detect these groups.&lt;/p&gt;

&lt;p&gt;As can be seen in Figure 3 taken from Brownlees et al.&lt;sup id=&quot;fnref:4:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;, this algorithm - called &lt;em&gt;Blockbuster&lt;/em&gt; - is surprisingly close to the NJW spectral clustering method.&lt;/p&gt;

&lt;figure&gt;
	&lt;a href=&quot;/assets/images/blog/correlation-spectral-clustering-blockbuster-method.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/correlation-spectral-clustering-blockbuster-method-small.png&quot; alt=&quot;Blockbuster algorithm. Source:  Brownlees et al.&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Figure 3. Blockbuster algorithm. Source:  Brownlees et al.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Brownlees et al.&lt;sup id=&quot;fnref:4:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; establishes that the Blockbuster algorithm &lt;em&gt;consistently detects the [groups] when the number of observations $T$ and the dimension of the panel $n$ are sufficiently&lt;sup id=&quot;fnref:20&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:20&quot; class=&quot;footnote&quot;&gt;18&lt;/a&gt;&lt;/sup&gt; large&lt;/em&gt;&lt;sup id=&quot;fnref:4:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;, as long as 
a couple of assumptions are satisfied, like:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;$T \geq n$, with &lt;em&gt;the more fat-tailed and dependent the data are, the larger $T$ has to be&lt;/em&gt;&lt;sup id=&quot;fnref:4:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Precision_(statistics)&quot;&gt;The precision matrix&lt;/a&gt; $\Sigma^{-1}$ exists and contains only non-positive entries (or its fraction of positive entries is appropriately controlled&lt;sup id=&quot;fnref:18&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:18&quot; class=&quot;footnote&quot;&gt;19&lt;/a&gt;&lt;/sup&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An important feature of the Blockbuster algorithm is that &lt;em&gt;it allows one to detect the [groups] without estimating the network structure of the data&lt;/em&gt;&lt;sup id=&quot;fnref:4:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;. In other words, while the Blockbuster algorithm is a spectral clustering method, nowhere is an affinity matrix computed!&lt;/p&gt;

&lt;p&gt;The black magic at work here is that the underlying network model of Brownlees et al.&lt;sup id=&quot;fnref:4:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; additionally assumes that the (symmetric normalized) Laplacian matrix $L$ is a function of the precision matrix $\Sigma^{-1}$ through&lt;/p&gt;

\[L = \frac{\sigma^2}{\phi} \Sigma^{-1} - \frac{1}{\phi} I_n\]

&lt;p&gt;where:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;$\sigma^2$ is a network variance parameter that does not influence the detection of the groups&lt;/li&gt;
  &lt;li&gt;$\phi$ is a network dependence parameter that does not influence the detection of the groups&lt;/li&gt;
  &lt;li&gt;$I_n \in \mathbb{R}^{n \times n}$ is the identity matrix of order $n$&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This formula clarifies the otherwise mysterious connection&lt;sup id=&quot;fnref:19&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:19&quot; class=&quot;footnote&quot;&gt;20&lt;/a&gt;&lt;/sup&gt; between the Blockbuster algorithm and the NJW method.&lt;/p&gt;
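&lt;p&gt;To give a feel for the procedure, here is a Blockbuster-style sketch (a minimal sketch only: it clusters the row-normalized eigenvectors associated with the largest eigenvalues of the sample correlation matrix using a plain farthest-point-initialized $k$-means, and omits the refinements of the actual algorithm of Brownlees et al.; function and variable names are illustrative):&lt;/p&gt;

```python
import numpy as np

def blockbuster_sketch(R, k):
    """Cluster the n columns of a T x n panel of returns R into k groups
    from the spectrum of their sample correlation matrix."""
    C = np.corrcoef(R, rowvar=False)
    _, vecs = np.linalg.eigh(C)              # ascending eigenvalues
    Y = vecs[:, -k:]                         # eigenvectors of the k largest
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    centers = [Y[0]]                         # farthest-point initialization
    for _ in range(k - 1):
        dist = ((Y[:, None] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(50):                      # standard k-means iterations
        labels = np.argmin(((Y[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels
```

&lt;p&gt;On simulated panels with a block-correlation structure, such a procedure recovers the latent groups without any affinity matrix ever being formed, in line with the discussion above.&lt;/p&gt;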

&lt;h3 id=&quot;the-sponge-spectral-clustering-method&quot;&gt;The SPONGE spectral clustering method&lt;/h3&gt;

&lt;p&gt;Cucuringu et al.&lt;sup id=&quot;fnref:5:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt; extends the spectral clustering framework described in the previous section to the case of a signed affinity matrix $A$ whose underlying 
&lt;a href=&quot;https://en.wikipedia.org/wiki/Complete_graph&quot;&gt;complete weighted graph&lt;/a&gt; represents the variables $x_1$, …, $x_n$ and their pairwise correlations.&lt;/p&gt;

&lt;p&gt;Figure 4, taken from Jin et al.&lt;sup id=&quot;fnref:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;21&lt;/a&gt;&lt;/sup&gt;, illustrates the idea of Cucuringu et al.&lt;sup id=&quot;fnref:5:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;, which is &lt;em&gt;to minimize the number of violations in the constructed partition, with a violation being, as in this figure, when there are negative edges in a cluster and positive edges across clusters&lt;/em&gt;&lt;sup id=&quot;fnref:8:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;21&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;figure&gt;
	&lt;a href=&quot;/assets/images/blog/correlation-spectral-clustering-sponge-violations.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/correlation-spectral-clustering-sponge-violations-small.png&quot; alt=&quot;Illustration of the idea behind the SPONGE clustering algorithm - minimizing the number of violations in the constructed partition. Source: Jin et al.&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Figure 4. Illustration of the idea behind the SPONGE clustering algorithm - minimizing the number of violations in the constructed partition. Source: Jin et al.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;More formally, Cucuringu et al.&lt;sup id=&quot;fnref:5:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt; proposes a three-step algorithm&lt;sup id=&quot;fnref:27&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:27&quot; class=&quot;footnote&quot;&gt;22&lt;/a&gt;&lt;/sup&gt; - called &lt;em&gt;SPONGE (Signed Positive Over Negative Generalized Eigenproblem)&lt;/em&gt; - that aims to &lt;em&gt;find a partition [of the underlying graph] into k clusters such that most edges within clusters are positive, 
and most edges across clusters are negative&lt;/em&gt;&lt;sup id=&quot;fnref:5:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Decompose the adjacency matrix as $A = A^+ - A^-$, with
    &lt;ul&gt;
      &lt;li&gt;$A^+ \in \mathbb{R}^{n \times n}$, with $A^+_{ij} = A_{ij}$ if $A_{ij} \geq 0$ and $A^+_{ij} = 0$ if $A_{ij} &amp;lt; 0$&lt;/li&gt;
      &lt;li&gt;$A^- \in \mathbb{R}^{n \times n}$, with $A^-_{ij} = -A_{ij}$ if $A_{ij} \leq 0$ and $A^-_{ij} = 0$ if $A_{ij} &amp;gt; 0$&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Compute the (unnormalized) positive and negative Laplacian matrices $L^+$ and $L^-$, with
    &lt;ul&gt;
      &lt;li&gt;$L^+ \in \mathbb{R}^{n \times n}$, with $L^+ = D^+ - A^+$ and $D^+ \in \mathbb{R}^{n \times n}$ a diagonal matrix satisfying $D^+_{ii} = \sum_{j=1}^n A^+_{ij}$&lt;/li&gt;
      &lt;li&gt;$L^- \in \mathbb{R}^{n \times n}$, with $L^- = D^- - A^-$ and $D^- \in \mathbb{R}^{n \times n}$ a diagonal matrix satisfying $D^-_{ii} = \sum_{j=1}^n A^-_{ij}$&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Compute the matrix $Y \in \mathbb{R}^{n \times k}$ made of the eigenvectors corresponding to the $k$ smallest eigenvalues of the matrix $\left( L^- + \tau^+ D^+ \right)^{-1/2} \left( L^+ + \tau^- D^- \right) \left( L^- + \tau^+ D^+ \right)^{-1/2}$, where $\tau^+ &amp;gt; 0$ and $\tau^- &amp;gt; 0$ are regularization parameters&lt;sup id=&quot;fnref:26&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:26&quot; class=&quot;footnote&quot;&gt;23&lt;/a&gt;&lt;/sup&gt;, and cluster its $n$ rows using the $k$-means algorithm&lt;/li&gt;
&lt;/ul&gt;
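&lt;p&gt;The three steps above can be sketched as follows (a minimal illustration, with arbitrary default values for the regularization parameters $\tau^+$ and $\tau^-$):&lt;/p&gt;

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def sponge(A, k, tau_plus=1.0, tau_minus=1.0):
    """Sketch of the SPONGE algorithm on a signed affinity matrix A."""
    # Step 1 - decompose A = A_plus - A_minus into its positive and negative parts
    A_plus = np.where(A >= 0, A, 0.0)
    A_minus = np.where(A >= 0, 0.0, -A)

    # Step 2 - unnormalized positive and negative Laplacian matrices
    D_plus = np.diag(A_plus.sum(axis=1))
    D_minus = np.diag(A_minus.sum(axis=1))
    L_plus = D_plus - A_plus
    L_minus = D_minus - A_minus

    # Step 3 - eigenvectors of the k smallest eigenvalues of the regularized matrix
    M = L_minus + tau_plus * D_plus
    w, V = np.linalg.eigh(M)
    M_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    B = M_inv_sqrt @ (L_plus + tau_minus * D_minus) @ M_inv_sqrt
    _, V2 = np.linalg.eigh(B)  # eigenvalues in ascending order
    Y = V2[:, :k]

    # k-means on the n rows of Y
    _, labels = kmeans2(Y, k, seed=0, minit="++")
    return labels
```

On a toy signed affinity matrix with positive entries within two blocks and negative entries across them, this sketch should recover the two blocks.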

&lt;p&gt;Cucuringu et al.&lt;sup id=&quot;fnref:5:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt; establishes the consistency&lt;sup id=&quot;fnref:23&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:23&quot; class=&quot;footnote&quot;&gt;24&lt;/a&gt;&lt;/sup&gt; of the SPONGE algorithm in the case of $k = 2$ equally-sized clusters, provided &lt;em&gt;$\tau^-$ is sufficiently small&lt;sup id=&quot;fnref:24&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:24&quot; class=&quot;footnote&quot;&gt;25&lt;/a&gt;&lt;/sup&gt; compared to $\tau^+$&lt;/em&gt;&lt;sup id=&quot;fnref:5:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt; and the number of variables $n$ is &lt;em&gt;sufficiently large&lt;sup id=&quot;fnref:57&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:57&quot; class=&quot;footnote&quot;&gt;26&lt;/a&gt;&lt;/sup&gt; for a clustering to be recoverable&lt;/em&gt;&lt;sup id=&quot;fnref:5:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;In subsequent work, Cucuringu et al.&lt;sup id=&quot;fnref:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;27&lt;/a&gt;&lt;/sup&gt; establish the consistency&lt;sup id=&quot;fnref:23:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:23&quot; class=&quot;footnote&quot;&gt;24&lt;/a&gt;&lt;/sup&gt; of a variant of the SPONGE algorithm - called &lt;em&gt;symmetric&lt;sup id=&quot;fnref:25&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:25&quot; class=&quot;footnote&quot;&gt;28&lt;/a&gt;&lt;/sup&gt; SPONGE&lt;/em&gt; - in the case of $k \geq 2$ unequal-sized clusters &lt;em&gt;when $n$ is large enough&lt;/em&gt;&lt;sup id=&quot;fnref:6:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;27&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h2 id=&quot;how-to-choose-the-number-of-clusters-in-correlation-based-spectral-clustering&quot;&gt;How to choose the number of clusters in correlation-based spectral clustering?&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Choosing the number $k$ of clusters is a general problem for all clustering algorithms, and a variety of more or less successful methods have been devised for this problem&lt;/em&gt;&lt;sup id=&quot;fnref:1:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Correlation-based spectral clustering being 1) a clustering method 2) based on the spectrum of a matrix derived from 3) a correlation matrix, there are at least three families of methods which can be used to determine the “optimal” number of clusters:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Generic methods&lt;/li&gt;
  &lt;li&gt;Specific methods tailored to spectral clustering&lt;/li&gt;
  &lt;li&gt;Specific methods tailored to correlation-based clustering or to correlation-based factor analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, even if it is mathematically satisfying to find such an optimal number, it is important to keep in mind that &lt;em&gt;just because we find an [optimal] partition […] does not preclude the possibility of other good partitions&lt;/em&gt;&lt;sup id=&quot;fnref:32&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:32&quot; class=&quot;footnote&quot;&gt;29&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Indeed, as Wang and Rohe&lt;sup id=&quot;fnref:32:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:32&quot; class=&quot;footnote&quot;&gt;29&lt;/a&gt;&lt;/sup&gt; put it:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;We must disabuse ourselves of the notion of “the correct partition”. Instead, there are several “reasonable partitions”: some of these clusterings might be consistent with one another (as might be imagined in a hierarchical clustering), others might not be consistent.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3 id=&quot;generic-methods&quot;&gt;Generic methods&lt;/h3&gt;

&lt;p&gt;Any black box method to select an optimal number of clusters in a clustering algorithm - like &lt;em&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Silhouette_(clustering)&quot;&gt;the silhouette index&lt;/a&gt;&lt;/em&gt;&lt;sup id=&quot;fnref:29&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:29&quot; class=&quot;footnote&quot;&gt;30&lt;/a&gt;&lt;/sup&gt; or &lt;em&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set#The_gap_statistics&quot;&gt;the gap statistic&lt;/a&gt;&lt;/em&gt;&lt;sup id=&quot;fnref:28&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:28&quot; class=&quot;footnote&quot;&gt;31&lt;/a&gt;&lt;/sup&gt; - 
can be used in correlation-based spectral clustering.&lt;/p&gt;

&lt;p&gt;On an opinionated note, though, &lt;em&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Elbow_method_(clustering)&quot;&gt;the elbow criterion&lt;/a&gt;&lt;/em&gt; should not be used, for reasons detailed in Schubert&lt;sup id=&quot;fnref:31&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:31&quot; class=&quot;footnote&quot;&gt;32&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h3 id=&quot;specific-methods-taylored-to-spectral-clustering&quot;&gt;Specific methods tailored to spectral clustering&lt;/h3&gt;

&lt;p&gt;In the context of spectral clustering, several specific methods for determining the optimal number of clusters have been proposed.&lt;/p&gt;

&lt;p&gt;The most well-known of these methods is &lt;em&gt;the eigengap heuristic&lt;/em&gt;&lt;sup id=&quot;fnref:1:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, whose aim is &lt;em&gt;to choose the number $k$ such that all eigenvalues $\lambda_1$,…,$\lambda_k$ [of the Laplacian matrix] are very small, but $\lambda_{k+1}$ is relatively large&lt;/em&gt;&lt;sup id=&quot;fnref:1:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. In practice, 
this method &lt;em&gt;works well if the clusters in the data are very well pronounced&lt;/em&gt;&lt;sup id=&quot;fnref:1:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. Nevertheless, &lt;em&gt;the more noisy or overlapping the clusters are, the less effective is this heuristic&lt;/em&gt;&lt;sup id=&quot;fnref:1:11&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, which is obviously a problem for financial applications&lt;sup id=&quot;fnref:30&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:30&quot; class=&quot;footnote&quot;&gt;33&lt;/a&gt;&lt;/sup&gt;. 
Furthermore, &lt;em&gt;just because there is a gap, it doesn’t mean that the rest of the eigenvectors are noise&lt;/em&gt;&lt;sup id=&quot;fnref:32:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:32&quot; class=&quot;footnote&quot;&gt;29&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
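&lt;p&gt;A minimal sketch of the eigengap heuristic, assuming the eigenvalues of the Laplacian matrix have already been computed:&lt;/p&gt;

```python
import numpy as np

def eigengap_number_of_clusters(laplacian_eigenvalues, k_max=None):
    """Eigengap heuristic: choose k such that the first k eigenvalues are
    small and the gap to the (k+1)-th eigenvalue is the largest."""
    w = np.sort(np.asarray(laplacian_eigenvalues))
    if k_max is None:
        k_max = len(w) - 1
    gaps = np.diff(w[:k_max + 1])   # gaps[i] = w[i+1] - w[i]
    return int(np.argmax(gaps)) + 1  # +1 because gaps[0] corresponds to k = 1
```

For example, on the eigenvalues $0, 0.01, 0.02, 0.9, 1, 1.1$, the largest gap occurs after the third eigenvalue, so the heuristic selects $k = 3$.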

&lt;p&gt;Another popular method is the one used in the self-tuning spectral clustering algorithm of Zelnik-Manor and Perona&lt;sup id=&quot;fnref:34&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:34&quot; class=&quot;footnote&quot;&gt;34&lt;/a&gt;&lt;/sup&gt;, which selects the optimal number of clusters as the number that “best” aligns the (rows of the) eigenvectors of the Laplacian matrix with the vectors of the canonical basis of $\mathbb{R}^{k}$.&lt;/p&gt;

&lt;h3 id=&quot;specific-methods-taylored-to-correlation-based-clustering-or-to-correlation-based-factor-analysis&quot;&gt;Specific methods tailored to correlation-based clustering or to correlation-based factor analysis&lt;/h3&gt;

&lt;p&gt;Correlation-based clustering, whether spectral or not, revolves around a very special object - a correlation matrix - whose properties can be used to find the optimal number of clusters.&lt;/p&gt;

&lt;h4 id=&quot;random-matrix-theory-based-methods&quot;&gt;Random matrix theory-based methods&lt;/h4&gt;

&lt;p&gt;From &lt;a href=&quot;https://en.wikipedia.org/wiki/Random_matrix&quot;&gt;random matrix theory&lt;/a&gt;, the distribution of the eigenvalues of a large random correlation matrix follows a particular distribution - called &lt;a href=&quot;https://en.wikipedia.org/wiki/Marchenko%E2%80%93Pastur_distribution&quot;&gt;the Marchenko-Pastur distribution&lt;/a&gt; - 
independent&lt;sup id=&quot;fnref:35&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:35&quot; class=&quot;footnote&quot;&gt;35&lt;/a&gt;&lt;/sup&gt; of the underlying observations.&lt;/p&gt;

&lt;p&gt;This distribution involves a threshold $\lambda_{+} \geq 0$ beyond which no eigenvalue of a random correlation matrix is expected to be found.&lt;/p&gt;

&lt;p&gt;Laloux et al.&lt;sup id=&quot;fnref:36&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:36&quot; class=&quot;footnote&quot;&gt;36&lt;/a&gt;&lt;/sup&gt; uses this threshold to define a correlation matrix denoising procedure called the &lt;em&gt;eigenvalues clipping method&lt;/em&gt;, c.f. &lt;a href=&quot;/blog/correlation-matrices-denoising-results-from-random-matrix-theory&quot;&gt;a previous blog post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In the context at hand, Jin et al.&lt;sup id=&quot;fnref:8:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;21&lt;/a&gt;&lt;/sup&gt; proposes to determine the optimal number of clusters as the number of &lt;em&gt;eigenvalues of the correlation matrix that exceed [this threshold], which are the eigenvalues associated with dominant factors or patterns in the [original data]&lt;/em&gt;&lt;sup id=&quot;fnref:8:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;21&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;To illustrate this method, Figure 5, taken from Laloux et al.&lt;sup id=&quot;fnref:36:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:36&quot; class=&quot;footnote&quot;&gt;36&lt;/a&gt;&lt;/sup&gt;, depicts two different Marchenko-Pastur distributions fitted to a correlation matrix of&lt;sup id=&quot;fnref:37&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:37&quot; class=&quot;footnote&quot;&gt;37&lt;/a&gt;&lt;/sup&gt; 406 stocks belonging to the S&amp;amp;P 500 index.&lt;/p&gt;

&lt;figure&gt;
	&lt;a href=&quot;/assets/images/blog/correlation-spectral-clustering-rmt.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/correlation-spectral-clustering-rmt-small.png&quot; alt=&quot;Smoothed density of the eigenvalues of the correlation matrix of 406 assets belonging to the S&amp;amp;P 500 and two fitted Marchenko-Pastur distributions, 1991 – 1996. Source: Laloux et al.&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Figure 5. Smoothed density of the eigenvalues of the correlation matrix of 406 assets belonging to the S&amp;amp;P 500 and two fitted Marchenko-Pastur distributions, 1991 – 1996. Source: Laloux et al.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;From this figure, the number of clusters corresponding to the method of Jin et al.&lt;sup id=&quot;fnref:8:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;21&lt;/a&gt;&lt;/sup&gt; when applied to the dotted Marchenko-Pastur distribution would be around 15.&lt;/p&gt;
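&lt;p&gt;As a minimal sketch of this method: under the assumptions of the Marchenko-Pastur theorem, the threshold for a correlation matrix estimated from $T$ observations of $n$ variables is $\lambda_{+} = (1 + \sqrt{n/T})^2$, so that it suffices to count the sample correlation matrix eigenvalues exceeding it (the function name below is illustrative):&lt;/p&gt;

```python
import numpy as np

def number_of_clusters_mp(X):
    """Number of clusters taken as the number of sample correlation matrix
    eigenvalues exceeding the Marchenko-Pastur upper edge (1 + sqrt(n/T))^2."""
    T, n = X.shape
    corr = np.corrcoef(X, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)
    lambda_plus = (1 + np.sqrt(n / T)) ** 2
    return int(np.sum(eigenvalues > lambda_plus))
```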

&lt;h4 id=&quot;non-linear-shrinkage-based-methods&quot;&gt;Non-linear shrinkage-based methods&lt;/h4&gt;

&lt;p&gt;De Nard&lt;sup id=&quot;fnref:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;38&lt;/a&gt;&lt;/sup&gt; notes that &lt;em&gt;non-linear shrinkage [of the sample covariance matrix] pushes the sample eigenvalues toward their closest and most numerous neighbors, thus toward (local) cluster[s]&lt;/em&gt;&lt;sup id=&quot;fnref:7:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;38&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;As a consequence, it should be possible to &lt;em&gt;use the information from non-linear shrinkage — the number of
jumps in the shrunk eigenvalues — or directly the distribution of the sample eigenvalues — the number of sample eigenvalue clusters — to obtain&lt;/em&gt;&lt;sup id=&quot;fnref:7:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;38&lt;/a&gt;&lt;/sup&gt; the optimal number of clusters.&lt;/p&gt;

&lt;p&gt;This method is illustrated in Figure 6, taken from de Nard&lt;sup id=&quot;fnref:7:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;38&lt;/a&gt;&lt;/sup&gt;, in which 2 clusters of eigenvalues are visible.&lt;/p&gt;

&lt;figure&gt;
	&lt;a href=&quot;/assets/images/blog/correlation-spectral-clustering-nl-shrinkage.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/correlation-spectral-clustering-nl-shrinkage-small.png&quot; alt=&quot;Distribution of the sample eigenvalues of a covariance matrix and optimal number of groups corresponding to the centroids of that distribution (0.8 and 2) Source: de Nard.&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Figure 6. Distribution of the sample eigenvalues of a covariance matrix and optimal number of groups corresponding to the centroids of that distribution (0.8 and 2). Source: de Nard.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;From a practical perspective, de Nard&lt;sup id=&quot;fnref:7:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;38&lt;/a&gt;&lt;/sup&gt; suggests using 1D &lt;a href=&quot;https://en.wikipedia.org/wiki/Kernel_density_estimation&quot;&gt;KDE clustering&lt;/a&gt; in order to determine the number of sample eigenvalue clusters.&lt;/p&gt;
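&lt;p&gt;A minimal sketch of this suggestion, counting the modes - i.e., the local maxima - of a 1D kernel density estimate of the sample eigenvalues (the bandwidth and evaluation grid below are arbitrary choices):&lt;/p&gt;

```python
import numpy as np
from scipy.stats import gaussian_kde

def number_of_eigenvalue_clusters(eigenvalues, bandwidth=0.2, grid_size=512):
    """Count the clusters of sample eigenvalues as the number of local maxima
    of a 1D kernel density estimate of those eigenvalues."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    kde = gaussian_kde(eigenvalues, bw_method=bandwidth)
    grid = np.linspace(eigenvalues.min() - 0.5, eigenvalues.max() + 0.5, grid_size)
    density = kde(grid)
    # Interior grid points strictly greater than both neighbors are local maxima
    left = density[1:-1] > density[:-2]
    right = density[1:-1] > density[2:]
    return int(np.sum(np.logical_and(left, right)))
```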

&lt;h4 id=&quot;factor-analysis-based-methods&quot;&gt;Factor analysis-based methods&lt;/h4&gt;

&lt;p&gt;Although the ultimate goal of cluster analysis and &lt;a href=&quot;https://en.wikipedia.org/wiki/Factor_analysis&quot;&gt;factor analysis&lt;/a&gt; is different, &lt;em&gt;the underlying logic of both techniques is dimension reduction (i.e., summarizing information on multiple 
variables into just a few variables)&lt;/em&gt;&lt;sup id=&quot;fnref:38&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:38&quot; class=&quot;footnote&quot;&gt;39&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Based on this similarity, de Nard&lt;sup id=&quot;fnref:7:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;38&lt;/a&gt;&lt;/sup&gt; discusses the &lt;em&gt;eigenvalue ratio estimator&lt;/em&gt;&lt;sup id=&quot;fnref:39&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:39&quot; class=&quot;footnote&quot;&gt;40&lt;/a&gt;&lt;/sup&gt; of Ahn and Horenstein&lt;sup id=&quot;fnref:39:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:39&quot; class=&quot;footnote&quot;&gt;40&lt;/a&gt;&lt;/sup&gt; that consists in &lt;em&gt;maximizing the ratio of two adjacent eigenvalues [of the sample correlation matrix&lt;sup id=&quot;fnref:40&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:40&quot; class=&quot;footnote&quot;&gt;41&lt;/a&gt;&lt;/sup&gt;] to determine the number of 
factors (here clusters)&lt;/em&gt;&lt;sup id=&quot;fnref:7:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;38&lt;/a&gt;&lt;/sup&gt; in &lt;em&gt;economic or financial data&lt;/em&gt;&lt;sup id=&quot;fnref:39:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:39&quot; class=&quot;footnote&quot;&gt;40&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Unfortunately, the eigenvalue ratio estimator &lt;em&gt;often cannot identify any cluster [in a large universe of U.S. stocks] and sets $k = 1$&lt;/em&gt;&lt;sup id=&quot;fnref:7:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;38&lt;/a&gt;&lt;/sup&gt;, which corresponds to the situation described in Ahn and Horenstein&lt;sup id=&quot;fnref:39:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:39&quot; class=&quot;footnote&quot;&gt;40&lt;/a&gt;&lt;/sup&gt; where &lt;em&gt;one factor 
[- here, the market factor -] has extremely strong explanatory power for response variables&lt;/em&gt;&lt;sup id=&quot;fnref:39:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:39&quot; class=&quot;footnote&quot;&gt;40&lt;/a&gt;&lt;/sup&gt;. In such a situation, the &lt;em&gt;growth ratio estimator&lt;/em&gt;&lt;sup id=&quot;fnref:39:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:39&quot; class=&quot;footnote&quot;&gt;40&lt;/a&gt;&lt;/sup&gt; of Ahn and Horenstein&lt;sup id=&quot;fnref:39:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:39&quot; class=&quot;footnote&quot;&gt;40&lt;/a&gt;&lt;/sup&gt; - this time maximizing the ratio of the logarithmic growth rate of 
two consecutive eigenvalues to determine the number of factors - should be used instead, because it empirically appears to be able to &lt;em&gt;mitigate the effect of the dominant factor&lt;/em&gt;&lt;sup id=&quot;fnref:39:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:39&quot; class=&quot;footnote&quot;&gt;40&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
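&lt;p&gt;Both estimators can be sketched as follows (the eigenvalues are sorted internally in descending order, and the maximum number of factors to consider is an input):&lt;/p&gt;

```python
import numpy as np

def eigenvalue_ratio_estimator(eigenvalues, k_max):
    """Number of factors maximizing the ratio of two adjacent eigenvalues."""
    w = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # descending order
    ratios = w[:k_max] / w[1:k_max + 1]
    return int(np.argmax(ratios)) + 1

def growth_ratio_estimator(eigenvalues, k_max):
    """Number of factors maximizing the ratio of the logarithmic growth rates
    of two consecutive eigenvalues."""
    w = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    v = np.cumsum(w[::-1])[::-1]        # v[k] = sum of eigenvalues from rank k on
    growth = np.log(v[:-1] / v[1:])     # logarithmic growth rates
    ratios = growth[:k_max] / growth[1:k_max + 1]
    return int(np.argmax(ratios)) + 1
```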

&lt;p&gt;Yet another estimator of the number of common factors is the &lt;em&gt;adjusted correlation thresholding estimator&lt;/em&gt;&lt;sup id=&quot;fnref:41&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:41&quot; class=&quot;footnote&quot;&gt;42&lt;/a&gt;&lt;/sup&gt; of Fan et al.&lt;sup id=&quot;fnref:41:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:41&quot; class=&quot;footnote&quot;&gt;42&lt;/a&gt;&lt;/sup&gt;, which determines the number of common factors as &lt;em&gt;the number of eigenvalues greater than 1 of the population correlation matrix […], 
taking into account the sampling variabilities and biases of top sample eigenvalues&lt;/em&gt;&lt;sup id=&quot;fnref:41:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:41&quot; class=&quot;footnote&quot;&gt;42&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h2 id=&quot;implementation-in-portfolio-optimizer&quot;&gt;Implementation in Portfolio Optimizer&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Portfolio Optimizer&lt;/strong&gt; implements correlation-based spectral clustering through the endpoint &lt;a href=&quot;https://docs.portfoliooptimizer.io/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/assets/clustering/spectral/correlation-based&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This endpoint supports three different methods:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The Blockbuster spectral clustering method&lt;/li&gt;
  &lt;li&gt;The SPONGE spectral clustering method&lt;/li&gt;
  &lt;li&gt;The symmetric SPONGE spectral clustering method (default)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As for the number of clusters to use, this endpoint supports:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;A manually-defined number of clusters&lt;/li&gt;
  &lt;li&gt;An automatically-determined number of clusters, computed through a proprietary variation of &lt;a href=&quot;https://en.wikipedia.org/wiki/Parallel_analysis&quot;&gt;Horn’s parallel analysis method&lt;/a&gt;&lt;sup id=&quot;fnref:42&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:42&quot; class=&quot;footnote&quot;&gt;43&lt;/a&gt;&lt;/sup&gt; (default)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As a side note, Horn’s parallel analysis method seems to be little known in finance, but &lt;em&gt;numerous studies [in psychology] have consistently shown that [it] 
is the most nearly accurate methodology for determining the number of factors to retain in an exploratory factor analysis&lt;/em&gt;&lt;sup id=&quot;fnref:43&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:43&quot; class=&quot;footnote&quot;&gt;44&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
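&lt;p&gt;While the exact variation implemented in &lt;strong&gt;Portfolio Optimizer&lt;/strong&gt; is proprietary, the plain Horn’s parallel analysis method can be sketched as follows: retain the leading factors whose sample correlation matrix eigenvalues exceed a high percentile of the eigenvalues obtained from random data of identical dimensions:&lt;/p&gt;

```python
import numpy as np

def parallel_analysis(X, n_simulations=100, percentile=95, seed=0):
    """Sketch of Horn's parallel analysis: number of leading factors whose
    sample correlation matrix eigenvalues exceed the given percentile of the
    eigenvalues obtained from random data of the same dimensions."""
    rng = np.random.default_rng(seed)
    T, n = X.shape
    sample_eigenvalues = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    simulated = np.empty((n_simulations, n))
    for i in range(n_simulations):
        Z = rng.standard_normal((T, n))
        simulated[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
    thresholds = np.percentile(simulated, percentile, axis=0)
    exceeds = sample_eigenvalues > thresholds
    # Retain the leading factors only, up to the first non-exceedance
    k = 0
    for flag in exceeds:
        if not flag:
            break
        k += 1
    return k
```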

&lt;h2 id=&quot;examples-of-usage&quot;&gt;Examples of usage&lt;/h2&gt;

&lt;h3 id=&quot;automated-clustering-of-us-stocks&quot;&gt;Automated clustering of U.S. stocks&lt;/h3&gt;

&lt;p&gt;De Nard&lt;sup id=&quot;fnref:7:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;38&lt;/a&gt;&lt;/sup&gt; and Jin et al.&lt;sup id=&quot;fnref:8:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;21&lt;/a&gt;&lt;/sup&gt; both study the automated clustering of a dynamic universe of U.S. stocks through correlation-based&lt;sup id=&quot;fnref:44&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:44&quot; class=&quot;footnote&quot;&gt;45&lt;/a&gt;&lt;/sup&gt; spectral clustering:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;De Nard&lt;sup id=&quot;fnref:7:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;38&lt;/a&gt;&lt;/sup&gt;, in the context of covariance matrix shrinkage&lt;/p&gt;

    &lt;p&gt;This study uses the Blockbuster method, together with a 1D KDE clustering method to automatically determine the number of clusters.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Jin et al.&lt;sup id=&quot;fnref:8:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;21&lt;/a&gt;&lt;/sup&gt;, in the context of the construction of statistical arbitrage portfolios&lt;/p&gt;

    &lt;p&gt;This study uses the SPONGE and symmetric SPONGE methods, together with a Marchenko-Pastur distribution-based method&lt;sup id=&quot;fnref:45&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:45&quot; class=&quot;footnote&quot;&gt;46&lt;/a&gt;&lt;/sup&gt; to automatically determine the number of clusters.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An important remark at this point.&lt;/p&gt;

&lt;p&gt;When clustering stocks, it is possible to rely on what de Nard&lt;sup id=&quot;fnref:7:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;38&lt;/a&gt;&lt;/sup&gt; calls &lt;em&gt;external information&lt;/em&gt;&lt;sup id=&quot;fnref:7:11&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;38&lt;/a&gt;&lt;/sup&gt;, for example industry classifications like:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://fr.wikipedia.org/wiki/Standard_Industrial_Classification&quot;&gt;The Standard Industrial Classification&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.msci.com/our-solutions/indexes/gics&quot;&gt;The MSCI Global Industry Classification Standard&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nevertheless, such external information &lt;em&gt;may fail to create (a valid number of) homogeneous groups&lt;/em&gt;&lt;sup id=&quot;fnref:7:12&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;38&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;In the words of de Nard&lt;sup id=&quot;fnref:7:13&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;38&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;For example, if we cluster the covariance matrix into groups of financial and nonfinancial firms, arguably, there will be some nonfinancial firm(s) that has (have) some stock characteristics more similar to the financial stocks [than] to the non-financial stocks and is (are) therefore misclassified. Especially in large dimension one would expect a few of such misclassifications in both directions. To overcome this misclassification problem and to really create homogeneous groups, [a data-driven procedure should be used instead].&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Results of de Nard&lt;sup id=&quot;fnref:7:14&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;38&lt;/a&gt;&lt;/sup&gt; and Jin et al.&lt;sup id=&quot;fnref:8:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;21&lt;/a&gt;&lt;/sup&gt; are the following:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Within a universe of 500 stocks over the period January 1998 to December 2017, de Nard&lt;sup id=&quot;fnref:7:15&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;38&lt;/a&gt;&lt;/sup&gt; finds that the number of clusters is limited to only 1 or 2 about 75% of the time.&lt;/li&gt;
  &lt;li&gt;Within a universe of ~600 stocks over the period January 2000 to December 2022, Jin et al.&lt;sup id=&quot;fnref:8:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;21&lt;/a&gt;&lt;/sup&gt; finds that the number of clusters is relatively stable around 19, even though it tends to drop &lt;em&gt;during financial hardships of the United States&lt;/em&gt;&lt;sup id=&quot;fnref:8:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;21&lt;/a&gt;&lt;/sup&gt; down to a minimum of 8, as displayed in Figure 7 adapted from Jin et al.&lt;sup id=&quot;fnref:8:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;21&lt;/a&gt;&lt;/sup&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;figure&gt;
	&lt;a href=&quot;/assets/images/blog/correlation-spectral-clustering-rmt-number-of-clusters.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/correlation-spectral-clustering-rmt-number-of-clusters-small.png&quot; alt=&quot;Evolution of the number of clusters found by a Marchenko-Pastur distribution-based method, January 2000 - December 2022. Source: Jin et al.&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Figure 7. Evolution of the number of clusters found by a Marchenko-Pastur distribution-based method, January 2000 - December 2022. Source: Jin et al.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;These diverging results on closely related data sets show that different methods to determine the number of clusters to use in clustering analysis might be completely at odds with one another&lt;sup id=&quot;fnref:46&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:46&quot; class=&quot;footnote&quot;&gt;47&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;But these results also show that the “optimal” number of clusters - whatever it is - is not constant over time and that &lt;em&gt;methods that dynamically determine [it] can capture changes in market dynamics, especially when there [are] significant downside risks in the market&lt;/em&gt;&lt;sup id=&quot;fnref:8:11&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;21&lt;/a&gt;&lt;/sup&gt;. 
Incidentally, this is another argument in favor of not relying on (slowly evolving) external information to determine the number of clusters to use.&lt;/p&gt;

&lt;h3 id=&quot;identification-of-risk-onrisk-off-assets-in-a-us-centric-universe-of-assets&quot;&gt;Identification of risk-on/risk-off assets in a U.S.-centric universe of assets&lt;/h3&gt;

&lt;p&gt;In &lt;a href=&quot;https://en.wikipedia.org/wiki/2025_stock_market_crash&quot;&gt;April 2025&lt;/a&gt;, “risk on/risk off” has been making the headlines as a phrase &lt;em&gt;describing investment and asset price behavior&lt;/em&gt;&lt;sup id=&quot;fnref:48&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:48&quot; class=&quot;footnote&quot;&gt;48&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Lee&lt;sup id=&quot;fnref:48:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:48&quot; class=&quot;footnote&quot;&gt;48&lt;/a&gt;&lt;/sup&gt; summarizes the underlying meaning as follows&lt;sup id=&quot;fnref:48:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:48&quot; class=&quot;footnote&quot;&gt;48&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;[…] risk on/risk off generally refers to an investment environment in which asset price behavior is largely driven by how risk appetite advances or retreats over time, usually in a synchronized way at a faster than normal pace across global regions and assets.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And describes it in more detail as follows&lt;sup id=&quot;fnref:48:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:48&quot; class=&quot;footnote&quot;&gt;48&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Depending on the environment, investors tend to buy or sell risky assets across the board, paying less attention to the unique characteristics of these assets. Volatilities and, most noticeably, correlations of assets that are perceived as risky will jump, particularly during risk-off periods, inspiring comments such as “correlations go to one” during a crisis. Assets, such as U.S. Treasury bonds, and currencies, such as the Japanese yen, tend to move in the opposite direction of risky assets and are generally perceived as the safer assets to hold in the event of a flight to safety.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this sub-section, I propose to study the capability of correlation-based spectral clustering to identify risk-on/risk-off assets in a U.S.-centric universe of assets.&lt;/p&gt;

&lt;p&gt;For this:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;I will use as a universe of assets 13 ETFs representative&lt;sup id=&quot;fnref:50&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:50&quot; class=&quot;footnote&quot;&gt;49&lt;/a&gt;&lt;/sup&gt; of various asset classes:
    &lt;ul&gt;
      &lt;li&gt;U.S. stocks (SPY ETF)&lt;/li&gt;
      &lt;li&gt;European stocks (EZU ETF)&lt;/li&gt;
      &lt;li&gt;Japanese stocks (EWJ ETF)&lt;/li&gt;
      &lt;li&gt;Emerging markets stocks (EEM ETF)&lt;/li&gt;
      &lt;li&gt;U.S. REITs (VNQ ETF)&lt;/li&gt;
      &lt;li&gt;International REITs (RWX ETF)&lt;/li&gt;
      &lt;li&gt;Cash (SHY ETF)&lt;/li&gt;
      &lt;li&gt;U.S. 7-10 year Treasuries (IEF ETF)&lt;/li&gt;
      &lt;li&gt;U.S. 20+ year Treasuries (TLT ETF)&lt;/li&gt;
      &lt;li&gt;U.S. Investment Grade Corporate Bonds (LQD ETF)&lt;/li&gt;
      &lt;li&gt;U.S. High Yield Corporate Bonds (HYG ETF)&lt;/li&gt;
      &lt;li&gt;Commodities (DBC ETF)&lt;/li&gt;
      &lt;li&gt;Gold (GLD ETF)&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;I will compute the correlation matrix of the daily returns of these assets over the period 1st April 2025 - 30th April 2025&lt;sup id=&quot;fnref:52&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:52&quot; class=&quot;footnote&quot;&gt;50&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

    &lt;p&gt;Figure 8 displays that correlation matrix using a planar &lt;a href=&quot;https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding&quot;&gt;t-SNE representation&lt;/a&gt; of its rows, considered as points in $\mathbb{R}^{13}$.&lt;/p&gt;

    &lt;figure&gt;
      &lt;a href=&quot;/assets/images/blog/correlation-spectral-clustering-eaam-corr-mat.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/correlation-spectral-clustering-eaam-corr-mat-small.png&quot; alt=&quot;t-SNE representation of the asset correlation matrix, April 2025.&quot; /&gt;&lt;/a&gt;
      &lt;figcaption&gt;Figure 8. t-SNE representation of the asset correlation matrix, April 2025.&lt;/figcaption&gt;
  &lt;/figure&gt;

    &lt;p&gt;To be noted that the t-SNE representation of Figure 8 is a bit misleading in the context of spectral clustering, because spectral clustering does not operate in the t-SNE plane. Still, it is a helpful representation to give a sense of asset “closeness” in terms of correlations.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;I will cluster these assets using the symmetric SPONGE spectral clustering method with $k = 2$ and $k = 3$ clusters, the rationale being that:
    &lt;ul&gt;
      &lt;li&gt;All the assets clustered together with Cash should correspond to risk-off assets ($k = 2,3$)&lt;/li&gt;
      &lt;li&gt;All the assets clustered together with U.S. stocks should correspond to risk-on assets ($k = 2,3$)&lt;/li&gt;
      &lt;li&gt;All the assets clustered in the remaining cluster should correspond to another category to be determined ($k = 3$ only)&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
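&lt;p&gt;For readers wishing to reproduce this kind of analysis, the symmetric SPONGE clustering step can be sketched in Python with NumPy and SciPy. This is a minimal illustration of the generalized eigenproblem formulation with the regularization $\tau^+ = \tau^- = 1$, not the reference implementation from the signet package, and the &lt;code&gt;sponge_sym&lt;/code&gt; helper below is purely illustrative:&lt;/p&gt;

```python
import numpy as np
from scipy.linalg import eigh
from scipy.cluster.vq import kmeans2

def sponge_sym(C, k, tau_pos=1.0, tau_neg=1.0, seed=0):
    # Signed adjacency matrix: the correlation matrix with its diagonal
    # zeroed, split into positive and negative parts.
    n = C.shape[0]
    A = C - np.diag(np.diag(C))
    Ap, An = np.maximum(A, 0.0), np.maximum(-A, 0.0)

    def lap_sym(W):
        # Symmetrically normalized Laplacian D^{-1/2} (D - W) D^{-1/2}.
        d = W.sum(axis=1)
        d_safe = np.where(d > 0, d, 1.0)
        d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d_safe), 0.0)
        L = np.diag(d) - W
        return d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]

    Lp, Ln = lap_sym(Ap), lap_sym(An)
    # Generalized eigenproblem (L+_sym + tau^- I) v = lambda (L-_sym + tau^+ I) v;
    # eigh returns eigenvalues in ascending order, so keep the k smallest.
    vals, vecs = eigh(Lp + tau_neg * np.eye(n), Ln + tau_pos * np.eye(n))
    embedding = vecs[:, :k]
    # Cluster the rows of the spectral embedding with k-means.
    _, labels = kmeans2(embedding, k, minit="++", seed=seed)
    return labels
```

&lt;p&gt;With the 13-asset correlation matrix of Figure 8 as input, calling this helper with $k = 2$ and $k = 3$ would produce the cluster assignments discussed below.&lt;/p&gt;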

&lt;p&gt;Figures 9 and 10 show the resulting clusterings.&lt;/p&gt;

&lt;figure&gt;
	&lt;a href=&quot;/assets/images/blog/correlation-spectral-clustering-spongesym-eaam-two-clusters.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/correlation-spectral-clustering-spongesym-eaam-two-clusters-small.png&quot; alt=&quot;Symmetric SPONGE clustering of the asset correlation matrix, two clusters, April 2025.&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Figure 9. Symmetric SPONGE clustering of the asset correlation matrix, two clusters, April 2025.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;figure&gt;
	&lt;a href=&quot;/assets/images/blog/correlation-spectral-clustering-spongesym-eaam-three-clusters.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/correlation-spectral-clustering-spongesym-eaam-three-clusters-small.png&quot; alt=&quot;Symmetric SPONGE clustering of the asset correlation matrix, three clusters, April 2025.&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Figure 10. Symmetric SPONGE clustering of the asset correlation matrix, three clusters, April 2025.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;From these figures, and over the considered period:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Intermediate-term U.S. Treasuries is found to be a risk-off asset when $k = 2$&lt;/li&gt;
  &lt;li&gt;Gold is found to be a risk-off asset when $k = 3$, with Intermediate-term and Long-term U.S. Treasuries now moved into their own cluster&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The evolution of the classification of Gold as a risk-off asset is particularly interesting&lt;sup id=&quot;fnref:53&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:53&quot; class=&quot;footnote&quot;&gt;51&lt;/a&gt;&lt;/sup&gt; and also serves to illustrate one of the limitations of the t-SNE representation of the asset correlation matrix.&lt;/p&gt;

&lt;p&gt;Indeed, from that representation, it makes no sense that Cash and Gold could belong to the same cluster since they are complete opposites in the t-SNE plane! But in reality, if the t-SNE plane is diagonally “folded”, Cash and Gold truly are close neighbors…&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;This (abruptly?) concludes this already too long overview of correlation-based spectral clustering.&lt;/p&gt;

&lt;p&gt;Waiting for the next blog post on correlation-based clustering, feel free to &lt;a href=&quot;https://www.linkedin.com/in/roman-rubsamen/&quot;&gt;connect with me on LinkedIn&lt;/a&gt; or to &lt;a href=&quot;https://twitter.com/portfoliooptim&quot;&gt;follow me on Twitter&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;–&lt;/p&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://link.springer.com/article/10.1007/s11222-007-9033-z&quot;&gt;von Luxburg, U. A tutorial on spectral clustering. Stat Comput 17, 395–416 (2007)&lt;/a&gt;. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:1:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;8&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;9&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;10&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:10&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;11&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:11&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;12&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.cambridge.org/core/books/introduction-to-econophysics/6A2727FE42578790E6E1021B7955EE30&quot;&gt;R. N. Mantegna, H. E. Stanley, Introduction to econophysics: correlations and complexity in finance, Cambridge university press, 1999&lt;/a&gt;. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://link.springer.com/article/10.1007/s100510050929&quot;&gt;Mantegna, R.N. Hierarchical structure in financial markets. Eur. Phys. J. B 11, 193–197 (1999)&lt;/a&gt;. &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:3:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:3:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:3:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:4&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.tandfonline.com/doi/abs/10.1080/07350015.2020.1798241&quot;&gt;Brownlees, C., Guðmundsson, G. S., &amp;amp; Lugosi, G. (2020). Community Detection in Partial Correlation Network Models. Journal of Business &amp;amp; Economic Statistics, 40(1), 216–226&lt;/a&gt;. &lt;a href=&quot;#fnref:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:4:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:4:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:4:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:4:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:4:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:4:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:4:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;8&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:4:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;9&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:4:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;10&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:5&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://proceedings.mlr.press/v89/cucuringu19a.html&quot;&gt;Mihai Cucuringu, Peter Davies, Aldo Glielmo, and Hemant Tyagi. SPONGE: A generalized eigenproblem for clustering signed networks. In Artificial Intelligence and Statistics, volume 89 of Proceedings of Machine Learning Research, pages 1088–1098. PMLR, 2019&lt;/a&gt;. &lt;a href=&quot;#fnref:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:5:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:5:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:5:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:5:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:5:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:5:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:5:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;8&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:5:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;9&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:10&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;$x_1, …, x_n$ can be more general “objects”, as long as a distance between these objects is defined. &lt;a href=&quot;#fnref:10&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:54&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;As opposed to their global relationships that are already available from the similarity matrix $S$. &lt;a href=&quot;#fnref:54&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:33&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://dl.acm.org/doi/10.5555/2891460.2891520&quot;&gt;Jin Huang, Feiping Nie, and Heng Huang. 2013. Spectral rotation versus K-means in spectral clustering. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence (AAAI’13). AAAI Press, 431–437&lt;/a&gt;. &lt;a href=&quot;#fnref:33&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:11&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The similarity function $s$ is also very important in spectral clustering, although &lt;em&gt;ultimately, the choice of the similarity function depends on the domain the data comes from, and no general advice can be given&lt;/em&gt;&lt;sup id=&quot;fnref:1:12&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:11&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:9&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://dl.acm.org/doi/10.5555/2980539.2980649&quot;&gt;Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. 2001. On spectral clustering: analysis and an algorithm. In Proceedings of the 15th International Conference on Neural Information Processing Systems: Natural and Synthetic (NIPS’01). MIT Press, Cambridge, MA, USA, 849–856&lt;/a&gt;. &lt;a href=&quot;#fnref:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:9:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:9:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:9:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:9:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:55&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Thanks to the properties of the Laplacian matrix $L$. &lt;a href=&quot;#fnref:55&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:13&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Typically the &lt;a href=&quot;https://en.wikipedia.org/wiki/Pearson_correlation_coefficient&quot;&gt;Pearson correlation&lt;/a&gt;, but other measures like the &lt;a href=&quot;https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient&quot;&gt;Spearman correlation&lt;/a&gt; or the &lt;a href=&quot;/blog/the-gerber-statistic-a-robust-co-movement-measure-for-correlation-matrix-estimation&quot;&gt;Gerber correlation&lt;/a&gt; can also be used as long as the resulting correlation matrix $C$ is &lt;a href=&quot;/blog/when-a-correlation-matrix-is-not-a-correlation-matrix-the-nearest-correlation-matrix-problem&quot;&gt;a valid correlation matrix&lt;/a&gt;. &lt;a href=&quot;#fnref:13&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:14&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;It is actually possible to define a whole family of metrics as a function of the Pearson or Spearman correlation coefficient, c.f. van Dongen and Enright&lt;sup id=&quot;fnref:15&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:15&quot; class=&quot;footnote&quot;&gt;52&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:14&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:56&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Such correlation-based metrics are also used elsewhere in finance, for example in de Prado’s&lt;sup id=&quot;fnref:17&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:17&quot; class=&quot;footnote&quot;&gt;53&lt;/a&gt;&lt;/sup&gt; &lt;a href=&quot;/blog/hierarchical-risk-parity-introducing-graph-theory-and-machine-learning-in-portfolio-optimizer/&quot;&gt;&lt;em&gt;Hierarchical Risk Parity&lt;/em&gt; portfolio optimization algorithm&lt;/a&gt;. &lt;a href=&quot;#fnref:56&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:16&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;To be noted that depending on the exact spectral clustering method used, it might be required to first convert that distance to a &lt;a href=&quot;https://en.wikipedia.org/wiki/Similarity_measure&quot;&gt;similarity measure&lt;/a&gt;. &lt;a href=&quot;#fnref:16&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:22&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Called the &lt;em&gt;Generalised Stochastic Block Model&lt;/em&gt; in Brownlees et al.&lt;sup id=&quot;fnref:4:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;, which is an extension of the vanilla &lt;a href=&quot;https://en.wikipedia.org/wiki/Stochastic_block_model&quot;&gt;stochastic block model&lt;/a&gt;. &lt;a href=&quot;#fnref:22&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:21&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Or of the sample correlation matrix, since a correlation matrix is a covariance matrix of a special kind. &lt;a href=&quot;#fnref:21&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:20&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Both the simulation study and the empirical application in Brownlees et al.&lt;sup id=&quot;fnref:4:11&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; show that the Blockbuster algorithm already performs &lt;em&gt;satisfactorily&lt;/em&gt;&lt;sup id=&quot;fnref:4:12&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; with $n = 50$, depending on the strength of the correlations; “sufficiently” large is thus not necessarily “very” large, that is, the Blockbuster algorithm seems to be well-behaved in finite sample. &lt;a href=&quot;#fnref:20&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:18&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The definition of “appropriately controlled” is left for future research in Brownlees et al.&lt;sup id=&quot;fnref:4:13&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:18&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:19&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;In particular, this formula explains why the $k$ “largest” eigenvectors of the matrix $\Sigma$ are extracted by the Blockbuster algorithm - it is because they correspond to the $k$ “smallest” eigenvectors of the matrix $\Sigma^{-1}$ (and of the matrix $L$). &lt;a href=&quot;#fnref:19&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:8&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://ora.ox.ac.uk/objects/uuid:c60358c0-24f0-4c66-b973-f84776f66f8a&quot;&gt;Qi Jin, Mihai Cucuringu, and Álvaro Cartea. 2023. Correlation Matrix Clustering for Statistical Arbitrage Portfolios. In Proceedings of the Fourth ACM International Conference on AI in Finance (ICAIF ‘23). Association for Computing Machinery, New York, NY, USA, 557–564&lt;/a&gt;. &lt;a href=&quot;#fnref:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:8:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:8:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:8:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:8:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:8:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:8:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:8:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;8&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:8:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;9&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:8:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;10&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:8:10&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;11&lt;/sup&gt;&lt;/a&gt; &lt;a 
href=&quot;#fnref:8:11&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;12&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:27&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;A Python implementation of the SPONGE and symmetric SPONGE algorithms is available at &lt;a href=&quot;https://github.com/alan-turing-institute/signet&quot;&gt;https://github.com/alan-turing-institute/signet&lt;/a&gt;. &lt;a href=&quot;#fnref:27&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:26&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;These two regularization parameters aim to &lt;em&gt;promote clusterizations that avoid small-sized clusters&lt;/em&gt;&lt;sup id=&quot;fnref:8:12&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;21&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:26&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:23&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Under a &lt;em&gt;signed stochastic block model&lt;/em&gt;&lt;sup id=&quot;fnref:5:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;, which is an extension of the vanilla &lt;a href=&quot;https://en.wikipedia.org/wiki/Stochastic_block_model&quot;&gt;stochastic block model&lt;/a&gt;. &lt;a href=&quot;#fnref:23&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:23:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:24&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;In numerical experiments, Cucuringu et al.&lt;sup id=&quot;fnref:5:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt; use $\tau^+ = \tau^- = 1$, which nearly &lt;em&gt;always falls within the region of maximum recovery when it is present&lt;/em&gt;&lt;sup id=&quot;fnref:5:11&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:24&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:57&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;In numerical experiments, Cucuringu et al.&lt;sup id=&quot;fnref:5:12&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt; use $n$ ranging from 50 to 11259. &lt;a href=&quot;#fnref:57&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:6&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;http://jmlr.org/papers/v22/20-1289.html&quot;&gt;Mihai Cucuringu, Apoorv Vikram Singh, Deborah Sulem, and Hemant Tyagi. 2021. Regularized spectral methods for clustering signed networks. Journal of Machine Learning Research 22, 264 (2021), 1–79&lt;/a&gt;. &lt;a href=&quot;#fnref:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:6:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:25&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The symmetric SPONGE algorithm uses symmetric positive and negative Laplacian matrices $L^+_{sym} = \left( D^+ \right)^{-1/2} L^+ \left( D^+ \right)^{-1/2}$ and $L^-_{sym} = \left( D^- \right)^{-1/2} L^- \left( D^- \right)^{-1/2}$ instead of unnormalized ones, and uses the eigenvectors associated with the $k$ smallest eigenvalues of the matrix $\left( L^-_{sym} + \tau^+ I_n \right)^{-1/2} \left( L^+_{sym} + \tau^- I_n \right) \left( L^-_{sym} + \tau^+ I_n \right)^{-1/2}$, c.f. Cucuringu et al.&lt;sup id=&quot;fnref:5:13&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:25&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:32&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://projecteuclid.org/journals/annals-of-applied-statistics/volume-10/issue-4/Discussion-of-Coauthorship-and-citation-networks-for-statisticians/10.1214/16-AOAS977.full&quot;&gt;Song Wang. Karl Rohe. “Discussion of “Coauthorship and citation networks for statisticians”.” Ann. Appl. Stat. 10 (4) 1820 - 1826, December 2016&lt;/a&gt;. &lt;a href=&quot;#fnref:32&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:32:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:32:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:29&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/0377042787901257?via%3Dihub&quot;&gt;Peter J. Rousseeuw (1987). Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis. Computational and Applied Mathematics. 20: 53–65&lt;/a&gt;. &lt;a href=&quot;#fnref:29&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:28&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://academic.oup.com/jrsssb/article-abstract/63/2/411/7083348?redirectedFrom=fulltext&quot;&gt;Tibshirani, R., Walther, G., and Hastie, T. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63, 2 (2001), 411–423&lt;/a&gt;. &lt;a href=&quot;#fnref:28&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:31&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://dl.acm.org/doi/10.1145/3606274.3606278&quot;&gt;Erich Schubert. 2023. Stop using the elbow criterion for k-means and how to choose the number of clusters instead. SIGKDD Explor. Newsl. 25, 1 (June 2023), 36–42&lt;/a&gt;. &lt;a href=&quot;#fnref:31&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:30&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;For example, de Nard&lt;sup id=&quot;fnref:7:16&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;38&lt;/a&gt;&lt;/sup&gt; finds that &lt;em&gt;for the estimation problem of asset return covariance matrices, the eigengap heuristic often cannot identify any cluster and sets $k=1$&lt;/em&gt;&lt;sup id=&quot;fnref:7:17&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;38&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:30&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:34&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://papers.nips.cc/paper_files/paper/2004/hash/40173ea48d9567f1f393b20c855bb40b-Abstract.html&quot;&gt;Zelnik-Manor, L. and P. Perona (2004). Self-tuning spectral clustering. In Advances in neural information processing systems, pp. 1601–1608&lt;/a&gt;. &lt;a href=&quot;#fnref:34&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:35&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Under suitable technical assumptions. &lt;a href=&quot;#fnref:35&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:36&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.83.1467&quot;&gt;Laurent Laloux, Pierre Cizeau, Jean-Philippe Bouchaud, and Marc Potters, Noise Dressing of Financial Correlation Matrices, Phys. Rev. Lett. 83, 1467&lt;/a&gt;. &lt;a href=&quot;#fnref:36&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:36:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:37&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The returns of. &lt;a href=&quot;#fnref:37&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:7&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://academic.oup.com/jfec/article-abstract/20/4/569/5960228?redirectedFrom=fulltext&quot;&gt;Gianluca de Nard, Oops! I Shrunk the Sample Covariance Matrix Again: Blockbuster Meets Shrinkage, Journal of Financial Econometrics, Volume 20, Issue 4, Fall 2022, Pages 569–611&lt;/a&gt;. &lt;a href=&quot;#fnref:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:7:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:7:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:7:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:7:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:7:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:7:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:7:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;8&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:7:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;9&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:7:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;10&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:7:10&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;11&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:7:11&quot; class=&quot;reversefootnote&quot; 
role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;12&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:7:12&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;13&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:7:13&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;14&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:7:14&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;15&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:7:15&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;16&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:7:16&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;17&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:7:17&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;18&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:38&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.sciencedirect.com/science/article/abs/pii/S0091743514002485&quot;&gt;Hedwig Hofstetter, Elise Dusseldorp, Pepijn van Empelen, Theo W.G.M. Paulussen, A primer on the use of cluster analysis or factor analysis to assess co-occurrence of risk behaviors, Preventive Medicine, Volume 67, 2014, Pages 141-146&lt;/a&gt;. &lt;a href=&quot;#fnref:38&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:39&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://onlinelibrary.wiley.com/doi/abs/10.3982/ECTA8968&quot;&gt;Ahn, S., and A. Horenstein. 2013. Eigenvalue Ratio Test for the Number of Factors. Econometrica 80: 1203–1227&lt;/a&gt;. &lt;a href=&quot;#fnref:39&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:39:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:39:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:39:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:39:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:39:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:39:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:39:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;8&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:40&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The eigenvalue ratio estimator and the growth ratio estimator of Ahn and Horenstein&lt;sup id=&quot;fnref:39:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:39&quot; class=&quot;footnote&quot;&gt;40&lt;/a&gt;&lt;/sup&gt; both rely on the covariance matrix, of which the correlation matrix is a special case. In addition, for reasons detailed in Fan et al.&lt;sup id=&quot;fnref:41:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:41&quot; class=&quot;footnote&quot;&gt;42&lt;/a&gt;&lt;/sup&gt;, using the covariance matrix to determine the number of factors to retain in a factor analysis is generally a bad idea. &lt;a href=&quot;#fnref:40&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:41&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.tandfonline.com/doi/abs/10.1080/01621459.2020.1825448&quot;&gt;Fan, J., Guo, J., &amp;amp; Zheng, S. (2020). Estimating Number of Factors by Adjusted Eigenvalues Thresholding. Journal of the American Statistical Association, 117(538), 852–861&lt;/a&gt;. &lt;a href=&quot;#fnref:41&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:41:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:41:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:41:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:42&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/14306381/&quot;&gt;Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2), 179–185&lt;/a&gt;. &lt;a href=&quot;#fnref:42&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:43&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://journals.sagepub.com/doi/10.1177/0013164495055003002&quot;&gt;Glorfeld, L. W. (1995). An Improvement on Horn’s Parallel Analysis Methodology for Selecting the Correct Number of Factors to Retain. Educational and Psychological Measurement, 55(3), 377-393&lt;/a&gt;. &lt;a href=&quot;#fnref:43&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:44&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Due to the context of the paper, de Nard&lt;sup id=&quot;fnref:7:18&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;38&lt;/a&gt;&lt;/sup&gt; uses covariance-based spectral clustering. &lt;a href=&quot;#fnref:44&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:45&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Jin et al.&lt;sup id=&quot;fnref:8:13&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;21&lt;/a&gt;&lt;/sup&gt; also uses another method to determine the number of clusters, based on the percentage of variance explained by selecting the top $k$ eigenvalues of the asset correlation matrix. &lt;a href=&quot;#fnref:45&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:46&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;It should be noted, though, that Jin et al.&lt;sup id=&quot;fnref:8:14&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;21&lt;/a&gt;&lt;/sup&gt;’s methodology for computing the asset correlation matrix differs from de Nard&lt;sup id=&quot;fnref:7:19&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;38&lt;/a&gt;&lt;/sup&gt;’s, especially in that Jin et al.&lt;sup id=&quot;fnref:8:15&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;21&lt;/a&gt;&lt;/sup&gt; consider residual asset returns rather than raw asset returns. Such a difference could - and actually, should - influence the computation of the optimal number of clusters, whatever the method used. Nevertheless, my personal experience with de Nard&lt;sup id=&quot;fnref:7:20&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;38&lt;/a&gt;&lt;/sup&gt;’s 1D KDE clustering method is that it is too sensitive to its parameters (kernel and bandwidth) to be confidently used in practice. &lt;a href=&quot;#fnref:46&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:48&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.pm-research.com/content/iijpormgmt/38/3/28&quot;&gt;Lee, Wai, Risk On/Risk Off, The Journal of Portfolio Management  Spring 2012, 38 (3) 28-39&lt;/a&gt;. &lt;a href=&quot;#fnref:48&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:48:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:48:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:48:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:50&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Ten of these ETFs are used in the &lt;em&gt;Adaptive Asset Allocation&lt;/em&gt; strategy from &lt;a href=&quot;https://investresolve.com/&quot;&gt;ReSolve Asset Management&lt;/a&gt;, described in the paper &lt;em&gt;Adaptive Asset Allocation: A Primer&lt;/em&gt;&lt;sup id=&quot;fnref:51&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:51&quot; class=&quot;footnote&quot;&gt;54&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:50&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:52&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;(Adjusted) prices have been retrieved using &lt;a href=&quot;https://api.tiingo.com/&quot;&gt;Tiingo&lt;/a&gt;. &lt;a href=&quot;#fnref:52&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:53&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;My own interpretation is that if one prefers to avoid investing in Intermediate-term U.S. Treasuries, Gold then represents the “closest” risk-off asset. &lt;a href=&quot;#fnref:53&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:15&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://arxiv.org/abs/1208.3145&quot;&gt;Stijn van Dongen, Anton J. Enright, Metric distances derived from cosine similarity and Pearson and Spearman correlations, arXiv&lt;/a&gt;. &lt;a href=&quot;#fnref:15&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:17&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://jpm.pm-research.com/content/42/4/59&quot;&gt;Lopez de Prado, M. (2016). Building diversified portfolios that outperform out-of-sample. Journal of Portfolio Management, 42(4), 59–69&lt;/a&gt;. &lt;a href=&quot;#fnref:17&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:51&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2328254&quot;&gt;Butler, Adam and Philbrick, Mike and Gordillo, Rodrigo and Varadi, David, Adaptive Asset Allocation: A Primer&lt;/a&gt;. &lt;a href=&quot;#fnref:51&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content><author><name>Roman R.</name></author><category term="clustering" /><category term="correlation matrix" /><summary type="html">Clustering consists in trying to identify groups of “similar behavior”1 - called clusters - from a dataset, according to some chosen characteristics. An example of such a characteristic in finance is the correlation coefficient between two time series of asset returns, whose usage to partition a universe of assets into groups of “close” and “distant” assets thanks to a hierarchical clustering method was originally2 proposed in Mantegna3. In this blog post, I will describe two correlation-based clustering methods belonging to the family of spectral clustering methods: the Blockbuster method introduced in Brownlees et al.4 and the SPONGE method introduced in Cucuringu et al.5. As examples of usage, I will discuss 1) how to automatically group U.S. stocks together without relying on external information like industry classification and 2) how to identify risk-on and risk-off assets within a U.S. centric universe of assets. Mathematical preliminaries Let $x_1$, …, $x_n$ be $n$ points6 in $\mathbb{R}^m, m \geq 1$ to be partitioned into $k \geq 2$ subsets. Spectral clustering Spectral clustering, like other approaches to clustering - geometric approaches such as $k$-means , density-based approaches such as DBSCAN… - initially relies on pairwise similarities $s(x_i, x_j)$ between points $i,j=1..n$ with $s$ some similarity function which is symmetric and non-negative1. 
Once the corresponding similarity matrix $S_{ij} = s(x_i, x_j)$, $i,j=1…n$, has been computed, a spectral clustering method then usually follows a three-step process: Compute an affinity matrix $A \in \mathbb{R}^{n \times n}$ from the similarity matrix $S$ The affinity matrix $A$ corresponds to the adjacency matrix of an underlying graph whose vertices represent the points $x_1$, …, $x_n$ and whose edges model the local7 neighborhood relationships between the data points1. Compute a matrix $L \in \mathbb{R}^{n \times n}$ derived from the affinity matrix $A$ Due to the relationship between spectral clustering and graph theory1, the matrix $L$ is typically called a Laplacian matrix. Compute a matrix $Y \in \mathbb{R}^{n \times k}$ derived from the eigenvectors corresponding to the $k$ smallest (or sometimes largest) eigenvalues of the matrix $L$ and cluster its $n$ rows $y_1$, …, $y_n$ using the $k$-means algorithm The result of that clustering represents the desired clustering of the original $n$ points $x_1$, …, $x_n$. It should be emphasized that there is nothing principled about using the $k$-means algorithm in this step1, cf. for example Huang et al.8 in which spectral rotations are used instead of $k$-means, but one can argue that at least the Euclidean distance between the [rows of the matrix $Y$] is a meaningful quantity to look at1. The Ng-Jordan-Weiss spectral clustering method Different ways of computing the affinity matrix $A$, the Laplacian matrix $L$ or the matrix $Y$ lead to different spectral clustering methods9. One popular spectral clustering method is the Ng-Jordan-Weiss (NJW) method10, detailed in Figure 1 taken from Ng et al.10 where $\sigma^2$ is a scaling parameter that controls how rapidly the affinity $A_{ij}$ falls off with the distance between $s_i$ and $s_j$10. Figure 1. Ng-Jordan-Weiss spectral clustering method. Source: Ng et al. Rationale behind spectral clustering At first sight, [spectral clustering] seems to make little sense. 
Since we run $k$-means [on points $y_1$, …, $y_n$], why not just apply $k$-means directly to the [original points $x_1$, …, $x_n$]10? A visual justification of spectral clustering is provided in Figure 2 adapted from Ng et al.10, which displays data points in $\mathbb{R}^2$ forming two circles (e) and their associated spectral representation in $\mathbb{R}^2$ using the NJW spectral clustering method (h). Figure 2. Data points forming two circles, 2D plane and spectral plane. Source: Ng et al. On Figure 2, it is clearly visible that the transformed data points (h) form two well-separated convex clusters in the spectral plane, which is the ideal situation for the $k$-means algorithm. More generally11, it can be demonstrated that the change of representation from the original data points $x_1$, …, $x_n$ to the embedded data points $y_1$, …, $y_n$ enhances the cluster-properties in the data, so that clusters can be trivially detected [by the $k$-means clustering algorithm] in the new representation1. Correlation-based spectral clustering Let $x_1$, …, $x_n$ be $n$ variables whose pairwise correlations12 $\rho_{ij}$, $i,j=1…n$, have been assembled in a correlation matrix $C \in \mathbb{R}^{n \times n}$. Because correlation is a measure of dependency, a legitimate question is whether it is possible to use spectral clustering to partition these variables w.r.t. their correlations. Ideally, we would like to have highly correlated variables grouped together in the same clusters, with low or even negative correlations between the clusters themselves. The problem is, as noted in Mantegna3, the correlation coefficient of a pair of [variables] cannot be used as a distance [or as a similarity] between the two [variables] because it does not fulfill the three axioms that define a metric3, which a priori excludes its direct usage in a similarity matrix or in an affinity matrix. 
That being said, a13 metric14 can be defined using as distance a function of the correlation coefficient3 - like the distance $ d_{ij} = \sqrt{ 2 \left( 1 - \rho_{ij} \right) } $ -, which makes it possible to indirectly use any spectral clustering method15 with correlations. But what if we insist on working directly with correlations? In this case, a couple of specific spectral clustering methods have been developed, among which: The Blockbuster spectral clustering method of Brownlees et al.4 The SPONGE spectral clustering method of Cucuringu et al.5 The Blockbuster spectral clustering method Brownlees et al.4 introduces a network model16 under which large panels of time series are partitioned into latent groups such that correlation is higher within groups than between them4 and proposes an algorithm relying on the eigenvectors of the sample covariance matrix17 $\Sigma \in \mathbb{R}^{n \times n}$ to detect these groups. As can be seen in Figure 3 taken from Brownlees et al.4, this algorithm - called Blockbuster - is surprisingly close to the NJW spectral clustering method. Figure 3. Blockbuster algorithm. Source: Brownlees et al. Brownlees et al.4 establishes that the Blockbuster algorithm consistently detects the [groups] when the number of observations $T$ and the dimension of the panel $n$ are sufficiently18 large4, as long as a couple of assumptions are satisfied, like: $T \geq n$, with the more fat-tailed and dependent the data are, the larger $T$ has to be4 The precision matrix $\Sigma^{-1}$ exists and contains only non-positive entries (or its fraction of positive entries is appropriately controlled19) An important feature of the Blockbuster algorithm is that it allows one to detect the [groups] without estimating the network structure of the data4. In other words, while the Blockbuster algorithm is a spectral clustering method, nowhere is an affinity matrix computed! 
The black magic at work here is that the underlying network model of Brownlees et al.4 additionally assumes that the (symmetric normalized) Laplacian matrix $L$ is a function of the precision matrix $\Sigma^{-1}$ through \[L = \frac{\sigma^2}{\phi} \Sigma^{-1} - \frac{1}{\phi} I_n\] , where: $\sigma^2$ is a network variance parameter that does not influence the detection of the groups $\phi$ is a network dependence parameter that does not influence the detection of the groups $I_n \in \mathbb{R}^{n \times n}$ is the identity matrix of order $n$ This formula clarifies the otherwise mysterious connection20 between the Blockbuster algorithm and the NJW method. The SPONGE spectral clustering method Cucuringu et al.5 extends the spectral clustering framework described in the previous section to the case of a signed affinity matrix $A$ whose underlying complete weighted graph represents the variables $x_1$, …, $x_n$ and their pairwise correlations. Figure 4 taken from Jin et al.21 illustrates the idea of Cucuringu et al.5, which is to minimize the number of violations in the constructed partition, where a violation, as in this figure, is when there are negative edges in a cluster and positive edges across clusters21. Figure 4. Illustration of the idea behind the SPONGE clustering algorithm - minimizing the number of violations in the constructed partition. Source: Jin et al. 
More formally, Cucuringu et al.5 proposes a three-step algorithm22 - called SPONGE (Signed Positive Over Negative Generalized Eigenproblem) - that aims to find a partition [of the underlying graph] into k clusters such that most edges within clusters are positive, and most edges across clusters are negative5: Decompose the adjacency matrix as $A = A^+ - A^-$, with $A^+ \in \mathbb{R}^{n \times n}$, with $A^+_{ij} = A_{ij}$ if $A_{ij} \geq 0$ and $A^+_{ij} = 0$ if $A_{ij} &amp;lt; 0$ $A^- \in \mathbb{R}^{n \times n}$, with $A^-_{ij} = A_{ij}$ if $A_{ij} \leq 0$ and $A^-_{ij} = 0$ if $A_{ij} &amp;gt; 0$ Compute the (unnormalized) positive and negative Laplacian matrices $L^+$ and $L^-$, with $L^+ \in \mathbb{R}^{n \times n}$, with $L^+ = D^+ - A^+$ and $D^+ \in \mathbb{R}^{n \times n}$ a diagonal matrix satisfying $D^+_{ii} = \sum_{j=1}^n A^+_{ij}$ $L^- \in \mathbb{R}^{n \times n}$, with $L^- = D^- - A^-$ and $D^- \in \mathbb{R}^{n \times n}$ a diagonal matrix satisfying $D^-_{ii} = \sum_{j=1}^n A^-_{ij}$ Compute the matrix $Y \in \mathbb{R}^{n \times k}$ made of the eigenvectors corresponding to the $k$ smallest eigenvalues of the matrix $\left( L^- + \tau^+ D^+ \right)^{-1/2} \left( L^+ + \tau^- D^- \right) \left( L^- + \tau^+ D^+ \right)^{-1/2}$, where $\tau^+ &amp;gt; 0$ and $\tau^- &amp;gt; 0$ are regularization parameters23, and cluster its $n$ rows using the $k$-means algorithm Cucuringu et al.5 establishes the consistency24 of the SPONGE algorithm in the case of $k = 2$ equally-sized clusters, provided $\tau^-$ is sufficiently small25 compared to $\tau^+$5 and the number of variables $n$ is sufficiently large26 for a clustering to be recoverable5. In subsequent work, Cucuringu et al.27 establish the consistency24 of a variant of the SPONGE algorithm - called symmetric28 SPONGE - in the case of $k \geq 2$ unequal-sized clusters when $n$ is large enough27. How to choose the number of clusters in correlation-based spectral clustering? 
Choosing the number $k$ of clusters is a general problem for all clustering algorithms, and a variety of more or less successful methods have been devised for this problem1. Correlation-based spectral clustering being 1) a clustering method 2) based on the spectrum of a matrix derived from 3) a correlation matrix, there are at least three families of methods which can be used to determine the “optimal” number of clusters: Generic methods Specific methods tailored to spectral clustering Specific methods tailored to correlation-based clustering or to correlation-based factor analysis However, even if it is mathematically satisfying to find such an optimal number, it is important to keep in mind that just because we find an [optimal] partition […] does not preclude the possibility of other good partitions29. Indeed, as Wang and Rohe29 put it: We must disabuse ourselves of the notion of “the correct partition”. Instead, there are several “reasonable partitions” some of these clusterings might be consistent with one another (as might be imagined in a hierarchical clustering), others might not be consistent. Generic methods Any black box method to select an optimal number of clusters in a clustering algorithm - like the silhouette index30 or the gap statistic31 - can be used in correlation-based spectral clustering. On an opinionated note, though, the elbow criterion should not be used, for reasons detailed in Schubert32. Specific methods tailored to spectral clustering In the context of spectral clustering, several specific methods for determining the optimal number of clusters have been proposed. The most well-known of these methods is the eigengap heuristic1, whose aim is to choose the number $k$ such that all eigenvalues $\lambda_1$,…,$\lambda_k$ [of the Laplacian matrix] are very small, but $\lambda_{k+1}$ is relatively large1. In practice, this method works well if the clusters in the data are very well pronounced1. 
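As an illustration only (this sketch is mine, not code from any of the referenced papers; the function name, the symmetric normalized Laplacian choice, and the toy block-diagonal affinity matrix are all assumptions for the example), the eigengap heuristic can be sketched in a few lines of NumPy:

```python
# Minimal sketch of the eigengap heuristic, assuming a symmetric
# non-negative affinity matrix A and the symmetric normalized Laplacian.
import numpy as np

def eigengap_number_of_clusters(A, k_max=10):
    """Choose k as the position of the largest gap among the smallest
    eigenvalues of the symmetric normalized Laplacian of A."""
    d = A.sum(axis=1)                       # vertex degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D^{-1/2}
    L = np.eye(A.shape[0]) - D_inv_sqrt @ A @ D_inv_sqrt
    eigvals = np.sort(np.linalg.eigvalsh(L))[:k_max]
    gaps = np.diff(eigvals)                 # lambda_{i+1} - lambda_i
    return int(np.argmax(gaps)) + 1

# Toy affinity matrix: two perfectly separated blocks of 3 points each
A = np.kron(np.eye(2), np.ones((3, 3)))
print(eigengap_number_of_clusters(A))  # prints 2
```

On this idealized block-diagonal example the first two Laplacian eigenvalues are exactly zero and the rest equal one, so the largest gap sits after the second eigenvalue and the heuristic recovers $k = 2$.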
Nevertheless, the more noisy or overlapping the clusters are, the less effective is this heuristic1, which is obviously a problem for financial applications33. Furthermore, just because there is a gap, it doesn’t mean that the rest of the eigenvectors are noise29. Another popular method is the method used in the self-tuning spectral clustering algorithm of Zelnik-Manor and Perona34, which selects the optimal number of clusters as the number that “best” aligns the (rows of the) eigenvectors of the Laplacian matrix with the vectors of the canonical basis of $\mathbb{R}^{k}$. Specific methods tailored to correlation-based clustering or to correlation-based factor analysis Correlation-based clustering, whether spectral or not, revolves around a very special object - a correlation matrix -, whose properties can be used to find the optimal number of clusters. Random matrix theory-based methods From random matrix theory, the distribution of the eigenvalues of a large random correlation matrix follows a particular distribution - called the Marchenko-Pastur distribution - independent35 of the underlying observations. That distribution involves a threshold $\lambda_{+} \geq 0$ beyond which it is not expected to find any eigenvalue in a random correlation matrix. Laloux et al.36 uses this threshold to define a correlation matrix denoising procedure called the eigenvalues clipping method, cf. a previous blog post. In the context at hand, Jin et al.21 proposes to determine the optimal number of clusters as the number of eigenvalues of the correlation matrix that exceed [this threshold], which are the eigenvalues associated with dominant factors or patterns in the [original data]21. To illustrate this method, Figure 5, taken from Laloux et al.36, depicts two different Marchenko-Pastur distributions fitted to a correlation matrix of37 406 stocks belonging to the S&amp;amp;P 500 index. Figure 5. 
Smoothed density of the eigenvalues of the correlation matrix of 406 assets belonging to the S&amp;amp;P 500 and two fitted Marchenko-Pastur distributions, 1991 – 1996. Source: Laloux et al. From this figure, the number of clusters corresponding to the method of Jin et al.21 when applied to the dotted Marchenko-Pastur distribution would be around 15. Non-linear shrinkage-based methods De Nard38 notes that non-linear shrinkage [of the sample covariance matrix] pushes the sample eigenvalues toward their closest and most numerous neighbors, thus toward (local) cluster38. As a consequence, it should be possible to use the information from non-linear shrinkage — the number of jumps in the shrunk eigenvalues — or directly the distribution of the sample eigenvalues — the number of sample eigenvalue clusters — to obtain38 the optimal number of clusters. This method is illustrated in Figure 6 taken from de Nard38, on which 2 clusters are visible. Figure 6. Distribution of the sample eigenvalues of a covariance matrix and optimal number of groups corresponding to the centroids of that distribution (0.8 and 2). Source: de Nard. From a practical perspective, de Nard38 suggests using 1D KDE clustering in order to determine the number of sample eigenvalue clusters. Factor analysis-based methods Although the ultimate goal of cluster analysis and factor analysis is different, the underlying logic of both techniques is dimension reduction (i.e., summarizing information on multiple variables into just a few variables)39. Based on this similarity, de Nard38 discusses the eigenvalue ratio estimator40 of Ahn and Horenstein40 that consists in maximizing the ratio of two adjacent eigenvalues [of the sample correlation matrix41] to determine the number of factors (here clusters)38 in economic or financial data40. Unfortunately, the eigenvalue ratio estimator often cannot identify any cluster [in a large universe of U.S. 
stocks] and sets $k = 1$38, which corresponds to the situation described in Ahn and Horenstein40 where one factor [- here, the market factor -] has extremely strong explanatory power for response variables40. In such a situation, the growth ratio estimator40 of Ahn and Horenstein40 - this time maximizing the ratio of the logarithmic growth rate of two consecutive eigenvalues to determine the number of factors - should be used instead, because it empirically appears to be able to mitigate the effect of the dominant factor40. Yet another estimator of the number of common factors is the adjusted correlation thresholding estimator42 of Fan et al.42, which determines the number of common factors as the number of eigenvalues greater than 1 of the population correlation matrix […], taking into account the sampling variabilities and biases of top sample eigenvalues42. Implementation in Portfolio Optimizer Portfolio Optimizer implements correlation-based spectral clustering through the endpoint /assets/clustering/spectral/correlation-based. This endpoint supports three different methods: The Blockbuster spectral clustering method The SPONGE spectral clustering method The symmetric SPONGE spectral clustering method (default) As for the number of clusters to use, this endpoint supports: A manually-defined number of clusters An automatically-determined number of clusters, computed through a proprietary variation of Horn’s parallel analysis method43 (default) As a side note, Horn’s parallel analysis method seems to be little known in finance, but numerous studies [in psychology] have consistently shown that [it] is the most nearly accurate methodology for determining the number of factors to retain in an exploratory factor analysis44. Examples of usage Automated clustering of U.S. stocks De Nard38 and Jin et al.21 both study the automated clustering of a dynamic universe of U.S. 
stocks through correlation-based45 spectral clustering: De Nard38, in the context of covariance matrix shrinkage It uses the Blockbuster method, together with a 1D KDE clustering method to automatically determine the number of clusters. Jin et al.21, in the context of the construction of statistical arbitrage portfolios It uses the SPONGE and SPONGE symmetric methods, together with a Marchenko-Pastur distribution-based method46 to automatically determine the number of clusters. An important remark at this point. When clustering stocks, it is possible to rely on what de Nard38 calls external information38, for example industry classifications like: The Standard Industrial Classification The MSCI Global Industry Classification Standard Nevertheless, such an external information may fail to create (a valid number of) homogeneous groups38. In the words of de Nard38: For example, if we cluster the covariance matrix into groups of financial and nonfinancial firms, arguably, there will be some nonfinancial firm(s) that has (have) some stock characteristics more similar to the financial stocks as to the non-financial stocks and is (are) therefore misclassified. Especially in large dimension one would expect a few of such misclassifications in both directions. To overcome this misclassification problem and to really create homogeneous groups, [a data-driven procedure should be used instead]. Results of de Nard38 and Jin et al.21 are the following: Within a universe of 500 stocks over the period January 1998 to December 2017, de Nard38 finds that the number of clusters is limited to only 1 or 2 about 75% of the time. Within a universe of ~600 stocks over the period January 2000 to December 2022, Jin et al.21 finds that the number of clusters is relatively stable around 19, even though it tends to drop during financial hardships of the United States21 down to a minimum of 8, as displayed in Figure 7 adapted from Jin et al.21. Figure 7. 
Evolution of the number of clusters found by a Marchenko-Pastur distribution-based method, January 2000 - December 2022. Source: Jin et al. These diverging results on closely related data sets show that different methods to determine the number of clusters to use in clustering analysis might be completely at odds with one another47. But these results also show that the “optimal” number of clusters - whatever it is - is not constant over time and that methods that dynamically determine [it] can capture changes in market dynamics, especially when there is significant downside risks in the market21. Incidentally, this is another argument in favor of not relying on (slowly evolving) external information to determine the number of clusters to use. Identification of risk-on/risk-off assets in a U.S.-centric universe of assets In April 2025, “risk on/risk off” has been making the headlines as a phrase describing investment and asset price behavior48. Lee48 summarizes the underlying meaning as follows48: […] risk on/risk off generally refers to an investment environment in which asset price behavior is largely driven by how risk appetite advances or retreats over time, usually in a synchronized way at a faster than normal pace across global regions and assets. And describes it in more details as follows48: Depending on the environment, investors tend to buy or sell risky assets across the board, paying less attention to the unique characteristics of these assets. Volatilities and, most noticeably, correlations of assets that are perceived as risky will jump, particularly during risk-off periods, inspiring comments such as “correlations go to one” during a crisis. Assets, such as U.S. Treasury bonds, and currencies, such as the Japanese yen, tend to move in the opposite direction of risky assets and are generally perceived as the safer assets to hold in the event of a flight to safety. 
In this sub-section, I propose to study the capability of correlation-based spectral clustering to identify risk-on/risk-off assets in a U.S.-centric universe of assets. For this: I will use as a universe of assets 13 ETFs representative49 of misc. asset classes: U.S. stocks (SPY ETF) European stocks (EZU ETF) Japanese stocks (EWJ ETF) Emerging markets stocks (EEM ETF) U.S. REITs (VNQ ETF) International REITs (RWX ETF) Cash (SHY ETF) U.S. 7-10 year Treasuries (IEF ETF) U.S. 20+ year Treasuries (TLT ETF) U.S. Investment Grade Corporate Bonds (LQD ETF) U.S. High Yield Corporate Bonds (HYG ETF) Commodities (DBC ETF) Gold (GLD ETF) I will compute the correlation matrix of the daily returns of these assets over the period 1st April 2025 - 30th April 202550 Figure 8 displays that correlation matrix using a planar t-SNE representation of its rows, considered as points in $\mathbb{R}^{13}$. Figure 8. t-SNE representation of the asset correlation matrix, April 2025. To be noted that the t-SNE representation of Figure 8 is a bit misleading in the context of spectral clustering, because spectral clustering does not operate in the t-SNE plane. Still, it is a helpful representation to give a sense of asset “closeness” in terms of correlations. I will cluster these assets using the symmetric SPONGE spectral clustering method with $k = 2$ and $k = 3$ clusters, the rationale being that: All the assets clustered together with Cash should correspond to risk-off assets ($k = 2,3$) All the assets clustered together with U.S. stocks should correspond to risk-on assets ($k = 2,3$) All the assets clustered in the remaining cluster should correspond to another category to be determined ($k = 3$ only) Figures 9 and 10 show the resulting clusterings. Figure 9. Symmetric SPONGE clustering of the asset correlation matrix, two clusters, April 2025. Figure 10. Symmetric SPONGE clustering of the asset correlation matrix, three clusters, April 2025.
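As a concrete, if highly simplified, illustration of how clustering a correlation matrix can separate risk-on from risk-off assets, here is a minimal two-cluster spectral partition in Python. This is a sketch only: it uses a plain unnormalized Laplacian on a correlation-derived similarity rather than the symmetric SPONGE method used above, and the 4-asset correlation matrix is invented for illustration.

```python
import numpy as np

def two_way_spectral_cut(similarity):
    """Minimal 2-cluster spectral partition: build the unnormalized
    Laplacian L = D - S and split assets by the sign of the eigenvector
    associated with the second-smallest eigenvalue (the Fiedler vector)."""
    s = np.asarray(similarity, dtype=float)
    laplacian = np.diag(s.sum(axis=1)) - s
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    fiedler = eigenvectors[:, 1]  # eigh sorts eigenvalues in ascending order
    return (fiedler > 0).astype(int)

# Toy correlation matrix: two "risk-on" assets highly correlated with each
# other, two "risk-off" assets likewise, negative correlation across groups
corr = np.array([
    [ 1.0,  0.8, -0.3, -0.4],
    [ 0.8,  1.0, -0.2, -0.3],
    [-0.3, -0.2,  1.0,  0.7],
    [-0.4, -0.3,  0.7,  1.0],
])

# Vanilla spectral clustering needs a non-negative similarity;
# (1 + corr) / 2 maps correlations from [-1, 1] to [0, 1]
similarity = (1.0 + corr) / 2.0
labels = two_way_spectral_cut(similarity)
# The first two assets should fall in one cluster, the last two in the other
```

Signed methods such as SPONGE avoid this ad hoc mapping of negative correlations to small positive similarities, which is precisely why they are better suited to correlation matrices.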
From these figures, and over the considered period: Intermediate-term U.S. Treasuries is found to be a risk-off asset when $k = 2$ Gold is found to be a risk-off asset when $k = 3$, with Intermediate-term and Long-term U.S. Treasuries now moved into their own cluster The evolution of the classification of Gold as a risk-off asset is particularly interesting51 and also serves to illustrate one of the limitations of the t-SNE representation of the asset correlation matrix. Indeed, from that representation, it makes no sense that Cash and Gold could belong to the same cluster since they are complete opposites in the t-SNE plane! But in reality, if the t-SNE plane is diagonally “folded”, Cash and Gold truly are close neighbors… Conclusion This (abruptly?) concludes this already too long overview of correlation-based spectral clustering. Waiting for the next blog post on correlation-based clustering, feel free to connect with me on LinkedIn or to follow me on Twitter. – See von Luxburg, U. A tutorial on spectral clustering. Stat Comput 17, 395–416 (2007). &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 &amp;#8617;6 &amp;#8617;7 &amp;#8617;8 &amp;#8617;9 &amp;#8617;10 &amp;#8617;11 &amp;#8617;12 See R. N. Mantegna, H. E. Stanley, Introduction to econophysics: correlations and complexity in finance, Cambridge university press, 1999. &amp;#8617; See Mantegna, R.N. Hierarchical structure in financial markets. Eur. Phys. J. B 11, 193–197 (1999). &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 See Brownlees, C., Guðmundsson, G. S., &amp;amp; Lugosi, G. (2020). Community Detection in Partial Correlation Network Models. Journal of Business &amp;amp; Economic Statistics, 40(1), 216–226. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 &amp;#8617;6 &amp;#8617;7 &amp;#8617;8 &amp;#8617;9 &amp;#8617;10 See Mihai Cucuringu, Peter Davies, Aldo Glielmo, and Hemant Tyagi. SPONGE: A generalized eigenproblem for clustering signed networks. 
In Artificial Intelligence and Statistics, volume 89 of Proceedings of Machine Learning Research, pages 1088–1098. PMLR, 2019. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 &amp;#8617;6 &amp;#8617;7 &amp;#8617;8 &amp;#8617;9 $x_1, …, x_n$ can be more general “objects”, as long as a distance between these objects is defined. &amp;#8617; As opposed to their global relationships that are already available from the similarity matrix $S$. &amp;#8617; See Jin Huang, Feiping Nie, and Heng Huang. 2013. Spectral rotation versus K-means in spectral clustering. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence (AAAI’13). AAAI Press, 431–437. &amp;#8617; The similarity function $s$ is also very important in spectral clustering, although ultimately, the choice of the similarity function depends on the domain the data comes from, and no general advice can be given1. &amp;#8617; See Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. 2001. On spectral clustering: analysis and an algorithm. In Proceedings of the 15th International Conference on Neural Information Processing Systems: Natural and Synthetic (NIPS’01). MIT Press, Cambridge, MA, USA, 849–856. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 Thanks to the properties of the Laplacian matrix $L$. &amp;#8617; Typically the Pearson correlation, but other measures like the Spearman correlation or the Gerber correlation can also be used as long as the resulting correlation matrix $C$ is a valid correlation matrix. &amp;#8617; It is actually possible to define a whole family of metrics as functions of the Pearson or Spearman correlation coefficient, cf. van Dongen and Enright52. &amp;#8617; Such correlation-based metrics are also used elsewhere in finance, for example in de Prado’s53 Hierarchical Risk Parity portfolio optimization algorithm.
&amp;#8617; To be noted that depending on the exact spectral clustering method used, it might be required to first convert that distance to a similarity measure. &amp;#8617; Called the Generalised Stochastic Block Model in Brownlees et al.4, which is an extension of the vanilla stochastic block model. &amp;#8617; Or of the sample correlation matrix, since a correlation matrix is a covariance matrix of a special kind. &amp;#8617; Both the simulation study and the empirical application in Brownlees et al.4 show that the Blockbuster algorithm already performs satisfactorily4 with $n = 50$, depending on the strength of the correlations; “sufficiently” large is thus not necessarily “very” large, that is, the Blockbuster algorithm seems to be well-behaved in finite sample. &amp;#8617; The definition of “appropriately controlled” is left for future research in Brownlees et al.4. &amp;#8617; In particular, this formula explains why the $k$ “largest” eigenvectors of the matrix $\Sigma$ are extracted by the Blockbuster algorithm - it is because they correspond to the $k$ “smallest” eigenvectors of the matrix $\Sigma^{-1}$ (and of the matrix $L$). &amp;#8617; See Qi Jin, Mihai Cucuringu, and Álvaro Cartea. 2023. Correlation Matrix Clustering for Statistical Arbitrage Portfolios. In Proceedings of the Fourth ACM International Conference on AI in Finance (ICAIF ‘23). Association for Computing Machinery, New York, NY, USA, 557–564. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 &amp;#8617;6 &amp;#8617;7 &amp;#8617;8 &amp;#8617;9 &amp;#8617;10 &amp;#8617;11 &amp;#8617;12 A Python implementation of the SPONGE and symmetric SPONGE algorithms is available at https://github.com/alan-turing-institute/signet. &amp;#8617; These two regularization parameters aim to promote clusterings that avoid small-sized clusters21. &amp;#8617; Under a signed stochastic block model5, which is an extension of the vanilla stochastic block model.
&amp;#8617; &amp;#8617;2 In numerical experiments, Cucuringu et al.5 uses $\tau^+$ = $\tau^-$ = 1, which nearly always falls within the region of maximum recovery when it is present5. &amp;#8617; In numerical experiments, Cucuringu et al.5 uses $n$ ranging from 50 to 11259. &amp;#8617; See Mihai Cucuringu, Apoorv Vikram Singh, Deborah Sulem, and Hemant Tyagi. 2021. Regularized spectral methods for clustering signed networks. Journal of Machine Learning Research 22, 264 (2021), 1–79. &amp;#8617; &amp;#8617;2 The symmetric SPONGE algorithm uses symmetric positive and negative Laplacian matrices $L^+_{sym} = \left( D^+ \right)^{-1/2} L^+ \left( D^+ \right)^{-1/2}$ and $L^-_{sym} = \left( D^- \right)^{-1/2} L^- \left( D^- \right)^{-1/2}$ instead of unnormalized ones and the $k$ smallest eigenvalues of the matrix $\left( L^-_{sym} + \tau^+ I_n \right)^{-1/2} \left( L^+_{sym} + \tau^- I_n \right) \left( L^-_{sym} + \tau^+ I_n \right)^{-1/2}$, c.f. Cucuringu et al.5. &amp;#8617; See Song Wang. Karl Rohe. “Discussion of “Coauthorship and citation networks for statisticians”.” Ann. Appl. Stat. 10 (4) 1820 - 1826, December 2016. &amp;#8617; &amp;#8617;2 &amp;#8617;3 See Peter J. Rousseeuw (1987). Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis. Computational and Applied Mathematics. 20: 53–65. &amp;#8617; See Tibshirani, R., Walther, G., and Hastie, T. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63, 2 (2001), 411–423. &amp;#8617; See Erich Schubert. 2023. Stop using the elbow criterion for k-means and how to choose the number of clusters instead. SIGKDD Explor. Newsl. 25, 1 (June 2023), 36–42. &amp;#8617; For example, de Nard38 finds that for the estimation problem of asset return covariance matrices, the eigengap heuristic often cannot identify any cluster and sets $k=1$38. &amp;#8617; See Zelnik-Manor, L. and P. Perona (2004). 
Self-tuning spectral clustering. In Advances in neural information processing systems, pp. 1601–1608. &amp;#8617; Under suitable technical assumptions. &amp;#8617; See Laurent Laloux, Pierre Cizeau, Jean-Philippe Bouchaud, and Marc Potters, Noise Dressing of Financial Correlation Matrices, Phys. Rev. Lett. 83, 1467. &amp;#8617; &amp;#8617;2 The returns of. &amp;#8617; See Gianluca de Nard, Oops! I Shrunk the Sample Covariance Matrix Again: Blockbuster Meets Shrinkage, Journal of Financial Econometrics, Volume 20, Issue 4, Fall 2022, Pages 569–611. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 &amp;#8617;6 &amp;#8617;7 &amp;#8617;8 &amp;#8617;9 &amp;#8617;10 &amp;#8617;11 &amp;#8617;12 &amp;#8617;13 &amp;#8617;14 &amp;#8617;15 &amp;#8617;16 &amp;#8617;17 &amp;#8617;18 See Hedwig Hofstetter, Elise Dusseldorp, Pepijn van Empelen, Theo W.G.M. Paulussen, A primer on the use of cluster analysis or factor analysis to assess co-occurrence of risk behaviors, Preventive Medicine, Volume 67, 2014, Pages 141-146. &amp;#8617; See Ahn, S., and A. Horenstein. 2013. Eigenvalue Ratio Test for the Number of Factors. Econometrica 80: 1203–1227. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 &amp;#8617;6 &amp;#8617;7 &amp;#8617;8 The eigenvalue ratio estimator and the growth ratio estimator of Ahn and Horenstein40 both rely on the covariance matrix, of which the correlation matrix is a special kind. In addition, for reasons detailed in Fan et al.42, using the covariance matrix to determine the number of factors to retain in a factor analysis is generally a bad idea. &amp;#8617; See Fan, J., Guo, J., &amp;amp; Zheng, S. (2020). Estimating Number of Factors by Adjusted Eigenvalues Thresholding. Journal of the American Statistical Association, 117(538), 852–861. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 See Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2), 179–185.
&amp;#8617; See Glorfeld, L. W. (1995). An Improvement on Horn’s Parallel Analysis Methodology for Selecting the Correct Number of Factors to Retain. Educational and Psychological Measurement, 55(3), 377-393. &amp;#8617; Due to the context of the paper, de Nard38 uses covariance-based spectral clustering. &amp;#8617; Jin et al.21 also uses another method to determine the number of clusters, based on the percentage of variance explained by selecting the top $k$ eigenvalues of the asset correlation matrix. &amp;#8617; To be noted, though, that Jin et al.21’s methodology to compute the asset correlation matrix is different from de Nard38’s, especially in that Jin et al.21 consider residual asset returns and not raw asset returns. Such a difference could - and actually, should - influence the computation of the optimal number of clusters, whatever the method used. Nevertheless, my personal experience with de Nard38’s 1D KDE clustering method is that it is anyway too sensitive to its parameters (kernel and bandwidth) to be confidently used in practice. &amp;#8617; See Lee, Wai, Risk On/Risk Off, The Journal of Portfolio Management Spring 2012, 38 (3) 28-39. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 10 of these ETFs are used in the Adaptive Asset Allocation strategy from ReSolve Asset Management, described in the paper Adaptive Asset Allocation: A Primer54. &amp;#8617; (Adjusted) prices have been retrieved using Tiingo. &amp;#8617; My own interpretation is that if one would prefer to avoid investing in Intermediate-term U.S. Treasuries, Gold would then represent the “closest” risk-off asset. &amp;#8617; See Stijn van Dongen, Anton J. Enright, Metric distances derived from cosine similarity and Pearson and Spearman correlations, arXiv. &amp;#8617; Lopez de Prado, M. (2016). Building diversified portfolios that outperform out-of-sample. Journal of Portfolio Management, 42(4), 59–69.
&amp;#8617; See Butler, Adam and Philbrick, Mike and Gordillo, Rodrigo and Varadi, David, Adaptive Asset Allocation: A Primer. &amp;#8617;</summary></entry><entry><title type="html">Volatility Forecasting: HExp Model</title><link href="https://portfoliooptimizer.io/blog/volatility-forecasting-hexp-model/" rel="alternate" type="text/html" title="Volatility Forecasting: HExp Model" /><published>2025-03-09T00:00:00-06:00</published><updated>2025-03-09T00:00:00-06:00</updated><id>https://portfoliooptimizer.io/blog/volatility-forecasting-hexp-model</id><content type="html" xml:base="https://portfoliooptimizer.io/blog/volatility-forecasting-hexp-model/">&lt;p&gt;In this series on volatility forecasting, I &lt;a href=&quot;/blog/volatility-forecasting-har-model/&quot;&gt;previously&lt;/a&gt; detailed the &lt;em&gt;Heterogeneous AutoRegressive (HAR)&lt;/em&gt; volatility forecasting model that &lt;em&gt;has become the workhorse of the volatility forecasting literature&lt;/em&gt;&lt;sup id=&quot;fnref:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; since its introduction by Corsi&lt;sup id=&quot;fnref:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;I will now describe an extension of that model due to Bollerslev et al.&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;, called the &lt;em&gt;Heterogeneous Exponential (HExp)&lt;/em&gt; volatility forecasting model, in which the lagged HAR volatility components are exponentially - rather than arithmetically - averaged.&lt;/p&gt;

&lt;p&gt;In addition, I will also discuss the panel-based estimation procedure for the HExp and the HAR model parameters proposed in Bollerslev et al.&lt;sup id=&quot;fnref:2:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;, which is empirically demonstrated&lt;sup id=&quot;fnref:2:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; to improve the out-of-sample forecasting performance of these two volatility forecasting models when compared to the standard&lt;sup id=&quot;fnref:3:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; individual asset-based procedure.&lt;/p&gt;

&lt;p&gt;Finally, I will illustrate the practical performance of the HExp volatility forecasting model and its panel-based parameter estimation procedure in the context of monthly volatility forecasting for various ETFs.&lt;/p&gt;

&lt;h2 id=&quot;mathematical-preliminaries-reminders&quot;&gt;Mathematical preliminaries (reminders)&lt;/h2&gt;

&lt;p&gt;This section contains reminders from the &lt;a href=&quot;/blog/volatility-forecasting-simple-and-exponentially-weighted-moving-average-models/&quot;&gt;first blog post&lt;/a&gt; of this series.&lt;/p&gt;

&lt;h3 id=&quot;volatility-modelling-and-volatility-proxies&quot;&gt;Volatility modelling and volatility proxies&lt;/h3&gt;

&lt;p&gt;Let $r_t$ be the &lt;a href=&quot;https://en.wikipedia.org/wiki/Rate_of_return#Logarithmic_or_continuously_compounded_return&quot;&gt;logarithmic&lt;/a&gt; return of an asset over a time period $t$ (a day, a week, a month…), over which its (conditional) mean return is assumed to be zero.&lt;/p&gt;

&lt;p&gt;Then:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;The asset (conditional) variance is defined as $ \sigma_t^2 = \mathbb{E} \left[ r_t^2 \right] $&lt;/p&gt;

    &lt;p&gt;From this definition, the squared return $r_t^2$ of an asset is a (noisy&lt;sup id=&quot;fnref:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;) &lt;em&gt;variance estimator&lt;/em&gt; - or &lt;em&gt;variance proxy&lt;/em&gt;&lt;sup id=&quot;fnref:6:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; - for that asset variance over the considered time period.&lt;/p&gt;

    &lt;p&gt;Another example of an asset variance proxy is &lt;a href=&quot;/blog/range-based-volatility-estimators-overview-and-examples-of-usage/&quot;&gt;the Parkinson range&lt;/a&gt; of an asset.&lt;/p&gt;

&lt;p&gt;Yet another example of an asset variance proxy, this time over a specific time period $t$ of one day, is the &lt;em&gt;daily realized variance&lt;/em&gt; $RV_t$, which is defined as the sum of the asset squared intraday returns sampled at a high frequency (1 minute, 5 minutes, 15 minutes…).&lt;/p&gt;

    &lt;p&gt;The generic notation for an asset variance proxy in this blog post is $\tilde{\sigma}_t^2$.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The asset (conditional) volatility is defined as $ \sigma_t = \sqrt { \sigma_t^2 } $&lt;/p&gt;

    &lt;p&gt;The generic notation for an asset volatility proxy in this blog post is $\tilde{\sigma}_t$.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;weighted-moving-average-volatility-forecasting-model&quot;&gt;Weighted moving average volatility forecasting model&lt;/h3&gt;

&lt;p&gt;Boudoukh et al.&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt; shows that many seemingly different methods of volatility forecasting actually share the same underlying representation of the estimate of an asset next period’s 
variance $\hat{\sigma}_{T+1}^2$ as a weighted moving average of that asset past periods’ variance proxies $\tilde{\sigma}^2_t$, $t=1..T$, with&lt;/p&gt;

\[\hat{\sigma}_{T+1}^2 =  w_0 + \sum_{i=1}^{k} w_i \tilde{\sigma}^2_{T+1-i}\]

&lt;p&gt;, where:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;$k$, with $1 \leq k \leq T$, is the size of the moving average, possibly time-dependent&lt;/li&gt;
  &lt;li&gt;$w_i, i=0..k$ are the weights of the moving average, possibly time-dependent as well&lt;/li&gt;
&lt;/ul&gt;
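This common representation can be sketched in a few lines of Python (a minimal illustration, with variable names of my own choosing); the simple moving average and the RiskMetrics-style EWMA are two classic special cases of the weights $w_i$:

```python
import numpy as np

def wma_variance_forecast(variance_proxies, weights, w0=0.0):
    """Weighted moving average forecast of next period's variance:
    sigma2_hat = w0 + sum over i of w_i * proxy_{T+1-i}."""
    proxies = np.asarray(variance_proxies, dtype=float)
    w = np.asarray(weights, dtype=float)
    k = len(w)
    # proxy_{T+1-i}, i=1..k, i.e. the last k proxies in reverse time order
    return w0 + float(w @ proxies[-k:][::-1])

proxies = [0.01, 0.02, 0.015, 0.03]  # made-up daily variance proxies

# Simple moving average over the last k = 3 proxies: w_i = 1/3
sma = wma_variance_forecast(proxies, [1/3, 1/3, 1/3])

# RiskMetrics-style EWMA with decay 0.94 over all available proxies
lam, k = 0.94, len(proxies)
w = (1 - lam) * lam ** np.arange(k)
ewma = wma_variance_forecast(proxies, w / w.sum())
```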

&lt;h3 id=&quot;the-original-har-volatility-forecasting-model&quot;&gt;The original HAR volatility forecasting model&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;/blog/volatility-forecasting-har-model/&quot;&gt;The HAR volatility forecasting model&lt;/a&gt; is &lt;em&gt;an additive cascade model of different volatility components&lt;/em&gt;&lt;sup id=&quot;fnref:3:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; subject to &lt;em&gt;economically meaningful restrictions&lt;/em&gt;&lt;sup id=&quot;fnref:3:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Under that model, an asset next day’s daily realized variance $RV_{T+1}$ is forecasted through the formula&lt;sup id=&quot;fnref:6:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

\[\hat{RV}_{T+1} = \beta + \beta_d RV_{T} + \beta_w RV_{T}^w + \beta_m RV_{T}^m\]

&lt;p&gt;, where:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;$\hat{RV}_{T+1}$ is the forecast at time $T$ of the asset next day’s daily realized variance $RV_{T+1}$&lt;/li&gt;
  &lt;li&gt;$RV_T$ is the asset daily realized variance at time $T$&lt;/li&gt;
  &lt;li&gt;$RV_T^w = \frac{1}{5} \sum_{i=1}^5 RV_{T-i+1}$ is the asset weekly realized variance at time $T$&lt;/li&gt;
  &lt;li&gt;$RV_T^m = \frac{1}{22} \sum_{i=1}^{22} RV_{T-i+1}$ is the asset monthly realized variance at time $T$&lt;/li&gt;
  &lt;li&gt;$\beta$, $\beta_d$, $\beta_w$ and $\beta_m$ are the HAR model parameters, to be determined&lt;/li&gt;
&lt;/ul&gt;
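As a minimal sketch of the formula above (the $\beta$ parameters would in practice be estimated by OLS; the values used here are placeholders):

```python
import numpy as np

def har_forecast(rv, beta, beta_d, beta_w, beta_m):
    """One-step-ahead HAR forecast of the daily realized variance.
    rv: array of daily realized variances up to time T (rv[-1] = RV_T)."""
    rv = np.asarray(rv, dtype=float)
    rv_d = rv[-1]           # daily component RV_T
    rv_w = rv[-5:].mean()   # weekly component, 5-day average
    rv_m = rv[-22:].mean()  # monthly component, 22-day average
    return beta + beta_d * rv_d + beta_w * rv_w + beta_m * rv_m

# Illustrative parameter values on a made-up realized variance series
rv_series = np.full(30, 0.01)
forecast = har_forecast(rv_series, 0.0, 0.4, 0.3, 0.3)
```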

&lt;h2 id=&quot;the-hexp-volatility-forecasting-model&quot;&gt;The HExp volatility forecasting model&lt;/h2&gt;

&lt;h3 id=&quot;discontinuity-of-the-har-volatility-forecasting-model&quot;&gt;Discontinuity of the HAR volatility forecasting model&lt;/h3&gt;

&lt;p&gt;Under the HAR volatility forecasting model, &lt;em&gt;forecasted future volatilities depends on the past volatilities in a way that is [dis]continuous […] in the lag lengths&lt;/em&gt;&lt;sup id=&quot;fnref:2:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; due to the presence of &lt;a href=&quot;/blog/volatility-forecasting-simple-and-exponentially-weighted-moving-average-models/&quot;&gt;simple moving averages&lt;/a&gt;, 
which might lead to &lt;em&gt;potential variance estimation issues&lt;/em&gt;&lt;sup id=&quot;fnref:2:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;As noted by Bollerslev et al.&lt;sup id=&quot;fnref:2:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;The stepwise nature of the volatility factors employed in the HAR models, imply that the forecasts from the models are subject to potentially abrupt changes as an unusually large/small daily lagged [realized variance] drops out of the sums for the longer-horizon lagged volatility factors.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Figure 1, adapted from Bollerslev et al.&lt;sup id=&quot;fnref:2:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;, illustrates &lt;em&gt;the lag coefficients implied by the regression coefficients&lt;/em&gt;&lt;sup id=&quot;fnref:2:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; of the HAR model together with those of a 21-day simple moving average model.&lt;/p&gt;

&lt;figure&gt;
	&lt;a href=&quot;/assets/images/blog/volatility-forecasting-har-weights-bollerslev.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/volatility-forecasting-har-weights-bollerslev-small.png&quot; alt=&quot;Lag coefficients of the HAR and of the 21-day simple moving average volatility forecasting models. Source: Bollerslev et al.&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Figure 1. Lag coefficients of the HAR and of the 21-day simple moving average volatility forecasting models. Source: Bollerslev et al.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;The discontinuity of the HAR model at the 1-day, 5-day and 21-day lags&lt;sup id=&quot;fnref:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt; is apparent, similar in spirit to the discontinuity of the 21-day simple moving average model at the 21-day lag.&lt;/p&gt;
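The stepwise lag coefficients behind Figure 1 can be reproduced with a few lines of Python; the $\beta$ values below are illustrative only, and the 22-day monthly window follows the HAR definition recalled above:

```python
# Implied lag coefficients of the HAR model: each lagged daily RV enters
# through the daily, weekly (5-day) and monthly (22-day) averages, so its
# total weight steps down abruptly after lags 1, 5 and 22.
beta_d, beta_w, beta_m = 0.4, 0.3, 0.25  # illustrative values

def har_lag_weight(i):
    """Weight of RV_{T+1-i} in the HAR forecast of RV_{T+1}."""
    return (beta_d * (i == 1)
            + beta_w / 5 * (i in range(1, 6))
            + beta_m / 22 * (i in range(1, 23)))

weights = [har_lag_weight(i) for i in range(1, 26)]
# weights drops to zero beyond the 22-day lag
```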

&lt;h3 id=&quot;the-original-hexp-volatility-forecasting-model&quot;&gt;The original HExp volatility forecasting model&lt;/h3&gt;

&lt;p&gt;In order to &lt;em&gt;avoid the stepwise changes inherent in the forecast from the HAR component-type structure&lt;/em&gt;&lt;sup id=&quot;fnref:2:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;, Bollerslev et al.&lt;sup id=&quot;fnref:2:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; proposes to replace the simple moving averages appearing in the HAR model by &lt;a href=&quot;/blog/volatility-forecasting-simple-and-exponentially-weighted-moving-average-models/&quot;&gt;exponentially weighted moving averages&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Under the resulting volatility forecasting model - denoted &lt;em&gt;the Heterogeneous Exponential realized volatility model (HExp for short)&lt;/em&gt;&lt;sup id=&quot;fnref:2:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; - an asset next day’s daily realized variance $RV_{T+1}$ is forecasted through the formula&lt;sup id=&quot;fnref:2:11&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;&lt;sup id=&quot;fnref:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;7&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

\[\hat{RV}_{T+1} = \beta + \beta_d ExpVP_T^{\lambda(1)} + \beta_w ExpVP_T^{\lambda(5)} + \beta_m ExpVP_T^{\lambda(25)} + \beta_h ExpVP_T^{\lambda(125)}\]

&lt;p&gt;, where:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;$\hat{RV}_{T+1}$ is the forecast at time $T$ of the asset next day’s daily realized variance $RV_{T+1}$&lt;/li&gt;
  &lt;li&gt;$ExpVP_T^{\lambda(CoM)}$ $=$ $\sum_{i=1}^T \frac{e^{-i \lambda(CoM)}}{\sum_{j=1}^T e^{-j \lambda(CoM)}} RV_{T+1-i} $&lt;/li&gt;
  &lt;li&gt;$\lambda \left(CoM\right)$ $=$ $\log \left( 1 + \frac{1}{CoM} \right)$, with &lt;em&gt;CoM&lt;/em&gt; standing for &lt;em&gt;center-of-mass&lt;/em&gt;&lt;sup id=&quot;fnref:2:12&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
  &lt;li&gt;$RV_i$ is the asset daily realized variance at time $i$, $i=1..T$&lt;/li&gt;
  &lt;li&gt;$\beta$, $\beta_d$, $\beta_w$, $\beta_m$ and $\beta_h$ are the HExp model parameters, to be determined&lt;/li&gt;
&lt;/ul&gt;
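A minimal sketch of the two formulas above (variable names mine, $\beta$ values illustrative; as for the HAR model, they would in practice be estimated by regression):

```python
import numpy as np

def exp_vp(rv, com):
    """Exponentially weighted variance proxy ExpVP_T^{lambda(CoM)}, with
    lambda(CoM) = log(1 + 1/CoM) and weights normalized over i = 1..T."""
    rv = np.asarray(rv, dtype=float)
    lam = np.log(1.0 + 1.0 / com)
    i = np.arange(1, len(rv) + 1)
    w = np.exp(-lam * i)
    w /= w.sum()
    # the weight e^{-i * lambda} applies to RV_{T+1-i}: reverse time order
    return float(w @ rv[::-1])

def hexp_forecast(rv, beta, beta_d, beta_w, beta_m, beta_h):
    """One-step-ahead HExp forecast, using the centers-of-mass
    1, 5, 25 and 125 of Bollerslev et al."""
    return (beta
            + beta_d * exp_vp(rv, 1)
            + beta_w * exp_vp(rv, 5)
            + beta_m * exp_vp(rv, 25)
            + beta_h * exp_vp(rv, 125))

# Illustrative usage on a made-up realized variance series
rv_series = np.full(300, 0.01)
forecast = hexp_forecast(rv_series, 0.0, 0.4, 0.3, 0.2, 0.1)
```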

&lt;p&gt;To be noted that each center-of-mass used in the HExp model (1, 5, 25 and 125) &lt;em&gt;effectively summarizes the “average” horizon of the lagged realized volatilities that it uses&lt;/em&gt;&lt;sup id=&quot;fnref:2:13&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; and that they all have been chosen in Bollerslev et al.&lt;sup id=&quot;fnref:2:14&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; so as to &lt;em&gt;“span” the universe of past [realized variance]’s in a way that is both parsimonious and “smooth”&lt;/em&gt;&lt;sup id=&quot;fnref:2:15&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Speaking of smoothness, Figure 2, again adapted from Bollerslev et al.&lt;sup id=&quot;fnref:2:16&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;, compares &lt;em&gt;the lag coefficients implied by the regression coefficients&lt;/em&gt;&lt;sup id=&quot;fnref:2:17&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; of the HAR model with those of the HExp model.&lt;/p&gt;

&lt;figure&gt;
	&lt;a href=&quot;/assets/images/blog/volatility-forecasting-hexp-weights-bollerslev.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/volatility-forecasting-hexp-weights-bollerslev-small.png&quot; alt=&quot;Lag coefficients of the HAR and of the HExp volatility forecasting models. Source: Bollerslev et al.&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Figure 2. Lag coefficients of the HAR and of the HExp volatility forecasting models. Source: Bollerslev et al.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;The continuous nature of the HExp volatility forecasting model is clearly visible.&lt;/p&gt;

&lt;p&gt;In terms of practical performance, the HExp model &lt;em&gt;perform[s] well in out-of-sample risk forecasting&lt;/em&gt;&lt;sup id=&quot;fnref:2:18&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; and even slightly outperforms the HAR model in terms of out-of-sample $R$-squared, as can be seen in Figure 3, adapted from Bollerslev et al.&lt;sup id=&quot;fnref:2:19&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;figure&gt;
	&lt;a href=&quot;/assets/images/blog/volatility-forecasting-hexp-oos-performances-bollerslev.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/volatility-forecasting-hexp-oos-performances-bollerslev-small.png&quot; alt=&quot;Out-of-sample $R$-squared of the HAR model vs. the HExp model for predicting the 20-day future realized volatility of several assets and for different methods of parameter estimation. Source: Bollerslev et al.&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Figure 3. Out-of-sample $R$-squared of the HAR model vs. the HExp model for predicting the 20-day future realized volatility of several assets and for different methods of parameter estimation (&lt;i&gt;Ind&lt;/i&gt;, &lt;i&gt;Panel&lt;/i&gt;, &lt;i&gt;Mega&lt;/i&gt;, that will be discussed in the next section). Source: Bollerslev et al.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h3 id=&quot;realized-variance-vs-generic-variance-proxy&quot;&gt;Realized variance vs. generic variance proxy&lt;/h3&gt;

&lt;p&gt;The original HExp model described in the previous subsection relies, for its definition, on a very specific asset variance proxy - the asset’s realized variance - computed over a very specific time period - a day.&lt;/p&gt;

&lt;p&gt;Similarly to &lt;a href=&quot;/blog/volatility-forecasting-har-model/&quot;&gt;the HAR model&lt;/a&gt;, it is possible to replace the daily realized variance by any generic daily variance estimator like daily squared returns&lt;sup id=&quot;fnref:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:10&quot; class=&quot;footnote&quot;&gt;8&lt;/a&gt;&lt;/sup&gt; or any daily &lt;a href=&quot;/blog/range-based-volatility-estimators-overview-and-examples-of-usage/&quot;&gt;range-based variance estimator&lt;/a&gt; (Parkinson, Garman-Klass, Rogers-Satchell…).&lt;/p&gt;

&lt;p&gt;This leads to the generic HExp volatility forecasting model, under which an asset’s next day’s conditional variance $\sigma_{T+1}^2$ is forecasted through the formula&lt;/p&gt;

\[\hat{\sigma}_{T+1}^2 = \beta + \beta_d ExpVP_T^{\lambda(1)} + \beta_w ExpVP_T^{\lambda(5)} + \beta_m ExpVP_T^{\lambda(25)} + \beta_h ExpVP_T^{\lambda(125)}\]

&lt;p&gt;, where:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;$\hat{\sigma}_{T+1}^2$ is the forecast at time $T$ of the asset next day’s conditional variance $\sigma_{T+1}^2$&lt;/li&gt;
  &lt;li&gt;$ExpVP_T^{\lambda(CoM)} = \sum_{i=1}^T \frac{e^{-i \lambda(CoM)}}{\sum_{j=1}^T e^{-j \lambda(CoM)}} \tilde{\sigma}^2_{T+1-i} $&lt;/li&gt;
  &lt;li&gt;$\lambda \left(CoM\right) = \log \left( 1 + \frac{1}{CoM} \right) $&lt;/li&gt;
  &lt;li&gt;$\tilde{\sigma}^2_{i}$ is the asset daily variance estimator at time $i$, $i=1..T$&lt;/li&gt;
  &lt;li&gt;$\beta$, $\beta_d$, $\beta_w$, $\beta_m$ and $\beta_h$ are the HExp model parameters, to be determined&lt;/li&gt;
&lt;/ul&gt;
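
&lt;p&gt;The generic forecasting formula above can be sketched directly in Python (hypothetical parameter values, for illustration only; in practice $\beta$, $\beta_d$, $\beta_w$, $\beta_m$ and $\beta_h$ are estimated as described later in this post):&lt;/p&gt;

```python
import numpy as np

def hexp_forecast(var_proxy, betas, coms=(1, 5, 25, 125)):
    """One-day-ahead HExp conditional variance forecast.

    var_proxy - array of daily variance estimates, var_proxy[-1] at time T
    betas     - (beta, beta_d, beta_w, beta_m, beta_h), to be estimated
    coms      - the centers-of-mass, 1/5/25/125 in Bollerslev et al.
    """
    T = var_proxy.shape[0]
    forecast = betas[0]  # the intercept beta
    for b, com in zip(betas[1:], coms):
        lam = np.log(1.0 + 1.0 / com)
        w = np.exp(-lam * np.arange(1, T + 1))
        forecast += b * (w @ var_proxy[::-1]) / w.sum()  # b * ExpVP_T
    return forecast

# hypothetical parameter values, for illustration only
betas = (0.0, 0.3, 0.3, 0.25, 0.15)
rv = np.full(250, 0.0002)
print(hexp_forecast(rv, betas))
```

&lt;p&gt;With a constant variance proxy and slope parameters summing to one, the forecast reduces to that constant, which makes for a quick sanity check.&lt;/p&gt;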

&lt;h3 id=&quot;relationship-with-the-generic-weighted-moving-average-model&quot;&gt;Relationship with the generic weighted moving average model&lt;/h3&gt;

&lt;p&gt;From its definition, it is not too difficult to see that the HExp volatility forecasting model is a specific kind of weighted moving average volatility forecasting model, with:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;$w_0 = \beta$&lt;/li&gt;
  &lt;li&gt;$w_1 = \beta_d \frac{e^{- \lambda(1)}}{\sum_{j=1}^T e^{-j \lambda(1)}}$ $+$ $\beta_w \frac{e^{- \lambda(5)}}{\sum_{j=1}^T e^{-j \lambda(5)}}$ $+$ $\beta_m \frac{e^{- \lambda(25)}}{\sum_{j=1}^T e^{-j \lambda(25)}}$ $+$ $\beta_h \frac{e^{- \lambda(125)}}{\sum_{j=1}^T e^{-j \lambda(125)}} $&lt;/li&gt;
  &lt;li&gt;$w_2 = …$&lt;/li&gt;
&lt;/ul&gt;
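
&lt;p&gt;This equivalence is easy to check numerically: the sketch below, with arbitrary parameter values, compares $w_1$ computed from the formula above with the coefficient that the HExp forecast effectively applies to the most recent variance proxy:&lt;/p&gt;

```python
import numpy as np

T = 500
betas = {1: 0.3, 5: 0.3, 25: 0.25, 125: 0.15}  # hypothetical beta_d..beta_h

# w_1 from the displayed formula
w1 = 0.0
for com, b in betas.items():
    lam = np.log(1.0 + 1.0 / com)
    norm = np.exp(-lam * np.arange(1, T + 1)).sum()
    w1 += b * np.exp(-lam) / norm

# coefficient effectively applied to sigma^2_T by the HExp forecast:
# bump the most recent proxy by 1 and measure the change in the forecast
def forecast(vp):
    f = 0.0
    for com, b in betas.items():
        lam = np.log(1.0 + 1.0 / com)
        w = np.exp(-lam * np.arange(1, T + 1))
        f += b * (w @ vp[::-1]) / w.sum()
    return f

vp = np.zeros(T)
bumped = vp.copy()
bumped[-1] = 1.0
print(w1, forecast(bumped) - forecast(vp))  # the two quantities coincide
```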

&lt;h3 id=&quot;volatility-forecasting-formulas&quot;&gt;Volatility forecasting formulas&lt;/h3&gt;

&lt;p&gt;Under an HExp volatility forecasting model, the generic weighted moving average volatility forecasting formula becomes:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;To estimate an asset next day’s volatility:&lt;/p&gt;

\[\hat{\sigma}_{T+1} = \sqrt{ \beta + \beta_d ExpVP_T^{\lambda(1)} + \beta_w ExpVP_T^{\lambda(5)} + \beta_m ExpVP_T^{\lambda(25)} + \beta_h ExpVP_T^{\lambda(125)} }\]
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;To estimate an asset’s volatility $h$ days ahead&lt;sup id=&quot;fnref:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;9&lt;/a&gt;&lt;/sup&gt;, $h \geq 2$, using an indirect&lt;sup id=&quot;fnref:4:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; multi-step ahead forecast scheme:&lt;/p&gt;

\[\hat{\sigma}_{T+h} = \sqrt{  \beta + \beta_d ExpVP_{T+h-1}^{\lambda(1)} + \beta_w ExpVP_{T+h-1}^{\lambda(5)} + \beta_m ExpVP_{T+h-1}^{\lambda(25)} + \beta_h ExpVP_{T+h-1}^{\lambda(125)} }\]
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;, where:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;$ExpVP_{T+h-1}^{\lambda(CoM)}$ $=$ $\sum_{i=1}^T \frac{e^{-(i+h-1) \lambda(CoM)}}{\sum_{j=1}^{T+h-1} e^{-j \lambda(CoM)}} \tilde{\sigma}^2_{T+1-i} $ $+$ $\sum_{i=1}^{h-1} \frac{e^{-i \lambda(CoM)}}{\sum_{j=1}^{T+h-1} e^{-j \lambda(CoM)}}  \hat{\sigma}^2_{T+h-i} $&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;To estimate an asset’s aggregated volatility&lt;sup id=&quot;fnref:8:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;9&lt;/a&gt;&lt;/sup&gt; over the next $h$ days:&lt;/p&gt;

\[\hat{\sigma}_{T+1:T+h} = \sqrt{ \sum_{i=1}^{h} \hat{\sigma}^2_{T+i} }\]
  &lt;/li&gt;
&lt;/ul&gt;
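
&lt;p&gt;These formulas translate into a simple iterated loop, in which each variance forecast is appended to the history before producing the next one. A minimal sketch, with hypothetical parameter values:&lt;/p&gt;

```python
import numpy as np

def hexp_multi_step(var_proxy, betas, h, coms=(1, 5, 25, 125)):
    """Iterated (indirect) HExp forecasts of the next h daily variances.

    Each forecasted variance is fed back into the history, so that the
    step-k forecast uses ExpVP computed over T + k - 1 observations.
    """
    history = list(var_proxy)  # oldest first; forecasts will be appended
    out = []
    for _ in range(h):
        x = np.asarray(history)
        n = x.shape[0]
        f = betas[0]
        for b, com in zip(betas[1:], coms):
            lam = np.log(1.0 + 1.0 / com)
            w = np.exp(-lam * np.arange(1, n + 1))
            f += b * (w @ x[::-1]) / w.sum()
        out.append(f)
        history.append(f)  # recursive substitution of the forecast
    return np.asarray(out)

betas = (0.0, 0.3, 0.3, 0.25, 0.15)
fc = hexp_multi_step(np.full(250, 0.0002), betas, h=5)
print(np.sqrt(fc.sum()))  # aggregated volatility over the next 5 days
```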

&lt;h2 id=&quot;estimating-the-hexp-model-parameters&quot;&gt;Estimating the HExp model parameters&lt;/h2&gt;

&lt;h3 id=&quot;individual-estimation&quot;&gt;Individual estimation&lt;/h3&gt;

&lt;p&gt;As for &lt;a href=&quot;/blog/volatility-forecasting-har-model/&quot;&gt;the HAR model&lt;/a&gt;, the easiest way to estimate the HExp model parameters is &lt;em&gt;by applying simple linear regression&lt;/em&gt;&lt;sup id=&quot;fnref:3:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; &lt;em&gt;on an asset-by-asset basis&lt;/em&gt;&lt;sup id=&quot;fnref:2:20&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;, in which case the asset-specific &lt;a href=&quot;https://en.wikipedia.org/wiki/Ordinary_least_squares&quot;&gt;ordinary least squares (OLS) estimator&lt;/a&gt; 
of the parameters $\beta$, $\beta_d$, $\beta_w$, $\beta_m$ and $\beta_h$ at time $T$ is the solution of the minimization problem&lt;sup id=&quot;fnref:6:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

\[\argmin_{ \left( \beta, \beta_d, \beta_w, \beta_m, \beta_h \right) \in \mathbb{R}^{5}} \sum_{t=1}^T \left( \tilde{\sigma}_{t}^2 - \beta - \beta_d ExpVP_{t-1}^{\lambda(1)} - \beta_w ExpVP_{t-1}^{\lambda(5)} - \beta_m ExpVP_{t-1}^{\lambda(25)} - \beta_h ExpVP_{t-1}^{\lambda(125)} \right)^2\]

&lt;p&gt;Alternatively, following Clements and Preve&lt;sup id=&quot;fnref:6:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; and Clements et al.&lt;sup id=&quot;fnref:4:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, more complex asset-specific least squares estimators than OLS can be used to try to improve forecasting performance (&lt;a href=&quot;https://en.wikipedia.org/wiki/Weighted_least_squares&quot;&gt;weighted least squares estimators (WLS)&lt;/a&gt;, &lt;a href=&quot;https://en.wikipedia.org/wiki/Robust_regression&quot;&gt;robust least squares estimators (RLS)&lt;/a&gt;…).&lt;/p&gt;
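
&lt;p&gt;For concreteness, the OLS estimator above can be obtained with any linear least squares routine; below is a minimal sketch on simulated data (the regressors at time $t$ only use observations up to time $t-1$):&lt;/p&gt;

```python
import numpy as np

def estimate_hexp_ols(var_proxy, coms=(1, 5, 25, 125)):
    """OLS estimation of the HExp parameters by linear regression of
    sigma^2_t on an intercept and the four ExpVP_{t-1} regressors."""
    vp = np.asarray(var_proxy)
    T = vp.shape[0]
    rows = []
    for t in range(1, T):  # regressors only use data up to time t-1
        row = [1.0]
        x = vp[:t]
        for com in coms:
            lam = np.log(1.0 + 1.0 / com)
            w = np.exp(-lam * np.arange(1, t + 1))
            row.append((w @ x[::-1]) / w.sum())
        rows.append(row)
    X = np.asarray(rows)
    y = vp[1:]
    betas, *_ = np.linalg.lstsq(X, y, rcond=None)
    return betas  # (beta, beta_d, beta_w, beta_m, beta_h)

rng = np.random.default_rng(42)
vp = rng.uniform(0.0001, 0.0005, size=750)
print(estimate_hexp_ols(vp))
```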

&lt;h3 id=&quot;panel-based-estimation&quot;&gt;Panel-based estimation&lt;/h3&gt;

&lt;p&gt;Bollerslev et al.&lt;sup id=&quot;fnref:2:21&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; establishes that the dynamics of realized volatility are common across many different financial assets.&lt;/p&gt;

&lt;p&gt;This is illustrated in Figure 4, directly taken from Bollerslev et al.&lt;sup id=&quot;fnref:2:22&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;, which depicts &lt;em&gt;the unconditional distributions of daily normalized realized volatilities&lt;/em&gt; for different asset classes.&lt;/p&gt;

&lt;figure&gt;
	&lt;a href=&quot;/assets/images/blog/volatility-forecasting-hexp-daily-rv-distributions-bollerslev.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/volatility-forecasting-hexp-daily-rv-distributions-bollerslev-small.png&quot; alt=&quot;Normalized unconditional daily realized variance distributions for misc. asset classes. Source: Bollerslev et al.&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Figure 4. Normalized unconditional daily realized variance distributions for misc. asset classes. Source: Bollerslev et al.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;From this figure, volatility indeed seems to behave &lt;em&gt;similarly across asset classes&lt;/em&gt;&lt;sup id=&quot;fnref:2:23&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; and Bollerslev et al.&lt;sup id=&quot;fnref:2:24&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; proposes &lt;em&gt;to exploit these strong similarities in the distributions of the volatilities across and within asset classes&lt;/em&gt;&lt;sup id=&quot;fnref:2:25&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; by using 
&lt;em&gt;panel regression techniques that force the [HExp model parameters] to be the same within and across different asset classes&lt;/em&gt;&lt;sup id=&quot;fnref:2:26&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;In more detail, Bollerslev et al.&lt;sup id=&quot;fnref:2:27&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; reformulates the generic HExp volatility forecasting model as follows:&lt;/p&gt;

\[\hat{\sigma}_{T+1}^2 = \tilde{\sigma}_{T}^{2, LR} + \beta_d^P \left( ExpVP_T^{\lambda(1)} - \tilde{\sigma}_{T}^{2, LR} \right) + \beta_w^P \left( ExpVP_T^{\lambda(5)} - \tilde{\sigma}_{T}^{2, LR} \right) + \beta_m^P \left( ExpVP_T^{\lambda(25)} - \tilde{\sigma}_{T}^{2, LR} \right) + \beta_h^P \left( ExpVP_T^{\lambda(125)} - \tilde{\sigma}_{T}^{2, LR} \right)\]

&lt;p&gt;, where:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;$ \tilde{\sigma}_{T}^{2, LR}$ is &lt;em&gt;a long-run volatility factor, equal to the expanding sample mean of [the asset daily variance estimator] from the start of the sample up until day $T$&lt;/em&gt;&lt;sup id=&quot;fnref:2:28&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/li&gt;
  &lt;li&gt;$\beta_d^P$, $\beta_w^P$, $\beta_m^P$ and $\beta_h^P$ are the HExp “panel” model parameters, to be determined&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Such a reformulation - called &lt;em&gt;centering&lt;/em&gt;&lt;sup id=&quot;fnref:2:29&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; in Bollerslev et al.&lt;sup id=&quot;fnref:2:30&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; - &lt;em&gt;eliminat[es] the level of the [asset] volatility&lt;/em&gt;&lt;sup id=&quot;fnref:2:31&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; from the HExp model and enables&lt;sup id=&quot;fnref:11&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:11&quot; class=&quot;footnote&quot;&gt;10&lt;/a&gt;&lt;/sup&gt; the parameters $\beta_d^P$, $\beta_w^P$, $\beta_m^P$ and $\beta_h^P$ to be estimated simultaneously for all assets by &lt;a href=&quot;https://en.wikipedia.org/wiki/Cross-sectional_data&quot;&gt;panel&lt;/a&gt; regression techniques 
&lt;em&gt;that add power by exploiting the similarities in the cross-asset risk characteristics&lt;/em&gt;&lt;sup id=&quot;fnref:2:32&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Additionally&lt;sup id=&quot;fnref:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:9&quot; class=&quot;footnote&quot;&gt;11&lt;/a&gt;&lt;/sup&gt;, that specific reformulation &lt;em&gt;ensures that the iterated long-run forecasts from the model constructed on day $T$ converges to this day $T$ estimate of the “unconditional” volatility&lt;/em&gt;&lt;sup id=&quot;fnref:2:33&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
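
&lt;p&gt;Glossing over the exact regression machinery used in Bollerslev et al., this centered reformulation can be sketched by stacking the centered data of several (here, simulated) assets and running a single pooled OLS, so that the same parameters are shared by all assets:&lt;/p&gt;

```python
import numpy as np

def centered_regressors(var_proxy, coms=(1, 5, 25, 125)):
    """Centered HExp regressors: ExpVP_t^{lambda(CoM)} minus the long-run
    factor (the expanding sample mean of the variance proxy up to day t)."""
    vp = np.asarray(var_proxy)
    T = vp.shape[0]
    X, y = [], []
    for t in range(1, T):
        x = vp[:t]
        long_run = x.mean()  # expanding sample mean up to day t
        row = []
        for com in coms:
            lam = np.log(1.0 + 1.0 / com)
            w = np.exp(-lam * np.arange(1, t + 1))
            row.append((w @ x[::-1]) / w.sum() - long_run)
        X.append(row)
        y.append(vp[t] - long_run)  # the volatility level is eliminated
    return np.asarray(X), np.asarray(y)

# stack the centered data of several assets and run one pooled OLS,
# so that the same parameters are estimated for all assets at once
rng = np.random.default_rng(7)
Xs, ys = zip(*(centered_regressors(rng.uniform(1e-4, 5e-4, 500))
               for _ in range(3)))
X, y = np.vstack(Xs), np.concatenate(ys)
betas_panel, *_ = np.linalg.lstsq(X, y, rcond=None)
print(betas_panel)  # (beta_d^P, beta_w^P, beta_m^P, beta_h^P)
```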

&lt;p&gt;From a practical perspective, Figure 3 shows that estimating the HExp parameters&lt;sup id=&quot;fnref:12&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:12&quot; class=&quot;footnote&quot;&gt;12&lt;/a&gt;&lt;/sup&gt; through panel-based estimation (lines &lt;em&gt;Panel&lt;/em&gt; and &lt;em&gt;Mega&lt;/em&gt;) leads to much better performance than their individual estimation (lines &lt;em&gt;Ind&lt;/em&gt;).&lt;/p&gt;

&lt;h2 id=&quot;implementation-in-portfolio-optimizer&quot;&gt;Implementation in Portfolio Optimizer&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Portfolio Optimizer&lt;/strong&gt; implements the HExp volatility forecasting model - together with &lt;a href=&quot;/blog/volatility-forecasting-har-model/&quot;&gt;all the extensions of its predecessor&lt;/a&gt; (the insanity filter described in Clements and Preve&lt;sup id=&quot;fnref:6:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;, the log transformation…) - through the endpoint &lt;a href=&quot;https://docs.portfoliooptimizer.io/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/assets/volatility/forecast/hexp&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This endpoint supports the 4 variance proxies below:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Squared close-to-close returns&lt;/li&gt;
  &lt;li&gt;Demeaned squared close-to-close returns&lt;/li&gt;
  &lt;li&gt;The Parkinson range&lt;/li&gt;
  &lt;li&gt;The jump-adjusted Parkinson range&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This endpoint also supports:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Individual and panel-based estimation of the HExp model parameters.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Using up to 5 centers-of-mass for the variance proxies, the default ones being 1, 5 and 25&lt;sup id=&quot;fnref:15&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:15&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;example-of-usage---volatility-forecasting-at-monthly-level-for-various-etfs&quot;&gt;Example of usage - Volatility forecasting at monthly level for various ETFs&lt;/h2&gt;

&lt;p&gt;As an example of usage, I propose to enrich the results of &lt;a href=&quot;/blog/volatility-forecasting-har-model/&quot;&gt;the previous blog post&lt;/a&gt;, in which monthly forecasts produced by different volatility models&lt;sup id=&quot;fnref:16&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:16&quot; class=&quot;footnote&quot;&gt;14&lt;/a&gt;&lt;/sup&gt; 
are compared - using Mincer-Zarnowitz&lt;sup id=&quot;fnref:26&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:26&quot; class=&quot;footnote&quot;&gt;15&lt;/a&gt;&lt;/sup&gt; regressions - to the next month’s close-to-close observed volatility for 10 ETFs representative&lt;sup id=&quot;fnref:30&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:30&quot; class=&quot;footnote&quot;&gt;16&lt;/a&gt;&lt;/sup&gt; of misc. asset classes:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;U.S. stocks (SPY ETF)&lt;/li&gt;
  &lt;li&gt;European stocks (EZU ETF)&lt;/li&gt;
  &lt;li&gt;Japanese stocks (EWJ ETF)&lt;/li&gt;
  &lt;li&gt;Emerging markets stocks (EEM ETF)&lt;/li&gt;
  &lt;li&gt;U.S. REITs (VNQ ETF)&lt;/li&gt;
  &lt;li&gt;International REITs (RWX ETF)&lt;/li&gt;
  &lt;li&gt;U.S. 7-10 year Treasuries (IEF ETF)&lt;/li&gt;
  &lt;li&gt;U.S. 20+ year Treasuries (TLT ETF)&lt;/li&gt;
  &lt;li&gt;Commodities (DBC ETF)&lt;/li&gt;
  &lt;li&gt;Gold (GLD ETF)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;individual-estimation-1&quot;&gt;Individual estimation&lt;/h3&gt;

&lt;p&gt;Averaged results for all ETFs/regression models over each ETF price history&lt;sup id=&quot;fnref:27&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:27&quot; class=&quot;footnote&quot;&gt;17&lt;/a&gt;&lt;/sup&gt; are the following&lt;sup id=&quot;fnref:28&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:28&quot; class=&quot;footnote&quot;&gt;18&lt;/a&gt;&lt;/sup&gt;, when adding the HExp volatility forecasting model and its log variation&lt;sup id=&quot;fnref:14&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:14&quot; class=&quot;footnote&quot;&gt;19&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Volatility model&lt;/th&gt;
      &lt;th&gt;Variance proxy&lt;/th&gt;
      &lt;th&gt;$\bar{\alpha}$&lt;/th&gt;
      &lt;th&gt;$\bar{\beta}$&lt;/th&gt;
      &lt;th&gt;$\bar{R^2}$&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;EWMA, optimal $\lambda$&lt;/td&gt;
      &lt;td&gt;Squared close-to-close returns&lt;/td&gt;
      &lt;td&gt;4.7%&lt;/td&gt;
      &lt;td&gt;0.73&lt;/td&gt;
      &lt;td&gt;45%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;HAR&lt;/td&gt;
      &lt;td&gt;Squared close-to-close returns&lt;/td&gt;
      &lt;td&gt;-0.7%&lt;/td&gt;
      &lt;td&gt;0.95&lt;/td&gt;
      &lt;td&gt;46%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;HAR (log)&lt;/td&gt;
      &lt;td&gt;Squared close-to-close returns&lt;/td&gt;
      &lt;td&gt;0.5%&lt;/td&gt;
      &lt;td&gt;0.62&lt;/td&gt;
      &lt;td&gt;40%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;HExp&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Squared close-to-close returns&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;-0.7%&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;0.93&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;48%&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;HExp (log)&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Squared close-to-close returns&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;2.1%&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;0.57&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;42%&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;EWMA, optimal $\lambda$&lt;/td&gt;
      &lt;td&gt;Parkinson range&lt;/td&gt;
      &lt;td&gt;4.3%&lt;/td&gt;
      &lt;td&gt;1.06&lt;/td&gt;
      &lt;td&gt;48%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;HAR&lt;/td&gt;
      &lt;td&gt;Parkinson range&lt;/td&gt;
      &lt;td&gt;0.1%&lt;/td&gt;
      &lt;td&gt;1.25&lt;/td&gt;
      &lt;td&gt;44%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;HAR (log)&lt;/td&gt;
      &lt;td&gt;Parkinson range&lt;/td&gt;
      &lt;td&gt;1.9%&lt;/td&gt;
      &lt;td&gt;1.22&lt;/td&gt;
      &lt;td&gt;50%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;HExp&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Parkinson range&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;-0.5%&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;1.29&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;47%&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;HExp (log)&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Parkinson range&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;2.0%&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;1.21&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;51%&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;EWMA, optimal $\lambda$&lt;/td&gt;
      &lt;td&gt;Jump-adjusted Parkinson range&lt;/td&gt;
      &lt;td&gt;4.0%&lt;/td&gt;
      &lt;td&gt;0.76&lt;/td&gt;
      &lt;td&gt;45%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;HAR&lt;/td&gt;
      &lt;td&gt;Jump-adjusted Parkinson range&lt;/td&gt;
      &lt;td&gt;-1.4%&lt;/td&gt;
      &lt;td&gt;0.99&lt;/td&gt;
      &lt;td&gt;47%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;HAR (log)&lt;/td&gt;
      &lt;td&gt;Jump-adjusted Parkinson range&lt;/td&gt;
      &lt;td&gt;0.9%&lt;/td&gt;
      &lt;td&gt;0.92&lt;/td&gt;
      &lt;td&gt;51%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;HExp&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Jump-adjusted Parkinson range&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;-1.5%&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;0.98&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;49%&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;HExp (log)&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Jump-adjusted Parkinson range&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;1.1%&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;0.91&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;52%&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h3 id=&quot;panel-based-estimation-1&quot;&gt;Panel-based estimation&lt;/h3&gt;

&lt;p&gt;Averaged results for all ETFs/regression models over the common ETF price history&lt;sup id=&quot;fnref:13&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:13&quot; class=&quot;footnote&quot;&gt;20&lt;/a&gt;&lt;/sup&gt; are the following&lt;sup id=&quot;fnref:28:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:28&quot; class=&quot;footnote&quot;&gt;18&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;When using the EWMA, HAR and HExp volatility forecasting models with an asset-specific parameter estimation procedure (for reference):&lt;/p&gt;

    &lt;table&gt;
      &lt;thead&gt;
        &lt;tr&gt;
          &lt;th&gt;Volatility model&lt;/th&gt;
          &lt;th&gt;Variance proxy&lt;/th&gt;
          &lt;th&gt;$\bar{\alpha}$&lt;/th&gt;
          &lt;th&gt;$\bar{\beta}$&lt;/th&gt;
          &lt;th&gt;$\bar{R^2}$&lt;/th&gt;
        &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
        &lt;tr&gt;
          &lt;td&gt;EWMA, optimal $\lambda$ (individual est.)&lt;/td&gt;
          &lt;td&gt;Squared close-to-close returns&lt;/td&gt;
          &lt;td&gt;5%&lt;/td&gt;
          &lt;td&gt;0.72&lt;/td&gt;
          &lt;td&gt;43%&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
          &lt;td&gt;HAR (individual est.)&lt;/td&gt;
          &lt;td&gt;Squared close-to-close returns&lt;/td&gt;
          &lt;td&gt;-0.2%&lt;/td&gt;
          &lt;td&gt;0.89&lt;/td&gt;
          &lt;td&gt;46%&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
          &lt;td&gt;HExp (individual est.)&lt;/td&gt;
          &lt;td&gt;Squared close-to-close returns&lt;/td&gt;
          &lt;td&gt;0.02%&lt;/td&gt;
          &lt;td&gt;0.87&lt;/td&gt;
          &lt;td&gt;47%&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
          &lt;td&gt;EWMA, optimal $\lambda$ (individual est.)&lt;/td&gt;
          &lt;td&gt;Parkinson range&lt;/td&gt;
          &lt;td&gt;4.8%&lt;/td&gt;
          &lt;td&gt;1.02&lt;/td&gt;
          &lt;td&gt;45%&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
          &lt;td&gt;HAR (individual est.)&lt;/td&gt;
          &lt;td&gt;Parkinson range&lt;/td&gt;
          &lt;td&gt;0.7%&lt;/td&gt;
          &lt;td&gt;1.17&lt;/td&gt;
          &lt;td&gt;46%&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
          &lt;td&gt;HExp (individual est.)&lt;/td&gt;
          &lt;td&gt;Parkinson range&lt;/td&gt;
          &lt;td&gt;1.2%&lt;/td&gt;
          &lt;td&gt;1.15&lt;/td&gt;
          &lt;td&gt;47%&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
          &lt;td&gt;EWMA, optimal $\lambda$ (individual est.)&lt;/td&gt;
          &lt;td&gt;Jump-adjusted Parkinson range&lt;/td&gt;
          &lt;td&gt;4.5%&lt;/td&gt;
          &lt;td&gt;0.73&lt;/td&gt;
          &lt;td&gt;43%&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
          &lt;td&gt;HAR (individual est.)&lt;/td&gt;
          &lt;td&gt;Jump-adjusted Parkinson range&lt;/td&gt;
          &lt;td&gt;-0.9%&lt;/td&gt;
          &lt;td&gt;0.94&lt;/td&gt;
          &lt;td&gt;46%&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
          &lt;td&gt;HExp (individual est.)&lt;/td&gt;
          &lt;td&gt;Jump-adjusted Parkinson range&lt;/td&gt;
          &lt;td&gt;-0.4%&lt;/td&gt;
          &lt;td&gt;0.90&lt;/td&gt;
          &lt;td&gt;46%&lt;/td&gt;
        &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;When using the HAR and HExp volatility forecasting models with a panel-based parameter estimation procedure comparable to the &lt;em&gt;Mega&lt;/em&gt; procedure described in Bollerslev et al.&lt;sup id=&quot;fnref:2:34&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

    &lt;table&gt;
      &lt;thead&gt;
        &lt;tr&gt;
          &lt;th&gt;Volatility model&lt;/th&gt;
          &lt;th&gt;Variance proxy&lt;/th&gt;
          &lt;th&gt;$\bar{\alpha}$&lt;/th&gt;
          &lt;th&gt;$\bar{\beta}$&lt;/th&gt;
          &lt;th&gt;$\bar{R^2}$&lt;/th&gt;
        &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
        &lt;tr&gt;
          &lt;td&gt;HAR (panel est.)&lt;/td&gt;
          &lt;td&gt;Squared close-to-close returns&lt;/td&gt;
          &lt;td&gt;2.2%&lt;/td&gt;
          &lt;td&gt;0.76&lt;/td&gt;
          &lt;td&gt;47%&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
          &lt;td&gt;HExp (panel est.)&lt;/td&gt;
          &lt;td&gt;Squared close-to-close returns&lt;/td&gt;
          &lt;td&gt;1.1%&lt;/td&gt;
          &lt;td&gt;0.80&lt;/td&gt;
          &lt;td&gt;47%&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
          &lt;td&gt;HAR (panel est.)&lt;/td&gt;
          &lt;td&gt;Parkinson range&lt;/td&gt;
          &lt;td&gt;3.1%&lt;/td&gt;
          &lt;td&gt;1.11&lt;/td&gt;
          &lt;td&gt;50%&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
          &lt;td&gt;HExp (panel est.)&lt;/td&gt;
          &lt;td&gt;Parkinson range&lt;/td&gt;
          &lt;td&gt;3.6%&lt;/td&gt;
          &lt;td&gt;1.08&lt;/td&gt;
          &lt;td&gt;50%&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
          &lt;td&gt;HAR (panel est.)&lt;/td&gt;
          &lt;td&gt;Jump-adjusted Parkinson range&lt;/td&gt;
          &lt;td&gt;0.09%&lt;/td&gt;
          &lt;td&gt;0.88&lt;/td&gt;
          &lt;td&gt;46%&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
          &lt;td&gt;HExp (panel est.)&lt;/td&gt;
          &lt;td&gt;Jump-adjusted Parkinson range&lt;/td&gt;
          &lt;td&gt;-0.07%&lt;/td&gt;
          &lt;td&gt;0.89&lt;/td&gt;
          &lt;td&gt;48%&lt;/td&gt;
        &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;comments&quot;&gt;Comments&lt;/h3&gt;

&lt;p&gt;From the results of the two previous subsections, it is possible to make the following comments:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Consistent with Bollerslev et al.&lt;sup id=&quot;fnref:2:35&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;, the HExp model is uniformly better than the HAR model in terms of $R$-squared.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Contrary to Bollerslev et al.&lt;sup id=&quot;fnref:2:36&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;, the panel-based estimation procedure does not seem to dramatically improve the performance of the HAR/HExp models, except when the Parkinson range is used as a daily variance proxy.&lt;/p&gt;

    &lt;p&gt;Comparing lines #1, 2, 5 and 6 with lines #3 and 4 suggests performing the same test with (high-frequency) realized variances in order to confirm that this behavior is due to the “quality” of the daily variance proxy used.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;This blog post empirically confirmed that the HExp volatility forecasting model of Bollerslev et al.&lt;sup id=&quot;fnref:2:37&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; belongs to the category of the &lt;em&gt;state-of-the-art dynamic [risk models]&lt;/em&gt;&lt;sup id=&quot;fnref:2:38&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; published in the literature.&lt;/p&gt;

&lt;p&gt;This blog post also concludes this series on volatility forecasting by weighted moving average models, at least until I find a better such model.&lt;/p&gt;

&lt;p&gt;While waiting for that to happen, or for a blog post on volatility forecasting by a non-weighted moving average model, feel free to &lt;a href=&quot;https://www.linkedin.com/in/roman-rubsamen/&quot;&gt;connect with me on LinkedIn&lt;/a&gt; or to &lt;a href=&quot;https://twitter.com/portfoliooptim&quot;&gt;follow me on Twitter&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;–&lt;/p&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:4&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4733597&quot;&gt;Clements, Adam and Preve, Daniel P. A. and Tee, Clarence, Harvesting the HAR-X Volatility Model&lt;/a&gt;. &lt;a href=&quot;#fnref:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:4:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:4:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://academic.oup.com/jfec/article-abstract/7/2/174/856522&quot;&gt;Fulvio Corsi, A Simple Approximate Long-Memory Model of Realized Volatility, Journal of Financial Econometrics, Volume 7, Issue 2, Spring 2009, Pages 174–196&lt;/a&gt;. &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:3:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:3:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:3:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:3:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://academic.oup.com/rfs/article/31/7/2729/5001472&quot;&gt;Tim Bollerslev, Benjamin Hood, John Huss, Lasse Heje Pedersen, Risk Everywhere: Modeling and Managing Volatility, The Review of Financial Studies, Volume 31, Issue 7, July 2018, Pages 2729–2773&lt;/a&gt;. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:2:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;8&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;9&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;10&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:10&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;11&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:11&quot; class=&quot;reversefootnote&quot; 
role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;12&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:12&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;13&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:13&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;14&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:14&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;15&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:15&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;16&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:16&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;17&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:17&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;18&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:18&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;19&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:19&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;20&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:20&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;21&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:21&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;22&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:22&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;23&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:23&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;24&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:24&quot; class=&quot;reversefootnote&quot; 
role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;25&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:25&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;26&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:26&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;27&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:27&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;28&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:28&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;29&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:29&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;30&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:30&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;31&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:31&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;32&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:32&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;33&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:33&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;34&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:34&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;35&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:35&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;36&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:36&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;37&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:37&quot; class=&quot;reversefootnote&quot; 
role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;38&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:38&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;39&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:6&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.sciencedirect.com/science/article/abs/pii/S0378426621002417&quot;&gt;Adam Clements, Daniel P.A. Preve, A Practical Guide to harnessing the HAR volatility model, Journal of Banking &amp;amp; Finance, Volume 133, 2021&lt;/a&gt;. &lt;a href=&quot;#fnref:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:6:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:6:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:6:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:6:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:6:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.pm-research.com/content/iijderiv/4/3/63&quot;&gt;Boudoukh, J., Richardson, M., &amp;amp; Whitelaw, R.F. (1997). Investigation of a class of volatility estimators, Journal of Derivatives, 4 Spring, 63-71&lt;/a&gt;. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:5&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;These lags correspond to the daily, weekly and monthly realized variance components of the original HAR model. &lt;a href=&quot;#fnref:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:7&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;In Bollerslev et al.&lt;sup id=&quot;fnref:2:39&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;, the calculation of $ExpVP_T^{\lambda(CoM)}$ uses only the first 500 lags (i.e., is truncated to $T=500$) because &lt;em&gt;the influence of the remaining lags is numerically immaterial&lt;/em&gt;&lt;sup id=&quot;fnref:2:40&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:10&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Bollerslev et al.&lt;sup id=&quot;fnref:2:41&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; discusses the impact of replacing realized volatilities by daily squared returns in the HExp model. &lt;a href=&quot;#fnref:10&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:8&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://centaur.reading.ac.uk/21316/&quot;&gt;Brooks, Chris and Persand, Gitanjali (2003) Volatility forecasting for risk management. Journal of Forecasting, 22(1). pp. 1-22&lt;/a&gt;. &lt;a href=&quot;#fnref:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:8:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:11&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Eliminating the level of the asset volatility is a prerequisite for using panel regression techniques because &lt;em&gt;the very different volatility levels for different asset classes means that&lt;/em&gt;&lt;sup id=&quot;fnref:2:42&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; 1) &lt;em&gt;it is unreasonable to force the $\beta$ intercepts to be the same&lt;/em&gt;&lt;sup id=&quot;fnref:2:43&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; for all assets and 2) it is necessary &lt;em&gt;to ensure that all [the remaining] parameters are “scale-free” in the sense that they do not depend on the level of risk&lt;/em&gt;&lt;sup id=&quot;fnref:2:44&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:11&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:9&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Thanks to that reformulation, the coefficients $\beta_d$, $\beta_w$, $\beta_m$ and $\beta_h$ are also free &lt;em&gt;(i.e., need not sum to one)&lt;/em&gt;&lt;sup id=&quot;fnref:2:45&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;, which allows easy fitting via OLS. &lt;a href=&quot;#fnref:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:12&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;As a side note, the centering reformulation described in Bollerslev et al.&lt;sup id=&quot;fnref:2:46&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; is also applicable to the HAR volatility forecasting model and is done in Bollerslev et al.&lt;sup id=&quot;fnref:2:47&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:12&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:15&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Contrary to Bollerslev et al.&lt;sup id=&quot;fnref:2:48&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;, the center-of-mass 125 is not included by default in the HExp model as implemented in &lt;strong&gt;Portfolio Optimizer&lt;/strong&gt;; this choice was made to make the default HExp model implementation directly comparable with the default HAR model implementation. &lt;a href=&quot;#fnref:15&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:16&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Using &lt;strong&gt;Portfolio Optimizer&lt;/strong&gt;. &lt;a href=&quot;#fnref:16&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:26&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://econpapers.repec.org/bookchap/nbrnberch/1214.htm&quot;&gt;Mincer, J. and V. Zarnowitz (1969). The evaluation of economic forecasts. In J. Mincer (Ed.), Economic Forecasts and Expectations&lt;/a&gt;. &lt;a href=&quot;#fnref:26&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:30&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;These ETFs are used in the &lt;em&gt;Adaptive Asset Allocation&lt;/em&gt; strategy from &lt;a href=&quot;https://investresolve.com/&quot;&gt;ReSolve Asset Management&lt;/a&gt;, described in the paper &lt;em&gt;Adaptive Asset Allocation: A Primer&lt;/em&gt;&lt;sup id=&quot;fnref:31&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:31&quot; class=&quot;footnote&quot;&gt;21&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:30&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:27&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The common ending price history of all the ETFs is 31 August 2023, but there is no common starting price history, as all ETFs started trading on different dates. &lt;a href=&quot;#fnref:27&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:28&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;For all models, I used an expanding window for the volatility forecast computation. &lt;a href=&quot;#fnref:28&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:28:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:14&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The log HExp model is similar in spirit to the log HAR model described in &lt;a href=&quot;/blog/volatility-forecasting-har-model/&quot;&gt;the previous blog post&lt;/a&gt;. &lt;a href=&quot;#fnref:14&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:13&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The common starting price history of the ETFs is 31 July 2007 and their common ending price history is 31 August 2023. &lt;a href=&quot;#fnref:13&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:31&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2328254&quot;&gt;Butler, Adam and Philbrick, Mike and Gordillo, Rodrigo and Varadi, David, Adaptive Asset Allocation: A Primer&lt;/a&gt;. &lt;a href=&quot;#fnref:31&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content><author><name>Roman R.</name></author><category term="volatility" /><summary type="html">In this series on volatility forecasting, I previously detailed the Heterogeneous AutoRegressive (HAR) volatility forecasting model that has become the workhorse of the volatility forecasting literature1 since its introduction by Corsi2. I will now describe an extension of that model due to Bollerslev et al.3, called the Heterogeneous Exponential (HExp) volatility forecasting model, in which the lagged HAR volatility components are exponentially - rather than arithmetically - averaged. In addition, I will also discuss the panel-based estimation procedure for the HExp and the HAR model parameters proposed in Bollerslev et al.3, which is empirically demonstrated3 to improve the out-of-sample forecasting performances of these two volatility forecasting models when compared to the standard2 individual asset-based procedure. Finally, I will illustrate the practical performances of the HExp volatility forecasting model and its panel-based parameters estimation procedure in the context of monthly volatility forecasting for various ETFs. Mathematical preliminaries (reminders) This section contains reminders from the first blog post of this series. Volatility modelling and volatility proxies Let $r_t$ be the logarithmic return of an asset over a time period $t$ (a day, a week, a month..), over which its (conditional) mean return is supposed to be null. Then: The asset (conditional) variance is defined as $ \sigma_t^2 = \mathbb{E} \left[ r_t^2 \right] $ From this definition, the squared return $r_t^2$ of an asset is a (noisy4) variance estimator - or variance proxy4 - for that asset variance over the considered time period. Another example of an asset variance proxy is the Parkinson range of an asset. 
Yet another example of an asset variance proxy, this time over a specific time period $t$ of one day, is the daily realized variance $RV_t$, which is defined as the sum of the asset squared intraday returns sampled at a high frequency (1 minutes, 5 minutes, 15 minutes…). The generic notation for an asset variance proxy in this blog post is $\tilde{\sigma}_t^2$. The asset (conditional) volatility is defined as $ \sigma_t = \sqrt { \sigma_t^2 } $ The generic notation for an asset volatility proxy in this blog post is $\tilde{\sigma}_t$. Weighted moving average volatility forecasting model Boudoukh et al.5 shows that many seemingly different methods of volatility forecasting actually share the same underlying representation of the estimate of an asset next period’s variance $\hat{\sigma}_{T+1}^2$ as a weighted moving average of that asset past periods’ variance proxies $\tilde{\sigma}^2_t$, $t=1..T$, with \[\hat{\sigma}_{T+1}^2 = w_0 + \sum_{i=1}^{k} w_i \tilde{\sigma}^2_{T+1-i}\] , where: $k$, with $1 \leq k \leq T$, is the size of the moving average, possibly time-dependent $w_i, i=0..k$ are the weights of the moving average, possibly time-dependent as well The original HAR volatility forecasting model The HAR volatility forecasting model is an additive cascade model of different volatility components2 subject to economically meaningful restrictions2. 
Under that model, an asset next day’s daily realized variance $RV_{T+1}$ is forecasted through the formula4: \[\hat{RV}_{T+1} = \beta + \beta_d RV_{T} + \beta_w RV_{T}^w + \beta_m RV_{T}^m\] , where: $\hat{RV}_{T+1}$ is the forecast at time $T$ of the asset next day’s daily realized variance $RV_{T+1}$ $RV_T$ is the asset daily realized variance at time $T$ $RV_T^w = \frac{1}{5} \sum_{i=1}^5 RV_{T-i+1}$ is the asset weekly realized variance at time $T$ $RV_T^m = \frac{1}{22} \sum_{i=1}^{22} RV_{T-i+1}$ is the asset monthly realized variance at time $T$ $\beta$, $\beta_d$, $\beta_w$ and $\beta_m$ are the HAR model parameters, to be determined The HExp volatility forecasting model Discontinuity of the HAR volatility forecasting model Under the HAR volatility forecasting model, forecasted future volatilities depends on the past volatilities in a way that is [dis]continuous […] in the lag lengths3 due to the presence of simple moving averages, which might lead to potential variance estimation issues3. As noted by Bollerslev et al.3: The stepwise nature of the volatility factors employed in the HAR models, imply that the forecasts from the models are subject to potentially abrupt changes as an unusually large/small daily lagged [realized variance] drops out of the sums for the longer-horizon lagged volatility factors. Figure 1, adapted from Bollerslev et al.3, illustrates the lag coefficients implied by the regression coefficients3 of the HAR model together with those of a 21-day simple moving average model. Figure 1. Lag coefficients of the HAR and of the 21-day simple moving average volatility forecasting models. Source: Bollerslev et al. The discontinuity of the HAR model at the 1-day, 5-day and 21-day lags6 is apparent, similar in spirit to the discontinuity of the 21-day simple moving average model at the 21-day lag. 
The original HExp volatility forecasting model In order to avoid the stepwise changes inherent in the forecast from the HAR component-type structure3, Bollerslev et al.3 proposes to replace the simple moving averages appearing in the HAR model by exponentially weighted moving averages. Under the resulting volatility forecasting model - denoted the Heterogenous Exponential realized volatility model (HExp for short)3 - an asset next day’s daily realized variance $RV_{T+1}$ is forecasted through the formula37 \[\hat{RV}_{T+1} = \beta + \beta_d ExpVP_T^{\lambda(1)} + \beta_w ExpVP_T^{\lambda(5)} + \beta_m ExpVP_T^{\lambda(25)} + \beta_h ExpVP_T^{\lambda(125)}\] , where: $\hat{RV}_{T+1}$ is the forecast at time $T$ of the asset next day’s daily realized variance $RV_{T+1}$ $ExpVP_T^{\lambda(CoM)}$ $=$ $\sum_{i=1}^T \frac{e^{-i \lambda(CoM)}}{\sum_{j=1}^T e^{-j \lambda(CoM)}} RV_{T+1-i} $ $\lambda \left(CoM\right)$ $=$ $\log \left( 1 + \frac{1}{CoM} \right)$, with CoM standing for center-of-mass3 $RV_i$ is the asset daily realized variance at time $i$, $i=1..T$ $\beta$, $\beta_d$, $\beta_w$, $\beta_m$ and $\beta_h$ are the HExp model parameters, to be determined To be noted that each center-of-mass used in the HExp model (1, 5, 25 and 125) effectively summarizes the “average” horizon of the lagged realized volatilities that it uses3 and that they all have been chosen in Bollerslev et al.3 so as to “span” the universe of past [realized variance]’s in a way that is both parsimonious and “smooth”3. Speaking of smoothness, Figure 2, again adapted from Bollerslev et al.3, compares the lag coefficients implied by the regression coefficients3 of the HAR model with those of the HExp model. Figure 2. Lag coefficients of the HAR and of the HExp volatility forecasting models. Source: Bollerslev et al. The continuous nature of the HExp volatility forecasting model is clearly visible. 
In terms of practical performances, the HExp model perform[s] well in out-of-sample risk forecasting3 and is even slightly more performant than the HAR model in terms of $r$-squared, as can be seen on Figure 3 adapted from Bollerslev et al.3. Figure 3. Out-of-sample $r$-squared of the HAR model v.s. the HExp model for predicting the 20-day future realized volatility of several assets and for different methods of parameters estimation (Ind, Panel, Mega, that will be discussed in the next section). Source: Bollerslev et al. Realized variance v.s. generic variance proxy The original HExp model described in the previous subsection relies on a very specific asset variance proxy - the realized variance of an asset - over a very specific time period - a day - for its definition. Similarly to the HAR model, it is possible to replace the daily realized variance by any generic daily variance estimator like daily squared returns8 or any daily range-based variance estimator (Parkinson, Garman-Klass, Rogers-Satchell…). 
This leads to the generic HExp volatility forecasting model, under which an asset next days’s conditional variance $\sigma_{T+1}^2$ is forecasted through the formula \[\hat{\sigma}_{T+1}^2 = \beta + \beta_d ExpVP_T^{\lambda(1)} + \beta_w ExpVP_T^{\lambda(5)} + \beta_m ExpVP_T^{\lambda(25)} + \beta_h ExpVP_T^{\lambda(125)}\] , where: $\hat{\sigma}_{T+1}^2$ is the forecast at time $T$ of the asset next day’s conditional variance $\sigma_{T+1}^2$ $ExpVP_T^{\lambda(CoM)} = \sum_{i=1}^T \frac{e^{-i \lambda(CoM)}}{\sum_{j=1}^T e^{-j \lambda(CoM)}} \tilde{\sigma}^2_{T+1-i} $ $\lambda \left(CoM\right) = \log \left( 1 + \frac{1}{CoM} \right) $ $\tilde{\sigma}^2_{i}$ is the asset daily variance estimator at time $i$, $i=1..T$ $\beta$, $\beta_d$, $\beta_w$, $\beta_m$ and $\beta_h$ are the HExp model parameters, to be determined Relationship with the generic weighted moving average model From its definition, it is not too difficult to see that the HExp volatility forecasting model is a specific kind of weighted moving average volatility forecasting model, with: $w_0 = \beta$ $w_1 = \beta_d \frac{e^{- \lambda(1)}}{\sum_{j=1}^T e^{-j \lambda(1)}}$ $+$ $\beta_w \frac{e^{- \lambda(5)}}{\sum_{j=1}^T e^{-j \lambda(5)}}$ $+$ $\beta_m \frac{e^{- \lambda(25)}}{\sum_{j=1}^T e^{-j \lambda(25)}}$ $+$ $\beta_h \frac{e^{- \lambda(125)}}{\sum_{j=1}^T e^{-j \lambda(125)}} $ $w_2 = …$ Volatility forecasting formulas Under an HExp volatility forecasting model, the generic weighted moving average volatility forecasting formula becomes: To estimate an asset next day’s volatility: \[\hat{\sigma}_{T+1} = \sqrt{ \beta + \beta_d ExpVP_T^{\lambda(1)} + \beta_w ExpVP_T^{\lambda(5)} + \beta_m ExpVP_T^{\lambda(25)} + \beta_h ExpVP_T^{\lambda(125)} }\] To estimate an asset next $h$-day’s ahead volatility9, $h \geq 2$, using an indirect1 multi-step ahead forecast scheme: \[\hat{\sigma}_{T+h} = \sqrt{ \beta + \beta_d ExpVP_{T+h-1}^{\lambda(1)} + \beta_w ExpVP_{T+h-1}^{\lambda(5)} + \beta_m 
ExpVP_{T+h-1}^{\lambda(25)} + \beta_h ExpVP_{T+h-1}^{\lambda(125)} }\] , where: $ExpVP_{T+h-1}^{\lambda(CoM)}$ $=$ $\sum_{i=1}^T \frac{e^{-i \lambda(CoM)}}{\sum_{j=1}^{T+h-1} e^{-j \lambda(CoM)}} \tilde{\sigma}^2_{T+1-i} $ $+$ $\sum_{i=1}^{h-1} \frac{e^{-i \lambda(CoM)}}{\sum_{j=1}^{T+h-1} e^{-j \lambda(CoM)}} \hat{\sigma}^2_{T+h-i} $ To estimate an asset aggregated volatility9 over the next $h$ days: \[\hat{\sigma}_{T+1:T+h} = \sqrt{ \sum_{i=1}^{h} \hat{\sigma}^2_{T+i} }\] Estimating the HExp model parameters Individual estimation As for the HAR model, the easiest way to estimate the HExp model parameters is by applying simple linear regression2 on an asset-by-asset basis3, in which case the asset-specific ordinary least squares (OLS) estimator of the parameters $\beta$, $\beta_d$, $\beta_w$, $\beta_m$ and $\beta_h$ at time $T$ is the solution of the minimization problem4 \[\argmin_{ \left( \beta, \beta_d, \beta_w, \beta_m, \beta_h \right) \in \mathbb{R}^{5}} \sum_{t=1}^T \left( \tilde{\sigma}_{t}^2 - \beta - \beta_d ExpVP_{t-1}^{\lambda(1)} - \beta_w ExpVP_{t-1}^{\lambda(5)} - \beta_m ExpVP_{t-1}^{\lambda(25)} - \beta_h ExpVP_{t-1}^{\lambda(125)} \right)^2\] Alternatively, following Clements and Preve4 and Clements et al.1, more complex asset-specific least squares estimators than OLS can be used to try to improve forecast performances (weighted least squares estimators (WLS), robust least squares estimators (RLS)…) Panel-based estimation Bollerslev et al.3 establishes that the dynamics of realized volatility are common across many different financial assets. This is illustrated in Figure 4, directly taken from Bollerslev et al.3, which depicts the unconditional distributions of daily normalized realized volatilities for different asset classes. Figure 4. Normalized unconditional daily realized variance distributions for misc. asset classes. Source: Bollerslev et al. 
From this figure, volatility indeed seems to behave similarly across asset classes3 and Bollerslev et al.3 proposes to exploit these strong similarities in the distributions of the volatilities across and within asset classes3 by using panel regression techniques that force the [HExp model parameters] to be the same within and across different asset classes3. In more details, Bollerslev et al.3 reformulates the generic HExp volatility forecasting model as follows: \[\hat{\sigma}_{T+1}^2 = \tilde{\sigma}_{T}^{2, LR} + \beta_d^P \left( ExpVP_T^{\lambda(1)} - \tilde{\sigma}_{T}^{2, LR} \right) + \beta_w^P \left( ExpVP_T^{\lambda(5)} - \tilde{\sigma}_{T}^{2, LR} \right) + \beta_m^P \left( ExpVP_T^{\lambda(25)} - \tilde{\sigma}_{T}^{2, LR} \right) + \beta_h^P \left( ExpVP_T^{\lambda(125)} - \tilde{\sigma}_{T}^{2, LR} \right)\] , where: $ \tilde{\sigma}_{T}^{2, LR}$ is a long-run volatility factor, equal to the expanding sample mean of [the asset daily variance estimator] from the start of the sample up until day $T$3. $\beta_d^P$, $\beta_w^P$, $\beta_m^P$ and $\beta_h^P$ are the HExp “panel” model parameters, to be determined Such a reformulation - called centering3 in Bollerslev et al.3 - eliminat[es] the level of the [asset] volatility3 from the HExp model and enables10 the parameters $\beta_d^P$, $\beta_w^P$, $\beta_m^P$ and $\beta_h^P$ to be estimated simultaneously for all assets by panel regression techniques that add power by exploiting the similarities in the cross-asset risk characteristics3. Additionally11, that specific reformulation ensures that the iterated long-run forecasts from the model constructed on day $T$ converges to this day $T$ estimate of the “unconditional” volatility3. From a practical perspective, Figure 3 shows that estimating the HExp parameters12 through panel-based estimation (lines Panel and Mega) leads to much better performances v.s. their individual estimation (lines Ind). 
Implementation in Portfolio Optimizer Portfolio Optimizer implements the HExp volatility forecasting model - together with all the extensions of its predecessor (the insanity filter described in Clements and Preve4, the log transformation…) - through the endpoint /assets/volatility/forecast/hexp. This endpoint supports the 4 variance proxies below: Squared close-to-close returns Demeaned squared close-to-close returns The Parkinson range The jump-adjusted Parkinson range This endpoint also supports: Individual and panel-based estimation of the HExp model parameters. Using up to 5 centers-of-mass for the variance proxies, the default ones being 1, 5 and 2513. Example of usage - Volatility forecasting at monthly level for various ETFs As an example of usage, I propose to enrich the results of the previous blog post, in which monthly forecasts produced by different volatility models14 are compared - using Mincer-Zarnowitz15 regressions - to the next month’s close-to-close observed volatility for 10 ETFs representative16 of misc. asset classes: U.S. stocks (SPY ETF) European stocks (EZU ETF) Japanese stocks (EWJ ETF) Emerging markets stocks (EEM ETF) U.S. REITs (VNQ ETF) International REITs (RWX ETF) U.S. 7-10 year Treasuries (IEF ETF) U.S. 
20+ year Treasuries (TLT ETF) Commodities (DBC ETF) Gold (GLD ETF) Individual estimation Averaged results for all ETFs/regression models over each ETF price history17 are the following18, when adding the HExp volatility forecasting model and its log variation19: Volatility model Variance proxy $\bar{\alpha}$ $\bar{\beta}$ $\bar{R^2}$ EWMA, optimal $\lambda$ Squared close-to-close returns 4.7% 0.73 45% HAR Squared close-to-close returns -0.7% 0.95 46% HAR (log) Squared close-to-close returns 0.5% 0.62 40% HExp Squared close-to-close returns -0.7% 0.93 48% HExp (log) Squared close-to-close returns 2.1% 0.57 42% EWMA, optimal $\lambda$ Parkinson range 4.3% 1.06 48% HAR Parkinson range 0.1% 1.25 44% HAR (log) Parkinson range 1.9% 1.22 50% HExp Parkinson range -0.5% 1.29 47% HExp (log) Parkinson range 2.0% 1.21 51% EWMA, optimal $\lambda$ Jump-adjusted Parkinson range 4.0% 0.76 45% HAR Jump-adjusted Parkinson range -1.4% 0.99 47% HAR (log) Jump-adjusted Parkinson range 0.9% 0.92 51% HExp Jump-adjusted Parkinson range -1.5% 0.98 49% HExp (log) Jump-adjusted Parkinson range 1.1% 0.91 52% Panel-based estimation Averaged results for all ETFs/regression models over the common ETF price history20 are the following18: When using the EWMA, HAR and HExp volatility forecasting models with an asset-specific parameters estimation procedure (for reference): Volatility model Variance proxy $\bar{\alpha}$ $\bar{\beta}$ $\bar{R^2}$ EWMA, optimal $\lambda$ (individual est.) Squared close-to-close returns 5% 0.72 43% HAR (individual est.) Squared close-to-close returns -0.2% 0.89 46% HExp (individual est.) Squared close-to-close returns 0.02% 0.87 47% EWMA, optimal $\lambda$ (individual est.) Parkinson range 4.8% 1.02 45% HAR (individual est.) Parkinson range 0.7% 1.17 46% HExp (individual est.) Parkinson range 1.2% 1.15 47% EWMA, optimal $\lambda$ (individual est.) Jump-adjusted Parkinson range 4.5% 0.73 43% HAR (individual est.) 
Jump-adjusted Parkinson range -0.9% 0.94 46% HExp (individual est.) Jump-adjusted Parkinson range -0.4% 0.90 46% When using the HAR and HExp volatility forecasting models with a panel-based parameters estimation procedure comparable to the Mega procedure described in Bollerslev et al.3: Volatility model Variance proxy $\bar{\alpha}$ $\bar{\beta}$ $\bar{R^2}$ HAR (panel est.) Squared close-to-close returns 2.2% 0.76 47% HExp (panel est.) Squared close-to-close returns 1.1% 0.80 47% HAR (panel est.) Parkinson range 3.1% 1.11 50% HExp (panel est.) Parkinson range 3.6% 1.08 50% HAR (panel est.) Jump-adjusted Parkinson range 0.09% 0.88 46% HExp (panel est.) Jump-adjusted Parkinson range -0.07% 0.89 48% Comments From the results of the two previous subsections, it is possible to make the following comments: Consistent with Bollerslev et al.3, the HExp model is uniformly better than the HAR model in terms of r-squared. Contrary to Bollerslev et al.3, the panel-based estimation procedure does not seem to dramatically improve the HAR/HExp models' performance, except when the Parkinson range is used as a daily variance proxy. Comparing lines #1,2,5,6 with lines #3,4 suggests performing the same test with (high frequency) realized variances in order to confirm that this is due to the “quality” of the daily variance proxy used. Conclusion This blog post empirically confirmed that the HExp volatility forecasting model of Bollerslev et al.3 belongs to the category of the state-of-the-art dynamic [risk models]3 published in the literature. This blog post also concludes this series on volatility forecasting by weighted moving average models, at least until I find a better such model than the HExp model. While waiting for that to happen, or for a blog post on volatility forecasting by a non-weighted moving average model, feel free to connect with me on LinkedIn or to follow me on Twitter. – See Clements, Adam and Preve, Daniel P. A. 
and Tee, Clarence, Harvesting the HAR-X Volatility Model. &amp;#8617; &amp;#8617;2 &amp;#8617;3 See Fulvio Corsi, A Simple Approximate Long-Memory Model of Realized Volatility, Journal of Financial Econometrics, Volume 7, Issue 2, Spring 2009, Pages 174–196. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 See Tim Bollerslev, Benjamin Hood, John Huss, Lasse Heje Pedersen, Risk Everywhere: Modeling and Managing Volatility, The Review of Financial Studies, Volume 31, Issue 7, July 2018, Pages 2729–2773. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 &amp;#8617;6 &amp;#8617;7 &amp;#8617;8 &amp;#8617;9 &amp;#8617;10 &amp;#8617;11 &amp;#8617;12 &amp;#8617;13 &amp;#8617;14 &amp;#8617;15 &amp;#8617;16 &amp;#8617;17 &amp;#8617;18 &amp;#8617;19 &amp;#8617;20 &amp;#8617;21 &amp;#8617;22 &amp;#8617;23 &amp;#8617;24 &amp;#8617;25 &amp;#8617;26 &amp;#8617;27 &amp;#8617;28 &amp;#8617;29 &amp;#8617;30 &amp;#8617;31 &amp;#8617;32 &amp;#8617;33 &amp;#8617;34 &amp;#8617;35 &amp;#8617;36 &amp;#8617;37 &amp;#8617;38 &amp;#8617;39 See Adam Clements, Daniel P.A. Preve, A Practical Guide to harnessing the HAR volatility model, Journal of Banking &amp;amp; Finance, Volume 133, 2021. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 &amp;#8617;6 See Boudoukh, J., Richardson, M., &amp;amp; Whitelaw, R.F. (1997). Investigation of a class of volatility estimators, Journal of Derivatives, 4 Spring, 63-71. &amp;#8617; These lags correspond to the daily, weekly and monthly realized variance components of the original HAR model. &amp;#8617; In Bollerslev et al.3, the calculation of $ExpVP_T^{\lambda(CoM)}$ uses only the first 500 lags (i.e., is truncated to $T=500$) because the influence of the remaining lags is numerically immaterial3. &amp;#8617; Bollerslev et al.3 discusses the impact of replacing realized volatilities by daily squared returns in the HExp model. 
&amp;#8617; See Brooks, Chris and Persand, Gitanjali (2003) Volatility forecasting for risk management. Journal of Forecasting, 22(1). pp. 1-22. &amp;#8617; &amp;#8617;2 Eliminating the level of the asset volatility is a pre-requisite in order to use panel regression techniques because the very different volatility levels for different asset classes means that3 1) it is unreasonable to force the $\beta$ intercepts to be the same3 for all assets and 2) it is necessary to ensure that all [the remaining] parameters are “scale-free” in the sense that they do not depend on the level of risk3. &amp;#8617; Thanks to that reformulation, the coefficients $\beta_d$, $\beta_w$, $\beta_m$ and $\beta_h$ are also free (i.e., need not sum to one)3, which allows an easy fitting through OLS. &amp;#8617; As a side note, the centering reformulation described in Bollerslev et al.3 is also applicable to the HAR volatility forecasting model and is done in Bollerslev et al.3. &amp;#8617; Contrary to Bollerslev et al.3, the center-of-mass 125 is not included by default in the HExp model as implemented in Portfolio Optimizer; this choice was made to make the default HExp model implementation directly comparable with the default HAR model implementation. &amp;#8617; Using Portfolio Optimizer. &amp;#8617; See Mincer, J. and V. Zarnowitz (1969). The evaluation of economic forecasts. In J. Mincer (Ed.), Economic Forecasts and Expectations. &amp;#8617; These ETFs are used in the Adaptative Asset Allocation strategy from ReSolve Asset Management, described in the paper Adaptive Asset Allocation: A Primer21. &amp;#8617; The common ending price history of all the ETFs is 31 August 2023, but there is no common starting price history, as all ETFs started trading on different dates. &amp;#8617; For all models, I used an expanding window for the volatility forecast computation. &amp;#8617; &amp;#8617;2 The log HExp model is similar in spirit to the log HAR model described in the previous blog post. 
&amp;#8617; The common starting price history of the ETFs is 31 July 2007 and their common ending price history is 31 August 2023. &amp;#8617; See Butler, Adam and Philbrick, Mike and Gordillo, Rodrigo and Varadi, David, Adaptive Asset Allocation: A Primer. &amp;#8617;</summary></entry><entry><title type="html">The Mathematics of Portfolio Return: Simple Return, Money-Weighted Return and Time-Weighted Return</title><link href="https://portfoliooptimizer.io/blog/the-mathematics-of-portfolio-return-simple-return-money-weighted-return-and-time-weighted-return/" rel="alternate" type="text/html" title="The Mathematics of Portfolio Return: Simple Return, Money-Weighted Return and Time-Weighted Return" /><published>2025-02-05T00:00:00-06:00</published><updated>2025-02-05T00:00:00-06:00</updated><id>https://portfoliooptimizer.io/blog/the-mathematics-of-portfolio-return-simple-return-money-weighted-return-and-time-weighted-return</id><content type="html" xml:base="https://portfoliooptimizer.io/blog/the-mathematics-of-portfolio-return-simple-return-money-weighted-return-and-time-weighted-return/">&lt;p&gt;&lt;em&gt;Whether we manage our own investment assets or choose to hire others to manage the assets on our behalf we are keen to know how well our […] portfolio of assets is performing&lt;/em&gt;&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; and &lt;em&gt;the calculation of portfolio return is the first step in [that] performance measurement process&lt;/em&gt;&lt;sup id=&quot;fnref:1:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Now, while &lt;em&gt;the matter of measuring the rate of return of [a portfolio] appears, on the surface, to be simple enough&lt;/em&gt;&lt;sup id=&quot;fnref:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, the presence of external cash flows - either contributions to the portfolio or withdrawals from the portfolio - leads to the definition of several rate of returns, with &lt;em&gt;no single rate of return measure [being] appropriate for every purpose&lt;/em&gt;&lt;sup id=&quot;fnref:3:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;In this blog post, strongly inspired by the book &lt;em&gt;Practical Portfolio Performance Measurement and Attribution&lt;/em&gt;&lt;sup id=&quot;fnref:3:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; from &lt;a href=&quot;https://www.linkedin.com/in/carl-bacon-86984b101&quot;&gt;Carl Bacon&lt;/a&gt;, I will describe the two main methods of portfolio return calculation in the presence of external cash flows.&lt;/p&gt;

&lt;p&gt;As an example of usage, I will illustrate the investor gap in the case of an MSCI World ETF, that is, I will show that the returns of that ETF are different from the returns achieved by the average investor in that ETF.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;&lt;em&gt;Notes:&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
  &lt;ul&gt;
    &lt;li&gt;A fully functional Google sheet corresponding to this post is available &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1dqHW6uGbrTZnv8uHCr0qHgHRXBtYLM6JJQiRbpb8TlQ/edit?usp=sharing&quot;&gt;here&lt;/a&gt;&lt;/li&gt;
  &lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h2 id=&quot;external-vs-internal-cash-flows&quot;&gt;External vs. internal cash flows&lt;/h2&gt;

&lt;p&gt;In the context of portfolio return calculation, two types of cash flows are usually distinguished:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;External cash flows, corresponding to &lt;em&gt;any new money added to or taken from [a] portfolio, whether in the form of cash or other assets&lt;/em&gt;&lt;sup id=&quot;fnref:1:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
  &lt;li&gt;Internal cash flows, corresponding to &lt;em&gt;transactions funded from within [a] portfolio&lt;/em&gt;&lt;sup id=&quot;fnref:1:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; like dividend and coupon payments, free shares attributed by companies, positions rebalancing, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, external cash flows change the valuation of a portfolio through means other than the growth (positive or negative) of the funds invested in that portfolio, while internal cash flows have no impact on the portfolio valuation.&lt;/p&gt;

&lt;h2 id=&quot;portfolio-return-in-the-absence-of-external-cash-flows&quot;&gt;Portfolio return in the absence of external cash flows&lt;/h2&gt;

&lt;p&gt;Let be:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;$T$, the number of observations of the value of a portfolio over a time period $1..T$, with $t_1 &lt; t_2 &lt; … &lt; t_T$ the $T$ observation times&lt;/li&gt;
  &lt;li&gt;$V_1$ the value of the portfolio at the initial observation time $t_1$&lt;/li&gt;
  &lt;li&gt;$V_T$ the value of the portfolio at the final observation time $t_T$&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When no external cash flow occurred over a time period, the return $r_p$ of a portfolio over that period is defined as &lt;em&gt;the change of the portfolio value relative to its beginning value&lt;/em&gt;&lt;sup id=&quot;fnref:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Mathematically, using the notations above, this gives:&lt;/p&gt;

\[r_p = \frac{V_T - V_1}{V_1} = \frac{V_T}{V_1} - 1\]

&lt;p&gt;$r_p$ is called the &lt;em&gt;simple rate of return&lt;/em&gt; or the &lt;em&gt;arithmetic rate of return&lt;/em&gt; of the portfolio over the time period $1..T$.&lt;/p&gt;

&lt;p&gt;Note that, by definition, the arithmetic return of a portfolio over a time period takes into account any internal cash flow generated by the portfolio constituents&lt;sup id=&quot;fnref:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; 
(dividends for stocks, coupon payments for bonds…); for this reason, the arithmetic return of a portfolio is also sometimes called the &lt;em&gt;total (arithmetic) return&lt;/em&gt; of the portfolio&lt;sup id=&quot;fnref:11&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:11&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
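As a quick numerical illustration of the formula above, here is a minimal Python sketch; the portfolio values 100 and 112 are hypothetical, made up for the example:

```python
# Simple (arithmetic) rate of return: the change in portfolio value
# relative to its beginning value, valid only when no external cash
# flow occurred over the period.
def simple_return(v_initial: float, v_final: float) -> float:
    return v_final / v_initial - 1

# Hypothetical portfolio worth 100 at t_1 and 112 at t_T
r_p = simple_return(100.0, 112.0)
print(f"{r_p:.2%}")  # 12.00%
```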

&lt;h2 id=&quot;portfolio-return-in-the-presence-of-external-cash-flows&quot;&gt;Portfolio return in the presence of external cash flows&lt;/h2&gt;

&lt;p&gt;Let now be:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;$T$, the number of observations of the value of a portfolio over a time period $1..T$, with $t_1 &lt; t_2 &lt; … &lt; t_T$ the $T$ observation times&lt;/li&gt;
  &lt;li&gt;$V_i$ the value of the portfolio at the observation time $t_i$, $i=1..T$&lt;/li&gt;
  &lt;li&gt;$C_i$ the value of a potential external cash flow ($C_i &amp;gt; 0$ for a contribution, $C_i &amp;lt; 0$ for a withdrawal) at the observation time $t_i$, $i=1..T$, assumed to occur immediately after the observation time $t_i$ (i.e., the cash flow $C_i$ is assumed to be excluded from the portfolio value $V_i$)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When an external cash flow occurred over a time period, &lt;em&gt;the cash flow itself will contribute to the [portfolio] valuation&lt;/em&gt;&lt;sup id=&quot;fnref:1:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, so that &lt;em&gt;the calculation of [the portfolio return] […] must compensate for the fact that the increase in market value is not entirely due to investment gain[/loss] during the period&lt;/em&gt;&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Since at least 1968&lt;sup id=&quot;fnref:1:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, there have been two main measures of such a compensated portfolio return:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The portfolio &lt;em&gt;money-weighted return (MWR)&lt;/em&gt;, which integrates the timing and the amount of the external cash flows in the portfolio return, leading to a measure of portfolio return that includes the impacts of both the decisions to contribute money in (resp. withdraw money from) the portfolio and &lt;em&gt;the decisions about asset allocation and security selection&lt;/em&gt;&lt;sup id=&quot;fnref:2:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;.&lt;/li&gt;
  &lt;li&gt;The portfolio &lt;em&gt;time-weighted return (TWR)&lt;/em&gt;, which eliminates the impact of the external cash flows from the portfolio return, leading to a measure of portfolio return that isolates the &lt;em&gt;decisions about asset allocation and security selection&lt;/em&gt;&lt;sup id=&quot;fnref:2:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;money-weighted-return&quot;&gt;Money-weighted return&lt;/h3&gt;

&lt;p&gt;The money-weighted return of a portfolio is &lt;em&gt;a performance statistic reflecting how much money was earned during the measurement period&lt;/em&gt;&lt;sup id=&quot;fnref:2:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt; and is thus representative of &lt;em&gt;the return an investor actually experiences&lt;/em&gt;&lt;sup id=&quot;fnref:2:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;As such, the money-weighted return is influenced by both:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;The decisions made by the [portfolio] manager [- who can be the investor herself] - to allocate assets and select securities within the portfolio&lt;/em&gt;&lt;sup id=&quot;fnref:2:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;The timing of [the investor’s] decisions to contribute to or withdraw money from [the portfolio]&lt;/em&gt;&lt;sup id=&quot;fnref:2:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;, with the different cases being summarized in Figure 1, taken from Feibel&lt;sup id=&quot;fnref:2:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

    &lt;figure&gt;
    &lt;a href=&quot;/assets/images/blog/portfolio-return-irr-cash-flows-impact-feibel.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/portfolio-return-irr-cash-flows-impact-feibel-small.png&quot; alt=&quot;Impact of external cash flows on money-weighted portfolio performances. Source: Feibel.&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;Figure 1. Impact of external cash flows on money-weighted portfolio performances. Source: Feibel.&lt;/figcaption&gt;
  &lt;/figure&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;calculation&quot;&gt;Calculation&lt;/h4&gt;

&lt;p&gt;To &lt;em&gt;borrow a methodology used throughout finance&lt;/em&gt;&lt;sup id=&quot;fnref:1:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, the money-weighted return of a portfolio corresponds to the &lt;a href=&quot;https://en.wikipedia.org/wiki/Internal_rate_of_return&quot;&gt;internal rate of return (IRR)&lt;/a&gt; of that portfolio, 
defined as the rate of return which &lt;em&gt;reconciles the beginning market value and additional cash flows into the portfolio to the ending market value&lt;/em&gt;&lt;sup id=&quot;fnref:2:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h5 id=&quot;internal-rate-of-return-method&quot;&gt;Internal rate of return method&lt;/h5&gt;

&lt;p&gt;Mathematically, the calculation of the (annualized) money-weighted return $r_{mw,irr}$ of a portfolio over a time period through the IRR method is done by solving what is usually called &lt;em&gt;the IRR equation&lt;/em&gt;&lt;sup id=&quot;fnref:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;7&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Using the notations of this section, that equation is:&lt;/p&gt;

\[V_1 + \sum_{i=1}^{T} \frac{C_i}{ \left(1 + r_{mw,irr}\right)^{yearfrac(t_1, t_i)}} - \frac{V_T}{ \left(1 + r_{mw,irr}\right)^{yearfrac(t_1, t_T)}} = 0\]

&lt;p&gt;, where $yearfrac$ is the number of (fractional) years between two dates using a given &lt;a href=&quot;https://en.wikipedia.org/wiki/Day_count_convention&quot;&gt;day count convention&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Numerically, the IRR equation is usually solved by an iterative algorithm, such as &lt;a href=&quot;https://en.wikipedia.org/wiki/Newton%27s_method&quot;&gt;the Newton-Raphson method&lt;/a&gt; or similar root-finding methods.&lt;/p&gt;
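As an illustration, here is a minimal Newton-Raphson sketch solving the IRR equation, using the convention that contributions ($C_i &gt; 0$) discounted at the IRR, together with the initial value, reconcile to the discounted final value. The dates, cash flows and the ACT/365 day count convention are all assumptions made up for the example, and this is not a production-grade solver (no safeguards against divergence or multiple roots):

```python
from datetime import date

def yearfrac(d1: date, d2: date) -> float:
    """Fractional years between two dates, ACT/365 convention (an assumption)."""
    return (d2 - d1).days / 365.0

def irr(v1, vT, t1, tT, flows, guess=0.1, tol=1e-10, max_iter=100):
    """Solve V_1 + sum_i C_i/(1+r)^yearfrac(t_1,t_i)
             - V_T/(1+r)^yearfrac(t_1,t_T) = 0
    by Newton-Raphson; `flows` is a list of (date, amount) external
    cash flows (positive = contribution, negative = withdrawal)."""
    r = guess
    for _ in range(max_iter):
        # Residual f(r) of the IRR equation and its derivative f'(r)
        f = v1 - vT * (1 + r) ** -yearfrac(t1, tT)
        df = vT * yearfrac(t1, tT) * (1 + r) ** (-yearfrac(t1, tT) - 1)
        for ti, ci in flows:
            tau = yearfrac(t1, ti)
            f += ci * (1 + r) ** -tau
            df -= ci * tau * (1 + r) ** (-tau - 1)
        if abs(f) < tol:
            break
        r -= f / df  # Newton-Raphson update
    return r

# Hypothetical example: 100 invested on 2020-01-01, 50 contributed on
# 2020-07-01, portfolio worth 165 on 2021-01-01
r_mw = irr(100.0, 165.0, date(2020, 1, 1), date(2021, 1, 1),
           [(date(2020, 7, 1), 50.0)])
print(f"{r_mw:.2%}")
```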

&lt;h5 id=&quot;modified-dietz-method&quot;&gt;Modified Dietz method&lt;/h5&gt;

&lt;p&gt;The calculation of $r_{mw,irr}$ described in the previous sub-section was &lt;em&gt;a problem when computer CPU time was very expensive and needed to be conserved&lt;/em&gt;&lt;sup id=&quot;fnref:1:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, which &lt;em&gt;led to the development of various IRR estimation techniques that did not require [an] iterative algorithm&lt;/em&gt;&lt;sup id=&quot;fnref:1:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;One of these approximations - the &lt;em&gt;modified Dietz&lt;/em&gt; method - is &lt;em&gt;a first-order [closed-form] linear approximation to the IRR&lt;/em&gt;&lt;sup id=&quot;fnref:1:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; method and is &lt;em&gt;the most common&lt;sup id=&quot;fnref:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:9&quot; class=&quot;footnote&quot;&gt;8&lt;/a&gt;&lt;/sup&gt; way of calculating periodic investment returns&lt;/em&gt;&lt;sup id=&quot;fnref:1:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Mathematically, the calculation of the (total) money-weighted return $r_{mw,md}$ of a portfolio over a time period through the modified Dietz method is done by incorporating “weighted” external cash flows into the formula of the simple rate of return.&lt;/p&gt;

&lt;p&gt;Using the notations of this section, this gives:&lt;/p&gt;

\[r_{mw,md} = \frac{V_T - V_1 - \sum_{i=1}^T C_i}{V_1 + \sum_{i=1}^T \left( 1 - \frac{yearfrac(t_1, t_i)}{yearfrac(t_1, t_T)} \right) C_i }\]

&lt;p&gt;, where the terms $ 1 - \frac{yearfrac(t_1, t_i)}{yearfrac(t_1, t_T)} $, $i=1..T$, are used to weight &lt;em&gt;the [external cash] flows by the number of days they were available for investment during the period&lt;/em&gt;&lt;sup id=&quot;fnref:1:11&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h4 id=&quot;caveats&quot;&gt;Caveats&lt;/h4&gt;

&lt;p&gt;The two money-weighted return calculation methods discussed in the previous sub-sections have certain limitations to be aware of:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;The IRR method provides no theoretical guarantee in general&lt;sup id=&quot;fnref:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:7&quot; class=&quot;footnote&quot;&gt;9&lt;/a&gt;&lt;/sup&gt; as to the existence or the uniqueness of an internal rate of return.&lt;/p&gt;

    &lt;p&gt;Indeed, the IRR equation might not have any solution or might have several solutions&lt;sup id=&quot;fnref:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;10&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

    &lt;p&gt;In addition, even when the IRR equation has a unique solution, it might be numerically very hard to find…&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The modified Dietz method has its own issues, for example &lt;em&gt;the accuracy of the result [being] dependent on relatively small capital flows, low volatility and frequent valuations&lt;/em&gt;&lt;sup id=&quot;fnref:13&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:13&quot; class=&quot;footnote&quot;&gt;11&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

    &lt;p&gt;In particular, a very important point of attention is that the &lt;em&gt;modified Dietz [method] is less useful as an approximation to [IRR] over longer time periods&lt;/em&gt;&lt;sup id=&quot;fnref:1:12&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;when-to-use-it&quot;&gt;When to use it?&lt;/h4&gt;

&lt;p&gt;Calculating the money-weighted return of a portfolio makes it possible to answer the question&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;em&gt;How much did a specific investor’s portfolio grow, thanks to both its underlying strategy and its pattern of external cash flows?&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Thus, the money-weighted return of a portfolio is an appropriate measure of portfolio performance when it is desired to:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Compute the rate of return earned on a specific portfolio&lt;/li&gt;
  &lt;li&gt;Compare the rate of return earned on a specific portfolio to another rate (rate of inflation, rate of return earned on another portfolio&lt;sup id=&quot;fnref:12&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:12&quot; class=&quot;footnote&quot;&gt;12&lt;/a&gt;&lt;/sup&gt;, rate of return earned on an alternative investment like real estate or private equity…)&lt;/li&gt;
  &lt;li&gt;Analyse the pattern of external cash flows for a specific portfolio&lt;sup id=&quot;fnref:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:10&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;time-weighted-return&quot;&gt;Time-weighted return&lt;/h3&gt;

&lt;p&gt;The time-weighted return of a portfolio is &lt;em&gt;a form of total return that measures the performance […] for the complete measurement period&lt;/em&gt;&lt;sup id=&quot;fnref:2:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt; in a way that fully &lt;em&gt;eliminates the timing effect that external portfolio cash flows have&lt;/em&gt;&lt;sup id=&quot;fnref:2:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;So, contrary to its money-weighted counterpart, the time-weighted return of a portfolio measures &lt;em&gt;only the effects of the market and manager decision&lt;/em&gt;&lt;sup id=&quot;fnref:2:11&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;As a side note, the time-weighted return of a portfolio is &lt;em&gt;the only well-behaved rate of return that is not influenced by contributions or withdrawals&lt;/em&gt;&lt;sup id=&quot;fnref:3:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, c.f. for example Gray and Dewar&lt;sup id=&quot;fnref:3:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h4 id=&quot;calculation-1&quot;&gt;Calculation&lt;/h4&gt;

&lt;p&gt;The calculation of the time-weighted return of a portfolio over a time period is easily done through the &lt;em&gt;unit price or unitised method&lt;/em&gt;&lt;sup id=&quot;fnref:1:13&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, described in Bacon&lt;sup id=&quot;fnref:1:14&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;A standardised unit price or “net asset value” price is calculated immediately before each external cash flow by dividing the market value by the number of units previously allocated. Units are then added or subtracted (bought or sold) in the portfolio at the unit price corresponding to the time of the cash flow […].&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Mathematically, using the notations of this section, let be $U_i$, $i=1..T$, corresponding to the following transformation&lt;sup id=&quot;fnref:15&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:15&quot; class=&quot;footnote&quot;&gt;14&lt;/a&gt;&lt;/sup&gt; of $V_i$, $i=1..T$:&lt;/p&gt;

\[U_1 = 1 \newline U_{i+1} = U_i \frac{V_{i+1}}{V_i + C_i}, i=1..T-1\]

&lt;p&gt;The (total) time-weighted return $r_{tw}$ of the portfolio over the time period $1..T$ is then defined as the simple rate of return of the transformed portfolio values $U_i$, $i=1..T$:&lt;/p&gt;

\[r_{tw} = \frac{U_T - U_1}{U_1} = \frac{U_T}{U_1} - 1\]
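The unitised method above is a one-pass computation; here is a minimal Python sketch (the portfolio values and cash flows are hypothetical, chosen for the example):

```python
def time_weighted_return(values, flows):
    """Time-weighted return via the unitised method: `values` are the
    portfolio values V_1..V_T at the observation times and `flows[i]`
    is the external cash flow C_i occurring immediately after the
    i-th observation (0 if none)."""
    u = 1.0
    for i in range(len(values) - 1):
        # Each sub-period return is computed on the value *after*
        # the cash flow, which neutralizes its timing impact
        u *= values[i + 1] / (values[i] + flows[i])
    return u - 1  # simple return of the transformed values U_1..U_T

# Hypothetical example: 100 at t_1, 105 at t_2 followed by a 50
# contribution, 165 at t_3
r_tw = time_weighted_return([100.0, 105.0, 165.0], [0.0, 50.0, 0.0])
print(f"{r_tw:.2%}")  # about 11.8%
```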

&lt;h4 id=&quot;caveats-1&quot;&gt;Caveats&lt;/h4&gt;

&lt;p&gt;The time-weighted return has two main limitations to be aware of:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;One is practical, with Feibel&lt;sup id=&quot;fnref:2:12&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt; noting that &lt;em&gt;there is […]  a potentially significant hurdle to implementing this method: the time-weighted return methodology requires valuation of the portfolio before each cash flow&lt;/em&gt;&lt;sup id=&quot;fnref:2:13&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

    &lt;p&gt;For an individual investor, and depending on the exact tools used, this might or might not be a real issue, though.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;One is methodological, with the time-weighted return of a portfolio being sometimes positive while the overall portfolio is at a loss&lt;sup id=&quot;fnref:16&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:16&quot; class=&quot;footnote&quot;&gt;15&lt;/a&gt;&lt;/sup&gt;!&lt;/p&gt;

    &lt;p&gt;Bacon&lt;sup id=&quot;fnref:1:15&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; and Feibel&lt;sup id=&quot;fnref:2:14&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt; provide examples of such cases, the bottom line being that &lt;em&gt;it is important [for the portfolio manager] to perform well in the […] period[s] when the majority of client money is invested&lt;/em&gt;&lt;sup id=&quot;fnref:1:16&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; with the time-weighted return calculation.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;when-to-use-it-1&quot;&gt;When to use it?&lt;/h4&gt;

&lt;p&gt;Calculating the time-weighted return of a portfolio makes it possible to answer the question&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;em&gt;How much did a specific investor’s portfolio grow thanks to its underlying strategy (asset allocation, exposure…)?&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The time-weighted return of a portfolio is thus an appropriate measure of portfolio performance when it is desired to &lt;em&gt;compar[e] the performance of [a portfolio with] different asset managers […] and with benchmark indexes&lt;/em&gt;&lt;sup id=&quot;fnref:1:17&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h3 id=&quot;hybrid-return&quot;&gt;Hybrid return&lt;/h3&gt;

&lt;p&gt;Bacon&lt;sup id=&quot;fnref:1:18&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; notes that &lt;em&gt;in practice, many asset managers use neither true time-weighted nor money-weighted calculations exclusively but rather a hybrid combination of both&lt;/em&gt;&lt;sup id=&quot;fnref:1:19&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, some of which are depicted in Figure 2, taken from Bacon&lt;sup id=&quot;fnref:1:20&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;figure&gt;
  &lt;a href=&quot;/assets/images/blog/portfolio-return-different-returns-bacon.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/portfolio-return-different-returns-bacon-small.png&quot; alt=&quot;The evolution of performance returns methodologies. Source: Bacon.&quot; /&gt;&lt;/a&gt;
  &lt;figcaption&gt;Figure 2. The evolution of performance returns methodologies. Source: Bacon.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h3 id=&quot;money-weighted-vs-time-weighted-return&quot;&gt;Money-weighted vs. time-weighted return&lt;/h3&gt;

&lt;p&gt;To conclude this section, Figure 3, taken from Feibel&lt;sup id=&quot;fnref:2:15&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;, summarizes the respective properties of money-weighted and time-weighted returns.&lt;/p&gt;

&lt;figure&gt;
  &lt;a href=&quot;/assets/images/blog/portfolio-return-mwr-vs-twr-feibel.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/portfolio-return-mwr-vs-twr-feibel-small.png&quot; alt=&quot;Properties of money-weighted and time-weighted returns. Source: Feibel.&quot; /&gt;&lt;/a&gt;
  &lt;figcaption&gt;Figure 3. Properties of money-weighted and time-weighted returns. Source: Feibel.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;To that summary, I would add that, for an individual investor, comparing the money-weighted return and the time-weighted return of a portfolio makes it possible to determine whether the timing of contributions and withdrawals has been successful ($r_{mw} \geq r_{tw}$) or not ($r_{mw} &lt; r_{tw}$).&lt;/p&gt;

&lt;p&gt;In the latter case, it might be worth reviewing&lt;sup id=&quot;fnref:17&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:17&quot; class=&quot;footnote&quot;&gt;16&lt;/a&gt;&lt;/sup&gt; the timing strategy for external cash flows (panic sells, &lt;a href=&quot;https://en.wikipedia.org/wiki/Fear_of_missing_out&quot;&gt;FOMO&lt;/a&gt; buys…).&lt;/p&gt;
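&lt;p&gt;To make that comparison concrete, below is a minimal, self-contained Python sketch of both computations, using hypothetical numbers (not taken from this blog post): the time-weighted return chain-links sub-period returns between external cash flow dates, while the money-weighted return is the internal rate of return of the cash flows, computed here by simple bisection.&lt;/p&gt;

```python
# Hypothetical example: contrasting time-weighted and money-weighted returns.
# An investor puts in 100, then contributes another 100 just before a -10%
# sub-period, i.e. an ill-timed external cash flow.

def time_weighted_return(values_before_flow, flows):
    """Chain-link sub-period returns; values_before_flow[i] is the portfolio
    value just before the external flow flows[i] hits the portfolio."""
    growth = 1.0
    for i in range(1, len(values_before_flow)):
        start = values_before_flow[i - 1] + flows[i - 1]  # value right after the previous flow
        growth *= values_before_flow[i] / start
    return growth - 1.0

def money_weighted_return(cash_flows, lo=-0.99, hi=10.0, tol=1e-10):
    """IRR by bisection over (time, amount) pairs: contributions negative,
    withdrawals and the final portfolio value positive."""
    def npv(r):
        return sum(cf / (1.0 + r) ** t for t, cf in cash_flows)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if npv(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

values = [0.0, 110.0, 189.0]  # valuations just before each external flow
flows = [100.0, 100.0, 0.0]   # external flows at each valuation date

twr = time_weighted_return(values, flows)
mwr = money_weighted_return([(0, -100.0), (1, -100.0), (2, 189.0)])
print(f"r_tw = {twr:.2%}, r_mw = {mwr:.2%}")  # r_mw lags r_tw: ill-timed flow
```

&lt;p&gt;Here, the money-weighted return (~-3.7%) falls short of the time-weighted return (-1%), flagging the ill-timed contribution.&lt;/p&gt;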

&lt;h2 id=&quot;implementation-in-portfolio-optimizer&quot;&gt;Implementation in Portfolio Optimizer&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Portfolio Optimizer&lt;/strong&gt; makes it possible to:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Compute the money-weighted and the time-weighted return of a portfolio over a time period, through the endpoints &lt;a href=&quot;https://docs.portfoliooptimizer.io/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/portfolios/analysis/return/money-weighted&lt;/code&gt;&lt;/a&gt; and &lt;a href=&quot;https://docs.portfoliooptimizer.io/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/portfolios/analysis/return/time-weighted&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Transform portfolio values impacted by external cash flows into money-weighted or time-weighted portfolio values directly usable to compute portfolio performance indicators (return, Sharpe ratio, maximum drawdown…), through the endpoints &lt;a href=&quot;https://docs.portfoliooptimizer.io/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/portfolios/transformation/money-weighted&lt;/code&gt;&lt;/a&gt; and &lt;a href=&quot;https://docs.portfoliooptimizer.io/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/portfolios/transformation/time-weighted&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
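&lt;p&gt;For illustration only, a call to the money-weighted return endpoint might be sketched as follows; the base URL and the payload field names below are assumptions, not the actual API contract, so please consult &lt;a href=&quot;https://docs.portfoliooptimizer.io/&quot;&gt;the API documentation&lt;/a&gt; for the real request schema.&lt;/p&gt;

```python
# ILLUSTRATIVE ONLY: the base URL and payload field names below are
# assumptions, not the documented Portfolio Optimizer API contract; see
# https://docs.portfoliooptimizer.io/ for the actual request schema.
import json

BASE_URL = "https://api.portfoliooptimizer.io/v1"  # assumed base URL

# Hypothetical portfolio values and external cash flows over the period.
payload = {
    "portfolioValues": [1000.0, 1150.0, 1210.0],
    "portfolioCashFlows": [0.0, 100.0, 0.0],
}

# The actual HTTP call would look something like the following; it is
# commented out because it needs the third-party `requests` package and
# network access:
#
# import requests
# response = requests.post(
#     f"{BASE_URL}/portfolios/analysis/return/money-weighted", json=payload
# )
# print(response.json())

print(json.dumps(payload, indent=2))
```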

&lt;h2 id=&quot;examples-of-usage&quot;&gt;Examples of usage&lt;/h2&gt;

&lt;p&gt;I propose to illustrate the differences between money-weighted returns and time-weighted returns through two examples:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Comparing a &lt;a href=&quot;https://en.wikipedia.org/wiki/Dollar_cost_averaging&quot;&gt;dollar cost averaging&lt;/a&gt; investment strategy in an ETF to a &lt;a href=&quot;https://en.wikipedia.org/wiki/Lump_sum&quot;&gt;lump sum&lt;/a&gt; investment strategy in the same ETF&lt;/li&gt;
  &lt;li&gt;Comparing the returns of an ETF to the returns of an “average” investor in that ETF&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;dollar-cost-averaging-vs-lump-sum-investing-in-an-msci-world-etf&quot;&gt;Dollar cost averaging vs. lump sum investing in an MSCI World ETF&lt;/h3&gt;

&lt;p&gt;Let’s suppose that we would like to compare two investment strategies in the &lt;a href=&quot;https://www.amundietf.fr/fr/professionnels/produits/equity/amundi-msci-world-ucits-etf-eur-c/lu1681043599&quot;&gt;Amundi MSCI World UCITS ETF - EUR (C)&lt;/a&gt; over the period 29/12/2023 - 31/12/2024:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Dollar cost averaging (DCA) investing
    &lt;ul&gt;
      &lt;li&gt;Portfolio creation - 1000€ on 29/12/2023&lt;/li&gt;
      &lt;li&gt;Portfolio contributions - 100€ at each month’s end&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Lump sum (LS) investing
    &lt;ul&gt;
      &lt;li&gt;Portfolio creation - 2200€ on 29/12/2023&lt;/li&gt;
      &lt;li&gt;Portfolio contributions - n/a&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In both cases, the amount of external portfolio cash flows is the same&lt;sup id=&quot;fnref:18&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:18&quot; class=&quot;footnote&quot;&gt;17&lt;/a&gt;&lt;/sup&gt; - equal to 2200€ - but the patterns of these cash flows are different, leading to different portfolio returns:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Portfolio return measure&lt;/th&gt;
      &lt;th&gt;Total return (%)&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Simple return (DCA)&lt;/td&gt;
      &lt;td&gt;161.42%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Money-weighted return (DCA)&lt;/td&gt;
      &lt;td&gt;25.45%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Time-weighted return (DCA)&lt;/td&gt;
      &lt;td&gt;26.33%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Simple return (LS)&lt;/td&gt;
      &lt;td&gt;26.33%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Money-weighted return (LS)&lt;/td&gt;
      &lt;td&gt;26.33%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Time-weighted return (LS)&lt;/td&gt;
      &lt;td&gt;26.33%&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;&lt;em&gt;Notes:&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
  &lt;ul&gt;
    &lt;li&gt;Detailed calculations are available in &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1dqHW6uGbrTZnv8uHCr0qHgHRXBtYLM6JJQiRbpb8TlQ/edit?usp=sharing&quot;&gt;the Google Sheet associated with this blog post&lt;/a&gt;.&lt;/li&gt;
  &lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;The table above empirically confirms that:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;In the absence of external cash flows, all portfolio return measures discussed in this blog post become equal (lines #4-#6)&lt;/li&gt;
  &lt;li&gt;In the presence of external cash flows
    &lt;ul&gt;
      &lt;li&gt;
        &lt;p&gt;The simple return is not an appropriate measure of portfolio return (line #1)&lt;/p&gt;

        &lt;p&gt;Despite this, at the date of publication of this blog post, some financial services still use that measure to compute portfolio returns in the presence of external cash flows, like &lt;a href=&quot;https://www.trading212.com/&quot;&gt;Trading 212&lt;/a&gt;, cf. &lt;a href=&quot;https://community.trading212.com/t/returns-graph-instead-of-portfolio-graph/17053/15&quot;&gt;the associated discussion on their forum&lt;/a&gt;.&lt;/p&gt;

        &lt;p&gt;Usually, it is because of data availability problems like difficulties in identifying external cash flows, but sometimes it is simply because providing accurate portfolio performance measures is not a priority…&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;The money-weighted return is impacted by the timing of external cash flows (line #2)&lt;/p&gt;

        &lt;p&gt;Here, there is a ~-1% difference between the portfolio money-weighted and time-weighted returns, mainly due to the ill-timed&lt;sup id=&quot;fnref:20&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:20&quot; class=&quot;footnote&quot;&gt;18&lt;/a&gt;&lt;/sup&gt; contribution made on 28/03/2024.&lt;/p&gt;

        &lt;p&gt;Such a difference is relatively insignificant over one year, but depending on the pattern of external cash flows, it could be much worse.&lt;/p&gt;

        &lt;p&gt;For example, if a single contribution of 1200€ had been made on 28/03/2024 instead of monthly contributions of 100€, the associated portfolio money-weighted return would decrease to 22.59%, this time a ~-4% difference!&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;The time-weighted return is not impacted by external cash flows&lt;sup id=&quot;fnref:19&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:19&quot; class=&quot;footnote&quot;&gt;19&lt;/a&gt;&lt;/sup&gt; and is akin to a “lump sum investing”-equivalent return (line #3 is equal to lines #4-#6)&lt;/p&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;investment-returns-vs-average-investor-returns-in-an-msci-world-etf&quot;&gt;Investment returns vs. average investor returns in an MSCI World ETF&lt;/h3&gt;

&lt;p&gt;Financial studies regularly show that investors tend to lag actual fund returns across a variety of asset classes, leading to what is usually called the &lt;em&gt;investor gap&lt;/em&gt;&lt;sup id=&quot;fnref:22&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:22&quot; class=&quot;footnote&quot;&gt;20&lt;/a&gt;&lt;/sup&gt;&lt;sup id=&quot;fnref:24&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:24&quot; class=&quot;footnote&quot;&gt;21&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;The &lt;a href=&quot;https://www.dalbar.com/&quot;&gt;Dalbar’s&lt;/a&gt; &lt;a href=&quot;https://www.qaib.com/&quot;&gt;Quantitative Analysis of Investor Behavior (QAIB)&lt;/a&gt; annual study &lt;em&gt;measure[s] the effects of investor decisions to buy, sell and switch into and out of funds over short and long-term timeframes&lt;/em&gt;&lt;sup id=&quot;fnref:21&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:21&quot; class=&quot;footnote&quot;&gt;22&lt;/a&gt;&lt;/sup&gt; and &lt;em&gt;the results consistently show that the average investor earns less – in many cases, much less – than mutual fund performance reports would suggest&lt;/em&gt;&lt;sup id=&quot;fnref:21:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:21&quot; class=&quot;footnote&quot;&gt;22&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

    &lt;p&gt;For the U.S., Figure 4, adapted from Dalbar&lt;sup id=&quot;fnref:21:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:21&quot; class=&quot;footnote&quot;&gt;22&lt;/a&gt;&lt;/sup&gt;, depicts the returns of U.S. funds vs. the returns of the average investor in U.S. funds over the period 1994-2023.&lt;/p&gt;

    &lt;figure&gt;
    &lt;a href=&quot;/assets/images/blog/portfolio-return-dalbar-investor-gap.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/portfolio-return-dalbar-investor-gap.png&quot; alt=&quot;U.S. funds returns vs. average U.S. funds investor returns (1994-2023). Source: Dalbar.&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;Figure 4. U.S. funds returns vs. average U.S. funds investor returns (30-year returns, 1994-2023). Source: Dalbar.&lt;/figcaption&gt;
  &lt;/figure&gt;

    &lt;p&gt;The observed performance differential of ~2% per year creates dramatic cumulative effects over such a long period&lt;sup id=&quot;fnref:21:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:21&quot; class=&quot;footnote&quot;&gt;22&lt;/a&gt;&lt;/sup&gt;!&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The &lt;a href=&quot;https://www.morningstar.com/&quot;&gt;Morningstar’s&lt;/a&gt; &lt;a href=&quot;https://www.morningstar.com/lp/mind-the-gap&quot;&gt;Mind the Gap&lt;/a&gt; annual study compares &lt;em&gt;funds’ dollar-weighted returns [- that is, funds’ returns as experienced by their investors -] with their time-weighted returns to see how large the gap, or difference, has been over time&lt;/em&gt;&lt;sup id=&quot;fnref:23&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:23&quot; class=&quot;footnote&quot;&gt;23&lt;/a&gt;&lt;/sup&gt; and analyses &lt;em&gt;where investors succeeded in capturing most of their funds’ returns&lt;/em&gt;&lt;sup id=&quot;fnref:23:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:23&quot; class=&quot;footnote&quot;&gt;23&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

    &lt;p&gt;Figure 5, directly taken from Morningstar&lt;sup id=&quot;fnref:23:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:23&quot; class=&quot;footnote&quot;&gt;23&lt;/a&gt;&lt;/sup&gt;, illustrates that gap by U.S. funds category over the 10 years ended 31st December 2023.&lt;/p&gt;

    &lt;figure&gt;
    &lt;a href=&quot;/assets/images/blog/portfolio-return-morningstar-investor-gap.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/portfolio-return-morningstar-investor-gap.png&quot; alt=&quot; Investor return gaps by U.S. category group (10-year returns, 2013-2023). Source: Morningstar.&quot; /&gt;&lt;/a&gt;
    &lt;figcaption&gt;Figure 5. Investor return gaps by U.S. category group (10-year returns, 2013-2023). Source: Morningstar.&lt;/figcaption&gt;
  &lt;/figure&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this sub-section, I propose to apply the same methodology as these studies to the &lt;a href=&quot;https://www.amundietf.fr/fr/professionnels/produits/equity/amundi-msci-world-ucits-etf-eur-c/lu1681043599&quot;&gt;Amundi MSCI World UCITS ETF - EUR (C)&lt;/a&gt; over the period 30/04/2018 - 31/12/2024.&lt;/p&gt;

&lt;p&gt;In more detail:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;I will use &lt;em&gt;the month-end asset data [(AUM)] compared with the underlying total return to estimate a net inflow or outflow for [each] month&lt;/em&gt;&lt;sup id=&quot;fnref:23:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:23&quot; class=&quot;footnote&quot;&gt;23&lt;/a&gt;&lt;/sup&gt;, using Morningstar’s approach to estimating funds’ monthly net flows&lt;sup id=&quot;fnref:25&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:25&quot; class=&quot;footnote&quot;&gt;24&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

    &lt;blockquote&gt;
      &lt;p&gt;The [net] cash flow estimate for a month $C_t$ is the difference in the beginning $NAV_{t-1}$ and ending $NAV_t$ total net assets that cannot be explained by the monthly total return $r_t$, that is, $ C_t = NAV_t - NAV_{t-1} \left( 1 + r_t \right) $.&lt;/p&gt;
    &lt;/blockquote&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Once all the monthly cash flows are available for the period&lt;/em&gt;&lt;sup id=&quot;fnref:23:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:23&quot; class=&quot;footnote&quot;&gt;23&lt;/a&gt;&lt;/sup&gt;, I will calculate the associated money-weighted return - which corresponds to the average investor return in the ETF - and I will then compare it to the ETF money-weighted return&lt;sup id=&quot;fnref:26&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:26&quot; class=&quot;footnote&quot;&gt;25&lt;/a&gt;&lt;/sup&gt; - which corresponds to the ETF return itself.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
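&lt;p&gt;The two steps above can be sketched as follows, with hypothetical AUM and return figures, and with simple bisection standing in for a proper IRR solver.&lt;/p&gt;

```python
# Sketch of the investor-gap methodology with hypothetical monthly figures.
# Step 1: estimate each month's net flow from AUM and total returns,
#         C_t = NAV_t - NAV_{t-1} * (1 + r_t).
# Step 2: take the money-weighted (average investor) return as the IRR of
#         those flows, computed here by simple bisection.

def estimate_net_flows(navs, returns):
    """navs[t] is the AUM at the end of month t; returns[t] is the total
    return of month t+1; a positive flow is a net inflow into the fund."""
    return [navs[t + 1] - navs[t] * (1.0 + returns[t]) for t in range(len(returns))]

def irr(cash_flows, lo=-0.99, hi=10.0, tol=1e-10):
    """Root of the net present value of (time, amount) cash flows, by bisection."""
    def npv(r):
        return sum(cf / (1.0 + r) ** t for t, cf in cash_flows)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if npv(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

navs = [1000.0, 1150.0, 1000.0]  # hypothetical month-end AUM
returns = [0.10, -0.05]          # hypothetical monthly total returns

flows = estimate_net_flows(navs, returns)  # ~50 net inflow, then ~92.5 net outflow

# Aggregate investors' viewpoint: pay the starting AUM, pay (receive) each
# net inflow (outflow), receive the ending AUM.
cash_flows = [(0, -navs[0])]
cash_flows += [(t + 1, -flows[t]) for t in range(len(flows))]
cash_flows.append((len(returns), navs[-1]))

monthly_mwr = irr(cash_flows)
print(f"estimated net flows: {flows}")
print(f"monthly money-weighted (investor) return: {monthly_mwr:.2%}")
```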

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;&lt;em&gt;Notes:&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
  &lt;ul&gt;
    &lt;li&gt;Detailed calculations are available in &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1dqHW6uGbrTZnv8uHCr0qHgHRXBtYLM6JJQiRbpb8TlQ/edit?usp=sharing&quot;&gt;the Google Sheet associated with this blog post&lt;/a&gt;.&lt;/li&gt;
  &lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;The results are the following:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Portfolio return measure&lt;/th&gt;
      &lt;th&gt;Annualized return (%)&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Money-weighted return (AUM)&lt;/td&gt;
      &lt;td&gt;14.15%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Time-weighted return (AUM)&lt;/td&gt;
      &lt;td&gt;12.74%&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The table above confirms the presence of an investor gap for the considered MSCI World ETF, but, for once, to the benefit of the average investor, by ~1.4% per year!&lt;/p&gt;

&lt;p&gt;This is at odds with Morningstar’s findings&lt;sup id=&quot;fnref:23:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:23&quot; class=&quot;footnote&quot;&gt;23&lt;/a&gt;&lt;/sup&gt; that &lt;em&gt;the gap between index ETFs’ dollar-weighted returns and their total return […] was quite a bit wider than the gap for index open-end funds&lt;/em&gt;&lt;sup id=&quot;fnref:23:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:23&quot; class=&quot;footnote&quot;&gt;23&lt;/a&gt;&lt;/sup&gt;, but of course, the analysis of this sub-section carries no statistical significance compared to Morningstar’s.&lt;/p&gt;

&lt;p&gt;Or, maybe investors in the Amundi MSCI World UCITS ETF have specific timing abilities, who knows!&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;As mentioned in the introduction, &lt;em&gt;the calculation of portfolio return is the first step in the performance measurement process&lt;/em&gt;&lt;sup id=&quot;fnref:1:21&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Once properly done, the next step is &lt;em&gt;to know if the return is good or bad. In other words, [it is needed] to evaluate performance (risk and return), against an appropriate benchmark&lt;/em&gt;&lt;sup id=&quot;fnref:1:22&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, which can for example be &lt;a href=&quot;/blog/random-portfolio-benchmarking-simulation-based-performance-evaluation-in-finance/&quot;&gt;random portfolios&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For more discussions on portfolio performances, feel free to &lt;a href=&quot;https://www.linkedin.com/in/roman-rubsamen/&quot;&gt;connect with me on LinkedIn&lt;/a&gt; or to &lt;a href=&quot;https://twitter.com/portfoliooptim&quot;&gt;follow me on Twitter&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;–&lt;/p&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.wiley.com/en-us/Practical+Portfolio+Performance+Measurement+and+Attribution%2C+3rd+Edition-p-9781119831969&quot;&gt;Carl Bacon, Practical Portfolio Performance Measurement and Attribution, Third edition&lt;/a&gt;. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:1:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;8&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;9&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;10&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:10&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;11&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:11&quot; class=&quot;reversefootnote&quot; 
role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;12&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:12&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;13&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:13&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;14&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:14&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;15&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:15&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;16&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:16&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;17&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:17&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;18&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:18&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;19&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:19&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;20&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:20&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;21&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:21&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;22&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:1:22&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;23&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.jstor.org/stable/2629526&quot;&gt;Kenneth B. Gray, Jr. and Robert B. K. Dewar, Axiomatic Characterization of the Time-Weighted Rate of Return, Management Science, Vol. 18, No. 2, Application Series (Oct., 1971), pp. B32-B35&lt;/a&gt;. &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:3:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:3:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:3:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:3:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:4&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://link.springer.com/book/10.1007/978-3-319-19812-5&quot;&gt;W. Marty, Portfolio Analytics. An Introduction to Return and Risk Measurement, Springer Texts in Business and Economics (2nd edition), Springer Berlin, 2015&lt;/a&gt;. &lt;a href=&quot;#fnref:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:5&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Internal cash flows out of the portfolio are sometimes referred to as &lt;em&gt;income&lt;/em&gt;&lt;sup id=&quot;fnref:1:23&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;sup id=&quot;fnref:2:16&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt; from the portfolio constituents. &lt;a href=&quot;#fnref:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:11&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;In contrast with the &lt;em&gt;price return&lt;/em&gt; of the portfolio, which would only take into account the price evolution of the portfolio constituents. &lt;a href=&quot;#fnref:11&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.wiley.com/en-us/Investment+Performance+Measurement-p-9780471445630&quot;&gt;Bruce J. Feibel, Investment Performance Measurement&lt;/a&gt;. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:2:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;8&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;9&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;10&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:10&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;11&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:11&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;12&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:12&quot; class=&quot;reversefootnote&quot; 
role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;13&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:13&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;14&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:14&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;15&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:15&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;16&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:16&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;17&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:6&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;As noted in Bacon&lt;sup id=&quot;fnref:1:24&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, &lt;em&gt;the formal definition of the IRR is the discount rate that makes the net present value equal to zero in a discounted cash flow analysis&lt;/em&gt;&lt;sup id=&quot;fnref:1:25&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;; that equation is precisely the equation of the net present value in the specific case of a portfolio of assets. &lt;a href=&quot;#fnref:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:9&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;For example, the modified Dietz method is used in Interactive Brokers &lt;a href=&quot;https://www.interactivebrokers.com/en/portfolioanalyst/overview.php&quot;&gt;PortfolioAnalyst&lt;/a&gt; product to compute the money-weighted return of the investor’s portfolio. &lt;a href=&quot;#fnref:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:7&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Cf. for example &lt;a href=&quot;https://support.microsoft.com/en-us/office/xirr-function-de1242ec-6477-445b-b11b-a303ad9adc9d&quot;&gt;the documentation of Excel’s XIRR function&lt;/a&gt;. &lt;a href=&quot;#fnref:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:8&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Although Bacon&lt;sup id=&quot;fnref:1:26&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; notes that &lt;em&gt;you need to work very hard, with quite extreme data, to generate suitable examples&lt;/em&gt;&lt;sup id=&quot;fnref:1:27&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; because multiple solutions &lt;em&gt;are extremely unlikely to be experienced in practice&lt;/em&gt;&lt;sup id=&quot;fnref:1:28&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:13&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.cambridge.org/core/journals/journal-of-the-staple-inn-actuarial-society/article/abs/measurement-of-pension-fund-investment-performance/04093B956C53330BAF24A076756EE5C6&quot;&gt;Hager DP. Measurement of Pension Fund Investment Performance. Journal of the Staple Inn Actuarial Students’ Society. 1980;24:33-64&lt;/a&gt;. &lt;a href=&quot;#fnref:13&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:12&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;For Bacon&lt;sup id=&quot;fnref:1:29&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, though, the money-weighted return &lt;em&gt;is unique and not really comparable with other portfolios enjoying a different pattern of cash flow&lt;/em&gt;&lt;sup id=&quot;fnref:1:30&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;; on this, I kindly disagree, because as an individual investor, I will for example certainly be pleased to have a higher money-weighted return than that of my father/brother in law! &lt;a href=&quot;#fnref:12&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:10&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;For example when testing for investor’s skill in timing external cash flows or when comparing &lt;a href=&quot;https://en.wikipedia.org/wiki/Dollar_cost_averaging&quot;&gt;dollar-cost averaging (DCA)&lt;/a&gt; investment strategies (weekly, monthly, quarterly…). &lt;a href=&quot;#fnref:10&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:15&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;That transformation is &lt;em&gt;in effect a normalised market value&lt;/em&gt;&lt;sup id=&quot;fnref:1:31&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; transformation of the portfolio. &lt;a href=&quot;#fnref:15&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:16&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Or conversely, the time-weighted return of a portfolio can be negative while the overall portfolio is at a gain! &lt;a href=&quot;#fnref:16&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:17&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;For example, by switching to &lt;a href=&quot;https://en.wikipedia.org/wiki/Dollar_cost_averaging&quot;&gt;a dollar cost averaging&lt;/a&gt; strategy. &lt;a href=&quot;#fnref:17&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:18&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The last contribution of 100€ made on 31/12/2024 is excluded from this count, since it is invested at the end of the considered period. &lt;a href=&quot;#fnref:18&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:20&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;From 28/03/2024 to 30/04/2024, the MSCI World ETF under consideration returned -2.74%. &lt;a href=&quot;#fnref:20&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:19&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The time-weighted return is also equal to the NAV return. &lt;a href=&quot;#fnref:19&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:22&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Note that the investor gap as discussed here is not related to fund fees; for example, Morningstar&lt;sup id=&quot;fnref:23:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:23&quot; class=&quot;footnote&quot;&gt;23&lt;/a&gt;&lt;/sup&gt; &lt;em&gt;didn’t find a strong link between fees and investor return gaps in the study&lt;/em&gt;&lt;sup id=&quot;fnref:23:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:23&quot; class=&quot;footnote&quot;&gt;23&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:22&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:24&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Also known as the &lt;em&gt;behaviour gap&lt;/em&gt;. &lt;a href=&quot;#fnref:24&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:21&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.qaib.com/&quot;&gt;Dalbar’s 2024 QAIB Report&lt;/a&gt;. &lt;a href=&quot;#fnref:21&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:21:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:21:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:21:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:23&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.morningstar.com/lp/mind-the-gap&quot;&gt;Morningstar’s Mind the Gap 2024 Report&lt;/a&gt;. &lt;a href=&quot;#fnref:23&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:23:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:23:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:23:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:23:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:23:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:23:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:23:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;8&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:23:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;9&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:25&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.morningstar.com/content/dam/marketing/shared/research/methodology/765555_Estimated_Net_Cash_Flow_Methodology.pdf&quot;&gt;Morningstar’s Estimated Net Cash Flow Methodology&lt;/a&gt;. &lt;a href=&quot;#fnref:25&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:26&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;As with most funds, ETF returns correspond to time-weighted returns. &lt;a href=&quot;#fnref:26&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content><author><name>Roman R.</name></author><category term="portfolio return" /><summary type="html">Whether we manage our own investment assets or choose to hire others to manage the assets on our behalf we are keen to know how well our […] portfolio of assets is performing1 and the calculation of portfolio return is the first step in [that] performance measurement process1. Now, while the matter of measuring the rate of return of [a portfolio] appears, on the surface, to be simple enough2, the presence of external cash flows - either contributions to the portfolio or withdrawals from the portfolio - leads to the definition of several rate of returns, with no single rate of return measure [being] appropriate for every purpose2. In this blog post, strongly inspired by the book Practical Portfolio Performance Measurement and Attribution2 from Carl Bacon, I will describe the two main methods of portfolio return calculation in the presence of external cash flows. As an example of usage, I will illustrate the investor gap in the case of a MSCI World ETF, that is, I will show that the returns of that ETF are different from the returns achieved by the average investor in that ETF. Notes: A fully functional Google sheet corresponding to this post is available here External v.s. internal cash flows In the context of portfolio return calculation, two types of cash flows are usually distinguished: External cash flows, corresponding to any new money added to or taken from [a] portfolio, whether in the form of cash or other assets1 Internal cash flows, corresponding to transactions funded from within [a] portfolio1 like dividend and coupon payments, free shares attributed by companies, positions rebalancing, etc. In other words, external cash flows impact the valuation of a portfolio other than by the growth (positive or negative) of the funds invested in that portfolio while internal cash flows have no impact on the portfolio valuation. 
Portfolio return in the absence of external cash flows Let: $T$, the number of observations of the value of a portfolio over a time period $1..T$, with $t_1 &amp;lt; t_2 &amp;lt; … &amp;lt; t_T$ the $T$ observation times $V_1$ the value of the portfolio at the initial observation time $t_1$ $V_T$ the value of the portfolio at the final observation time $t_T$ When no external cash flow occurred over a time period, the return $r_p$ of a portfolio over that period is defined as the change of the portfolio value relative to its beginning value3. Mathematically, using the notations above, this gives: \[r_p = \frac{V_T - V_1}{V_1} = \frac{V_T}{V_1} - 1\] $r_p$ is called the simple rate of return or the arithmetic rate of return of the portfolio over the time period $1..T$. Note that, by definition, the arithmetic return of a portfolio over a time period is supposed to take into account any internal cash flow out of the portfolio constituents4 (dividends for stocks, coupon payments for bonds…); for this reason, the arithmetic return of a portfolio is also sometimes called the total (arithmetic) return of the portfolio5. 
Portfolio return in the presence of external cash flows Now let: $T$, the number of observations of the value of a portfolio over a time period $1..T$, with $t_1 &amp;lt; t_2 &amp;lt; … &amp;lt; t_T$ the $T$ observation times $V_i$ the value of the portfolio at the observation time $t_i$, $i=1..T$ $C_i$ the value of a potential external cash flow ($C_i &amp;gt; 0$ for a contribution, $C_i &amp;lt; 0$ for a withdrawal) at the observation time $t_i$, $i=1..T$, assumed to occur immediately after the observation time $t_i$ (i.e., the cash flow $C_i$ is assumed to be excluded from the portfolio value $V_i$) When an external cash flow occurred over a time period, the cash flow itself will contribute to the [portfolio] valuation1, so that the calculation of [the portfolio return] […] must compensate for the fact that the increase in market value is not entirely due to investment gain[/loss] during the period6. Since at least 19681, there are two main measures of such a compensated portfolio return: The portfolio money-weighted return (MWR), which integrates the timing and the amount of the external cash flows in the portfolio return, leading to a measure of portfolio return that includes the impacts of both the decisions to contribute money in (resp. withdraw money from) the portfolio and the decisions about asset allocation and security selection6. The portfolio time-weighted return (TWR), which eliminates the impact of the external cash flows from the portfolio return, leading to a measure of portfolio return that isolates the decisions about asset allocation and security selection6. Money-weighted return The money-weighted return of a portfolio is a performance statistic reflecting how much money was earned during the measurement period6 and is thus representative of the return an investor actually experiences6. 
As such, the money-weighted return is influenced by both: The decisions made by the [portfolio] manager [- who can be the investor herself] - to allocate assets and select securities within the portfolio6 The timing of [the investor’s] decisions to contribute to or withdraw money from [the portfolio]6, with the different cases being summarized in Figure 1, taken from Feibel6. Figure 1. Impact of external cash flows on money-weighted portfolio performances. Source: Feibel. Calculation To borrow a methodology used throughout finance1, the money-weighted return of a portfolio corresponds to the internal rate of return (IRR) of that portfolio, defined as the rate of return which reconciles the beginning market value and additional cash flows into the portfolio to the ending market value6. Internal rate of return method Mathematically, the calculation of the (annualized) money-weighted return $r_{mw,irr}$ of a portfolio over a time period through the IRR method is done by solving what is usually called the IRR equation7. Using the notations of this section, that equation is: \[V_1 - \sum_{i=1}^{T} \frac{C_i}{ \left(1 + r_{mw,irr}\right)^{yearfrac(t_i, t_1)}} - \frac{V_T}{ \left(1 + r_{mw,irr}\right)^{yearfrac(t_T, t_1)}} = 0\] , where $yearfrac$ is the number of (fractional) years between two dates using a given day count convention. Numerically, the IRR equation is usually solved by an iterative algorithm, like the Newton-Raphson method or similar other methods. Modified Dietz method The calculation of $r_{mw,irr}$ described in the previous sub-section was a problem when computer CPU time was very expensive and needed to be conserved1, which led to the development of various IRR estimation techniques that did not require [an] iterative algorithm1. One of these approximations - the modified Dietz method - is a first-order [closed-form] linear approximation to the IRR1 method and is the most common8 way of calculating periodic investment returns1. 
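The IRR equation above can be solved numerically as described; here is a minimal Python sketch (function name and sign conventions are mine, not the post's): all flows are signed from the investor's point of view - the initial value $V_1$ and contributions as outflows, withdrawals and the final value $V_T$ as inflows - and the equivalent net present value equation is solved with a fixed number of Newton-Raphson iterations.

```python
def money_weighted_return(flows, times, r0=0.1, iterations=50):
    # flows: signed external cash flows from the investor's viewpoint,
    # e.g. [-V_1, -C_1, ..., -C_T, +V_T]; times: year fractions from t_1.
    # Solves sum_i flows[i] * (1 + r) ** (-times[i]) = 0
    # with a fixed number of Newton-Raphson iterations.
    r = r0
    for _ in range(iterations):
        f = sum(c * (1.0 + r) ** (-t) for c, t in zip(flows, times))
        df = sum(-t * c * (1.0 + r) ** (-t - 1.0) for c, t in zip(flows, times))
        r = r - f / df
    return r
```

For example, with flows [-1000, 1100] at times [0, 1] years - a 1000€ investment growing to 1100€ in one year with no interim flows - the function returns an annualized money-weighted return of 10%.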
Mathematically, the calculation of the (total) money-weighted return $r_{mw,md}$ of a portfolio over a time period through the modified Dietz method is done by incorporating “weighted” external cash flows into the formula of the simple rate of return. Using the notations of this section, this gives: \[r_{mw,md} = \frac{V_T - V_1 - \sum_{i=1}^T C_i}{V_1 + \sum_{i=1}^T \left( 1 - \frac{yearfrac(t_1, t_i)}{yearfrac(t_1, t_T)} \right) C_i }\] , where the terms $ 1 - \frac{yearfrac(t_1, t_i)}{yearfrac(t_1, t_T)} $, $i=1..T$, are used to weight the [external cash] flows by the number of days they were available for investment during the period1. Caveats The two money-weighted return calculation methods discussed in the previous sub-sections have certain limitations to be aware of: The IRR method provides no theoretical guarantee in general9 as to the existence or the uniqueness of an internal rate of return. Indeed, the IRR equation might not have any solution or might have several solutions10. In addition, even when the IRR equation has a unique solution, it might be numerically very hard to find… The modified Dietz method has its own issues, for example the accuracy of the result [being] dependent on relatively small capital flows, low volatility and frequent valuations11. In particular, a very important point of attention is that the modified Dietz [method] is less useful as an approximation to [IRR] over longer time periods1. When to use it? Calculating the money-weighted return of a portfolio allows to answer the question How much did a specific investor’s portfolio grow, thanks to both its underlying strategy and its pattern of external cash flows? 
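As an illustration, the modified Dietz formula above translates directly into code; a minimal Python sketch (function name and argument conventions are illustrative, with day counts passed in as plain year fractions):

```python
def modified_dietz_return(v_start, v_end, flows, times, period):
    # v_start, v_end: portfolio values V_1 and V_T; flows: external cash
    # flows C_i (positive for contributions, negative for withdrawals);
    # times: year fractions yearfrac(t_1, t_i); period: yearfrac(t_1, t_T).
    total_flows = sum(flows)
    # Weight each flow by the fraction of the period it was invested for.
    weighted_flows = sum(c * (1.0 - t / period) for c, t in zip(flows, times))
    return (v_end - v_start - total_flows) / (v_start + weighted_flows)
```

For example, with v_start = 1000, a single contribution of 100 halfway through a one-year period and v_end = 1200, the estimated money-weighted return is 100/1050, or about 9.52%.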
Thus, the money-weighted return of a portfolio is an appropriate measure of portfolio performance when it is desired to: Compute the rate of return earned on a specific portfolio Compare the rate of return earned on a specific portfolio to another rate (rate of inflation, rate of return earned on another portfolio12, rate of return earned on an alternative investment like real estate or private equity…) Analyse the pattern of external cash flows for a specific portfolio13 Time-weighted return The time-weighted return of a portfolio is a form of total return that measures the performance […] for the complete measurement period6 in a way that fully eliminates the timing effect that external portfolio cash flows have6. So, contrary to its money-weighted counterpart, the time-weighted return of a portfolio measures only the effects of the market and manager decision6. As a side note, the time-weighted return of a portfolio is the only well-behaved rate of return that is not influenced by contributions or withdrawals2, c.f. for example Gray and Dewar2. Calculation The calculation of the time-weighted return of a portfolio over a time period is easily done through the unit price or unitised method1, described in Bacon1: A standardised unit price or “net asset value” price is calculated immediately before each external cash flow by dividing the market value by the number of units previously allocated. Units are then added or subtracted (bought or sold) in the portfolio at the unit price corresponding to the time of the cash flow […]. 
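The unit price procedure just quoted can be sketched in a few lines of Python (hypothetical function name; following this post's convention, each external cash flow C_i is assumed to occur immediately after the valuation V_i):

```python
def time_weighted_return(values, flows):
    # values: portfolio valuations V_1..V_T, each taken just before the
    # corresponding external cash flow; flows: cash flows C_1..C_{T-1},
    # each assumed to occur immediately after its valuation.
    u = 1.0  # unit price U_1, normalised to 1 at the first valuation
    for i in range(len(values) - 1):
        # U_{i+1} = U_i * V_{i+1} / (V_i + C_i): growth net of the flow
        u = u * values[i + 1] / (values[i] + flows[i])
    return u - 1.0  # r_tw = U_T / U_1 - 1
```

For example, with valuations [100, 220] and a single contribution of 100 right after the first valuation, the time-weighted return is 220/200 - 1 = 10%, even though the simple return would be 120%.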
Mathematically, using the notations of this section, let $U_i$, $i=1..T$, correspond to the following transformation14 of $V_i$, $i=1..T$: \[U_1 = 1 \newline U_{i+1} = U_i \frac{V_{i+1}}{V_i + C_i}, i=1..T-1\] The (total) time-weighted return $r_{tw}$ of the portfolio over the time period $1..T$ is then defined as the simple rate of return of the transformed portfolio values $U_i$, $i=1..T$: \[r_{tw} = \frac{U_T - U_1}{U_1} = \frac{U_T}{U_1} - 1\] Caveats The time-weighted return has two main limitations to be aware of: One is practical, with Feibel6 noting that there is […] a potentially significant hurdle to implementing this method: the time-weighted return methodology requires valuation of the portfolio before each cash flow6. For an individual investor, and depending on the exact tools used, this might or might not be a real issue though. One is methodological, with the time-weighted return of a portfolio being sometimes positive while the overall portfolio is at a loss15! Bacon1 and Feibel6 provide examples of such cases, the bottom line being that it is important [for the portfolio manager] to perform well in the […] period[s] when the majority of client money is invested1 with the time-weighted return calculation. When to use it? Calculating the time-weighted return of a portfolio allows to answer the question How much did a specific investor’s portfolio grow thanks to its underlying strategy (asset allocation, exposure…)? The time-weighted return of a portfolio is thus an appropriate measure of portfolio performance when it is desired to compar[e] the performance of [a portfolio with] different asset managers […] and with benchmark indexes1. Hybrid return Bacon1 notes that in practice, many asset managers use neither true time-weighted nor money-weighted calculations exclusively but rather a hybrid combination of both1, some of which are depicted in Figure 2, taken from Bacon1. Figure 2. The evolution of performance returns methodologies. 
Source: Bacon. Money-weighted v.s. time-weighted return To conclude this section, Figure 3, taken from Feibel6, summarizes the respective properties of money-weighted and time-weighted returns. Figure 3. Properties of money-weighted and time-weighted returns. Source: Feibel. To that summary, I would add that for an individual investor, comparing the money-weighted return and the time-weighted return of a portfolio allows to determine whether the timing of contributions and withdrawals has been successful ($r_{mw} \geq r_{tw}$) or not ($r_{mw} \leq r_{tw}$). In the latter case, it might be interesting to review16 the external cash flows timing strategy (panic sells, FOMO buys…). Implementation in Portfolio Optimizer Portfolio Optimizer allows to: Compute the money-weighted and the time-weighted return of a portfolio over a time period, through the endpoints /portfolios/analysis/return/money-weighted and /portfolios/analysis/return/time-weighted. Transform portfolio values impacted by external cash flows into money-weighted or time-weighted portfolio values directly usable to compute portfolio performance indicators (return, Sharpe ratio, maximum drawdown…), through the endpoints /portfolios/transformation/money-weighted and /portfolios/transformation/time-weighted. Examples of usage I propose to illustrate the differences between money-weighted returns and time-weighted returns through two examples: Comparing a dollar cost averaging investment strategy in an ETF to a lump sum investment strategy in the same ETF Comparing the returns of an ETF to the returns of an “average” investor in that ETF Dollar cost averaging v.s. 
lump sum investing in an MSCI World ETF Let’s suppose that we would like to compare two investment strategies in the Amundi MSCI World UCITS ETF - EUR (C) over the period 29/12/2023 - 31/12/2024: Dollar cost averaging (DCA) investing Portfolio creation - 1000€ on 29/12/2023 Portfolio contributions - 100€ at each month’s end Lump sum (LS) investing Portfolio creation - 2200€ on 29/12/2023 Portfolio contributions - n/a In both cases, the amount of external portfolio cash flows is the same17 - equal to 2200€ - but the patterns of these cash flows are different, leading to different portfolio returns: Portfolio return measure Total return (%) Simple return (DCA) 161.42% Money-weighted return (DCA) 25.45% Time-weighted return (DCA) 26.33% Simple return (LS) 26.33% Money-weighted return (LS) 26.33% Time-weighted return (LS) 26.33% Notes: Detailed calculations are available in the Google Sheet associated to this blog post. The table above empirically confirms that: In the absence of external cash flows, all portfolio return measures discussed in this blog post become equal (lines #4-#6) In the presence of external cash flows The simple return is not an appropriate measure of portfolio return (line #1) Despite this, and at the date of publication of this blog post, there are still some financial services using that measure to compute portfolio returns in the presence of external cash flows, like Trading 212, c.f. the associated discussion on their forum. Usually, it is because of data availability problems like difficulties in identifying external cash flows, but sometimes it is simply because providing accurate portfolio performance measures is not a priority… The money-weighted return is impacted by the timing of external cash flows (line #2) Here, there is a ~-1% difference between the portfolio money-weighted and time-weighted returns, mainly due to the ill-timed18 contribution made on 28/03/2024. 
Such a difference is relatively insignificant over one year, but depending on the pattern of external cash flows, it could be much worse. For example, if a single contribution of 1200€ on 28/03/2024 was made instead of monthly contributions of 100€, the associated portfolio money-weighted return would decrease to 22.59%, representing this time a ~-4% difference! The time-weighted return is not impacted by external cash flows19 and is akin to a “lump sum investing”-equivalent return (line #3 is equal to lines #4-#6) Investment returns v.s. average investor returns in an MSCI World ETF Financial studies regularly show that investors tend to lag actual fund returns across a variety of asset classes, leading to what is usually called the investor gap2021. For example: The Dalbar’s Quantitative Analysis of Investor Behavior (QAIB) annual study measure[s] the effects of investor decisions to buy, sell and switch into and out of funds over short and long-term timeframes22 and the results consistently show that the average investor earns less – in many cases, much less – than mutual fund performance reports would suggest22. For the U.S., Figure 4, adapted from Dalbar22, depicts the returns of U.S. funds v.s. the returns of the average investor in U.S. funds over the period 1994-2023. Figure 4. U.S. funds returns v.s. average U.S. funds investor returns (30-year returns, 1994-2023). Source: Dalbar. The observed performance differential of ~2% per year creates dramatic cumulative effects over such a long period22! The Morningstar’s Mind the Gap annual study compares funds’ dollar-weighted returns [- that is, funds’ returns as experienced by their investors -] with their time-weighted returns to see how large the gap, or difference, has been over time23 and analyses where investors succeeded in capturing most of their funds’ returns23. Figure 5, directly taken from Morningstar23, illustrates that gap by U.S. funds category over the 10 years ended 31st December 2023. Figure 5. 
Investor return gaps by U.S. category group (10-year returns, 2013-2023). Source: Morningstar. In this sub-section, I propose to apply the same methodology as these studies to the Amundi MSCI World UCITS ETF - EUR (C) over the period 30/04/2018 - 31/12/2024. In more detail: I will use the month-end asset data [(AUM)] compared with the underlying total return to estimate a net inflow or outflow for [each] month23, using Morningstar’s approach to estimating funds’ monthly net flows24: The [net] cash flow estimate for a month $C_t$ is the difference in the beginning $NAV_{t-1}$ and ending $NAV_t$ total net assets that cannot be explained by the monthly total return $r_t$, that is, $ C_t = NAV_t - NAV_{t-1} \left( 1 + r_t \right) $. Once all the monthly cash flows are available for the period23, I will calculate the associated money-weighted return - which corresponds to the average investor return in the ETF - and I will then compare it to the ETF time-weighted return25 - which corresponds to the ETF return itself. Notes: Detailed calculations are available in the Google Sheet associated to this blog post. Results are the following: Portfolio return measure Annualized return (%) Money-weighted return (AUM) 14.15% Time-weighted return (AUM) 12.74% The table above confirms the presence of an investor gap for the considered MSCI World ETF, but for once, to the benefit (of around ~1.5% yearly) of the average investor! This is at odds with Morningstar’s findings23 that the gap between index ETFs’ dollar-weighted returns and their total return […] was quite a bit wider than the gap for index open-end funds23, but of course, the analysis of this sub-section has no statistical significance compared to that of Morningstar. Or, maybe investors in the Amundi MSCI World UCITS ETF have specific timing abilities, who knows! Conclusion As mentioned in the introduction, the calculation of portfolio return is the first step in the performance measurement process1. 
Once properly done, the next step is to know if the return is good or bad. In other words, [it is needed] to evaluate performance (risk and return), against an appropriate benchmark1, which can for example be random portfolios. For more discussions on portfolio performance, feel free to connect with me on LinkedIn or to follow me on Twitter. – See Carl Bacon, Practical Portfolio Performance Measurement and Attribution, Third edition. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 &amp;#8617;6 &amp;#8617;7 &amp;#8617;8 &amp;#8617;9 &amp;#8617;10 &amp;#8617;11 &amp;#8617;12 &amp;#8617;13 &amp;#8617;14 &amp;#8617;15 &amp;#8617;16 &amp;#8617;17 &amp;#8617;18 &amp;#8617;19 &amp;#8617;20 &amp;#8617;21 &amp;#8617;22 &amp;#8617;23 See Kenneth B. Gray, Jr. and Robert B. K. Dewar, Axiomatic Characterization of the Time-Weighted Rate of Return, Management Science, Vol. 18, No. 2, Application Series (Oct., 1971), pp. B32-B35. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 See W. Marty, Portfolio Analytics. An Introduction to Return and Risk Measurement, Springer Texts in Business and Economics (2nd edition), Springer Berlin, 2015. &amp;#8617; Internal cash flows out of the portfolio are sometimes referred to as income16 from the portfolio constituents. &amp;#8617; In contrast with the price return of the portfolio, which would only take into account the price evolution of the portfolio constituents. &amp;#8617; See Bruce J. Feibel, Investment Performance Measurement. 
&amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 &amp;#8617;6 &amp;#8617;7 &amp;#8617;8 &amp;#8617;9 &amp;#8617;10 &amp;#8617;11 &amp;#8617;12 &amp;#8617;13 &amp;#8617;14 &amp;#8617;15 &amp;#8617;16 &amp;#8617;17 As noted in Bacon1, the formal definition of the IRR is the discount rate that makes the net present value equal to zero in a discounted cash flow analysis1; that equation is precisely the equation of the net present value in the specific case of a portfolio of assets. &amp;#8617; For example, the modified Dietz method is used in Interactive Brokers PortfolioAnalyst product to compute the money-weighted return of the investor’s portfolio. &amp;#8617; C.f. for example the documentation of Excel’s XIRR function. &amp;#8617; Although Bacon1 notes that you need to work very hard, with quite extreme data, to generate suitable examples1 because multiple solutions are extremely unlikely to be experienced in practice1. &amp;#8617; See Hager DP. Measurement of Pension Fund Investment Performance. Journal of the Staple Inn Actuarial Students’ Society. 1980;24:33-64. &amp;#8617; For Bacon1, though, the money-weighted return is unique and not really comparable with other portfolios enjoying a different pattern of cash flow1; on this, I respectfully disagree: as an individual investor, I would certainly be pleased to have a higher money-weighted return than that of, say, my father- or brother-in-law! &amp;#8617; For example, when testing for an investor’s skill in timing external cash flows, or when comparing dollar-cost averaging (DCA) investment strategies (weekly, monthly, quarterly…). &amp;#8617; That transformation is in effect a normalised market value1 transformation of the portfolio. &amp;#8617; Or conversely, with the time-weighted return of a portfolio being sometimes negative while the overall portfolio is at a gain! &amp;#8617; For example, by switching to a dollar cost averaging strategy. 
&amp;#8617; The last contribution of 100€ made on 31/12/2024 is excluded from this count, since it is invested at the end of the considered period. &amp;#8617; From 28/03/2024 to 30/04/2024, the MSCI World ETF under consideration returned -2.74%. &amp;#8617; The time-weighted return is also equal to the NAV return. &amp;#8617; Note that the investor gap as discussed here is not related to fund fees; for example, Morningstar23 didn’t find a strong link between fees and investor return gaps in the study23. &amp;#8617; Also known as the behaviour gap. &amp;#8617; See Dalbar’s 2024 QAIB Report. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 See Morningstar’s Mind the Gap 2024 Report. &amp;#8617; &amp;#8617;2 &amp;#8617;3 &amp;#8617;4 &amp;#8617;5 &amp;#8617;6 &amp;#8617;7 &amp;#8617;8 &amp;#8617;9 See Morningstar’s Estimated Net Cash Flow Methodology. &amp;#8617; As with most funds, ETF returns correspond to time-weighted returns. &amp;#8617;
simple and the exponentially weighted moving average covariance matrix forecasting models, 
which are straightforward extensions of &lt;a href=&quot;/blog/volatility-forecasting-simple-and-exponentially-weighted-moving-average-models/&quot;&gt;their respective univariate volatility forecasting models&lt;/a&gt; to a multivariate setting.&lt;/p&gt;

&lt;p&gt;With these reference models established, &lt;em&gt;we can now delve into more sophisticated approaches for forecasting covariance matrices&lt;/em&gt;&lt;sup id=&quot;fnref:15&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:15&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;In this blog post, I will describe the &lt;em&gt;iterated exponentially weighted moving average (IEWMA) model&lt;/em&gt; that has recently&lt;sup id=&quot;fnref:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; been introduced in Johansson et al.&lt;sup id=&quot;fnref:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; 
and I will illustrate its empirical performance in the context of monthly covariance matrix forecasting for a multi-asset class ETF portfolio.&lt;/p&gt;

&lt;h2 id=&quot;mathematical-preliminaries&quot;&gt;Mathematical preliminaries&lt;/h2&gt;

&lt;h3 id=&quot;covariance-matrix-modelling-and-covariance-proxies-reminders&quot;&gt;Covariance matrix modelling and covariance proxies (reminders)&lt;/h3&gt;

&lt;p&gt;This sub-section contains reminders from a &lt;a href=&quot;/blog/from-volatility-forecasting-to-covariance-matrix-forecasting-the-return-of-simple-and-exponentially-weighted-moving-average-models/&quot;&gt;previous blog post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let $n$ be the number of assets in a universe of assets and $r_t \in \mathbb{R}^n$ be the vector of the (unknown) (&lt;a href=&quot;https://en.wikipedia.org/wiki/Rate_of_return#Logarithmic_or_continuously_compounded_return&quot;&gt;logarithmic&lt;/a&gt;) 
return process of these assets over a time period $t$ (a day, a week, a month...), over which the (conditional) mean return vector $\mu_t \in \mathbb{R}^n$ of these assets is assumed to be zero.&lt;/p&gt;

&lt;p&gt;Then:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;$r_t$ can be expressed as&lt;sup id=&quot;fnref:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:10&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; $r_t = \epsilon_t$, with $\epsilon_t \in \mathbb{R}^n$ an unpredictable error term, often referred to as a vector of “shocks” or as a vector of “random disturbances”&lt;sup id=&quot;fnref:10:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:10&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;, over the time period $t$&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The asset (conditional) covariance matrix $\Sigma_t \in \mathcal{M}(\mathbb{R}^{n \times n})$ over that time period $t$ is defined as $\Sigma_t = \mathbb{E} \left[ r_t r_t {}^t \right]$&lt;/p&gt;

    &lt;p&gt;From this definition, the &lt;a href=&quot;https://en.wikipedia.org/wiki/Outer_product&quot;&gt;outer product&lt;/a&gt; of the realized asset returns $ \tilde{r}_t \tilde{r}_t {}^t $ over a time period $t$ (a day, a week, a month...) is a covariance estimate $\tilde{\Sigma}_t$ - or covariance proxy&lt;sup id=&quot;fnref:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:9&quot; class=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt; - for the (unknown) asset returns covariance matrix over the considered time period $t$.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The asset (conditional) correlation matrix $C_t \in \mathcal{M}(\mathbb{R}^{n \times n})$ is defined as $ C_t = V_t^{-1} \Sigma_t V_t^{-1} $, where $V_t \in \mathcal{M}(\mathbb{R}^{n \times n})$ is the diagonal matrix of the asset (conditional) standard deviations.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;correlation-matrix-modelling&quot;&gt;Correlation matrix modelling&lt;/h3&gt;

&lt;p&gt;In order to &lt;em&gt;clarify the relation between conditional correlations and conditional variances&lt;/em&gt;&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;, Engle&lt;sup id=&quot;fnref:2:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt; proposes to &lt;em&gt;write the [asset] returns as the conditional standard deviation 
times the standardized disturbance&lt;/em&gt;&lt;sup id=&quot;fnref:2:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;, that is&lt;/p&gt;

\[r_{i,t} = \epsilon_{i,t} = \sigma_{i,t} \varepsilon_{i,t}, i=1..n\]

&lt;p&gt;with:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;$\sigma_{i,t} = \sqrt{ \mathbb{E} \left[ r_{i,t}^2 \right] } $&lt;/li&gt;
  &lt;li&gt;$\varepsilon_{i,t}$ a &lt;em&gt;standardized disturbance that has mean zero and variance one&lt;/em&gt;&lt;sup id=&quot;fnref:2:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This way, &lt;em&gt;the conditional correlation [between asset $i$ and asset $j$ becomes equal to] the conditional covariance between the standardized disturbances [$\varepsilon_{i,t}$ and $\varepsilon_{j,t}$]&lt;/em&gt;&lt;sup id=&quot;fnref:2:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

\[\rho_{ij,t} = \frac{\mathbb{E} \left[ r_{i,t} r_{j,t} \right]}{\sqrt{ \mathbb{E} \left[ r_{i,t}^2 \right] \mathbb{E} \left[ r_{j,t}^2 \right] }} = \mathbb{E} \left[ \varepsilon_{i,t} \varepsilon_{j,t} \right]\]
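
&lt;p&gt;This identity can be checked numerically. The following standalone NumPy sketch (all names and parameter values are illustrative, not taken from any of the referenced papers) simulates returns $r_{i,t} = \sigma_{i,t} \varepsilon_{i,t}$ with known time-varying volatilities and correlated standardized disturbances: correlating the volatility-standardized returns recovers the disturbances’ correlation, while correlating the raw returns does not.&lt;/p&gt;

```python
import numpy as np

# Monte Carlo check: the conditional correlation between asset returns equals
# the covariance between the standardized disturbances. Volatilities here vary
# over time, so the raw-return sample correlation is biased, while the
# correlation of volatility-standardized returns recovers the true rho.
rng = np.random.default_rng(42)
T, rho = 200_000, 0.6
L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
eps = rng.standard_normal((T, 2)) @ L.T      # standardized disturbances, correlation rho
sigma = np.exp(rng.standard_normal((T, 2)))  # arbitrary time-varying volatilities
r = sigma * eps                              # returns r_{i,t} = sigma_{i,t} * eps_{i,t}

cov_eps = np.mean(eps[:, 0] * eps[:, 1])     # E[eps_1 eps_2], close to rho
corr_std = np.corrcoef((r / sigma).T)[0, 1]  # correlation of standardized returns
corr_raw = np.corrcoef(r.T)[0, 1]            # biased toward zero
```

&lt;p&gt;Under these assumptions, the raw-return correlation understates the true correlation, which is precisely why standardizing returns by volatility before estimating correlations is attractive.&lt;/p&gt;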

&lt;h2 id=&quot;the-iterated-exponentially-weighted-moving-average-covariance-matrix-forecasting-model&quot;&gt;The iterated exponentially weighted moving average covariance matrix forecasting model&lt;/h2&gt;

&lt;p&gt;The IEWMA covariance matrix forecasting model&lt;sup id=&quot;fnref:3:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; is a two-step model that:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Uses &lt;a href=&quot;/blog/volatility-forecasting-simple-and-exponentially-weighted-moving-average-models/&quot;&gt;an EWMA volatility forecasting model&lt;/a&gt; with squared asset returns as variance proxies in order to forecast asset volatilities&lt;/li&gt;
  &lt;li&gt;Uses &lt;a href=&quot;/blog/from-volatility-forecasting-to-covariance-matrix-forecasting-the-return-of-simple-and-exponentially-weighted-moving-average-models/&quot;&gt;an EWMA correlation matrix forecasting model&lt;/a&gt; with outer products of (EWMA-)volatility-standardized asset returns as covariance matrix proxies in order to forecast asset correlations&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Johansson et al.&lt;sup id=&quot;fnref:3:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; highlights that the IEWMA model was originally&lt;sup id=&quot;fnref:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot;&gt;7&lt;/a&gt;&lt;/sup&gt; proposed in Engle&lt;sup id=&quot;fnref:2:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt; &lt;em&gt;as an efficient alternative to the DCC-GARCH&lt;sup id=&quot;fnref:2:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt; predictor, although he did not refer to it as IEWMA&lt;/em&gt;&lt;sup id=&quot;fnref:2:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;In other words, the IEWMA covariance matrix forecasting model bridges the gap between the simple EWMA model and the more complex DCC-GARCH model.&lt;/p&gt;

&lt;h3 id=&quot;forecasting-formulas&quot;&gt;Forecasting formulas&lt;/h3&gt;

&lt;p&gt;Let:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;$n$ be the number of assets in a universe of assets&lt;/li&gt;
  &lt;li&gt;$r_t r_t {}^t$, $t=1..T$, be the outer products of the realized asset returns over each of $T$ past periods&lt;/li&gt;
  &lt;li&gt;$\lambda_{vol} \in [0, 1]$ be a decay factor for volatilities&lt;/li&gt;
  &lt;li&gt;$\lambda_{cor} \in [0, 1]$ be a decay factor for correlations&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;next-periods-asset-returns-covariancecorrelation-matrix&quot;&gt;Next period’s asset returns covariance/correlation matrix&lt;/h4&gt;

&lt;p&gt;The IEWMA covariance matrix forecasting model estimates the next period’s asset returns covariance matrix $\hat{\Sigma}_{T+1}$ and correlation matrix $\hat{C}_{T+1}$ as follows&lt;sup id=&quot;fnref:3:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;For each asset $i=1..n$ in the universe of assets
    &lt;ul&gt;
      &lt;li&gt;Forecast the asset one-period-ahead variances $\hat{\sigma}^2_{i,2}$, …, $\hat{\sigma}^2_{i,T+1}$ using an EWMA volatility forecasting model with decay factor $\lambda_{vol}$ and squared asset returns $r_{i,t}^2$, $t=1..T$, as variance proxies&lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;Compute the volatility-standardized asset returns $\tilde{r}_{i,2},…,\tilde{r}_{i,T}$ defined as the asset returns standardized by their one-period-ahead EWMA volatility forecasts&lt;/p&gt;

\[\tilde{r}_{i,t} = \frac{r_{i,t}}{\hat{\sigma}_{i,t}}, t = 2..T\]
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Compute the diagonal matrix of the assets next period’s forecasted volatilities $V_{T+1}$&lt;/p&gt;

\[V_{T+1} = \begin{pmatrix}
                \hat{\sigma}_{1,T+1} &amp;amp; 0 &amp;amp; ... &amp;amp; 0 \\
                0 &amp;amp; \hat{\sigma}_{2,T+1} &amp;amp; ... &amp;amp; 0 \\
                ... &amp;amp; ... &amp;amp; ... &amp;amp; ... \\
                0 &amp;amp; 0 &amp;amp; ... &amp;amp; \hat{\sigma}_{n,T+1}
             \end{pmatrix}\]
  &lt;/li&gt;
  &lt;li&gt;Forecast the next period’s asset returns correlation matrix $\hat{C}_{T+1}$ using an EWMA covariance matrix forecasting model with decay factor $\lambda_{cor}$ and outer products of volatility-standardized asset returns $\tilde{r_t} \tilde{r_t} {}^t$, $t=2..T$, as covariance matrix proxies&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Compute the next period’s forecasted asset returns covariance matrix $\hat{\Sigma}_{T+1}$&lt;/p&gt;

\[\hat{\Sigma}_{T+1} = V_{T+1} \hat{C}_{T+1} V_{T+1}\]
  &lt;/li&gt;
&lt;/ul&gt;
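
&lt;p&gt;The steps above can be sketched in a few lines of self-contained NumPy code - a minimal illustration under simplifying assumptions; in particular, the EWMA recursions are initialized with the first observation, which is an arbitrary choice of mine, not necessarily the one made in Johansson et al. or in &lt;strong&gt;Portfolio Optimizer&lt;/strong&gt;:&lt;/p&gt;

```python
import numpy as np

def ewma_variance_forecasts(r, lam):
    """One-period-ahead EWMA variance forecasts from squared returns r_t^2,
    t=1..T; entry t-1 of the output holds the forecast for period t+1.
    The recursion is crudely initialized with the first squared return."""
    T = len(r)
    var = np.empty(T)
    var[0] = r[0] ** 2
    for t in range(1, T):
        var[t] = lam * var[t - 1] + (1 - lam) * r[t] ** 2
    return var

def iewma_forecast(R, lam_vol, lam_cor):
    """IEWMA forecast of the next period's covariance and correlation
    matrices from a (T, n) matrix R of asset returns."""
    T, n = R.shape
    # Step 1: per-asset EWMA volatility forecasts (shared decay factor)
    var = np.column_stack([ewma_variance_forecasts(R[:, i], lam_vol) for i in range(n)])
    sigma = np.sqrt(var)
    # Volatility-standardized returns: r_{i,t} divided by the volatility
    # forecast made for period t (sigma[t-1] is the forecast for period t+1)
    R_std = R[1:] / sigma[:-1]
    # Step 2: EWMA on outer products of standardized returns
    C = np.outer(R_std[0], R_std[0])
    for t in range(1, R_std.shape[0]):
        C = lam_cor * C + (1 - lam_cor) * np.outer(R_std[t], R_std[t])
    # Normalize to a proper correlation matrix (unit diagonal)
    d = 1.0 / np.sqrt(np.diag(C))
    C_hat = d[:, None] * C * d[None, :]
    # Recombine: Sigma_hat = V C_hat V
    V = np.diag(sigma[-1])
    return V @ C_hat @ V, C_hat
```

&lt;p&gt;For example, calling &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iewma_forecast(R, 0.94, 0.97)&lt;/code&gt; on a $T \times n$ matrix of returns returns the pair $\left( \hat{\Sigma}_{T+1}, \hat{C}_{T+1} \right)$.&lt;/p&gt;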

&lt;h4 id=&quot;next-h-periods-ahead-asset-returns-covariancecorrelation-matrix&quot;&gt;Next $h$ periods ahead asset returns covariance/correlation matrix&lt;/h4&gt;

&lt;p&gt;The IEWMA covariance matrix forecasting model estimates the asset returns covariance matrix $\hat{\Sigma}_{T+h}$ and correlation matrix $\hat{C}_{T+h}$ $h$ periods ahead, $h \geq 2$, 
by the next period’s asset returns covariance matrix $\hat{\Sigma}_{T+1}$ and correlation matrix $\hat{C}_{T+1}$.&lt;/p&gt;

&lt;p&gt;Indeed, due to the properties of the EWMA volatility and covariance matrix forecasting models&lt;sup id=&quot;fnref:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot;&gt;8&lt;/a&gt;&lt;/sup&gt;, we have:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;$V_{T+h} = V_{T+1}$, $h \geq 2$&lt;/li&gt;
  &lt;li&gt;$\hat{C}_{T+h} = \hat{C}_{T+1}$, $h \geq 2$&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So that $\hat{\Sigma}_{T+h} = \hat{\Sigma}_{T+1}$, $h \geq 2$.&lt;/p&gt;

&lt;h4 id=&quot;averaged-asset-returns-covariancecorrelation-matrix-over-the-next-h-periods&quot;&gt;Averaged asset returns covariance/correlation matrix over the next $h$ periods&lt;/h4&gt;

&lt;p&gt;The IEWMA covariance matrix forecasting model estimates the averaged&lt;sup id=&quot;fnref:11&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:11&quot; class=&quot;footnote&quot;&gt;9&lt;/a&gt;&lt;/sup&gt; asset returns covariance matrix $\hat{\Sigma}_{T+1:T+h}$ and correlation matrix $\hat{C}_{T+1:T+h}$ over the next $h$ periods, 
$h \geq 2$, by the next period’s asset returns covariance matrix $\hat{\Sigma}_{T+1}$ and correlation matrix $\hat{C}_{T+1}$.&lt;/p&gt;

&lt;p&gt;Indeed, from the previous sub-section, we have:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;$ \hat{\Sigma}_{T+1:T+h} = \frac{1}{h} \sum_{i=1}^{h} \hat{\Sigma}_{T+i} = \hat{\Sigma}_{T+1} $, $h \geq 2$&lt;/li&gt;
  &lt;li&gt;$ \hat{C}_{T+1:T+h} = \frac{1}{h} \sum_{i=1}^{h} \hat{C}_{T+i} = \hat{C}_{T+1} $, $h \geq 2$&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;rationale&quot;&gt;Rationale&lt;/h3&gt;

&lt;p&gt;The rationale behind the IEWMA covariance matrix forecasting model is twofold:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Separate volatility forecasting from correlation matrix forecasting while using the same baseline forecasting model for internal consistency.&lt;/p&gt;

    &lt;p&gt;This first idea is well known among practitioners, c.f. for example Menchero and Li&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;10&lt;/a&gt;&lt;/sup&gt; which describes the usage of different EWMA half-lives for estimating volatilities and correlations&lt;sup id=&quot;fnref:13&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:13&quot; class=&quot;footnote&quot;&gt;11&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Use volatility-standardized asset returns instead of raw asset returns for correlation matrix forecasting.&lt;/p&gt;

    &lt;p&gt;This second idea, detailed for example in Engle&lt;sup id=&quot;fnref:2:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;, originates from the fact that conditional correlations between asset returns are equal to conditional covariances between standardized disturbances, c.f. the previous section.&lt;/p&gt;

    &lt;p&gt;So, given an estimator for the (unobservable) standardized disturbances $\varepsilon_{i,t} = \frac{r_{i,t}}{\sigma_{i,t}}$, $i=1..n$, $t=1..T$, it is possible to estimate the conditional correlations between asset returns.&lt;/p&gt;

    &lt;p&gt;An example of such an estimator is the volatility-standardized asset returns $\tilde{r}_{i,t} = \frac{r_{i,t}}{\hat{\sigma}_{i,t}}$, with the drawback that the quality of the correlation forecasts is then influenced by the quality of the volatility forecasts.&lt;/p&gt;

    &lt;p&gt;And unfortunately, because &lt;em&gt;it is well known that […] asset returns […] fat tails are typically reduced but not eliminated when returns are standardized by volatilities estimated from popular [volatility forecasting] models&lt;/em&gt;&lt;sup id=&quot;fnref:12&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:12&quot; class=&quot;footnote&quot;&gt;12&lt;/a&gt;&lt;/sup&gt; and because &lt;em&gt;the use of correlation as a measure of dependence can be misleading in the case of (conditionally) non-Gaussian returns&lt;/em&gt;&lt;sup id=&quot;fnref:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:8&quot; class=&quot;footnote&quot;&gt;13&lt;/a&gt;&lt;/sup&gt;, we should not expect any magic here…&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;how-to-choose-the-decay-factors&quot;&gt;How to choose the decay factors?&lt;/h3&gt;

&lt;p&gt;Due to its relationship with the vanilla EWMA volatility and covariance matrix forecasting models, there are two main procedures to choose the decay factors 
$\lambda_{vol}$ and $\lambda_{cor}$ of an IEWMA covariance matrix forecasting model:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Using recommended values from the EWMA models literature (0.94, 0.97…).&lt;/p&gt;

    &lt;p&gt;On this, Johansson et al.&lt;sup id=&quot;fnref:3:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; notes that &lt;em&gt;empirical studies on real return data confirm that choosing a faster volatility half-life than correlation half-life yields better estimates&lt;/em&gt;&lt;sup id=&quot;fnref:2:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt; and uses the following pairs of decay factors $\left(\lambda_{vol}, \lambda_{cor}\right)$ in their experimental setup:&lt;/p&gt;
    &lt;ul&gt;
      &lt;li&gt;Short term - $\left(0.870,0.933\right)$, $\left(0.933,0.967\right)$&lt;/li&gt;
      &lt;li&gt;Medium term - $\left(0.967,0.989\right)$, $\left(0.989,0.994\right)$, $\left(0.994,0.997\right)$&lt;/li&gt;
      &lt;li&gt;Long term - $\left(0.997,0.998\right)$, $\left(0.998,0.999\right)$&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Determining the optimal values w.r.t. the forecast horizon $h$, for example through the minimization of the &lt;a href=&quot;https://en.wikipedia.org/wiki/Root-mean-square_deviation&quot;&gt;root mean square error (RMSE)&lt;/a&gt; between the forecasted covariance matrix over the desired horizon and the observed covariance matrix over that horizon&lt;sup id=&quot;fnref:14&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:14&quot; class=&quot;footnote&quot;&gt;14&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

    &lt;p&gt;In practice, because there are two decay factors to choose, this can be done two ways:&lt;/p&gt;
    &lt;ul&gt;
      &lt;li&gt;
        &lt;p&gt;Either consider the two decay factors as two independent univariate parameters $\lambda_{vol} \in [0,1]$ and $\lambda_{cor} \in [0,1]$.&lt;/p&gt;

        &lt;p&gt;This choice is justified by the original desire to separate volatility forecasting from correlation matrix forecasting.&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;Or consider the two decay factors as a single multivariate parameter $\left(\lambda_{vol}, \lambda_{cor}\right) \in [0,1]^2$.&lt;/p&gt;

        &lt;p&gt;This choice is justified by the observed dependency of the correlation forecasts on the volatility-standardized asset returns.&lt;/p&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
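
&lt;p&gt;As an illustration of the second procedure, here is a sketch of a brute-force grid search treating $\left(\lambda_{vol}, \lambda_{cor}\right)$ as a single bivariate parameter; the one-step IEWMA recursion, its initialization and the error measure (mean squared error against the next period’s outer-product proxy) are all simplifying assumptions of mine, not the proprietary procedure used by &lt;strong&gt;Portfolio Optimizer&lt;/strong&gt;:&lt;/p&gt;

```python
import numpy as np
from itertools import product

def iewma_forecast_error(R, lam_vol, lam_cor):
    """Root mean squared error between rolling one-step IEWMA covariance
    forecasts and the next period's outer-product covariance proxies
    (simplified sketch with crude initialization)."""
    T, n = R.shape
    var = R[0] ** 2                   # EWMA variance state
    C = np.outer(R[0], R[0])          # EWMA state for standardized outer products
    errors = []
    for t in range(1, T):
        sigma = np.sqrt(var)
        d = 1.0 / np.sqrt(np.diag(C))
        C_hat = d[:, None] * C * d[None, :]                  # correlation forecast
        Sigma_hat = sigma[:, None] * C_hat * sigma[None, :]  # covariance forecast
        errors.append(np.mean((Sigma_hat - np.outer(R[t], R[t])) ** 2))
        # update both states with period t's return
        var = lam_vol * var + (1 - lam_vol) * R[t] ** 2
        r_std = R[t] / sigma
        C = lam_cor * C + (1 - lam_cor) * np.outer(r_std, r_std)
    return np.sqrt(np.mean(errors))

# Exhaustive search over a small grid of decay factor pairs
rng = np.random.default_rng(0)
R = rng.standard_normal((500, 4)) * 0.01   # synthetic returns, for illustration only
grid = [0.90, 0.94, 0.97, 0.99]
best = min(product(grid, grid), key=lambda p: iewma_forecast_error(R, *p))
```

&lt;p&gt;Restricting the grid to pairs with $\lambda_{vol} \leq \lambda_{cor}$ would encode the empirical observation, quoted above, that a faster volatility half-life than correlation half-life tends to work better.&lt;/p&gt;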

&lt;h2 id=&quot;extensions-of-the-iterated-exponentially-weighted-moving-average-covariance-matrix-forecasting-model&quot;&gt;Extensions of the iterated exponentially weighted moving average covariance matrix forecasting model&lt;/h2&gt;

&lt;h3 id=&quot;asset-specific-volatility-decay-factors&quot;&gt;Asset-specific volatility decay factors&lt;/h3&gt;

&lt;p&gt;The IEWMA covariance matrix forecasting model uses univariate EWMA models in order to forecast asset volatilities, all these models sharing the same decay factor $\lambda_{vol}$.&lt;/p&gt;

&lt;p&gt;Having an identical decay factor for all assets is parsimonious, but is somewhat at odds with the DCC-GARCH model of Engle&lt;sup id=&quot;fnref:2:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt; which uses asset-specific univariate GARCH models - 
that is, each with its own asset-specific parameters - in order to forecast asset volatilities.&lt;/p&gt;

&lt;p&gt;So, one natural extension of the IEWMA model is to allow asset-specific univariate EWMA models - each with its own asset-specific decay factor $\lambda_{i,vol}$, $i=1..n$ - 
in order to forecast asset volatilities.&lt;/p&gt;

&lt;h3 id=&quot;linear-combination-of-iewma-covariance-matrix-forecasting-models&quot;&gt;Linear combination of IEWMA covariance matrix forecasting models&lt;/h3&gt;

&lt;p&gt;Another interesting covariance matrix forecasting model is introduced in Johansson et al.&lt;sup id=&quot;fnref:3:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; as the &lt;em&gt;combined multiple iterated exponentially weighted moving average 
(CM-IEWMA) model&lt;/em&gt;, which consists of a time-varying linear combination of individual IEWMA models, each with its own pair of fixed decay factors.&lt;/p&gt;

&lt;p&gt;As explained in Johansson et al.&lt;sup id=&quot;fnref:3:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;The CM-IEWMA predictor is constructed from a modest number of IEWMA predictors, with different pairs of half-lives, which are combined using dynamically varying weights that are based on recent performance.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The rationale behind the CM-IEWMA model is that &lt;em&gt;different pairs of half-lives may work better for different market conditions&lt;/em&gt;&lt;sup id=&quot;fnref:3:7&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;, with &lt;em&gt;short half-lives [typically performing] better 
in volatile markets [and] long half-lives [performing] better for calm markets where conditions are changing slowly&lt;/em&gt;&lt;sup id=&quot;fnref:3:8&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;This behaviour is illustrated in Figure 1, taken from Johansson et al.&lt;sup id=&quot;fnref:3:9&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;, which shows the evolution of the weights of a 5-IEWMA CM-IEWMA covariance matrix forecasting model 
applied to a universe of U.S. stocks.&lt;/p&gt;

&lt;figure&gt;
	&lt;a href=&quot;/assets/images/blog/covariance-matrix-forecasting-iewma-cm-iewma-weights-evolution.png&quot;&gt;&lt;img src=&quot;/assets/images/blog/covariance-matrix-forecasting-iewma-cm-iewma-weights-evolution-thumb.png&quot; alt=&quot;Evolution of the weights of a 5-IEWMA CM-IEWMA covariance matrix forecasting model applied to a universe of U.S. stocks, 4th January 2010 - 30th December 2022.&quot; /&gt;&lt;/a&gt;
	&lt;figcaption&gt;Figure 1. Evolution of the weights of a 5-IEWMA CM-IEWMA covariance matrix forecasting model applied to a universe of U.S. stocks, 4th January 2010 - 30th December 2022. Source: Johansson et al.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;From Figure 1, it is visible that although &lt;em&gt;substantial weight is put on the slower (longer halflife) IEWMAs most years&lt;/em&gt;&lt;sup id=&quot;fnref:3:10&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;, the CM-IEWMA model still 
&lt;em&gt;adapts the weights depending on market conditions&lt;/em&gt;&lt;sup id=&quot;fnref:3:11&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;The interested reader is referred to Johansson et al.&lt;sup id=&quot;fnref:3:12&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; for all the technicalities of the CM-IEWMA model, and in particular for the details about the computation of the dynamically varying 
weights of the individual IEWMA models through the resolution of a convex optimization problem&lt;sup id=&quot;fnref:16&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:16&quot; class=&quot;footnote&quot;&gt;15&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;A last important remark to conclude this sub-section - the CM-IEWMA model is actually &lt;em&gt;a special case of [a more general] dynamically weighted prediction [model]&lt;/em&gt;&lt;sup id=&quot;fnref:3:13&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;, so that the same 
weighting logic can be applied to any combination of covariance matrix forecasting models.&lt;/p&gt;
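
&lt;p&gt;To give a flavour of such a dynamically weighted combination, below is a deliberately crude NumPy stand-in that weights each model by the inverse of its exponentially weighted past mean squared error against the realized covariance proxies - this is not the convex-optimization weighting of Johansson et al., only an illustration of the general idea:&lt;/p&gt;

```python
import numpy as np

def combined_forecast(model_forecasts, proxies, lam=0.9):
    """Crude dynamic combination of covariance matrix forecasts.

    model_forecasts: array of shape (m, T, n, n) with the one-step forecasts
                     of m individual models;
    proxies:         array of shape (T, n, n) with the realized covariance
                     proxies, e.g. outer products of realized returns.

    Returns an array of shape (T, n, n) of combined forecasts, where each
    model's weight is proportional to the inverse of its exponentially
    weighted past mean squared error (a stand-in for the convex-optimization
    weighting of the CM-IEWMA model)."""
    m, T = model_forecasts.shape[0], model_forecasts.shape[1]
    ewmse = np.ones(m)                   # running per-model error state
    out = np.empty_like(proxies)
    for t in range(T):
        w = 1.0 / ewmse
        w = w / w.sum()                  # weights sum to one
        out[t] = np.tensordot(w, model_forecasts[:, t], axes=1)
        # update the error state with period t's realized proxy
        err = np.mean((model_forecasts[:, t] - proxies[t]) ** 2, axis=(1, 2))
        ewmse = lam * ewmse + (1 - lam) * err
    return out
```

&lt;p&gt;With such a scheme, a model that has recently tracked the proxies well quickly receives most of the weight, mimicking the adaptive behaviour visible in Figure 1.&lt;/p&gt;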

&lt;h2 id=&quot;implementations&quot;&gt;Implementations&lt;/h2&gt;

&lt;h3 id=&quot;implementation-in-portfolio-optimizer&quot;&gt;Implementation in Portfolio Optimizer&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Portfolio Optimizer&lt;/strong&gt; implements the IEWMA covariance and correlation matrix forecasting model through the endpoints &lt;a href=&quot;https://docs.portfoliooptimizer.io/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/assets/covariance/matrix/forecast/ewma/iterated&lt;/code&gt;&lt;/a&gt; and &lt;a href=&quot;https://docs.portfoliooptimizer.io/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/assets/correlation/matrix/forecast/ewma/iterated&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;These endpoints support the two covariance proxies below:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Squared (close-to-close) returns&lt;/li&gt;
  &lt;li&gt;Demeaned squared (close-to-close) returns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These endpoints also allow:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;To use asset-specific univariate EWMA models in order to forecast asset volatilities&lt;/li&gt;
  &lt;li&gt;To automatically determine the optimal value of their parameters (the decay factors $\lambda_{vol}$ and $\lambda_{cor}$) using a proprietary procedure.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that &lt;strong&gt;Portfolio Optimizer&lt;/strong&gt; does not provide any implementation of the CM-IEWMA model&lt;sup id=&quot;fnref:20&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:20&quot; class=&quot;footnote&quot;&gt;16&lt;/a&gt;&lt;/sup&gt;, but c.f. the next sub-section.&lt;/p&gt;

&lt;h3 id=&quot;implementation-elsewhere&quot;&gt;Implementation elsewhere&lt;/h3&gt;

&lt;p&gt;Johansson et al.&lt;sup id=&quot;fnref:3:14&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; kindly provides an open source Python implementation of:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The IEWMA covariance matrix forecasting model&lt;/li&gt;
  &lt;li&gt;The CM-IEWMA covariance matrix forecasting model&lt;/li&gt;
  &lt;li&gt;The general “covariance matrix forecasting models” combination model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;at &lt;a href=&quot;https://github.com/cvxgrp/cov_pred_finance&quot;&gt;https://github.com/cvxgrp/cov_pred_finance&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I definitely encourage anyone interested in the CM-IEWMA model to play with this code!&lt;/p&gt;

&lt;h2 id=&quot;example-of-usage---covariance-matrix-forecasting-at-monthly-level-for-a-portfolio-of-various-etfs&quot;&gt;Example of usage - Covariance matrix forecasting at monthly level for a portfolio of various ETFs&lt;/h2&gt;

&lt;p&gt;As an example of usage, I propose to evaluate the empirical performance of the IEWMA covariance matrix forecasting model within the framework of &lt;a href=&quot;/blog/from-volatility-forecasting-to-covariance-matrix-forecasting-the-return-of-simple-and-exponentially-weighted-moving-average-models/&quot;&gt;the previous blog post&lt;/a&gt;, whose aim is 
to forecast monthly covariance and correlation matrices for a portfolio of 10 ETFs representative&lt;sup id=&quot;fnref:17&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:17&quot; class=&quot;footnote&quot;&gt;17&lt;/a&gt;&lt;/sup&gt; of various asset classes:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;U.S. stocks (SPY ETF)&lt;/li&gt;
  &lt;li&gt;European stocks (EZU ETF)&lt;/li&gt;
  &lt;li&gt;Japanese stocks (EWJ ETF)&lt;/li&gt;
  &lt;li&gt;Emerging markets stocks (EEM ETF)&lt;/li&gt;
  &lt;li&gt;U.S. REITs (VNQ ETF)&lt;/li&gt;
  &lt;li&gt;International REITs (RWX ETF)&lt;/li&gt;
  &lt;li&gt;U.S. 7-10 year Treasuries (IEF ETF)&lt;/li&gt;
  &lt;li&gt;U.S. 20+ year Treasuries (TLT ETF)&lt;/li&gt;
  &lt;li&gt;Commodities (DBC ETF)&lt;/li&gt;
  &lt;li&gt;Gold (GLD ETF)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;results---covariance-matrix-forecasting&quot;&gt;Results - Covariance matrix forecasting&lt;/h3&gt;

&lt;p&gt;Results over the period 31st January 2008 - 31st July 2023&lt;sup id=&quot;fnref:22&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:22&quot; class=&quot;footnote&quot;&gt;18&lt;/a&gt;&lt;/sup&gt; for covariance matrices are the following&lt;sup id=&quot;fnref:23&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:23&quot; class=&quot;footnote&quot;&gt;19&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Covariance matrix model&lt;/th&gt;
      &lt;th&gt;Covariance matrix MSE&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;SMA, window size of all the previous months (historical average model)&lt;/td&gt;
      &lt;td&gt;9.59 $10^{-6}$&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;SMA, window size of the previous year&lt;/td&gt;
      &lt;td&gt;9.08 $10^{-6}$&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;EWMA, optimal&lt;sup id=&quot;fnref:24&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:24&quot; class=&quot;footnote&quot;&gt;20&lt;/a&gt;&lt;/sup&gt; $\lambda$&lt;/td&gt;
      &lt;td&gt;6.52 $10^{-6}$&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;EWMA, $\lambda = 0.97$&lt;/td&gt;
      &lt;td&gt;6.37 $10^{-6}$&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;IEWMA, $\left(\lambda_{vol},\lambda_{cor}\right) = \left(0.97,0.99\right)$&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;6.35 $10^{-6}$&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;IEWMA, optimal&lt;sup id=&quot;fnref:24:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:24&quot; class=&quot;footnote&quot;&gt;20&lt;/a&gt;&lt;/sup&gt; $\lambda_{i,vol}$, $i=1..n$ and $\lambda_{cor}$&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;6.33 $10^{-6}$&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;IEWMA, optimal&lt;sup id=&quot;fnref:24:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:24&quot; class=&quot;footnote&quot;&gt;20&lt;/a&gt;&lt;/sup&gt; $\left(\lambda_{vol},\lambda_{cor}\right)$&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;6.16 $10^{-6}$&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;SMA, window size of the previous month (random walk model)&lt;/td&gt;
      &lt;td&gt;6.06 $10^{-6}$&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;EWMA, $\lambda = 0.94$&lt;/td&gt;
      &lt;td&gt;5.78 $10^{-6}$&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;IEWMA, $\left(\lambda_{vol},\lambda_{cor}\right) = \left(0.94,0.97\right)$&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;5.78 $10^{-6}$&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Within this specific evaluation framework, the IEWMA covariance matrix forecasting model unfortunately does not seem to improve upon the previous models despite the added complexity.&lt;/p&gt;

&lt;p&gt;It is also noteworthy that the IEWMA model with asset-specific univariate EWMA models for volatility (line #6) does not exhibit better performance than the vanilla 
IEWMA model (line #7) when using automatically determined parameters&lt;sup id=&quot;fnref:19&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:19&quot; class=&quot;footnote&quot;&gt;21&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h3 id=&quot;results---correlation-matrix-forecasting&quot;&gt;Results - Correlation matrix forecasting&lt;/h3&gt;

&lt;p&gt;Results over the period 31st January 2008 - 31st July 2023&lt;sup id=&quot;fnref:22:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:22&quot; class=&quot;footnote&quot;&gt;18&lt;/a&gt;&lt;/sup&gt; for the correlation matrices associated to the covariance matrices of the previous sub-section are the following&lt;sup id=&quot;fnref:23:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:23&quot; class=&quot;footnote&quot;&gt;19&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Covariance matrix model&lt;/th&gt;
      &lt;th&gt;Correlation matrix MSE&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;SMA, window size of the previous month (random walk model)&lt;/td&gt;
      &lt;td&gt;8.19&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;SMA, window size of all the previous months (historical average model)&lt;/td&gt;
      &lt;td&gt;8.10&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;EWMA, $\lambda = 0.94$&lt;/td&gt;
      &lt;td&gt;7.67&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;SMA, window size of the previous year&lt;/td&gt;
      &lt;td&gt;6.50&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;EWMA, $\lambda = 0.97$&lt;/td&gt;
      &lt;td&gt;6.36&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;EWMA, optimal&lt;sup id=&quot;fnref:24:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:24&quot; class=&quot;footnote&quot;&gt;20&lt;/a&gt;&lt;/sup&gt; $\lambda$&lt;/td&gt;
      &lt;td&gt;5.87&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;IEWMA, $\left(\lambda_{vol},\lambda_{cor}\right) = \left(0.94,0.97\right)$&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;5.85&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;IEWMA, $\left(\lambda_{vol},\lambda_{cor}\right) = \left(0.97,0.99\right)$&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;5.85&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;IEWMA, optimal&lt;sup id=&quot;fnref:24:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:24&quot; class=&quot;footnote&quot;&gt;20&lt;/a&gt;&lt;/sup&gt; $\lambda_{i,vol}$, $i=1..n$ and $\lambda_{cor}$&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;5.72&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;IEWMA, optimal&lt;sup id=&quot;fnref:24:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:24&quot; class=&quot;footnote&quot;&gt;20&lt;/a&gt;&lt;/sup&gt; $\left(\lambda_{vol},\lambda_{cor}\right)$&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;5.70&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;This time, the IEWMA model - and especially the two IEWMA variants using automatically determined parameters (lines #9-#10) - does exhibit slightly better performance than all the previous models.&lt;/p&gt;

&lt;p&gt;Nevertheless, the improvement over the previously best-performing model (line #6 - the EWMA model with an automatically determined parameter) remains modest…&lt;/p&gt;

&lt;p&gt;So, the idea of removing the impact of volatility on asset returns in order to better estimate asset correlations seems to have some merit, but the EWMA volatility forecasting model 
might be insufficient to fully exploit it.&lt;/p&gt;

&lt;p&gt;Here again, the performance of the IEWMA model with asset-specific univariate EWMA volatility models is strictly worse than that of the vanilla IEWMA model&lt;sup id=&quot;fnref:19:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:19&quot; class=&quot;footnote&quot;&gt;21&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
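&lt;p&gt;For readers wishing to reproduce this kind of evaluation, below is a minimal Python sketch of how a correlation matrix MSE could be computed from a series of forecasted covariance matrices and covariance proxies. The function names are illustrative, and the exact construction of the covariance proxies (c.f. the footnotes) is deliberately left out.&lt;/p&gt;

```python
import numpy as np

def cov_to_corr(sigma):
    # C = V^{-1} Sigma V^{-1}, with V the diagonal matrix
    # of the standard deviations extracted from Sigma
    v = np.sqrt(np.diag(sigma))
    return sigma / np.outer(v, v)

def correlation_mse(forecasts, proxies):
    # Mean squared element-wise error between the correlation matrices
    # associated with the forecasted and proxy covariance matrices,
    # averaged over all evaluation periods
    errors = [np.mean((cov_to_corr(f) - cov_to_corr(p)) ** 2)
              for f, p in zip(forecasts, proxies)]
    return float(np.mean(errors))
```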

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;One of the main characteristics of the IEWMA covariance matrix forecasting model of Johansson et al.&lt;sup id=&quot;fnref:3:15&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; is that it forecasts asset correlations using (EWMA-)volatility-standardized asset returns 
instead of raw asset returns, so that the impact of asset volatilities on their correlations is (tentatively) minimized.&lt;/p&gt;
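&lt;p&gt;To make this two-step structure concrete, here is a minimal Python sketch of an IEWMA forecast - EWMA variance forecasts per asset, an EWMA correlation matrix of the volatility-standardized returns, then recombination. The function name and initialization choices are my own illustrative assumptions; the reference implementation remains the open source one of Johansson et al.&lt;/p&gt;

```python
import numpy as np

def ewma_iterated_forecast(returns, lambda_vol=0.94, lambda_cor=0.97):
    T, n = returns.shape
    # Step 1: one-period-ahead EWMA variance forecasts per asset,
    # using squared returns as variance proxies (initialized, as an
    # assumption of this sketch, with the first squared returns)
    var = returns[0] ** 2
    std_returns = []
    for t in range(1, T):
        # standardize r_t by its one-period-ahead volatility forecast
        std_returns.append(returns[t] / np.sqrt(var))
        var = lambda_vol * var + (1 - lambda_vol) * returns[t] ** 2
    vol_next = np.sqrt(var)  # next period's volatility forecasts
    # Step 2: EWMA of the outer products of the standardized returns,
    # rescaled to a proper correlation matrix (unit diagonal)
    S = np.outer(std_returns[0], std_returns[0])
    for r_tilde in std_returns[1:]:
        S = lambda_cor * S + (1 - lambda_cor) * np.outer(r_tilde, r_tilde)
    d = np.sqrt(np.diag(S))
    C = S / np.outer(d, d)
    # Step 3: recombine as Sigma = V C V
    V = np.diag(vol_next)
    return V @ C @ V, C
```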

&lt;p&gt;Unfortunately, the empirical performance of that model in terms of correlation matrix forecasting is not that different from that of the EWMA model, which raises the question of whether 
improving the volatility standardization might lead to better correlation forecasts.&lt;/p&gt;

&lt;p&gt;This will be the subject of a future blog post in this series.&lt;/p&gt;

&lt;p&gt;As usual, feel free to &lt;a href=&quot;https://www.linkedin.com/in/roman-rubsamen/&quot;&gt;connect with me on LinkedIn&lt;/a&gt; or to &lt;a href=&quot;https://twitter.com/portfoliooptim&quot;&gt;follow me on Twitter&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;–&lt;/p&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:15&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;ChatGPT-generated, as can be seen by the signature word “delve” :-) ! &lt;a href=&quot;#fnref:15&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:4&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;At the date of the initial publication of this blog post. &lt;a href=&quot;#fnref:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.nowpublishers.com/article/Details/ECO-047&quot;&gt;Kasper Johansson, Mehmet G. Ogut, Markus Pelger, Thomas Schmelzer and Stephen Boyd (2023), A Simple Method for Predicting Covariance Matrices of Financial Returns, Foundations and Trends in Econometrics: Vol. 12: No. 4, pp 324-407&lt;/a&gt;. &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:3:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:3:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:3:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:3:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:3:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:3:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:3:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;8&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:3:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;9&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:3:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;10&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:3:10&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;11&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:3:11&quot; 
class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;12&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:3:12&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;13&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:3:13&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;14&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:3:14&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;15&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:3:15&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;16&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:10&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.pm-research.com/content/iijpormgmt/41/3/97&quot;&gt;Valeriy Zakamulin, A Test of Covariance-Matrix Forecasting Methods, The Journal of Portfolio Management  Spring 2015, 41 (3) 97-108&lt;/a&gt;. &lt;a href=&quot;#fnref:10&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:10:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:9&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://link.springer.com/chapter/10.1007/978-3-540-71297-8_36&quot;&gt;Patton, A.J., Sheppard, K. (2009). Evaluating Volatility and Correlation Forecasts. In: Mikosch, T., Kreiß, JP., Davis, R., Andersen, T. (eds) Handbook of Financial Time Series. Springer, Berlin, Heidelberg&lt;/a&gt;. &lt;a href=&quot;#fnref:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.tandfonline.com/doi/abs/10.1198/073500102288618487&quot;&gt;Engle, R. (2002). Dynamic Conditional Correlation. Journal of Business &amp;amp; Economic Statistics. 20(3): 339–350&lt;/a&gt;. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:2:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:7&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;8&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;9&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:9&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;10&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:2:10&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;11&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:5&quot; role=&quot;doc-endnote&quot;&gt;
&lt;p&gt;That being said, I find that what Engle&lt;sup id=&quot;fnref:2:11&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt; describes is closer to an iterated model combining $n$ univariate GARCH volatility forecasting models with an EWMA correlation matrix forecasting model than to the iterated EWMA forecasting model of Johansson et al.&lt;sup id=&quot;fnref:3:16&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;… &lt;a href=&quot;#fnref:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:6&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;C.f. the associated blog posts &lt;a href=&quot;/blog/volatility-forecasting-simple-and-exponentially-weighted-moving-average-models/&quot;&gt;here&lt;/a&gt; and &lt;a href=&quot;/blog/from-volatility-forecasting-to-covariance-matrix-forecasting-the-return-of-simple-and-exponentially-weighted-moving-average-models/&quot;&gt;there&lt;/a&gt;. &lt;a href=&quot;#fnref:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:11&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S0378426622000267&quot;&gt;Gianluca De Nard, Robert F. Engle, Olivier Ledoit, Michael Wolf, Large dynamic covariance matrices: Enhancements based on intraday data, Journal of Banking &amp;amp; Finance, Volume 138, 2022, 106426&lt;/a&gt;. &lt;a href=&quot;#fnref:11&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://joim.com/downloads/correlation-shrinkage-implications-for-risk-forecasting/&quot;&gt;Menchero, Jose and Peng Li. Correlation Shrinkage: Implications for Risk Forecasting, Journal of Investment Management (2020)&lt;/a&gt;. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:13&quot; role=&quot;doc-endnote&quot;&gt;
&lt;p&gt;Digging a little deeper, the old MAC1 and MAC2 Bloomberg multi-asset risk models used a half-life of 26 weeks for estimating volatilities and a half-life of 52 weeks for estimating correlations. &lt;a href=&quot;#fnref:13&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:12&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.mfsociety.org/modules/modDashboard/uploadFiles/journals/googleScholar/684.html&quot;&gt;Andersen, Torben G., Tim Bollerslev, Francis X. Diebold, and Paul Labys, 2000, Exchange Rate Returns Standardized by Realized Volatility are (Nearly) Gaussian, Multinational Finance Journal 4, 159-179.&lt;/a&gt;. &lt;a href=&quot;#fnref:12&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:8&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.sciencedirect.com/science/article/abs/pii/S0264999310001392&quot;&gt;Bahram Pesaran, M. Hashem Pesaran, Conditional volatility and correlations of weekly returns and the VaR analysis of 2008 stock market crash, Economic Modelling, Volume 27, Issue 6, 2010, Pages 1398-1416&lt;/a&gt;. &lt;a href=&quot;#fnref:8&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:14&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://www.msci.com/documents/10199/5915b101-4206-4ba0-aee2-3449d5c7e95a&quot;&gt;RiskMetrics. Technical Document, J.P.Morgan/Reuters, New York, 1996. Fourth Edition&lt;/a&gt;. &lt;a href=&quot;#fnref:14&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:16&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The maximization of &lt;em&gt;the average log-likelihood of the combined [covariance matrix] prediction over [a trailing number of periods]&lt;/em&gt;&lt;sup id=&quot;fnref:3:17&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:16&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:20&quot; role=&quot;doc-endnote&quot;&gt;
&lt;p&gt;This is because my own tests did not highlight any strong improvement in terms of forecasting ability vs. the IEWMA model when used with an optimal pair of decay factors. &lt;a href=&quot;#fnref:20&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:17&quot; role=&quot;doc-endnote&quot;&gt;
&lt;p&gt;These ETFs are used in the &lt;em&gt;Adaptive Asset Allocation&lt;/em&gt; strategy from &lt;a href=&quot;https://investresolve.com/&quot;&gt;ReSolve Asset Management&lt;/a&gt;, described in the paper &lt;em&gt;Adaptive Asset Allocation: A Primer&lt;/em&gt;&lt;sup id=&quot;fnref:18&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:18&quot; class=&quot;footnote&quot;&gt;22&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;#fnref:17&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:22&quot; role=&quot;doc-endnote&quot;&gt;
&lt;p&gt;(Adjusted) daily prices have been retrieved using &lt;a href=&quot;https://api.tiingo.com/&quot;&gt;Tiingo&lt;/a&gt;. &lt;a href=&quot;#fnref:22&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:22:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:23&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Using the outer product of asset returns - assuming a mean return of 0 - as covariance proxy, and using an expanding historical window of asset returns. &lt;a href=&quot;#fnref:23&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:23:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:24&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The optimal decay factors ($\lambda$, $\lambda_{vol}$, $\lambda_{cor}$) are computed at the end of every month using all the available asset returns history up to that point in time, as implemented in &lt;strong&gt;Portfolio Optimizer&lt;/strong&gt;; thus, there is no look-ahead bias. &lt;a href=&quot;#fnref:24&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:24:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:24:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:24:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:24:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#fnref:24:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:19&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The difference between the two models is probably just noise, which, at worst, implies that the added complexity of the IEWMA model with asset-specific univariate EWMA models is useless in practice. &lt;a href=&quot;#fnref:19&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt; &lt;a href=&quot;#fnref:19:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:18&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See &lt;a href=&quot;https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2328254&quot;&gt;Butler, Adam and Philbrick, Mike and Gordillo, Rodrigo and Varadi, David, Adaptive Asset Allocation: A Primer&lt;/a&gt;. &lt;a href=&quot;#fnref:18&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content><author><name>Roman R.</name></author><category term="covariance matrix" /><category term="correlation matrix" /><summary type="html">In the previous post of this series on covariance matrix forecasting, I reviewed both the simple and the exponentially weighted moving average covariance matrix forecasting models, which are straightforward extensions of their respective univariate volatility forecasting models to a multivariate setting. With these reference models established, we can now delve into more sophisticated approaches for forecasting covariance matrices1. In this blog post, I will describe the iterated exponentially weighted moving average (IEWMA) model that has recently2 been introduced in Johansson et al.3 and I will illustrate its empirical performances in the context of monthly covariance matrix forecasting for a multi-asset class ETF portfolio. Mathematical preliminaries Covariance matrix modelling and covariance proxies (reminders) This sub-section contains reminders from a previous blog post. Let $n$ be the number of assets in a universe of assets and $r_t \in \mathbb{R}^n$ be the vector of the (unknown) (logarithmic) return process of these assets over a time period $t$ (a day, a week, a month..), over which the (conditional) mean return vector $\mu_t \in \mathbb{R}^n$ of these assets is supposed to be null. Then: $r_t$ can be expressed as4 $r_t = \epsilon_t$, with $\epsilon_t \in \mathbb{R}^n$ an unpredictable error term, often referred to as a vector of “shocks” or as a vector of “random disturbances”4, over the time period $t$ The asset (conditional) covariance matrix $\Sigma_t \in \mathcal{M}(\mathbb{R}^{n \times n})$ over that time period $t$ is defined as $\Sigma_t = \mathbb{E} \left[ r_t r_t {}^t \right]$ From this definition, the outer product of the realized asset returns $ \tilde{r}_t \tilde{r}_t {}^t $ over a time period $t$ (a day, a week, a month..) 
is a covariance estimate $\tilde{\Sigma}_t$ - or covariance proxy5 - for the (unknown) asset returns covariance matrix over the considered time period $t$. The asset (conditional) correlation matrix $C_t \in \mathcal{M}(\mathbb{R}^{n \times n})$ is defined as $ C_t = V_t^{-1} \Sigma_t V_t^{-1} $, where $V_t \in \mathcal{M}(\mathbb{R}^{n \times n})$ is the diagonal matrix of the asset (conditional) standard deviations. Correlation matrix modelling In order to clarify the relation between conditional correlations and conditional variances6, Engle6 proposes to write the [asset] returns as the conditional standard deviation times the standardized disturbance6, that is \[r_{i,t} = \epsilon_{i,t} = \sigma_{i,t} \varepsilon_{i,t}, i=1..n\] , with: $\sigma_{i,t} = \sqrt{ \mathbb{E} \left[ r_{i,t}^2 \right] } $ $\varepsilon_{i,t}$ a standardized disturbance that has mean zero and variance one6 This way, the conditional correlation [between asset $i$ and asset $j$ becomes equal to] the conditional covariance between the standardized disturbances [$\varepsilon_{i,t}$ and $\varepsilon_{j,t}$]6 \[\rho_{ij,t} = \frac{\mathbb{E} \left[ r_{i,t} r_{j,t} \right]}{\sqrt{ \mathbb{E} \left[ r_{i,t}^2 \right] \mathbb{E} \left[ r_{j,t}^2 \right] }} = \mathbb{E} \left[ \varepsilon_{i,t} \varepsilon_{j,t} \right]\] The iterated exponentially weighted moving average covariance matrix forecasting model The IEWMA covariance matrix forecasting model3 is a two-step model that: Uses an EWMA volatility forecasting model with squared asset returns as variance proxies in order to forecast asset volatilities Uses an EWMA correlation matrix forecasting model with outer products of (EWMA-)volatility-standardized asset returns as covariance matrix proxies in order to forecast asset correlations Johansson et al.3 highlights that the IEWMA model was originally7 proposed in Engle6 as an efficient alternative to the DCC-GARCH6 predictor, although he did not refer to it as IEWMA6. 
In other words, the IEWMA covariance matrix forecasting model bridges the gap between the simple EWMA model and the more complex DCC-GARCH model. Forecasting formulas Let be: $n$ be the number of assets in a universe of assets $r_t r_t {}^t$, $t=1..T$ the outer products of the realized asset returns over each of $T$ past periods A decay factor $\lambda_{vol} \in [0, 1]$ A decay factor $\lambda_{cor} \in [0, 1]$ Next period’s asset returns covariance/correlation matrix The IEWMA covariance matrix forecasting model estimates the next period’s asset returns covariance matrix $\hat{\Sigma}_{T+1}$ and correlation matrix $\hat{C}_{T+1}$ as follows3: For each asset $i=1..n$ in the universe of assets Forecast the asset one-period-ahead variances $\hat{\sigma}^2_{i,2}$, …, $\hat{\sigma}^2_{i,T+1}$ using an EWMA volatility forecasting model with decay factor $\lambda_{vol}$ and squared asset returns $r_{i,t}^2$, $t=1..T$, as variance proxies Compute the volatility-standardized asset returns $\tilde{r}_{i,2},…,\tilde{r}_{i,T}$ defined as the asset returns standardized by their one-period-ahead EWMA volatility forecasts \[\tilde{r}_{i,t} = \frac{r_{i,t}}{\hat{\sigma}_{i,t}}, t = 2..T\] Compute the diagonal matrix of the assets next period’s forecasted volatilities $V_{T+1}$ \[V_{T+1} = \begin{pmatrix} \hat{\sigma}_{1,T+1} &amp;amp; 0 &amp;amp; ... &amp;amp; 0 \\ 0 &amp;amp; \hat{\sigma}_{2,T+1} &amp;amp; ... &amp;amp; 0 \\ ... &amp;amp; ... &amp;amp; ... &amp;amp; ... \\ 0 &amp;amp; 0 &amp;amp; ... 
&amp;amp; \hat{\sigma}_{n,T+1} \end{pmatrix}\] Forecast the next period’s asset returns correlation matrix $\hat{C}_{T+1}$ using an EWMA covariance matrix forecasting model with decay factor $\lambda_{cor}$ and outer products of volatility-standardized asset returns $\tilde{r_t} \tilde{r_t} {}^t$, $t=2..T$, as covariance matrix proxies Compute the next period’s forecasted asset returns covariance matrix $\hat{\Sigma}_{T+1}$ \[\hat{\Sigma}_{T+1} = V_{T+1} \hat{C}_{T+1} V_{T+1}\] Next $h$-period’s ahead asset returns covariance/correlation matrix The IEWMA covariance matrix forecasting model estimates the next $h$-period’s ahead asset returns covariance matrix $\hat{\Sigma}_{T+h}$ and correlation matrix $\hat{C}_{T+h}$, $h \geq 2$, by the next period’s asset returns covariance matrix $\hat{\Sigma}_{T+1}$ and correlation matrix $\hat{C}_{T+1}$. Indeed, due to the properties of the EWMA volatility and covariance matrix forecasting models8, we have: $V_{T+h} = V_{T+1}$, $h \geq 2$ $\hat{C}_{T+h} = \hat{C}_{T+1}$, $h \geq 2$ So that $\hat{\Sigma}_{T+h} = \hat{\Sigma}_{T+1}$, $h \geq 2$. Averaged asset returns covariance/correlation matrix over the next $h$ periods The IEWMA covariance matrix forecasting model estimates the averaged9 asset returns covariance matrix $\hat{\Sigma}_{T+1:T+h}$ and correlation matrix $\hat{C}_{T+1:T+h}$ over the next $h$ periods, $h \geq 2$, by the next period’s asset returns covariance matrix $\hat{\Sigma}_{T+1}$ and correlation matrix $\hat{C}_{T+1}$. Indeed, from the previous sub-section, we have: $ \hat{\Sigma}_{T+1:T+h} = \frac{1}{h} \sum_{i=1}^{h} \hat{\Sigma}_{T+i} = \hat{\Sigma}_{T+1} $, $h \geq 2$ $ \hat{C}_{T+1:T+h} = \frac{1}{h} \sum_{i=1}^{h} \hat{C}_{T+i} = \hat{C}_{T+1} $, $h \geq 2$ Rationale The rationale behind the IEWMA covariance matrix forecasting model is twofold: Separate volatility forecasting from correlation matrix forecasting while using the same baseline forecasting model for internal consistency. 
This first idea is well known among practitioners, c.f. for example Menchero and Li10 which describes the usage of different EWMA half-lives for estimating volatilities and correlations11. Use volatility-standardized asset returns instead of raw asset returns for correlation matrix forecasting. This second idea, detailled for example in Engle6, originates from the fact that conditional correlations between asset returns are equal to conditional covariances between standardized disturbances, c.f. the previous section. So, given an estimator for the (unobservable) standardized disturbances $\epsilon_{i,t} = \frac{r_{i,t}}{\sigma_{i,t}}$, $i=1..n$, $t=1..T$, it is possible to estimate the conditional correlations between asset returns. An example of such estimator is the volatility-standardized asset returns $\tilde{r}_{i,t} = \frac{r_{i,t}}{\hat{\sigma}_{i,t}}$, with the drawback that the quality of the correlation forecasts is then influenced by the quality of the volatility forecasts. And unfortunately, because it is well known that […] asset returns […] fat tails are typically reduced but not eliminated when returns are standardized by volatilities estimated from popular [volatility forecasting] models12 and because the use of correlation as a measure of dependence can be misdealing in the case of (conditionally) non-Gaussian returns13, we should not expect any magic here… How to choose the decay factors? Due to its relationship with the vanilla EWMA volatility and covariance matrix forecasting models, there are two main procedures to choose the decay factors $\lambda_{vol}$ and $\lambda_{cor}$ of an IEWMA covariance matrix forecasting model: Using recommended values from the EWMA models literature (0.94, 0.97…). 
On this, Johansson et al.3 notes that empirical studies on real return data confirm that choosing a faster volatility half-life than correlation half-life yields better estimates6 and uses the following pairs of decay factors $\left(\lambda_{vol}, \lambda_{cor}\right)$ in their experimental setup: Short term - $\left(0.870,0.933\right)$, $\left(0.933,0.967\right)$ Medium term - $\left(0.967,0.989\right)$, $\left(0.989,0.994\right)$, $\left(0.994,0.997\right)$ Long term - $\left(0.997,0.998\right)$, $\left(0.998,0.999\right)$ Determining the optimal values w.r.t. the forecast horizon $h$, for example through the minimization of the root mean square error (RMSE) between the forecasted covariance matrix over the desired horizon and the observed covariance matrix over that horizon14. In practice, because there are two decay factors to choose, this can be done two ways: Either consider the two decay factors as a two independent univariate parameters $\lambda_{vol} \in [0,1]$ and $\lambda_{cor} \in [0,1]$. This choice is justified by the original desire to separate volatility forecasting from correlation matrix forecasting. Or consider the two decay factors as a single multivariate parameter $\left(\lambda_{vol}, \lambda_{cor}\right) \in [0,1]^2$. This choice is justified by the observed dependency of the correlation forecasts on the volatility-standardized asset returns. Extensions of the iterated exponentially weighted moving average covariance matrix forecasting model Asset-specific volatility decay factors The IEWMA covariance matrix forecasting model uses univariate EWMA models in order to forecast asset volatilities, all these models sharing the same decay factor $\lambda_{vol}$. Having an identical decay factor for all assets is parsimonious, but is somewhat at odds with the DCC-GARCH model of Engle6 which uses asset-specific univariate GARCH models - that is, each with its own asset-specific parameters - in order to forecast asset volatilities. 
So, one natural extension of the IEWMA model is to allow asset-specific univariate EWMA models - each with its own asset-specific decay factor $\lambda_{i,vol}$, $i=1..n$ - in order to forecast asset volatilities. Linear combination of IEWMA covariance matrix forecasting models Another interesting covariance matrix forecasting model is introduced in Johansson et al.3 as the combined multiple iterated exponentially weighted moving average (CM-IEWMA) model, which consist in a time-varying linear combination of individual IEWMA models, each with its own pair of fixed decay factors. As explained in Johansson et al.3: The CM-IEWMA predictor is constructed from a modest number of IEWMA predictors, with different pairs of half-lives, which are combined using dynamically varying weights that are based on recent performance. The rationale behind the CM-IEWMA model is that different pairs of half-lives may work better for different market conditions3, with short half-lives [typically performing] better in volatile markets [and] long half-lives [performing] better for calm markets where conditions are changing slowly3. This behaviour is illustrated in Figure 1, taken from Johansson et al.3, which shows the evolution of the weights of a 5-IEWMA CM-IEWMA covariance matrix forecasting model applied to a universe of U.S. stocks. Figure 1. Evolution of the weights of a 5-IEWMA CM-IEWMA covariance matrix forecasting model applied to a universe of U.S. stocks, 4th January 2010 - 30th December 2022. Source: Johansson et al. From Figure 1, it is visible that although substantial weight is put on the slower (longer halflife) IEWMAs most years3, the CM-IEWMA model still adapts the weights depending on market conditions3. 
The interested reader is referenced to Johansson et al.3 for all the technicalities of the CM-IEWMA model, and in particular for the details about the computation of the dynamically varying weights of the individual IEWMA models through the resolution of a convex optimization problem15. A last important remark to conclude this sub-section - the CM-IEWMA model is actually a special case of [a more general] dynamically weighted prediction [model]3, so that the same weighting logic can be applied to any combination of covariance matrix forecasting models. Implementations Implementation in Portfolio Optimizer Portfolio Optimizer implements the IEWMA covariance and correlation matrix forecasting model through the endpoints /assets/covariance/matrix/forecast/ewma/iterated and /assets/correlation/matrix/forecast/ewma/iterated. These endpoints support the 2 covariance proxies below: Squared (close-to-close) returns Demeaned squared (close-to-close) returns These endpoints also allow: To use asset-specific univariate EWMA models in order to forecast asset volatilities To automatically determine the optimal value of their parameters (the decay factors $\lambda_{vol}$ and $\lambda_{cor}$) using a proprietary procedure. To be noted that Portfolio Optimizer does not provide any implementation of the CM-IEWMA model16, but c.f. the next sub-section. Implementation elsewhere Johansson et al.3 kindly provides an open source Python implementation of: The IEWMA covariance matrix forecasting model The CM-IEWMA covariance matrix forecasting model The general “covariance matrix forecasting models” combination model at https://github.com/cvxgrp/cov_pred_finance. I definitely encourage anyone interested in the CM-IEMWA model to play with this code! 
## Example of usage - Covariance matrix forecasting at monthly level for a portfolio of various ETFs

As an example of usage, I propose to evaluate the empirical performances of the IEWMA covariance matrix forecasting model within the framework of the previous blog post, whose aim is to forecast monthly covariance and correlation matrices for a portfolio of 10 ETFs representative17 of various asset classes:

- U.S. stocks (SPY ETF)
- European stocks (EZU ETF)
- Japanese stocks (EWJ ETF)
- Emerging markets stocks (EEM ETF)
- U.S. REITs (VNQ ETF)
- International REITs (RWX ETF)
- U.S. 7-10 year Treasuries (IEF ETF)
- U.S. 20+ year Treasuries (TLT ETF)
- Commodities (DBC ETF)
- Gold (GLD ETF)

### Results - Covariance matrix forecasting

Results over the period 31st January 2008 - 31st July 202318 for covariance matrices are the following19:

| Covariance matrix model | Covariance matrix MSE |
| --- | --- |
| SMA, window size of all the previous months (historical average model) | 9.59 $10^{-6}$ |
| SMA, window size of the previous year | 9.08 $10^{-6}$ |
| EWMA, optimal20 $\lambda$ | 6.52 $10^{-6}$ |
| EWMA, $\lambda = 0.97$ | 6.37 $10^{-6}$ |
| IEWMA, $\left(\lambda_{vol},\lambda_{cor}\right) = \left(0.97,0.99\right)$ | 6.35 $10^{-6}$ |
| IEWMA, optimal20 $\lambda_{i,vol}$, $i=1..n$ and $\lambda_{cor}$ | 6.33 $10^{-6}$ |
| IEWMA, optimal20 $\left(\lambda_{vol},\lambda_{cor}\right)$ | 6.16 $10^{-6}$ |
| SMA, window size of the previous month (random walk model) | 6.06 $10^{-6}$ |
| EWMA, $\lambda = 0.94$ | 5.78 $10^{-6}$ |
| IEWMA, $\left(\lambda_{vol},\lambda_{cor}\right) = \left(0.94,0.97\right)$ | 5.78 $10^{-6}$ |

Within this specific evaluation framework, the IEWMA covariance matrix forecasting model unfortunately does not seem to improve upon the previous models despite the added complexity.

It is also noteworthy that the IEWMA model with asset-specific univariate EWMA models for volatility (line #6) does not exhibit better performances than the vanilla IEWMA model (line #7) when using automatically determined parameters21.
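The MSE figures above compare each month's covariance matrix forecast against a noisy realized proxy - per footnote 19, the outer product of the period's asset returns, assuming a mean return of 0. A minimal sketch of this evaluation loop, assuming the forecasts have already been computed by some model:

```python
import numpy as np

def covariance_mse(forecasts, realized_returns):
    """Average element-wise MSE between covariance matrix forecasts and
    a realized covariance proxy (outer product of returns, zero mean).

    forecasts        : list of (n, n) forecast matrices, one per period
    realized_returns : (T, n) array of the returns realized in each period
    """
    errors = []
    for sigma_hat, r in zip(forecasts, realized_returns):
        proxy = np.outer(r, r)  # covariance proxy for the period
        errors.append(np.mean((sigma_hat - proxy) ** 2))
    return float(np.mean(errors))

# Toy usage: two periods, two assets, dummy forecasts
forecasts = [np.eye(2) * 1e-4, np.eye(2) * 2e-4]
rets = np.array([[0.01, -0.01], [0.02, 0.0]])
mse = covariance_mse(forecasts, rets)
```

Note that the outer-product proxy is itself a very noisy estimate of the true covariance matrix, which is one reason why the MSE differences between competing models in the tables are small.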
### Results - Correlation matrix forecasting

Results over the period 31st January 2008 - 31st July 202318 for the correlation matrices associated to the covariance matrices of the previous sub-section are the following19:

| Covariance matrix model | Correlation matrix MSE |
| --- | --- |
| SMA, window size of the previous month (random walk model) | 8.19 |
| SMA, window size of all the previous months (historical average model) | 8.10 |
| EWMA, $\lambda = 0.94$ | 7.67 |
| SMA, window size of the previous year | 6.50 |
| EWMA, $\lambda = 0.97$ | 6.36 |
| EWMA, optimal20 $\lambda$ | 5.87 |
| IEWMA, $\left(\lambda_{vol},\lambda_{cor}\right) = \left(0.94,0.97\right)$ | 5.85 |
| IEWMA, $\left(\lambda_{vol},\lambda_{cor}\right) = \left(0.97,0.99\right)$ | 5.85 |
| IEWMA, optimal20 $\lambda_{i,vol}$, $i=1..n$ and $\lambda_{cor}$ | 5.72 |
| IEWMA, optimal20 $\left(\lambda_{vol},\lambda_{cor}\right)$ | 5.70 |

This time, the IEWMA model - and especially the two IEWMA models using automatically determined parameters (lines #9-#10) - does exhibit slightly better performances than all the previous models.

Nevertheless, the improvement over the previously best-performing model (line #6 - the EWMA model with automatically determined parameter) is not that impressive… So, the idea of removing the impact of volatility on asset returns in order to better estimate asset correlations seems to have some merit, but the EWMA volatility forecasting model might be insufficient to fully exploit this idea.

Also, here again, the performances of the IEWMA model with asset-specific univariate EWMA models for volatility are strictly worse than the performances of the vanilla IEWMA model21.

## Conclusion

One of the main characteristics of the IEWMA covariance matrix forecasting model of Johansson et al.3 is to forecast asset correlations using (EWMA-)volatility-standardized asset returns instead of raw asset returns, so that the impact of asset volatilities on their correlations is (tentatively) minimized.
Unfortunately, the empirical performances of that model in terms of correlation matrix forecasting are not that different from those of the EWMA model, which raises the question of whether improving the volatility-standardization might lead to better correlation forecasts.

This will be the subject of a future blog post in this series.

As usual, feel free to connect with me on LinkedIn or to follow me on Twitter.

–

1. ChatGPT-generated, as can be seen by the signature word "delve" :-) !
2. At the date of the initial publication of this blog post.
3. See Kasper Johansson, Mehmet G. Ogut, Markus Pelger, Thomas Schmelzer and Stephen Boyd (2023), A Simple Method for Predicting Covariance Matrices of Financial Returns, Foundations and Trends in Econometrics: Vol. 12: No. 4, pp 324-407.
4. See Valeriy Zakamulin, A Test of Covariance-Matrix Forecasting Methods, The Journal of Portfolio Management, Spring 2015, 41 (3), 97-108.
5. See Patton, A.J., Sheppard, K. (2009). Evaluating Volatility and Correlation Forecasts. In: Mikosch, T., Kreiß, J.P., Davis, R., Andersen, T. (eds) Handbook of Financial Time Series. Springer, Berlin, Heidelberg.
6. See Engle, R. (2002). Dynamic Conditional Correlation. Journal of Business &amp; Economic Statistics, 20(3), 339-350.
7. That being said, I find that what Engle6 describes is closer to an iterated $n$-univariate GARCH volatility forecasting model/EWMA correlation matrix forecasting model than to the iterated EWMA forecasting model of Johansson et al.3…
8. C.f. the associated blog posts here and there.
9. See Gianluca De Nard, Robert F. Engle, Olivier Ledoit, Michael Wolf, Large dynamic covariance matrices: Enhancements based on intraday data, Journal of Banking &amp; Finance, Volume 138, 2022, 106426.
10. See Menchero, Jose and Peng Li, Correlation Shrinkage: Implications for Risk Forecasting, Journal of Investment Management (2020).
11. Digging a little deeper, the old MAC1 and MAC2 Bloomberg multi-asset risk models were using a half-life of 26 weeks for estimating volatilities and a half-life of 52 weeks for estimating correlations.
12. See Andersen, Torben G., Tim Bollerslev, Francis X. Diebold, and Paul Labys (2000), Exchange Rate Returns Standardized by Realized Volatility are (Nearly) Gaussian, Multinational Finance Journal 4, 159-179.
13. See Bahram Pesaran, M. Hashem Pesaran, Conditional volatility and correlations of weekly returns and the VaR analysis of 2008 stock market crash, Economic Modelling, Volume 27, Issue 6, 2010, Pages 1398-1416.
14. See RiskMetrics Technical Document, J.P. Morgan/Reuters, New York, 1996, Fourth Edition.
15. The maximization of the average log-likelihood of the combined [covariance matrix] prediction over [a trailing number of periods]3.
16. This is because my own tests did not highlight any strong improvement in terms of forecasting ability vs. the IEWMA model when used with an optimal pair of decay factors.
17. These ETFs are used in the Adaptive Asset Allocation strategy from ReSolve Asset Management, described in the paper Adaptive Asset Allocation: A Primer22.
18. (Adjusted) daily prices have been retrieved using Tiingo.
19. Using the outer product of asset returns - assuming a mean return of 0 - as covariance proxy, and using an expanding historical window of asset returns.
20. The optimal decay factors ($\lambda$, $\lambda_{vol}$, $\lambda_{cor}$) are computed at the end of every month using all the available asset returns history up to that point in time, as implemented in Portfolio Optimizer; thus, there is no look-ahead bias.
21. The difference between the two models is probably just noise, which, at worst, implies that the added complexity of the IEWMA model with asset-specific univariate EWMA models is useless in practice.
22. See Butler, Adam and Philbrick, Mike and Gordillo, Rodrigo and Varadi, David, Adaptive Asset Allocation: A Primer.</summary></entry></feed>