Correlation Matrices Denoising: Results from Random Matrix Theory
The estimation of empirical correlation matrices in finance is known to be affected by noise, in the form of measurement error, due in part to the short length of the time series of asset returns typically used in their computation^{1}.
Worse, large empirical correlation matrices have been shown to be so noisy that, except for their largest eigenvalues and their associated eigenvectors, they can essentially be considered as random.
For example, Laloux et al.^{2} report that around 94% of the spectrum of an empirical correlation matrix estimated from the returns of the S&P 500 constituents is indistinguishable from the spectrum of a random correlation matrix!
So, before using an empirical correlation matrix, it is usually advised to denoise it.
In this post, I will present the denoising method proposed by Laloux et al.^{2}, based on results from random matrix theory (RMT), and I will illustrate its behaviour on two universes of assets.
Notes:
- An excellent introduction to random matrix theory and some of its applications in finance is the book A First Course in Random Matrix Theory^{1} by Marc Potters and Jean-Philippe Bouchaud.
Mathematical preliminaries
In the context of this post, the most important result from random matrix theory is the Marchenko-Pastur theorem, which describes the eigenvalue distribution of large random covariance matrices.
The Marchenko-Pastur theorem
Below is a simple version of the Marchenko-Pastur theorem, in the case of i.i.d. Gaussian observations^{1}.
Let:
- $X \in \mathcal{M}(\mathbb{R}^{n \times T})$ be an observation matrix made of i.i.d. Gaussian entries with mean 0 and variance $\sigma^2$^{3}
- $\Sigma \in \mathcal{M}(\mathbb{R}^{n \times n})$ be the empirical covariance matrix associated with $X$, defined by $\Sigma = \frac{1}{T} X X^t$
Then, as $n \to +\infty$ and $T \to +\infty$ with $\frac{n}{T} \to q$, where $0 < q \leq 1$, the density of eigenvalues of the matrix $\Sigma$ converges to the Marchenko-Pastur density defined by
\[\rho_{MP}(\lambda) = \begin{cases} \frac{\sqrt{\left(\lambda_{+} - \lambda\right)\left(\lambda - \lambda_{-}\right)}}{2 \pi q \sigma^2 \lambda}, & \textrm{if } \lambda \in [\lambda_{-}, \lambda_{+}] \newline 0, & \textrm{if } \lambda \notin [\lambda_{-}, \lambda_{+}] \end{cases}\]
with $\lambda_{-}$ the lower edge of the spectrum, defined by
\[\lambda_{-} = \sigma^2 \left(1 - \sqrt q \right)^2\]
and $\lambda_{+}$ the upper edge of the spectrum, defined by
\[\lambda_{+} = \sigma^2 \left(1 + \sqrt q \right)^2\]
Note that, under proper technical assumptions, the Marchenko-Pastur theorem remains valid for observations drawn from more general distributions, such as fat-tailed distributions (general i.i.d. observations^{4}, general i.i.d. columns with a general dependence structure within the columns^{5}, etc.).
In other words, the Marchenko-Pastur theorem establishes that the distribution of the eigenvalues of a large random covariance matrix is universal, in that it does not depend^{6} on the distribution of the entries of the underlying observation matrix.
Potters and Bouchaud^{1} explain this surprising result as follows:
For large random matrices, many scalar quantities […] do not fluctuate from sample to sample, or more precisely such fluctuations go to zero in the large N limit. Physicists speak of this phenomenon as self-averaging and mathematicians speak of concentration of measure.
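The Marchenko-Pastur density and its edges translate directly into code (note the factor $\lambda$ in the denominator, which makes the density integrate to one). The sketch below is a minimal NumPy implementation; the function names are my own, hypothetical choices. It also checks numerically that the density integrates to one for $q = 0.2$ and $\sigma^2 = 1$.

```python
import numpy as np


def marchenko_pastur_edges(q, sigma2=1.0):
    """Return (lambda_minus, lambda_plus), the edges of the Marchenko-Pastur spectrum."""
    lambda_minus = sigma2 * (1.0 - np.sqrt(q)) ** 2
    lambda_plus = sigma2 * (1.0 + np.sqrt(q)) ** 2
    return lambda_minus, lambda_plus


def marchenko_pastur_density(lam, q, sigma2=1.0):
    """Marchenko-Pastur density for an aspect ratio 0 < q <= 1 (no Dirac mass at 0)."""
    lam = np.asarray(lam, dtype=float)
    lambda_minus, lambda_plus = marchenko_pastur_edges(q, sigma2)
    density = np.zeros_like(lam)
    inside = (lam > lambda_minus) & (lam < lambda_plus)
    x = lam[inside]
    density[inside] = np.sqrt((lambda_plus - x) * (x - lambda_minus)) / (2.0 * np.pi * q * sigma2 * x)
    return density


# Numerical check: the density integrates to 1 over [lambda_minus, lambda_plus]
lambda_minus, lambda_plus = marchenko_pastur_edges(q=0.2)
grid = np.linspace(lambda_minus, lambda_plus, 200_001)
values = marchenko_pastur_density(grid, q=0.2)
total_mass = np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid))  # trapezoidal rule
print(f"lambda_- = {lambda_minus:.4f}, lambda_+ = {lambda_plus:.4f}, mass = {total_mass:.6f}")
```

The same trapezoidal rule applied to $\lambda \rho_{MP}(\lambda)$ gives the mean of the distribution, which is $\sigma^2$.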
To help visualize the Marchenko-Pastur theorem, Figure 1 displays on the same plot:
- The eigenvalue density of the empirical correlation matrix of an $n = 1000$ by $T = 5000$ random observation matrix made of i.i.d. standard Gaussian variables
- The Marchenko-Pastur density with parameters $q = \frac{1000}{5000} = 0.2$ and $\sigma^2 = 1$
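A numerical counterpart of Figure 1 can be produced along the following lines. This sketch assumes NumPy only (no plotting): instead of drawing the two densities, it compares the empirical eigenvalue histogram with the Marchenko-Pastur density at the bin centers, and checks that the dispersion of the eigenvalues is close to $\sigma^4 q$, the variance of the Marchenko-Pastur distribution.

```python
import numpy as np

rng = np.random.default_rng(42)
n, T = 1000, 5000
q = n / T

# n x T observation matrix made of i.i.d. standard Gaussian entries
X = rng.standard_normal((n, T))

# Empirical correlation matrix of the n variables (np.corrcoef treats rows as variables)
C = np.corrcoef(X)
eigenvalues = np.linalg.eigvalsh(C)

# Marchenko-Pastur edges, and density evaluated at the histogram bin centers (sigma^2 = 1)
lambda_minus, lambda_plus = (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2
counts, edges = np.histogram(eigenvalues, bins=40, range=(0.0, 2.5))
hist = counts / (n * np.diff(edges))  # empirical eigenvalue density
centers = 0.5 * (edges[1:] + edges[:-1])
mp = np.sqrt(np.clip((lambda_plus - centers) * (centers - lambda_minus), 0.0, None)) / (2 * np.pi * q * centers)

print("max |histogram - MP density| at bin centers:", np.abs(hist - mp).max())
print("fraction of eigenvalues outside [lambda_-, lambda_+]:",
      np.mean((eigenvalues < lambda_minus) | (eigenvalues > lambda_plus)))
print("eigenvalue variance:", eigenvalues.var(), "vs sigma^4 q =", q)
```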
A couple of remarks to finish:
- The parameter $q$ is usually called the aspect ratio of the empirical covariance matrix $\Sigma$
- When $q > 1$, the Marchenko-Pastur theorem is still valid^{1}, with an additional Dirac mass appearing at $\lambda = 0$ in the Marchenko-Pastur density to account for the $n - T$ null eigenvalues of $\Sigma$
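The Dirac mass at $\lambda = 0$ is easy to observe numerically: when $n > T$, the matrix $\Sigma = \frac{1}{T} X X^t$ has rank at most $T$, so that $n - T$ of its eigenvalues are null, i.e. a fraction $1 - \frac{1}{q}$ of the spectrum. A quick NumPy check, with arbitrarily chosen dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 300, 100  # aspect ratio q = n / T = 3 > 1
q = n / T

X = rng.standard_normal((n, T))
Sigma = X @ X.T / T  # n x n empirical covariance matrix, of rank at most T

eigenvalues = np.linalg.eigvalsh(Sigma)
n_null = int(np.sum(eigenvalues < 1e-8))  # numerically zero eigenvalues

print(f"{n_null} null eigenvalues out of {n}, i.e. a fraction {n_null / n:.3f} vs 1 - 1/q = {1 - 1 / q:.3f}")
```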
The Marchenko-Pastur theorem in a non-asymptotic regime
Because empirical covariance matrices are of finite dimensions in practice, a natural question to ask is whether the Marchenko-Pastur theorem remains applicable in a non-asymptotic regime^{7}.
Figure 1 already shows a pretty good agreement between theory and practice, with values of $n = 1000$ and $T = 5000$ that are far from infinity.
What about smaller values, like $n = 100$ and $T = 500$, displayed in Figure 2?
Or even $n = 10$ and $T = 50$, displayed in Figure 3?
Based on Figures 1 to 3, it appears that the Marchenko-Pastur theorem remains applicable for values of $n$ and $T$ as small as $n = 100$ and $T = 500$, but that caution is warranted for very small values like $n = 10$ and $T = 50$.
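Since $\lambda_{+}$ serves below as a denoising threshold, one simple way to quantify this finite-size effect is to compare the average largest eigenvalue of simulated random correlation matrices with $\lambda_{+}$, for the three $(n, T)$ pairs of Figures 1 to 3. The sketch below does this with NumPy; the number of Monte Carlo trials per pair is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(7)
q = 0.2
lambda_plus = (1 + np.sqrt(q)) ** 2
trials = {10: 200, 100: 50, 1000: 1}  # Monte Carlo trials per value of n

relative_error = {}
for n, n_trials in trials.items():
    T = 5 * n  # so that q = n / T = 0.2
    largest = []
    for _ in range(n_trials):
        X = rng.standard_normal((n, T))
        largest.append(np.linalg.eigvalsh(np.corrcoef(X))[-1])  # eigvalsh sorts ascending
    relative_error[n] = abs(np.mean(largest) - lambda_plus) / lambda_plus
    print(f"n = {n:4d}, T = {T:4d}: relative gap between average largest eigenvalue and lambda_+ = {relative_error[n]:.1%}")
```

The gap shrinks as $n$ grows, consistent with the visual impression from Figures 1 to 3.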
A method to denoise correlation matrices based on random matrix theory
Description
Laloux et al.^{2} propose a method to denoise empirical correlation matrices based on random matrix theory, called the eigenvalues clipping method^{8}.
The rationale behind this method is that by comparing the spectrum of an empirical correlation matrix to the spectrum of a random correlation matrix it is possible to identify the random part of the empirical correlation matrix.
More formally, let:
- $C \in \mathcal{M}(\mathbb{R}^{n \times n})$ be an empirical correlation matrix associated with a (large) universe of $n$ variables, estimated using $T > n$ observations per variable
- $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0$ be the eigenvalues of $C$
- $q = \frac{n}{T}$, with $0 < q < 1$
Then, the upper edge $\lambda_{+}$ of the Marchenko-Pastur density can serve as a threshold to identify the noisy part of $C$:
- All the eigenvalues of $C$ belonging to $[\lambda_{-}, \lambda_{+}]$ are compatible with the hypothesis of a random correlation matrix and can be considered to be associated with noise
- All the eigenvalues of $C$ strictly smaller than $\lambda_{-}$ can also be considered to be associated with noise^{9}
- All the eigenvalues of $C$ strictly greater than $\lambda_{+}$ are not compatible with the hypothesis of a random correlation matrix and can be considered to be “true” eigenvalues
This leads to the following method to denoise $C$:
- All the eigenvalues of $C$ lower than or equal to $\lambda_{+}$ are replaced by a constant value:
  - Either equal to the average value of the “noisy” eigenvalues, as in the original paper of Laloux et al.^{2}
  - Or equal to zero, as in Plerou et al.^{10}^{11}
- All the eigenvalues of $C$ strictly greater than $\lambda_{+}$ are left unchanged
The reason why the eigenvalues associated with noise should be replaced by a constant value is, in the words of Laloux et al.^{2}:
Since the eigenstates corresponding to the “noise band” are not expected to contain real information, one should not distinguish the different eigenvalues […] in this sector.
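The method described above fits in a few lines of NumPy. This is a minimal illustration, not Portfolio Optimizer's implementation: $\lambda_{+}$ is passed in directly (how to choose it is discussed in the next section), the noise variance used in the usage example is a crude estimate chosen for simplicity, and the final rescaling to a unit diagonal is my own assumption about how to turn the clipped matrix back into a valid correlation matrix.

```python
import numpy as np


def clip_eigenvalues(C, lambda_plus, replace_with="average"):
    """Eigenvalues clipping of a correlation matrix C (illustrative sketch).

    Eigenvalues <= lambda_plus are replaced by a constant, either their average
    (as in Laloux et al.) or zero (as in Plerou et al.); the others are left unchanged.
    """
    if replace_with not in ("average", "zero"):
        raise ValueError("replace_with must be 'average' or 'zero'")
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    clipped = eigenvalues.copy()
    noisy = eigenvalues <= lambda_plus
    if noisy.any():
        clipped[noisy] = eigenvalues[noisy].mean() if replace_with == "average" else 0.0
    C_clipped = eigenvectors @ np.diag(clipped) @ eigenvectors.T
    # Assumption: rescale to a unit diagonal so that the result is again a correlation matrix
    # (with replace_with="zero", this fails if a diagonal element becomes 0)
    d = np.sqrt(np.diag(C_clipped))
    C_clipped = C_clipped / np.outer(d, d)
    np.fill_diagonal(C_clipped, 1.0)
    return C_clipped


# Usage on simulated returns from a one-factor model with pairwise correlation rho
rng = np.random.default_rng(3)
n, T, rho = 100, 500, 0.3
true_C = np.full((n, n), rho) + (1 - rho) * np.eye(n)
factor = rng.standard_normal((T, 1))
returns = np.sqrt(rho) * factor + np.sqrt(1 - rho) * rng.standard_normal((T, n))

C = np.corrcoef(returns, rowvar=False)
q = n / T
sigma2 = 1 - np.linalg.eigvalsh(C)[-1] / n  # crude noise variance estimate (an assumption)
lambda_plus = sigma2 * (1 + np.sqrt(q)) ** 2
C_denoised = clip_eigenvalues(C, lambda_plus)

print("Frobenius error, sample matrix:  ", np.linalg.norm(C - true_C))
print("Frobenius error, denoised matrix:", np.linalg.norm(C_denoised - true_C))
```

On this simulated example, the denoised matrix is closer to the true correlation matrix than the sample matrix, because the pure-noise part of the sample matrix is flattened while the factor structure is kept.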
Practical details
Two important practical details are missing from the previous description of the eigenvalues clipping method.
$q$ must be an adjustable parameter
Comparing an empirical correlation matrix of aspect ratio $q = \frac{n}{T}$ to a random correlation matrix with the same aspect ratio is usually incorrect, because of the presence of both temporal correlations (autocorrelations) and spatial correlations (cross-sectional correlations) in the observations used to estimate the empirical correlation matrix^{1}.
This is especially true for time series of asset returns^{2}.
As a consequence, the aspect ratio $q$ must be considered as an adjustable parameter and not as a constant value equal to $\frac{n}{T}$, as explained by Potters and Bouchaud^{1}
Intuitively, correlated samples are somehow redundant and the sample [correlation] matrix should behave as if we had observed not $T$ samples but an effective number $T^* < T$.
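The effect of temporal correlations can be illustrated by simulation. In the sketch below (all parameters are arbitrary, illustrative assumptions), $n$ mutually independent AR(1) series are generated, so that the true correlation matrix is the identity. For i.i.d. observations, the variance of the eigenvalues of their sample correlation matrix would be close to $q = \frac{n}{T}$; here it is much larger, as if fewer than $T$ observations had been used. Using this variance as an "effective" $q$ is only a crude moment-based proxy.

```python
import numpy as np

rng = np.random.default_rng(11)
n, T, phi = 100, 500, 0.8  # phi: AR(1) coefficient of each series

# n independent AR(1) series: x_t = phi * x_{t-1} + sqrt(1 - phi^2) * eps_t (unit variance)
eps = rng.standard_normal((T, n))
x = np.empty((T, n))
x[0] = eps[0]
for t in range(1, T):
    x[t] = phi * x[t - 1] + np.sqrt(1 - phi**2) * eps[t]

eigenvalues = np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))

q_nominal = n / T
q_effective = eigenvalues.var()  # the MP distribution has variance q when sigma^2 = 1
lambda_plus_nominal = (1 + np.sqrt(q_nominal)) ** 2

print(f"nominal q = {q_nominal:.2f}, effective q = {q_effective:.2f}, T* = n / q = {n / q_effective:.0f}")
print("eigenvalues above the nominal lambda_+:", int(np.sum(eigenvalues > lambda_plus_nominal)))
```

Although no "true" correlation exists in these data, a denoising method using the nominal $\lambda_{+}$ would wrongly keep several eigenvalues as informative.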
$\sigma^2$ must be an adjustable parameter
Comparing an empirical correlation matrix to a purely random correlation matrix with $\sigma^2 = 1$ is also usually incorrect: because the eigenvalues of a correlation matrix sum to its dimension $n$, the empirical eigenvalues strictly above $\lambda_{+}$ absorb part of the total variance, which reduces the variance $\sigma^2$ of the random part of the empirical correlation matrix^{2}.
This variance must then be considered as an adjustable parameter and not as a constant value equal to one.
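A back-of-the-envelope calculation makes this explicit. Since the diagonal of $C$ is made of ones, its eigenvalues sum to its trace $n$. Assuming that the $k$ largest eigenvalues $\lambda_1, \dots, \lambda_k$ are the "true" ones and that the remaining $n - k$ eigenvalues follow a Marchenko-Pastur density, whose mean is $\sigma^2$, we get

\[\sigma^2 \approx \frac{1}{n - k} \sum_{i = k+1}^{n} \lambda_i = \frac{n - \sum_{i=1}^{k} \lambda_i}{n - k} = 1 - \frac{\sum_{i=1}^{k} \left( \lambda_i - 1 \right)}{n - k}\]

so that $\sigma^2 < 1$ as soon as the "true" eigenvalues are on average greater than one. For example, with $n = 100$ and a single large eigenvalue $\lambda_1 = 25.75$, this gives $\sigma^2 \approx 1 - \frac{24.75}{99} = 0.75$.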
Finding the best values of the adjustable parameters $q$ and $\sigma^2$
Because of the above, the eigenvalues clipping method must in practice include a preliminary step to find the “best” values of the parameters $q$ and $\sigma^2$, which could for example be defined as the values that bring the eigenvalue density of the random correlation matrix as close as possible^{12} to the eigenvalue density of the empirical correlation matrix, as in Lopez de Prado^{13}.
Figure 4 and Figure 5, taken from Gatheral^{14}, illustrate the impact of such a preliminary step on the empirical correlation matrix of $n = 431$ stocks belonging to the S&P 500 index and computed using $T = 2,155$ daily returns for each stock^{15}.
The Marchenko-Pastur density with optimal parameters $q = 0.34$ and $\sigma^2 = 0.53$ displayed in Figure 5 fits the random part of the eigenvalue density of the empirical correlation matrix definitely better than the Marchenko-Pastur density with “by the book” parameters $q = 0.2$ and $\sigma^2 = 1$ displayed in Figure 4.
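A minimal sketch of such a preliminary step follows. It is only an illustration under my own assumptions, not Portfolio Optimizer's (proprietary) algorithm: a brute-force grid search over $(q, \sigma^2)$ that minimizes the squared $l^2$ distance between a histogram of the eigenvalues and the Marchenko-Pastur density at the bin centers. The grid bounds, the number of bins and the histogram range `x_max`, assumed to cover the bulk of the spectrum but not the largest eigenvalues, are arbitrary choices, and the function names are hypothetical.

```python
import numpy as np


def mp_density(lam, q, sigma2):
    """Marchenko-Pastur density (0 < q <= 1), vectorized over broadcastable arrays."""
    lambda_minus = sigma2 * (1 - np.sqrt(q)) ** 2
    lambda_plus = sigma2 * (1 + np.sqrt(q)) ** 2
    with np.errstate(invalid="ignore", divide="ignore"):
        density = np.sqrt((lambda_plus - lam) * (lam - lambda_minus)) / (2 * np.pi * q * sigma2 * lam)
    return np.where((lam > lambda_minus) & (lam < lambda_plus), density, 0.0)


def fit_mp_parameters(eigenvalues, x_max, bins=40):
    """Grid search for the (q, sigma^2) minimizing the l2 distance to the eigenvalue histogram."""
    counts, edges = np.histogram(eigenvalues, bins=bins, range=(0.0, x_max))
    hist = counts / (len(eigenvalues) * np.diff(edges))  # normalized by ALL eigenvalues
    centers = 0.5 * (edges[1:] + edges[:-1])
    q_grid = np.linspace(0.05, 1.0, 96)
    sigma2_grid = np.linspace(0.10, 1.50, 141)
    density = mp_density(centers[None, None, :], q_grid[:, None, None], sigma2_grid[None, :, None])
    errors = ((density - hist) ** 2).sum(axis=-1)  # shape (len(q_grid), len(sigma2_grid))
    i, j = np.unravel_index(np.argmin(errors), errors.shape)
    return q_grid[i], sigma2_grid[j]


# Usage: one-factor model, whose "noise" eigenvalues have variance 1 - rho < 1
rng = np.random.default_rng(5)
n, T, rho = 400, 2000, 0.2
factor = rng.standard_normal((T, 1))
returns = np.sqrt(rho) * factor + np.sqrt(1 - rho) * rng.standard_normal((T, n))
eigenvalues = np.linalg.eigvalsh(np.corrcoef(returns, rowvar=False))

q_hat, sigma2_hat = fit_mp_parameters(eigenvalues, x_max=3.0)
print(f"fitted q = {q_hat:.2f} (n / T = {n / T:.2f}), fitted sigma^2 = {sigma2_hat:.2f} (1 - rho = {1 - rho:.2f})")
```

On this simulated data, the fitted $\sigma^2$ should come out close to $1 - \rho = 0.8$ rather than 1, which is the effect described above; with autocorrelated real returns, the fitted $q$ would also be expected to move away from $\frac{n}{T}$.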
Implementation in Portfolio Optimizer
Portfolio Optimizer implements the eigenvalues clipping method through the endpoint /assets/correlation/matrix/denoised, using a proprietary algorithm to find the best values of the adjustable parameters $q$ and $\sigma^2$.
Caveats
There is one well-known limitation to the eigenvalues clipping method.
Results from random matrix theory establish that the spectrum of an empirical correlation matrix is usually a broadened version of the spectrum of its true unobservable counterpart^{8}. That is, small empirical eigenvalues are usually too small and large empirical eigenvalues are usually too large.
The eigenvalues clipping method does increase the small empirical eigenvalues, but does not alter the large empirical eigenvalues so that they remain overestimated in the denoised empirical correlation matrix.
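The broadening of the spectrum, and the fact that clipping only corrects its lower part, can be observed on simulated data. The sketch below uses a one-factor model with arbitrary parameters and a simple eigenvalue-level clipping based on the true noise variance $1 - \rho$ (an assumption made for clarity); it compares the smallest and largest eigenvalues of the true, sample and clipped correlation matrices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, rho = 100, 500, 0.3
q = n / T

# True one-factor correlation matrix: eigenvalues 1 + (n - 1) * rho (once) and 1 - rho
true_eigenvalues = np.linalg.eigvalsh(np.full((n, n), rho) + (1 - rho) * np.eye(n))

factor = rng.standard_normal((T, 1))
returns = np.sqrt(rho) * factor + np.sqrt(1 - rho) * rng.standard_normal((T, n))
sample_eigenvalues = np.linalg.eigvalsh(np.corrcoef(returns, rowvar=False))

# Eigenvalues clipping with sigma^2 = 1 - rho (the true noise variance, for clarity)
lambda_plus = (1 - rho) * (1 + np.sqrt(q)) ** 2
clipped_eigenvalues = sample_eigenvalues.copy()
noisy = sample_eigenvalues <= lambda_plus
clipped_eigenvalues[noisy] = sample_eigenvalues[noisy].mean()

for name, ev in [("true", true_eigenvalues), ("sample", sample_eigenvalues), ("clipped", clipped_eigenvalues)]:
    print(f"{name:>7}: smallest = {ev[0]:.3f}, largest = {ev[-1]:.3f}")
```

The smallest sample eigenvalue is far too small and is pulled back up by clipping, whereas the largest eigenvalue is kept exactly as estimated, whatever its estimation error.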
Example of application: Mean-variance analysis
Markowitz’s mean-variance analysis is one of the most well-known frameworks for constructing a portfolio with an optimal level of risk^{16} and return from a universe of risky assets.
One of its inputs is a correlation matrix, representing future asset correlations, and while the most natural choice [for it] is to use the sample [correlation] matrix determined using a series of past returns, […] [this choice] […] can lead to disastrous results^{1}.
Indeed, because the sample correlation matrix is affected by noise, its usage leads to a dramatic underestimation of the real risk, by over-investing in artificially low-risk eigenvectors^{17}.
A possible solution to this issue is to denoise the sample correlation matrix with the eigenvalues clipping method, whose behaviour I will illustrate on two very different universes of assets.
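This underestimation of risk, and the effect of clipping on it, can be reproduced with the global minimum variance portfolio, a particular point of the mean-variance efficient frontier. In the following sketch, the returns are pure noise with unit volatility, so that the true correlation matrix is the identity, and the clipping uses $q = \frac{n}{T}$ and $\sigma^2 = 1$ (both assumptions valid only for such data). The predicted risk is the variance of the portfolio computed with the matrix used for the optimization; the realized risk is its variance under the true correlation matrix.

```python
import numpy as np


def gmv_weights(C):
    """Fully invested global minimum variance weights, w proportional to C^{-1} 1."""
    x = np.linalg.solve(C, np.ones(C.shape[0]))
    return x / x.sum()


def clip(C, lambda_plus):
    """Replace the eigenvalues <= lambda_plus by their average (no diagonal rescaling)."""
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    noisy = eigenvalues <= lambda_plus
    if noisy.any():
        eigenvalues[noisy] = eigenvalues[noisy].mean()
    return eigenvectors @ np.diag(eigenvalues) @ eigenvectors.T


rng = np.random.default_rng(4)
n, T = 200, 500
q = n / T
returns = rng.standard_normal((T, n))  # true correlation matrix = identity
C = np.corrcoef(returns, rowvar=False)
C_clipped = clip(C, lambda_plus=(1 + np.sqrt(q)) ** 2)

ratios = {}
for name, M in [("sample", C), ("clipped", C_clipped)]:
    w = gmv_weights(M)
    ratios[name] = (w @ M @ w) / (w @ w)  # predicted risk / realized risk (identity matrix)
    print(f"{name:>7}: predicted / realized risk = {ratios[name]:.2f}")
```

With the sample matrix, the predicted risk is only a fraction of the realized one, while with the clipped matrix the ratio is close to one.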
Large universe of similar assets
Laloux et al.^{17} analyze the impact of using the sample correlation matrix vs. the denoised sample correlation matrix in a large universe of $n = 406$ stocks belonging to the S&P 500 index.
In detail, they:
- Split their dataset of stock returns covering the whole period 1991–1996 into two datasets covering respectively the period 1991–1993 and the period 1994–1996
- Compute^{18}:
  - A first predicted mean-variance efficient frontier, using as input the sample correlation matrix of the stock returns over the “past” period 1991–1993
  - A second predicted mean-variance efficient frontier, using as input the same sample correlation matrix as before, but this time denoised with the eigenvalues clipping method
  - A realized mean-variance efficient frontier, using as input the sample correlation matrix of the stock returns over the “future” period 1994–1996
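For reference, when full investment is the only constraint, each of these efficient frontiers has a closed form in terms of its inputs (volatilities $\sigma$, correlation matrix $C$ and expected returns $\mu$): with $\Sigma = \operatorname{diag}(\sigma) \, C \, \operatorname{diag}(\sigma)$, $a = 1^t \Sigma^{-1} 1$, $b = 1^t \Sigma^{-1} \mu$ and $c = \mu^t \Sigma^{-1} \mu$, the minimal variance for a target return $r$ is $\frac{a r^2 - 2 b r + c}{a c - b^2}$. A sketch follows (the function name is hypothetical); the three frontiers listed above differ only in the correlation matrix passed as input.

```python
import numpy as np


def efficient_frontier_variance(mu, sigma, C, target_returns):
    """Minimal variance of fully invested portfolios (short sales allowed) for each target return."""
    mu, sigma = np.asarray(mu, dtype=float), np.asarray(sigma, dtype=float)
    cov = np.outer(sigma, sigma) * C  # covariance matrix from volatilities and correlations
    ones = np.ones(len(mu))
    inv_ones, inv_mu = np.linalg.solve(cov, ones), np.linalg.solve(cov, mu)
    a, b, c = ones @ inv_ones, ones @ inv_mu, mu @ inv_mu
    r = np.asarray(target_returns, dtype=float)
    return (a * r**2 - 2 * b * r + c) / (a * c - b**2)


# Sanity check with identity covariance and mu = (1, 2, 3): a = 3, b = 6, c = 14
mu, sigma, C = np.array([1.0, 2.0, 3.0]), np.ones(3), np.eye(3)
variances = efficient_frontier_variance(mu, sigma, C, [2.0, 3.0])
print(variances)  # minimum variance 1/a = 1/3 at r = b/a = 2, and 5/6 at r = 3
```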
The three resulting efficient frontiers are displayed in Figure 6, taken from Laloux et al.^{17}, where:
- The leftmost curve represents the first predicted mean-variance efficient frontier
- The middle curve represents the second predicted mean-variance efficient frontier
- The rightmost curve represents the realized mean-variance efficient frontier
It is clearly visible that:
- The realized risk is underestimated by the two predicted mean-variance efficient frontiers, because the realized mean-variance efficient frontier is located to the right of the two predicted mean-variance efficient frontiers
- The denoised sample correlation matrix is better at predicting the realized risk than the sample correlation matrix, because the second predicted mean-variance efficient frontier is closer to the realized mean-variance efficient frontier than the first one
Large universe of similar assets, part 2
The catch with the previous example^{19} is that, in the literature, the predicted mean-variance efficient frontiers are usually computed without taking into account the real-life constraints faced by individuals or mutual funds, like no-short-sales constraints or maximum investment constraints.
The problem is that Jagannathan and Ma^{20} established that such constraints have a shrinkage-like effect on the sample asset covariance matrix, very similar to the effect of a denoising method.
As a consequence, the eigenvalues clipping method might actually bring no additional value compared to simply imposing constraints on the asset weights.
In order to determine whether this is the case, I used Portfolio Optimizer to reproduce the methodology of Laloux et al.^{17} in a universe of $n = 470$ stocks belonging to the S&P 500 index^{21}, and I imposed non-negativity constraints on the computed efficient portfolios’ weights.
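With non-negativity constraints, there is no closed form anymore and each point of an efficient frontier requires solving a quadratic program. Purely as an illustration, and not as what Portfolio Optimizer does, here is a sketch relying on SciPy's general-purpose SLSQP solver; the function name and the input values are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def long_only_efficient_portfolio(mu, cov, target_return):
    """Minimum variance, fully invested, non-negative weights with w @ mu == target_return."""
    n = len(mu)
    constraints = [
        {"type": "eq", "fun": lambda w: np.sum(w) - 1.0},
        {"type": "eq", "fun": lambda w: w @ mu - target_return},
    ]
    result = minimize(
        fun=lambda w: w @ cov @ w,
        x0=np.full(n, 1.0 / n),
        jac=lambda w: 2.0 * cov @ w,
        bounds=[(0.0, None)] * n,  # non-negativity (no short sales)
        constraints=constraints,
        method="SLSQP",
        options={"ftol": 1e-10, "maxiter": 1000},
    )
    if not result.success:
        raise RuntimeError(result.message)
    return result.x


mu = np.array([0.05, 0.07, 0.10, 0.12])
sigma = np.array([0.10, 0.15, 0.20, 0.25])
C = np.array([[1.0, 0.3, 0.2, 0.1],
              [0.3, 1.0, 0.4, 0.2],
              [0.2, 0.4, 1.0, 0.3],
              [0.1, 0.2, 0.3, 1.0]])
cov = np.outer(sigma, sigma) * C

frontier = []
for r in np.linspace(0.06, 0.11, 6):
    w = long_only_efficient_portfolio(mu, cov, r)
    frontier.append((r, w @ cov @ w, w))
    print(f"target return {r:.3f}: variance {w @ cov @ w:.5f}, weights {np.round(w, 3)}")
```

Replacing `C` by a sample or a denoised correlation matrix gives the corresponding constrained predicted frontier.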
The three resulting efficient frontiers are displayed in Figure 7, in which the denoised sample correlation matrix is again visibly better at predicting the realized risk than the sample correlation matrix.
Nevertheless, the improvement is not as dramatic as in Laloux et al.^{17}, which is in agreement, for example, with the findings of Golden and Flint^{22} for the South African equity market.
Small universe of dissimilar assets
I will now analyze the impact of using the sample correlation matrix vs. the denoised sample correlation matrix in the small universe of $n = 10$ assets of the Adaptive Asset Allocation strategy from ReSolve Asset Management, described in the paper Adaptive Asset Allocation: A Primer^{23}:
- U.S. stocks (SPY ETF)
- European stocks (EZU ETF)
- Japanese stocks (EWJ ETF)
- Emerging market stocks (EEM ETF)
- U.S. REITs (VNQ ETF)
- International REITs (RWX ETF)
- U.S. 7-10 year Treasuries (IEF ETF)
- U.S. 20+ year Treasuries (TLT ETF)
- Commodities (DBC ETF)
- Gold (GLD ETF)
For this, I propose to adapt the methodology of Laloux et al.^{17} to the rules of the Adaptive Asset Allocation strategy^{24}, which leads to the computation of the following quantities at the end of the last trading day of each month:
- Mean-variance input estimation
  - The past sample correlation matrix $C_i$ of the daily asset returns over the past 6 months ($T \approx 126$)
  - The denoised past sample correlation matrix $\hat{C_i}$, using the Portfolio Optimizer endpoint /assets/correlation/matrix/denoised
  - The future sample correlation matrix $C_o$ of the daily asset returns over the next month
  - The future volatilities $\sigma_o$ of the daily asset returns over the next month
  - The future average returns $\mu_o$ of the daily asset returns over the next month
- Mean-variance efficient frontiers computation, with a no-short-sales constraint
  - The predicted in-sample mean-variance efficient frontier, using as inputs $\sigma_o$, $C_i$ and $\mu_o$
  - The predicted filtered in-sample mean-variance efficient frontier, using as inputs $\sigma_o$, $\hat{C_i}$ and $\mu_o$
  - The realized out-of-sample mean-variance efficient frontier, using as inputs $\sigma_o$, $C_o$ and $\mu_o$
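The monthly computation of the mean-variance inputs listed above can be sketched with pandas. This is an illustrative sketch under my own assumptions: `monthly_inputs` is a hypothetical helper, the returns are random placeholder data rather than actual ETF returns, the denoising step producing $\hat{C_i}$ is left out, and "the next month" is taken to be all the trading days of the following calendar month.

```python
import numpy as np
import pandas as pd


def monthly_inputs(returns, lookback=126):
    """Yield (month-end date, inputs) for each month-end with enough history and a next month."""
    months = returns.index.to_period("M")
    month_ends = returns.index.to_series().groupby(months).max()  # last trading day of each month
    for t in month_ends:
        past = returns.loc[:t].iloc[-lookback:]  # past ~6 months, up to and including t
        future = returns[months == t.to_period("M") + 1]  # all trading days of the next month
        if len(past) < lookback or len(future) < 2:
            continue
        yield t, {
            "C_i": past.corr(),  # to be denoised (e.g. with /assets/correlation/matrix/denoised)
            "C_o": future.corr(),
            "sigma_o": future.std(),
            "mu_o": future.mean(),
        }


# Usage on synthetic daily returns (placeholder data, not actual ETF returns)
tickers = ["SPY", "EZU", "EWJ", "EEM", "VNQ", "RWX", "IEF", "TLT", "DBC", "GLD"]
dates = pd.bdate_range("2020-01-01", "2022-09-30")
rng = np.random.default_rng(6)
returns = pd.DataFrame(rng.normal(0.0, 0.01, (len(dates), len(tickers))), index=dates, columns=tickers)

results = list(monthly_inputs(returns))
print(len(results), "month-ends, from", results[0][0].date(), "to", results[-1][0].date())
```

With daily data starting in January 2020, the first month-end with 126 days of history is the end of June 2020, which matches the start of the analysis period below.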
When applied to the period June 2020 – September 2022^{25}, the results are that:
- ~60% of the time, the two predicted in-sample mean-variance efficient frontiers are nearly indistinguishable, as illustrated in Figure 8
- ~30% of the time, the predicted filtered in-sample mean-variance efficient frontier is slightly closer to the realized out-of-sample mean-variance efficient frontier than the predicted in-sample mean-variance efficient frontier, as illustrated in Figure 9
- ~10% of the time, the predicted filtered in-sample mean-variance efficient frontier is slightly farther from the realized out-of-sample mean-variance efficient frontier than the predicted in-sample mean-variance efficient frontier, as illustrated in Figure 10
To summarize, on this example, using the denoised sample correlation matrix instead of the sample correlation matrix does no harm or improves the estimate of the realized risk ~90% of the time^{26}, which is quite remarkable for such small values of $n$ and $T$!
Conclusion
As usual, feel free to connect with me on LinkedIn or to follow me on Twitter if you would like to discuss Portfolio Optimizer (new feature request, support request…) or random finance stuff.
1. See Marc Potters, Jean-Philippe Bouchaud, A First Course in Random Matrix Theory, Cambridge University Press.
2. See Laurent Laloux, Pierre Cizeau, Jean-Philippe Bouchaud, and Marc Potters, Noise Dressing of Financial Correlation Matrices, Phys. Rev. Lett. 83, 1467.
3. When $\sigma^2 = 1$, the matrix $\Sigma$ is the empirical correlation matrix associated with $X$.
4. See V. A. Marchenko, L. A. Pastur, Distribution of eigenvalues for some sets of random matrices, Mat. Sb. (N.S.), 72(114):4 (1967), 507–536.
5. See Pavel Yaskov, A short proof of the Marchenko–Pastur theorem, Comptes Rendus Mathematique, Volume 354, Issue 3, March 2016.
6. Under the proper technical assumptions mentioned above.
7. Which might not be the case, for example because of an extremely slow rate of convergence.
8. See Joel Bun, Jean-Philippe Bouchaud, Marc Potters, Cleaning correlation matrices, Risk.net.
9. This is not a theoretical consequence of the Marchenko-Pastur theorem, but rather a practical consequence of the attempt to limit small eigenvalues. In addition, small eigenvalues are usually not as clearly separated from the bulk of the spectrum as large eigenvalues are.
10. See Vasiliki Plerou, Parameswaran Gopikrishnan, Bernd Rosenow, Luís A. Nunes Amaral, Thomas Guhr, and H. Eugene Stanley, Random matrix approach to cross correlations in financial data, Phys. Rev. E 65, 066126, 27 June 2002.
11. Choosing the constant equal to zero requires an additional manipulation of the denoised correlation matrix to make it a valid correlation matrix, cf. Plerou et al.^{10}.
12. For example, in terms of $l^2$ norm.
13. See Marcos Lopez de Prado, A Robust Estimator of the Efficient Frontier.
14. See Jim Gatheral, Random Matrix Theory and Covariance Estimation, NYU Courant Institute Algorithmic Trading Conference (October 2008).
15. The aspect ratio of this empirical correlation matrix is thus $q = \frac{431}{2,155} = 0.2$.
16. In mean-variance analysis, the risk of a portfolio is defined in terms of the variance of its returns.
17. See Laurent Laloux, Pierre Cizeau, Jean-Philippe Bouchaud, and Marc Potters, Random matrix theory and financial correlations, International Journal of Theoretical and Applied Finance, Vol. 03, No. 03, pp. 391–397 (2000).
18. In order to isolate the impact of the asset correlations from that of the asset returns and of the asset volatilities, future asset returns and future asset volatilities are used in the computation of the efficient frontiers.
19. For example, the same remark applies to Plerou et al.^{10}.
20. See Ravi Jagannathan & Tongshu Ma, Risk Reduction in Large Portfolios: Why Imposing the Wrong Constraints Helps, Journal of Finance, American Finance Association, vol. 58(4), pages 1651–1684, August 2003.
21. The associated dataset, which covers the period 11th February 2013 – 7th February 2018, is available on Kaggle.
22. See Daron Golden and Emlyn Flint, Improving Portfolio Allocation Through Covariance Matrix Filtering.
23. See Adam Butler, Mike Philbrick, Rodrigo Gordillo, and David Varadi, Adaptive Asset Allocation: A Primer.
24. Taken from Allocate Smartly’s blog.
25. I retrieved the adjusted ETF prices over the period January 2020 – September 2022 using Tiingo.
26. A possible next step would be to determine whether this improves the backtested performance of the Adaptive Asset Allocation strategy.