Computation of Theory-Implied Correlation Matrices: Overview and Example
In this short post, I will provide an overview of the TIC algorithm^{1} introduced by Marcos Lopez de Prado in his paper Estimation of Theory-Implied Correlation Matrices^{2}, which aims to compute a forward-looking asset correlation matrix blending both empirical and theoretical inputs.
I will also describe the associated implementation tweaks in Portfolio Optimizer.
Notes:
 A Google sheet corresponding to this post is available here
Theory-Implied Correlation algorithm overview
Step 1 - Constrained hierarchical clustering of the assets
The first step of the Theory-Implied Correlation algorithm consists in using a hierarchical clustering algorithm to group similar assets together based on a distance metric $d$ derived from their pairwise correlations, defined as
\[d_{i,j} = \sqrt{\frac{1}{2} (1 - c_{i,j})}\]
where $d_{i,j}$ (resp. $c_{i,j}$) is the distance (resp. the correlation) between asset $i$ and asset $j$, $i,j = 1..n$, with $n \ge 2$ the total number of assets.
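To make this distance metric concrete, here is a minimal JavaScript sketch of the conversion from a correlation matrix to the corresponding distance matrix. The function name `correlationToDistance` and the 3-asset matrix are hypothetical and only serve as an illustration; this is not the code from the paper nor the Portfolio Optimizer implementation.

```javascript
// Convert a correlation matrix into the distance matrix d = sqrt((1 - c) / 2).
// Math.max guards against tiny negative values caused by rounding when c is
// marginally above 1.
function correlationToDistance(correlationMatrix) {
  return correlationMatrix.map((row) =>
    row.map((c) => Math.sqrt(Math.max(0, 0.5 * (1 - c))))
  );
}

// Hypothetical 3-asset correlation matrix, for illustration only.
const exampleCorrelation = [
  [1.0, 0.5, -1.0],
  [0.5, 1.0, 0.0],
  [-1.0, 0.0, 1.0],
];

const exampleDistance = correlationToDistance(exampleCorrelation);
console.log(exampleDistance);
```

With this metric, a correlation of 1 maps to a distance of 0 and a correlation of -1 maps to a distance of 1, so that highly correlated assets end up close to each other.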
However, contrary to the Hierarchical Risk Parity algorithm for example, the hierarchical clustering algorithm is constrained to match a prior represented by a theoretical tree graph structure, which usually corresponds to a hierarchical classification of the assets such as:
 The Standard Industrial Classification or the MSCI Global Industry Classification Standard when assets are stocks
 The MSCI ACWI Index market allocation, illustrated in Figure 1, when assets are country ETFs
The result of this first step is a hierarchical clustering tree that matches the asset correlations as closely as the theoretical tree structure allows^{3}.
In the code accompanying the original paper^{2}, a combination of a single linkage clustering algorithm^{4} and of a custom average linkage clustering algorithm^{5} is used as the hierarchical clustering algorithm.
In Portfolio Optimizer, 4 hierarchical clustering algorithms are supported^{6}:
 Single linkage
 Complete linkage
 (Default) Average linkage
 Ward’s linkage
Also, in Portfolio Optimizer, the number of levels of the theoretical tree graph structure is limited to 4, which corresponds to the 4 levels of the MSCI GICS hierarchical classification^{7}.
Step 2 - Computation of the implied asset correlation matrix
The second step of the Theory-Implied Correlation algorithm consists in determining the asset correlation matrix associated with the hierarchical clustering tree computed in the first step.
This is done by inverting the relationship between the distance metric $d$ and the asset correlations $c_{i,j}$, $i,j = 1..n$, which leads to
\[c_{i,j} = 1 - 2 d_{C_i, C_j}^2\]
where $C_i$ and $C_j$ are the two clusters of the hierarchical clustering tree satisfying the following properties:
 $C_i$ and $C_j$ are the two children of a node $n$ of the hierarchical clustering tree such that asset $i$ belongs to $C_i$ and asset $j$ belongs to $C_j$
 There are no other clusters $\hat{C}_i$ and $\hat{C}_j$ that are the two children of a node $\hat{n}$, descendant of the node $n$, such that asset $i$ belongs to $\hat{C}_i$ and asset $j$ belongs to $\hat{C}_j$
In other words, $C_i$ and $C_j$ are the two children of the deepest node $n$ of the hierarchical clustering tree such that asset $i$ belongs to $C_i$ and asset $j$ belongs to $C_j$.
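As an illustration of this second step, the sketch below fills the implied correlation matrix from a small binary clustering tree. The tree representation (leaves holding an `asset` index, internal nodes holding the `distance` between their two child clusters) and the names `leavesOf` and `impliedCorrelation` are assumptions made for this example; they are not taken from the paper or from Portfolio Optimizer.

```javascript
// Hypothetical tree: leaves carry an asset index, internal nodes carry the
// distance between their two child clusters.
const exampleTree = {
  distance: 0.6,
  left: {
    distance: 0.2,
    left: { asset: 0 },
    right: { asset: 1 },
  },
  right: { asset: 2 },
};

// List the asset indices located under a node.
function leavesOf(node) {
  if (node.asset !== undefined) return [node.asset];
  return [...leavesOf(node.left), ...leavesOf(node.right)];
}

// For every pair of assets split by the two children of a node, set
// c[i][j] = 1 - 2 * d^2, where d is the distance stored at that node.
// Each pair is split at exactly one node (the deepest node containing both
// assets), so each off-diagonal entry is written once.
function impliedCorrelation(tree, n) {
  const c = Array.from({ length: n }, (_, i) =>
    Array.from({ length: n }, (_, j) => (i === j ? 1 : 0))
  );
  const visit = (node) => {
    if (node.asset !== undefined) return;
    const value = 1 - 2 * node.distance ** 2;
    for (const i of leavesOf(node.left)) {
      for (const j of leavesOf(node.right)) {
        c[i][j] = value;
        c[j][i] = value;
      }
    }
    visit(node.left);
    visit(node.right);
  };
  visit(tree);
  return c;
}

const exampleImplied = impliedCorrelation(exampleTree, 3);
console.log(exampleImplied);
```

In this toy tree, assets 0 and 1 are split at the inner node (distance 0.2), while asset 2 is split from both of them at the root (distance 0.6), so it receives the same implied correlation with each of them.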
Step 3 - Denoising of the implied asset correlation matrix
The third and final step of the Theory-Implied Correlation algorithm consists in altering the implied asset correlation matrix computed in the second step to transform it into a valid correlation matrix.
Indeed, as noted by Marcos Lopez de Prado^{2}:
The correlation matrix derived from the [hierarchical clustering tree] may not be definite positive, or it may have a high condition number.
In the code accompanying the original paper^{2}, the implied asset correlation matrix is denoised using an algorithm based on random matrix theory and detailed by Marcos Lopez de Prado in his book Machine Learning for Asset Managers^{8}.
In Portfolio Optimizer, to better separate concerns, this alteration is done by calling one of the following API endpoints:
 /assets/correlation/matrix/nearest, to compute the nearest correlation matrix to the implied asset correlation matrix
Theory-Implied Correlation algorithm usage with Portfolio Optimizer
As a quick practical example of Portfolio Optimizer usage, I will reproduce the example from the article of Hudson & Thames Portfolio Optimisation with PortfolioLab: Theory-Implied Correlation Matrix.
In order to illustrate their own TIC algorithm, Hudson & Thames propose to work with a universe of 23 ETFs^{9} for which they define the theoretical tree graph structure displayed in Figure 2.
Thanks to their PortfolioLab Python library, they compute the denoised theory-implied correlation matrix of these 23 ETFs and they compare it to the original empirical correlation matrix of these same 23 ETFs.
I will do the same with Portfolio Optimizer.
Computation of the ETFs empirical correlation matrix
Thanks to price data retrieved from both^{10} Tiingo and Alpha Vantage, as well as to the Portfolio Optimizer API endpoints /assets/returns and /assets/correlation/matrix, the empirical correlation matrix of the 23 ETFs is straightforward to compute and is illustrated in Figure 3.
This empirical correlation matrix nearly perfectly matches the empirical correlation matrix from Hudson & Thames, except for the XLU ETF, which exhibits a negative correlation to the equity ETFs and a positive correlation to the bond ETFs over the selected period!
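For readers who prefer to see the underlying computation, here is a minimal local sketch of how an empirical correlation matrix can be derived from price series. It assumes simple (arithmetic) returns and Pearson correlations, which is an assumption on my side for illustration and not a description of what the API endpoints do internally; the function names and the price series are hypothetical.

```javascript
// Simple returns r_t = p_t / p_{t-1} - 1 from a price series.
function simpleReturns(prices) {
  return prices.slice(1).map((p, t) => p / prices[t] - 1);
}

// Pearson correlation between two equally long return series.
function pearson(x, y) {
  const n = x.length;
  const meanX = x.reduce((s, v) => s + v, 0) / n;
  const meanY = y.reduce((s, v) => s + v, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let k = 0; k < n; k++) {
    const dx = x[k] - meanX;
    const dy = y[k] - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Pairwise correlation matrix of several return series.
function empiricalCorrelationMatrix(returnsByAsset) {
  return returnsByAsset.map((ri, i) =>
    returnsByAsset.map((rj, j) => (i === j ? 1 : pearson(ri, rj)))
  );
}

// Made-up prices for three hypothetical assets, for illustration only.
const examplePrices = [
  [100, 102, 101, 104],
  [50, 51, 50.5, 52],
  [20, 19.5, 19.8, 19.0],
];
const exampleEmpirical = empiricalCorrelationMatrix(
  examplePrices.map(simpleReturns)
);
console.log(exampleEmpirical);
```

In this toy example, the second price series is half the first one, so their returns are essentially perfectly correlated, while the third series moves in the opposite direction.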
Computation of the raw ETFs theory-implied correlation matrix
Using the computed empirical correlation matrix together with the theoretical tree graph structure displayed in Figure 2, it is possible to compute the raw, non-denoised, theory-implied correlation matrix of the 23 ETFs.
With Portfolio Optimizer, this is done through the following invocation of the API endpoint /assets/correlation/matrix/theory-implied
fetch('https://api.portfoliooptimizer.io/v1/assets/correlation/matrix/theory-implied',
      {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ assets: [ {assetHierarchicalClassification: [10, 1010, 101010]}, ... ],
                               assetsCorrelationMatrix: [[1.0,0.7943899971154988, ...], ...]
                             })
      })
, which returns
{
"assetsCorrelationMatrix":[[1,0.8129900881613477, ...], ...]
}
This correlation matrix is unfortunately not positive semidefinite^{11}, so it needs to be transformed into a valid correlation matrix.
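As a side note, positive semidefiniteness can also be checked locally. The sketch below attempts a Cholesky-style factorization with a small tolerance; it is a simplified illustration under my own assumptions (the function name `isPositiveSemidefinite`, the tolerance value and the test matrices are hypothetical), and it does not reproduce what the Portfolio Optimizer validation endpoint does.

```javascript
// Check whether a symmetric matrix is positive semidefinite by attempting a
// Cholesky-style factorization A = L * L^T.
// - A negative pivot (beyond the tolerance) means the matrix is not PSD.
// - A (near) zero pivot is accepted only if the rest of its column is also
//   (near) zero, which is what a PSD matrix requires.
function isPositiveSemidefinite(matrix, tol = 1e-10) {
  const n = matrix.length;
  const L = Array.from({ length: n }, () => new Array(n).fill(0));
  for (let j = 0; j < n; j++) {
    let pivot = matrix[j][j];
    for (let k = 0; k < j; k++) pivot -= L[j][k] ** 2;
    if (pivot < -tol) return false;
    const diag = pivot > tol ? Math.sqrt(pivot) : 0;
    L[j][j] = diag;
    for (let i = j + 1; i < n; i++) {
      let s = matrix[i][j];
      for (let k = 0; k < j; k++) s -= L[i][k] * L[j][k];
      if (diag === 0) {
        if (Math.abs(s) > tol) return false;
      } else {
        L[i][j] = s / diag;
      }
    }
  }
  return true;
}

// Hypothetical matrices, for illustration only.
const validExample = [
  [1, 0.5],
  [0.5, 1],
];
// Assets 0-1 and 1-2 are strongly positively correlated while 0-2 are
// strongly negatively correlated, which is not a consistent set of correlations.
const invalidExample = [
  [1, 0.9, -0.9],
  [0.9, 1, 0.9],
  [-0.9, 0.9, 1],
];

console.log(isPositiveSemidefinite(validExample));
console.log(isPositiveSemidefinite(invalidExample));
```

A fixed tolerance is a simplification; for matrices of very different scales or sizes, an eigenvalue-based check from a numerical library would be more robust.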
Computation of the final ETFs theory-implied correlation matrix
As mentioned earlier in this post, “fixing” a non positive semidefinite theory-implied correlation matrix can be done with the Portfolio Optimizer API endpoint /assets/correlation/matrix/nearest.
Here, the resulting valid theory-implied correlation matrix is illustrated in Figure 4.
Again, this final theory-implied correlation matrix nearly perfectly matches its equivalent from Hudson & Thames, except for the XLU ETF.
Computation of the distance from the ETFs empirical correlation matrix to the final ETFs theory-implied correlation matrix
As a last step, it is possible to compute the distance from the ETFs empirical correlation matrix to the final ETFs theory-implied correlation matrix.
For this, as in both the paper from Marcos Lopez de Prado^{2} and the article from Hudson & Thames, I will use a distance called the correlation matrix distance^{12}, implemented in the Portfolio Optimizer API endpoint /assets/correlation/matrix/distance.
The computed distance is $\approx 0.105$, which is much greater than the computed distance of $\approx 0.036$ in the article from Hudson & Thames, but since the correlation data for the XLU ETF completely differs, this is not unexpected.
In any case, this distance is of the same order of magnitude as the distance computed by Marcos Lopez de Prado for the S&P 500^{2}, so that, as he puts it:
While [the] TIC [matrix] departs from the empirical correlation matrix […], the two are not too far apart. This corroborates that the TIC matrix has blended theory-implied views with empirical evidence.
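For reference, the correlation matrix distance of Herdin et al.^{12} between two correlation matrices $A$ and $B$ is defined as $1 - \frac{tr(AB)}{\lVert A \rVert_F \lVert B \rVert_F}$, with $\lVert \cdot \rVert_F$ the Frobenius norm. A minimal local sketch of this formula is given below; the function name `correlationMatrixDistance` and the 2x2 matrices are hypothetical, and this is not the code behind the Portfolio Optimizer endpoint.

```javascript
// Correlation matrix distance: 1 - tr(A * B) / (||A||_F * ||B||_F).
// It is 0 when the two matrices are equal up to a positive scaling factor
// and grows towards 1 as they differ more.
function correlationMatrixDistance(a, b) {
  const n = a.length;
  let trace = 0;
  let squaredNormA = 0;
  let squaredNormB = 0;
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n; j++) {
      trace += a[i][j] * b[j][i]; // tr(A * B) = sum over i, j of A_ij * B_ji
      squaredNormA += a[i][j] ** 2;
      squaredNormB += b[i][j] ** 2;
    }
  }
  return 1 - trace / Math.sqrt(squaredNormA * squaredNormB);
}

// Hypothetical matrices, for illustration only.
const identityExample = [
  [1, 0],
  [0, 1],
];
const correlatedExample = [
  [1, 0.8],
  [0.8, 1],
];

console.log(correlationMatrixDistance(correlatedExample, correlatedExample));
console.log(correlationMatrixDistance(identityExample, correlatedExample));
```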
Last words
No specific last words this time, except that I recently created my LinkedIn profile.
So, feel free to connect with me to discuss Portfolio Optimizer or, more generally, quantitative stuff!
–

1. TIC algorithm stands for Theory-Implied Correlation algorithm.
2. See Lopez de Prado, Marcos, Estimation of Theory-Implied Correlation Matrices (November 9, 2019).
3. For example, in the case of a universe of assets made of the country ETFs illustrated in Figure 1, an unconstrained hierarchical clustering algorithm might cluster together the Canada ETF and the Singapore ETF, while the Theory-Implied Correlation algorithm will always first cluster together the Canada ETF and the United States ETF on the one hand and all the Asia region ETFs on the other hand.
4. Used to cluster the leaves of the theoretical tree graph structure.
5. Used to compute the distances between newly created clusters and old clusters.
6. Used both to cluster the leaves of the theoretical tree graph structure AND to compute the distances between newly created clusters and old clusters, so that, contrary to the original paper^{2}, a single hierarchical clustering algorithm is used throughout the whole process.
7. This limit is neither a technical limit nor an API limit, so it can be increased, if ever needed, by simply reaching out.
8. See Lopez de Prado, M. (2019): Machine Learning for Asset Managers. Cambridge University Press, First edition.
9. The 23 ETFs are EEM, EWG, TIP, EWJ, EFA, IEF, EWQ, EWU, XLB, XLE, XLF, LQD, XLK, XLU, EPP, FXI, VGK, VPL, SPY, TLT, BND, CSJ, DIA.
10. Price data have been retrieved from Tiingo for all the ETFs except CSJ, and from Alpha Vantage for CSJ, and cover the period 2017-07-31 to 2018-07-31. This latter date corresponds to the termination date of the CSJ ETF.
11. This can be verified thanks to the Portfolio Optimizer API endpoint /assets/correlation/matrix/validation.
12. See M. Herdin, N. Czink, H. Ozcelik and E. Bonek, “Correlation matrix distance, a meaningful measure for evaluation of non-stationary MIMO channels,” 2005 IEEE 61st Vehicular Technology Conference, 2005, pp. 136-140 Vol. 1, doi: 10.1109/VETECS.2005.1543265.