# Sparse Index Tracking: Limiting the Number of Assets in an Index Tracking Portfolio

In the previous post, I introduced the index tracking problem1, which consists in finding a portfolio that tracks as closely as possible2 a given financial market index.

Because such a portfolio might contain any number of assets, with for example an S&P 500 tracking portfolio possibly containing ~500 stocks, it is [sometimes desirable] that the tracking portfolio consists of a small number of assets3 in order to simplify the execution, avoid small and illiquid positions, and large transaction costs3.

In other words, it is sometimes desirable to impose a constraint on the maximum number of assets contained in an index tracking portfolio, which leads to an extension of the regular index tracking problem called the sparse index tracking problem4.

In this new post, I will describe the mathematics of the sparse index tracking problem and I will detail a few examples of usage like:

• Replicating a fund of funds to reduce fees while keeping the number of directly-invested funds manageable
• Replicating the S&P 500 with only ~50 stocks as an (extreme) illustration of the optimized sampling5 ETF methodology

## Mathematical preliminaries

### The general sparse index tracking optimization problem

Let be:

• $T$, a number of time periods
• $r_{idx} = \left( r_{idx, 1}, …, r_{idx, T} \right) \in \mathcal{R}^{T}$, the vector of the index arithmetic returns over each of the $T$ time periods
• $n$, the number of assets in the universe of the sparse index tracking portfolio
• $X \in \mathcal{R}^{T \times n}$ the matrix of the $n$ assets arithmetic returns over each of the $T$ time periods
• $w = \left( w_1,…,w_n \right) \in \mathcal{R}^{n}$ the vector of the weights of a portfolio in each of the $n$ assets

The vector of the sparse index tracking portfolio weights $w^*$ is then the6 solution to the optimization problem which consists in minimizing a tracking error measure $f \left( X w - r_{idx} \right)$ - with $f$ some loss function - subject to additional constraints like full investment constraint, no short sale constraint, etc. and a cardinality constraint3

$w^* = \operatorname{argmin} f \left( X w - r_{idx} \right) \newline \textrm{s.t. } \begin{cases} \sum_{i=1}^{n} w_i = 1 \newline 0 \leqslant w_i \leqslant 1, i = 1..n \newline ... \newline \lVert w \rVert_0 \le n_{max} \end{cases}$

, where:

• $\lVert . \rVert_0$ is the zero “norm”7 of a vector, that is, the number of non-zero elements in that vector
• $n_{max}$ is the maximum number of assets with non-zero weights desired in the sparse index tracking portfolio

From a computational perspective, and whatever the exact definition of the loss function $f$, it is known that when a cardinality constraint restricting the number of stocks is introduced, the problem of optimizing the composition of a portfolio tends to become NP-hard8, which means that exact solutions to instances of realistic sizes are computationally intractable, and thus inexact solution methods are the only practical ones8.

Standard quadratic optimization methods guaranteeing an optimal solution are thus unfortunately not usable to solve the sparse index tracking optimization problem9

### Heuristics for solving the general sparse index tracking optimization problem

While an exhaustive enumeration of all possible methods for solving the sparse index tracking optimization problem is out of scope of this post, most of them10 seem to fall into one of three rough categories.

#### Combinatorial optimization methods

The cardinality constraint present in the sparse index tracking problem makes it a combinatorial problem, so that standard combinatorial optimization methods (continuous relaxation of a mixed integer programming formulation811, genetic algorithms1213…) can be used to find an approximate solution.

As a side note, thanks to the relationship of the sparse index tracking problem with combinatorial optimization and hence with operations research, it is possible to find datasets for different index tracking problems - ranging from a universe of 31 assets to a universe of 2151 assets - in the OR-Library.

#### Signal processing methods

The sparse index tracking problem in finance is closely related to a similar problem in signal processing called compressed sensing, for which specific algorithms have been developed over the years, like the iterative hard thresholding algorithm14 or the compressive sampling matching pursuit (CoSaMP) algorithm15.

To be noted, though, that all signal processing methods that rely on the least absolute shrinkage and selection operator (LASSO), which consists in approximating the $l_0$-norm constraint by a $l_1$-norm constraint, are not applicable to the sparse index tracking problem. Indeed, when full investment and no-short sales constraints are imposed16, we have, for any portfolio weights vector $w$

$\lVert w \rVert_1 = \sum_{i=1}^n \left| w_i \right| = \sum_{i=1}^n w_i = 1$

, so that the $l_1$-norm reduces to a constant and is therefore irrelevant3.

#### Non-linear optimization methods

At the crossroads with signal processing methods, generic non-linear optimization methods taking into account cardinality constraints have also been developed over the years:

• Methods that directly integrate cardinality constraints, like projected gradient methods1718 or coordinate descent methods18

• Methods that indirectly integrate cardinality constraints by either:

• Relaxing the $l_0$-norm constraint into an “easier” constraint, like the entropic lower bound algorithm of Jacquet et al.19
• Reformulating the $l_0$-norm constraint as a $l_0$-norm penalty integrated in the tracking error measure20 and replacing that $l_0$-norm penalty with an “easier” penalty, like the $q$-norm approximation algorithm of Jansen and van Dijk21 or the majorization-minimization algorithm of Benidis et al.3

### The sparse index tracking/empirical tracking error optimization problem

Like in the previous post of this series, this post will use the empirical tracking error as the preferred tracking error measure.

In this case, the general sparse index tracking optimization problem becomes

$w^* = \operatorname{argmin} \frac{1}{T} \lVert X w - r_{idx} \rVert_2^2 \newline \textrm{s.t. } \begin{cases} \sum_{i=1}^{n} w_i = 1 \newline 0 \leqslant w_i \leqslant 1, i = 1..n \newline ... \newline \lVert w \rVert_0 \le n_{max} \end{cases}$

, which, as noted by Benidis et al.3, is nothing else than a constrained sparse regression problem:

the sparse index tracking problem is similar to many sparsity formulations in the signal processing area in the sense that it is a regression problem with some sparsity requirements

## Implementation in Portfolio Optimizer

Portfolio Optimizer allows to compute an approximate solution to the sparse index tracking optimization problem under the empirical tracking error measure through the endpoint /portfolios/replication/index-tracking/sparse.

In addition, Portfolio Optimizer allows to impose misc. constraints like minimum and maximum asset weights or minimum and maximum group weights constraints22.

## Examples of usage

### Replicating the S&P 500 by (extreme) optimized sampling

One of the most immediate application of the sparse index tracking problem is the ability to replicate a financial market index without investing into all its constituents, which is desirable for the reasons explained in the introduction of this post.

In order to illustrate this, I propose to partially replicate one of the numerical experiments in Benidis et al.3.

In details, using the S&P 500 data provided in the R package sparseIndexTracking - which consists in the daily returns of the S&P 500 index and of 386 of its constituents over the period from 1st January 2010 to 31th December 2010 - I will solve23 the following 80 sparse index tracking problems:

• Index - S&P 500
• Tracking assets - 386 stocks included in the S&P 500
• Maximum number of assets - Varying from 20 to 100
• Misc. constraints - Maximum asset weight constraint of 5%

The empirical tracking error of the resulting 80 sparse S&P 500 tracking portfolios, as well as - for reference - the empirical tracking error of the regular S&P 500 tracking portfolio24, is depicted in Figure 1.

From this figure, it is visible that a sparse tracking portfolio made of ~100 stocks allows to track the S&P 500 with the same level of tracking error as a regular tracking portfolio made of much more stocks24.

But it is possible to do better!

A sparse tracking portfolio made of only ~50 stocks is actually sufficient, as illustrated in Figure 2.

Of course, such a small portfolio might be less “diversified”25 than its 100-stock counterpart, but the main takeaway is that a portfolio made of ~50-100 stocks adequately replicates the S&P 500, which opens the door of direct investing26 to DIY investors.

### Automatically determining the proper composite benchmark for a mutual fund

In the previous post of this series, the index tracking machinery was shown to allow to easily and automatically construct a benchmark for any mutual fund.

By extension, the sparse index tracking machinery also allows to do the same, with an important advantage for multi-asset mutual funds - the automated construction of the best composite benchmark with a given number of constituents.

### Replicating a fund of funds to reduce fees

As a last example of usage, I propose to build upon a LinkedIn post from Finominal, in which the following question is asked:

What’s the likelihood that all […] funds in JP’s Investor Growth & Income Fund contribute value?

A bit of context first.

The JPMorgan Investor Growth & Income Fund is a fund of funds (FoF) whose main investment strategy is to invest in other J.P. Morgan Funds27. As of 31 December 2023, it is invested in 25 such other J.P. Morgan funds (JPMorgan Core Bond R6, JPMorgan US Equity R6…).

One problem with FoF is that they usually have higher fees than traditional mutual funds because they include the management fees charged by the underlying funds28.

The JPMorgan Investor Growth & Income Fund is no different, with a total expense ratio ranging from 1.47% to 0.72%, the latter being for its institutional class share (ticker ONGFX).

Inspired by the results from the previous post of this series, it is possible to reduce those fees by investing directly in all the underlying J.P. Morgan funds in proportion determined by solving23 the following regular index tracking problem:

• Index - The JPMorgan Investor Growth & Income Fund (ONGFX)
• Tracking assets - 24 over the 25 J.P. Morgan funds held by the JPMorgan Investor Growth & Income Fund as of 31 December 2023, knowing that the 25th fund is marked as restricted for investment by Morningstar29

Using price data30 for the period 01 January 2023 - 31 December 2023, this gives the regular tracking portfolio whose weights are displayed in Figure 3 and whose performances are compared with the JPMorgan Investor Growth & Income Fund in Figure 4.

This tracking portfolio has:

• A practically null tracking error v.s. the JPMorgan Investor Growth & Income Fund (Figure 4)
• An expense ratio of ~0.43%31 v.s. of at least 0.72% for the JPMorgan Investor Growth & Income Fund

So, this tracking portfolio is an interesting alternative to the JPMorgan Investor Growth & Income Fund, but there is a catch - it is invested in a total of 22 J.P. Morgan funds (Figure 3), which might be cumbersome to manage…

An even better tracking portfolio would have a smaller number of directly-invested funds.

Does such a unicorn exist?

Figures 5 and 6 answer this question, by solving23 the following sequence of sparse index tracking problems over the period 01 January 2023 - 31 December 2023:

• Index - The JPMorgan Investor Growth & Income Fund (ONGFX)
• Tracking assets - The 24 non-restricted J.P. Morgan funds held by the JPMorgan Investor Growth & Income Fund as of 31 December 2023, knowing that the 25th fund is marked as restricted for investment by Morningstar29
• Maximum number of assets - Varying from 1 to 24

From these two figures, the sweet spot for the number of directly-invested funds in terms of tracking error/fees/manageability seems to be ~6-8.

Coming back to the initial question from Finominal, this analysis confirms that there is basically zero likelihood that all […] funds in JP’s Investor Growth & Income Fund contribute value, because these can be drastically reduced32.

## Conclusion

That’s it for the first blog post of 2024!

Next in this series, I will discuss another extension of the regular index tracking problem, aiming this time at tracking a financial index with “dynamic” asset weights over the considered $T$ time periods.

1. Also called the benchmark replication problem

2. In terms of a given tracking error measure, like the empirical tracking error3, also called the mean squared tracking error

3. Also called the partial benchmark replication problem

4. Or a solution, in case multiple solutions exist.

5. The zero “norm” is not a real norm.

6. To be noted that in the specific case of a quadratic loss function, Mutunge and Haugland8 establish that the sparse index tracking optimization problem is actually strongly NP-hard

7. Excluding simple heuristics like selecting the largest assets according to a given criteria. For example, a widely used method, especially for a market capitalization weighted index, is to select the largest K assets according to their market capitalizations3; as another example, greedy or reverse greedy algorithms can be used on top of a non-sparse formulation of the index tracking problem to include or exclude assets one by one19

8. Which is usually the case in practice…

9. It is not trivial to show that such a reformulation is equivalent; c.f. Jansen and van Dijk21 for a detailed proof.

10. The possibility to include minimum and maximum group weights constraints is important in practice, because it allows for example to set an upper limit on fees or to (try to) impose sector neutrality in the computed portfolio, a characteristic often overlooked c.f. for example Che et al.33

11. Using Portfolio Optimizer 2 3

12. The non-sparse S&P 500 tracking portfolio, subject to the same maximum asset weight constraint of 5% as the sparse S&P 500 tracking portfolios, is made of 286 stocks.  2

13. For example, subject to sector concentration bias, as highlighted in Che et al.33

14. Direct investing solutions are usually provided by asset management firms (Fidelity, Frec…). In the case of a stock market index, investing directly in stocks composing that stock market index, as opposed to indirectly through an ETF, has several advantages (null expense ratio, possibility to do tax-loss harvesting…).

15. C.f. Wikipedia

16. C.f. Morningstar 2

17. (Adjusted) prices have have been retrieved using Tiingo

18. Computation made using the expense ratios reported by Morningstar for the different J.P. Morgan funds, you will have to trust me on this one. To also be noted that the fees associated to the potential rebalancing of the tracking portfolio are not taken into account and would need to be properly minimized in real life.

19. In view of Figure 5 and Figure 6, I would take the returns increased part of Finominal’s LinkedIn post with a grain of salt, although it is certain that returns will increase thanks to lowering the fees.

Tags:

Updated: