Sparse Index Tracking: Limiting the Number of Assets in an Index Tracking Portfolio
In the previous post, I introduced the index tracking problem^{1}, which consists in finding a portfolio that tracks as closely as possible^{2} a given financial market index.
Because such a portfolio might contain any number of assets, with for example an S&P 500 tracking portfolio possibly containing ~500 stocks, it is [sometimes desirable] that the tracking portfolio consists of a small number of assets^{3} in order to simplify the execution, avoid small and illiquid positions, and large transaction costs^{3}.
In other words, it is sometimes desirable to impose a constraint on the maximum number of assets contained in an index tracking portfolio, which leads to an extension of the regular index tracking problem called the sparse index tracking problem^{4}.
In this new post, I will describe the mathematics of the sparse index tracking problem and I will detail a few examples of usage like:
 Replicating a fund of funds to reduce fees while keeping the number of directlyinvested funds manageable
 Replicating the S&P 500 with only ~50 stocks as an (extreme) illustration of the optimized sampling^{5} ETF methodology
Mathematical preliminaries
The general sparse index tracking optimization problem
Let be:
 $T$, a number of time periods
 $r_{idx} = \left( r_{idx, 1}, …, r_{idx, T} \right) \in \mathcal{R}^{T}$, the vector of the index arithmetic returns over each of the $T$ time periods
 $n$, the number of assets in the universe of the sparse index tracking portfolio
 $X \in \mathcal{R}^{T \times n}$ the matrix of the $n$ assets arithmetic returns over each of the $T$ time periods
 $w = \left( w_1,…,w_n \right) \in \mathcal{R}^{n} $ the vector of the weights of a portfolio in each of the $n$ assets
The vector of the sparse index tracking portfolio weights $w^*$ is then the^{6} solution to the optimization problem which consists in minimizing a tracking error measure $f \left( X w  r_{idx} \right)$  with $f$ some loss function  subject to additional constraints like full investment constraint, no short sale constraint, etc. and a cardinality constraint^{3}
\[w^* = \operatorname{argmin} f \left( X w  r_{idx} \right) \newline \textrm{s.t. } \begin{cases} \sum_{i=1}^{n} w_i = 1 \newline 0 \leqslant w_i \leqslant 1, i = 1..n \newline ... \newline \lVert w \rVert_0 \le n_{max} \end{cases}\], where:
 $\lVert . \rVert_0$ is the zero “norm”^{7} of a vector, that is, the number of nonzero elements in that vector
 $n_{max}$ is the maximum number of assets with nonzero weights desired in the sparse index tracking portfolio
From a computational perspective, and whatever the exact definition of the loss function $f$, it is known that when a cardinality constraint restricting the number of stocks is introduced, the problem of optimizing the composition of a portfolio tends to become NPhard^{8}, which means that exact solutions to instances of realistic sizes are computationally intractable, and thus inexact solution methods are the only practical ones^{8}.
Standard quadratic optimization methods guaranteeing an optimal solution are thus unfortunately not usable to solve the sparse index tracking optimization problem^{9}…
Heuristics for solving the general sparse index tracking optimization problem
While an exhaustive enumeration of all possible methods for solving the sparse index tracking optimization problem is out of scope of this post, most of them^{10} seem to fall into one of three rough categories.
Combinatorial optimization methods
The cardinality constraint present in the sparse index tracking problem makes it a combinatorial problem, so that standard combinatorial optimization methods (continuous relaxation of a mixed integer programming formulation^{8}^{11}, genetic algorithms^{12}^{13}…) can be used to find an approximate solution.
As a side note, thanks to the relationship of the sparse index tracking problem with combinatorial optimization and hence with operations research, it is possible to find datasets for different index tracking problems  ranging from a universe of 31 assets to a universe of 2151 assets  in the ORLibrary.
Signal processing methods
The sparse index tracking problem in finance is closely related to a similar problem in signal processing called compressed sensing, for which specific algorithms have been developed over the years, like the iterative hard thresholding algorithm^{14} or the compressive sampling matching pursuit (CoSaMP) algorithm^{15}.
To be noted, though, that all signal processing methods that rely on the least absolute shrinkage and selection operator (LASSO), which consists in approximating the $l_0$norm constraint by a $l_1$norm constraint, are not applicable to the sparse index tracking problem. Indeed, when full investment and noshort sales constraints are imposed^{16}, we have, for any portfolio weights vector $w$
\[\lVert w \rVert_1 = \sum_{i=1}^n \left w_i \right = \sum_{i=1}^n w_i = 1\], so that the $l_1$norm reduces to a constant and is therefore irrelevant^{3}.
Nonlinear optimization methods
At the crossroads with signal processing methods, generic nonlinear optimization methods taking into account cardinality constraints have also been developed over the years:

Methods that directly integrate cardinality constraints, like projected gradient methods^{17}^{18} or coordinate descent methods^{18}

Methods that indirectly integrate cardinality constraints by either:
 Relaxing the $l_0$norm constraint into an “easier” constraint, like the entropic lower bound algorithm of Jacquet et al.^{19}
 Reformulating the $l_0$norm constraint as a $l_0$norm penalty integrated in the tracking error measure^{20} and replacing that $l_0$norm penalty with an “easier” penalty, like the $q$norm approximation algorithm of Jansen and van Dijk^{21} or the majorizationminimization algorithm of Benidis et al.^{3}
The sparse index tracking/empirical tracking error optimization problem
Like in the previous post of this series, this post will use the empirical tracking error as the preferred tracking error measure.
In this case, the general sparse index tracking optimization problem becomes
\[w^* = \operatorname{argmin} \frac{1}{T} \lVert X w  r_{idx} \rVert_2^2 \newline \textrm{s.t. } \begin{cases} \sum_{i=1}^{n} w_i = 1 \newline 0 \leqslant w_i \leqslant 1, i = 1..n \newline ... \newline \lVert w \rVert_0 \le n_{max} \end{cases}\], which, as noted by Benidis et al.^{3}, is nothing else than a constrained sparse regression problem:
the sparse index tracking problem is similar to many sparsity formulations in the signal processing area in the sense that it is a regression problem with some sparsity requirements
Implementation in Portfolio Optimizer
Portfolio Optimizer allows to compute an approximate solution to the sparse index tracking optimization problem under the empirical tracking error measure through the endpoint /portfolios/replication/indextracking/sparse
.
In addition, Portfolio Optimizer allows to impose misc. constraints like minimum and maximum asset weights or minimum and maximum group weights constraints^{22}.
Examples of usage
Replicating the S&P 500 by (extreme) optimized sampling
One of the most immediate application of the sparse index tracking problem is the ability to replicate a financial market index without investing into all its constituents, which is desirable for the reasons explained in the introduction of this post.
In order to illustrate this, I propose to partially replicate one of the numerical experiments in Benidis et al.^{3}.
In details, using the S&P 500 data provided in the R package sparseIndexTracking  which consists in the daily returns of the S&P 500 index and of 386 of its constituents over the period from 1st January 2010 to 31th December 2010  I will solve^{23} the following 80 sparse index tracking problems:
 Index  S&P 500
 Tracking assets  386 stocks included in the S&P 500
 Maximum number of assets  Varying from 20 to 100
 Misc. constraints  Maximum asset weight constraint of 5%
The empirical tracking error of the resulting 80 sparse S&P 500 tracking portfolios, as well as  for reference  the empirical tracking error of the regular S&P 500 tracking portfolio^{24}, is depicted in Figure 1.
From this figure, it is visible that a sparse tracking portfolio made of ~100 stocks allows to track the S&P 500 with the same level of tracking error as a regular tracking portfolio made of much more stocks^{24}.
But it is possible to do better!
A sparse tracking portfolio made of only ~50 stocks is actually sufficient, as illustrated in Figure 2.
Of course, such a small portfolio might be less “diversified”^{25} than its 100stock counterpart, but the main takeaway is that a portfolio made of ~50100 stocks adequately replicates the S&P 500, which opens the door of direct investing^{26} to DIY investors.
Automatically determining the proper composite benchmark for a mutual fund
In the previous post of this series, the index tracking machinery was shown to allow to easily and automatically construct a benchmark for any mutual fund.
By extension, the sparse index tracking machinery also allows to do the same, with an important advantage for multiasset mutual funds  the automated construction of the best composite benchmark with a given number of constituents.
Replicating a fund of funds to reduce fees
As a last example of usage, I propose to build upon a LinkedIn post from Finominal, in which the following question is asked:
What’s the likelihood that all […] funds in JP’s Investor Growth & Income Fund contribute value?
A bit of context first.
The JPMorgan Investor Growth & Income Fund is a fund of funds (FoF) whose main investment strategy is to invest in other J.P. Morgan Funds^{27}. As of 31 December 2023, it is invested in 25 such other J.P. Morgan funds (JPMorgan Core Bond R6, JPMorgan US Equity R6…).
One problem with FoF is that they usually have higher fees than traditional mutual funds because they include the management fees charged by the underlying funds^{28}.
The JPMorgan Investor Growth & Income Fund is no different, with a total expense ratio ranging from 1.47% to 0.72%, the latter being for its institutional class share (ticker ONGFX).
Inspired by the results from the previous post of this series, it is possible to reduce those fees by investing directly in all the underlying J.P. Morgan funds in proportion determined by solving^{23} the following regular index tracking problem:
 Index  The JPMorgan Investor Growth & Income Fund (ONGFX)
 Tracking assets  24 over the 25 J.P. Morgan funds held by the JPMorgan Investor Growth & Income Fund as of 31 December 2023, knowing that the 25th fund is marked as restricted for investment by Morningstar^{29}
Using price data^{30} for the period 01 January 2023  31 December 2023, this gives the regular tracking portfolio whose weights are displayed in Figure 3 and whose performances are compared with the JPMorgan Investor Growth & Income Fund in Figure 4.
This tracking portfolio has:
 A practically null tracking error v.s. the JPMorgan Investor Growth & Income Fund (Figure 4)
 An expense ratio of ~0.43%^{31} v.s. of at least 0.72% for the JPMorgan Investor Growth & Income Fund
So, this tracking portfolio is an interesting alternative to the JPMorgan Investor Growth & Income Fund, but there is a catch  it is invested in a total of 22 J.P. Morgan funds (Figure 3), which might be cumbersome to manage…
An even better tracking portfolio would have a smaller number of directlyinvested funds.
Does such a unicorn exist?
Figures 5 and 6 answer this question, by solving^{23} the following sequence of sparse index tracking problems over the period 01 January 2023  31 December 2023:
 Index  The JPMorgan Investor Growth & Income Fund (ONGFX)
 Tracking assets  The 24 nonrestricted J.P. Morgan funds held by the JPMorgan Investor Growth & Income Fund as of 31 December 2023, knowing that the 25th fund is marked as restricted for investment by Morningstar^{29}
 Maximum number of assets  Varying from 1 to 24
From these two figures, the sweet spot for the number of directlyinvested funds in terms of tracking error/fees/manageability seems to be ~68.
Coming back to the initial question from Finominal, this analysis confirms that there is basically zero likelihood that all […] funds in JP’s Investor Growth & Income Fund contribute value, because these can be drastically reduced^{32}.
Conclusion
That’s it for the first blog post of 2024!
Next in this series, I will discuss another extension of the regular index tracking problem, aiming this time at tracking a financial index with “dynamic” asset weights over the considered $T$ time periods.
As usual, feel free to connect with me on LinkedIn or to follow me on Twitter.
–

Also called the benchmark replication problem. ↩

In terms of a given tracking error measure, like the empirical tracking error^{3}, also called the mean squared tracking error. ↩

See Konstantinos Benidis; Yiyong Feng; Daniel P. Palomar, Optimization Methods for Financial Index Tracking: From Theory to Practice , now, 2018.. ↩ ↩^{2} ↩^{3} ↩^{4} ↩^{5} ↩^{6} ↩^{7} ↩^{8}

Also called the partial benchmark replication problem. ↩

See Kees van Montfort, Elout Visser, Laurens Fijn van Draat, Index Tracking by Means of Optimized Sampling, The Journal of Portfolio Management Winter 2008, 34 (2) 143152. ↩

Or a solution, in case multiple solutions exist. ↩

The zero “norm” is not a real norm. ↩

See Purity Mutunge, Dag Haugland, Minimizing the tracking error of cardinality constrained portfolios, Computers & Operations Research, Volume 90, 2018, Pages 3341. ↩ ↩^{2} ↩^{3}

To be noted that in the specific case of a quadratic loss function, Mutunge and Haugland^{8} establish that the sparse index tracking optimization problem is actually strongly NPhard. ↩

Excluding simple heuristics like selecting the largest assets according to a given criteria. For example, a widely used method, especially for a market capitalization weighted index, is to select the largest K assets according to their market capitalizations^{3}; as another example, greedy or reverse greedy algorithms can be used on top of a nonsparse formulation of the index tracking problem to include or exclude assets one by one^{19}. ↩

See N.A. Canakgoz, J.E. Beasley, Mixedinteger programming approaches for index tracking and enhanced indexation, European Journal of Operational Research, Volume 196, Issue 1, 2009, Pages 384399. ↩

See J.E. Beasley, N. Meade, T.J. Chang, An evolutionary heuristic for the index tracking problem, European Journal of Operational Research, Volume 148, Issue 3, 2003, Pages 621643. ↩

See Gilli, M., Kellezi, E. (2002). The Threshold Accepting Heuristic for Index Tracking. In: Pardalos, P.M., Tsitsiringos, V.K. (eds) Financial Engineering, Ecommerce and Supply Chain. Applied Optimization, vol 70. Springer, Boston, MA. ↩

See T. Blumensath and M. Davies, Iterative hard thresholding for compressed sensing, Appl. Comput. Harmon. Anal., 27 (2009), pp. 265–274.. ↩

See Deanna Needell and Joel A Tropp. Cosamp: iterative signal recovery from incomplete and inaccurate samples. Applied and Computational Harmonic Analysis, 26(3):301–321, 2009. ↩

Which is usually the case in practice… ↩

See Xu, F., Dai, Y., Zhao, Z. et al. Efficient projected gradient methods for cardinality constrained optimization. Sci. China Math. 62, 245–268 (2019).. ↩

See Amir Beck, Yonina C. Eldar, Sparsity Constrained Nonlinear Optimization: Optimality Conditions and Algorithms, 2013, SIAM Journal on Optimization, 14801509, 23, 3 ↩ ↩^{2}

See Quentin Jacquet, Agnes Bialecki, Laurent El Ghaoui, Stéphane Gaubert, Riadh Zorgati. Entropic Lower Bound of Cardinality for Sparse Optimization. 2022.. ↩ ↩^{2}

It is not trivial to show that such a reformulation is equivalent; c.f. Jansen and van Dijk^{21} for a detailed proof. ↩

See Roel Jansen, Ronald van Dijk, Optimal Benchmark Tracking with Small Portfolios, The Journal of Portfolio Management, Winter 2002, 28 (2) 3339. ↩ ↩^{2}

The possibility to include minimum and maximum group weights constraints is important in practice, because it allows for example to set an upper limit on fees or to (try to) impose sector neutrality in the computed portfolio, a characteristic often overlooked c.f. for example Che et al.^{33} ↩

The nonsparse S&P 500 tracking portfolio, subject to the same maximum asset weight constraint of 5% as the sparse S&P 500 tracking portfolios, is made of 286 stocks. ↩ ↩^{2}

For example, subject to sector concentration bias, as highlighted in Che et al.^{33}. ↩

Direct investing solutions are usually provided by asset management firms (Fidelity, Frec…). In the case of a stock market index, investing directly in stocks composing that stock market index, as opposed to indirectly through an ETF, has several advantages (null expense ratio, possibility to do taxloss harvesting…). ↩

C.f. the JPMorgan Investor Growth & Income Fund prospectus. ↩

C.f. Morningstar. ↩ ↩^{2}

Computation made using the expense ratios reported by Morningstar for the different J.P. Morgan funds, you will have to trust me on this one. To also be noted that the fees associated to the potential rebalancing of the tracking portfolio are not taken into account and would need to be properly minimized in real life. ↩

In view of Figure 5 and Figure 6, I would take the returns increased part of Finominal’s LinkedIn post with a grain of salt, although it is certain that returns will increase thanks to lowering the fees. ↩

See Che, Y.; Chen, S.; Liu, X. Sparse Index Tracking Portfolio with Sector Neutrality. Mathematics 2022, 10, 2645.. ↩ ↩^{2}