Month: July 2018

This post is available as a PDF download here.

Summary

We prefer to think about diversification in a three-dimensional framework: what, how, and when.
The “how” axis covers the process with which an investment decision is made.
There are a number of models that trend-followers might use to capture a trend. For example, trend-followers might employ a time-series momentum model, a price-minus moving average model, or a double moving average cross-over model.
Beyond multiple models, each model can have a variety of parameterizations. For example, a time-series momentum model can just as equally be applied with a 3-month formation period as an 18-month period.
In this commentary, we attempt to measure how much diversification opportunity is available by employing multiple models with multiple parameterizations in a simple long/flat trend-following process.

When investors talk about diversification, they typically mean across different investments. Do not just buy a single stock, for example; buy a basket of stocks in order to diversify away the idiosyncratic risk.

We call this “what” diversification (i.e. “what are you buying?”) and believe this is only one of three meaningful axes of diversification for investors. The other two are “how” (i.e. “how are you making your decision?”) and “when” (i.e. “when are you making your decision?”). In recent years, we have written a great deal about the “when” axis, and you can find a summary of that research in our commentary Quantifying Timing Luck.

In this commentary, we want to discuss the potential benefits of diversifying across the “how” axis in trend-following strategies.

But what, exactly, do we mean by this? Consider that there are a number of ways investors can implement trend-following signals. Some popular methods include:

Prior total returns (“time-series momentum”)
Price-minus-moving-average (e.g. price falls below the 200-day moving average)
Moving-average double cross-over (e.g. the 50-day moving average crosses the 200-day moving average)
Moving-average change-in-direction (e.g. the 200-day moving average slope turns positive or negative)
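
To make the four methods concrete, here is a minimal sketch of each as a long/flat signal on a list of daily prices. The function names, the 1/0 signal convention, and the use of simple (unweighted) moving averages are our own assumptions for illustration, not a description of any particular implementation.

```python
def moving_average(prices, window, end=None):
    """Simple average of the `window` prices ending just before index `end`."""
    end = len(prices) if end is None else end
    return sum(prices[end - window:end]) / window

def time_series_momentum(prices, lookback):
    """Long (1) if the total return over the lookback is positive, else flat (0)."""
    total_return = prices[-1] / prices[-1 - lookback] - 1.0
    return 1 if total_return > 0 else 0

def price_minus_ma(prices, window):
    """Long if the latest price sits above its moving average."""
    return 1 if prices[-1] > moving_average(prices, window) else 0

def ma_double_crossover(prices, short_window, long_window):
    """Long if the short moving average is above the long moving average."""
    short_ma = moving_average(prices, short_window)
    long_ma = moving_average(prices, long_window)
    return 1 if short_ma > long_ma else 0

def ma_change_in_direction(prices, window):
    """Long if the moving average rose versus the prior day."""
    today = moving_average(prices, window)
    yesterday = moving_average(prices, window, end=len(prices) - 1)
    return 1 if today > yesterday else 0
```

On a steadily rising price series all four signals are long, and on a steadily falling series all four are flat; the methods only disagree around turning points, which is where the choice of model starts to matter.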

As it turns out, these varying methodologies are actually cousins of one another. Recent research has established that these models can, more or less, be thought of as different weighting schemes of underlying returns. For example, a time-series momentum model (with no skip month) derives its signal by averaging daily log returns over the lookback period equally.
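
The time-series momentum case can be written out directly. The notation below is our own: P_t is the price on day t, r_t is the daily log return, and n is the lookback length in days.

```latex
\text{MOM}_t(n)
  \;=\; \frac{1}{n}\sum_{i=0}^{n-1} r_{t-i}
  \;=\; \frac{1}{n}\sum_{i=0}^{n-1} \ln\frac{P_{t-i}}{P_{t-i-1}}
  \;=\; \frac{1}{n}\,\ln\frac{P_t}{P_{t-n}}
```

Because the sum telescopes, the sign of the equally weighted average of daily log returns is the same as the sign of the total return over the lookback period. The other models can be thought of as replacing the equal weights 1/n with a different weighting scheme.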

With this common base, a number of papers over the last decade have found significant relationships between the varying methods. For example:

Paper	Evidence
Bruder, Dao, Richard, and Roncalli (2011)	Moving-average-double-crossover is just an alternative weighting scheme for time-series momentum.
Marshall, Nguyen and Visaltanachoti (2014)	Time-series momentum is related to moving-average-change-in-direction.
Levine and Pedersen (2015)	Time-series-momentum and moving-average cross-overs are highly related; both methods perform similarly on 58 liquid futures contracts.
Beekhuizen and Hallerbach (2015)	Mathematically linked moving averages with prior returns.
Zakamulin (2015)	Price-minus-moving-average, moving-average-double-cross-over, and moving-average-change-of-direction can all be interpreted as a computation of a weighted moving average of momentum rules.

As we have argued in past commentaries, we do not believe any single method is necessarily superior to another. In fact, it is trivial to evaluate these methods over different asset classes and time-horizons and find an example that proves that a given method provides the best result.

Without a crystal ball, however, and without any economic interpretation of why one might be superior to another, the choice is arbitrary. Yet the choice will ultimately introduce randomness into our results: a factor we like to call “process risk.” A question we should ask ourselves is, “if we have no reason to believe one is better than another, why would we pick one at all?”

We like to think of it this way: ex-post, we will know whether the return over a given period is positive or negative. Ex-ante, all we have is a handful of trend-following signals that are forecasting that direction. If, historically, all of these trend signals have been effective, then there may be no reason to necessarily believe one over another.

Combining them, in many ways, is sort of like trying to triangulate on the truth. We have a number of models that all look at the problem from a slightly different perspective and, therefore, provide a slightly different interpretation. A (very) loose analogy might be using the collective information from a number of cell towers in an effort to pinpoint the geographic location of a cellphone.

We may believe that all of the trend models do a good job of identifying trends over the long run, but most will prove false from time-to-time in the short-run. By using them together, we can potentially increase our overall confidence when the models agree and decrease our confidence when they do not.
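
One simple way to express this idea in code is to let the models vote. The sketch below assumes each model emits a long/flat signal (1 or 0) and that every model receives an equal vote; both the function name and the equal weighting are hypothetical choices for illustration.

```python
def ensemble_allocation(signals):
    """Fraction of the portfolio held in equities: the share of models that are long."""
    return sum(signals) / len(signals)

# All three models agree on an uptrend: fully invested.
all_agree = ensemble_allocation([1, 1, 1])

# One model disagrees: exposure is scaled back rather than switched off.
one_dissents = ensemble_allocation([1, 0, 1])
```

When the models agree, the portfolio is fully invested or fully flat; when they disagree, the allocation lands somewhere in between, which is one way to reflect lower confidence.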

With all this in mind, we want to explore the simple question: “how much potential benefit does process diversification bring us?”

The Setup

To answer this question, we first generate a number of long/flat trend-following strategies that invest in a broad U.S. equity index or the risk-free rate (both provided by the Kenneth French database and ranging from 1926 to 2018). There are 48 strategy variations in total, constructed through a combination of three different processes – time-series momentum, price-minus-moving-average, and moving-average double cross-over – and 16 different lookback periods (from the approximate equivalent of 3-to-18 months).

We then treat each of the 48 variations as its own unique asset.

To measure process diversification, we are going to use the concept of “independent bets.” The greater the number of independent bets within a portfolio, the greater the internal diversification. Below are a couple examples outlining the basic intuition for a two-asset portfolio:

If we have a portfolio holding two totally independent assets with similar volatility levels, a 50% allocation to each would maximize our diversification. Intuitively, we have equally allocated across two unique bets.
If we have a portfolio holding two totally independent assets with similar volatility levels, a 90% allocation to one asset and a 10% allocation to another would lead us to a highly concentrated bet.
If we have a portfolio holding two highly correlated assets, no matter the allocation split, we have a large, concentrated bet.
If we have a portfolio of two assets with disparate volatility levels, we will have a large concentrated bet unless the lower volatility asset comprises the vast majority of the portfolio.

To measure this concept mathematically, we are going to use the fact that the square of the “diversification ratio” of a portfolio is equal to the number of independent bets that portfolio is taking.¹
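
A minimal sketch of this calculation is below. The diversification ratio is the weighted-average volatility of the holdings divided by the volatility of the portfolio. The function name and the example volatilities and correlations are our own illustrative assumptions.

```python
def independent_bets(weights, vols, corr):
    """Squared diversification ratio: (weighted-average vol / portfolio vol) ** 2."""
    n = len(weights)
    weighted_vol = sum(w * s for w, s in zip(weights, vols))
    variance = sum(
        weights[i] * weights[j] * vols[i] * vols[j] * corr[i][j]
        for i in range(n)
        for j in range(n)
    )
    return weighted_vol ** 2 / variance

independent = [[1.0, 0.0], [0.0, 1.0]]
correlated = [[1.0, 0.95], [0.95, 1.0]]

# Two independent assets, same volatility, split 50/50: two bets.
balanced = independent_bets([0.5, 0.5], [0.15, 0.15], independent)

# Same assets split 90/10: about 1.22 bets.
concentrated = independent_bets([0.9, 0.1], [0.15, 0.15], independent)

# Highly correlated assets: close to one bet regardless of the split.
redundant = independent_bets([0.5, 0.5], [0.15, 0.15], correlated)

# Disparate volatilities split 50/50: the high-volatility asset dominates.
lopsided = independent_bets([0.5, 0.5], [0.02, 0.20], independent)
```

These four cases mirror the two-asset intuition described above: only the balanced, independent case reaches two full bets.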

Diversifying Parameterization Risk

Within process diversification, the first variable we can tweak is the formation period of our trend signal. For example, if we are using a time-series momentum model that simply looks at the sign of the total return over the prior period, the length of that period may have a significant influence in the identification of a trend. Intuition tells us that shorter formation periods might identify short-term trends as well as react to long-term trend changes more quickly but may be more sensitive to whipsaw risk.

To explore the diversification opportunities available to us simply by varying our formation parameterization, we build equal-weight portfolios comprised of two strategies at a time, where each strategy utilizes the same trend model but a different parameterization. We then measure the number of independent bets in that combination.

We run this test for each trend following process independently. As an example, we compare using a shorter lookback period with a longer lookback period in the context of time-series momentum in isolation. We will compare across models in the next section.

In the graphs below, L0 through L15 represent the lookback periods, with L0 being the shortest lookback period and L15 representing the longest lookback period.

As we might suspect, the largest increase in available bets arises from combining shorter formation periods with longer formation periods. This makes sense, as they represent the two horizons that share the smallest proportion of data and therefore have the least “information leakage.” Consider, for example, a time-series momentum signal that has a 4-month lookback and one with an 8-month lookback. At all times, 50% of the information used to derive the latter model is contained within the former model. While the technical details are subtler, we would generally expect that the more informational overlap, the less diversification is available.
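
A rough worked example helps quantify the overlap. Assume, purely for illustration, that monthly log returns are independent with a common variance. Let S_m be the sum of the last m monthly returns and S_n the sum of the last n, with m less than or equal to n. The shorter window is contained entirely in the longer one, so their covariance equals the variance of the shorter sum:

```latex
\operatorname{Corr}(S_m, S_n)
  \;=\; \frac{\operatorname{Cov}(S_m, S_n)}{\sqrt{\operatorname{Var}(S_m)\,\operatorname{Var}(S_n)}}
  \;=\; \frac{m\sigma^2}{\sqrt{m\sigma^2 \cdot n\sigma^2}}
  \;=\; \sqrt{\frac{m}{n}},
\qquad
\operatorname{Corr}(S_4, S_8) = \sqrt{\tfrac{4}{8}} \approx 0.71
```

This describes the raw momentum measures rather than the returns of the resulting long/flat strategies, so it is only a guide, but it shows why pairing the shortest and longest lookbacks leaves the least shared information.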

We can see that combining short- and long-term lookbacks lifts the total number of bets the portfolio is taking from 1.0 to approximately 1.2.

This may not seem like a significant lift, but we should remember Grinold and Kahn’s Fundamental Law of Active Management:

Information Ratio = Information Coefficient x SQRT(Independent Bets)

Assuming the information coefficient stays the same, an increase in the number of independent bets from 1.0 to 1.2 increases our information ratio by approximately 10%. Such is the power of diversification.
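
The arithmetic, holding the information coefficient (IC) fixed:

```latex
\frac{\text{IR}_{\text{new}}}{\text{IR}_{\text{old}}}
  \;=\; \frac{\text{IC}\cdot\sqrt{1.2}}{\text{IC}\cdot\sqrt{1.0}}
  \;=\; \sqrt{1.2}
  \;\approx\; 1.095
```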

Another interesting way to approach this data is by allowing an optimizer to attempt to maximize the diversification ratio. In other words, instead of only looking at naïve, equal-weight combinations of two processes at a time, we can build a portfolio from all available lookback variations.

Doing so may provide two interesting insights.

First, we can see how the optimizer might look to combine different variations to maximize diversification. Will it barbell long and short lookbacks, or is there benefit to including medium lookbacks? Will the different processes have different solutions? Second, by optimizing over the full history of data, we can find an upper limit threshold to the number of independent bets we might be able to capture if we had a crystal ball.
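
As a minimal sketch of what such an optimizer is doing, the code below runs a coarse, long-only grid search over the weights of three hypothetical strategies and keeps the combination with the most independent bets. The grid search, the function names, and the example correlations are all assumptions for illustration; this is not the optimizer used to produce the results in this commentary.

```python
def independent_bets(weights, vols, corr):
    """Squared diversification ratio of a portfolio."""
    n = len(weights)
    weighted_vol = sum(w * s for w, s in zip(weights, vols))
    variance = sum(
        weights[i] * weights[j] * vols[i] * vols[j] * corr[i][j]
        for i in range(n)
        for j in range(n)
    )
    return weighted_vol ** 2 / variance

def most_diversified(vols, corr, steps=20):
    """Grid search over long-only three-asset weights that sum to one."""
    best_weights, best_bets = None, 0.0
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            weights = [i / steps, j / steps, (steps - i - j) / steps]
            bets = independent_bets(weights, vols, corr)
            if bets > best_bets:
                best_weights, best_bets = weights, bets
    return best_weights, best_bets

# Hypothetical inputs: strategies A and B are near-duplicates, C is more distinct.
vols = [0.10, 0.10, 0.10]
corr = [
    [1.0, 0.9, 0.6],
    [0.9, 1.0, 0.6],
    [0.6, 0.6, 1.0],
]
best_weights, best_bets = most_diversified(vols, corr)
equal_weight_bets = independent_bets([1 / 3, 1 / 3, 1 / 3], vols, corr)
```

In this toy example the search tilts toward the more distinct strategy and away from the two near-duplicates, and the pickup over equal weighting is modest.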

A few takeaways from the graphs above:

Almost all of the processes barbell short and long lookback horizons to maximize diversification.
The optimizer finds value, in most cases, in introducing medium-term lookback horizons as well. We can see for Time-Series MOM, the significant weights are placed on L0, L1, L6, L10, and L15. While not perfectly spaced or equally weighted, this still provides a strong cross-section of available information. Double MA Cross-Over, on the other hand, finds value in weighting L0, L8, and L15.
While the optimizer increases the number of independent bets in all cases versus a naïve, equal-weight approach, the pickup is not incredibly dramatic. At the end of the day, a crystal ball does not find a meaningfully better solution than our intuition may provide.

Diversifying Model Risk

Similar to the process taken in the above section, we will now attempt to quantify the benefits of cross-process diversification.

For each trend model, we will calculate the number of independent bets available by combining it with another trend model but hold the lookback period constant. As an example, we will combine the shortest lookback period of the Time-Series MOM model with the shortest lookback period of the MA Double Cross-Over.

We plot the results below of the number of independent bets available through a naïve, equal-weight combination.

We can see that model combinations can lift the number of independent bets by 0.05 to 0.1. This is not as significant as the theoretical lift from parameter diversification, but not totally insignificant.

Combining Model and Parameterization Diversification

We can once again employ our crystal ball in an attempt to find an upper limit to the diversification available to trend followers, as well as the process / parameterization combinations that will maximize this opportunity. Below, we plot the results.

We see a few interesting things of note:

The vast majority of models and parameterizations are ignored.
Time-Series MOM is heavily favored as a model, receiving nearly 60% of the portfolio weight.
We see a spread of weight across short, medium, and long-term lookbacks. Short-term is heavily favored, with Time-Series MOM L0 and Price-Minus MA L0 together approaching 45% of model weight.
All three models are, ultimately, incorporated, with approximately 10% being allocated to Double MA Cross-Over, 30% to Price-Minus MA, and 60% to Time-Series MOM.

It is worth pointing out that naively allocating equally across all 48 models creates 1.18 independent bets while the full-period crystal ball generated 1.29 bets.

Of course, having a crystal ball is unrealistic. Below, we look at a rolling window optimization that looks at the prior 5 years of weekly returns to create the most diversified portfolio. To avoid plotting a graph with 48 different components, we have plotted the results two ways: (1) clustered by process and (2) clustered by lookback period.

Using the rolling window, we see similar results as we saw with the crystal ball. First, Time-Series MOM is largely favored, often peaking well over 50% of the portfolio weights. Second, we see that a barbelling approach is frequently employed, balancing allocations to the shortest lookbacks (L0 and L1) with the longest lookbacks (L14 and L15). Mid-length lookbacks are not outright ignored, however, and L5 through L11 combined frequently make up 20% of the portfolio.

Finally, we can see that the rolling number of bets is highly variable over time, but optimization frequently creates a meaningful impact over an equal-weight approach.²

Conclusion

In this commentary, we have explored the idea of process diversification. In the context of a simple long/flat trend-following strategy, we find that combining strategies that employ different trend identification models and different formation periods can lead to an increase in the number of independent bets taken by the portfolio.

As it specifically pertains to trend-following, we see that diversification appears to be maximized by allocating across a number of lookback horizons, with an optimizer putting a particular emphasis on barbelling shorter and longer lookback periods.

We also see that incorporating multiple processes can increase available diversification as well. Interestingly, the optimizer did not equally diversify across models. This may be due to the fact that these models are less independent from one another than they might seem. For example, Zakamulin (2015) demonstrated that these models can all be decomposed into a different weighted average of the same general momentum rules.

Finding process diversification, then, might require moving to a process that may not have a common basis. For example, trend followers might consider channel methods or a change in basis (e.g. constant volume bars instead of constant time bars).

Machine Learning, Subset Resampling, and Portfolio Optimization

By Justin Sibears

On July 23, 2018


Summary

Portfolio optimization research can be challenging due to the plethora of factors that can influence results, making it hard to generalize results outside of the specific cases tested.
That being said, building a robust portfolio optimization engine requires a diligent focus on estimation risk. Estimation risk is the risk that the inputs to the portfolio optimization process (i.e. expected returns, volatilities, correlations) are imprecisely estimated by sampling from the historical data, leading to suboptimal allocations.
We summarize the results from two recent papers we’ve reviewed on the topic of managing estimation risk. The first paper relies on techniques from machine learning while the second paper uses a form of simulation called subset resampling.
Both papers report that their methodologies outperform various heuristic and optimization-based benchmarks.
We perform our own tests by building minimum variance portfolios using the 49 Fama/French industry portfolios. We find that while both outperform equal-weighting on a risk-adjusted basis, the results are not statistically significant at the 5% level.

This week, we are going to review a couple of recent papers we’ve come across on the topic of reducing estimation risk in portfolio optimization.

Before we get started, we want to point out that while there are many fascinating papers on portfolio optimization, it is also one of the most frustrating areas to study in our opinion. Why? Because ultimately portfolio optimization is a very, very complex topic. The results will be impacted in significant ways by a number of factors like:

What is the investment universe studied?
Over what time period?
How are the parameters estimated?
What are the lookback periods used to estimate parameters?
And so on…

Say that you find a paper that argues for the superiority of equal-weighted portfolios over mean-variance optimization by testing on a universe of large-cap U.S. equities. Does this mean that equal-weighting is superior to mean-variance optimization in general? We tend to believe not. Rather, we should take the study at face value: equal-weighting was superior to the particular style of mean-variance in this specific test.

In addition, the result in and of itself says nothing about why the outperformance occurred. It could be that equal-weighting is a superior portfolio construction technique.

But maybe the equal-weighted stock portfolio just happens by chance to be close to the true Sharpe optimal portfolio. If I have a number of asset classes that have reasonably similar returns, risks, and correlations, it is very likely that equal-weighting does a decent job of getting close to the Sharpe optimal solution. On the other hand, consider an investment universe that consists of 9 equity sectors and U.S. Treasuries. In this case, equal-weighting is much less likely to be close to optimal and we would find it more probable that optimization approaches could outperform.

Maybe equal-weighting exposes the stock portfolio to risk-premia like the value and size factors that improve performance. I suspect that to some extent the outperformance of minimum variance portfolios in a number of studies is at least partially explained by the exposures that these portfolios have to the defensive or low beta factor (the tendency of low risk exposures to outperform high risk exposures on a risk-adjusted basis).

Maybe the mean estimates in the mean-variance optimization are just terrible and the results are less an indictment of MVO than of the particular mean estimation technique used. To some extent, the difficulty of estimating means is a major part of the argument for equal-weighting or other heuristic or shrinkage-based approaches. At the same time, we see a number of studies that estimate expected returns using sample means with long (i.e. 5 or 10 year) lookbacks. These long-term horizons are exactly the period over which returns tend to mean revert and so the evidence would suggest these are precisely the types of mean estimates you wouldn’t want to use. To properly test mean-variance, we should at least use mean estimates that have a chance of succeeding.

All this is a long-winded way of saying that it can be difficult to use the results from research papers to build a robust, general purpose portfolio optimizer. The results may have limited value outside of the very specific circumstances explored in that particular paper.

That being said, this does not give us an excuse to stop trying. With that preamble out of the way, we’ll return to our regularly scheduled programming.

Estimation Risk in Portfolio Optimization

Estimation risk is the risk that the inputs to the portfolio optimization process (i.e. expected returns, volatilities, correlations) are imprecisely estimated by sampling from the historical data, leading to suboptimal allocations.

One popular approach to dealing with estimation risk is to simply ignore parameters that are hard to estimate. For example, the naïve 1/N portfolio, which allocates an equal amount of capital to each investment in the universe, completely foregoes using any information about the distribution of returns. DeMiguel, Garlappi and Uppal (2007)[1] tested fourteen variations of sample-based mean-variance optimization on seven different datasets and concluded that “…none is consistently better than the 1/N rule in terms of Sharpe Ratio, certainty-equivalent return, or turnover, which indicates that, out of sample, the gain from optimal diversification is more than offset by estimation error.”

Another popular approach is to employ “shrinkage estimators” for key inputs. For example, Ledoit and Wolf (2004)[2] propose shrinking the sample correlation matrix towards (a fancy way of saying “averaging it with”) the constant correlation matrix. The constant correlation matrix is simply the correlation matrix where each off-diagonal element is equal to the pairwise average correlation across all assets.

Generally speaking, shrinkage involves blending an “unstructured estimator” like the sample correlation matrix with a “structured estimator” like the constant correlation matrix that tries to represent the data with few free parameters. Shrinkage tends to limit extreme observations, thereby reducing the unwanted impact that such observations can have on the optimization result.
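
A minimal sketch of this blending is below. The function names are our own, and the fixed shrinkage intensity of 0.5 is a hypothetical value chosen for illustration; in practice the intensity is itself estimated from the data rather than picked by hand.

```python
def constant_correlation_target(corr):
    """Structured estimator: every off-diagonal entry is the average pairwise correlation."""
    n = len(corr)
    pairs = [corr[i][j] for i in range(n) for j in range(i + 1, n)]
    average = sum(pairs) / len(pairs)
    return [[1.0 if i == j else average for j in range(n)] for i in range(n)]

def shrink(corr, intensity):
    """Blend the sample matrix with the target; intensity=1 returns the target itself."""
    target = constant_correlation_target(corr)
    n = len(corr)
    return [
        [intensity * target[i][j] + (1.0 - intensity) * corr[i][j] for j in range(n)]
        for i in range(n)
    ]

sample = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
shrunk = shrink(sample, intensity=0.5)  # hypothetical intensity
```

The average pairwise correlation in this example is 0.4, so the extreme 0.9 and 0.1 entries are pulled halfway toward it, to 0.65 and 0.25 respectively.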

Interestingly, the common practice of imposing a short-sale constraint when performing mean-variance optimization or minimum variance optimization is equivalent to shrinking the expected return estimates[3] and the covariance estimates[4], respectively.

Both papers that we’ll discuss here are alternate ways of performing shrinkage.

Applying Machine Learning to Reduce Estimation Risk

The first paper, Reducing Estimation Risk in Mean-Variance Portfolios with Machine Learning by Daniel Kinn (2018)[5], explores using a standard machine learning approach to reduce estimation risk in portfolio optimization.

Kinn’s approach recognizes that estimation error can be decomposed into two sources: bias and variance. Both bias and variance result in suboptimal results, but in very different ways. Bias results from the model doing a poor job of capturing the pertinent features of the data. Variance, on the other hand, results from the model being sensitive to the data used to train the model.

To get a better intuitive sense of bias vs. variance, consider two weather forecasters, Mr. Bias and Ms. Variance. Both Mr. Bias and Ms. Variance work in a town where the average temperature is 50 degrees. Mr. Bias is very stubborn and set in his ways. He forecasts that the temperature will be 75 degrees each and every day. Ms. Variance, however, is known for having forecasts that jump up and down. Half of the time she forecasts a temperature of 75 degrees and half of the time she forecasts a temperature of 25 degrees.

Both forecasters have roughly the same amount of forecast error, but the nature of their errors is very different. Mr. Bias is consistent but has way too rosy of a picture of the town’s weather. Ms. Variance on the other hand, actually has the right idea when it comes to long-term weather trends, but her volatile forecasts still leave much to be desired.
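
The two forecasters can be put into numbers. The sketch below measures each forecaster against the town's average temperature of 50 degrees, which is a simplification since actual daily temperatures would vary around that average; the function and variable names are our own.

```python
from statistics import mean, pvariance

TOWN_AVERAGE = 50.0

mr_bias = [75.0] * 100            # the same forecast every day
ms_variance = [75.0, 25.0] * 50   # alternates between two extremes

def decompose(forecasts, truth):
    """Return (bias, variance, mean squared error) of a set of forecasts."""
    bias = mean(forecasts) - truth
    variance = pvariance(forecasts)
    mse = mean([(f - truth) ** 2 for f in forecasts])
    return bias, variance, mse

bias_result = decompose(mr_bias, TOWN_AVERAGE)
variance_result = decompose(ms_variance, TOWN_AVERAGE)
```

Mr. Bias has a bias of 25 degrees and no variance, while Ms. Variance has no bias and a variance of 625. In both cases the mean squared error equals bias squared plus variance, which is 625 for each forecaster.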

The following graphic from EliteDataScience.com gives another take on explaining the difference between the two concepts.

Source: https://elitedatascience.com/bias-variance-tradeoff

When it comes to portfolio construction, some popular techniques can be neatly classified into one of these two categories. The 1/N portfolio, for example, has no variance (weights will be the same every period), but may have quite a bit of bias if it is far from the true optimal portfolio. Sample-based mean-variance optimization, on the other hand, should have no bias (assuming the underlying distributions of asset class returns do not change over time), but can be highly sensitive to parameter measurements and therefore exhibit high variance. At the end of the day, we are interested in minimum total estimation error, which will generally involve a trade-off between bias and variance.

Source: https://elitedatascience.com/bias-variance-tradeoff

Finding where this optimal trade-off lies is exactly what Kinn sets out to accomplish with the machine learning algorithm described in this paper. The general outline of the algorithm is pretty straightforward:

1. Identify the historical data to be used in calculating the sample moments (expected returns, volatilities, and correlations).
2. Add a penalty function to the function that we are going to optimize. The paper discusses a number of different penalty functions including Ridge, Lasso, Elastic Net, and Principal Component Regression. These penalty functions will effectively shrink the estimated parameters with the exact nature of the shrinkage dependent on the penalty function being used. By doing so we introduce some bias, but hopefully with the benefit of reducing variance even further and as a result reducing overall estimation error.
3. Use K-fold cross-validation to fit the parameter(s) of the penalty function. Cross-validation is a machine learning technique where the training data is divided into various sets of in-sample and out-of-sample data. The parameter(s) chosen will be those that produce the lowest estimation error in the out-of-sample data.
4. Using the optimized parameters from #3, fit the model on the entire training set. The result will be the optimized portfolio weights for the next holding period.
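
The outline above can be sketched in a few dozen lines. This sketch assumes the optimization is posed as a regression of a constant on asset returns with a Ridge penalty, which is one known way to express mean-variance weights as a regression; the paper's exact formulation may differ. The simulated returns, the penalty grid, and all function names are hypothetical.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ridge_fit(returns, penalty):
    """Ridge coefficients from regressing a constant (1.0) on asset returns."""
    n = len(returns[0])
    xtx = [[sum(r[i] * r[j] for r in returns) + (penalty if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    xty = [sum(r[i] for r in returns) for i in range(n)]
    return solve(xtx, xty)

def cv_error(returns, penalty, folds=5):
    """Average out-of-sample squared error across contiguous folds."""
    size = len(returns) // folds
    total = 0.0
    for k in range(folds):
        held_out = returns[k * size:(k + 1) * size]
        train = returns[:k * size] + returns[(k + 1) * size:]
        coef = ridge_fit(train, penalty)
        total += sum((1.0 - sum(c * x for c, x in zip(coef, row))) ** 2
                     for row in held_out)
    return total / (size * folds)

def fit_portfolio(returns, penalties, folds=5):
    """Pick the penalty by cross-validation, then refit on all of the data."""
    best = min(penalties, key=lambda p: cv_error(returns, p, folds))
    coef = ridge_fit(returns, best)
    scale = sum(coef)
    return best, [c / scale for c in coef]  # rescale so weights sum to one

# Step 1: hypothetical history of 120 periods for three assets.
rng = random.Random(0)
returns = [[rng.gauss(0.01, 0.04) for _ in range(3)] for _ in range(120)]

# Steps 2-4: penalized fit, cross-validated penalty, final weights.
best_penalty, weights = fit_portfolio(returns, [0.0, 0.01, 0.1, 1.0])
```

A penalty of zero recovers the unpenalized sample-based solution, while larger penalties shrink the coefficients toward zero before they are rescaled into weights. Cross-validation simply picks whichever penalty on the grid would have produced the smallest error on data it was not fitted to.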

Kinn tests three versions of the algorithm (one using a Ridge penalty function, one using a Lasso penalty function, and one using principal component regression) on the following real-world data sets.

20 randomly selected stocks from the S&P 500 (covers January 1990 to November 2017)
50 randomly selected stocks from the S&P 500 (covers January 1990 to November 2017)
30 industry portfolios using stocks listed on the NYSE, AMEX, and NASDAQ (covers January 1990 to November January 2018)
49 industry portfolios using stocks listed on the NYSE, AMEX, and NASDAQ (covers January 1990 to November January 2018)
200 largest cryptocurrencies by market value as of the end of 2017 (if there was ever a sign of a 2018 paper on portfolio optimization it has to be that one of the datasets relates to crypto)
1200 cryptocurrencies observed from September 2013 to December 2017

As benchmarks, Kinn uses traditional sample-based mean-variance, sample-based mean-variance with no short selling, minimum variance, and 1/N.

The results are pretty impressive with the machine learning algorithms delivering statistically significant risk-adjusted outperformance.

Here are a few thoughts/comments we had when implementing the paper ourselves:

The specific algorithm, as outlined in the paper, is a bit inflexible in the sense that it only works for mean-variance optimization where the means and covariances are estimated from the sample. In other words, we couldn’t use the algorithm to compute a minimum variance portfolio or a mean-variance portfolio where we want to substitute in our own return estimates. That being said, we think there are some relatively straightforward tweaks that can make the process applicable in these scenarios.
In our tests, the parameter optimization for the penalty functions was a bit unstable. For example, when using the principal component regression, we might identify two principal components as being worth keeping in one month and then ten principal components being worth keeping in the next month. This can in turn lead to instability in the allocations. While this is a concern, it could be dealt with by smoothing the parameters over a number of months (although this introduces more questions like how exactly to smooth and over what period).
The results tend to be biased towards having significantly fewer holdings than the 1/N benchmark. For example, see the righthand chart in the exhibit below. While this is by design, we do tend to get wary of results showing such concentrated portfolios to be optimal especially when in the real world we know that asset class distributions are far from well-behaved.

Applying Subset Resampling to Reduce Estimation Error

The second paper, Portfolio Selection via Subset Resampling by Shen and Wang (2017)[6], uses a technique called subset resampling. This approach works as follows:

1. Select a random subset of the securities in the universe (e.g. if there are 30 commodity contracts, you could pick ten of them).
2. Perform the portfolio optimization on the subset selected in #1.
3. Repeat steps #1 and #2 many times.
4. Average the resulting allocations together to get the following result.

The table below shows an example of how this would work for three asset classes and three simulations with two asset classes selected in each subset.

One way we can try to get intuition around subset resampling is by thinking about the extremes. If we resampled using subsets of size 1, then we would end up with the 1/N portfolio. If we resampled using subsets that were the same size as the universe, we would just have the standard portfolio optimized over the entire universe. With subset sizes greater than 1 and less than the size of the whole universe, we end up with some type of blend between 1/N and the traditionally optimized portfolio.
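
These extremes are easy to verify with a small sketch. To keep the example deterministic, it enumerates every possible subset instead of sampling at random, which is only feasible for tiny universes. It also uses inverse-variance weighting as a stand-in optimizer, which matches minimum variance only when assets are uncorrelated. Both are our own simplifications, as are the function names and the example variances.

```python
from itertools import combinations

def inverse_variance_weights(variances):
    """Stand-in optimizer: minimum variance weights when assets are uncorrelated."""
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    return [x / total for x in inverse]

def subset_resample(variances, subset_size, optimizer=inverse_variance_weights):
    """Average the optimizer's weights over every subset of the given size."""
    n = len(variances)
    totals = [0.0] * n
    subsets = list(combinations(range(n), subset_size))
    for subset in subsets:
        weights = optimizer([variances[i] for i in subset])
        for i, w in zip(subset, weights):
            totals[i] += w  # assets left out of the subset implicitly get zero
    return [t / len(subsets) for t in totals]

variances = [0.01, 0.04, 0.09]  # hypothetical asset variances

size_one = subset_resample(variances, 1)   # collapses to 1/N
size_two = subset_resample(variances, 2)   # a blend of the two extremes
size_all = subset_resample(variances, 3)   # the full-universe optimization
```

With subsets of size two, the weight on the lowest-variance asset lands between its 1/N weight and its fully optimized weight, which is the shrinkage-like behavior described above.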

The only parameter we need to select is the size of the subsets. The authors suggest a subset size equal to n^0.8 where n is the number of securities in the universe. For the S&P 500, this would correspond to a subset size of 144.
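
The rule of thumb is a one-liner. Rounding to the nearest whole number is our assumption here, since a subset must contain a whole number of securities.

```python
def suggested_subset_size(universe_size, exponent=0.8):
    """Subset size of n ** 0.8, rounded to the nearest whole security (rounding is assumed)."""
    return round(universe_size ** exponent)

sp500_subset = suggested_subset_size(500)
```

Because the exponent is below one, the subset shrinks as a fraction of the universe as the universe grows: roughly 29% of a 500-security universe, but about 40% of a 100-security universe.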

The authors test subset resampling on the following real-world data sets.

FF100: 100 Fama and French portfolios spanning July 1963 to December 2004
ETF139: 139 ETFs spanning January 2008 to October 2012
EQ181: Individual equities from the Russell Top 200 Index (excluding those stocks with missing data) spanning January 2008 to October 2012
SP434: Individual equities from the S&P 500 Index (excluding those stocks with missing data) spanning September 2001 to August 2013.

As benchmarks, the authors use 1/N (EW); value-weighted (VW); minimum-variance (MV); resampled efficiency (RES) from Michaud (1989)[7]; the two-fund portfolio (TZT) from Tu and Zhou (2011)[8], which blends 1/N and classic mean-variance; the three-fund portfolio (KZT) from Kan and Zhou (2007)[9], which blends the risk-free asset, classic mean-variance, and minimum variance; the four-fund portfolio (TZF) from Tu and Zhou (2011), which blends KZT and 1/N; mean-variance using the shrinkage estimator from Ledoit and Wolf (2004) (SKC); and on-line passive aggressive mean reversion (PAMR) from Li (2012)[10].

Similar to the machine learning algorithm, subset resampling does very well in terms of risk-adjusted performance. On three of the four data sets, the Sharpe Ratio of subset resampling is better than that of 1/N by a statistically significant margin. Additionally, subset resampling has the lowest maximum drawdown in three of the four data sets. From a practical standpoint, it is also positive to see that the turnover for subset resampling is significantly lower than many of the competing strategies.

As we did with the first paper, here are some thoughts that came to mind in reading and re-implementing the subset resampling paper:

As presented, the subset resampling algorithm will be sensitive to the number and types of asset classes in an undesirable way. What do we mean by this? Suppose we had three uncorrelated asset classes with identical means and standard deviations. We use subset resampling with subsets of size two to compute a mean-variance portfolio. The result will be approximately 1/3 of the portfolio in each asset class, which happens to match the true mean-variance optimal portfolio. Now we add a fourth asset class that also has the same mean and standard deviation but is perfectly correlated to the third asset class. With this setup, the third and fourth asset classes are one and the same. As a result, the true mean-variance optimal portfolio will have 1/3 in each of the first and second asset classes and 1/6 in each of the third and fourth asset classes (in reality the solution will be optimal as long as the allocations to the third and fourth asset classes sum to 1/3). However, subset resampling will produce a portfolio that is 25% in each of the four asset classes, an incorrect result. Note that this is a problem with many heuristic solutions, including the 1/N portfolio.
There are ways that we could deal with the above issue by not sampling uniformly, but this will introduce some more complexity into the approach.
In a mean-variance setting, the subset resampling will dilute the value of our mean estimates. Now, this should be expected when using any shrinkage-like approach, but it is something to at least be aware of. Dilution will be more severe the smaller the size of the subsets.
In terms of computational burden, it can be very helpful to use some “smart” resampling that is able to get a representative sampling with fewer iterations than a naïve approach. Otherwise, subset resampling can take quite a while to run due to the sheer number of optimizations that must be calculated.
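The three- versus four-asset example in the first point above can be checked numerically. The sketch below is ours, not the paper's: it enumerates every size-two subset instead of sampling randomly, uses the minimum variance solution as a stand-in for mean-variance (the two coincide when all means are identical and the portfolio is fully invested), and adds a tiny ridge term (an assumption) so that the perfectly correlated pair can be inverted. Function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def min_variance_weights(cov, ridge=1e-8):
    """Fully invested minimum variance weights, proportional to inv(cov) @ 1.

    The small ridge is an illustrative assumption that keeps a singular
    covariance matrix (perfectly correlated assets) solvable.
    """
    n = cov.shape[0]
    raw = np.linalg.solve(cov + ridge * np.eye(n), np.ones(n))
    return raw / raw.sum()


def subset_resample(cov, subset_size):
    """Average the optimal weights over every subset of the given size."""
    n = cov.shape[0]
    subsets = list(combinations(range(n), subset_size))
    total = np.zeros(n)
    for subset in subsets:
        idx = list(subset)
        total[idx] += min_variance_weights(cov[np.ix_(idx, idx)])
    return total / len(subsets)


# Three uncorrelated assets with unit variance.
three_assets = np.eye(3)

# Add a fourth asset that is perfectly correlated with the third.
four_assets = np.eye(4)
four_assets[2, 3] = four_assets[3, 2] = 1.0

print(subset_resample(three_assets, 2))  # equal weights across three assets
print(subset_resample(four_assets, 2))   # equal weights across four assets
```

In the four-asset case the duplicated asset class ends up with half of the portfolio between its two copies, rather than the one-third it would receive in the true optimum.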

Performing Our Own Tests

In this section, we perform our own tests using what we learned from the two papers. Initially, we performed the test using mean-variance as our optimization of choice with 12-month return as the mean estimate. We found, however, that the impact of the mean estimate swamped that of the optimizations. As a result, we repeated the tests, this time building minimum variance portfolios. This will isolate the estimation error relating to the covariance matrix, which we think is more relevant anyway since few practitioners use sample-based estimates of expected returns. Note that we used the principal component regression version of the machine learning algorithm.

Our dataset was the 49 industry portfolios provided in the Fama and French data library. We tested the following optimization approaches:

EW: 1/N equally-weighted portfolio
NRP: naïve risk parity where positions are weighted inversely to their volatility, correlations are ignored
MV: minimum variance using the sample covariance matrix
ZERO: minimum variance using sample covariance matrix shrunk using a shrinkage target where all correlations are assumed to be zero
CONSTANT: minimum variance using sample covariance matrix shrunk using a shrinkage target where all correlations are equal to the sample pairwise correlation across all assets in the universe
PCA: minimum variance using sample covariance matrix shrunk using a shrinkage target that only keeps the top 10% of eigenvectors by variance explained
SSR: subset resampling
ML: machine learning with principal component regression
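For readers who want to experiment with the simpler approaches in this list, here is a minimal sketch of NRP, MV, ZERO, and CONSTANT. It makes several assumptions that the text does not specify: the shrinkage intensity `delta` is a fixed, hypothetical input rather than an estimated one, the minimum variance solution is unconstrained (weights may be negative), and all function names are ours.

```python
import numpy as np


def naive_risk_parity(cov):
    """NRP: weights proportional to inverse volatility, ignoring correlations."""
    inv_vol = 1.0 / np.sqrt(np.diag(cov))
    return inv_vol / inv_vol.sum()


def min_variance(cov):
    """MV: fully invested, unconstrained minimum variance weights."""
    raw = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return raw / raw.sum()


def shrink_to_zero_correlation(cov, delta):
    """ZERO: blend the sample covariance with its own diagonal."""
    target = np.diag(np.diag(cov))
    return delta * target + (1.0 - delta) * cov


def shrink_to_constant_correlation(cov, delta):
    """CONSTANT: blend with a target using the average pairwise correlation."""
    n = cov.shape[0]
    vol = np.sqrt(np.diag(cov))
    corr = cov / np.outer(vol, vol)
    avg_corr = (corr.sum() - n) / (n * (n - 1))  # mean of off-diagonal entries
    target_corr = np.full((n, n), avg_corr)
    np.fill_diagonal(target_corr, 1.0)
    target = target_corr * np.outer(vol, vol)
    return delta * target + (1.0 - delta) * cov


# Illustration on simulated monthly returns (not the Fama/French data).
rng = np.random.default_rng(0)
simulated_returns = rng.normal(0.0, 0.05, size=(120, 5))
sample_cov = np.cov(simulated_returns, rowvar=False)

weights_mv = min_variance(sample_cov)
weights_zero = min_variance(shrink_to_zero_correlation(sample_cov, delta=0.5))
weights_const = min_variance(shrink_to_constant_correlation(sample_cov, delta=0.5))
```

With `delta=1.0` the ZERO variant collapses to inverse-variance weighting, which is a useful sanity check.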

The results are presented below:

Results are hypothetical and backtested and do not reflect any fees or expenses. Returns include the reinvestment of dividends. Results cover the period from 1936 to 2018. Past performance does not guarantee future results.

All of the minimum variance strategies deliver lower risk than EW and NRP and outperform on a risk-adjusted basis, although none of the Sharpe Ratio differences are statistically significant at the 5% level. Of the strategies, ZERO (shrinking with a covariance matrix that assumes zero correlation) and SSR (subset resampling) delivered the highest Sharpe Ratios.

Conclusion

Portfolio optimization research can be challenging due to the plethora of factors that can influence results, making it hard to generalize results outside of the specific cases tested. It can be difficult to ascertain whether the conclusions are truly attributable to the optimization processes being tested or some other factors.

That being said, building a robust portfolio optimization engine requires a diligent focus on estimation risk. Estimation risk is the risk that the inputs to the portfolio optimization process (i.e. expected returns, volatilities, correlations) are imprecisely estimated by sampling from the historical data, leading to suboptimal allocations.

We summarize the results from two recent papers we’ve reviewed on the topic of managing estimation risk. The first paper relies on techniques from machine learning to find the optimal shrinkage parameters that minimize estimation error by acknowledging the trade-off between bias and variance. The second paper uses a form of simulation called subset resampling. In this approach, we repeatedly select a random subset of the universe, optimize over that subset, and then blend the subset results to get the final result.

Both papers report that their methodologies outperform various heuristic and optimization-based benchmarks. We feel that both the machine learning and subset resampling approaches have merit after making some minor tweaks to deal with real world complexities.

We perform our own tests by building minimum variance portfolios using the 49 Fama/French industry portfolios. We find that while both outperform equal-weighting on a risk-adjusted basis, the results are not statistically significant at the 5% level. While this highlights that research results may not translate out of sample, this certainly does not disqualify either method as potentially being useful as tools to manage estimation risk.

[1] Paper can be found here: http://faculty.london.edu/avmiguel/DeMiguel-Garlappi-Uppal-RFS.pdf.

[2] Paper can be found here: http://www.ledoit.net/honey.pdf

[3] DeMiguel, Garlappi and Uppal (2007)

[4] Jagannathan and Ma (2003), “Risk reduction in large portfolios: Why imposing the wrong constraints helps.”

[5] Paper can be found here: https://arxiv.org/pdf/1804.01764.pdf.

[6] Paper can be found here: https://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14443

[7] Paper can be found here: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2387669

[8] Paper can be found here: https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=2104&context=lkcsb_research

[9] Paper can be found here: https://www.cambridge.org/core/journals/journal-of-financial-and-quantitative-analysis/article/optimal-portfolio-choice-with-parameter-uncertainty/A0E9F31F3B3E0873109AD8B2C8563393

[10] Paper can be found here: http://research.larc.smu.edu.sg/mlg/papers/PAMR_ML_final.pdf

Momentum’s Magic Number

By Corey Hoffstein

On July 15, 2018

In Momentum, Risk & Style Premia, Weekly Commentary

This post is available as a PDF download here.

Summary

In HIMCO’s May 2018 Quantitative Insight, they publish a figure that suggests the optimal holding length of a momentum strategy is a function of the formation period.
Specifically, the result suggests that the optimal holding period is one selected such that the formation period plus the holding period is equal to 14-to-18 months: a somewhat “magic” result that makes little intuitive, statistical, or economic sense.
To investigate this result, we construct momentum strategies for country indices as well as industry groups.
We find similar results, with performance peaking when the formation period plus the holding period is equal to 12-to-14 months.
While lacking a specific reason why this effect exists, it suggests that investors looking to leverage shorter-term momentum signals may benefit from longer investment horizons, particularly when costs are considered.

A few weeks ago, we came across a study published by HIMCO on momentum investing¹. Contained within this research note was a particularly intriguing exhibit.

Source: HIMCO Quantitative Insights, May 2018

What this figure demonstrates is that the excess cumulative return for U.S. equity momentum strategies peaks as a function of both formation period and holding period. Specifically, the returns appear to peak when the sum of the formation and holding periods is between 14 and 18 months.

For example, if you were to form a portfolio based upon trailing 6-1 momentum – i.e. ranking on the prior 6-month total returns and skipping the most recent month (labeled in the figure above as “2_6”) – this evidence suggests that you would want to hold such a portfolio for 8-to-12 months (labeled in the figure above as 14-to-18 months since the beginning of the uptrend).

Which is a rather odd conclusion. Firstly, we would intuitively expect that we should employ holding periods that are shorter than our formation periods. The notion here is that we want to use enough data to harvest information that will be stationary over the next, smaller time-step. So, for example, we might use 36 months of returns to create a covariance matrix that we might hold constant for the next month (i.e. a 36-month formation period with a 1-month hold). Given that correlations are non-stable, we would likely find the idea of using 1-month of data to form a correlation matrix we hold for the next 36-months rather ludicrous.

And, yet, here we are in a similar situation, finding that if we use a formation period of 5 months, we should hold our portfolio steady for the next 8-to-10 months. And this is particularly weird in the world of momentum, which we typically expect to be a high turnover strategy. How in the world can having a holding period longer than our formation period make sense when we expect information to quickly decay in value?

Perhaps the oddest thing of all is the fact that all these results center around 14-18 months. It would be one thing if the conclusion was simply, “holding for six months after formation is optimal”; here the conclusion is that the optimal holding period is a function of formation period. Nor is the conclusion something intuitive, like “the holding period should be half the formation period.”

Rather, the result – that the holding period should be 14-to-18 months minus the length of the formation period – makes little intuitive, statistical, or economic sense.

Out-of-Sample Testing with Countries and Sectors

In an effort to explore this result further, we wanted to determine whether similar results were found when cross-sectional momentum was applied to country indices and industry groups.

Specifically, we ran three tests.

In the first, we constructed momentum portfolios using developed country index returns (U.S. dollar denominated; net of withholding taxes) from MSCI. The countries included in the test are: Australia, Austria, Belgium, Canada, Denmark, Finland, France, Germany, Hong Kong, Ireland, Israel, Italy, Japan, Netherlands, New Zealand, Norway, Portugal, Singapore, Spain, Sweden, Switzerland, the United Kingdom, and the United States of America. The data extends back to 12/1969.

In the second, we constructed momentum portfolios using the 12 industry group data set from the Kenneth French Data Library. The data extends back to 7/1926.

In the third, we constructed momentum portfolios using the 49 industry group data set from the Kenneth French Data Library. The data extends back to 7/1926.

For each data set, we ran the same test:

Vary formation periods from 5-1 to 12-1 months.
Vary holding periods from 1-to-26 months.
Using this data, construct dollar-neutral long/short portfolios that go long, in equal-weight, the top third ranking holdings and go short, in equal-weight, the bottom third.

Note that for holding periods exceeding 1 month, we employed an overlapping portfolio construction process.
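The test procedure above can be sketched as follows. This is a simplified illustration under our own assumptions rather than the exact implementation: returns are monthly simple returns in a NumPy array, an "F-1" signal compounds returns over the formation window while skipping the most recent month, and the overlapping construction is taken to mean an equal-weight average of the most recent vintages (one per month of the holding period), without letting weights drift. Function names are hypothetical.

```python
import numpy as np


def momentum_signal_weights(returns, formation, skip=1):
    """Dollar-neutral weights formed at each month-end.

    returns : (T, N) array of monthly simple returns.
    Row t of the output ranks assets on compounded returns over months
    t-formation+1 .. t-skip and is meant to be held from month t+1.
    """
    n_months, n_assets = returns.shape
    k = n_assets // 3  # size of the top and bottom thirds
    weights = np.full((n_months, n_assets), np.nan)
    for t in range(formation - 1, n_months):
        window = returns[t - formation + 1 : t - skip + 1]
        score = np.prod(1.0 + window, axis=0) - 1.0
        order = np.argsort(score)
        w = np.zeros(n_assets)
        w[order[-k:]] = 1.0 / k   # long the top third, equal-weight
        w[order[:k]] = -1.0 / k   # short the bottom third, equal-weight
        weights[t] = w
    return weights


def overlapping_strategy_returns(returns, weights, holding):
    """Average the last `holding` vintages and apply them to next month."""
    n_months = returns.shape[0]
    out = np.full(n_months, np.nan)
    for t in range(n_months - 1):
        vintages = weights[max(0, t - holding + 1) : t + 1]
        if len(vintages) < holding or np.isnan(vintages).any():
            continue  # not enough history to fill every vintage yet
        out[t + 1] = vintages.mean(axis=0) @ returns[t + 1]
    return out


# Illustration on simulated data (not the MSCI or French data sets).
rng = np.random.default_rng(0)
simulated = rng.normal(0.005, 0.04, size=(240, 12))
signal = momentum_signal_weights(simulated, formation=5, skip=1)
strategy = overlapping_strategy_returns(simulated, signal, holding=9)
```

Sweeping `formation` from 5 to 12 and `holding` from 1 to 26 reproduces the grid of strategies described above.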

Below we plot the results.

Source: MSCI and Kenneth French Data Library. Calculations by Newfound Research. Past performance is not a predictor of future results. All information is backtested and hypothetical and does not reflect the actual strategy managed by Newfound Research. Performance is net of all fees except for underlying ETF expense ratios. Returns assume the reinvestment of all dividends, capital gains, and other earnings.

While the results are not as clear as those published by HIMCO, we still see an intriguing effect: returns peak as a function of both formation and holding period. For the country strategy, the sum of the formation and holding periods appears to peak between 12 and 14 months, indicating that an investor using 5-1 month signals would want to hold for 7 months while an investor using 12-1 signals would only want to hold for 1 month.

For the industry data, the results are less clear. Where the HIMCO and country results exhibited a clear “peak,” the industry results simply seem to “decay slower.” In particular, we can see in the results for the 12-industry group test that almost all strategies peak with a 1-month holding period. However, they all appear to fall off rapidly, and uniformly, after the time where formation plus holding period exceeds 16 months.

While less pronounced, it is worth pointing out that this result is achieved without the consideration of trading costs or taxes. So, while the return of the 5-1 strategy in the 12-industry group test may peak with a 1-month hold, we can see that it later forms a second peak at a 9-month hold (“14 months since beginning uptrend”). Given that we would expect a nine-month hold to exhibit considerably less trading, analysis that includes trading cost estimates may exhibit even greater peakedness in the results.

Does the Effect Persist for Long-Only Portfolios?

In analyzing factors, it is often important to try to determine whether a given result is arising from an effect found in the long leg or the short leg. After all, most investors implement strategies in a long-only capacity. While long-only strategies are, technically, equal to a benchmark plus a dollar-neutral long/short portfolio², the long/short portfolio rarely reflects the true factor definition.
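A toy example may help show why the long/short portfolio implied by a long-only strategy differs from the factor definition. The numbers below are illustrative assumptions only: six assets, an equal-weight benchmark, and assets already sorted from strongest to weakest momentum.

```python
import numpy as np

n_assets = 6
k = n_assets // 3  # two assets per third

# Assumed for illustration: index 0 has the strongest momentum, index 5 the weakest.
benchmark = np.full(n_assets, 1.0 / n_assets)

# Long-only momentum: equal-weight the top third.
long_only = np.zeros(n_assets)
long_only[:k] = 1.0 / k

# The dollar-neutral overlay that turns the benchmark into the long-only portfolio.
implied_long_short = long_only - benchmark

# The factor definition: long the top third, short the bottom third.
factor_long_short = np.zeros(n_assets)
factor_long_short[:k] = 1.0 / k
factor_long_short[-k:] = -1.0 / k

print(implied_long_short)
print(factor_long_short)
```

The implied overlay is short every asset outside the top third in proportion to its benchmark weight, including the middle third, whereas the factor portfolio is short only the bottom third.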

Therefore, we want to evaluate long-only construction to determine whether the same result holds, or whether it is a feature of the short-leg.

We find incredibly similar results. Again, country indices appear to peak between 12-to-14 months after the beginning of the uptrend. Industry group results, while not as strong as country results, still appear to offer fairly flat results until 12-to-14 months after the beginning of the uptrend. Taken together, it appears that this result is sustained for long-only portfolio implementations as well.

Conclusion

Traditionally, momentum is considered a high turnover factor. Relative ranking of recent returns can vary substantially over time and our intuition would lead us to expect that the shorter the horizon we use to measure returns, the shorter the time we expect the relative ranking to persist.

Yet recent research published by HIMCO finds this intuition may not be true. Rather, they find that momentum portfolio performance tends to peak 14-to-18 months after the beginning of the measured uptrend. In other words, a portfolio formed on prior 5-month returns should be held for 9-to-13 months, while a portfolio formed on the prior 12 months of returns should only be held for 2-to-6 months.

This result is rather counter-intuitive, as we would expect that shorter formation periods would require shorter holding periods.

We test this result out-of-sample, constructing momentum portfolios using country indices, 12-industry group indices, and 49-industry group indices. We find a similar result in this data. We then further test whether the result is an artifact found only in long/short implementations or whether this information is also useful for long-only investors. Indeed, we find very similar results for long-only implementations.

Precisely why this result exists is still up in the air. One argument may be that the trade-off is ultimately centered around win rate versus the size of winners. If relative momentum tends to persist for only 12-to-18 months total, then using a 12-month formation period may give us a higher win rate but reduce the size of the winners we pick. Conversely, using a shorter formation period may reduce the number of winners we pick correctly (i.e. lower win rate), but those we pick have further to run. Selecting a formation period and a holding period such that their sum equals approximately 14 months may simply be a hack to find the balance of win rate and win size that maximizes return.

The New Glide Path

By Corey Hoffstein

On July 2, 2018

In Portfolio Construction, Risk Management, Sequence Risk, Weekly Commentary

This post is available as a PDF download here.

Summary

In practice, investors and institutions alike have spending patterns that make the sequence of market returns a relevant risk factor.
All else held equal, investors would prefer to make contributions before large returns and withdrawals before large declines.
For retirees making constant withdrawals, sustained declines in portfolio value represent a significant risk. Trend-following has demonstrated historical success in helping reduce the risk of these types of losses.
Traditionally, stock/bond glide paths have been used to control sequence risk. However, trend-following may be able to serve as a valuable hybrid between equities and bonds and provide a means to diversify our diversifiers.
Using backward induction and a number of simplifying assumptions, we generate a glide path based upon investor age and level of wealth.
We find that trend-following receives a significant allocation – largely in lieu of equity exposure – for investors early in retirement and whose initial consumption rate closely reflects the 4% level.

In past commentaries, we have written at length about investor sequence risk. Summarized simply, sequence risk is the sensitivity of investor goals to the sequence of market returns. In finance, we traditionally assume the sequence of returns does not matter. However, for investors and institutions that are constantly making contributions and withdrawals, the sequence can be incredibly important.

Consider, for example, an investor who retires with $1,000,000 and uses the traditional 4% spending rule to allocate a $40,000 annual withdrawal to themselves. Suddenly, in the first year, their portfolio craters to $500,000. That $40,000 no longer represents just 4% of the portfolio; it now represents 8%.

Significant drawdowns and fixed withdrawals mix like oil and water.
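To see why the order of returns matters once withdrawals are involved, consider the small sketch below. The return sequence is made up for illustration, and we assume the withdrawal is taken at the start of each year before that year's return is applied.

```python
def terminal_wealth(start, annual_returns, withdrawal):
    """Wealth after withdrawing at the start of each year, then earning the return."""
    wealth = start
    for r in annual_returns:
        wealth = max(wealth - withdrawal, 0.0) * (1.0 + r)
    return wealth


# The same three hypothetical returns, in opposite order.
bad_first = [-0.50, 0.20, 0.20]
good_first = list(reversed(bad_first))

# With no withdrawals, order is irrelevant.
no_spend_bad = terminal_wealth(1_000_000, bad_first, 0)
no_spend_good = terminal_wealth(1_000_000, good_first, 0)

# With a fixed $40,000 withdrawal, the early loss is far more damaging.
spend_bad = terminal_wealth(1_000_000, bad_first, 40_000)
spend_good = terminal_wealth(1_000_000, good_first, 40_000)

print(no_spend_bad, no_spend_good)
print(spend_bad, spend_good)
```

Without withdrawals both orderings end at the same value; with the $40,000 withdrawal, the sequence that begins with the loss ends meaningfully lower.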

Sequence risk is the exact reason why traditional glide paths have investors de-risk their portfolios over time from growth-focused, higher volatility assets like equities to traditionally less volatile assets, like short-duration investment grade fixed income.

Bonds, however, are not the only way investors can manage risk. There are a variety of other methods, and frequent readers will know that we are strong advocates for the incorporation of trend-following techniques.

But how much trend-following should investors use? And when?

That is exactly what this commentary aims to explore.

Building a New Glidepath

In many ways, this is a very open-ended question. As a starting point, we will create some constraints that simplify our approach:

The assets we will be limited to are broad U.S. equities, a trend-following strategy applied to U.S. equities, a 10-year U.S. Treasury index, and a U.S. Treasury Bill index.
In any simulations we perform, we will use resampled historical returns.
We assume an annual spend rate of $40,000 growing at 3.5% per year (the historical rate of annualized inflation over the period).
We assume our investor retires at 60.
We assume a male investor and use the Social Security Administration’s 2014 Actuarial Life Table to estimate the probability of death.

Source: St. Louis Federal Reserve and Kenneth French Database. Past performance is hypothetical and backtested. Trend Strategy is a simple 200-day moving average cross-over strategy that invests in U.S. equities when the price of U.S. equities is above its 200-day moving average and in U.S. T-Bills otherwise. Returns are gross of all fees and assume the reinvestment of all dividends. None of the equity curves presented here represent a strategy managed by Newfound Research.
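The trend strategy described in the note above can be sketched in a few lines. This is an illustration under our own assumptions, not the exact code behind the figure: prices and T-Bill returns are daily and aligned by index, and the signal uses only data through the prior day so that no look-ahead is introduced. The function name is hypothetical.

```python
import numpy as np


def trend_strategy_returns(prices, tbill_returns, window=200):
    """Long/flat trend strategy based on price versus its moving average.

    prices        : daily price (total return index) levels for U.S. equities
    tbill_returns : daily T-Bill returns, aligned with `prices`
    Returns an array of daily strategy returns; the first `window` entries
    are NaN because the moving average is not yet available.
    """
    prices = np.asarray(prices, dtype=float)
    tbill_returns = np.asarray(tbill_returns, dtype=float)
    n_days = len(prices)
    out = np.full(n_days, np.nan)
    for t in range(window, n_days):
        # Signal is computed with data through day t-1 only (assumed lag).
        moving_average = prices[t - window : t].mean()
        invested = prices[t - 1] > moving_average
        if invested:
            out[t] = prices[t] / prices[t - 1] - 1.0
        else:
            out[t] = tbill_returns[t]
    return out
```

For example, passing a steadily rising price series keeps the strategy fully invested in equities once the first 200 days have elapsed, while a steadily falling series keeps it in T-Bills.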

To generate our glide path, we will use a process of backwards induction similar to that proposed by Gordon Irlam in his article Portfolio Size Matters (Journal of Personal Finance, Vol 13 Issue 2). The process works thusly:

Starting at age 100, assume a success rate of 100% for all wealth levels except for $0, which has a 0% success rate.
Move back in time 1 year and generate 10,000 1-year return simulations.
For each possible wealth level and each possible portfolio configuration of the four assets, use the 10,000 simulations to generate 10,000 possible future wealth levels, subtracting the inflation-adjusted annual spend.
For a given simulation, use standard mortality tables to determine if the investor died during the year. If he did, set the success rate to 100% for that simulation. Otherwise, set the success rate to the success rate of the wealth bucket the simulation falls into at T+1.
For the given portfolio configuration, set the success rate as the average success rate across all simulations.
For the given wealth level, select the portfolio configuration that maximizes success rate.
Return to step 2.
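The steps above can be sketched compactly in Python. This is a simplified illustration, not our production code, and it rests on several assumptions of ours: returns in `return_pool` are real (inflation-adjusted) so the spend is constant, spending is subtracted after the year's return, simulated wealth is rounded down to the nearest wealth bucket, and zero wealth is always treated as failure. All names are hypothetical.

```python
import numpy as np


def glide_path(wealth_grid, allocations, return_pool, spend, death_prob,
               ages, n_sims=10_000, seed=0):
    """Backward induction over age and wealth.

    wealth_grid : sorted 1-D array of wealth buckets, with wealth_grid[0] == 0
    allocations : (K, A) candidate portfolios, each row summing to 1
    return_pool : (M, A) historical real 1-year asset returns to resample
    spend       : real annual spend
    death_prob  : function mapping age to the probability of dying that year
    ages        : decision ages in increasing order, e.g. range(60, 100)
    """
    rng = np.random.default_rng(seed)
    n_buckets = len(wealth_grid)
    # Step 1: at the terminal age every bucket succeeds except $0.
    success = np.ones(n_buckets)
    success[0] = 0.0
    policy = {}
    for age in reversed(ages):
        # Step 2: resample 1-year returns and mortality outcomes.
        draws = return_pool[rng.integers(0, len(return_pool), n_sims)]
        growth = 1.0 + draws @ allocations.T          # (n_sims, K)
        died = rng.random(n_sims) < death_prob(age)   # (n_sims,)
        new_success = np.zeros(n_buckets)
        best = np.zeros(n_buckets, dtype=int)
        for i in range(1, n_buckets):                 # bucket 0 stays a failure
            # Step 3: next-year wealth after returns and spending.
            next_wealth = np.maximum(wealth_grid[i] * growth - spend, 0.0)
            bucket = np.searchsorted(wealth_grid, next_wealth, side="right") - 1
            # Step 4: death counts as success; otherwise look up T+1.
            outcome = np.where(died[:, None], 1.0, success[bucket])
            # Steps 5 and 6: average per portfolio and keep the best one.
            rate = outcome.mean(axis=0)
            best[i] = rate.argmax()
            new_success[i] = rate[best[i]]
        success = new_success
        policy[age] = allocations[best]
    return policy, success
```

The returned `policy` maps each age to the chosen allocation for every wealth bucket, and `success` holds the estimated success rate per bucket at the earliest age.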

As a technical side-note, we should mention that exploring all possible portfolio configurations is a computationally taxing exercise, as would be an optimization-based approach. To circumvent this, we employ a quasi-random low-discrepancy sequence generator known as a Sobol sequence. This process allows us to generate 100 samples that efficiently span the space of a 4-dimensional unit hypercube. We can then normalize these samples and use them as our sample allocations.
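As a rough illustration of the sampling step, the sketch below uses SciPy's `scipy.stats.qmc.Sobol` generator. Two details are our assumptions rather than a description of the original implementation: we draw 128 points instead of 100, because Sobol sequences retain their balance properties at powers of two, and we scramble the sequence so that no point sits exactly at the origin, which could not be normalized.

```python
import numpy as np
from scipy.stats import qmc


def sample_allocations(n_assets=4, m=7):
    """Draw 2**m Sobol points in the unit hypercube and normalize to sum to 1."""
    sampler = qmc.Sobol(d=n_assets, scramble=True)
    points = sampler.random_base2(m=m)  # shape (2**m, n_assets), values in [0, 1)
    return points / points.sum(axis=1, keepdims=True)


allocations = sample_allocations()
print(allocations[:5])
```

Note that normalizing hypercube samples does not produce a perfectly uniform distribution over portfolios, but it does give a well-spread set of candidate allocations.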

If that all sounded like gibberish, the main thrust is this: we’re not really checking every single portfolio configuration, but trying to use a large enough sample to capture most of them.

By working backwards, we can tackle what would be an otherwise computationally intractable problem. In effect, we are saying, “if we know the optimal decision at time T+1, we can use that knowledge to guide our decision at time T.”

This methodology also allows us to recognize that the relative wealth level to spending level is important. For example, having $2,000,000 at age 70 with a $40,000 real spending rate is very different from having $500,000, and we would expect the optimal allocation to be different.

Consider the two extremes. The first extreme is we have an excess of wealth. In this case, since we are optimizing to maximize the probability of success, the result will be to take no risk and hold a significant amount of T-Bills. If, however, we had optimized to acknowledge a desire to bequeath wealth to the next generation, you would likely see the opposite extreme: with little risk of failure, you can load up on stocks and try to maximize growth.

The second extreme is having a significant dearth of wealth. In this case, we would expect to see the optimizer recommend a significant amount of stocks, since the safer assets will likely guarantee failure while the risky assets provide a lottery’s chance of success.

The Results

To plot the results both over time as well as over the different wealth levels, we have to plot each asset individually, which we do below. As an example of how to read these graphs, below we can see that in the table for U.S. equities, at age 74 and a $1,600,000 wealth level, the glide path would recommend an 11% allocation to U.S. equities.

A few features we can identify:

When there is little chance of success, the glide path tilts towards equities as a potential lottery ticket.
When there is a near guarantee of success, the glide path completely de-risks.
While we would expect a smooth transition in these glide paths, there are a few artifacts in the table (e.g. U.S. equities with $200,000 wealth at age 78). This may be due to a particular set of return samples that cascade through the tables. Or, because the trend following strategy can exhibit nearly identical returns to U.S. equities over a number of periods, we can see periods where the trend strategy received weight instead of equities (e.g. $400,000 wealth level at age 96 or $200,000 at 70).

Ignoring the data artifacts, we can broadly see that trend following seems to receive a fairly healthy weight in the earlier years of retirement and at wealth levels where capital preservation is critical, but growth cannot be entirely sacrificed. For example, we can see that an investor with $1,000,000 at age 60 would allocate approximately 30% of their portfolio to a trend following strategy.

Note that the initially assumed $40,000 consumption level aligns with the generally recommended 4% withdrawal assumption. In other words, the levels here are less important than their size relative to desired spending.

It is also worth pointing out again that this analysis uses historical returns. Hence, we see a large allocation to T-Bills which, once upon a time, offered a reasonable rate of return. This may not be the case going forward.

Conclusion

Financial theory generally assumes that the order of returns is not important to investors. Any investor contributing or withdrawing from their investment portfolio, however, is dramatically affected by the order of returns. It is much better to save before a large gain or spend before a large loss.

For investors in retirement who are making frequent and consistent withdrawals from their portfolios, sequence risk manifests itself in the presence of large and prolonged drawdowns. Strategies that can help avoid these losses are, therefore, potentially very valuable.

This is the basis of the traditional glidepath. By de-risking the portfolio over time, investors become less sensitive to sequence risk. However, as bond yields remain low and investor life expectancy increases, investors may need to rely more heavily on higher volatility growth assets to avoid running out of money.

To explore these concepts, we have built our own glide path using four assets: broad U.S. equities, 10-year U.S. Treasuries, U.S. T-Bills, and a trend following strategy. Not surprisingly, we find that trend following commands a significant allocation, particularly in the years and wealth levels where sequence risk is highest, and that this allocation often comes in lieu of equities themselves.

Beyond recognizing the potential value-add of trend following, however, an important second takeaway may be that there is room for significant value-add in going beyond traditional target-date-based glide paths for investors.
