The Research Library of Newfound Research


The State of Risk Management

This post is available as a PDF download here.

Summary

  • We compare and contrast different approaches to risk-managing equity exposure (including fixed income, risk parity, managed futures, tactical equity, and options-based strategies) over the last 20 years.
  • We find that all eight strategies studied successfully reduce risk, while six of the eight strategies improve risk-adjusted returns. The lone exceptions are two options-based strategies that involve being long volatility and therefore are on the wrong side of the volatility risk premium.
  • Over time, performance of the risk management strategies varies significantly both relative to the S&P 500 and compared to the other strategies. Generally, risk-managed strategies tend to behave like insurance, underperforming on the upside and outperforming on the downside.
  • Diversifying your diversifiers by blending a number of complementary risk-managed strategies together can be a powerful method of improving long-term outcomes. The diversified approach to risk management shows promise in terms of reducing sequence risk for those investors nearing or in retirement.

I was perusing Twitter the other day and came across this tweet from Jim O’Shaughnessy, legendary investor and author of What Works on Wall Street.

As always, Jim's wisdom is invaluable.  But what does this idea mean for Newfound as a firm?  Our first focus is on managing risk.  As a result, one of the questions that we MUST know the answer to is how to get more investors comfortable with sticking to a risk management plan through a full market cycle.

Unfortunately, performance chasing seems to us to be just as prevalent in risk management as it is in investing as a whole.  The benefits of giving up some upside participation in exchange for downside protection seemed like a no-brainer in March of 2009.  After 8+ years of strong equity market returns (although it hasn't always been as smooth a ride as the market commentators may make you think), the juice may not quite seem worth the squeeze.

While we certainly don't profess to know the answer to our burning question from above, we do think the first step towards finding one is a thorough understanding of the risk management landscape.  In that vein, this week we will update our State of Risk Management presentation from early 2016.

We examine eight strategies that roughly fit into four categories:

  • Diversification Strategies: strategic 60/40 stock/bond mix1 and risk parity2
  • Options Strategies: equity collar3, protective put4, and put-write5
  • Equity Strategies: long-only defensive equity that blends a minimum volatility strategy6, a quality strategy7, and a dividend growth strategy8 in equal weights
  • Trend-Following Strategies: managed futures9 and tactical equity10

The Historical Record

We find that over the period studied (December 1997 to July 2018) six of the eight strategies outperform the S&P 500 on a risk-adjusted basis both when we define risk as volatility and when we define risk as maximum drawdown.  The two exceptions are the equity collar strategy and the protective put strategy.  Both of these strategies are net long options and therefore are forced to pay the volatility risk premium.  This return drag more than offsets the reduction of losses on the downside.

Data Source: Bloomberg, CSI. Calculations by Newfound Research. Past performance does not guarantee future results. Volatility is a statistical measure of the amount of variation around the average returns for a security or strategy. All returns are hypothetical index returns. You cannot invest directly in an index and unmanaged index returns do not reflect any fees, expenses, sales charges, or trading expenses. Index returns include the reinvestment of dividends. No index is meant to measure any strategy that is or ever has been managed by Newfound Research. The Tactical Equity strategy was constructed by Newfound in August 2018 for purposes of this analysis and is therefore entirely backtested and not an investment strategy that is currently managed and offered by Newfound.

 

Data Source: Bloomberg, CSI. Calculations by Newfound Research. Past performance does not guarantee future results. Drawdown is a statistical measure of the losses experienced by a security or strategy relative to its historical maximum. The maximum drawdown is the largest drawdown over the security or strategy’s history. All returns are hypothetical index returns. You cannot invest directly in an index and unmanaged index returns do not reflect any fees, expenses, sales charges, or trading expenses. Index returns include the reinvestment of dividends. No index is meant to measure any strategy that is or ever has been managed by Newfound Research. The Tactical Equity strategy was constructed by Newfound in August 2018 for purposes of this analysis and is therefore entirely backtested and not an investment strategy that is currently managed and offered by Newfound.

 

Not Always a Smooth Ride

While it would be nice if this outperformance accrued steadily over time, reality is quite a bit messier.  All eight strategies exhibit significant variation in their rolling one-year returns vs. the S&P 500.  Interestingly, the two strategies with the widest ranges of historical one-year performance vs. the S&P 500 are also the two strategies that have delivered the most downside protection (as measured by maximum drawdown).  Yet another reminder that there is no free lunch in investing.  The more aggressively you wish to reduce downside capture, the more short-term tracking error you must endure.

Relative 1-Year Performance vs. S&P 500 (December 1997 to July 2018)

Data Source: Bloomberg, CSI. Calculations by Newfound Research. Past performance does not guarantee future results. All returns are hypothetical index returns. You cannot invest directly in an index and unmanaged index returns do not reflect any fees, expenses, sales charges, or trading expenses. Index returns include the reinvestment of dividends. No index is meant to measure any strategy that is or ever has been managed by Newfound Research. The Tactical Equity strategy was constructed by Newfound in August 2018 for purposes of this analysis and is therefore entirely backtested and not an investment strategy that is currently managed and offered by Newfound.

 

Thinking of Risk Management as (Uncertain) Portfolio Insurance

When we examine this performance dispersion across different market environments, we find a totally intuitive result: risk management strategies generally underperform the S&P 500 when stocks advance and outperform the S&P 500 when stocks decline.  The hit rate for the risk management strategies relative to the S&P 500 is 81.2% in the four years that the S&P 500 was down (2000, 2001, 2002, and 2008) and 19.8% in the seventeen years that the S&P was up.

In this way, risk management strategies are akin to insurance.  A premium, in the form of upside capture ratios less than 100%, is paid in exchange for a (hopeful) reduction in downside capture.

With this perspective, it's totally unsurprising that these strategies have underperformed since the market bottomed during the global financial crisis.  Seven of the eight strategies (with the long-only defensive equity strategy being the lone exception) underperformed the S&P 500 on an absolute return basis, and six of the eight strategies (with defensive equity and the 60/40 stock/bond blend being the exceptions) underperformed on a risk-adjusted basis.

Annual Out/Underperformance Relative to S&P 500 (December 1997 to July 2018)

Data Source: Bloomberg, CSI. Calculations by Newfound Research. Past performance does not guarantee future results. All returns are hypothetical index returns. You cannot invest directly in an index and unmanaged index returns do not reflect any fees, expenses, sales charges, or trading expenses. Index returns include the reinvestment of dividends. No index is meant to measure any strategy that is or ever has been managed by Newfound Research. The Tactical Equity strategy was constructed by Newfound in August 2018 for purposes of this analysis and is therefore entirely backtested and not an investment strategy that is currently managed and offered by Newfound.

 

Diversifying Your Diversifiers

The good news is that there is significant year-to-year variation in the performance across strategies, as evidenced by the periodic table of returns above, suggesting there are diversification benefits to be harvested by allocating to multiple risk management strategies.  The average annual performance differential between the best performing strategy and the worst performing strategy is 20.0%.  This spread was less than 10% in only 3 of the 21 years studied.

We see the power of diversifying your diversifiers when we test simple equal-weight blends of the risk management strategies.  Both blends have higher Sharpe Ratios than seven of the eight individual strategies and higher excess return-to-drawdown ratios than six of the eight individual strategies.

This is a very powerful result, indicating that naïve diversification is nearly as good as being able to pick the best individual strategies with perfect foresight.

Data Source: Bloomberg, CSI. Calculations by Newfound Research. Past performance does not guarantee future results. All returns are hypothetical index returns. You cannot invest directly in an index and unmanaged index returns do not reflect any fees, expenses, sales charges, or trading expenses. Index returns include the reinvestment of dividends. No index is meant to measure any strategy that is or ever has been managed by Newfound Research. The Tactical Equity strategy was constructed by Newfound in August 2018 for purposes of this analysis and is therefore entirely backtested and not an investment strategy that is currently managed and offered by Newfound.

 

Why Bother with Risk Management in the First Place?

As we’ve written about previously, we believe that for most investors investing “failure” means not meeting one’s financial objectives.  In the portfolio management context, failure comes in two flavors.  “Slow” failure results from taking too little risk, while “fast” failure results from taking too much risk.

In his book, Red-Blooded Risk, Aaron Brown summed up this idea nicely: "Taking less risk than is optimal is not safer; it just locks in a worse outcome.  Taking more risk than is optimal also results in a worse outcome, and often leads to complete disaster."

Risk management is not synonymous with risk reduction.  It is about taking the right amount of risk, not too much or too little.

Having a pre-defined risk management plan in place before a crisis can help investors avoid panicked decisions that can turn a bad, but survivable event into catastrophe (e.g. the retiree that sells all of his equity exposure in early 2009 and then stays out of the market for the next five years).

It’s also important to remember that individuals are not institutions.  They have a finite investment horizon.  Those that are at or near retirement are exposed to sequence risk, the risk of experiencing a bad investment outcome at the wrong time.

We can explore sequence risk using Monte Carlo simulation.  We start by assessing the S&P 500 with no risk management overlay and assume a 30-year retirement horizon.  The simulation process works as follows (a short code sketch follows the list):

  1. Randomly choose a sequence of 30 annual returns from the set of actual annual returns over the period we studied (December 1998 to July 2018).
  2. Adjust returns for inflation.
  3. For the sequence of returns chosen, calculate the perfect withdrawal rate (PWR). Clare et al. (2016) define the PWR as "the withdrawal rate that effectively exhausts wealth at death (or at the end of a fixed, known period) if one had perfect foresight of all returns over the period."11
  4. Return to #1, repeating 1000 times in total.
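
To make the mechanics concrete, below is a minimal sketch of this simulation. The return assumptions (a normal draw standing in for the actual S&P 500 annual real returns over the studied period) and the start-of-year withdrawal convention are simplifications for illustration only, not the exact inputs used to generate the charts that follow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the set of realized annual real returns over the studied period;
# the analysis itself resamples from the actual S&P 500 history.
annual_real_returns = rng.normal(0.05, 0.18, 21)

def perfect_withdrawal_rate(returns):
    """PWR: the constant withdrawal (as a fraction of starting wealth) that exactly
    exhausts wealth by the end of the horizon, assuming withdrawals at the start of
    each year and perfect foresight of the return sequence."""
    growth = np.cumprod((1 + returns)[::-1])[::-1]   # growth[t] = product of (1 + r) from year t onward
    return growth[0] / growth.sum()

# Monte Carlo over 30-year retirement horizons (steps 1 through 4 above)
pwrs = np.array([
    perfect_withdrawal_rate(rng.choice(annual_real_returns, size=30, replace=True))
    for _ in range(1000)
])

print(pwrs.mean())                          # average PWR
print(np.percentile(pwrs, [2.5, 97.5]))     # a 95% interval of outcomes
```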

We plot the distribution of PWRs for the S&P 500 below.  While the average PWR is a respectable 5.7%, the range of outcomes is very wide (0.6% to 14.7%).  The 95 percent confidence interval around the mean is 2.0% to 10.3%.  This is sequence risk.  Unfortunately, investors do not have the luxury of experiencing the average; they only see one draw.  Get lucky and you may get to fund a better lifestyle than you could have imagined with little to no financial stress.  Get unlucky and you may have trouble paying the bills and will be sweating every market move.

Calculations by Newfound Research. Past performance does not guarantee future results. All returns are hypothetical index returns. You cannot invest directly in an index and unmanaged index returns do not reflect any fees, expenses, sales charges, or trading expenses. Index returns include the reinvestment of dividends.

 

Next, we repeat the simulation, replacing the pure S&P 500 exposure with the equal-weight blend of risk management strategies excluding the equity collar and the protective put.  We see quite a different result.  The average PWR is similar (6.2% vs. 5.7%), but the range of outcomes is much smaller (95% confidence interval from 4.4% to 8.1%).  At its very core, this is what implementing a risk management plan is all about: reducing the role of investment luck in financial planning.  We give up some of the best outcomes (in the right tail of the S&P 500 distribution) in exchange for reducing the probability of the very worst outcomes (in the left tail).

Calculations by Newfound Research. Past performance does not guarantee future results. All returns are hypothetical index returns. You cannot invest directly in an index and unmanaged index returns do not reflect any fees, expenses, sales charges, or trading expenses. Index returns include the reinvestment of dividends.

Conclusion

There is no holy grail when it comes to risk management.  While a number of approaches have historically delivered strong results, each comes with its own pros and cons.

In an uncertain world where we cannot predict exactly what the next crisis will look like, diversifying your diversifiers by combining a number of complementary risk-managed strategies may be a prudent course of action. We believe that this type of balanced approach has the potential to deliver compelling results over a full market cycle while managing the idiosyncratic risk of any one manager or strategy.

Diversification can also help to increase the odds of an investor sticking with their risk management plan as the short-term performance lows won’t be quite as low as they would be with a single strategy (conversely, the highs won’t be as high either).

That being said, having the discipline to stick with a risk management plan also requires being realistic.  While it would be great to build a strategy with 100% upside and 0% downside, such an outcome is unrealistic.  Risk-managed strategies tend to behave a lot like uncertain insurance for the portfolio.  A premium, in the form of upside capture ratios less than 100%, is paid in exchange for a (hopeful) reduction in downside capture.  This upside underperformance is a feature, not a bug.  Trying too hard to correct it may lead to overfit strategies that fail to deliver adequate protection on the downside.

Measuring Process Diversification in Trend Following

This post is available as a PDF download here.

Summary

  • We prefer to think about diversification in a three-dimensional framework: what, how, and when.
  • The “how” axis covers the process with which an investment decision is made.
  • There are a number of models that trend-followers might use to capture a trend. For example, trend-followers might employ a time-series momentum model, a price-minus-moving-average model, or a double moving average cross-over model.
  • Beyond multiple models, each model can have a variety of parameterizations. For example, a time-series momentum model can just as equally be applied with a 3-month formation period as an 18-month period.
  • In this commentary, we attempt to measure how much diversification opportunity is available by employing multiple models with multiple parameterizations in a simple long/flat trend-following process.

When investors talk about diversification, they typically mean across different investments.  Do not just buy a single stock, for example; buy a basket of stocks in order to diversify away the idiosyncratic risk.

We call this “what” diversification (i.e. “what are you buying?”) and believe this is only one of three meaningful axes of diversification for investors.  The other two are “how” (i.e. “how are you making your decision?”) and “when” (i.e. “when are you making your decision?”).  In recent years, we have written a great deal about the “when” axis, and you can find a summary of that research in our commentary Quantifying Timing Luck.

In this commentary, we want to discuss the potential benefits of diversifying across the “how” axis in trend-following strategies.

But what, exactly, do we mean by this?  Consider that there are a number of ways investors can implement trend-following signals.  Some popular methods include:

  • Prior total returns (“time-series momentum”)
  • Price-minus-moving-average (e.g. price falls below the 200-day moving average)
  • Moving-average double cross-over (e.g. the 50-day moving average crosses the 200-day moving average)
  • Moving-average change-in-direction (e.g. the 200-day moving average slope turns positive or negative)

As it turns out, these varying methodologies are actually cousins of one another.  Recent research has established that these models can, more or less, be thought of as different weighting schemes of underlying returns.  For example, a time-series momentum model (with no skip month) derives its signal by averaging daily log returns over the lookback period equally.
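
To make this "weighting scheme" equivalence concrete, the short sketch below uses simulated daily log returns (purely illustrative) to verify that a price-minus-moving-average signal computed on log prices is simply a linearly decaying weighted sum of the same daily log returns that time-series momentum weights equally.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200
r = rng.normal(0.0003, 0.01, 1000)   # simulated daily log returns (illustrative only)
p = np.cumsum(r)                     # log prices

# time-series momentum: equal-weighted sum of the last N daily log returns
tsmom = r[-N:].sum()

# price-minus-moving-average, computed on log prices over the same window
pma = p[-1] - p[-N:].mean()

# the same signal, rewritten as a linearly decaying weighted sum of the last N-1 returns
weights = np.arange(N - 1, 0, -1) / N            # (N-1)/N, (N-2)/N, ..., 1/N
pma_from_returns = (weights * r[-1:-N:-1]).sum()

print(np.isclose(pma, pma_from_returns))          # True: just a different weighting of the same returns
```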

With this common base, a number of papers over the last decade have found significant relationships between the varying methods.  For example:

 

Evidence

  • Bruder, Dao, Richard, and Roncalli (2011): moving-average double cross-over is just an alternative weighting scheme for time-series momentum.
  • Marshall, Nguyen, and Visaltanachoti (2014): time-series momentum is related to moving-average change-in-direction.
  • Levine and Pedersen (2015): time-series momentum and moving-average cross-overs are highly related; both methods perform similarly on 58 liquid futures contracts.
  • Beekhuizen and Hallerbach (2015): mathematically linked moving averages with prior returns.
  • Zakamulin (2015): price-minus-moving-average, moving-average double cross-over, and moving-average change-of-direction can all be interpreted as computations of a weighted moving average of momentum rules.

 

As we have argued in past commentaries, we do not believe any single method is necessarily superior to another.  In fact, it is trivial to evaluate these methods over different asset classes and time-horizons and find an example that proves that a given method provides the best result.

Without a crystal ball, however, and without any economic interpretation why one might be superior to another, the choice is arbitrary.  Yet the choice will ultimately introduce randomness into our results: a factor we like to call “process risk.”  A question we should ask ourselves is, “if we have no reason to believe one is better than another, why would we pick one at all?”

We like to think of it this way: ex-post, we will know whether the return over a given period is positive or negative.  Ex-ante, all we have is a handful of trend-following signals that are forecasting that direction.  If, historically, all of these trend signals have been effective, then there may be no reason to necessarily believe one over another.

Combining them, in many ways, is sort of like trying to triangulate on the truth. We have a number of models that all look at the problem from a slightly different perspective and, therefore, provide a slightly different interpretation.  A (very) loose analogy might be using the collective information from a number of cell towers in an effort to pinpoint the geographic location of a cellphone.

We may believe that all of the trend models do a good job of identifying trends over the long run, but most will prove false from time-to-time in the short-run. By using them together, we can potentially increase our overall confidence when the models agree and decrease our confidence when they do not.

With all this in mind, we want to explore the simple question: “how much potential benefit does process diversification bring us?”

The Setup

To answer this question, we first generate a number of long/flat trend following strategies that invest in a broad U.S. equity index or the risk-free rate (both provided by the Kenneth French database and ranging from 1926 to 2018). There are 48 strategy variations in total, constructed through a combination of three different processes (time-series momentum, price-minus-moving-average, and moving-average double cross-over) and 16 different lookback periods (from the approximate equivalent of 3-to-18 months).

We then treat each of the 48 variations as its own unique asset.

To measure process diversification, we are going to use the concept of “independent bets.” The greater the number of independent bets within a portfolio, the greater the internal diversification. Below are a couple examples outlining the basic intuition for a two-asset portfolio:

  • If we have a portfolio holding two totally independent assets with similar volatility levels, a 50% allocation to each would maximize our diversification. Intuitively, we have equally allocated across two unique bets.
  • If we have a portfolio holding two totally independent assets with similar volatility levels, a 90% allocation to one asset and a 10% allocation to another would lead us to a highly concentrated bet.
  • If we have a portfolio holding two highly correlated assets, no matter the allocation split, we have a large, concentrated bet.
  • If we have a portfolio of two assets with disparate volatility levels, we will have a large concentrated bet unless the lower volatility asset comprises the vast majority of the portfolio.

To measure this concept mathematically, we are going to use the fact that the square of the “diversification ratio” of a portfolio is equal to the number of independent bets that portfolio is taking.1
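
For reference, the calculation is simple enough to sketch in a few lines. The two-asset covariance matrix below is a toy example mirroring the bullets above; in the analysis itself, the covariance matrix would be estimated from the return series of the strategy variations.

```python
import numpy as np

def independent_bets(weights, cov):
    """Number of independent bets: the square of the diversification ratio,
    i.e. (weighted-average volatility / portfolio volatility) squared."""
    weights = np.asarray(weights, dtype=float)
    vols = np.sqrt(np.diag(cov))
    portfolio_vol = np.sqrt(weights @ cov @ weights)
    return ((weights @ vols) / portfolio_vol) ** 2

cov = np.diag([0.04, 0.04])                 # two uncorrelated assets with equal volatility
print(independent_bets([0.5, 0.5], cov))    # ~2.0: two equally sized, unique bets
print(independent_bets([0.9, 0.1], cov))    # ~1.2: a highly concentrated bet
```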

Diversifying Parameterization Risk

Within process diversification, the first variable we can tweak is the formation period of our trend signal.  For example, if we are using a time-series momentum model that simply looks at the sign of the total return over the prior period, the length of that period may have a significant influence on the identification of a trend.  Intuition tells us that shorter formation periods might identify short-term trends as well as react to long-term trend changes more quickly but may be more sensitive to whipsaw risk.

To explore the diversification opportunities available to us simply by varying our formation parameterization, we build equal-weight portfolios comprised of two strategies at a time, where each strategy utilizes the same trend model but a different parameterization.  We then measure the number of independent bets in that combination.

We run this test for each trend following process independently.  As an example, we compare using a shorter lookback period with a longer lookback period in the context of time-series momentum in isolation. We will compare across models in the next section.

In the graphs below, L0 through L15 represent the lookback periods, with L0 being the shortest lookback period and L15 representing the longest lookback period.

As we might suspect, the largest increase in available bets arises from combining shorter formation periods with longer formation periods.  This makes sense, as they represent the two horizons that share the smallest proportion of data and therefore have the least "information leakage." Consider, for example, a time-series momentum signal that has a 4-month lookback and one with an 8-month lookback. At all times, 50% of the information used to derive the latter model is contained within the former model.  While the technical details are subtler, we would generally expect that the more informational overlap, the less diversification is available.

We can see that combining short- and long-term lookbacks increases the total number of bets the portfolio is taking from 1.0 to approximately 1.2.

This may not seem like a significant lift, but we should remember Grinold and Kahn’s Fundamental Law of Active Management:

Information Ratio = Information Coefficient x SQRT(Independent Bets)

Assuming the information coefficient stays the same, an increase in the number of independent bets from 1.0 to 1.2 increases our information ratio by approximately 10%.  Such is the power of diversification.
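
To see where that figure comes from, holding the information coefficient constant:

Information Ratio (1.2 bets) / Information Ratio (1.0 bets) = SQRT(1.2 / 1.0) ≈ 1.10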

Another interesting way to approach this data is by allowing an optimizer to attempt to maximize the diversification ratio.  In other words, instead of only looking at naïve, equal-weight combinations of two processes at a time, we can build a portfolio from all available lookback variations.

Doing so may provide two interesting insights.

First, we can see how the optimizer might look to combine different variations to maximize diversification.  Will it barbell long and short lookbacks, or is there benefit to including medium lookbacks? Will the different processes have different solutions?  Second, by optimizing over the full history of data, we can find an upper limit threshold to the number of independent bets we might be able to capture if we had a crystal ball.
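
For readers curious what "maximizing the diversification ratio" looks like mechanically, here is a rough sketch. The setup is generic and assumed (the covariance matrix would be estimated from the 48 trend-strategy return series) and is not the exact optimizer used to produce the graphs below.

```python
import numpy as np
from scipy.optimize import minimize

def max_diversification_weights(cov):
    """Long-only, fully invested weights that maximize the diversification ratio
    (and therefore the number of independent bets)."""
    n = cov.shape[0]
    vols = np.sqrt(np.diag(cov))

    def neg_diversification_ratio(w):
        return -(w @ vols) / np.sqrt(w @ cov @ w)

    result = minimize(
        neg_diversification_ratio,
        x0=np.full(n, 1.0 / n),                                         # start from equal weight
        bounds=[(0.0, 1.0)] * n,                                        # long-only
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],   # fully invested
    )
    return result.x

# e.g. weights = max_diversification_weights(np.cov(strategy_returns, rowvar=False)),
# where strategy_returns holds one column per lookback variation
```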

A few takeaways from the graphs above:

  • Almost all of the processes barbell short and long lookback horizons to maximize diversification.
  • The optimizer finds value, in most cases, in introducing medium-term lookback horizons as well. We can see for Time-Series MOM, the significant weights are placed on L0, L1, L6, L10, and L15.  While not perfectly spaced or equally weighted, this still provides a strong cross-section of available information.  Double MA Cross-Over, on the other hand, finds value in weighting L0, L8, and L15.
  • While the optimizer increases the number of independent bets in all cases versus a naïve, equal-weight approach, the pickup is not incredibly dramatic. At the end of the day, a crystal ball does not find a meaningfully better solution than our intuition may provide.

Diversifying Model Risk

Similar to the process taken in the above section, we will now attempt to quantify the benefits of cross-process diversification.

For each trend model, we will calculate the number of independent bets available by combining it with another trend model but hold the lookback period constant. As an example, we will combine the shortest lookback period of the Time-Series MOM model with the shortest lookback period of the MA Double Cross-Over.

We plot the results below of the number of independent bets available through a naïve, equal-weight combination.

We can see that model combinations can lift the number of independent bets by 0.05 to 0.1.  Not as significant as the theoretical lift from parameter diversification, but not totally insignificant.

Combining Model and Parameterization Diversification

We can once again employ our crystal ball in an attempt to find an upper limit to the diversification available to trend followers, as well as the process / parameterization combinations that will maximize this opportunity.  Below, we plot the results.

We see a few interesting things of note:

  • The vast majority of models and parameterizations are ignored.
  • Time-Series MOM is heavily favored as a model, receiving nearly 60% of the portfolio weight.
  • We see a spread of weight across short, medium, and long-term weights. Short-term is heavily favored, with Time-Series MOM L0 and Price-Minus MA L0 approaching nearly 45% of model weight.
  • All three models are, ultimately, incorporated, with approximately 10% being allocated to Double MA Cross-Over, 30% to Price-Minus MA, and 60% to Time-Series MOM.

It is worth pointing out that naively allocating equally across all 48 models creates 1.18 independent bets while the full-period crystal ball generated 1.29 bets.

Of course, having a crystal ball is unrealistic.  Below, we look at a rolling window optimization that looks at the prior 5 years of weekly returns to create the most diversified portfolio.  To avoid plotting a graph with 48 different components, we have plotted the results two ways: (1) clustered by process and (2) clustered by lookback period.

Using the rolling window, we see similar results as we saw with the crystal ball. First, Time-Series MOM is largely favored, often peaking well over 50% of the portfolio weights.  Second, we see that a barbelling approach is frequently employed, balancing allocations to the shortest lookbacks (L0 and L1) with the longest lookbacks (L14 and L15).  Mid-length lookbacks are not outright ignored, however, and L5 through L11 combined frequently make up 20% of the portfolio.

Finally, we can see that the rolling number of bets is highly variable over time, but optimization frequently creates a meaningful impact over an equal-weight approach.2

Conclusion

In this commentary, we have explored the idea of process diversification.  In the context of a simple long/flat trend-following strategy, we find that combining strategies that employ different trend identification models and different formation periods can lead to an increase in the independent number of bets taken by the portfolio.

As it specifically pertains to trend-following, we see that diversification appears to be maximized by allocating across a number of lookback horizons, with an optimizer putting a particular emphasis on barbelling shorter and longer lookback periods.

We also see that incorporating multiple processes can increase available diversification as well.  Interestingly, the optimizer did not equally diversify across models.  This may be due to the fact that these models are less independent from one another than they might seem.  For example, Zakamulin (2015) demonstrated that these models can all be decomposed into different weighted averages of the same general momentum rules.

Finding process diversification, then, might require moving to a process that may not have a common basis.  For example, trend followers might consider channel methods or a change in basis (e.g. constant volume bars instead of constant time bars).

Momentum’s Magic Number

This post is available as a PDF download here.

Summary

  • In HIMCO’s May 2018 Quantitative Insight, they publish a figure that suggests the optimal holding length of a momentum strategy is a function of the formation period.
  • Specifically, the result suggests that the optimal holding period is one selected such that the formation period plus the holding period is equal to 14-to-18 months: a somewhat “magic” result that makes little intuitive, statistical, or economic sense.
  • To investigate this result, we construct momentum strategies for country indices as well as industry groups.
  • We find similar results, with performance peaking when the formation period plus the holding period is equal to 12-to-14 months.
  • While lacking a specific reason why this effect exists, it suggests that investors looking to leverage shorter-term momentum signals may benefit from longer investment horizons, particularly when costs are considered.

A few weeks ago, we came across a study published by HIMCO on momentum investing1.  Contained within this research note was a particularly intriguing exhibit.

Source: HIMCO Quantitative Insights, May 2018

What this figure demonstrates is that the excess cumulative return for U.S. equity momentum strategies peaks as a function of both formation period and holding period.  Specifically, the returns appear to peak when the sum of the formation and holding period is between 14-18 months.

For example, if you were to form a portfolio based upon trailing 6-1 momentum – i.e. ranking on the prior 6-month total returns and skipping the most recent month (labeled in the figure above as “2_6”) – this evidence suggests that you would want to hold such a portfolio for 8-to-12 months (labeled in the figure above as 14-to-18 months since the beginning of the uptrend).

Which is a rather odd conclusion.  Firstly, we would intuitively expect that we should employ holding periods that are shorter than our formation periods.  The notion here is that we want to use enough data to harvest information that will be stationary over the next, smaller time-step.  So, for example, we might use 36 months of returns to create a covariance matrix that we might hold constant for the next month (i.e. a 36-month formation period with a 1-month hold).  Given that correlations are non-stable, we would likely find the idea of using 1-month of data to form a correlation matrix we hold for the next 36-months rather ludicrous.

And, yet, here we are in a similar situation, finding that if we use a formation period of 5 months, we should hold our portfolio steady for the next 8-to-10 months.  And this is particularly weird in the world of momentum, which we typically expect to be a high turnover strategy.  How in the world can having a holding period longer than our formation period make sense when we expect information to quickly decay in value?

Perhaps the oddest thing of all is the fact that all these results center around 14-18 months.  It would be one thing if the conclusion was simply, “holding for six months after formation is optimal”; here the conclusion is that the optimal holding period is a function of formation period.  Nor is the conclusion something intuitive, like “the holding period should be half the formation period.”

Rather, the result – that the holding period should be 14-to-18 months minus the length of the formation period – makes little intuitive, statistical, or economic sense.

Out-of-Sample Testing with Countries and Sectors

In an effort to explore this result further, we wanted to determine whether similar results were found when cross-sectional momentum was applied to country indices and industry groups.

Specifically, we ran three tests.

In the first, we constructed momentum portfolios using developed country index returns (U.S. dollar denominated; net of withholding taxes) from MSCI.  The countries included in the test are: Australia, Austria, Belgium, Canada, Denmark, Finland, France, Germany, Hong Kong, Ireland, Israel, Italy, Japan, Netherlands, New Zealand, Norway, Portugal, Singapore, Spain, Sweden, Switzerland, the United Kingdom, and the United States of America.  The data extends back to 12/1969.

In the second, we constructed momentum portfolios using the 12 industry group data set from the Kenneth French Data Library.  The data extends back to 7/1926.

In the third, we constructed momentum portfolios using the 49 industry group data set from the Kenneth French Data Library.  The data extends back to 7/1926.

For each data set, we ran the same test:

  • Vary formation periods from 5-1 to 12-1 months.
  • Vary holding periods from 1-to-26 months.
  • Using this data, construct dollar-neutral long/short portfolios that go long, in equal-weight, the top third ranking holdings and go short, in equal-weight, the bottom third.

Note that for holding periods exceeding 1 month, we employed an overlapping portfolio construction process.
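
A rough sketch of this construction is below. The exact skip-month and overlapping-portfolio conventions (a two-month signal lag and a Jegadeesh-Titman-style average of lagged sub-strategies) are assumptions made for illustration and may differ in detail from the construction used to produce the results.

```python
import numpy as np
import pandas as pd

def momentum_long_short(returns: pd.DataFrame, formation: int = 6, holding: int = 1) -> pd.Series:
    """Dollar-neutral momentum: long the top third, short the bottom third, equal-weight.
    `returns` holds monthly returns with one column per country or industry group.
    Holding periods beyond one month are approximated with overlapping portfolios:
    the month-t return averages `holding` sub-strategies whose signals are lagged
    by 0, 1, ..., holding - 1 months."""
    # formation signal: trailing total return (the shift below skips the most recent month)
    mom = (1 + returns).rolling(formation).apply(np.prod, raw=True) - 1

    def one_period(signal_lag: int) -> pd.Series:
        signal = mom.shift(signal_lag)
        out = pd.Series(index=returns.index, dtype=float)
        for t in returns.index:
            ranked = signal.loc[t].dropna()
            if len(ranked) < 3:
                continue
            n = len(ranked) // 3
            longs, shorts = ranked.nlargest(n).index, ranked.nsmallest(n).index
            out.loc[t] = returns.loc[t, longs].mean() - returns.loc[t, shorts].mean()
        return out

    subs = [one_period(2 + lag) for lag in range(holding)]
    return pd.concat(subs, axis=1).mean(axis=1)
```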

Below we plot the results.

Source: MSCI and Kenneth French Data Library. Calculations by Newfound Research. Past performance is not a predictor of future results.  All information is backtested and hypothetical and does not reflect the actual strategy managed by Newfound Research.  Performance is net of all fees except for underlying ETF expense ratios.  Returns assume the reinvestment of all dividends, capital gains, and other earnings.

 

While the results are not as clear as those published by HIMCO, we still see an intriguing effect: returns peak as a function of both formation and holding period. For the country strategy, formation and holding appear to peak between 12-14 months, indicating that an investor using 5-1 month signals would want to hold for 7 months while an investor using 12-1 signals would only want to hold for 1 month.

For the industry data, the results are less clear.  Where the HIMCO and country results exhibited a clear “peak,” the industry results simply seem to “decay slower.”  In particular, we can see in the results for the 12-industry group test that almost all strategies peak with a 1-month holding period.  However, they all appear to fall off rapidly, and uniformly, after the time where formation plus holding period exceeds 16 months.

While less pronounced, it is worth pointing out that this result is achieved without the consideration of trading costs or taxes.  So, while the 5-1, 12-industry group strategy's return may peak with a 1-month hold, we can see that it later forms a second peak at a 9-month hold ("14 months since beginning uptrend").  Given that we would expect a nine-month hold to exhibit considerably less trading, analysis that includes trading cost estimates may exhibit even greater peakedness in the results.

Does the Effect Persist for Long-Only Portfolios?

In analyzing factors, it is often important to try to determine whether a given result is arising from an effect found in the long leg or the short leg.  After all, most investors implement strategies in a long-only capacity.  While long-only strategies are, technically, equal to a benchmark plus a dollar-neutral long/short portfolio2, the long/short portfolio rarely reflects the true factor definition.

Therefore, we want to evaluate long-only construction to determine whether the same result holds, or whether it is a feature of the short-leg.

Source: MSCI and Kenneth French Data Library. Calculations by Newfound Research. Past performance is not a predictor of future results.  All information is backtested and hypothetical and does not reflect the actual strategy managed by Newfound Research.  Performance is net of all fees except for underlying ETF expense ratios.  Returns assume the reinvestment of all dividends, capital gains, and other earnings.

We find incredibly similar results.  Again, country indices appear to peak between 12-to-14 months after the beginning of the uptrend.  Industry group results, while not as strong as country results, still appear to offer fairly flat results until 12-to-14 months after the beginning of the uptrend.  Taken together, it appears that this result is sustained for long-only portfolio implementations as well.

Conclusion

Traditionally, momentum is considered a high turnover factor.  Relative ranking of recent returns can vary substantially over time and our intuition would lead us to expect that the shorter the horizon we use to measure returns, the shorter the time we expect the relative ranking to persist.

Yet recent research published by HIMCO finds this intuition may not be true.  Rather, they find that momentum portfolio performance tends to peak 14-to-18 months after the beginning of the uptrend being measured. In other words, a portfolio formed on prior 5-month returns should be held for 9-to-13 months, while a portfolio formed on the prior 12 months of returns should only be held for 2-to-6 months.

This result is rather counter-intuitive, as we would expect that shorter formation periods would require shorter holding periods.

We test this result out-of-sample, constructing momentum portfolios using country indices, 12-industry group indices, and 49-industry group indices. We find a similar result in this data. We then further test whether the result is an artifact found only in long/short implementations or whether this information is useful for long-only investors.  Indeed, we find very similar results for long-only implementations.

Precisely why this result exists is still up in the air.  One argument may be that the trade-off is ultimately centered around win rate versus the size of winners.  If relative momentum tends to persist for only 12-to-18 months total, then using a 12-month formation period may give us a higher win rate but reduce the size of the winners we pick.  Conversely, using a shorter formation period may reduce the number of winners we pick correctly (i.e. lower win rate), but those we pick have further to run. Selecting a formation period and a holding period such that their sum equals approximately 14 months may simply be a hack to find the balance of win rate and win size that maximizes return.

 


 

The New Glide Path

This post is available as a PDF download here.

Summary

  • In practice, investors and institutions alike have spending patterns that make the sequence of market returns a relevant risk factor.
  • All else held equal, investors would prefer to make contributions before large returns and withdrawals before large declines.
  • For retirees making constant withdrawals, sustained declines in portfolio value represent a significant risk. Trend-following has demonstrated historical success in helping reduce the risk of these types of losses.
  • Traditionally, stock/bond glide paths have been used to control sequence risk. However, trend-following may be able to serve as a valuable hybrid between equities and bonds and provide a means to diversify our diversifiers.
  • Using backward induction and a number of simplifying assumptions, we generate a glide path based upon investor age and level of wealth.
  • We find that trend-following receives a significant allocation – largely in lieu of equity exposure – for investors early in retirement and whose initial consumption rate closely reflects the 4% level.

In past commentaries, we have written at length about investor sequence risk. Summarized simply, sequence risk is the sensitivity of investor goals to the sequence of market returns.  In finance, we traditionally assume the sequence of returns does not matter.  However, for investors and institutions that are constantly making contributions and withdrawals, the sequence can be incredibly important.

Consider for example, an investor who retires with $1,000,000 and uses the traditional 4% spending rule to allocate a $40,000 annual withdrawal to themselves. Suddenly, in the first year, their portfolio craters to $500,000.  That $40,000 no longer represents just 4%, but now it represents 8%.

Significant drawdowns and fixed withdrawals mix like oil and water.

Sequence risk is the exact reason why traditional glide paths have investors de-risk their portfolios over time from growth-focused, higher volatility assets like equities to traditionally less volatile assets, like short-duration investment grade fixed income.

Bonds, however, are not the only way investors can manage risk.  There are a variety of other methods, and frequent readers will know that we are strong advocates for the incorporation of trend-following techniques.

But how much trend-following should investors use?  And when?

That is exactly what this commentary aims to explore.

Building a New Glidepath

In many ways, this is a very open-ended question.  As a starting point, we will create some constraints that simplify our approach:

  1. The assets we will be limited to are broad U.S. equities, a trend-following strategy applied to U.S. equities, a 10-year U.S. Treasury index, and a U.S. Treasury Bill index.
  2. In any simulations we perform, we will use resampled historical returns.
  3. We assume an annual spend rate of $40,000 growing at 3.5% per year (the historical rate of annualized inflation over the period).
  4. We assume our investor retires at 60.
  5. We assume a male investor and use the Social Security Administration’s 2014 Actuarial Life Table to estimate the probability of death.

Source: St. Louis Federal Reserve and Kenneth French Database.  Past performance is hypothetical and backtested.  Trend Strategy is a simple 200-day moving average cross-over strategy that invests in U.S. equities when the price of U.S. equities is above its 200-day moving average and in U.S. T-Bills otherwise.  Returns are gross of all fees and assume the reinvestment of all dividends.  None of the equity curves presented here represent a strategy managed by Newfound Research. 

To generate our glide path, we will use a process of backwards induction similar to that proposed by Gordon Irlam in his article Portfolio Size Matters (Journal of Personal Finance, Vol 13 Issue 2). The process works thusly:

  1. Starting at age 100, assume a success rate of 100% for all wealth levels except for $0, which has a 0% success rate.
  2. Move back in time 1 year and generate 10,000 1-year return simulations.
  3. For each possible wealth level and each possible portfolio configuration of the four assets, use the 10,000 simulations to generate 10,000 possible future wealth levels, subtracting the inflation-adjusted annual spend.
  4. For a given simulation, use standard mortality tables to determine if the investor died during the year. If he did, set the success rate to 100% for that simulation. Otherwise, set the success rate to the success rate of the wealth bucket the simulation falls into at T+1.
  5. For the given portfolio configuration, set the success rate as the average success rate across all simulations.
  6. For the given wealth level, select the portfolio configuration that maximizes success rate.
  7. Return to step 2.

As a technical side-note, we should mention that exploring all possible portfolio configurations is a computationally taxing exercise, as would be an optimization-based approach.  To circumvent this, we employ a quasi-random low-discrepancy sequence generator known as a Sobol sequence.  This process allows us to generate 100 samples that efficiently span the space of a 4-dimensional unit hypercube.  We can then normalize these samples and use them as our sample allocations.

If that all sounded like gibberish, the main thrust is this: we’re not really checking every single portfolio configuration, but trying to use a large enough sample to capture most of them.
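
For the curious, here is a compact sketch of that backward-induction loop. Everything in it is illustrative only: the wealth grid, the normal return draws (the analysis above resamples historical returns instead), the assumed asset means and volatilities, and the stand-in mortality curve. The Sobol-sampled allocations mirror the approach just described.

```python
import numpy as np
from scipy.stats import qmc

rng = np.random.default_rng(42)

# --- illustrative inputs (placeholders, not the actual data used above) ---
ages        = np.arange(60, 101)                    # retire at 60, terminal age 100
wealth_grid = np.arange(0, 4_000_001, 100_000)      # discretized wealth levels
base_spend  = 40_000
inflation   = 0.035
n_sims      = 2_000                                  # the analysis above uses 10,000
asset_mean  = np.array([0.09, 0.08, 0.05, 0.03])     # equities, trend, 10-year UST, T-bills (assumed)
asset_vol   = np.array([0.18, 0.12, 0.07, 0.01])
p_death     = {age: min(0.005 * 1.09 ** (age - 60), 1.0) for age in ages}  # stand-in for the SSA table

# candidate allocations: Sobol samples over the 4-d unit hypercube, normalized to sum to one
samples     = qmc.Sobol(d=4, scramble=True, seed=42).random(128)
allocations = samples / samples.sum(axis=1, keepdims=True)

# success[i]: probability of never exhausting wealth given wealth_grid[i] at the current age
success    = np.where(wealth_grid > 0, 1.0, 0.0)     # step 1: boundary condition at age 100
glide_path = {}

for age in ages[::-1][1:]:                           # work backwards from 99 to 60
    spend      = base_spend * (1 + inflation) ** (age - ages[0])
    asset_rets = rng.normal(asset_mean, asset_vol, size=(n_sims, 4))   # step 2 (resampled history above)
    died       = rng.random(n_sims) < p_death[age]
    port_rets  = asset_rets @ allocations.T          # (n_sims, n_allocations)
    new_success = np.zeros_like(success)
    best_allocs = np.zeros((len(wealth_grid), 4))
    for i, wealth in enumerate(wealth_grid):
        if wealth <= 0:
            continue
        end_wealth = max(wealth - spend, 0.0) * (1 + port_rets)        # step 3: spend, then grow
        idx   = np.clip(np.searchsorted(wealth_grid, end_wealth), 0, len(wealth_grid) - 1)
        rates = np.where(died[:, None], 1.0, success[idx])             # step 4: death counts as success
        avg   = rates.mean(axis=0)                                     # step 5: average across simulations
        best  = avg.argmax()                                           # step 6: best configuration
        new_success[i], best_allocs[i] = avg[best], allocations[best]
    success, glide_path[age] = new_success, best_allocs                # step 7: move back another year
```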

By working backwards, we can tackle what would be an otherwise computationally intractable problem.  In effect, we are saying, “if we know the optimal decision at time T+1, we can use that knowledge to guide our decision at time T.”

This methodology also allows us to recognize that the level of wealth relative to the level of spending is important.  For example, having $2,000,000 at age 70 with a $40,000 real spending rate is very different than having $500,000, and we would expect that the optimal allocations would differ.

Consider the two extremes.  The first extreme is that we have an excess of wealth.  In this case, since we are optimizing to maximize the probability of success, the result will be to take no risk and hold a significant amount of T-Bills.  If, however, we had optimized to acknowledge a desire to bequeath wealth to the next generation, you would likely see the opposite extreme: with little risk of failure, you can load up on stocks and try to maximize growth.

The second extreme is having a significant dearth of wealth.   In this case, we would expect to see the optimizer recommend a significant amount of stocks, since the safer assets will likely guarantee failure while the risky assets provide a lottery’s chance of success.

The Results

To plot the results both over time as well as over the different wealth levels, we have to plot each asset individually, which we do below.  As an example of how to read these graphs, below we can see that in the table for U.S. equities, at age 74 and a $1,600,000 wealth level, the glide path would recommend an 11% allocation to U.S. equities.

A few features we can identify:

  • When there is little chance of success, the glide path tilts towards equities as a potential lottery ticket.
  • When there is a near guarantee of success, the glide path completely de-risks.
  • While we would expect a smooth transition in these glide paths, there are a few artifacts in the table (e.g. U.S. equities with $200,000 wealth at age 78). This may be due to a particular set of return samples that cascade through the tables.  Or, because the trend following strategy can exhibit nearly identical returns to U.S. equities over a number of periods, we can see periods where the trend strategy received weight instead of equities (e.g. $400,000 wealth level at age 96 or $200,000 at 70).

Ignoring the data artifacts, we can broadly see that trend following seems to receive a fairly healthy weight in the earlier years of retirement and at wealth levels where capital preservation is critical, but growth cannot be entirely sacrificed.  For example, we can see that an investor with $1,000,000 at age 60 would allocate approximately 30% of their portfolio to a trend following strategy.

Note that the initially assumed $40,000 consumption level aligns with the generally recommended 4% withdrawal assumption.  In other words, the levels here are less important than their size relative to desired spending.

It is also worth pointing out again that this analysis uses historical returns.  Hence, we see a large allocation to T-Bills which, once upon a time, offered a reasonable rate of return.  This may not be the case going forward.

Conclusion

Financial theory generally assumes that the order of returns is not important to investors. Any investor contributing or withdrawing from their investment portfolio, however, is dramatically affected by the order of returns.  It is much better to save before a large gain or spend before a large loss.

For investors in retirement who are making frequent and consistent withdrawals from their portfolios, sequence risk manifests itself in the presence of large and prolonged drawdowns.  Strategies that can help avoid these losses are, therefore, potentially very valuable.

This is the basis of the traditional glidepath.  By de-risking the portfolio over time, investors become less sensitive to sequence risk.  However, as bond yields remain low and investor life expectancy increases, investors may need to rely more heavily on higher volatility growth assets to avoid running out of money.

To explore these concepts, we have built our own glide path using four assets: broad U.S. equities, 10-year U.S. Treasuries, U.S. T-Bills, and a trend following strategy. Not surprisingly, we find that trend following commands a significant allocation, particularly in the years and wealth levels where sequence risk is highest, and often is allocated to in lieu of equities themselves.

Beyond recognizing the potential value-add of trend following, however, an important second takeaway may be that there is room for significant value-add in going beyond traditional target-date-based glide paths for investors.

Factor Fimbulwinter

This post is available as a PDF download here.

Summary

  • Value investing continues to experience a trough of sorrow. In particular, the traditional price-to-book factor has failed to establish new highs since December 2006 and sits in a 25% drawdown.
  • While price-to-book has been the academic measure of choice for 25+ years, many practitioners have begun to question its value (pun intended).
  • We have also witnessed the turning of the tides against the size premium, with many practitioners no longer considering it to be a valid stand-alone anomaly. This comes 35+ years after being first published.
  • With this in mind, we explore the evidence that would be required for us to dismiss other, already established anomalies.  Using past returns to establish prior beliefs, we simulate out forward environments and use Bayesian inference to adjust our beliefs over time, recording how long it would take for us to finally dismiss a factor.
  • We find that for most factors, we would have to live through several careers to finally witness enough evidence to dismiss them outright.
  • Thus, while factors may be established upon a foundation of evidence, their forward use requires a bit of faith.

In Norse mythology, Fimbulvetr (commonly referred to in English as “Fimbulwinter”) is a great and seemingly never-ending winter.  It continues for three seasons – long, horribly cold years that stretch on longer than normal – with no intervening summers.  It is a time of bitterly cold, sunless days where hope is abandoned and discord reigns.

This winter-to-end-all-winters is eventually punctuated by Ragnarok, a series of events leading up to a great battle that results in the ultimate death of the major gods, destruction of the cosmos, and subsequent rebirth of the world.

Investment mythology is littered with Ragnarok-styled blow-ups and we often assume the failure of a strategy will manifest as sudden catastrophe.  In most cases, however, failure may more likely resemble Fimbulwinter: a seemingly never-ending winter in performance with returns blown to-and-fro by the harsh winds of randomness.

Value investors can attest to this.  In particular, the disciples of price-to-book have suffered greatly as of late, with "expensive" stocks having outperformed "cheap" stocks for over a decade.  The academic interpretation of the factor sits nearly 25% below its prior high-water mark seen in December 2006.

Expectedly, a large number of articles have been written about the death of the value factor.  Some question the factor itself, while others simply argue that price-to-book is a broken implementation.

But are these simply retrospective narratives, driven by a desire to have an explanation for a result that has defied our expectations?  Consider: if price-to-book had exhibited positive returns over the last decade, would we be hearing from nearly as large a number of investors explaining why it is no longer a relevant metric?

To be clear, we believe that many of the arguments proposed for why price-to-book is no longer a relevant metric are quite sound. The team at O’Shaughnessy Asset Management, for example, wrote a particularly compelling piece that explores how changes to accounting rules have led book value to become a less relevant metric in recent decades.1

Nevertheless, we think it is worth taking a step back, considering an alternate course of history, and asking ourselves how it would impact our current thinking.  Often, we look back on history as if it were the obvious course.  "If only we had better prior information," we say to ourselves, "we would have predicted the path!"2  Rather, we find it more useful to look at the past as just one realized path of many that could have happened, none of which were preordained.  Randomness happens.

With this line of thinking, the poor performance of price-to-book can just as easily be explained by a poor roll of the dice as it can be by a fundamental break in applicability.  In fact, we see several potential truths based upon performance over the last decade:

  1. This is all normal course performance variance for the factor.
  2. The value factor works, but the price-to-book measure itself is broken.
  3. The price-to-book measure is over-crowded in use, and thus the “troughs of sorrow” will need to be deeper than ever to get weak hands to fold and pass the alpha to those with the fortitude to hold.
  4. The value factor never existed in the first place; it was an unfortunate false positive that saturated the investing literature and broad narrative.

The problem at hand is two-fold: (1) the statistical evidence supporting most factors is considerable and (2) the decade-to-decade variance in factor performance is substantial.  Taken together, you run into a situation where a mere decade of underperformance likely cannot undo the previously established significance.  Just as frustrating is the opposite scenario. Consider that these two statements are not mutually exclusive: (1) price-to-book is broken, and (2) price-to-book generates positive excess return over the next decade.

In investing, factor return variance is large enough that the proof is not in the eating of the short-term return pudding.

The small-cap premium is an excellent example of the difficulty in discerning, in real time, the integrity of an established factor.  The anomaly has failed to establish a meaningful new high since it was originally published in 1981.  Only in the last decade – nearly 30 years later – have the tides of the industry finally seemed to turn against it as an established anomaly and potential source of excess return.

Thirty years.

The remaining broadly accepted factors – e.g. value, momentum, carry, defensive, and trend – have all been demonstrated to generate excess risk-adjusted returns across a variety of economic regimes, geographies, and asset classes, creating a great depth of evidence supporting their existence. What evidence, then, would make us abandon faith from the Church of Factors?

To explore this question, we ran a simple experiment for each factor.  Our goal was to estimate how long it would take to determine that a factor was no longer statistically significant.

Our assumption is that the salient features of each factor’s return pattern will remain the same (e.g. autocorrelation, conditional heteroskedasticity, skewness, kurtosis, et cetera), but the forward average annualized return will be zero since the factor no longer “works.”

Towards this end, we ran the following experiment: 

  1. Take the full history for the factor and calculate prior estimates for mean annualized return and standard error of the mean.
  2. De-mean the time-series.
  3. Randomly select a 12-month chunk of returns from the time series and use the data to perform a Bayesian update to our mean annualized return.
  4. Repeat step 3 until the annualized return is no longer statistically distinguishable from zero at a 99% confidence threshold.

For each factor, we ran this test 10,000 times, creating a distribution that tells us how many years into the future we would have to wait until we were certain, from a statistical perspective, that the factor is no longer significant.
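
For readers interested in the mechanics, a minimal sketch of a single trial is below.  The commentary does not spell out the exact inference machinery, so the sketch assumes monthly factor returns and a conjugate normal-normal update with known variance; the array name hml_monthly, the function name, and the parameter choices are all hypothetical.

```python
import numpy as np

def years_until_failure(monthly_returns, conf_z=2.576, max_years=1000, rng=None):
    """One trial: bootstrap de-meaned 12-month chunks and Bayesian-update the
    annualized mean until it is no longer distinguishable from zero at ~99%.
    A sketch under simplifying assumptions, not the authors' exact code."""
    rng = rng or np.random.default_rng()
    r = np.asarray(monthly_returns, dtype=float)

    # Step 1: prior from the full history (annualized mean and its standard error).
    n_years = len(r) / 12.0
    ann_mean = 12.0 * r.mean()
    ann_var = 12.0 * r.var(ddof=1)           # variance of a 12-month sum (i.i.d. approximation)
    mu, tau2 = ann_mean, ann_var / n_years   # prior mean and squared standard error of the annualized mean

    # Step 2: de-mean the series so the "true" forward mean is zero.
    r = r - r.mean()

    for year in range(1, max_years + 1):
        # Step 3: draw a random contiguous 12-month chunk and treat its sum
        # as one new annual observation.
        start = rng.integers(0, len(r) - 12 + 1)
        obs = r[start:start + 12].sum()

        # Conjugate normal-normal update with known observation variance.
        post_prec = 1.0 / tau2 + 1.0 / ann_var
        mu = (mu / tau2 + obs / ann_var) / post_prec
        tau2 = 1.0 / post_prec

        # Step 4: stop once zero falls inside the ~99% interval for the mean.
        if abs(mu) / np.sqrt(tau2) < conf_z:
            return year
    return max_years

# Distribution over 10,000 trials (hml_monthly is a hypothetical array of monthly HML returns):
# years = [years_until_failure(hml_monthly) for _ in range(10_000)]
# print(np.median(years))
```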

Sixty-seven years.

Based upon this experiment, sixty-seven years is the median number of years we will have to wait until we officially declare price-to-book (“HML,” as it is known in the literature) to be dead.3  At the risk of being morbid, we’re far more likely to die before the industry finally sticks a fork in price-to-book.

We perform this experiment for a number of other factors – including size (“SMB” – “small-minus-big”), quality (“QMJ” – “quality-minus-junk”), low-volatility (“BAB” – “betting-against-beta”), and momentum (“UMD” – “up-minus-down”) – and see much the same result.  It will take decades before sufficient evidence mounts to dethrone these factors.

                              HML   SMB4   QMJ   BAB   UMD
Median Years-until-Failure     67     43   132   284   339

 

Now, it is worth pointing out that these figures for a factor like momentum (“UMD”) might be a bit skewed due to the design of the test.  If we examine the long-run returns, we see a fairly docile return profile punctuated by sudden and significant drawdowns (often called “momentum crashes”).

Since a large proportion of the cumulative losses are contained in these short but pronounced drawdown periods, demeaning the time-series ultimately means that the majority of 12-month periods actually exhibit positive returns.  In other words, by selecting random 12-month samples, we actually expect a high frequency of those samples to have a positive return.

For example, using this process, 49.1%, 47.6%, 46.7%, and 48.8% of rolling 12-month periods are positive for the HML, SMB, QMJ, and BAB factors, respectively.  For UMD, that number is 54.7%.  Furthermore, if you drop the worst 5% of rolling 12-month periods for UMD, the average positive period is 1.4x larger than the average negative period.  Taken together, not only are you more likely to select a positive 12-month period, but, outside of those rare (<5%) worst cases, the positive periods you pick are, on average, 1.4x larger than the negative ones.
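
As a rough illustration of how such positivity figures could be tallied, the snippet below computes the share of rolling 12-month periods that are positive in a de-meaned monthly return series.  This is a sketch, not the exact calculation used above, and the input is assumed to be an array of monthly factor returns.

```python
import numpy as np

def pct_positive_12m(monthly_returns):
    """Share of rolling 12-month periods with a positive return after de-meaning."""
    r = np.asarray(monthly_returns, dtype=float)
    r = r - r.mean()                                      # de-mean, as in the experiment
    rolling = np.convolve(r, np.ones(12), mode="valid")   # rolling 12-month sums
    return (rolling > 0).mean()
```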

The process of the test was selected to incorporate the salient features of each factor.  However, in the case of momentum, it may lead to somewhat outlandish results.

Conclusion

While an evidence-based investor should be swayed by the weight of the data, the simple fact is that most factors are so well established that the majority of current practitioners will likely go their entire careers without experiencing evidence substantial enough to dismiss any of the anomalies.

Therefore, in many ways, there is a certain faith required to use them going forward. Yes, these are ideas and concepts derived from the data.  Yes, we have done our best to test their robustness out-of-sample across time, geographies, and asset classes.  Yet we must also admit that there is a non-zero probability, however small it is, that these are false positives: a fact we may not have sufficient evidence to address until several decades hence.

And so a bit of humility is warranted.  Factors will not suddenly stand up and declare themselves broken.  And those that are broken will still appear to work from time to time.

Indeed, the death of a factor will be more Fimbulwinter than Ragnarok: not so violent as to be the end of days, but enough to cause pain and frustration among investors.

 

Addendum

We have received a large number of inbound notes about this commentary, which fall along two primary lines of questioning.  We want to address these points.

How were the tests impacted by the Bayesian inference process?

The results of the tests within this commentary are rather astounding.  We did seek to address some of the potential flaws in the methodology we employed, but by and large we feel the overarching conclusion remains on a solid foundation.

While we only presented the results of the Bayesian inference approach in this commentary, as a check we actually tested two other approaches:

  1. A Bayesian inference approach assuming that forward returns would be a random walk with constant variance (based upon historical variance) and zero mean.
  2. A test in which forward returns were simulated using the same bootstrap approach, but the factor was treated as if it were being discovered for the first time, with the entire history evaluated for its significance.

The two tests were an effort to isolate the effects of the different components of our test.
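
For the first check, the only change to the earlier sketch would be how each annual observation is generated: rather than bootstrapping a 12-month chunk, it could be drawn from a zero-mean normal with the historical (constant) annual variance.  A hypothetical variant, under the same assumptions as before:

```python
import numpy as np

def draw_annual_obs_random_walk(ann_var, rng):
    """Check (1): draw each annual observation from a zero-mean normal with
    constant (historical) variance, instead of bootstrapping a 12-month chunk."""
    return rng.normal(loc=0.0, scale=np.sqrt(ann_var))
```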

What we found was that while the reported figures changed, the overall magnitude did not.  In other words, the median death-date of HML may not have been 67 years, but the order of magnitude remained much the same: decades.

Stepping back, these results were somewhat of a foregone conclusion.  We would not expect an effect that has been determined to be statistically significant over a hundred-year period to unravel in a few years.  Furthermore, we would expect a number of scenarios that continue to bolster the statistical strength due to randomness alone.

Why are we defending price-to-book?

The point of this commentary was not to defend price-to-book as a measure.  Rather, it was to bring up a larger point.

As a community, quantitative investors often leverage statistical significance as a defense for the way we invest.

We think that is a good thing.  We should look at the weight of the evidence.  We should be data driven.  We should try to find ideas that have proven to be robust over decades of time and when applied in different markets or with different asset classes.  We should want to find strategies that are robust to small changes in parameterization.

Many quants (ourselves included) would argue, however, that there also needs to be a why.  Why does this factor work?  Without the why, we run the risk of glorified data mining.  With the why, we can choose for ourselves whether we believe the effect will continue going forward.

Of course, there is nothing that prevents the why from being pure narrative fallacy.  Perhaps we have simply weaved a story into a pattern of facts.

With price-to-book, one might argue we have done the exact opposite.  The effect, technically, remains statistically significant and yet plenty of ink has been spilled as to why it shouldn’t work in the future.

The question we must answer, then, is, “When does statistical significance apply and when does it not?”  How can we use it as a justification in one place and completely ignore it in another?

Furthermore, if we are going to rely on hundreds of years of data to establish significance, how can we determine when something is “broken” if the statistical evidence does not support it?

Price-to-book may very well be broken.  But that is not the point of this commentary.  The point is simply that the same tools we use to establish and defend factors may prevent us from tearing them down.

 
