This post is available as a PDF download here.
Summary
- Recent market volatility has caused many tactical models to make sudden and significant changes in their allocation profiles.
- Periods such as Q4 2018 highlight model specification risk: the sensitivity of a strategy’s performance to specific implementation decisions.
- We explore this idea with a case study, using the popular Dual Momentum GEM strategy and a variety of lookback horizons for portfolio formation.
- We demonstrate that the year-to-year performance difference can span hundreds, if not thousands, of basis points between the implementations.
- By simply diversifying across multiple implementations, we can dramatically reduce model specification risk and even potentially see improvements in realized metrics such as Sharpe ratio and maximum drawdown.
Introduction
Among do-it-yourself tactical investors, Gary Antonacci’s Dual Momentum is the strategy we tend to see implemented the most. The Dual Momentum approach is simple: by combining both relative momentum and absolute momentum (i.e. trend following), Dual Momentum seeks to rotate into areas of relative strength while preserving the flexibility to shift entirely to safety assets (e.g. short-term U.S. Treasury bills) during periods of pervasive, negative trends.
In our experience, the precise implementation of Dual Momentum tends to vary (with various bells-and-whistles applied) from practitioner to practitioner. The most popular benchmark model, however, is the Global Equities Momentum (“GEM”), with some variation of Dual Momentum Sector Rotation (“DMSR”) a close second.
Recently, we’ve spoken to several members of our extended community who have bemoaned the fact that Dual Momentum kept them mostly aggressively positioned in Q4 2018 and signaled a defensive shift at the beginning of January 2019, at which point the S&P 500 was already in a -14% drawdown (having bottomed at a drawdown of over -19% on December 24th). Several DIYers even decided to override their signal in some capacity, either ignoring it entirely, waiting a few days for “confirmation,” or implementing some sort of “half-and-half” rule in which they take a partially defensive stance.
Ignoring the fact that a decision to override a systematic model somewhat defeats the whole point of being systematic in the first place, this sort of behavior highlights another very important truth: there is a significant gap of risk that exists between the long-term supporting evidence of an investment style (e.g. momentum and trend) and the precise strategy with which we attempt to implement it (e.g. Dual Momentum GEM).
At Newfound, we call that gap model specification risk. There is significant evidence supporting both momentum and trend as quantitative styles, but the precise means by which we measure these concepts can lead to dramatically different portfolios and outcomes. When a portfolio’s returns are highly sensitive to its specification – i.e. slight variations in returns or model parameters lead to dramatically different return profiles – we label the strategy as fragile.
In this brief commentary, we will use the Global Equities Momentum (“GEM”) strategy as a case study in fragility.
Global Equities Momentum (“GEM”)
To implement the GEM strategy, an investor merely needs to follow the decision tree below at the end of each month.
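As a minimal sketch of that month-end decision, assuming the commonly described form of GEM (hold bonds unless U.S. equities have out-returned T-bills over the lookback; otherwise hold whichever of U.S. or international equities has the higher trailing return), the rule might look like the following. The function names, asset labels, and price series are hypothetical and serve only as illustration; the `lookback` parameter is the specification choice examined in this commentary.

```python
def trailing_return(prices, lookback):
    """Total return over the last `lookback` periods of a price series."""
    if len(prices) <= lookback:
        raise ValueError("need more than `lookback` observations")
    return prices[-1] / prices[-1 - lookback] - 1.0


def gem_signal(us_prices, intl_prices, tbill_prices, lookback=12):
    """Asset to hold next month under the assumed GEM rules (labels are hypothetical)."""
    us = trailing_return(us_prices, lookback)
    intl = trailing_return(intl_prices, lookback)
    cash = trailing_return(tbill_prices, lookback)
    # Absolute momentum: step aside if U.S. equities have not beaten T-bills.
    if us <= cash:
        return "BONDS"
    # Relative momentum: otherwise hold the stronger equity market.
    return "US_EQUITY" if us >= intl else "INTL_EQUITY"


# Hypothetical monthly total-return index levels (13 month-ends).
flat = [100.0] * 13
us = [100.0] * 6 + [120.0] + [110.0] * 6

print(gem_signal(us, flat, flat, lookback=12))  # -> US_EQUITY
print(gem_signal(us, flat, flat, lookback=6))   # -> BONDS
```

Even in this toy series, the same data produce opposite positions depending solely on the lookback chosen.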
From a practitioner standpoint, there are several attractive features about this model. First, it is based upon the long-run evidence of both trend-following and momentum. Second, it is very easy to model and generate signals for. Finally, it is fairly lightweight from an implementation perspective: only twelve potential rebalances a year (and often far fewer), with the portfolio only holding one ETF at a time.
Despite the evidence that “simple beats complex,” the simplicity of GEM belies its inherent fragility. Below we plot the equity curves for GEM implementations that employ different lookback horizons for measuring trend and momentum, ranging from 6- to 12-months.
Source: CSI Analytics. Calculations by Newfound Research. Returns are backtested and hypothetical. Returns assume the reinvestment of all distributions. Returns are gross of all fees except for underlying ETF expense ratios. None of the strategies shown reflect any portfolio managed by Newfound Research and were constructed solely for demonstration purposes within this commentary. You cannot invest in an index.
We can see a significant dispersion in potential terminal wealth. That dispersion, however, is not necessarily consistent with the notion that one formation period is inherently better than another. While we would argue, ex-ante, that there should be little performance difference between a 9-month and 10-month lookback – they both, after all, capture the notion of “intermediate-term trends” – the former returned just 43.1% over the period while the latter returned 146.1%.
These total return figures further hide the year-to-year disparity that exists. The 9-month model, for example, was not a consistent loser. Below we plot these results, highlighting both the best (blue) and worst (orange) performing specifications. We see that the yearly spread between these strategies can be hundreds-to-thousands of basis points; consider that in 2010, the strategy formed using a 10-month lookback returned 12.2% while the strategy formed using a 9-month lookback returned -9.31%.
Same thesis. Same strategy. Slightly different specification. Dramatically different outcomes. That single year is likely the difference between hired and fired for most advisors and asset managers.
Source: CSI Analytics. Calculations by Newfound Research. Returns are backtested and hypothetical. Returns assume the reinvestment of all distributions. Returns are gross of all fees except for underlying ETF expense ratios. None of the strategies shown reflect any portfolio managed by Newfound Research and were constructed solely for demonstration purposes within this commentary. You cannot invest in an index.
For those bemoaning their 2018 return, note that the 10-month specification would have netted a positive result! That specification turned defensive at the end of October.
Now, some may cry “foul” here. The evidence for trend and momentum is, after all, centuries in length and the efficacy of all these horizons is supported. Surely the noise we see over this ten-year period would average out over the long run, right?
The unfortunate reality is that these performance differences are not expected to mean-revert. The gambler’s fallacy would have us believe that bad luck in one year should be offset by good luck in another and vice versa. Unfortunately, this is not the case. While we would expect, at any given point in time, that each strategy has equal likelihood of experiencing good or bad luck going forward, that luck is expected to occur completely independently from what has happened in the past.
The implication is that performance differences due to model specification are not expected to mean-revert and are therefore expected to be random, but very permanent, return artifacts.[1]
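To make the independence argument concrete, here is a small simulation sketch. It assumes two specifications with identical expected returns whose yearly "luck" is an independent, zero-mean Gaussian shock; the 5% luck volatility, the function name, and the trial count are hypothetical choices, not estimates from the backtests above. If luck mean-reverted, the spread of cumulative gaps would stop growing with the horizon. Under independence it keeps widening, roughly with the square root of time.

```python
import random
import statistics


def simulate_gap(years, luck_vol, n_trials, seed=0):
    """Cumulative (log) return gap between two specifications with independent luck.

    Each year, each specification receives its own zero-mean Gaussian shock,
    drawn independently of everything that happened before.
    Returns the final cumulative gap for every trial.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        gap = 0.0
        for _ in range(years):
            gap += rng.gauss(0.0, luck_vol) - rng.gauss(0.0, luck_vol)
        gaps.append(gap)
    return gaps


# Hypothetical 5% annual "luck" volatility per specification.
spread_1y = statistics.pstdev(simulate_gap(1, 0.05, 10000))
spread_10y = statistics.pstdev(simulate_gap(10, 0.05, 10000))

# Under independence the ratio should sit near sqrt(10), about 3.16.
print(round(spread_10y / spread_1y, 2))
```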
The larger problem at hand is that none of us have a hundred years to invest. In reality, most investors have a few decades. And we act with the temperament of having just a few years. Therefore, bad luck can have very permanent and very scarring effects not only upon our psyche, but upon our realized wealth.
But consider what happens if we try to neutralize the role of model specification risk and luck by diversifying across the seven different models equally (rebalanced annually). We see returns more closely in line with the median result, a boost to realized Sharpe ratio, and a reduction in the maximum realized drawdown.
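The blend itself is mechanically simple. As a sketch, assuming we already have a monthly return series for each specification, an equal-weight portfolio that is reset to equal weight once a year can be computed as below. The function name and the two-strategy example data are hypothetical; the same function would accept seven series.

```python
def blend_annual_rebalance(strategy_returns, periods_per_year=12):
    """Equal-weight blend of strategies, reset to equal weight once a year.

    `strategy_returns` is a list of per-strategy return lists of equal length.
    Between rebalances, each sleeve drifts with its own returns.
    """
    n = len(strategy_returns)
    length = len(strategy_returns[0])
    if any(len(r) != length for r in strategy_returns):
        raise ValueError("all strategies need the same number of periods")
    sleeves = [1.0 / n] * n
    blended = []
    for t in range(length):
        # Rebalance back to equal weight at the start of each new year.
        if t > 0 and t % periods_per_year == 0:
            total = sum(sleeves)
            sleeves = [total / n] * n
        start = sum(sleeves)
        sleeves = [s * (1.0 + strategy_returns[i][t]) for i, s in enumerate(sleeves)]
        blended.append(sum(sleeves) / start - 1.0)
    return blended


# Two hypothetical specifications that take turns being "lucky".
spec_a = [0.10, 0.00]
spec_b = [0.00, 0.10]
print(blend_annual_rebalance([spec_a, spec_b]))
```

Rebalancing annually rather than monthly is a design choice in its own right: it lets each sleeve drift within the year while preventing any single specification from dominating the blend over time.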
Source: CSI Analytics. Calculations by Newfound Research. Returns are backtested and hypothetical. Returns assume the reinvestment of all distributions. Returns are gross of all fees except for underlying ETF expense ratios. None of the strategies shown reflect any portfolio managed by Newfound Research and were constructed solely for demonstration purposes within this commentary. You cannot invest in an index.
These are impressive results given that all we employed was naïve diversification.
Conclusion
The odd thing about strategy diversification is that it guarantees we will be wrong. Each and every year, we will, by definition, allocate at least part of our capital to the worst performing strategy. The potential edge, however, is in being vaguely wrong rather than precisely wrong. The former is annoying. The latter can be catastrophic.
In this commentary we use the popular Dual Momentum GEM strategy as a case study to demonstrate how model specification choices can lead to performance differences that span hundreds, if not thousands, of basis points a year. Unfortunately, we should not expect these performance differences to mean-revert. The realizations of good and bad luck are permanent, and potentially very significant, artifacts within our track records.
By simply diversifying across the different models, however, we can dramatically reduce specification risk and thereby reduce strategy fragility.
To be clear, no amount of diversification will protect you from the risk of the style. As we like to say, “risk cannot be destroyed, only transformed.” In that vein, trend following strategies will always incur some sort of whipsaw risk. The question is whether it is whipsaw related to the style as a whole or to the specific implementation.
For example, in the graphs above we can see that Dual Momentum GEM implemented with a 10-month formation period experienced whipsaw in 2011 when few of the other implementations did. This is more specification whipsaw than style whipsaw. On the other hand, we can see that almost all the specifications exhibited whipsaw in late 2015 and early 2016, an indication of style whipsaw, not specification whipsaw.
Specification risk we can attempt to control for; style risk is just something we have to bear.
At Newfound, evidence such as this informs our own trend-following mandates. We seek to diversify ourselves across the axes of what (“what are we investing in?”), how (“how are we making the decisions?”), and when (“when are we making those decisions?”) in an effort to reduce specification risk and provide the greatest style consistency possible.
Is Multi-Manager Diversification Worth It?
By Corey Hoffstein
On January 7, 2019
Summary
Introduction
In their 2014 paper The Free Lunch Effect: The Value of Decoupling Diversification and Risk, Croce, Guinn, and Robinson draw a distinction between the risk reduction effects that occur due to de-risking and those that occur due to diversification benefits.
To illustrate the distinction, the authors compare the volatility of an all equity portfolio versus a balanced stock/bond mix. In the 1984-2014 sample period, they find that the all equity portfolio has an annualized volatility of 15.25% while the balanced portfolio has an annualized volatility of just 9.56%.
Over 75% of this reduction in volatility, however, is due simply to the fact that bonds were much less volatile than stocks over the period. In fact, of the 568-basis-point reduction, only 124 basis points were due to actual diversification benefits.
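The arithmetic behind the "over 75%" figure can be checked directly from the numbers quoted above. The helper below is a hypothetical convenience function, not something taken from the paper. Note that the two rounded volatilities (15.25% and 9.56%) differ by 569 basis points; we use the 568 figure as quoted, presumably the unrounded result.

```python
def split_vol_reduction(total_reduction_bps, diversification_bps):
    """Split a volatility reduction into de-risking and diversification shares."""
    derisking_bps = total_reduction_bps - diversification_bps
    return {
        "derisking_bps": derisking_bps,
        "derisking_share": derisking_bps / total_reduction_bps,
        "diversification_share": diversification_bps / total_reduction_bps,
    }


# Figures quoted in the text: 568 bps total reduction, 124 bps from diversification.
split = split_vol_reduction(total_reduction_bps=568, diversification_bps=124)
print(split["derisking_bps"])                    # -> 444
print(round(split["derisking_share"], 3))        # -> 0.782
print(round(split["diversification_share"], 3))  # -> 0.218
```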
Why does this matter?
Because de-risking carries none of the benefits of diversification. If there is a commensurate trade-off between expected return and risk, then all we have done is reduce the expected return of our portfolio.[1]
It is only by combining assets of like volatility – and, it is assumed, like expected return – that we should be able to enjoy the free lunch of diversification.
Unfortunately, unless you are willing to apply leverage (e.g. risk parity), the opportunities to find such a free lunch across assets are limited. The classic example of inter-asset diversification, though, is taught in Finance 101: as we add more stocks to a portfolio, we drive the contribution of idiosyncratic volatility towards zero.
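The Finance 101 result can be sketched with a deliberately stylized model. We assume every stock has a market beta of one and an independent idiosyncratic shock with the same volatility, so that an equal-weight portfolio of N stocks has volatility sqrt(market_vol² + idio_vol² / N). The 15% market and 30% idiosyncratic volatilities are hypothetical inputs chosen only for illustration.

```python
import math


def portfolio_vol(n_stocks, market_vol, idio_vol):
    """Volatility of an equal-weight portfolio of identical one-factor stocks.

    Assumes beta of one for every stock and independent idiosyncratic shocks,
    so the idiosyncratic variance is diluted by 1 / n_stocks.
    """
    idio_var = idio_vol ** 2 / n_stocks
    return math.sqrt(market_vol ** 2 + idio_var)


# Hypothetical inputs: 15% market volatility, 30% idiosyncratic volatility.
for n in (1, 5, 30, 500):
    print(n, round(portfolio_vol(n, 0.15, 0.30), 4))
```

Under these assumptions, total volatility falls quickly over the first few dozen names and then flattens out near the market's own volatility, which no number of additional stocks can remove.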
Yet volatility is only one way to measure risk. If we build a portfolio of 30 stocks and you build a portfolio of 30 stocks, the portfolios may have nearly identical levels of volatility, but we almost assuredly will end up with different realized results. This difference between the expected and the realized is captured by a measure known as terminal wealth dispersion, first introduced by Robert Radcliffe in his book Investment: Concepts, Analysis, Strategy.
This form of risk naturally arises when we select between investment managers. Two managers may both select securities from the same universe using the same investment thesis, but the realized results of their portfolios can be starkly different. In rare cases, the specific choice of one manager over another can even lead to catastrophic results.
The selection of a manager reflects not only an allocation to an asset class, but also reflects an allocation to a process. In this commentary, we ask: how much diversification benefit exists in process diversification?
The Theory Behind Manager Diversification
In Factors from Scratch, the research team at O’Shaughnessy Asset Management (OSAM), in partnership with anonymous blogger Jesse Livermore, digs into the driving elements behind value and momentum equity strategies.
They find that value stocks do tend to exhibit negative EPS growth, but this decay in fundamentals is offset by multiple expansion. In other words, markets do appear to correctly identify companies with contracting fundamentals, but they also exaggerate and over-extrapolate that weakness. The historical edge for the strategy has been that the re-rating – measured via multiple expansion – tends to overcompensate for the contraction in fundamentals.
For momentum, OSAM finds a somewhat opposite effect. The strategy correctly identifies companies with strengthening fundamentals, but during the holding period a valuation contraction occurs as the market recognizes that its outlook might have been too optimistic. Historically, however, the growth outweighed the contraction to create a net positive effect.
These are the true, underlying economic and behavioral effects that managers are trying to capture when they implement value and momentum strategies.
These are not, however, effects we can observe directly in the market; they are effects that we have to forecast. To do so, we have to utilize semi-noisy signals that we believe are correlated. Therefore, every manager’s strategy will be somewhat inefficient at capturing these effects.
For example, there are a number of quantitative measures we may apply in our attempt to identify value opportunities; e.g. price-to-book, price-to-earnings, and EBITDA-to-enterprise-value to name a few. Two different noisy signals might end up with different performance just due to randomness.
This noise between signals is further compounded when we consider all the other decisions that must be made in the portfolio construction process. Two managers may use the same signals and still end up with very different portfolios based upon how the signals are translated into allocations.
Consider this: Morningstar currently[2] lists 1,217 large-cap value funds in its mutual fund universe, and trailing 1-year returns ranged from 1.91% to -22.90%. This is not just a case of extreme outliers, either: the spread between the 10th and 90th percentile returning funds was 871 basis points.
It bears repeating that these are funds that, in theory, are all trying to achieve the same goal: large-cap value exposure.
Yet this result is not wholly surprising to us. In Separating Ingredients and Recipe in Factor Investing we demonstrated that the performance dispersion between different momentum strategy definitions (e.g. momentum measure, look-back length, rebalance frequency, weighting scheme, et cetera) was larger than the performance dispersion between the traditional Fama-French factors themselves in 90% of rolling 1-year periods. As it turns out, intra-factor differences can cause greater dispersion than inter-factor differences.
Without an ex-ante view as to the superiority of one signal, one process, or one fund versus another, it seems prudent for a portfolio to have diversified exposure to a broad range of signals that seem plausibly related to the underlying phenomenon.
Literature Review
While foundational literature on modern portfolio diversification extends back to the 1950s, little has been written in the field of manager diversification. While it is a well-established teaching that a portfolio of 25-40 stocks is typically sufficient to reduce idiosyncratic risk, there is no matching rule for how many managers to combine together.
One of the earliest articles on the topic was written by Edward O’Neal in 1997, titled How Many Mutual Funds Constitute a Diversified Mutual Fund Portfolio?
Published in the Financial Analysts Journal, this article explores risk across two different dimensions: the volatility of returns over time and the dispersion in terminal period wealth. Again, the idea behind the latter measure is that two investors with identical horizons and different investments will achieve different terminal wealth levels, even if those investments have the same volatility.
Exploring equity mutual fund returns from 1986 to 1997, the study adopts a simulation-based approach to constructing portfolios and tracking returns. Multi-manager portfolios of varying sizes are randomly constructed and compared against other multi-manager portfolios of the same size.
O’Neal finds that while combining managers has little-to-no effect on volatility (manager returns were too homogeneous), it has a significant effect upon the dispersion of terminal wealth. To quote the article,
Allocating to three managers instead of just one could reduce the dispersion in terminal wealth by nearly 50%, an effect found to be quite consistent across the different time horizons measured.
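The intuition can be illustrated with a toy simulation, which is a sketch under our own assumptions and not a reproduction of O’Neal’s methodology. We assume every manager earns a shared style return plus an independent idiosyncratic shock, hold the style path fixed, and measure how much terminal wealth varies across randomly drawn portfolios of one versus three managers. All parameter values and names below are hypothetical.

```python
import random
import statistics


def terminal_wealth_dispersion(n_managers, n_portfolios=3000, n_periods=60,
                               style_mean=0.006, style_vol=0.04,
                               idio_vol=0.015, seed=1):
    """Std. dev. of terminal wealth across random equal-weight manager portfolios.

    All portfolios share one style return path; they differ only through the
    managers' independent idiosyncratic shocks (rebalanced every period).
    """
    rng = random.Random(seed)
    # Drawn first so the same seed gives the same style path for any n_managers.
    style_path = [rng.gauss(style_mean, style_vol) for _ in range(n_periods)]
    wealths = []
    for _ in range(n_portfolios):
        wealth = 1.0
        for style_ret in style_path:
            idio = sum(rng.gauss(0.0, idio_vol) for _ in range(n_managers)) / n_managers
            wealth *= 1.0 + style_ret + idio
        wealths.append(wealth)
    return statistics.pstdev(wealths)


one = terminal_wealth_dispersion(n_managers=1)
three = terminal_wealth_dispersion(n_managers=3)
print(round(three / one, 2))  # dispersion with three managers relative to one
```

Because the shared style return dominates period-to-period volatility in this setup, adding managers barely changes volatility through time, yet it meaningfully narrows the range of terminal outcomes. With fully independent idiosyncratic shocks we would expect the ratio to land near 1/sqrt(3).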
In 1999, O’Neal teamed up with L. Franklin Fant to publish Do You Need More than One Manager for a Given Equity Style? Adopting a similar simulation-based approach, Fant and O’Neal explored multi-manager equity portfolios in the context of the style-box framework.
And, as before, they find that taking a multi-manager approach has little effect upon portfolio volatility.
It does, however, again prove to have a significant effect on the deviation in terminal wealth.
To quote the paper,
In 2002, François-Serge Lhabitant and Michelle Learned pursued a similar vein of research in the realm of hedge funds in their article Hedge Fund Diversification: How Much is Enough? They employ the same simulation-based approach but evaluate diversification effects within the different hedge fund styles.
They find that while diversification does little to affect the expected return for a given style, it does appear to help reduce portfolio volatility: sometimes quite significantly so. This somewhat contradictory result to the prior research is likely due to the fact that hedge funds within a given category exhibit far more heterogeneity in process and returns than do equity managers in the same style box.
(Note that while the graphs below only show the period 1990-1993, the paper explores three time periods: 1990-1993, 1994-1997, and 1998-2001 and finds a similar conclusion in all three).
Perhaps most importantly, however, they find a rather significant reduction in risk characteristics like a portfolio’s realized maximum drawdown.
To quote the article,
Taken together, this literature paints an important picture:
But why is minimizing “the dispersion of terminal wealth” important? The answer is the same reason why we diversify in the first place: risk management.
The potential for high dispersion in terminal wealth means that we can have dramatically different outcomes based upon the choices we are making, placing significant emphasis on our skill in manager selection. Choosing just one manager is “more right” style thinking rather than our preferred “less wrong.”
But What About Dilution?
The number one response we hear when we talk about manager diversification is: “when we combine managers, won’t we just dilute our exposure back to the market?”
The answer, as with all things, is: “it depends.” For the sake of brevity, we’re just going to leave it with, “no.”
No?
No.
If we identify three managers as providing exposure to value, then it makes little logical sense that somehow a combination of them would suddenly remove that exposure. Subtraction through addition only works if there is a negative involved; i.e. one of the managers would have to provide anti-value exposure to offset the others.
Remember that an active manager’s portfolio can always be decomposed into two pieces: the benchmark and a dollar-neutral long/short portfolio that isolates the active over/under-weights that manager has made.
To “dilute back to the benchmark,” we’d have to identify managers and then weight them such that all of their over/under-weights net out to equal zero.
Candidly, we’d be impressed if you managed to do that. Especially if you combine managers within the same style who should all be, at least directionally, taking similar bets. The dilution that occurs is only across those bets which they disagree on and therefore reflect the idiosyncrasies of their specific process.
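A small worked example shows the decomposition at work. The four-stock benchmark and the two managers’ holdings below are entirely hypothetical; the point is only that blending averages the active weights, so bets the managers share survive while bets on which they disagree partially cancel.

```python
def active_weights(portfolio, benchmark):
    """Dollar-neutral over/under-weights of a portfolio versus its benchmark."""
    names = set(portfolio) | set(benchmark)
    return {n: portfolio.get(n, 0.0) - benchmark.get(n, 0.0) for n in names}


def blend(portfolios):
    """Equal-weight combination of several managers' holdings."""
    names = set().union(*portfolios)
    k = len(portfolios)
    return {n: sum(p.get(n, 0.0) for p in portfolios) / k for n in names}


# Hypothetical benchmark and two managers pursuing the same style.
benchmark = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
manager_1 = {"A": 0.40, "B": 0.30, "C": 0.20, "D": 0.10}
manager_2 = {"A": 0.40, "B": 0.10, "C": 0.20, "D": 0.30}

combined = active_weights(blend([manager_1, manager_2]), benchmark)
for name in sorted(combined):
    print(name, round(combined[name], 4))
```

Both managers overweight A and underweight C, and those positions pass through the blend untouched. Their opposing views on B and D shrink from 5 to 15 percentage points in either direction down to a 5-point underweight in each. The combined active weights still sum to zero, and they are nowhere near the all-zero vector that a true dilution back to the benchmark would require.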
What a multi-manager implementation allows us to diversify is our selection risk, leading to a return profile more “in-line” with a given style or category. In fact, Lhabitant and Learned (2002) demonstrated this exact notion with a graph that plots the correlation of multi-manager portfolios with their broad category. While somewhat tautological, an increase in manager diversification leads to a return profile closer to the given style than to the idiosyncrasies of those managers.
We can also see this with a practical example. Below we take several available ETFs that implement quantitative value strategies and plot their rolling 52-week return relative to the S&P 500. We also construct a multi-manager index (“MM_IDX”) that is a naïve, equal-weight portfolio. The only wrinkle to this portfolio is that ETFs are not introduced immediately, but rather slowly over a 12-month period.[3]
Source: CSI Analytics. Calculations by Newfound Research. It is not possible to invest in an index. Returns are total returns (i.e. assume the reinvestment of all distributions) and are gross of all fees except for underlying expense ratios of ETFs. Past performance does not guarantee future results.
We can see that while the multi-manager blend is never the best performing strategy, it is also never the worst. Never the hero; never a zero.
It should be noted that while manager diversification may be able to reduce the idiosyncratic returns that result from process differences, it will not prevent losses (or relative underperformance) of the underlying style itself. In other words, we might avoid the full brunt of losses specific to the Sequoia Fund, but no amount of diversification would prevent the relative drag seen by the quantitative value style in general over the last decade.
We can see this in the graph above by the fact that all the lines generally tend to move together. 2015 was bad for value managers. 2016 was much better. But we can also see that every once in a while, a specific implementation will hit a rough patch that is idiosyncratic to that approach; e.g. IWD in 2017 and most of 2018.
Multi-manager diversification is the tool that allows us to avoid the full brunt of this risk.
Conclusion
Taken together, the research behind manager diversification suggests:
For advisors and investors, this evidence may cause a sigh of relief. Instead of having to spend time trying to identify the best manager or the best process, there may be significant advantages to simply “avoiding the brain damage”[4] and allocating equally among a few. In other words, if you don’t know which low-volatility ETF to pick, just buy a couple and move on with your life.
But what are the cons?
For investment managers, a natural interpretation of this research is that approaches blending different signals and portfolio construction methods together should lead to more consistent outcomes. It should be no surprise, then, that asset managers adopting machine learning are finding significant advantages with ensemble techniques. After all, they invoke the low-hanging fruit of manager diversification.
Perhaps most interesting is that this research suggests that funds-of-funds really are not such a bad idea so long as costs can be kept under control. As the asset management business continues to become more competitive, perhaps there is an edge – and a better client result – found in cooperation.