Summary
- Recent market volatility has caused many tactical models to make sudden and significant changes in their allocation profiles.
- Periods such as Q4 2018 highlight model specification risk: the sensitivity of a strategy’s performance to specific implementation decisions.
- We explore this idea with a case study, using the popular Dual Momentum GEM strategy and a variety of lookback horizons for portfolio formation.
- We demonstrate that year-to-year performance differences between the implementations can span hundreds, if not thousands, of basis points.
- By simply diversifying across multiple implementations, we can dramatically reduce model specification risk and even potentially see improvements in realized metrics such as Sharpe ratio and maximum drawdown.
Introduction
Among do-it-yourself tactical investors, Gary Antonacci’s Dual Momentum is the strategy we tend to see implemented the most. The Dual Momentum approach is simple: by combining both relative momentum and absolute momentum (i.e. trend following), Dual Momentum seeks to rotate into areas of relative strength while preserving the flexibility to shift entirely to safety assets (e.g. short-term U.S. Treasury bills) during periods of pervasive, negative trends.
In our experience, the precise implementation of Dual Momentum tends to vary (with various bells-and-whistles applied) from practitioner to practitioner. The most popular benchmark model, however, is the Global Equities Momentum (“GEM”), with some variation of Dual Momentum Sector Rotation (“DMSR”) a close second.
Recently, we’ve spoken to several members of our extended community who bemoaned the fact that Dual Momentum kept them aggressively positioned through most of Q4 2018 and signaled a defensive shift at the beginning of January 2019, by which point the S&P 500 was already in a -14% drawdown (a drawdown that had reached more than -19% on December 24th). Several DIYers even decided to override their signal in some capacity: ignoring it entirely, waiting a few days for “confirmation,” or implementing some sort of “half-and-half” rule in which they took a partially defensive stance.
Ignoring the fact that overriding a systematic model somewhat defeats the point of being systematic in the first place, this sort of behavior highlights another very important truth: there is a significant gap of risk between the long-term evidence supporting an investment style (e.g. momentum and trend) and the precise strategy we use to implement it (e.g. Dual Momentum GEM).
At Newfound, we call that gap model specification risk. There is significant evidence supporting both momentum and trend as quantitative styles, but the precise means by which we measure these concepts can lead to dramatically different portfolios and outcomes. When a portfolio’s returns are highly sensitive to its specification (i.e. slight variations in input data or model parameters lead to dramatically different return profiles), we label the strategy as fragile.
In this brief commentary, we will use the Global Equities Momentum (“GEM”) strategy as a case study in fragility.
Global Equities Momentum (“GEM”)
To implement the GEM strategy, an investor merely needs to follow the decision tree below at the end of each month.
From a practitioner standpoint, this model has several attractive features. First, it is based upon the long-run evidence for both trend following and momentum. Second, it is very easy to model and to generate signals for. Finally, it is fairly lightweight from an implementation perspective: at most twelve potential rebalances a year (and often far fewer), with the portfolio holding only one ETF at a time.
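As a rough illustration of how little logic the decision tree requires, the sketch below encodes it as a single Python function. It assumes the usual GEM comparisons as Antonacci describes them: U.S. equities versus T-bills for absolute momentum, U.S. versus non-U.S. equities for relative momentum, and aggregate bonds as the safety asset. The function name, the asset labels, and the tie-breaking rule are our own hypothetical choices; the inputs are trailing total returns over whatever lookback is being tested.

```python
def gem_allocation(us_equity_mom: float, intl_equity_mom: float, tbill_mom: float) -> str:
    """Return the single asset a GEM-style rule would hold for the coming month.

    Each argument is a trailing total return over the chosen lookback
    (12 months in the canonical version; 6 to 12 months in this commentary).
    Labels are illustrative placeholders, not specific tickers.
    """
    # Absolute momentum (trend): is U.S. equity beating T-bills?
    if us_equity_mom > tbill_mom:
        # Relative momentum: hold whichever equity market is stronger
        # (ties go to U.S. equity, an arbitrary assumption of this sketch).
        return "us_equity" if us_equity_mom >= intl_equity_mom else "intl_equity"
    # Negative trend: step aside into the safety asset.
    return "aggregate_bonds"


print(gem_allocation(0.08, 0.11, 0.02))  # intl_equity
```

Changing the lookback only changes the three numbers fed into the function; the tree itself stays the same, which is precisely why the choice of lookback deserves scrutiny.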
Despite the evidence that “simple beats complex,” the simplicity of GEM belies its inherent fragility. Below we plot the equity curves for GEM implementations that employ different lookback horizons for measuring trend and momentum, ranging from 6 to 12 months.
Source: CSI Analytics. Calculations by Newfound Research. Returns are backtested and hypothetical. Returns assume the reinvestment of all distributions. Returns are gross of all fees except for underlying ETF expense ratios. None of the strategies shown reflect any portfolio managed by Newfound Research and were constructed solely for demonstration purposes within this commentary. You cannot invest in an index.
We can see significant dispersion in terminal wealth. That dispersion, however, is not necessarily consistent with the notion that one formation period is inherently better than another. While we would argue, ex-ante, that there should be little performance difference between a 9-month and a 10-month lookback (both, after all, capture the notion of “intermediate-term trends”), the former returned just 43.1% over the period while the latter returned 146.1%.
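To see how two neighboring lookbacks can disagree, consider a toy series of monthly U.S. equity returns with T-bills assumed to return zero, so the trend filter reduces to the sign of the trailing return. The numbers are invented for illustration and are not drawn from the backtest above; `trailing_return` is a hypothetical helper.

```python
def trailing_return(monthly_returns: list[float], lookback: int) -> float:
    """Compound the most recent `lookback` monthly returns into one total return."""
    growth = 1.0
    for r in monthly_returns[-lookback:]:
        growth *= 1.0 + r
    return growth - 1.0


# Hypothetical U.S. equity returns, oldest first; T-bills assumed to return 0%.
# One strong month ten months ago, then two small losses and seven flat months.
equity = [0.05, -0.01, -0.01, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

nine = trailing_return(equity, 9)  # negative: the trend filter turns defensive
ten = trailing_return(equity, 10)  # positive: the trend filter stays invested
print(f"9-month: {nine:+.2%}, 10-month: {ten:+.2%}")
```

A single month entering or leaving the window is enough to flip the signal, which is the kind of sensitivity that can drive neighboring lookbacks apart.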
These total return figures also hide the year-to-year disparity. The 9-month model, for example, was not a consistent loser. Below we plot these results, highlighting both the best (blue) and worst (orange) performing specifications. We see that the yearly spread between these strategies can be hundreds to thousands of basis points; in 2010, for example, the strategy formed using a 10-month lookback returned 12.2% while the strategy formed using a 9-month lookback returned -9.31%.
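The 2010 spread is simple arithmetic on the two annual returns. A small helper (hypothetical, for illustration) generalizes it to any number of specifications in a given year by taking the best-minus-worst return in basis points.

```python
def spread_bps(returns_by_spec: dict[str, float]) -> float:
    """Best-minus-worst annual return across specifications, in basis points."""
    values = returns_by_spec.values()
    return (max(values) - min(values)) * 10_000


# The two 2010 figures quoted above; the other lookbacks are omitted here.
returns_2010 = {"10-month": 0.122, "9-month": -0.0931}
print(round(spread_bps(returns_2010)))  # 2151
```

That is a spread of roughly 2,150 basis points in a single calendar year between two lookbacks one month apart.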
Same thesis. Same strategy. Slightly different specification. Dramatically different outcomes. That single year is likely the difference between hired and fired for most advisors and asset managers.
For those bemoaning their 2018 return, note that the 10-month specification would have netted a positive result! That specification turned defensive at the end of October.
Now, some may cry “foul” here. The evidence for trend and momentum is, after all, centuries in length and the efficacy of all these horizons is supported. Surely the noise we see over this ten-year period would average out over the long run, right?
The unfortunate reality is that these performance differences are not expected to mean-revert. The gambler’s fallacy would have us believe that bad luck in one year should be offset by good luck in another and vice versa. This is not the case. While we would expect, at any given point in time, each strategy to have an equal likelihood of experiencing good or bad luck going forward, that luck is expected to occur completely independently of what has happened in the past.
The implication is that performance differences due to model specification are not expected to mean-revert; they are random, but permanent, return artifacts.¹
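A quick Monte Carlo sketch makes the absence of mean reversion concrete. It assumes, purely for illustration, that two specifications differ only by independent, zero-mean monthly noise (normal, 1% standard deviation each) and tracks the additive cumulative gap between them. The function `conditional_gap` and all of its parameters are hypothetical. Among paths where the first specification was ahead at the halfway point, later luck is independent of that lead, so on average it neither extends nor claws back the earlier advantage.

```python
import random
import statistics


def conditional_gap(n_paths=10_000, months=120, monthly_sd=0.01, seed=42):
    """Average cumulative gap between two specifications at the midpoint and at
    the end, restricted to paths where the first one led at the midpoint.

    Assumption: each specification's monthly 'luck' is independent N(0, monthly_sd),
    so the monthly difference is N(0, monthly_sd * sqrt(2)); gaps are summed, not compounded.
    """
    rng = random.Random(seed)
    half = months // 2
    diff_sd = monthly_sd * 2 ** 0.5
    mid_gaps, end_gaps = [], []
    for _ in range(n_paths):
        gap = 0.0
        gap_at_mid = 0.0
        for m in range(months):
            gap += rng.gauss(0.0, diff_sd)
            if m == half - 1:
                gap_at_mid = gap  # snapshot after the first half of the window
        if gap_at_mid > 0:  # condition on the first specification having been "lucky"
            mid_gaps.append(gap_at_mid)
            end_gaps.append(gap)
    return statistics.mean(mid_gaps), statistics.mean(end_gaps)


mid, end = conditional_gap()
print(f"average lead at midpoint: {mid:.2%}, at end: {end:.2%}")
```

If performance gaps mean-reverted, the second number would be noticeably smaller than the first; under independent luck the two come out close.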
The larger problem at hand is that none of us have a hundred years to invest. In reality, most investors have a few decades. And we act with the temperament of having just a few years. Therefore, bad luck can have very permanent and very scarring effects not only upon our psyche, but upon our realized wealth.
But consider what happens if we try to neutralize the role of model specification risk and luck by diversifying equally across the seven different models (rebalanced annually). We see returns closer in line with the median result, a boost to realized Sharpe ratio, and a reduction in maximum realized drawdown.
These are impressive results given that all we employed was naïve diversification.
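The diversified portfolio and the two metrics discussed above can be sketched as follows. This is a minimal illustration, not Newfound's backtest code: it assumes each specification is a list of monthly returns of equal length, resets the sleeves to equal weight every twelve months, computes Sharpe from monthly returns annualized by √12 against a constant risk-free rate, and measures maximum drawdown on compounded wealth. All function names are hypothetical.

```python
import statistics


def annual_rebalanced_blend(strategy_returns: list[list[float]]) -> list[float]:
    """Monthly returns of an equal-weight blend whose sleeves are reset to
    equal weight at the start of every 12-month block."""
    n = len(strategy_returns)
    months = len(strategy_returns[0])
    blend = []
    for start in range(0, months, 12):
        sleeves = [1.0 / n] * n  # rebalance: every sleeve starts the year at 1/n
        value = 1.0
        for m in range(start, min(start + 12, months)):
            # Each sleeve drifts with its own strategy until the next reset.
            sleeves = [s * (1.0 + rets[m]) for s, rets in zip(sleeves, strategy_returns)]
            new_value = sum(sleeves)
            blend.append(new_value / value - 1.0)
            value = new_value
    return blend


def sharpe_ratio(monthly_returns: list[float], monthly_rf: float = 0.0) -> float:
    """Annualized Sharpe ratio from monthly returns and a constant monthly risk-free rate."""
    excess = [r - monthly_rf for r in monthly_returns]
    return statistics.mean(excess) / statistics.stdev(excess) * 12 ** 0.5


def max_drawdown(monthly_returns: list[float]) -> float:
    """Largest peak-to-trough decline in compounded wealth, as a negative fraction."""
    wealth = peak = 1.0
    worst = 0.0
    for r in monthly_returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = min(worst, wealth / peak - 1.0)
    return worst


# Toy example with two hypothetical specifications over one year.
spec_a = [1.0] + [0.0] * 11  # doubles in month one, then flat
spec_b = [0.0] * 12          # flat all year
blend = annual_rebalanced_blend([spec_a, spec_b])
print(blend[0])  # 0.5: half the capital doubled
```

With seven lookback specifications, the same call would take seven monthly return series, and the two metric functions would be applied to each individual series and to the blend for comparison.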
Conclusion
The odd thing about strategy diversification is that it guarantees we will be wrong. Each and every year, we will, by definition, allocate at least part of our capital to the worst performing strategy. The potential edge, however, is in being vaguely wrong rather than precisely wrong. The former is annoying. The latter can be catastrophic.
In this commentary we use the popular Dual Momentum GEM strategy as a case study to demonstrate how model specification choices can lead to performance differences that span hundreds, if not thousands, of basis points a year. Unfortunately, we should not expect these performance differences to mean revert. The realizations of good and bad luck are permanent, and potentially very significant, artifacts within our track records.
By simply diversifying across the different models, however, we can dramatically reduce specification risk and thereby reduce strategy fragility.
To be clear, no amount of diversification will protect you from the risk of the style. As we like to say, “risk cannot be destroyed, only transformed.” In that vein, trend-following strategies will always incur some sort of whipsaw risk. The question is whether that whipsaw is related to the style as a whole or to the specific implementation.
For example, in the graphs above we can see that Dual Momentum GEM implemented with a 10-month formation period experienced whipsaw in 2011 when few of the other implementations did. This is more specification whipsaw than style whipsaw. On the other hand, we can see that almost all the specifications exhibited whipsaw in late 2015 and early 2016, an indication of style whipsaw, not specification whipsaw.
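One crude, hypothetical way to put numbers on this distinction is to count how many specifications changed their holding in the same month. A value near 1.0 suggests a style-wide move; a value near 1/(number of specifications) suggests a single specification acting alone. This is our own illustrative proxy, not the method behind the graphs, and a switch by itself is not yet whipsaw (that also requires a reversal and a loss).

```python
def switch_breadth(holdings_by_spec: dict[str, list[str]], month: int) -> float:
    """Fraction of specifications whose holding changed going into `month`."""
    changed = sum(
        1 for held in holdings_by_spec.values() if held[month] != held[month - 1]
    )
    return changed / len(holdings_by_spec)


# Hypothetical holdings for three lookbacks over two months.
lone_switch = {"9m": ["us", "us"], "10m": ["us", "bonds"], "12m": ["us", "us"]}
broad_switch = {"9m": ["us", "bonds"], "10m": ["us", "bonds"], "12m": ["us", "bonds"]}
print(switch_breadth(lone_switch, 1))   # 0.333...: looks like specification whipsaw
print(switch_breadth(broad_switch, 1))  # 1.0: looks like style whipsaw
```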
Specification risk we can attempt to control for; style risk is just something we have to bear.
At Newfound, evidence such as this informs our own trend-following mandates. We seek to diversify ourselves across the axes of what (“what are we investing in?”), how (“how are we making the decisions?”), and when (“when are we making those decisions?”) in an effort to reduce specification risk and provide the greatest style consistency possible.
- This can be analyzed more precisely using the Augmented Dickey-Fuller test to establish that the cumulative return differences between pairs of implementations do not exhibit mean-reverting behavior. A one-way ANOVA test can also be employed to test whether the various implementations share the same population mean. For the sake of brevity, we leave this as an exercise to the reader.
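For readers who want to attempt the exercise, the one-way ANOVA F statistic is simple enough to sketch without dependencies. In practice each group would be, for example, the annual returns of one lookback specification; the toy numbers below are placeholders. A p-value requires the F distribution with (k - 1, N - k) degrees of freedom, which `scipy.stats.f_oneway` provides along with the statistic; for the Augmented Dickey-Fuller test, `statsmodels.tsa.stattools.adfuller` is the usual tool. The function name here is hypothetical.

```python
import statistics


def one_way_anova_f(*groups: list[float]) -> float:
    """F statistic for a one-way ANOVA across k groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group variation: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: dispersion of observations around their own group mean.
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Placeholder groups; in the exercise each group would be one specification's annual returns.
print(one_way_anova_f([1, 2, 3], [2, 3, 4], [3, 4, 5]))  # 3.0
```

A small F (and a correspondingly large p-value) means we cannot reject the hypothesis that all specifications share the same mean return, which is consistent with the dispersion above being luck rather than skill.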