This post is available as a PDF download here.
Summary
- Value investing continues to experience a trough of sorrow. In particular, the traditional price-to-book factor has failed to establish new highs since December 2006 and sits in a 25% drawdown.
- While price-to-book has been the academic measure of choice for 25+ years, many practitioners have begun to question its value (pun intended).
- We have also witnessed the turning of the tides against the size premium, with many practitioners no longer considering it to be a valid stand-alone anomaly. This comes 35+ years after being first published.
- With this in mind, we explore the evidence that would be required for us to dismiss other, already established anomalies. Using past returns to establish prior beliefs, we simulate out forward environments and use Bayesian inference to adjust our beliefs over time, recording how long it would take for us to finally dismiss a factor.
- We find that for most factors, we would have to live through several careers to finally witness enough evidence to dismiss them outright.
- Thus, while factors may be established upon a foundation of evidence, their forward use requires a bit of faith.
In Norse mythology, Fimbulvetr (commonly referred to in English as “Fimbulwinter”) is a great and seemingly never-ending winter. It continues for three seasons – long, horribly cold years that stretch on longer than normal – with no intervening summers. It is a time of bitterly cold, sunless days where hope is abandoned and discord reigns.
This winter-to-end-all-winters is eventually punctuated by Ragnarok, a series of events leading up to a great battle that results in the ultimate death of the major gods, destruction of the cosmos, and subsequent rebirth of the world.
Investment mythology is littered with Ragnarok-styled blow-ups and we often assume the failure of a strategy will manifest as sudden catastrophe. In most cases, however, failure may more likely resemble Fimbulwinter: a seemingly never-ending winter in performance with returns blown to-and-fro by the harsh winds of randomness.
Value investors can attest to this. In particular, the disciples of price-to-book have suffered greatly as of late, with “expensive” stocks having outperformed “cheap” stocks for over a decade. The academic interpretation of the factor sits nearly 25% below its prior high-water mark seen in December 2006.
Expectedly, a large number of articles have been written about the death of the value factor. Some question the factor itself, while others simply argue that price-to-book is a broken implementation.
But are these simply retrospective narratives, driven by a desire to have an explanation for a result that has defied our expectations? Consider: if price-to-book had exhibited positive returns over the last decade, would we be hearing from nearly as large a number of investors explaining why it is no longer a relevant metric?
To be clear, we believe that many of the arguments proposed for why price-to-book is no longer a relevant metric are quite sound. The team at O’Shaughnessy Asset Management, for example, wrote a particularly compelling piece that explores how changes to accounting rules have led book value to become a less relevant metric in recent decades.1
Nevertheless, we think it is worth taking a step back, considering an alternate course of history, and asking ourselves how it would impact our current thinking. Often, we look back on history as if it were the obvious course. “If only we had better prior information,” we say to ourselves, “we would have predicted the path!”2 Rather, we find it more useful to look at the past as just one realized path of many that could have happened, none of which were preordained. Randomness happens.
With this line of thinking, the poor performance of price-to-book can just as easily be explained by a poor roll of the dice as it can be by a fundamental break in applicability. In fact, we see several potential truths based upon performance over the last decade:
- This is all normal course performance variance for the factor.
- The value factor works, but the price-to-book measure itself is broken.
- The price-to-book measure is over-crowded in use, and thus the “troughs of sorrow” will need to be deeper than ever to get weak hands to fold and pass the alpha to those with the fortitude to hold.
- The value factor never existed in the first place; it was an unfortunate false positive that saturated the investing literature and broad narrative.
The problem at hand is two-fold: (1) the statistical evidence supporting most factors is considerable and (2) the decade-to-decade variance in factor performance is substantial. Taken together, you run into a situation where a mere decade of underperformance likely cannot undo the previously established significance. Just as frustrating is the opposite scenario. Consider that these two statements are not mutually exclusive: (1) price-to-book is broken, and (2) price-to-book generates positive excess return over the next decade.
In investing, factor return variance is large enough that the proof is not in the eating of the short-term return pudding.
The small-cap premium is an excellent example of the difficulty in discerning, in real time, the integrity of an established factor. The anomaly has failed to establish a meaningful new high since it was originally published in 1981. Only in the last decade – nearly 30 years later – have the tides of the industry finally seemed to turn against it as an established anomaly and potential source of excess return.
Thirty years.
The remaining broadly accepted factors – e.g. value, momentum, carry, defensive, and trend – have all been demonstrated to generate excess risk-adjusted returns across a variety of economic regimes, geographies, and asset classes, creating a great depth of evidence supporting their existence. What evidence, then, would make us abandon faith from the Church of Factors?
To explore this question, we ran a simple experiment for each factor. Our goal was to estimate how long it would take to determine that a factor was no longer statistically significant.
Our assumption is that the salient features of each factor’s return pattern will remain the same (i.e. autocorrelation, conditional heteroskedasticity, skewness, kurtosis, et cetera), but the forward average annualized return will be zero since the factor no longer “works.”
Towards this end, we ran the following experiment:
1. Take the full history for the factor and calculate prior estimates for mean annualized return and standard error of the mean.
2. De-mean the time-series.
3. Randomly select a 12-month chunk of returns from the time series and use the data to perform a Bayesian update to our mean annualized return.
4. Repeat step 3 until the annualized return is no longer statistically non-zero at a 99% confidence threshold.
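The steps above can be sketched in a few lines of Python. This is a minimal sketch under our own simplifying assumptions, not the exact code behind the reported figures: it assumes a normal-normal conjugate update with the observation variance fixed at the historical annualized variance, it sums monthly returns to form an annual return, and it runs on a synthetic series (the `history` list is randomly generated, not actual factor data). The name `years_until_dismissal` and the `max_years` cap are illustrative.

```python
import math
import random
import statistics

Z_99 = 2.576  # two-sided 99% critical value of the standard normal


def years_until_dismissal(monthly_returns, rng, z=Z_99, max_years=5000):
    """Count simulated years until the posterior mean is no longer significant."""
    n_years = len(monthly_returns) / 12.0
    monthly_mean = statistics.fmean(monthly_returns)
    annual_var = statistics.variance(monthly_returns) * 12.0

    # Step 1: prior mean and squared standard error of the annualized mean.
    mean = monthly_mean * 12.0
    var = annual_var / n_years
    if abs(mean) < z * math.sqrt(var):
        return 0  # never significant to begin with

    # Step 2: de-mean so forward returns average zero but keep their shape.
    demeaned = [r - monthly_mean for r in monthly_returns]

    for year in range(1, max_years + 1):
        # Step 3: draw a contiguous 12-month block and update the belief.
        start = rng.randrange(len(demeaned) - 11)
        annual = sum(demeaned[start:start + 12])
        post_var = 1.0 / (1.0 / var + 1.0 / annual_var)
        mean = post_var * (mean / var + annual / annual_var)
        var = post_var
        # Step 4: stop once zero sits inside the 99% interval.
        if abs(mean) < z * math.sqrt(var):
            return year
    return max_years  # cap reached; treat as "at least this long"


# Synthetic 90-year history (assumed 0.5% monthly mean, 3.5% monthly volatility).
rng = random.Random(0)
history = [rng.gauss(0.005, 0.035) for _ in range(12 * 90)]
runs = [years_until_dismissal(history, rng) for _ in range(200)]
print("median simulated years until dismissal:", statistics.median(runs))
```

Drawing contiguous 12-month blocks, rather than individual months, is what preserves short-horizon features such as autocorrelation and volatility clustering within each sampled year.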
For each factor, we ran this test 10,000 times, creating a distribution that tells us how many years into the future we would have to wait until we were certain, from a statistical perspective, that the factor is no longer significant.
Sixty-seven years.
Based upon this experiment, sixty-seven years is the median number of years we will have to wait until we officially declare price-to-book (“HML,” as it is known in the literature) to be dead.3 At the risk of being morbid, we’re far more likely to die before the industry finally sticks a fork in price-to-book.
We perform this experiment for a number of other factors – including size (“SMB” – “small-minus-big”), quality (“QMJ” – “quality-minus-junk”), low-volatility (“BAB” – “betting-against-beta”), and momentum (“UMD” – “up-minus-down”) – and see much the same result. It will take decades before sufficient evidence mounts to dethrone these factors.
Factor | HML | SMB | QMJ | BAB | UMD |
Median Years-until-Failure | 67 | 43 | 132 | 284 | 339 |
Now, it is worth pointing out that these figures for a factor like momentum (“UMD”) might be a bit skewed due to the design of the test. If we examine the long-run returns, we see a fairly docile return profile punctuated by sudden and significant drawdowns (often called “momentum crashes”).
Since a large proportion of the cumulative losses are contained in these short but pronounced drawdown periods, demeaning the time-series ultimately means that the majority of 12-month periods actually exhibit positive returns. In other words, by selecting random 12-month samples, we actually expect a high frequency of those samples to have a positive return.
For example, using this process, 49.1%, 47.6%, 46.7%, and 48.8% of rolling 12-month periods are positive for the HML, SMB, QMJ, and BAB factors, respectively. For UMD, that number is 54.7%. Furthermore, if you drop the worst 5% of rolling 12-month periods for UMD, the average positive period is 1.4x larger than the average negative period. Taken together, not only are you more likely to select a positive 12-month period, but those positive periods are, on average, 1.4x larger than the negative periods you will pick, except for the rare (<5%) cases.
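To make the mechanics concrete, here is a small illustrative sketch of how de-meaning a crash-prone series leaves most rolling 12-month windows positive. The `toy` series is entirely hypothetical (steady gains with a single crash month) and is not UMD data; the function name `share_of_positive_windows` is our own.

```python
import statistics


def share_of_positive_windows(monthly_returns, window=12):
    """Fraction of rolling windows with a positive sum after de-meaning."""
    mean = statistics.fmean(monthly_returns)
    demeaned = [r - mean for r in monthly_returns]
    sums = [
        sum(demeaned[i:i + window])
        for i in range(len(demeaned) - window + 1)
    ]
    return sum(s > 0 for s in sums) / len(sums)


# Hypothetical series: ten years of +1% months with one -50% crash month.
toy = [0.01] * 120
toy[60] = -0.50
print(f"{share_of_positive_windows(toy):.1%} of 12-month windows are positive")
```

Because the crash drags the full-sample mean down, every window that avoids the crash sits above zero once the mean is removed, and only the handful of windows containing the crash are negative.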
The process of the test was selected to incorporate the salient features of each factor. However, in the case of momentum, it may lead to somewhat outlandish results.
Conclusion
While an evidence-based investor should be swayed by the weight of the data, the simple fact is that most factors are so well established that the majority of current practitioners will likely go their entire careers without experiencing evidence substantial enough to dismiss any of the anomalies.
Therefore, in many ways, there is a certain faith required to use them going forward. Yes, these are ideas and concepts derived from the data. Yes, we have done our best to test their robustness out-of-sample across time, geographies, and asset classes. Yet we must also admit that there is a non-zero probability, however small it is, that these are false positives: a fact we may not have sufficient evidence to address until several decades hence.
And so a bit of humility is warranted. Factors will not suddenly stand up and declare themselves broken. And those that are broken will still appear to work from time-to-time.
Indeed, the death of a factor will be more Fimbulwinter than Ragnarok: not so violent as to be the end of days, but enough to cause pain and frustration among investors.
Addendum
We have received a large number of inbound notes about this commentary, which fall upon two primary lines of questions. We want to address these points.
How were the tests impacted by the Bayesian inference process?
The results of the tests within this commentary are rather astounding. We did seek to address some of the potential flaws of the methodology we employed, but by and large we feel the overarching conclusion remains on a solid foundation.
While we only presented the results of the Bayesian inference approach in this commentary, as a check we actually tested two other approaches:
- A Bayesian inference approach assuming that forward returns would be a random walk with constant variance (based upon historical variance) and zero mean.
- Forward returns were simulated using the same bootstrap approach, but the factor was being discovered for the first time and the entire history was being evaluated for its significance.
These two tests were an effort to isolate the effects of the different components of our test.
What we found was that while the reported figures changed, the overall magnitude did not. In other words, the median death-date of HML may not have been 67 years, but the order of magnitude remained much the same: decades.
Stepping back, these results were somewhat of a foregone conclusion. We would not expect an effect that has been determined to be statistically significant over a hundred-year period to unravel in a few years. Furthermore, we would expect a number of scenarios that continue to bolster the statistical strength just due to randomness alone.
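A back-of-the-envelope calculation shows why the answer is measured in decades. Assume, as a simplification of our own, that forward annual returns are independent with zero mean and the same volatility as history. A factor with t-statistic t0 over n years would then have an expected t-statistic of roughly t0 × sqrt(n / (n + k)) after k more years, which falls below a critical value z once k exceeds n × ((t0 / z)² − 1). The inputs below are hypothetical and are not the t-statistics of any particular factor.

```python
def expected_years_to_dismissal(t_stat, n_years, z=2.576):
    """Years of zero-mean returns before the expected t-statistic drops below z."""
    return max(0.0, n_years * ((t_stat / z) ** 2 - 1.0))


# Hypothetical factor with 90 years of history at several starting t-statistics.
for t_stat in (3.0, 4.0, 5.0):
    years = expected_years_to_dismissal(t_stat, 90)
    print(f"t-stat {t_stat}: about {years:.0f} years")
```

The required wait grows with the square of the starting t-statistic, so even modest increases in historical significance add many decades.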
Why are we defending price-to-book?
The point of this commentary was not to defend price-to-book as a measure. Rather, it was to bring up a larger point.
As a community, quantitative investors often leverage statistical significance as a defense for the way we invest.
We think that is a good thing. We should look at the weight of the evidence. We should be data driven. We should try to find ideas that have proven to be robust over decades of time and when applied in different markets or with different asset classes. We should want to find strategies that are robust to small changes in parameterization.
Many quants (ourselves among them) would argue, however, that there also needs to be a why. Why does this factor work? Without the why, we run the risk of glorified data mining. With the why, we can choose for ourselves whether we believe the effect will continue going forward.
Of course, there is nothing that prevents the why from being pure narrative fallacy. Perhaps we have simply woven a story into a pattern of facts.
With price-to-book, one might argue we have done the exact opposite. The effect, technically, remains statistically significant and yet plenty of ink has been spilled as to why it shouldn’t work in the future.
The question we must answer, then, is, “when does statistical significance apply and when does it not?” How can we use it as a justification in one place and completely ignore it in others?
Furthermore, if we are going to rely on hundreds of years of data to establish significance, how can we determine when something is “broken” if the statistical evidence does not support it?
Price-to-book may very well be broken. But that is not the point of this commentary. The point is simply that the same tools we use to establish and defend factors may prevent us from tearing them down.
Risk-Adjusted Momentum: A Momentum and Low-Volatility Barbell?
By Corey Hoffstein
On October 21, 2019
In Defensive, Momentum, Weekly Commentary
A research note recently crossed my desk that aimed to undress the post-Global Financial Crisis (GFC) performance of the momentum factor in U.S. equities. Not only have we witnessed a significant reduction in the factor’s return, but the majority of the return has been generated by the short side of the strategy, which can be more difficult for long-only investors to access.
Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. The Long (Alpha) strategy is a monthly rebalanced portfolio that goes long, in equal weight, the top 50 securities in the S&P 500 ranked on 12-1 month momentum and shorts an equal-weight S&P 500 portfolio. The Short (Alpha) strategy is a monthly rebalanced portfolio that goes long an equal-weight S&P 500 portfolio and shorts, in equal weight, the bottom 50 securities in the S&P 500 ranked on 12-1 month momentum.
The note makes the narratively-appealing argument that the back-to-back recessions of the dot-com bubble and the Global Financial Crisis amplified investor risk aversion to downside losses. The proposed evidence of this fact is the correlation of the cumulative alpha generated from shorting low momentum stocks and the cumulative alpha generated from shorting high volatility stocks.
While correlation does not imply causation, one argument might be that in a heightened period of risk aversion, investors may consistently punish higher risk stocks, causing them to become persistent losers. Or, conversely, losers may be rapidly sold, creating both persistence and high levels of volatility. We can arguably see this in the convergence of holdings in low momentum and high volatility stocks during “risk off” regimes.
Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. The HI VOL (Alpha) strategy is a monthly rebalanced portfolio that goes long an equal-weight S&P 500 portfolio and shorts, in equal weight, the bottom 50 securities in the S&P 500 ranked on trailing 252-day realized volatility. The LO MOM (Alpha) strategy is a monthly rebalanced portfolio that goes long an equal-weight S&P 500 portfolio and shorts, in equal weight, the bottom 50 securities in the S&P 500 ranked on 12-1 month momentum.
Given these facts, we would expect long-only momentum investors to have harvested little out-performance in recent years. Yet we find that the popular iShares Momentum ETF (MTUM) has out-performed the S&P 500 by 290 basis points per year since its inception in 2013.
The answer to this conundrum, as proposed by the research note, is that MTUM’s use of risk-adjusted momentum is the key.
If we think of risk-adjusted momentum as simply momentum divided by volatility (which is how MTUM defines it), we might interpret it as an integrated signal of both the momentum and low-volatility factors. Therefore, risk-adjusting creates a multi-factor portfolio that tilts away from high volatility stocks.
And hence the out-performance.
Except if we actually create a risk-adjusted momentum portfolio, that does not appear to really be the case at all.
Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. The alpha of the risk-adjusted momentum strategy is defined as the return of a monthly rebalanced portfolio that goes long, in equal weight, the top 50 securities in the S&P 500 ranked on risk-adjusted momentum (12-1 month momentum divided by 252-day realized volatility) and shorts an equal-weight S&P 500 portfolio.
To be fair, MTUM’s construction methodology differs quite a bit from that employed herein. We are simply equally-weighting the top 50 stocks in the S&P 500 when ranked by risk-adjusted momentum, whereas MTUM uses a blend of 6- and 12-month risk-adjusted momentum scores and then tilts market-capitalization weights based upon those scores.
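For readers who want to see the signal spelled out, the following is a minimal sketch of the top-N construction described above, not our production code. It assumes 12-1 month momentum is the return from twelve months ago to one month ago, and that realized volatility is the standard deviation of the trailing 252 daily returns, annualized by the square root of 252 (the annualization does not affect rankings). The tickers and numbers are hypothetical.

```python
import math
import statistics


def momentum_12_1(month_end_prices):
    """Return from 12 months ago to 1 month ago, skipping the latest month."""
    return month_end_prices[-2] / month_end_prices[-13] - 1.0


def realized_vol(daily_returns, window=252):
    """Annualized standard deviation of the trailing `window` daily returns."""
    return statistics.stdev(daily_returns[-window:]) * math.sqrt(252)


def top_n(scores, n):
    """Names of the n highest-scoring securities, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical momentum and volatility readings for three securities.
mom = {"AAA": 0.30, "BBB": 0.10, "CCC": -0.05}
vol = {"AAA": 0.40, "BBB": 0.10, "CCC": 0.05}
risk_adjusted = {name: mom[name] / vol[name] for name in mom}

print("top 2 by momentum:              ", top_n(mom, 2))
print("top 2 by risk-adjusted momentum:", top_n(risk_adjusted, 2))
```

In this toy universe the lowest-volatility name (CCC) is not selected under either ranking because its momentum is negative, a first hint that dividing by volatility does not simply favor low-volatility stocks.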
Nevertheless, if we look at actual holdings overlap over time of our Risk-Adjusted Momentum portfolio versus Momentum and Low Volatility portfolios, not only do we see persistently higher overlap with the Momentum portfolio, but we see fairly low average overlap with the Low Volatility portfolio.
For the latter point, it is worth first anchoring ourselves to the standard overlap between Momentum and Low Volatility (green line below). While we can see that the Risk-Adjusted Momentum portfolio does indeed have a higher average overlap with Low Volatility than does the Momentum portfolio, the excess tilt to Low Volatility due to the use of risk-adjusted momentum (i.e. the orange line minus the green line) appears rather small. In fact, on average, it is just 10%.
Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. The risk-adjusted momentum strategy is a monthly rebalanced portfolio that goes long, in equal weight, the top 50 securities in the S&P 500 ranked on risk-adjusted momentum (12-1 month momentum divided by 252-day realized volatility). The momentum strategy is a monthly rebalanced portfolio that goes long, in equal weight, the top 50 securities in the S&P 500 ranked on 12-1 month momentum. The low volatility strategy is a monthly rebalanced portfolio that goes long, in equal weight, the top 50 securities in the S&P 500 ranked on trailing 252-day realized volatility.
This is further evidenced by the actual returns of the strategies themselves:
Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. The risk-adjusted momentum strategy is a monthly rebalanced portfolio that goes long, in equal weight, the top 50 securities in the S&P 500 ranked on risk-adjusted momentum (12-1 month momentum divided by 252-day realized volatility). The momentum strategy is a monthly rebalanced portfolio that goes long, in equal weight, the top 50 securities in the S&P 500 ranked on 12-1 month momentum. The low volatility strategy is a monthly rebalanced portfolio that goes long, in equal weight, the top 50 securities in the S&P 500 ranked on trailing 252-day realized volatility.
The Risk-Adjusted Momentum portfolio performance tracks that of the Momentum portfolio very closely.
As it turns out, the step of adjusting for risk creates far less of a low volatility factor tilt in our top-decile portfolio than one might initially suspect. (Or, at least, I’ll speak for myself: it created far less of a tilt than I expected.)
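To understand this point, we will first re-write our risk-adjusted momentum signal as:
$$\text{Risk-Adjusted Momentum} = \frac{\text{MOM}}{\text{VOL}} = \text{MOM} \times \text{INVVOL}, \qquad \text{INVVOL} = \frac{1}{\text{VOL}}$$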
While trivial algebra, re-writing risk-adjusted momentum as the product of momentum and inverse volatility helps explain why risk-adjusted momentum appears to load much more heavily on momentum than low volatility.
At a given point in time, it would appear as if Momentum and Low Volatility should have an equal influence on the rank of a given security. However, we need to dig a level deeper and consider how changes in these variables impact change in risk-adjusted momentum.
Fortunately, the product makes this a trivial exercise: holding INVVOL constant, changes in MOM are scaled by INVVOL and vice versa. This scaling effect can cause large changes in risk-adjusted momentum – and therefore ordinal ranking – particularly as MOM crosses the zero level.
Consider a trivial example where INVVOL is a very large number (e.g. 20) due to a security having a very low volatility profile (e.g. 5%). This would appear, at first glance, to give a security a structural advantage and hence create a low volatility tilt in the portfolio. However, a move from positive prior returns to negative prior returns would shift the security from ranking among the best to ranking among the worst in risk-adjusted momentum.1
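The example above can be checked with a few lines of arithmetic. The sketch below uses a hypothetical four-stock universe of (MOM, VOL) pairs and flips only the sign of the low-volatility stock's momentum; the names and values are made up for illustration.

```python
def score(mom, vol):
    """Risk-adjusted momentum written as MOM x INVVOL."""
    return mom * (1.0 / vol)


def ranking(universe):
    """Security names ordered from best to worst score."""
    return sorted(universe, key=lambda name: score(*universe[name]), reverse=True)


# Hypothetical (MOM, VOL) pairs; LOW_VOL has 5% volatility, so INVVOL = 20.
before = {
    "LOW_VOL": (0.10, 0.05),
    "MID_VOL": (0.20, 0.20),
    "HIGH_VOL": (0.30, 0.40),
    "HIGH_VOL_LOSER": (-0.10, 0.40),
}
# Same universe after LOW_VOL's prior return turns from +10% to -10%.
after = dict(before, LOW_VOL=(-0.10, 0.05))

print("before:", ranking(before))
print("after: ", ranking(after))
```

Note that after the flip the low-volatility stock ranks below a high-volatility stock with the same negative momentum: among losers, a large INVVOL amplifies the penalty rather than cushioning it.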
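A first order estimate of change in risk-adjusted momentum is:
$$\Delta(\text{MOM} \times \text{INVVOL}) \approx \text{INVVOL} \times \Delta\text{MOM} + \text{MOM} \times \Delta\text{INVVOL}$$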
So which term ultimately has more influence on the change in scores over time?
To get a sense of relative scale, we plot the cross-sectional mean absolute difference between the two terms over time. This should, at least partially, capture interaction effects between the two terms.
Source: Sharadar. Calculations by Newfound Research.
We can see that the term including the change in MOM has a much more significant influence on changes in risk-adjusted momentum than changes in INVVOL do. Thus, we might expect a portfolio driven entirely by changes in momentum to share more in common with our risk-adjusted momentum portfolio than one driven entirely by changes in volatility.
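For those who wish to replicate the comparison, a rough sketch follows. It splits the change in MOM × INVVOL into the momentum-driven term, the volatility-driven term, and the second-order cross term that a first order estimate drops, and then averages the gap between the absolute sizes of the first two across securities. Reading "mean absolute difference" as the mean of |momentum term| minus |volatility term| is our assumption for this sketch, and the three (MOM, VOL) observations are hypothetical values chosen only to exercise the code.

```python
import statistics


def decompose_change(mom0, vol0, mom1, vol1):
    """Split the change in MOM x INVVOL into its three components."""
    inv0, inv1 = 1.0 / vol0, 1.0 / vol1
    d_mom, d_inv = mom1 - mom0, inv1 - inv0
    mom_term = inv0 * d_mom     # INVVOL x change in MOM
    vol_term = mom0 * d_inv     # MOM x change in INVVOL
    cross_term = d_mom * d_inv  # second-order remainder
    return mom_term, vol_term, cross_term


# Hypothetical ((MOM, VOL) last month, (MOM, VOL) this month) per security.
universe = [
    ((0.10, 0.20), (0.04, 0.21)),
    ((-0.05, 0.30), (0.02, 0.28)),
    ((0.25, 0.15), (0.18, 0.15)),
]

gaps = []
for (m0, v0), (m1, v1) in universe:
    mom_term, vol_term, _ = decompose_change(m0, v0, m1, v1)
    gaps.append(abs(mom_term) - abs(vol_term))

print("cross-sectional mean gap:", statistics.fmean(gaps))
```

A positive gap means the momentum-driven term dominates on average for that cross-section; the toy numbers here prove nothing about real securities.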
This is somewhat evident when we plot the return of MTUM versus our top 50 style portfolios. The correlation of daily returns between MTUM and our Momentum, Low Volatility, and Risk-Adjusted Momentum portfolios is 0.93, 0.72, and 0.93 respectively, further suggesting that MTUM is driven more by momentum than volatility.
Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. The risk-adjusted momentum strategy is a monthly rebalanced portfolio that goes long, in equal weight, the top 50 securities in the S&P 500 ranked on risk-adjusted momentum (12-1 month momentum divided by 252-day realized volatility). The momentum strategy is a monthly rebalanced portfolio that goes long, in equal weight, the top 50 securities in the S&P 500 ranked on 12-1 month momentum. The low volatility strategy is a monthly rebalanced portfolio that goes long, in equal weight, the top 50 securities in the S&P 500 ranked on trailing 252-day realized volatility.
This is only one part of the equation, however, as it is possible that changes to the risk-adjusted momentum score are so small – despite being largely driven by momentum – that relative rankings never actually change. Alternatively, because we have constructed our portfolios by choosing only the top 50 ranked securities, it is possible that momentum does drive the majority of change across the entire universe while the top 50 are always structurally advantaged by the non-linear scaling of low volatility.
To create a more accurate picture, we can rank-weight the entire S&P 500 and evaluate the holdings overlap over time.
Source: Sharadar. Calculations by Newfound Research.
Note that by now including all securities, and not just selecting the top 50, the overlap with both the Momentum and Low Volatility portfolios naturally appears higher on average. Nonetheless, we can see that the overlap with the Momentum portfolio is consistently higher than that of the Low Volatility portfolio, again suggesting that momentum has a larger influence on the overall portfolio composition than volatility does.
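A sketch of one way to compute such an overlap is below. It assumes rank-weighting means each security's weight is proportional to its ordinal rank (best rank gets the largest weight) and that overlap is the sum, across securities, of the smaller of the two portfolio weights; both definitions are our assumptions for illustration, as are the four hypothetical securities.

```python
def rank_weights(scores):
    """Weights proportional to ordinal rank; the highest score gets the most."""
    ordered = sorted(scores, key=scores.get)  # worst first, so rank 1 = worst
    total = len(ordered) * (len(ordered) + 1) / 2
    return {name: (rank + 1) / total for rank, name in enumerate(ordered)}


def overlap(weights_a, weights_b):
    """Shared weight between two long-only portfolios (0 = none, 1 = identical)."""
    return sum(min(w, weights_b.get(name, 0.0)) for name, w in weights_a.items())


# Hypothetical momentum and volatility readings.
mom = {"AAA": 0.30, "BBB": 0.10, "CCC": -0.05, "DDD": 0.20}
vol = {"AAA": 0.40, "BBB": 0.10, "CCC": 0.05, "DDD": 0.25}

w_mom = rank_weights(mom)
w_low_vol = rank_weights({name: -v for name, v in vol.items()})  # lower vol ranks higher
w_ram = rank_weights({name: mom[name] / vol[name] for name in mom})

print("overlap with momentum:      ", overlap(w_ram, w_mom))
print("overlap with low volatility:", overlap(w_ram, w_low_vol))
```

Because every security receives some weight under rank-weighting, overlaps are bounded well above zero, which is why the full-universe figures sit higher than those of the top 50 portfolios.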
Conclusion
Without much deep thought, it would be easy to assume that a risk-adjusted momentum measure – i.e. prior returns divided by realized volatility – would tilt a portfolio towards both prior winners and low-volatility securities, resulting in a momentum / low-volatility barbell.
Upon deeper consideration, however, the picture complicates quickly. For example, momentum can be both positive and negative; dividing by volatility creates a non-linear impact; and momentum tends to change more rapidly than volatility.
We do not attempt to derive a precise, analytical equation that determines which of the two variables ultimately drives portfolio composition, but we do construct long-only example portfolios for empirical study. We find that a high-concentration risk-adjusted momentum portfolio has significantly more overlap in holdings with a traditional momentum portfolio than a low-volatility portfolio, resulting in a more highly correlated return stream.
The most important takeaway from this note is that intuition can be deceiving: it is important to empirically test our assumptions to ensure we truly understand the impact of our strategy construction choices.