Summary
- Value investing continues to experience a trough of sorrow. In particular, the traditional price-to-book factor has failed to establish new highs since December 2006 and sits in a 25% drawdown.
- While price-to-book has been the academic measure of choice for 25+ years, many practitioners have begun to question its value (pun intended).
- We have also witnessed the turning of the tides against the size premium, with many practitioners no longer considering it to be a valid stand-alone anomaly. This comes 35+ years after it was first published.
- With this in mind, we explore the evidence that would be required for us to dismiss other, already established anomalies. Using past returns to establish prior beliefs, we simulate out forward environments and use Bayesian inference to adjust our beliefs over time, recording how long it would take for us to finally dismiss a factor.
- We find that for most factors, we would have to live through several careers to finally witness enough evidence to dismiss them outright.
- Thus, while factors may be established upon a foundation of evidence, their forward use requires a bit of faith.
In Norse mythology, Fimbulvetr (commonly referred to in English as “Fimbulwinter”) is a great and seemingly never-ending winter. It continues for three seasons – long, horribly cold years that stretch on longer than normal – with no intervening summers. It is a time of bitterly cold, sunless days where hope is abandoned and discord reigns.
This winter-to-end-all-winters is eventually punctuated by Ragnarok, a series of events leading up to a great battle that results in the ultimate death of the major gods, destruction of the cosmos, and subsequent rebirth of the world.
Investment mythology is littered with Ragnarok-styled blow-ups and we often assume the failure of a strategy will manifest as sudden catastrophe. In most cases, however, failure may more likely resemble Fimbulwinter: a seemingly never-ending winter in performance with returns blown to-and-fro by the harsh winds of randomness.
Value investors can attest to this. In particular, the disciples of price-to-book have suffered greatly as of late, with “expensive” stocks having outperformed “cheap” stocks for over a decade. The academic interpretation of the factor sits nearly 25% below its prior high-water mark seen in December 2006.
Expectedly, a large number of articles have been written about the death of the value factor. Some question the factor itself, while others simply argue that price-to-book is a broken implementation.
But are these simply retrospective narratives, driven by a desire to have an explanation for a result that has defied our expectations? Consider: if price-to-book had exhibited positive returns over the last decade, would we be hearing from nearly as large a number of investors explaining why it is no longer a relevant metric?
To be clear, we believe that many of the arguments proposed for why price-to-book is no longer a relevant metric are quite sound. The team at O’Shaughnessy Asset Management, for example, wrote a particularly compelling piece that explores how changes to accounting rules have led book value to become a less relevant metric in recent decades.1
Nevertheless, we think it is worth taking a step back, considering an alternate course of history, and asking ourselves how it would impact our current thinking. Often, we look back on history as if it were the obvious course. “If only we had better prior information,” we say to ourselves, “we would have predicted the path!”2 Rather, we find it more useful to look at the past as just one realized path of many that could have happened, none of which were preordained. Randomness happens.
With this line of thinking, the poor performance of price-to-book can just as easily be explained by a poor roll of the dice as it can be by a fundamental break in applicability. In fact, we see several potential truths based upon performance over the last decade:
- This is all normal course performance variance for the factor.
- The value factor works, but the price-to-book measure itself is broken.
- The price-to-book measure is over-crowded in use, and thus the “troughs of sorrow” will need to be deeper than ever to get weak hands to fold and pass the alpha to those with the fortitude to hold.
- The value factor never existed in the first place; it was an unfortunate false positive that saturated the investing literature and broad narrative.
The problem at hand is two-fold: (1) the statistical evidence supporting most factors is considerable and (2) the decade-to-decade variance in factor performance is substantial. Taken together, you run into a situation where a mere decade of underperformance likely cannot undo the previously established significance. Just as frustrating is the opposite scenario. Consider that these two statements are not mutually exclusive: (1) price-to-book is broken, and (2) price-to-book generates positive excess return over the next decade.
In investing, factor return variance is large enough that the proof is not in the eating of the short-term return pudding.
The small-cap premium is an excellent example of the difficulty in discerning, in real time, the integrity of an established factor. The anomaly has failed to establish a meaningful new high since it was originally published in 1981. Only in the last decade – nearly 30 years later – have the tides of the industry finally seemed to turn against it as an established anomaly and potential source of excess return.
Thirty years.
The remaining broadly accepted factors – e.g. value, momentum, carry, defensive, and trend – have all been demonstrated to generate excess risk-adjusted returns across a variety of economic regimes, geographies, and asset classes, creating a great depth of evidence supporting their existence. What evidence, then, would make us abandon our faith in the Church of Factors?
To explore this question, we ran a simple experiment for each factor. Our goal was to estimate how long it would take to determine that a factor was no longer statistically significant.
Our assumption is that the salient features of each factor’s return pattern will remain the same (i.e. autocorrelation, conditional heteroskedasticity, skewness, kurtosis, et cetera), but the forward average annualized return will be zero since the factor no longer “works.”
Towards this end, we ran the following experiment:
- Take the full history for the factor and calculate prior estimates for mean annualized return and standard error of the mean.
- De-mean the time-series.
- Randomly select a 12-month chunk of returns from the time series and use the data to perform a Bayesian update to our mean annualized return.
- Repeat the previous step (drawing a chunk and updating) until the updated mean annualized return is no longer statistically distinguishable from zero at a 99% confidence threshold.
For each factor, we ran this test 10,000 times, creating a distribution that tells us how many years into the future we would have to wait until we were certain, from a statistical perspective, that the factor is no longer significant.
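For readers who want to experiment, the procedure above can be sketched roughly as follows. This is a minimal illustration rather than the exact code behind the reported figures: it assumes a normal prior with a conjugate normal update and a known annual variance, a two-sided 99% critical value of 2.576, simple (non-compounded) summing of monthly returns, and a synthetic return history. The function name `years_until_failure` and all parameter values are hypothetical.

```python
import numpy as np

Z_99 = 2.576  # two-sided 99% critical value (an assumption of this sketch)

def years_until_failure(monthly, rng, z_crit=Z_99, max_years=5000):
    """Simulate one path; return years until the mean is no longer significant."""
    monthly = np.asarray(monthly, dtype=float)
    n_years = len(monthly) / 12.0
    # Prior: annualized mean and squared standard error from the full history.
    ann_var = monthly.var(ddof=1) * 12.0   # assumes variance scales linearly with time
    mean = monthly.mean() * 12.0
    var = ann_var / n_years
    # De-mean so that forward draws average zero ("the factor no longer works").
    demeaned = monthly - monthly.mean()
    for year in range(max_years + 1):
        # Stop once the mean sits within z_crit standard errors of zero.
        if abs(mean) / np.sqrt(var) < z_crit:
            return year
        # Draw a contiguous 12-month chunk, preserving short-run return structure.
        start = rng.integers(0, len(demeaned) - 11)
        obs = demeaned[start:start + 12].sum()
        # Conjugate normal update with known observation variance.
        post_prec = 1.0 / var + 1.0 / ann_var
        mean = (mean / var + obs / ann_var) / post_prec
        var = 1.0 / post_prec
    return max_years

rng = np.random.default_rng(0)
# Synthetic 90-year monthly history; NOT actual factor data.
history = rng.normal(0.004, 0.035, size=90 * 12)
paths = [years_until_failure(history, rng) for _ in range(200)]
print("median years until failure:", np.median(paths))
```

The sketch runs 200 paths to keep it fast; the figures in this commentary are based on 10,000 runs per factor using actual factor return histories.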
Sixty-seven years.
Based upon this experiment, sixty-seven years is the median number of years we will have to wait until we can officially declare price-to-book (“HML,” as it is known in the literature) to be dead.3 At the risk of being morbid, we’re far more likely to die before the industry finally sticks a fork in price-to-book.
We perform this experiment for a number of other factors – including size (“SMB” – “small-minus-big”), quality (“QMJ” – “quality-minus-junk”), low-volatility (“BAB” – “betting-against-beta”), and momentum (“UMD” – “up-minus-down”) – and see much the same result. It will take decades before sufficient evidence mounts to dethrone these factors.
Factor | HML | SMB4 | QMJ | BAB | UMD |
Median Years-until-Failure | 67 | 43 | 132 | 284 | 339 |
Now, it is worth pointing out that these figures for a factor like momentum (“UMD”) might be a bit skewed due to the design of the test. If we examine the long-run returns, we see a fairly docile return profile punctuated by sudden and significant drawdowns (often called “momentum crashes”).
Since a large proportion of the cumulative losses are contained in these short but pronounced drawdown periods, de-meaning the time-series ultimately means that the majority of 12-month periods actually exhibit positive returns. In other words, by selecting random 12-month samples, we actually expect a high frequency of those samples to have a positive return.
For example, using this process, 49.1%, 47.6%, 46.7%, and 48.8% of rolling 12-month periods are positive for the HML, SMB, QMJ, and BAB factors, respectively. For UMD, that number is 54.7%. Furthermore, if you drop the worst 5% of rolling 12-month periods for UMD, the average positive period is 1.4x larger than the average negative period. Taken together, not only are you more likely to select a positive 12-month period, but those positive periods are, on average, 1.4x larger than the negative periods you will pick, except for the rare (<5%) cases.
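These two diagnostics are straightforward to reproduce for any return series. The sketch below is illustrative only: the helper names are hypothetical, monthly returns are summed rather than compounded, and the crash-prone series is synthetic rather than actual UMD data.

```python
import numpy as np

def rolling_12m_sums(monthly):
    """De-mean the series, then sum every overlapping 12-month window."""
    monthly = np.asarray(monthly, dtype=float)
    demeaned = monthly - monthly.mean()
    return np.convolve(demeaned, np.ones(12), mode="valid")

def share_positive(windows):
    """Fraction of 12-month windows with a positive return."""
    return float((np.asarray(windows) > 0).mean())

def gain_loss_ratio(windows, drop_worst=0.05):
    """Average gain divided by average loss after dropping the worst windows."""
    windows = np.asarray(windows, dtype=float)
    cutoff = np.quantile(windows, drop_worst)
    kept = windows[windows > cutoff]
    return float(kept[kept > 0].mean() / -kept[kept < 0].mean())

rng = np.random.default_rng(0)
# Synthetic series: calm months punctuated by rare, severe crashes.
calm = rng.normal(0.01, 0.03, size=90 * 12)
crash = rng.random(90 * 12) < 0.03
monthly = np.where(crash, -0.25, calm)

windows = rolling_12m_sums(monthly)
print("share of positive 12-month windows:", share_positive(windows))
print("gain/loss ratio ex-worst 5%:", gain_loss_ratio(windows))
```

A series whose losses are concentrated in a few months will tend to show a positive share above 50% once de-meaned, which is the skew described above.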
The process of the test was selected to incorporate the salient features of each factor. However, in the case of momentum, it may lead to somewhat outlandish results.
Conclusion
While an evidence-based investor should be swayed by the weight of the data, the simple fact is that most factors are so well established that the majority of current practitioners will likely go their entire careers without experiencing evidence substantial enough to dismiss any of the anomalies.
Therefore, in many ways, there is a certain faith required to use them going forward. Yes, these are ideas and concepts derived from the data. Yes, we have done our best to test their robustness out-of-sample across time, geographies, and asset classes. Yet we must also admit that there is a non-zero probability, however small it is, that these are false positives: a fact we may not have sufficient evidence to address until several decades hence.
And so a bit of humility is warranted. Factors will not suddenly stand up and declare themselves broken. And those that are broken will still appear to work from time to time.
Indeed, the death of a factor will be more Fimbulwinter than Ragnarok: not so violent as to be the end of days, but enough to cause pain and frustration among investors.
Addendum
We have received a large number of inbound notes about this commentary, which fall upon two primary lines of questions. We want to address these points.
How were the tests impacted by the Bayesian inference process?
The results of the tests within this commentary are rather astounding. We did seek to address some of the potential flaws of the methodology we employed, but by and large we feel the overarching conclusion remains on a solid foundation.
While we only presented the results of the Bayesian inference approach in this commentary, as a check we actually tested two other approaches:
- A Bayesian inference approach assuming that forward returns would be a random walk with constant variance (based upon historical variance) and zero mean.
- Forward returns were simulated using the same bootstrap approach, but the factor was being discovered for the first time and the entire history was being evaluated for its significance.
These two tests were an effort to isolate the effects of the different components of our test.
What we found was that while the reported figures changed, the overall magnitude did not. In other words, the median death-date of HML may not have been 67 years, but the order of magnitude remained much the same: decades.
Stepping back, these results were somewhat of a foregone conclusion. We would not expect an effect that has been determined to be statistically significant over a hundred-year period to unravel in a few years. Furthermore, we would expect a number of scenarios that continue to bolster the statistical strength just due to randomness alone.
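A back-of-the-envelope calculation shows why. Suppose a factor has a t-statistic of t over n years, and the next k years average exactly zero with unchanged volatility. The full-sample mean shrinks by n / (n + k) while the standard error shrinks by sqrt(n / (n + k)), so the new t-statistic is t * sqrt(n / (n + k)). The sketch below applies that arithmetic. The inputs (a t-statistic of 3.0 over 100 years) are hypothetical, the function names are ours, and the calculation ignores the randomness of future returns.

```python
import math

def diluted_t_stat(t_stat, n_years, extra_years):
    """t-statistic after appending zero-mean years with unchanged volatility."""
    return t_stat * math.sqrt(n_years / (n_years + extra_years))

def years_to_lose_significance(t_stat, n_years, z_crit=2.576):
    """Zero-mean years needed before the t-statistic falls below z_crit."""
    if abs(t_stat) < z_crit:
        return 0.0
    return n_years * ((t_stat / z_crit) ** 2 - 1.0)

# Hypothetical inputs: t-statistic of 3.0 measured over 100 years.
print(diluted_t_stat(3.0, 100, 10))
print(years_to_lose_significance(3.0, 100))
```

Under these assumptions, a full decade of flat performance lowers the t-statistic only from 3.0 to roughly 2.86, and it would take more than three decades of zero returns to fall below the 99% threshold.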
Why are we defending price-to-book?
The point of this commentary was not to defend price-to-book as a measure. Rather, it was to bring up a larger point.
As a community, quantitative investors often leverage statistical significance as a defense for the way we invest.
We think that is a good thing. We should look at the weight of the evidence. We should be data driven. We should try to find ideas that have proven to be robust over decades of time and when applied in different markets or with different asset classes. We should want to find strategies that are robust to small changes in parameterization.
Many quants (ourselves included) would argue, however, that there also needs to be a why. Why does this factor work? Without the why, we run the risk of glorified data mining. With the why, we can choose for ourselves whether we believe the effect will continue going forward.
Of course, there is nothing that prevents the why from being pure narrative fallacy. Perhaps we have simply woven a story into a pattern of facts.
With price-to-book, one might argue we have done the exact opposite. The effect, technically, remains statistically significant and yet plenty of ink has been spilled as to why it shouldn’t work in the future.
The question we must answer, then, is, “when does statistical significance apply and when does it not?” How can we use it as a justification in one place and completely ignore it in others?
Furthermore, if we are going to rely on hundreds of years of data to establish significance, how can we determine when something is “broken” if the statistical evidence does not support it?
Price-to-book may very well be broken. But that is not the point of this commentary. The point is simply that the same tools we use to establish and defend factors may prevent us from tearing them down.
1. http://www.osam.com/Commentary/negative-equity-veiled-value-and-the-erosion-of-price-to-book
2. What, you don’t talk to yourself this way?
3. It is worth mentioning that we are completely ignoring both historical and forward evidence, or counter-evidence, found in other geographies.
4. Since size is no longer statistically significant at the 1% level, we use the 5% level as the threshold.