This post is available as a PDF download here.
Summary
- Value investing continues to experience a trough of sorrow. In particular, the traditional price-to-book factor has failed to establish new highs since December 2006 and sits in a 25% drawdown.
- While price-to-book has been the academic measure of choice for 25+ years, many practitioners have begun to question its value (pun intended).
- We have also witnessed the turning of the tides against the size premium, with many practitioners no longer considering it to be a valid stand-alone anomaly. This comes more than 35 years after it was first published.
- With this in mind, we explore the evidence that would be required for us to dismiss other, already established anomalies. Using past returns to establish prior beliefs, we simulate forward environments and use Bayesian inference to adjust our beliefs over time, recording how long it would take for us to finally dismiss a factor.
- We find that for most factors, we would have to live through several careers to finally witness enough evidence to dismiss them outright.
- Thus, while factors may be established upon a foundation of evidence, their forward use requires a bit of faith.
In Norse mythology, Fimbulvetr (commonly referred to in English as “Fimbulwinter”) is a great and seemingly never-ending winter. It continues for three seasons – long, horribly cold years that stretch on longer than normal – with no intervening summers. It is a time of bitterly cold, sunless days where hope is abandoned and discord reigns.
This winter-to-end-all-winters is eventually punctuated by Ragnarok, a series of events leading up to a great battle that results in the ultimate death of the major gods, destruction of the cosmos, and subsequent rebirth of the world.
Investment mythology is littered with Ragnarok-styled blow-ups and we often assume the failure of a strategy will manifest as sudden catastrophe. In most cases, however, failure may more likely resemble Fimbulwinter: a seemingly never-ending winter in performance with returns blown to-and-fro by the harsh winds of randomness.
Value investors can attest to this. In particular, the disciples of price-to-book have suffered greatly as of late, with “expensive” stocks having outperformed “cheap” stocks for over a decade. The academic interpretation of the factor sits nearly 25% below its prior high-water mark seen in December 2006.
Expectedly, a large number of articles have been written about the death of the value factor. Some question the factor itself, while others simply argue that price-to-book is a broken implementation.
But are these simply retrospective narratives, driven by a desire to explain a result that has defied our expectations? Consider: if price-to-book had exhibited positive returns over the last decade, would nearly as many investors be explaining why it is no longer a relevant metric?
To be clear, we believe that many of the arguments proposed for why price-to-book is no longer a relevant metric are quite sound. The team at O’Shaughnessy Asset Management, for example, wrote a particularly compelling piece that explores how changes to accounting rules have led book value to become a less relevant metric in recent decades.1
Nevertheless, we think it is worth taking a step back, considering an alternate course of history, and asking ourselves how it would impact our current thinking. Often, we look back on history as if it were the obvious course. “If only we had better prior information,” we say to ourselves, “we would have predicted the path!”2 Rather, we find it more useful to look at the past as just one realized path of many that could have happened, none of which were preordained. Randomness happens.
With this line of thinking, the poor performance of price-to-book can just as easily be explained by a poor roll of the dice as it can be by a fundamental break in applicability. In fact, we see several potential truths based upon performance over the last decade:
- This is all normal course performance variance for the factor.
- The value factor works, but the price-to-book measure itself is broken.
- The price-to-book measure is over-crowded in use, and thus the “troughs of sorrow” will need to be deeper than ever to get weak hands to fold and pass the alpha to those with the fortitude to hold.
- The value factor never existed in the first place; it was an unfortunate false positive that saturated the investing literature and broad narrative.
The problem at hand is two-fold: (1) the statistical evidence supporting most factors is considerable and (2) the decade-to-decade variance in factor performance is substantial. Taken together, you run into a situation where a mere decade of underperformance likely cannot undo the previously established significance. Just as frustrating is the opposite scenario. Consider that these two statements are not mutually exclusive: (1) price-to-book is broken, and (2) price-to-book generates positive excess return over the next decade.
In investing, factor return variance is large enough that the proof is not in the eating of the short-term return pudding.
The small-cap premium is an excellent example of the difficulty in discerning, in real time, the integrity of an established factor. The anomaly has failed to establish a meaningful new high since it was originally published in 1981. Only in the last decade – nearly 30 years later – have the tides of the industry finally seemed to turn against it as an established anomaly and potential source of excess return.
Thirty years.
The remaining broadly accepted factors – e.g. value, momentum, carry, defensive, and trend – have all been demonstrated to generate excess risk-adjusted returns across a variety of economic regimes, geographies, and asset classes, creating a great depth of evidence supporting their existence. What evidence, then, would make us abandon our faith in the Church of Factors?
To explore this question, we ran a simple experiment for each factor. Our goal was to estimate how long it would take to determine that a factor was no longer statistically significant.
Our assumption is that the salient features of each factor’s return pattern will remain the same (i.e. autocorrelation, conditional heteroskedasticity, skewness, kurtosis, et cetera), but the forward average annualized return will be zero since the factor no longer “works.”
Towards this end, we ran the following experiment:
1. Take the full history for the factor and calculate prior estimates for the mean annualized return and the standard error of the mean.
2. De-mean the time series.
3. Randomly select a 12-month chunk of returns from the time series and use the data to perform a Bayesian update to our estimate of the mean annualized return.
4. Repeat step 3 until the mean annualized return is no longer statistically distinguishable from zero at a 99% confidence threshold.
For each factor, we ran this test 10,000 times, creating a distribution that tells us how many years into the future we would have to wait until we were certain, from a statistical perspective, that the factor is no longer significant.
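As a concrete illustration, below is a minimal sketch of how such an experiment might be implemented, assuming a monthly factor return series (e.g., HML from the Kenneth French data library) held in a pandas Series. The function name, the conjugate normal-normal updating scheme, and the contiguous 12-month block sampling are our own simplifying assumptions for illustration; this is not the exact code behind the figures that follow.

```python
# A minimal sketch of the "years until a factor is dismissed" experiment.
import numpy as np
import pandas as pd
from scipy import stats


def years_until_dismissed(factor: pd.Series, conf: float = 0.99,
                          max_years: int = 1000, seed: int = None) -> int:
    rng = np.random.default_rng(seed)
    monthly = factor.values

    # Step 1: prior beliefs from the full history. We work in monthly units;
    # annualizing scales both the mean and its standard error and does not
    # change the significance test.
    prior_mean = monthly.mean()
    prior_se = monthly.std(ddof=1) / np.sqrt(len(monthly))

    # Step 2: de-mean the series so forward returns have zero expectation.
    demeaned = monthly - prior_mean

    post_mean, post_var = prior_mean, prior_se ** 2
    obs_var = monthly.var(ddof=1)            # treated as a known sampling variance
    z = stats.norm.ppf(0.5 + conf / 2.0)     # ~2.576 for a 99% threshold

    for year in range(1, max_years + 1):
        # Step 3: draw a random contiguous 12-month block of de-meaned returns.
        start = rng.integers(0, len(demeaned) - 12 + 1)
        block = demeaned[start:start + 12]

        # Conjugate normal-normal update of our belief about the mean return.
        like_var = obs_var / 12.0            # variance of the block's sample mean
        new_var = 1.0 / (1.0 / post_var + 1.0 / like_var)
        post_mean = new_var * (post_mean / post_var + block.mean() / like_var)
        post_var = new_var

        # Step 4: stop once the mean is no longer distinguishable from zero.
        if abs(post_mean) < z * np.sqrt(post_var):
            return year
    return max_years


# Repeat the experiment many times and take the median "death date":
# trials = [years_until_dismissed(factor, seed=i) for i in range(10_000)]
# print(np.median(trials))
```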
Sixty-seven years.
Based upon this experiment, sixty-seven years is the median number of years we will have to wait until we officially declare price-to-book (“HML,” as it is known in the literature) to be dead.3 At the risk of being morbid, we’re far more likely to die before the industry finally sticks a fork in price-to-book.
We perform this experiment for a number of other factors – including size (“SMB” – “small-minus-big”), quality (“QMJ” – “quality-minus-junk”), low-volatility (“BAB” – “betting-against-beta”), and momentum (“UMD” – “up-minus-down”) – and see much the same result. It will take decades before sufficient evidence mounts to dethrone these factors.
| | HML | SMB4 | QMJ | BAB | UMD |
| --- | --- | --- | --- | --- | --- |
| Median Years-until-Failure | 67 | 43 | 132 | 284 | 339 |
Now, it is worth pointing out that these figures for a factor like momentum (“UMD”) might be a bit skewed due to the design of the test. If we examine the long-run returns, we see a fairly docile return profile punctuated by sudden and significant drawdowns (often called “momentum crashes”).
Since a large proportion of the cumulative losses are contained in these short but pronounced drawdown periods, demeaning the time-series ultimately means that the majority of 12-month periods actually exhibit positive returns. In other words, by selecting random 12-month samples, we actually expect a high frequency of those samples to have a positive return.
For example, using this process, 49.1%, 47.6%, 46.7%, and 48.8% of rolling 12-month periods are positive for the HML, SMB, QMJ, and BAB factors, respectively. For UMD, that number is 54.7%. Furthermore, if we drop the worst 5% of rolling 12-month periods for UMD, the average positive period is 1.4x larger than the average negative period. Taken together, not only are we more likely to select a positive 12-month period, but, outside of the rare (<5%) crash periods, those positive periods are, on average, 1.4x larger than the negative periods we will pick.
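A short sketch of how these rolling-period statistics might be computed is below, assuming a monthly UMD return series in a pandas Series (the variable name is hypothetical). The exact conventions – summing de-meaned monthly returns rather than compounding, and the 5% trim – are our own assumptions.

```python
import pandas as pd


def rolling_12m_stats(factor: pd.Series):
    demeaned = factor - factor.mean()
    # Approximate each rolling 12-month return as the sum of de-meaned monthly returns.
    rolled = demeaned.rolling(12).sum().dropna()

    pct_positive = (rolled > 0).mean()

    # Drop the worst 5% of rolling periods (the "momentum crashes"), then
    # compare the average magnitude of positive and negative periods.
    trimmed = rolled[rolled > rolled.quantile(0.05)]
    ratio = trimmed[trimmed > 0].mean() / trimmed[trimmed < 0].abs().mean()

    return pct_positive, ratio
```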
The test was designed to incorporate the salient features of each factor. In the case of momentum, however, it may lead to somewhat outlandish results.
Conclusion
While an evidence-based investor should be swayed by the weight of the data, the simple fact is that most factors are so well established that the majority of current practitioners will likely go their entire careers without experiencing evidence substantial enough to dismiss any of the anomalies.
Therefore, in many ways, there is a certain faith required to use them going forward. Yes, these are ideas and concepts derived from the data. Yes, we have done our best to test their robustness out-of-sample across time, geographies, and asset classes. Yet we must also admit that there is a non-zero probability, however small it is, that these are false positives: a fact we may not have sufficient evidence to address until several decades hence.
And so a bit of humility is warranted. Factors will not suddenly stand up and declare themselves broken. And those that are broken will still appear to work from time to time.
Indeed, the death of a factor will be more Fimbulwinter than Ragnarok: not so violent as to be the end of days, but enough to cause pain and frustration among investors.
Addendum
We have received a large number of inbound notes about this commentary, most of which fall along two primary lines of questioning. We want to address these points.
How were the tests impacted by the Bayesian inference process?
The results of the tests within this commentary are rather astounding. We did seek to address some of the potential flaws of the methodology we employed, but by and large we feel the overarching conclusion remains on a solid foundation.
While we only presented the results of the Bayesian inference approach in this commentary, as a check we actually tested two other approaches:
- A Bayesian inference approach that assumed forward returns would follow a random walk with constant variance (based upon historical variance) and zero mean.
- A simulation in which forward returns were generated using the same bootstrap approach, but the factor was treated as if it were being discovered for the first time, with the entire history evaluated for significance.
The two tests were an effort to isolate the effects of the different components of our test.
What we found was that while the reported figures changed, the overall magnitude did not. In other words, the median death-date of HML may not have been 67 years, but the order of magnitude remained much the same: decades.
Stepping back, these results were somewhat of a foregone conclusion. We would not expect an effect that has been determined to be statistically significant over a hundred-year period to unravel in a few years. Furthermore, we would expect a number of scenarios in which randomness alone continues to bolster the statistical strength.
Why are we defending price-to-book?
The point of this commentary was not to defend price-to-book as a measure. Rather, it was to bring up a larger point.
As a community, quantitative investors often leverage statistical significance as a defense for the way we invest.
We think that is a good thing. We should look at the weight of the evidence. We should be data driven. We should try to find ideas that have proven to be robust over decades of time and when applied in different markets or with different asset classes. We should want to find strategies that are robust to small changes in parameterization.
Many quants (ourselves included) would argue, however, that there also needs to be a why. Why does this factor work? Without the why, we run the risk of glorified data mining. With the why, we can choose for ourselves whether we believe the effect will continue going forward.
Of course, there is nothing that prevents the why from being pure narrative fallacy. Perhaps we have simply weaved a story into a pattern of facts.
With price-to-book, one might argue we have done the exact opposite. The effect, technically, remains statistically significant and yet plenty of ink has been spilled as to why it shouldn’t work in the future.
The question we must answer, then, is: when does statistical significance apply, and when does it not? How can we use it as a justification in one place and completely ignore it in another?
Furthermore, if we are going to rely on hundreds of years of data to establish significance, how can we determine when something is “broken” if the statistical evidence does not support it?
Price-to-book may very well be broken. But that is not the point of this commentary. The point is simply that the same tools we use to establish and defend factors may prevent us from tearing them down.
The New Glide Path
By Corey Hoffstein
On July 2, 2018
This post is available as a PDF download here.
Summary
In past commentaries, we have written at length about investor sequence risk. Summarized simply, sequence risk is the sensitivity of investor goals to the sequence of market returns. In finance, we traditionally assume the sequence of returns does not matter. However, for investors and institutions that are constantly making contributions and withdrawals, the sequence can be incredibly important.
Consider, for example, an investor who retires with $1,000,000 and uses the traditional 4% spending rule to set a $40,000 annual withdrawal. Suddenly, in the first year, their portfolio craters to $500,000. That $40,000 withdrawal no longer represents just 4% of the portfolio; it now represents 8%.
Significant drawdowns and fixed withdrawals mix like oil and water.
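A toy calculation makes the point concrete. The sketch below uses only the assumptions above (a $1,000,000 starting balance and a fixed $40,000 withdrawal) and applies the same set of hypothetical annual returns in two different orders; terminal wealth differs meaningfully even though the compound market return is identical in both cases.

```python
# Toy illustration of sequence risk: identical returns, different order,
# fixed withdrawals. The return path is hypothetical.
def terminal_wealth(returns, start=1_000_000, withdrawal=40_000):
    wealth = start
    for r in returns:
        wealth = max((wealth - withdrawal) * (1 + r), 0.0)  # withdraw, then grow
    return wealth

returns = [-0.30, 0.10, 0.10, 0.10, 0.10]       # crash first, then recovery
print(terminal_wealth(returns))                  # crash first: ~$780,000
print(terminal_wealth(returns[::-1]))            # crash last:  ~$854,000
```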
Sequence risk is the exact reason why traditional glide paths have investors de-risk their portfolios over time from growth-focused, higher volatility assets like equities to traditionally less volatile assets, like short-duration investment grade fixed income.
Bonds, however, are not the only way investors can manage risk. There are a variety of other methods, and frequent readers will know that we are strong advocates for the incorporation of trend-following techniques.
But how much trend-following should investors use? And when?
That is exactly what this commentary aims to explore.
Building a New Glide Path
In many ways, this is a very open-ended question. As a starting point, we will create some constraints that simplify our approach:
Source: St. Louis Federal Reserve and Kenneth French Database. Past performance is hypothetical and backtested. Trend Strategy is a simple 200-day moving average cross-over strategy that invests in U.S. equities when the price of U.S. equities is above its 200-day moving average and in U.S. T-Bills otherwise. Returns are gross of all fees and assume the reinvestment of all dividends. None of the equity curves presented here represent a strategy managed by Newfound Research.
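The source note above fully specifies the trend strategy, so a minimal sketch is straightforward. The snippet below assumes daily pandas Series of equity prices, equity total returns, and T-Bill returns (hypothetical names); the one-day signal lag is our own assumption to avoid look-ahead bias and may differ from the implementation behind the charts.

```python
import pandas as pd


def trend_strategy(equity_prices: pd.Series,
                   equity_returns: pd.Series,
                   tbill_returns: pd.Series) -> pd.Series:
    # Long equities when price is above its 200-day moving average, T-Bills otherwise.
    above_ma = equity_prices > equity_prices.rolling(200).mean()
    signal = above_ma.shift(1, fill_value=False)   # trade on yesterday's signal
    return equity_returns.where(signal, tbill_returns)
```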
To generate our glide path, we will use a process of backwards induction similar to that proposed by Gordon Irlam in his article Portfolio Size Matters (Journal of Personal Finance, Vol 13 Issue 2). The process works thusly:
As a technical side-note, we should mention that exploring all possible portfolio configurations is a computationally taxing exercise, as would be an optimization-based approach. To circumvent this, we employ a quasi-random low-discrepancy sequence generator known as a Sobol sequence. This process allows us to generate 100 samples that efficiently span the space of a 4-dimensional unit hypercube. We can then normalize these samples and use them as our sample allocations.
If that all sounded like gibberish, the main thrust is this: we’re not really checking every single portfolio configuration, but trying to use a large enough sample to capture most of them.
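For the curious, here is a minimal sketch of that sampling step using SciPy's quasi-Monte Carlo module. We draw 128 points rather than 100 because Sobol sequences retain their balance properties at powers of two; the normalization step follows the description above, and the asset ordering in the comment is our own assumption.

```python
import numpy as np
from scipy.stats import qmc

# Draw quasi-random points in the 4-dimensional unit hypercube.
sampler = qmc.Sobol(d=4, scramble=True, seed=0)
points = sampler.random(128)

# Normalize each point so the four weights sum to one; each row is then a
# candidate (equities, 10-year Treasuries, T-Bills, trend strategy) allocation.
allocations = points / points.sum(axis=1, keepdims=True)
```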
By working backwards, we can tackle what would be an otherwise computationally intractable problem. In effect, we are saying, “if we know the optimal decision at time T+1, we can use that knowledge to guide our decision at time T.”
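To make the recursion concrete, here is a compressed sketch of a backwards-induction loop of this general type. Every input is an assumption for illustration: the `ages` and `wealth_grid` arrays, the candidate `allocations` (e.g., the Sobol samples above), a real `spending` level, and a `simulate_year(allocation, n)` function returning n bootstrapped one-year portfolio returns. Grid spacing, simulation counts, and interpolation details will differ from the actual study.

```python
import numpy as np


def build_glide_path(ages, wealth_grid, allocations, spending,
                     simulate_year, n_sims=1000):
    # success[i] = probability of never running out of money, starting from
    # wealth_grid[i] at the current age. At the final age, any wealth "succeeds."
    success = np.ones(len(wealth_grid))
    policy = {}

    for age in reversed(ages[:-1]):
        new_success = np.zeros(len(wealth_grid))
        policy[age] = []
        for i, wealth in enumerate(wealth_grid):
            best_prob, best_alloc = -1.0, None
            for alloc in allocations:
                rets = simulate_year(alloc, n_sims)
                next_wealth = (wealth - spending) * (1 + rets)
                # Interpolate next year's success probabilities at the simulated
                # wealth levels (wealth_grid is assumed increasing); ruin scores 0.
                probs = np.where(next_wealth <= 0, 0.0,
                                 np.interp(next_wealth, wealth_grid, success))
                prob = probs.mean()
                if prob > best_prob:
                    best_prob, best_alloc = prob, alloc
            new_success[i] = best_prob
            policy[age].append(best_alloc)
        success = new_success
    return policy
```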
This methodology also allows us to recognize that the level of wealth relative to the level of spending is important. For example, having $2,000,000 at age 70 with a $40,000 real spending rate is very different than having $500,000, and we would expect the optimal allocations to differ.
Consider the two extremes. The first extreme is that we have an excess of wealth. In this case, since we are optimizing to maximize the probability of success, the result will be to take no risk and hold a significant amount of T-Bills. If, however, we had optimized to acknowledge a desire to bequeath wealth to the next generation, we would likely see the opposite: with little risk of failure, we can load up on stocks and try to maximize growth.
The second extreme is having a significant dearth of wealth. In this case, we would expect to see the optimizer recommend a significant amount of stocks, since the safer assets will likely guarantee failure while the risky assets provide a lottery’s chance of success.
The Results
To plot the results both over time as well as over the different wealth levels, we have to plot each asset individually, which we do below. As an example of how to read these graphs, below we can see that in the table for U.S. equities, at age 74 and a $1,600,000 wealth level, the glide path would recommend an 11% allocation to U.S. equities.
A few features we can identify:
Ignoring the data artifacts, we can broadly see that trend following seems to receive a fairly healthy weight in the earlier years of retirement and at wealth levels where capital preservation is critical, but growth cannot be entirely sacrificed. For example, we can see that an investor with $1,000,000 at age 60 would allocate approximately 30% of their portfolio to a trend following strategy.
Note that the initially assumed $40,000 consumption level aligns with the generally recommended 4% withdrawal assumption. In other words, the levels here are less important than their size relative to desired spending.
It is also worth pointing out again that this analysis uses historical returns. Hence, we see a large allocation to T-Bills which, once upon a time, offered a reasonable rate of return. This may not be the case going forward.
Conclusion
Financial theory generally assumes that the order of returns is not important to investors. Any investor contributing or withdrawing from their investment portfolio, however, is dramatically affected by the order of returns. It is much better to save before a large gain or spend before a large loss.
For investors in retirement who are making frequent and consistent withdrawals from their portfolios, sequence risk manifests itself in the presence of large and prolonged drawdowns. Strategies that can help avoid these losses are, therefore, potentially very valuable.
This is the basis of the traditional glidepath. By de-risking the portfolio over time, investors become less sensitive to sequence risk. However, as bond yields remain low and investor life expectancy increases, investors may need to rely more heavily on higher volatility growth assets to avoid running out of money.
To explore these concepts, we have built our own glide path using four assets: broad U.S. equities, 10-year U.S. Treasuries, U.S. T-Bills, and a trend following strategy. Not surprisingly, we find that trend following commands a significant allocation, particularly in the years and wealth levels where sequence risk is highest, and often is allocated to in lieu of equities themselves.
Beyond recognizing the potential value-add of trend following, however, an important second takeaway may be that there is room for significant value-add in going beyond traditional target-date-based glide paths for investors.