This post is available as a PDF download here.
Summary
- We have shown previously that it is possible to time factors using value and momentum but that the benefit is not large.
- By constructing a simple model for factor timing, we examine what accuracy would be required to do better than a momentum-based timing strategy.
- While the accuracy required is not high, finding the system that achieves that accuracy may be difficult.
- For investors focused on managing the risks of underperformance – both in magnitude and frequency – a diversified factor portfolio may be the best choice.
- Investors seeking outperformance will have to bear more concentration risk and may take on more model risk as they forgo the diversification among factors.
A few years ago, we began researching factor timing – moving among value, momentum, low volatility, quality, size, etc. – with the hope of earning returns in excess not only of the equity market, but also of buy-and-hold factor strategies.
To time the factors, our natural first course of action was to exploit the behavioral biases that may create the factors themselves. We examined value and momentum across the factors and used these metrics to allocate to factors that we expected to outperform in the future.
The results were positive. However, taking into account transaction costs led to the conclusion that investors were likely better off simply holding a diversified factor portfolio.
We then looked at ways to time the factors using the business cycle.
The results in this case were even less convincing and were a bit too similar to a data-mined optimal solution to instill much faith going forward.
But this evidence does not necessarily remove the temptation to take a stab at timing the factors, especially since explicit transaction costs have been slashed for many investors accessing long-only factors through ETFs.
Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.
After all, there is a lot to gain by choosing the right factors. For example, in the first 9 months of 2019, the spread between the best (Quality) and worst (Value) performing factors was nearly 1,000 basis points (“bps”). One month prior, that spread had been double!
In this research note, we will move away from devising a systematic approach to timing the factors (as AQR asserts, this is deceptively difficult) and instead focus on what a given method would have to overcome to achieve consistent outperformance.
Benchmarking Factor Timing
With all equity factor strategies, the goal is usually to outperform the market-cap weighted equity benchmark.
Since every factor portfolio can be thought of as a market-cap-weighted benchmark plus a long/short component that captures the isolated factor performance, we can focus our study solely on the long/short portfolio.
Using the common definitions of the factors (from Kenneth French and AQR), we can look at periods over which these self-financing factor portfolios generate positive returns to see if overlaying them on a market-cap benchmark would have added value over different lengths of time.1
We will also include the performance of an equally weighted basket of the four factors (“Blend”).
Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.
Factor outperformance over one-month periods is transient. If the goal is to outperform as often as possible, the blended portfolio does so most frequently, and any timing strategy would have to be accurate enough to overcome this existing advantage.
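The hit-rate comparison above can be sketched as follows. This is a minimal illustration, not Newfound's code: it assumes monthly long/short factor returns are available as arrays (the random data below is hypothetical and merely stands in for the Kenneth French and AQR series), takes "hit rate" to mean the share of rolling windows with a positive compounded long/short return, and assumes the Blend is rebalanced monthly.

```python
import numpy as np

def rolling_compound(returns: np.ndarray, window: int) -> np.ndarray:
    """Compounded return over every rolling window of `window` periods."""
    growth = np.concatenate(([1.0], np.cumprod(1.0 + returns)))
    return growth[window:] / growth[:-window] - 1.0

def hit_rate(returns: np.ndarray, window: int) -> float:
    """Fraction of rolling windows in which the long/short overlay added value."""
    return float(np.mean(rolling_compound(returns, window) > 0))

rng = np.random.default_rng(0)
# Hypothetical monthly long/short returns for four factors (one per column).
factors = rng.normal(0.003, 0.03, size=(600, 4))
blend = factors.mean(axis=1)  # equal-weight "Blend", rebalanced monthly (assumption)

for window in (1, 12, 36):
    single = [hit_rate(factors[:, j], window) for j in range(factors.shape[1])]
    print(window, np.round(single, 2), round(hit_rate(blend, window), 2))
```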
Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.
The blended portfolio fares so much better than the stand-alone factors because the factors have much lower correlations with one another than many other asset classes do, allowing even naïve diversification to add tremendous value.
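A minimal sketch of why low correlation matters so much, assuming (for illustration only) that the four factors share a common volatility σ and a common pairwise correlation ρ. Under those assumptions, the standard result for an equally weighted portfolio of N assets is σ_p = σ·√(1/N + (1 − 1/N)ρ); the inputs below are hypothetical, not estimates from the factor data.

```python
import math

def equal_weight_vol(sigma: float, rho: float, n: int) -> float:
    """Volatility of an equal-weight portfolio of n assets that share a
    common volatility `sigma` and a common pairwise correlation `rho`."""
    # The n diagonal terms contribute sigma^2 / n; the n(n - 1) off-diagonal
    # terms contribute rho * sigma^2 * (n - 1) / n.
    variance = sigma**2 * (1.0 / n + (1.0 - 1.0 / n) * rho)
    return math.sqrt(variance)

# Hypothetical: four factors, each with 10% volatility.
for rho in (0.0, 0.2, 0.8):
    print(f"rho={rho:.1f}  blend vol={equal_weight_vol(0.10, rho, 4):.4f}")
```

With zero correlation, four equally weighted factors carry half the volatility of any one of them; at a correlation of 0.8, most of that benefit disappears.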
The blended portfolio also reduces the magnitude of underperformance. If the timing strategy is wrong and chooses, for example, momentum in a month when momentum underperforms, it could take longer for the strategy to climb back to even. But investors are used to short periods of underperformance and often (we hope) realize that some short-term pain is necessary for long-term gains.
Looking at the same analysis over rolling 1-year periods, we do see some longer periods of factor outperformance. Some examples are quality in the 1980s, value in the mid-2000s, momentum in the 1960s and 1990s, and size in the late-1970s.
However, there are also extended stretches where the factors underperform: for example, value over the most recent decade, quality in the early 2010s, momentum sporadically in the 2000s, and size in the 1980s and 1990s. If a timing strategy gets stuck in these periods, investors risk abandoning it.
Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.
Again, a blended portfolio would have addressed many of these underperforming periods, giving up some of the upside with the benefit of reducing the risk of choosing the wrong factor in periods of underperformance.
Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.
And finally, if we extend our holding period to three years, which may suit a slower-moving signal based on either value or the business cycle, we see that the diversified portfolio still outperforms over the largest share of rolling periods and has a strong ratio of upside to downside.
Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.
The diversified portfolio stands up to scrutiny against the individual factors, but could a generalized model that times the factors with a certain degree of accuracy lead to better outcomes?
Generic Factor Timing
To construct a generic factor timing model, we will consider a strategy that decides to hold each factor or not with a certain degree of accuracy.
For example, if the accuracy is 50%, then the strategy would essentially flip a coin for each factor. Heads and that factor is included in the portfolio; tails and it is left out. If the accuracy is 55%, then the strategy will hold the factor with a 55% probability when the factor return is positive and not hold the factor with the same probability when the factor return is negative. Just to be clear, this strategy is constructed with look-ahead bias as a tool for evaluation.
All factors included in the portfolio are equally weighted, and if no factors are included, then the return is zero for that period.
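The generic timing model can be sketched as below. This is a minimal reading of the description above rather than Newfound's implementation: it assumes a matrix of per-period long/short factor returns, treats a zero return as "not positive" (our assumption), and, as in the text, deliberately peeks at the sign of each period's return, so it is an evaluation tool and not a tradable strategy.

```python
import numpy as np

def simulate_timing(factor_returns: np.ndarray, accuracy: float,
                    rng: np.random.Generator) -> np.ndarray:
    """One simulated path of the toy factor-timing model.

    factor_returns: (T, K) array of per-period factor returns.
    accuracy: probability of making the right call on each factor, i.e.
    holding it when its return is positive and omitting it otherwise.
    """
    positive = factor_returns > 0
    correct = rng.random(factor_returns.shape) < accuracy
    # A correct call holds exactly the positive factors; a wrong call
    # holds exactly the non-positive ones.
    held = np.where(correct, positive, ~positive)
    n_held = held.sum(axis=1)
    total = np.where(held, factor_returns, 0.0).sum(axis=1)
    # Equal-weight the held factors; the return is zero if nothing is held.
    return np.divide(total, n_held, out=np.zeros_like(total), where=n_held > 0)

rng = np.random.default_rng(42)
factors = rng.normal(0.003, 0.03, size=(600, 4))  # hypothetical monthly returns
path = simulate_timing(factors, accuracy=0.55, rng=rng)
```

Repeating the last line 1,000 times per accuracy level produces the distributions against which the Blend is ranked.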
This toy model will allow us to construct distributions to see where the blended portfolio of all the factors falls in terms of frequency of outperformance (hit rate), average outperformance, and average underperformance. The following charts show the percentiles of the diversified portfolio for the different metrics and model accuracies using 1,000 simulations.2
Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.
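The percentile ranking described above can be sketched as follows, assuming "average outperformance" means the mean of the positive long/short returns and "average underperformance" the mean of the negative ones. The Blend and simulated paths here are hypothetical random stand-ins for the real Blend and the toy-model simulations.

```python
import numpy as np

def period_metrics(returns: np.ndarray) -> dict:
    """Hit rate, average outperformance and average underperformance of
    per-period long/short returns (positive = beat the benchmark)."""
    wins = returns[returns > 0]
    losses = returns[returns < 0]
    return {
        "hit_rate": float(np.mean(returns > 0)),
        "avg_out": float(wins.mean()) if wins.size else 0.0,
        "avg_under": float(losses.mean()) if losses.size else 0.0,
    }

def percentile_rank(value: float, samples: np.ndarray) -> float:
    """Percentage of simulated samples that are less than or equal to `value`."""
    return float(100.0 * np.mean(samples <= value))

rng = np.random.default_rng(1)
blend = rng.normal(0.003, 0.015, size=120)         # hypothetical Blend returns
sims = rng.normal(0.003, 0.025, size=(1000, 120))  # hypothetical timing paths

blend_m = period_metrics(blend)
sim_metrics = [period_metrics(p) for p in sims]
for key, value in blend_m.items():
    dist = np.array([m[key] for m in sim_metrics])
    print(key, round(percentile_rank(value, dist), 1))
```

Because average underperformance is negative, a higher (less negative) value is better, so the same "share of samples at or below" ranking applies to all three metrics.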
In terms of hit rate, the diversified portfolio ranks in the top tier of the models over all time periods for accuracies up to about 57%. Even against a model that is 60% accurate, the diversified portfolio was still above the median.
Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.
For average underperformance, the diversified portfolio also did very well in the context of these factor timing models. The low correlation between the factors leads to opportunities for the blended portfolio to limit the downside of individual factors.
Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.
For average outperformance, the diversified portfolio did much worse than the timing models over all time horizons. This, too, can be attributed to the low correlation between the factors, as choosing only a subset of factors and equally weighting them often leads to more extreme returns.
Overall, the diversified portfolio manages the risks of underperformance, both in magnitude and in frequency, at the expense of sacrificing outperformance potential. We saw this in the first section when we compared the diversified portfolio to the individual factors.
But if we want to have increased return potential, we will have to introduce some model risk to time the factors.
Checking in on Momentum
Momentum is one model-based way to time the factors. Under our definition of accuracy in the toy model, a 12-1 momentum strategy on the factors has an accuracy of about 56%. While the diversified portfolio exhibited some metrics in line with strategies that were even more accurate than this, it never bore concentration risk: it always held all four factors.
Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.
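One way to measure a momentum rule's accuracy in the toy-model sense is sketched below. The exact momentum rule is not specified above, so this assumes a time-series version: hold a factor in month t if its compounded return over months t−12 through t−2 is positive (skipping the most recent month). A call counts as correct when the factor is held and its return is positive, or omitted and its return is not positive. The data is hypothetical.

```python
import numpy as np

def momentum_12_1_signal(returns: np.ndarray) -> np.ndarray:
    """True where factor k is held in month t under a 12-1 time-series rule.
    The first 12 months have no signal and are returned as False."""
    T, K = returns.shape
    signal = np.zeros((T, K), dtype=bool)
    for t in range(12, T):
        window = returns[t - 12:t - 1]  # months t-12 .. t-2 (skip month t-1)
        signal[t] = np.prod(1.0 + window, axis=0) - 1.0 > 0
    return signal

def timing_accuracy(returns: np.ndarray, held: np.ndarray, start: int = 12) -> float:
    """Share of (month, factor) calls that match the sign of the realized return."""
    positive = returns[start:] > 0
    return float(np.mean(held[start:] == positive))

rng = np.random.default_rng(7)
factors = rng.normal(0.003, 0.03, size=(600, 4))  # hypothetical monthly returns
held = momentum_12_1_signal(factors)
print(round(timing_accuracy(factors, held), 3))
```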
The hit-rate percentiles of the momentum strategy are more subdued: momentum does not win as often as the diversified portfolio over the different time periods.
But winning less often can be fine if you win bigger when you do win.
The charts below show that momentum does indeed have a higher outperformance percentile but a worse underperformance percentile, especially for 1-month periods, likely due to mean-reversion whipsaw.
Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.
While momentum is definitely not the only way to time the factors, it is a good baseline to see what is required for higher average outperformance.
Now, turning back to our generic factor timing model, what accuracy would you need to beat momentum?
Sharpening our Signal
The answer is: not a whole lot. Most of the time, we only need to be about 53% accurate to beat the momentum-based factor timing.
Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.
The caveat is that this is the median performance of the simulations. The accuracy figure climbs closer to 60% if we use the 25th percentile as our target.
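The search for the required accuracy can be sketched as a scan over an accuracy grid: find the lowest accuracy whose chosen percentile of a simulated metric reaches momentum's value. The function name, the grid, and all numbers below are hypothetical; only the idea of comparing the median (or 25th percentile) against the momentum benchmark comes from the text.

```python
import numpy as np

def required_accuracy(sim_metric: dict, target: float, pct: float = 50.0):
    """Smallest accuracy whose `pct`-th percentile of a simulated metric
    reaches `target`. Returns None if no accuracy on the grid gets there."""
    for acc in sorted(sim_metric):
        if np.percentile(sim_metric[acc], pct) >= target:
            return acc
    return None

rng = np.random.default_rng(11)
grid = np.round(np.linspace(0.50, 0.60, 11), 2)
# Hypothetical: average outperformance that rises with accuracy, plus noise.
sim_avg_out = {float(a): 0.010 + 0.05 * (a - 0.5) + rng.normal(0, 0.002, 1000)
               for a in grid}
momentum_avg_out = 0.0115  # hypothetical momentum benchmark value

print(required_accuracy(sim_avg_out, momentum_avg_out))            # median target
print(required_accuracy(sim_avg_out, momentum_avg_out, pct=25.0))  # 25th percentile
```

Targeting a lower percentile demands that even an unlucky simulation beats momentum, which is why the required accuracy rises.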
While these may not seem like extremely high requirements for running a successful factor timing strategy, it is important to observe that not many investors are actually running one. True accuracy may be hard to discover, and sticking with the system may be even harder when the true accuracy can never be known.
Conclusion
If you made it this far looking for some rosy news on factor timing or the Holy Grail of how to do it skillfully, you may be disappointed.
However, for most investors looking to generate some modest benefits relative to market-cap equity, there is good news. A signal for timing factors does not have to be highly accurate to perform well, and in the absence of a timing signal, a diversified portfolio of the factors can deliver successful results as measured by average underperformance and frequency of underperformance.
For those investors looking for higher outperformance, concentration risk will be necessary.
Any timing strategy on low-correlation investments will generally forgo significant diversification in the pursuit of higher returns.
While this may be the goal when constructing the strategy, we should always pause and determine whether the potential benefits outweigh the costs. Transaction costs may be lower now. However, there are still operational burdens and the potential stress caused by underperformance when a system is not automated or when results are tracked too frequently.
Factor timing may be possible, but timing and tactical rotation may be better suited to scenarios where some of the model risk can be mitigated.
Should I Stay or Should I Growth Now?
By Corey Hoffstein
On January 21, 2020
In Value, Weekly Commentary
Summary
“Should I stay or should I go now?
If I go, there will be trouble
And if I stay it will be double”
— The Clash
It is no secret that quantitative value strategies have struggled as of late. Naïve sorts – like the Fama-French HML factor – peaked around 2007, but most quants would stick their noses up and say, “See? Craftsmanship matters.” Composite metrics, industry-specific scoring, sector-neutral constraints, factor-neutral constraints, and quality screens all helped quantitative value investors stay in the game.
Even a basket of long-only value ETFs didn’t peak against the S&P 500 until mid-2014.
Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. The Value ETF basket is an equal-weight portfolio of FVAL, IWD, JVAL, OVLU, QVAL, RPV, VLU, and VLUE, with each ETF being included when it is first available. Performance of the long/short portfolio is calculated as the monthly return of the Value ETF Basket minus the monthly return of the S&P 500 (“SPY”).
Many strategies were able to keep the mojo going until 2016 or so. But at that point, the wheels came off for just about everyone.
A decade of under-performance for the most naïve approaches and three-plus years of under-performance for some of the most thoughtful has many people asking, “is quantitative value an outdated idea? Should we throw in the towel and just buy growth?”
Of course, it should come as no surprise that many quantitative value managers are now clamoring that this is potentially the best time to invest in value since the dot-com bubble. “No pain, no premium,” as we like to say.
Nevertheless, the question of value’s attractiveness itself is muddied for a variety of reasons:
By no means will this commentary be a comprehensive evaluation of the attractiveness of Value, but we do hope to provide some more data for the debate.
Replicating Style-Box Growth and Value
If you want the details of how we are defining Growth and Value, read on. Otherwise, you can skip ahead to the next section.
Morningstar invented the style box back in the early 1990s. Originally, value was simply defined based upon price-to-book and price-to-earnings. But somewhere along the line, things changed. Not only was the definition of value expanded to include more metrics, but growth was given an explicit set of metrics to quantify it, as well.
The subtle difference here is rather than measuring cheap versus expensive, the new model more explicitly attempted to capture value versus growth. The problem – at least in my opinion – is that the model makes it such that the growth-iest fund is now the one that simultaneously ranks the highest on growth metrics and the lowest on value metrics. Similarly, the value-iest fund is the one that ranks the highest on value metrics and the lowest on growth metrics. So growth is growing but expensive and value is cheap but contracting.
The index providers took the same path Morningstar did. For example, while MSCI originally defined value and growth based only upon price-to-book, they later amended it to include not only other value metrics, but growth metrics as well. S&P Dow Jones and FTSE Russell follow this same general scheme. Which is all a bit asinine if you ask me.1
Nevertheless, it is relevant to the discussion as to whether value is attractive or not, as value defined by a style-box methodology can differ from value as defined by a factor methodology. Therefore, to dive under the hood, we created our own “Frankenstein’s style-box” by piecing together different components of S&P Dow Jones’, FTSE Russell’s, and MSCI’s methodologies.
From this point, we basically follow MSCI’s methodology. Each security is plotted onto a “style space” (see image below) and assigned value and growth inclusion factors based upon the region it falls into. These inclusion factors represent the proportion of a security’s market cap that can be allocated to the Value or Growth index.
Securities are then sorted by their distance from the origin point. Starting with the securities that are furthest from the origin (i.e. those with more extreme style scores), market capitalizations are proportionally allocated to Value and Growth based upon their inclusion factors. Once one style hits 50%, the remaining securities are allocated to the other style regardless of inclusion factors.
Source: MSCI.
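The allocation step can be sketched as below. This is a simplified reading of the description above, not MSCI's code: the inclusion factors are taken as inputs because the style-space regions are not reproduced here, distance is measured as the Euclidean norm of the (value score, growth score) pair, and splitting the security that pushes a style past 50% is our assumption.

```python
import numpy as np

def allocate_styles(market_cap, value_score, growth_score, value_inclusion):
    """Split each security's market cap between Value and Growth.

    value_inclusion: fraction of each security's cap eligible for Value
    (Growth gets 1 - value_inclusion), assumed to come from the style-space
    region the security falls into. Returns (value_cap, growth_cap).
    """
    cap = np.asarray(market_cap, dtype=float)
    inc = np.asarray(value_inclusion, dtype=float)
    distance = np.hypot(value_score, growth_score)
    half = cap.sum() / 2.0
    value_cap, growth_cap = np.zeros_like(cap), np.zeros_like(cap)
    v_total = g_total = 0.0
    # Securities with the most extreme style scores are allocated first.
    for i in np.argsort(-distance, kind="stable"):
        if v_total >= half:      # Value is full: the rest goes to Growth
            v, g = 0.0, cap[i]
        elif g_total >= half:    # Growth is full: the rest goes to Value
            v, g = cap[i], 0.0
        else:
            v, g = cap[i] * inc[i], cap[i] * (1.0 - inc[i])
            # Assumption: any spill-over past 50% goes to the other style.
            if v_total + v > half:
                v, g = half - v_total, cap[i] - (half - v_total)
            elif g_total + g > half:
                v, g = cap[i] - (half - g_total), half - g_total
        value_cap[i], growth_cap[i] = v, g
        v_total += v
        g_total += g
    return value_cap, growth_cap
```

Each security's cap is fully split between the two styles (never double counted), and each style ends at half of total market cap.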
The result of this process is that each style represents approximately 50% of the total market capitalization of the S&P 500. The market capitalization for each security will be fully represented in the combination of growth and value and may even be represented in both Value and Growth as a partial weight (though never double counted).
Portfolios are rebalanced semi-annually using six overlapping portfolios.
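"Semi-annual rebalancing with six overlapping portfolios" is commonly implemented as tranching: six sub-portfolios each rebalance every six months, staggered by one month, and the total is their average. The sketch below assumes that reading and, for simplicity, ignores price drift within each tranche between rebalances; the monthly targets are hypothetical.

```python
import numpy as np

def overlapping_weights(targets: np.ndarray, n_tranches: int = 6) -> np.ndarray:
    """Average of staggered tranches.

    targets: (T, N) target weights computed each month.
    Tranche k rebalances in months k, k + n, k + 2n, ... and holds its last
    target in between. Before every tranche has launched, only the active
    tranches are averaged. Intra-period drift is ignored (simplification).
    """
    T, N = targets.shape
    combined = np.zeros((T, N))
    for t in range(T):
        active = []
        for k in range(n_tranches):
            if t >= k:
                last = t - ((t - k) % n_tranches)  # tranche k's latest rebalance
                active.append(targets[last])
        combined[t] = np.mean(active, axis=0)
    return combined

rng = np.random.default_rng(5)
raw = rng.random((24, 5))
targets = raw / raw.sum(axis=1, keepdims=True)  # hypothetical monthly targets
weights = overlapping_weights(targets)
print(weights.sum(axis=1).round(6))  # each month still sums to 1
```

Only one-sixth of the portfolio turns over each month, which smooths the timing luck of any single rebalance date.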
How Attractive is Value?
To evaluate the relative attractiveness of Growth versus Value, we will evaluate two approaches.
In the first approach, we will make the assumption that fundamentals will not change but prices will revert. In this approach, we will plot the ratio of price-to-fundamental measures (e.g. price-to-earnings of Growth over price-to-earnings of Value) minus 1. This can be thought of as how far price would have to revert between the two indices before valuations are equal.
As an example, consider the following two cases. First, Value has an earnings yield of 2% and Growth has an earnings yield of 1%. In this case, both are expensive (Value has a P/E of 50 and Growth has a P/E of 100), but the price of Value would have to double (or the price of Growth would have to get cut in half) for their valuations to meet. As a second case, Value has an earnings yield of 100% and Growth has an earnings yield of 50% (P/Es of 1 and 2, respectively). Both are very cheap, but we would still have to see the same price moves for their valuations to meet.
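The two cases can be checked with the "ratio minus 1" measure described above. This is a minimal sketch; the function names are ours.

```python
def pe_from_earnings_yield(earnings_yield: float) -> float:
    """P/E is the reciprocal of earnings yield."""
    return 1.0 / earnings_yield

def reversion_required(pe_growth: float, pe_value: float) -> float:
    """(P/E of Growth) / (P/E of Value) - 1: the relative price move needed
    for valuations to meet, holding fundamentals fixed."""
    return pe_growth / pe_value - 1.0

# Case 1: both expensive (Growth P/E 100, Value P/E 50).
case_expensive = reversion_required(pe_from_earnings_yield(0.01),
                                    pe_from_earnings_yield(0.02))
# Case 2: both very cheap (Growth P/E 2, Value P/E 1).
case_cheap = reversion_required(pe_from_earnings_yield(0.50),
                                pe_from_earnings_yield(1.00))
print(case_expensive, case_cheap)
```

Both cases give 1.0: a 100% relative move, whatever the absolute level of valuations.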
For our second approach, we will assume prices and fundamentals remain constant and ask the question, “how much carry do I earn for this trade?” Specifically, we will measure shareholder yield (dividend yield plus buyback yield) for each index and evaluate the spread.
In both cases, we will decompose our analysis into Growth versus the Market and the Market versus Value to gain a better perspective as to how each leg of the trade is influencing results.
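The two-leg decomposition works because the ratios chain: (P/E_G / P/E_V) = (P/E_G / P/E_M) × (P/E_M / P/E_V), while carry spreads simply add. The sketch below illustrates both, with shareholder yield as dividend yield plus buyback yield. The sign convention (a positive leg favors Value) and the sample numbers are our assumptions.

```python
def shareholder_yield(dividend_yield: float, buyback_yield: float) -> float:
    """Carry proxy used in the text: dividend yield plus buyback yield."""
    return dividend_yield + buyback_yield

def decompose_valuation(pe_growth: float, pe_market: float, pe_value: float):
    """Growth-vs-Market and Market-vs-Value legs of the 'ratio minus 1' measure.
    The legs compound multiplicatively into the Growth-vs-Value total."""
    growth_vs_market = pe_growth / pe_market - 1.0
    market_vs_value = pe_market / pe_value - 1.0
    total = (1.0 + growth_vs_market) * (1.0 + market_vs_value) - 1.0
    return growth_vs_market, market_vs_value, total

def decompose_carry(sy_growth: float, sy_market: float, sy_value: float):
    """Value-minus-Growth carry split into additive legs."""
    return sy_market - sy_growth, sy_value - sy_market, sy_value - sy_growth

# Hypothetical P/Es: Growth 30, Market 20, Value 12.5.
print(decompose_valuation(30.0, 20.0, 12.5))  # legs of about 0.5 and 0.6; total about 1.4
```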
Below we plot the relative ratio for price-to-book, price-to-earnings, price-to-free-cash-flow, and price-to-sales.
Source: Sharadar. Calculations by Newfound Research.
A few things stand out:
Below we plot our estimate of carry (i.e. our return expectation given no change in prices): shareholder yield. Again, we see recent-era highs, but levels still well below 2000 and 2008 extremes.
Source: Sharadar. Calculations by Newfound Research.
Taken all together, value certainly appears cheaper – and a trade we likely would be paid more to sit on than we had previously – but a 2000s-era opportunity seems a stretch.
Growth is not Glamour
One potential flaw in the above analysis is that we are evaluating “Value 1.0” indices. More modern factor indices drop the “not Growth” aspect of defining value, preferring to focus only on valuation metrics. Therefore, to acknowledge that investors today may be evaluating the choice of a Growth 1.0 index versus a modern Value factor index, we repeat the above analysis using a Value strategy more consistent with current smart-beta products.
Specifically, we winsorize earnings yield, free-cash-flow yield, and sales yield and then compute market-cap-weighted z-scores. A security’s Value score is then equal to its average z-score across all three metrics with no mention of growth scores. The strategy selects the securities in the top quintile of Value scores and weights them in proportion to their value-score-scaled market capitalization. The strategy is rebalanced semi-annually using six overlapping portfolios.
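A minimal sketch of this Value score and portfolio, under several assumptions the text does not pin down: winsorizing at the 5th/95th percentiles, z-scores measured against the cap-weighted mean and cap-weighted standard deviation, and "value-score-scaled market capitalization" read as market cap × score with scores floored at zero. All inputs are hypothetical.

```python
import numpy as np

def winsorize(x: np.ndarray, lower: float = 5.0, upper: float = 95.0) -> np.ndarray:
    """Clip values to the chosen percentiles (levels are an assumption)."""
    lo, hi = np.percentile(x, [lower, upper])
    return np.clip(x, lo, hi)

def cap_weighted_zscore(x: np.ndarray, cap: np.ndarray) -> np.ndarray:
    """Z-score around the cap-weighted mean, scaled by the cap-weighted std."""
    w = cap / cap.sum()
    mean = np.sum(w * x)
    std = np.sqrt(np.sum(w * (x - mean) ** 2))
    return (x - mean) / std

def value_portfolio(cap, earnings_yield, fcf_yield, sales_yield) -> np.ndarray:
    """Top-quintile Value portfolio weighted by score-scaled market cap."""
    cap = np.asarray(cap, dtype=float)
    metrics = (earnings_yield, fcf_yield, sales_yield)
    z = [cap_weighted_zscore(winsorize(np.asarray(m, dtype=float)), cap)
         for m in metrics]
    score = np.mean(z, axis=0)          # average z-score; no growth inputs
    selected = score >= np.percentile(score, 80)  # top quintile of scores
    raw = np.where(selected, cap * np.clip(score, 0.0, None), 0.0)
    if raw.sum() <= 0:                  # degenerate case: fall back to cap weights
        raw = np.where(selected, cap, 0.0)
    return raw / raw.sum()

rng = np.random.default_rng(3)
cap = rng.lognormal(10, 1, 100)  # hypothetical market caps
w = value_portfolio(cap, rng.normal(0.05, 0.02, 100),
                    rng.normal(0.04, 0.03, 100), rng.normal(0.8, 0.4, 100))
print(int((w > 0).sum()), round(float(w.sum()), 6))
```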
Source: Sharadar. Calculations by Newfound Research.
We can see:
Plotting our carry for this trade, we do see a more meaningful divergence between Value and Growth. Furthermore, the carry for bearing Value risk does appear to be at decade highs; however, it is certainly not at extreme levels, and it has actually reverted from Q3 2019 highs.
Source: Sharadar. Calculations by Newfound Research.
Conclusion
In this research note, we sought to explore the current value-of-value. Unfortunately, it proves to be an elusive question, as the very definition of value is difficult to pin down.
For our first approach, we build a style-box-driven definition of Value. We then plot the relative ratio of four fundamental measures – price-to-book, price-to-earnings, price-to-sales, and price-to-free-cash-flow – of Growth versus the S&P 500 and the S&P 500 versus Value. We find that both Growth and the S&P 500 look historically expensive on price-to-book and price-to-earnings metrics (implying that Value is very, very cheap), whereas only Growth looks particularly expensive on price-to-sales (implying that Value may not be cheap relative to the Market). However, none of the metrics make Value look particularly cheap compared to the dot-com era.
We also evaluate Shareholder Yield as a measure of carry, finding that Value minus Growth reached a 20-year high in 2019 if the dot-com and 2008 periods are excluded.
Recognizing that many investors may prefer a more factor-based definition of value, we run the same analysis for a more concentrated value portfolio. Whereas the first analysis generally pointed to Growth versus the S&P 500 being more expensive than the S&P 500 versus Value trade, the factor-based approach finds the opposite conclusion. Similar to the prior results, Value appears historically cheap for price-to-book, price-to-earnings, and price-to-sales metrics, though it appears to have peaked in Q3 2019.
Finally, the Shareholder Yield spread for the factor approach also appears to be at multi-decade highs ignoring the dot-com and 2008 extremes.
Directionally, this analysis suggests that Value may indeed be cheaper-than-usual. Whether that cheapness is rational or not, however, is only something we’ll know with the benefit of hindsight.
For further reading on style timing, we highly recommend Style Timing: Value vs Growth (AQR). For more modern interpretations: Value vs. Growth: The New Bubble (QMA), It’s Time for a Venial Value-Timing (AQR), and Reports of Value’s Death May Be Greatly Exaggerated (Research Affiliates).