This post is available as a PDF download here.
Summary
- We have shown previously that it is possible to time factors using value and momentum but that the benefit is not large.
- By constructing a simple model for factor timing, we examine what accuracy would be required to do better than a momentum-based timing strategy.
- While the accuracy required is not high, finding the system that achieves that accuracy may be difficult.
- For investors focused on managing the risks of underperformance – both in magnitude and frequency – a diversified factor portfolio may be the best choice.
- Investors seeking outperformance will have to bear more concentration risk and may be open to more model risk as they forego the diversification among factors.
A few years ago, we began researching factor timing – moving among value, momentum, low volatility, quality, size etc. – with the hope of earning returns in excess not only of the equity market, but also of buy-and-hold factor strategies.
To time the factors, our natural first course of action was to exploit the behavioral biases that may create the factors themselves. We examined value and momentum across the factors and used these metrics to allocate to factors that we expected to outperform in the future.
The results were positive. However, taking into account transaction costs led to the conclusion that investors were likely better off simply holding a diversified factor portfolio.
We then looked at ways to time the factors using the business cycle.
The results in this case were even less convincing and were a bit too similar to a data-mined optimal solution to instill much faith going forward.
But this evidence does not necessarily remove the temptation to take a stab at timing the factors, especially since explicit transactions costs have been slashed for many investors accessing long-only factors through ETFs.
After all, there is a lot to gain by choosing the right factors. For example, in the first 9 months of 2019, the spread between the best (Quality) and worst (Value) performing factors was nearly 1,000 basis points (“bps”). One month prior, that spread had been double!
In this research note, we will move away from devising a systematic approach to timing the factors (as AQR asserts, this is deceptively difficult) and instead focus on what a given method would have to overcome to achieve consistent outperformance.
Benchmarking Factor Timing
With all equity factor strategies, the goal is usually to outperform the market-cap weighted equity benchmark.
Since all factor portfolios can be thought of as a market cap weighted benchmark plus a long/short component that captures the isolated factor performance, we can focus our study solely on the long/short portfolio.
Using the common definitions of the factors (from Kenneth French and AQR), we can look at periods over which these self-financing factor portfolios generate positive returns to see if overlaying them on a market-cap benchmark would have added value over different lengths of time.1
We will also include the performance of an equally weighted basket of the four factors (“Blend”).
Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.
The persistence of factor outperformance over one-month periods is transient. If the goal is to outperform the most often, then the blended portfolio satisfies this requirement, and any timing strategy would have to be accurate enough to overcome this already existing spread.
Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.
The results for the blended portfolio are so much better than the stand-alone factors because the factors have correlations much lower than many other asset classes, allowing even naïve diversification to add tremendous value.
The blended portfolio also cuts downside risk in terms of returns. If the timing strategy is wrong, and chooses, for example, momentum in an underperforming month, then it could take longer for the strategy to climb back to even. But investors are used to short periods of underperformance and often (we hope) realize that some short-term pain is necessary for long-term gains.
Looking at the same analysis over rolling 1-year periods, we do see some longer periods of factor outperformance. Some examples are quality in the 1980s, value in the mid-2000s, momentum in the 1960s and 1990s, and size in the late-1970s.
However, there are also decent stretches where the factors underperform. For example, the recent decade for value, quality in the early 2010s, momentum sporadically in the 2000s, and size in the 1980s and 1990s. If the timing strategy gets stuck in these periods, then there can be a risk of abandoning it.
Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.
Again, a blended portfolio would have addressed many of these underperforming periods, giving up some of the upside with the benefit of reducing the risk of choosing the wrong factor in periods of underperformance.
Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.
And finally, if we extend our holding period to three years, which may be used for a slower moving signal based on either value or the business cycle, we see that the diversified portfolio still exhibits outperformance over the most rolling periods and has a strong ratio of upside to downside.
Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.
The diversified portfolio stands up to scrutiny against the individual factors but could a generalized model that can time the factors with a certain degree of accuracy lead to better outcomes?
Generic Factor Timing
To construct a generic factor timing model, we will consider a strategy that decides to hold each factor or not with a certain degree of accuracy.
For example, if the accuracy is 50%, then the strategy would essentially flip a coin for each factor. Heads and that factor is included in the portfolio; tails and it is left out. If the accuracy is 55%, then the strategy will hold the factor with a 55% probability when the factor return is positive and not hold the factor with the same probability when the factor return is negative. Just to be clear, this strategy is constructed with look-ahead bias as a tool for evaluation.
All factors included in the portfolio are equally weighted, and if no factors are included, then the returns is zero for that period.
This toy model will allow us to construct distributions to see where the blended portfolio of all the factors falls in terms of frequency of outperformance (hit rate), average outperformance, and average underperformance. The following charts show the percentiles of the diversified portfolio for the different metrics and model accuracies using 1,000 simulations.2
Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.
In terms of hit rate, the diversified portfolio behaves in the top tier of the models over all time periods for accuracies up to about 57%. Even with a model that is 60% accurate, the diversified portfolio was still above the median.
Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.
For average underperformance, the diversified portfolio also did very well in the context of these factor timing models. The low correlation between the factors leads to opportunities for the blended portfolio to limit the downside of individual factors.
Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.
For average outperformance, the diversified portfolio did much worse than the timing model over all time horizons. We can attribute this also to the low correlation between the factors, as choosing only a subset of factors and equally weighting them often leads to more extreme returns.
Overall, the diversified portfolio manages the risks of underperformance, both in magnitude and in frequency, at the expense of sacrificing outperformance potential. We saw this in the first section when we compared the diversified portfolio to the individual factors.
But if we want to have increased return potential, we will have to introduce some model risk to time the factors.
Checking in on Momentum
Momentum is one model-based way to time the factors. Under our definition of accuracy in the toy model, a 12-1 momentum strategy on the factors has an accuracy of about 56%. While the diversified portfolio exhibited some metrics in line with strategies that were even more accurate than this, it never bore concentration risk: it always held all four factors.
Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.
For the hit rate percentiles of the momentum strategy, we see a more subdued response. Momentum does not win as much as the diversified portfolio over the different time periods.
But not winning as much can be fine if you win bigger when you do win.
The charts below show that momentum does indeed have a higher outperformance percentile but with a worse underperformance percentile, especially for 1-month periods, likely due to mean reversionary whipsaw.
Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.
While momentum is definitely not the only way to time the factors, it is a good baseline to see what is required for higher average outperformance.
Now, turning back to our generic factor timing model, what accuracy would you need to beat momentum?
Sharpening our Signal
The answer is: not a whole lot. Most of the time, we only need to be about 53% accurate to beat the momentum-based factor timing.
The caveat is that this is the median performance of the simulations. The accuracy figure climbs closer to 60% if we use the 25th percentile as our target.
While these may not seem like extremely high requirements for running a successful factor timing strategy, it is important to observe that not many investors are doing this. True accuracy may be hard to discover, and sticking with the system may be even harder when the true accuracy can never be known.
Conclusion
If you made it this far looking for some rosy news on factor timing or the Holy Grail of how to do it skillfully, you may be disappointed.
However, for most investors looking to generate some modest benefits relative to market-cap equity, there is good news. Any signal for timing factors does not have to be highly accurate to perform well, and in the absence of a signal for timing, a diversified portfolio of the factors can lead to successful results by the metrics of average underperformance and frequency of underperformance.
For those investors looking for higher outperformance, concentration risk will be necessary.
Any timing strategy on low correlation investments will generally forego significant diversification in the pursuit of higher returns.
While this may be the goal when constructing the strategy, we should always pause and determine whether the potential benefits outweigh the costs. Transaction costs may be lower now. However, there are still operational burdens and the potential stress caused by underperformance when a system is not automated or when results are tracked too frequently.
Factor timing may be possible, but timing and tactical rotation may be better suited to scenarios where some of the model risk can be mitigated.
- We should note here that the long/short portfolios implicitly derived from long-only factor portfolios are not the same as the academic long/short portfolios we are using here. In fact, they can be substantially different at times. Nevertheless, we do not believe this fact meaningfully alters the results of our study.
- We typically see accuracies in the range of 50%-60% for many different strategies that take frequent bets with the hope of realizing a premium over a longer time horizon. When there can be large positive tail events, the accuracy would not need to be as high.