This blog post is available as a PDF download here.
Summary
- After stumbling across a set of old research notes from 2009 and 2012, we attempt to implement a Cyclicals versus Defensives sector trade out-of-sample.
- Post-2012 returns prove unconvincing and we find little evidence supporting the notion that PMI changes can be used for constructing this trade.
- Using data from the Kenneth French website, we extend the study to 1948, and similarly find that changes in PMI (regardless of lookback period) are not an effective signal for trading Cyclical versus Defensive sectors.
I love coming across old research because it allows for truly out-of-sample testing.
Earlier this week, I stumbled across a research note from 2009 and a follow-up note from 2012, both exploring the use of macro-based signals for constructing dollar-neutral long/short sector trades. Specifically, the pieces focused on using manufacturing Purchasing Managers’ Indices (PMIs) as a predictor for Cyclical versus Defensive sectors.
The strategy outlined is simple: when the prior month change in manufacturing PMI is positive, the strategy is long Cyclicals and short Defensives; when the change is negative, the strategy is long Defensives and short Cyclicals. The intuition behind this signal is that PMIs provide a guide to hard economic activity.
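The rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the original notes' implementation: the function names are our own, the data is made up, and we assume (a) monthly series aligned so that index t is month t, (b) the PMI reading for month t-1 is known before month t begins, and (c) an unchanged PMI leaves the strategy flat, a case the notes do not address.

```python
def pmi_positions(pmi):
    """+1 = long Cyclicals / short Defensives, -1 = the reverse, 0 = flat.

    The position held during month t uses the prior month's change,
    pmi[t-1] - pmi[t-2], so no same-month information is used.
    """
    positions = [0] * len(pmi)
    for t in range(2, len(pmi)):
        change = pmi[t - 1] - pmi[t - 2]
        positions[t] = (change > 0) - (change < 0)  # sign of the change
    return positions


def long_short_returns(positions, cyclical_returns, defensive_returns):
    """Dollar-neutral return: position times (Cyclicals minus Defensives)."""
    return [p * (c - d)
            for p, c, d in zip(positions, cyclical_returns, defensive_returns)]


# Made-up numbers purely for illustration.
pmi = [50.0, 51.2, 50.5, 50.5, 52.0]
cyclicals = [0.01, 0.02, -0.01, 0.03, 0.00]
defensives = [0.00, 0.01, 0.01, 0.01, 0.01]

positions = pmi_positions(pmi)
strategy = long_short_returns(positions, cyclicals, defensives)
print(positions)
print(strategy)
```

In this toy series the strategy is long Cyclicals in the third month (PMI rose the month before), short in the fourth (PMI fell), and flat in the fifth (PMI was unchanged).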
The sample period for the initial test is from 1998 to 2009, a period over which the strategy performed quite well on a global basis and even better when using the more forward-looking ratio of new orders to inventory.
Red flags start to go up, however, when we read the second note from 2012. “It appears that the new orders-to-inventory ratio has lost its ability to forecast the output index.” “In addition, the optimal lookback period … has shifted from one to two months.”
At this point, we can believe one of a few things:
- The initial strategy works, has simply hit a rough patch in the three years after publishing, and will work again in the future.
- The initial strategy worked but has broken since publishing.
- The initial strategy never worked and was an artifact of datamining.
I won’t even bother addressing the whole “one-month versus two-month” comment. Long-time readers know where we come down on ensembles versus parameter specification…
Fortunately, we do not have to pass qualitative judgement: we can let the numbers speak for themselves.
While the initial notes focused on global implementation, we can rebuild the strategy using U.S. equity sectors and U.S. manufacturing PMI as the driving signal. This will both serve as an out-of-sample test across assets and provide approximately 7 more years of out-of-sample time to evaluate returns.
Below we plot the results of this strategy for both 1-month and 2-month lookback periods, highlighting the in-sample and out-of-sample periods for each specification based upon the date the original research notes were published. We use the State Street SPDR Sector Select ETFs as our implementation vehicles, with the exception of the iShares Dow Jones US Telecom ETF.
Source: CSI Data; Quandl. Calculations by Newfound Research. Results are hypothetical. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.
The first thing we notice is that the original 1-month implementation – which appeared to work on a global scale – does not seem particularly robust when implemented with U.S. sectors. Post-publication results do not fare much better.
The 2-month specification, however, does appear to work reasonably well both in- and out-of-sample.
But is there something inherently magical about that two-month specification? We are hard-pressed to find a narrative explanation.
If we plot lookback specifications from 3- to 12-months, we see that the 2-month specification proves to be a significant outlier. Given the high correlation between all the other specifications, it is more likely that the 2-month lookback was the beneficiary of luck rather than capturing a particular edge.
Source: CSI Data; Quandl. Calculations by Newfound Research. Results are hypothetical. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.
Perhaps we’re not giving this idea enough breathing room. After all, were we to evaluate most value strategies in the most recent decades, we’d likely declare them insignificant as well.
With manufacturing PMI data extending back to 1948, we can use sector index data from the Kenneth French website to reconstruct this strategy.
Unfortunately, the Kenneth French definitions do not match GICS perfectly, so we have to change the definition of Cyclicals and Defensives slightly. Using the Kenneth French data, we will define Cyclicals to be Manufacturing, Non-Durables, Technology, and Shops. Defensives are defined to be Durables, Telecom, Health Care, and Utilities.
We use the same strategy as before, going long Cyclicals and short Defensives when changes in PMI are positive, and short Cyclicals and long Defensives when changes to PMI are negative. We again vary the lookback period from 1- to 12-months.
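For readers who want to experiment, here is a hedged sketch of the lookback sweep. The sector groupings are the ones listed above; everything else is an assumption of ours: the function names, equal weighting within each basket (the weighting is not stated in the text), and the synthetic random data standing in for the PMI and Kenneth French series. The printed numbers are therefore meaningless beyond showing the mechanics.

```python
import random

CYCLICALS = ["Manufacturing", "Non-Durables", "Technology", "Shops"]
DEFENSIVES = ["Durables", "Telecom", "Health Care", "Utilities"]


def basket_return(month, names):
    """Equal-weight average of one month's sector returns (our assumption)."""
    return sum(month[name] for name in names) / len(names)


def lookback_positions(pmi, lookback):
    """Sign of the PMI change over `lookback` months, lagged by one month."""
    positions = [0] * len(pmi)
    for t in range(lookback + 1, len(pmi)):
        change = pmi[t - 1] - pmi[t - 1 - lookback]
        positions[t] = (change > 0) - (change < 0)
    return positions


def annualized_return(monthly_returns):
    """Geometric annualized return from a list of monthly returns."""
    growth = 1.0
    for r in monthly_returns:
        growth *= 1.0 + r
    return growth ** (12.0 / len(monthly_returns)) - 1.0


def sweep_lookbacks(pmi, monthly_sector_returns, lookbacks=range(1, 13)):
    """Annualized long/short return for each lookback from 1 to 12 months."""
    spread = [basket_return(m, CYCLICALS) - basket_return(m, DEFENSIVES)
              for m in monthly_sector_returns]
    results = {}
    for k in lookbacks:
        positions = lookback_positions(pmi, k)
        results[k] = annualized_return(
            [p * s for p, s in zip(positions, spread)])
    return results


# Synthetic placeholder data only; NOT actual PMI or sector returns.
rng = random.Random(0)
months = 240
pmi = [50.0 + rng.gauss(0.0, 3.0) for _ in range(months)]
sector_returns = [
    {name: rng.gauss(0.005, 0.04) for name in CYCLICALS + DEFENSIVES}
    for _ in range(months)
]

results = sweep_lookbacks(pmi, sector_returns)
for k in sorted(results):
    print(f"{k:>2}-month lookback: {results[k]:+.2%} annualized")
```

Running all twelve specifications side by side, rather than picking one, is what lets us judge whether any single lookback is an outlier.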
Source: Kenneth French Data Library; Quandl. Calculations by Newfound Research. Results are hypothetical. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.
The results are less than convincing. Not only do we see significant dispersion across implementations, but there is also no consistency in those implementations that do well versus those that do not.
Perhaps worse, the best performing variation only returned a paltry 1.40% annualized gross of any implementation costs. Once we start accounting for transaction costs, slippage, and management fees, this figure deflates towards zero rather quickly.
Source: Kenneth French Data Library; Quandl. Calculations by Newfound Research. Results are hypothetical. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.
Conclusion
There is no shortage of quantitative research in the market and the research can be particularly compelling when it seems to fit a pre-existing narrative.
Cyclicals versus Defensives is a perfect example. Their very names imply the regimes during which they are supposed to add value, but actually translating this notion into a robust strategy proves to be less than easy.
I would make the philosophical argument that it quite simply cannot be easy. Consider the two pieces of information we need to believe for this strategy to work:
- Cyclicals outperform Defensives in an economic expansion and Defensives outperform Cyclicals in an economic contraction.
- We can forecast economic expansions and contractions before they are priced into the market.
If we have very high confidence in both statements, it effectively implies an arbitrage.
Therefore, if we have very high confidence in the truth of the first statement, then for markets to be reasonably efficient, we must have little confidence in the second statement.
Similarly, if we have high confidence in the truth of the second statement, then for markets to be reasonably efficient, we must have little confidence in the first statement.
Thus, a more reasonable expectation might be that Cyclicals tend to outperform Defensives during an expansion, and Defensives tend to outperform Cyclicals in a contraction, but there may be meaningful exceptions depending upon the particular cycle.
Furthermore, we may believe we have an edge in forecasting expansions and contractions (perhaps not with just PMI, though), but there will be many false positives and false negatives along the way.
Taken together, we might believe we can construct such a strategy, but errors in both assumptions will lead to periods of frustration. However, we should recognize that for such an “open secret” strategy to work in the long run, there have to be troughs of sorrow deep enough to avoid permanent crowding.
In this case, we believe there is little evidence to suggest that level changes in PMI provide particular insight into Cyclicals versus Defensives, but that does not mean there are no macro signals that might.
Es-CAPE Velocity: Value-Driven Sector Rotation
By Corey Hoffstein
On August 26, 2019
Summary
It is no secret that systematic value investing of all sorts has struggled as of late. With the curious exception, that is, of the Barclays Shiller CAPE sector rotation strategy, a strategy explored by Bunn, Staal, Zhuang, Lazanas, Ural and Shiller in their 2014 paper Es-cape-ing from Overvalued Sectors: Sector Selection Based on the Cyclically Adjusted Price-Earnings (CAPE) Ratio. Initial performance suggests that the idea has performed quite well out-of-sample, which stands out among the many “smart-beta” strategies that have failed to live up to their backtests.
Source: CSI Data. Calculations by Newfound Research. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.
Why is this strategy finding success where other value strategies have not? That is what we aim to explore in this commentary.
On a monthly basis, the Shiller CAPE sector rotation portfolio is rebalanced into an equal-weight allocation across four of the ten primary GICS sectors. The four are selected first by ranking the 10 primary sectors based upon their Relative CAPE ratios and choosing the cheapest five sectors. Of those cheapest five sectors, the sector with the worst trailing 12-month return (“momentum”) is removed.
The CAPE ratio – standing for Cyclically-Adjusted Price-to-Earnings ratio – is the current price divided by the 10-year moving average of inflation-adjusted earnings. The purpose of this smoothing is to reduce the impact of business cycle fluctuations.
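As a rough sketch of that definition (not the index provider's methodology), the calculation might look like the following. The function names, the use of a CPI series to restate earnings in the latest period's dollars, and the `window` parameter are our own assumptions; with monthly data, 10 years would be `window=120`.

```python
def real_earnings(nominal_earnings, cpi):
    """Restate each period's earnings in the latest period's dollars."""
    latest = cpi[-1]
    return [e * latest / c for e, c in zip(nominal_earnings, cpi)]


def cape(price, nominal_earnings, cpi, window=10):
    """Price divided by the trailing average of inflation-adjusted earnings.

    `window` is the number of observations in the average: 10 for annual
    data, 120 for monthly data (a 10-year window in both cases).
    """
    if len(nominal_earnings) < window or len(cpi) != len(nominal_earnings):
        raise ValueError("need aligned series with at least `window` points")
    real = real_earnings(nominal_earnings, cpi)
    average = sum(real[-window:]) / window
    return price / average


# Made-up annual data: earnings rise from 4 to 6 with flat prices (CPI).
earnings = [4.0] * 5 + [6.0] * 5
cpi = [100.0] * 10
print(cape(100.0, earnings, cpi))  # average real earnings of 5 -> 20.0
```

A trailing one-year P/E on the same data would be 100 / 6, or about 16.7; the 10-year average pulls in the weaker early years, which is the smoothing effect the ratio is designed to provide.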
The potential problem with using the raw CAPE value for each sector is that certain sectors have structurally higher and lower CAPE ratios than their peers. High growth sectors – e.g. Technology – tend to have higher CAPE ratios because they reinvest a substantial portion of their earnings while more stable sectors – e.g. Utilities – tend to have much lower CAPE ratios. Were we to simply sort sectors based upon their current CAPE ratio, we would tend to create structural over- and under-weights towards certain sectors.
To adjust for this structural difference, the strategy uses the idea of a Relative CAPE ratio, which is calculated by taking the current CAPE ratio and dividing it by a rolling 20-year average CAPE ratio for that sector. The thesis behind this step is that dividing by a long-term mean normalizes the sectors and allows for better comparison. Relative CAPE values above 1 mean that the sector is more expensive than it has historically been, while values less than 1 mean it is cheaper.
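A minimal sketch of the normalization, under our own assumptions: the function name is hypothetical, the CAPE history is annual (so 20 years is `window=20`), and the current observation is included in the rolling average, a detail the description above does not pin down.

```python
def relative_cape(cape_history, window=20):
    """Latest CAPE divided by its own trailing average.

    `cape_history` is ordered oldest to newest. Whether the average should
    include the latest observation is an assumption on our part.
    """
    if len(cape_history) < window:
        raise ValueError("need at least `window` observations")
    average = sum(cape_history[-window:]) / window
    return cape_history[-1] / average


# Made-up histories: a structurally "expensive" sector and a "cheap" one.
growth_sector = [40.0] * 19 + [30.0]   # high CAPE, now below its own norm
stable_sector = [15.0] * 19 + [18.0]   # low CAPE, now above its own norm

print(relative_cape(growth_sector))  # below 1: cheap versus its history
print(relative_cape(stable_sector))  # above 1: expensive versus its history
```

On raw CAPE the stable sector (18x) looks far cheaper than the growth sector (30x); on Relative CAPE the ordering flips, which is exactly the structural bias the normalization is meant to remove.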
It is important to note here that the actual selection is still performed on a cross-sector basis. It is entirely possible that all the sectors appear cheap or expensive on a historical basis at the same time. The portfolio will simply pick the cheapest sectors available.
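The selection rules described above can be sketched as follows. The function name, the tie-breaking behavior (whatever Python's sort and `min` happen to give), and all of the input values are hypothetical; only the rule itself (cheapest five by Relative CAPE, drop the worst trailing 12-month return, equal-weight the remaining four) comes from the strategy description.

```python
def select_sectors(relative_cape, momentum_12m, n_cheapest=5):
    """Equal-weight the cheapest sectors after dropping the worst momentum.

    Both arguments are dicts keyed by sector name. Ranking is purely
    cross-sectional: the cheapest sectors are chosen even if every sector
    has a Relative CAPE above (or below) 1.
    """
    cheapest = sorted(relative_cape, key=relative_cape.get)[:n_cheapest]
    worst = min(cheapest, key=lambda sector: momentum_12m[sector])
    selected = [sector for sector in cheapest if sector != worst]
    weight = 1.0 / len(selected)
    return {sector: weight for sector in selected}


# Hypothetical inputs for the ten sectors; not actual valuations or returns.
relative_cape = {
    "Energy": 0.70, "Materials": 1.10, "Industrials": 1.05,
    "Consumer Discretionary": 0.95, "Consumer Staples": 1.20,
    "Health Care": 0.90, "Financials": 0.80, "Technology": 0.85,
    "Telecom": 1.00, "Utilities": 1.15,
}
momentum_12m = {
    "Energy": -0.20, "Materials": 0.02, "Industrials": 0.08,
    "Consumer Discretionary": 0.12, "Consumer Staples": 0.04,
    "Health Care": 0.10, "Financials": 0.05, "Technology": 0.15,
    "Telecom": 0.01, "Utilities": 0.03,
}

weights = select_sectors(relative_cape, momentum_12m)
print(weights)
```

In this made-up example Energy is the cheapest sector but also has the worst trailing return among the cheapest five, so it is the one removed.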
Poking and Prodding the Parameters
With an understanding of the rules, our first step is to poke and prod a bit to figure out what is really driving the strategy.
We begin by first exploring the impact of using the Relative CAPE ratio versus just the CAPE ratio.
For each of these ratios, we’ll plot two strategies. The first is a naïve Value strategy, which will equal-weight the four cheapest sectors. The second is the Shiller strategy, which chooses the five cheapest sectors and drops the one with the worst momentum. This should provide a baseline for comparing the impact of the momentum filter.
Strategy returns are plotted relative to the S&P 500.
Source: Siblis Research; Morningstar; CSI Data. Calculations by Newfound Research. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.
For the Relative CAPE ratio, we also vary the lookback period for calculating the rolling average CAPE from 5- to 20-years.
Source: Siblis Research; Morningstar; CSI Data. Calculations by Newfound Research. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.
A few things immediately stand out:
The second-to-last point is particularly curious, as it implies that using momentum to “avoid the value trap” creates significant value (no pun intended; okay, pun intended) for the strategy.
Varying the Value Metric (in Vain)
To gain more insight, we next test the impact of the choice of the CAPE ratio. Below we plot the relative returns of different Shiller-based strategies (again varying lookbacks from 5- to 20-years), but use price-to-book, trailing 12-month price-to-earnings, and trailing 12-month EV/EBITDA as our value metrics.
A few things stand out:
Source: Siblis Research; Morningstar; CSI Data. Calculations by Newfound Research. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.
At this point, we have to ask: is there something special about the Relative CAPE that makes it inherently superior to other metrics?
A Big Bubble-Based Bet?
If we take a step back for a moment, it is worth asking ourselves a simple question: what would it take for a sector rotation strategy to out-perform the S&P 500 over the last decade?
With the benefit of hindsight, we know Consumer Discretionary and Technology have led the pack, while traditionally stodgy sectors like Consumer Staples and Utilities have lagged behind (though not nearly as poorly as Energy).
As we mentioned earlier, a naïve rank on the CAPE ratio would almost certainly prefer Utilities and Staples over Technology and Discretionary. Thus, for us to outperform the market, we must somehow construct a value metric that identifies the two most chronically expensive sectors (ignoring back-dated valuations for the new Communication Services sector) as being among the cheapest.
This is where dividing by the rolling 20-year average comes into play. In spirit, it makes a certain degree of sense. In practice, however, this plays out perfectly for Technology, which went through such an enormous bubble in the late 1990s that the 20-year average was meaningfully skewed upward by an outlier event. Thus, for almost the entire 20-year period after the dot-com bubble, Technology appears to be relatively cheap by comparison. After all, you can buy for 30x earnings today what you used to be able to buy for 180x!
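To make the skew concrete, here is a stylized calculation. Only the 30x and 180x figures come from the text; the shape of the bubble, the annual frequency, and the variable names are invented for illustration and are not the sector's actual CAPE history.

```python
def trailing_mean(values):
    """Simple average of a list of observations."""
    return sum(values) / len(values)


current_cape = 30.0

# Two hypothetical 20-year annual CAPE histories (oldest first) that both
# end at 30x today. One sat at 30x throughout; the other spent four years
# in a bubble that peaked at 180x.
calm_history = [30.0] * 20
bubble_history = [60.0, 120.0, 180.0, 90.0] + [30.0] * 16

calm_relative = current_cape / trailing_mean(calm_history)
bubble_relative = current_cape / trailing_mean(bubble_history)

print(f"No bubble:   average {trailing_mean(calm_history):.1f}x, "
      f"Relative CAPE {calm_relative:.2f}")
print(f"With bubble: average {trailing_mean(bubble_history):.1f}x, "
      f"Relative CAPE {bubble_relative:.2f}")
```

Four bubble years out of twenty are enough to lift the average from 30x to 46.5x, so the very same 30x multiple screens as roughly 35% "cheap" until the bubble rolls out of the window.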
The result is a significant – and near-permanent – tilt towards Technology since the beginning of 2012, which can be seen in the graph of strategy weights below.
One way to explore the impact of this choice is to calculate the weight differences between a top-4 CAPE strategy and a top-4 Relative CAPE strategy, which we also plot below. We can see that after early 2012, the Relative CAPE strategy is structurally overweight Technology and underweight Financials and Utilities. Prior to 2008, we can see that it is structurally underweight Energy and overweight Consumer Staples.
If we take these weights and use them to construct a return stream, we can isolate the return impact the choice of using Relative CAPE versus CAPE has. Interestingly, the long Technology / short Financials & Utilities trade did not appear to generate meaningful out-performance in the post-2012 era, suggesting that something else is responsible for post-2012 performance.
Source: Siblis Research; Morningstar; CSI Data. Calculations by Newfound Research. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.
The Miraculous Mojo of Momentum
This is where the 12-month momentum filter plays a crucial role. Narratively, it is to avoid value traps. Practically, it helps the strategy deftly dodge Financials in 2008, avoiding a significant melt-down in one of the S&P 500’s largest sectors.
Now, you might think that valuations alone should have allowed the strategy to avoid Technology in the dot-com fallout. As it turns out, the Technology CAPE fell so precipitously that in using the Relative CAPE metric the Technology sector was still ranked as one of the top five cheapest sectors from 3/2001 to 11/2002. The only way the strategy was able to avoid it? The momentum filter.
Removing this filter makes the relative results a lot less attractive. Below we re-plot the relative performance of a simple “top 4” Relative CAPE strategy.
Source: Siblis Research; Morningstar; CSI Data. Calculations by Newfound Research. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.
Just how much impact does the momentum filter have? We can isolate the effect by taking the weights of the Shiller strategy and subtracting the weights of the Value strategy to construct a long/short index that isolates the effect. Below we plot the returns of this index.
It should be noted that the legs of the long/short portfolio only have a notional exposure of 25%, as that is the most the Value and Shiller strategies can deviate by. Nevertheless, even with this relatively small weight, when isolated the filter generates an annualized return of 1.8% with an annualized volatility of 4.8% and a maximum drawdown of 11.6%.
Scaled to a long/short with 100% notional per leg, annualized returns jump to 6.0%, though volatility and maximum drawdown climb to 20.4% and 52.6%, respectively.
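The mechanics of the weight-difference index can be sketched as below. The helper names, the example weights, and the single month of sector returns are all hypothetical, and scaling the weights so each leg sums to 100% is our assumption about how such a rescaling could be done; the sketch does not attempt to reproduce the figures quoted above.

```python
def weight_difference(weights_a, weights_b):
    """Per-sector weight of strategy A minus strategy B (zeros dropped)."""
    sectors = set(weights_a) | set(weights_b)
    diff = {s: weights_a.get(s, 0.0) - weights_b.get(s, 0.0) for s in sectors}
    return {s: w for s, w in diff.items() if w != 0.0}


def leg_notionals(diff):
    """Total long and total short exposure of a long/short weight vector."""
    long_leg = sum(w for w in diff.values() if w > 0)
    short_leg = -sum(w for w in diff.values() if w < 0)
    return long_leg, short_leg


def scale_legs(diff, target=1.0):
    """Rescale weights so the long leg sums to `target` (assumed method)."""
    long_leg, _ = leg_notionals(diff)
    if long_leg == 0:
        return dict(diff)  # the two strategies hold identical portfolios
    factor = target / long_leg
    return {s: w * factor for s, w in diff.items()}


def portfolio_return(weights, sector_returns):
    """One-period return of a weight vector."""
    return sum(w * sector_returns[s] for s, w in weights.items())


# Hypothetical month: the momentum filter swaps Energy for Discretionary.
shiller = {"Financials": 0.25, "Technology": 0.25,
           "Health Care": 0.25, "Consumer Discretionary": 0.25}
value = {"Energy": 0.25, "Financials": 0.25,
         "Technology": 0.25, "Health Care": 0.25}
returns = {"Energy": -0.04, "Financials": 0.01, "Technology": 0.03,
           "Health Care": 0.00, "Consumer Discretionary": 0.02}

diff = weight_difference(shiller, value)
scaled = scale_legs(diff)
print(diff, leg_notionals(diff))
print(portfolio_return(diff, returns), portfolio_return(scaled, returns))
```

Because the two strategies differ by at most one of four equally weighted holdings, each leg of the difference portfolio can never exceed 25% notional, which is why the unscaled return stream looks small.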
Source: Siblis Research; Morningstar; CSI Data. Calculations by Newfound Research. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.
Conclusion
Few, if any, systematic value strategies have performed well as of late. When one does – as with the Shiller CAPE sector rotation strategy – it is worth further review.
As a brief summary of our findings:
Taken all together, it is hard to not question whether these results are unintentionally datamined. Unfortunately, we just do not have enough data to extend the tests further back in time for truly out-of-sample analysis.
What we can say, however, is that the backtested and live performance hinges almost entirely on a few key trades:
Three of these four trades are driven by the momentum filter. When we further consider that the Shiller strategy in effect combines the returns of the pure value implementation – which suffered in the dot-com run-up and was a mostly random walk thereafter – with the returns of the isolated momentum filter, it becomes rather difficult to call this a value strategy at all.
As of the date of this document, neither Newfound Research nor Corey Hoffstein holds a position in the securities discussed in this article, and neither has any plans to trade in such securities. Newfound Research and Corey Hoffstein do not take a position as to whether this security should be recommended for any particular investor.