This post is available as a PDF download here.

Summary

Systematic value strategies have struggled in the post-2008 environment, so one that has performed well catches our eye.
The Barclays Shiller CAPE sector rotation strategy – a value-based sector rotation strategy – has out-performed the S&P 500 by 267 basis points annualized since it launched in 2012.
The strategy applies a unique Relative CAPE metric to account for structural differences in sector valuations as well as a momentum filter that seeks to avoid “value traps.”
In an effort to derive the source of out-performance, we explore various other valuation metrics and model specifications.
We find that what has actually driven performance in the past may have little to do with value at all.

It is no secret that systematic value investing of all sorts has struggled as of late. The curious exception is the Barclays Shiller CAPE sector rotation strategy, explored by Bunn, Staal, Zhuang, Lazanas, Ural and Shiller in their 2014 paper Es-cape-ing from Overvalued Sectors: Sector Selection Based on the Cyclically Adjusted Price-Earnings (CAPE) Ratio. Initial performance suggests that the idea has held up quite well out-of-sample, which stands out among the many “smart-beta” strategies that have failed to live up to their backtests.

Source: CSI Data. Calculations by Newfound Research. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.

Why is this strategy finding success where other value strategies have not? That is what we aim to explore in this commentary.

On a monthly basis, the Shiller CAPE sector rotation portfolio is rebalanced into an equal-weight allocation across four of the ten primary GICS sectors. The four are selected first by ranking the 10 primary sectors based upon their Relative CAPE ratios and choosing the cheapest five sectors. Of those cheapest five sectors, the sector with the worst trailing 12-month return (“momentum”) is removed.
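The selection rule described above can be sketched in a few lines. This is a minimal illustration, not the index provider's implementation: the function name, the sector labels, and all input values are made up, and the Relative CAPE and momentum figures are simply taken as given inputs.

```python
def select_sectors(relative_cape, momentum_12m, n_cheapest=5, n_drop=1):
    """Return equal weights for the sectors held after both steps.

    relative_cape: dict of sector -> Relative CAPE (lower = cheaper).
    momentum_12m:  dict of sector -> trailing 12-month return.
    """
    # Step 1: rank on valuation and keep the cheapest n_cheapest sectors.
    cheapest = sorted(relative_cape, key=relative_cape.get)[:n_cheapest]
    # Step 2: among those, drop the sector(s) with the worst momentum.
    worst = set(sorted(cheapest, key=momentum_12m.get)[:n_drop])
    held = [s for s in cheapest if s not in worst]
    # Step 3: equal-weight whatever remains.
    return {s: 1.0 / len(held) for s in held}


# Hypothetical inputs, for illustration only.
relative_cape = {"Tech": 0.70, "Energy": 0.80, "Financials": 0.85,
                 "Health Care": 0.90, "Staples": 0.95, "Utilities": 1.10}
momentum_12m = {"Tech": 0.12, "Energy": -0.20, "Financials": 0.05,
                "Health Care": 0.08, "Staples": 0.03, "Utilities": 0.10}

weights = select_sectors(relative_cape, momentum_12m)
# Energy is among the five cheapest but has the worst momentum, so it is dropped.
```

Setting n_cheapest=4 and n_drop=0 gives the naive "four cheapest" variant, which makes it easy to compare the two rules on the same inputs.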

The CAPE ratio – standing for Cyclically-Adjusted Price-to-Earnings ratio – is the current price divided by the 10-year moving average of inflation-adjusted earnings. The purpose of this smoothing is to reduce the impact of business cycle fluctuations.

The potential problem with using the raw CAPE value for each sector is that certain sectors have structurally higher and lower CAPE ratios than their peers. High growth sectors – e.g. Technology – tend to have higher CAPE ratios because they reinvest a substantial portion of their earnings while more stable sectors – e.g. Utilities – tend to have much lower CAPE ratios. Were we to simply sort sectors based upon their current CAPE ratio, we would tend to create structural over- and under-weights towards certain sectors.

To adjust for this structural difference, the strategy uses the idea of a Relative CAPE ratio, which is calculated by taking the current CAPE ratio and dividing it by a rolling 20-year average CAPE ratio¹ for that sector. The thesis behind this step is that dividing by a long-term mean normalizes the sectors and allows for better comparison. Relative CAPE values above 1 mean that the sector is more expensive than it has historically been, while values less than 1 mean it is cheaper.
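To make the arithmetic concrete, here is a rough sketch of the two ratios for a single sector using monthly data. It assumes that the 10-year and 20-year averages are simple trailing means that include the current month, and that no value is produced until a full window of history exists; the actual index methodology may differ in these details. The function names are hypothetical.

```python
def trailing_mean(values, window):
    """Mean of the last `window` observations; None until enough history exists."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def relative_cape(prices, real_earnings, earnings_window=120, cape_window=240):
    """Return (CAPE, Relative CAPE) series for one sector.

    prices, real_earnings: equal-length lists of monthly observations.
    earnings_window: months in the earnings average (10 years = 120).
    cape_window: months in the CAPE average (20 years = 240).
    """
    avg_earnings = trailing_mean(real_earnings, earnings_window)
    cape = [p / e if e is not None else None
            for p, e in zip(prices, avg_earnings)]
    rel = []
    for i, c in enumerate(cape):
        history = cape[max(0, i + 1 - cape_window):i + 1]
        if c is None or len(history) < cape_window or None in history:
            rel.append(None)  # not enough CAPE history yet
        else:
            rel.append(c / (sum(history) / cape_window))
    return cape, rel


# Toy example with 2-period windows so the numbers are easy to follow.
prices = [10.0, 10.0, 20.0, 20.0]
real_earnings = [1.0, 1.0, 1.0, 1.0]
cape, rel = relative_cape(prices, real_earnings, earnings_window=2, cape_window=2)
# cape is [None, 10.0, 20.0, 20.0]; rel is above 1 when the price first doubles,
# then returns to 1.0 once the higher CAPE fills the averaging window.
```

The toy example also shows the mechanism discussed later in this commentary: once an elevated CAPE enters the averaging window, the same valuation level no longer looks expensive.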

It is important to note here that the actual selection is still performed on a cross-sector basis. It is entirely possible that all the sectors appear cheap or expensive on a historical basis at the same time. The portfolio will simply pick the cheapest sectors available.

Poking and Prodding the Parameters

With an understanding of the rules, our first step is to poke and prod a bit to figure out what is really driving the strategy.

We begin by first exploring the impact of using the Relative CAPE ratio versus just the CAPE ratio.

For each of these ratios, we’ll plot two strategies. The first is a naïve Value strategy, which will equally-weight the four cheapest sectors. The second is the Shiller strategy, which chooses the top five cheapest sectors and drops the one with the worst momentum. This should provide a baseline for comparing the impact of the momentum filter.

Strategy returns are plotted relative to the S&P 500.

Source: Siblis Research; Morningstar; CSI Data. Calculations by Newfound Research. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.

For the Relative CAPE ratio, we also vary the lookback period for calculating the rolling average CAPE from 5- to 20-years.

Source: Siblis Research; Morningstar; CSI Data. Calculations by Newfound Research. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.

A few things immediately stand out:

Interestingly, standard CAPE actually appears to perform better than Relative CAPE for both the traditional value and Shiller implementations.
The Relative CAPE approach fared much more poorly from 2004-2007 than the simple CAPE approach.
There is little difference in performance for the Value and Shiller strategy for standard CAPE, but a meaningful difference for Relative CAPE.
While standard CAPE value has had stagnant relative performance since 2007, Relative CAPE appears to continue to work for the Shiller approach.
A naïve value implementation seems to perform quite poorly for Relative CAPE, while the Shiller strategy appears to perform rather well.
There is meaningful performance dispersion based upon the lookback period, with longer-dated lookbacks (darker shades) appearing to perform better than shorter-period lookbacks (lighter shades) for the Relative CAPE variation.

The second-to-last point is particularly curious, as it implies that using momentum to “avoid the value trap” creates significant value (no pun intended; okay, pun intended) for the strategy.

Varying the Value Metric (in Vain)

To gain more insight, we next test the impact of the choice of the CAPE ratio. Below we plot the relative returns of different Shiller-based strategies (again varying lookbacks from 5- to 20-years), but use price-to-book, trailing 12-month price-to-earnings, and trailing 12-month EV/EBITDA as our value metrics.

A few things stand out:

Value-based sector rotation seems to have “worked” from 2000 to 2009, regardless of our metric of choice.
Almost all value-based strategies appear to exhibit significant relative out-performance during the dot-com and 2008 recessions.
After 2009, most value strategies appear to exhibit random relative performance versus the S&P 500.
All three approaches appear to have suffered since 2016.

Source: Siblis Research; Morningstar; CSI Data. Calculations by Newfound Research. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.

At this point, we have to ask: is there something special about the Relative CAPE that makes it inherently superior to other metrics?

A Big Bubble-Based Bet?

If we take a step back for a moment, it is worth asking ourselves a simple question: what would it take for a sector rotation strategy to out-perform the S&P 500 over the last decade?

With the benefit of hindsight, we know Consumer Discretionary and Technology have led the pack, while traditionally stodgy sectors like Consumer Staples and Utilities have lagged behind (though not nearly as poorly as Energy).

As we mentioned earlier, a naïve rank on the CAPE ratio would almost certainly prefer Utilities and Staples over Technology and Discretionary. Thus, for us to outperform the market, we must somehow construct a value metric that identifies the two most chronically expensive sectors (ignoring back-dated valuations for the new Communication Services sector) as being among the cheapest.

This is where dividing by the rolling 20-year average comes into play. In spirit, it makes a certain degree of sense. In practice, however, this plays out perfectly for Technology, which went through such an enormous bubble in the late 1990s that the 20-year average was meaningfully skewed upward by an outlier event. Thus, for almost the entire 20-year period after the dot-com bubble, Technology appears to be relatively cheap by comparison. After all, you can buy for 30x earnings today what you used to be able to buy for 180x!

The result is a significant – and near-permanent – tilt towards Technology since the beginning of 2012, which can be seen in the graph of strategy weights below.

One way to explore the impact of this choice is to calculate the weight differences between a top-4 CAPE strategy and a top-4 Relative CAPE strategy, which we also plot below. We can see that after early 2012, the Relative CAPE strategy is structurally overweight Technology and underweight Financials and Utilities. Prior to 2008, we can see that it is structurally underweight Energy and overweight Consumer Staples.

If we take these weights and use them to construct a return stream, we can isolate the return impact the choice of using Relative CAPE versus CAPE has. Interestingly, the long Technology / short Financials & Utilities trade did not appear to generate meaningful out-performance in the post-2012 era, suggesting that something else is responsible for post-2012 performance.

Source: Siblis Research; Morningstar; CSI Data. Calculations by Newfound Research. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.
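The weight-difference approach described in this section can be sketched as follows for a single rebalance period. The holdings and sector returns are invented for illustration, and the helper names are hypothetical; the point is only to show how differencing two long-only portfolios yields a dollar-neutral long/short book whose return isolates the choice of metric.

```python
def weight_difference(weights_a, weights_b):
    """Per-sector weight of portfolio A minus portfolio B."""
    sectors = set(weights_a) | set(weights_b)
    return {s: weights_a.get(s, 0.0) - weights_b.get(s, 0.0) for s in sectors}


def long_short_return(diff_weights, sector_returns):
    """One-period return of the long/short book implied by the differences."""
    return sum(w * sector_returns[s] for s, w in diff_weights.items())


# Hypothetical top-4 holdings under each metric (equal-weighted).
relative_cape_top4 = {"Technology": 0.25, "Health Care": 0.25,
                      "Staples": 0.25, "Industrials": 0.25}
cape_top4 = {"Financials": 0.25, "Utilities": 0.25,
             "Staples": 0.25, "Industrials": 0.25}

# Hypothetical one-month sector returns.
sector_returns = {"Technology": 0.04, "Health Care": 0.01, "Staples": 0.02,
                  "Industrials": 0.03, "Financials": 0.02, "Utilities": -0.01}

diff = weight_difference(relative_cape_top4, cape_top4)
impact = long_short_return(diff, sector_returns)
# Sectors held by both portfolios net to zero and do not affect `impact`.
```

Repeating this each month and compounding the results would give a return stream of the kind discussed above.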

The Miraculous Mojo of Momentum

This is where the 12-month momentum filter plays a crucial role. Narratively, it is to avoid value traps. Practically, it helps the strategy deftly dodge Financials in 2008, avoiding a significant melt-down in one of the S&P 500’s largest sectors.

Now, you might think that valuations alone should have allowed the strategy to avoid Technology in the dot-com fallout. As it turns out, the Technology CAPE fell so precipitously that in using the Relative CAPE metric the Technology sector was still ranked as one of the top five cheapest sectors from 3/2001 to 11/2002. The only way the strategy was able to avoid it? The momentum filter.

Removing this filter makes the relative results a lot less attractive. Below we re-plot the relative performance of a simple “top 4” Relative CAPE strategy.

Source: Siblis Research; Morningstar; CSI Data. Calculations by Newfound Research. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.

Just how much impact does the momentum filter have? We can isolate the effect by taking the weights of the Shiller strategy and subtracting the weights of the Value strategy to construct a long/short index that isolates the effect. Below we plot the returns of this index.

It should be noted that the legs of the long/short portfolio only have a notional exposure of 25%, as that is the most the Value and Shiller strategies can deviate by. Nevertheless, even with this relatively small weight, when isolated the filter generates an annualized return of 1.8% per year with an annualized volatility of 4.8% and a maximum drawdown of 11.6%.

Scaled to a long/short with 100% notional per leg, annualized returns jump to 6.0%, though volatility and maximum drawdown climb to 20.4% and 52.6%, respectively.

Source: Siblis Research; Morningstar; CSI Data. Calculations by Newfound Research. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.
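As a small illustration of the notional point, the sketch below differences a hypothetical Shiller portfolio against a hypothetical four-cheapest Value portfolio and then rescales the result to 100% per leg. The holdings and function names are assumptions made for the example.

```python
def weight_difference(weights_a, weights_b):
    """Per-sector weight of portfolio A minus portfolio B."""
    sectors = set(weights_a) | set(weights_b)
    return {s: weights_a.get(s, 0.0) - weights_b.get(s, 0.0) for s in sectors}


def leg_notional(diff_weights):
    """Size of the long leg (equal to the short leg when weights sum to zero)."""
    return sum(w for w in diff_weights.values() if w > 0)


def scale_to_full_notional(diff_weights):
    """Rescale so that each leg carries 100% notional exposure."""
    notional = leg_notional(diff_weights)
    if notional == 0:
        return dict(diff_weights)  # the two portfolios are identical
    return {s: w / notional for s, w in diff_weights.items()}


# Hypothetical holdings: the momentum filter swaps Energy for Staples.
shiller = {"Tech": 0.25, "Financials": 0.25, "Health Care": 0.25, "Staples": 0.25}
value = {"Tech": 0.25, "Energy": 0.25, "Financials": 0.25, "Health Care": 0.25}

filter_book = weight_difference(shiller, value)
scaled_book = scale_to_full_notional(filter_book)
# filter_book is long 25% Staples and short 25% Energy; scaled_book is 100% each.
```

Because the two portfolios can differ by at most one sector, the unscaled legs are either 25% or zero in any given month.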

Conclusion

Few, if any, systematic value strategies have performed well as of late. When one does – as with the Shiller CAPE sector rotation strategy – it is worth further review.

As a brief summary of our findings:

Despite potential structural flaws in measuring cross-sectional sector value, CAPE outperformed Relative CAPE for a naïve rank-based value strategy.
There is significant dispersion in results using the Relative CAPE metric depending upon which lookback parameterization is selected. Initial tests suggest that the longer lookbacks appear to have been more effective.
Valuation metrics other than CAPE – e.g. P/B, P/E (TTM), and EV/EBITDA (TTM) – do not appear as effective in recent years.
Longer lookbacks allow the Relative CAPE methodology to create a structural overweight to the Technology sector over the last 15 years.
The momentum filter plays a crucial role in avoiding the Technology sector in 2001-2002 and the Financial sector in 2008.

Taken all together, it is hard to not question whether these results are unintentionally datamined. Unfortunately, we just do not have enough data to extend the tests further back in time for truly out-of-sample analysis.

What we can say, however, is that the backtested and live performance hinges almost entirely on a few key trades:

Avoiding Technology in 2001-2002 due to the momentum filter.
Avoiding Financials in 2008 due to the momentum filter.
Avoiding a Technology underweight in recent years due to an inflated “average” historical CAPE due to the dot-com bubble.
Avoiding Energy in 2014-2016 due to the momentum filter.

Three of these four trades are driven by the momentum filter. When we further consider that the Shiller strategy is in effect the combination of the returns of the pure value implementation – which suffered in the dot-com run-up and was a mostly random walk thereafter – and the returns of the isolated momentum filter, it becomes rather difficult to call this a value strategy at all.

As of the date of this document, neither Newfound Research nor Corey Hoffstein holds a position in the securities discussed in this article and do not have any plans to trade in such securities. Newfound Research and Corey Hoffstein do not take a position as to whether this security should be recommended for any particular investor.

Using PMI to Trade Cyclicals vs Defensives

By Corey Hoffstein

On August 19, 2019

In Risk & Style Premia, Weekly Commentary

This blog post is available as a PDF download here.

Summary

After stumbling across a set of old research notes from 2009 and 2012, we attempt to implement a Cyclicals versus Defensives sector trade out-of-sample.
Post-2012 returns prove unconvincing and we find little evidence supporting the notion that PMI changes can be used for constructing this trade.
Using data from the Kenneth French website, we extend the study to 1948, and similarly find that changes in PMI (regardless of lookback period) are not an effective signal for trading Cyclical versus Defensive sectors.

I love coming across old research because it allows for truly out-of-sample testing.

Earlier this week, I stumbled across a research note from 2009 and a follow-up note from 2012, both exploring the use of macro-based signals for constructing dollar-neutral long/short sector trades. Specifically, the pieces focused on using manufacturing Purchasing Manager Indices (PMIs) as a predictor for Cyclical versus Defensive sectors.¹

The strategy outlined is simple: when the prior month change in manufacturing PMI is positive, the strategy is long Cyclicals and short Defensives; when the change is negative, the strategy is long Defensives and short Cyclicals. The intuition behind this signal is that PMIs provide a guide to hard economic activity.
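A minimal sketch of this rule is shown below. It assumes monthly data, lags the signal by one month so that each return uses only previously known PMI readings, and stays flat when the change is exactly zero (a case the original notes do not address). The function names and all numbers are hypothetical.

```python
def pmi_positions(pmi, lookback=1):
    """Signal per month: +1 = long Cyclicals / short Defensives, -1 = the reverse.

    positions[t] uses the PMI change over the last `lookback` months
    ending at month t; it is 0 while there is not enough history.
    """
    positions = []
    for t in range(len(pmi)):
        if t < lookback:
            positions.append(0)
            continue
        change = pmi[t] - pmi[t - lookback]
        positions.append(1 if change > 0 else -1 if change < 0 else 0)
    return positions


def strategy_returns(pmi, cyclicals, defensives, lookback=1):
    """Dollar-neutral returns, applying last month's signal to this month's spread."""
    positions = pmi_positions(pmi, lookback)
    return [positions[t - 1] * (cyclicals[t] - defensives[t])
            for t in range(1, len(pmi))]


# Hypothetical data, for illustration only.
pmi = [50.0, 52.0, 51.0, 51.0, 53.0]
cyclicals = [0.00, 0.02, 0.01, -0.03, 0.04]
defensives = [0.00, 0.01, 0.02, 0.01, 0.01]

signal = pmi_positions(pmi)                       # [0, 1, -1, 0, 1]
returns = strategy_returns(pmi, cyclicals, defensives)
```

Changing the lookback argument is all that is needed to test other specifications, which is what makes this kind of parameter so easy to over-fit.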

The sample period for the initial test is from 1998 to 2009, a period over which the strategy performed quite well on a global basis and even better when using the more forward-looking ratio of new orders to inventory.

Red flags start to go up, however, when we read the second note from 2012. “It appears that the new orders-to-inventory ratio has lost its ability to forecast the output index.” “In addition, the optimal lookback period … has shifted from one to two months.”

At this point, we can believe one of a few things:

The initial strategy works, has simply hit a rough patch in the three years after publishing, and will work again in the future.
The initial strategy worked but has broken since publishing.
The initial strategy never worked and was an artifact of datamining.

I won’t even bother addressing the whole “one-month versus two-month” comment. Long-time readers know where we come down on ensembles versus parameter specification…

Fortunately, we do not have to pass qualitative judgement: we can let the numbers speak for themselves.

While the initial notes focused on global implementation, we can rebuild the strategy using U.S. equity sectors and U.S. manufacturing PMI as the driving signal. This will serve both as an out-of-sample test for assets, as well as provide approximately 7 more years of out-of-sample time to evaluate returns.

Below we plot the results of this strategy for both 1-month and 2-month lookback periods, highlighting the in-sample and out-of-sample periods for each specification based upon the date the original research notes were published. We use the State Street SPDR Sector Select ETFs as our implementation vehicles, with the exception of the iShares Dow Jones US Telecom ETF.

Source: CSI Data; Quandl. Calculations by Newfound Research. Results are hypothetical. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.

The first thing we notice is that the original 1-month implementation – which appeared to work on a global scale – does not seem particularly robust when implemented with U.S. sectors. Post-publication results do not fare much better.

The 2-month specification, however, does appear to work reasonably well both in- and out-of-sample.

But is there something inherently magical about that two-month specification? We are hard-pressed to find a narrative explanation.

If we plot lookback specifications from 3- to 12-months, we see that the 2-month specification proves to be a significant outlier. Given the high correlation between all the other specifications, it is more likely that the 2-month lookback was the beneficiary of luck rather than capturing a particular edge.

Source: CSI Data; Quandl. Calculations by Newfound Research. Results are hypothetical. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.

Perhaps we’re not giving this idea enough breathing room. After all, were we to evaluate most value strategies in the most recent decades, we’d likely declare them insignificant as well.

With manufacturing PMI data extending back to 1948, we can use sector index data from the Kenneth French website to reconstruct this strategy.

Unfortunately, the Kenneth French definitions do not match GICS perfectly, so we have to change the definition of Cyclicals and Defensives slightly. Using the Kenneth French data, we will define Cyclicals to be Manufacturing, Non-Durables, Technology, and Shops. Defensives are defined to be Durables, Telecom, Health Care, and Utilities.

We use the same strategy as before, going long Cyclicals and short Defensives when changes in PMI are positive, and short Cyclicals and long Defensives when changes to PMI are negative. We again vary the lookback period from 1- to 12-months.

Source: Kenneth French Data Library; Quandl. Calculations by Newfound Research. Results are hypothetical. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.

The results are less than convincing. Not only do we see significant dispersion across implementations, but there is also no consistency in those implementations that do well versus those that do not.

Perhaps worse, the best performing variation only returned a paltry 1.40% annualized gross of any implementation costs. Once we start accounting for transaction costs, slippage, and management fees, this figure deflates towards zero rather quickly.

Source: Kenneth French Data Library; Quandl. Calculations by Newfound Research. Results are hypothetical. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.

Conclusion

There is no shortage of quantitative research in the market and the research can be particularly compelling when it seems to fit a pre-existing narrative.

Cyclicals versus Defensives are a perfect example. Their very names imply the regimes during which they are supposed to add value, but actually translating this notion into a robust strategy proves to be less than easy.

I would make the philosophical argument that it quite simply cannot be easy. Consider the two pieces of information we need to believe for this strategy to work:

Cyclicals outperform Defensives in an economic expansion and Defensives outperform Cyclicals in an economic contraction.
We can forecast economic expansions and contractions before they are priced into the market.

If we have very high confidence in both statements, it effectively implies an arbitrage.

Therefore, if we have very high confidence in the truth of the first statement, then for markets to be reasonably efficient, we must have little confidence in the second statement.

Similarly, if we have high confidence in the truth of the second statement, then for markets to be reasonably efficient, we must have little confidence in the first statement.

Thus, a more reasonable expectation might be that Cyclicals tend to outperform Defensives during an expansion, and Defensives tend to outperform Cyclicals in a contraction, but there may be meaningful exceptions depending upon the particular cycle.

Furthermore, we may believe we have an edge in forecasting expansions and contractions (perhaps not with just PMI, though), but there will be many false positives and false negatives along the way.

Taken together, we might believe we can construct such a strategy, but errors in both assumptions will lead to periods of frustration. However, we should recognize that for such an “open secret” strategy to work in the long run, there have to be troughs of sorrow deep enough to avoid permanent crowding.

In this case, we believe there is little evidence to suggest that level changes in PMI provide particular insight into Cyclicals versus Defensives, but that does not mean there are no macro signals that might.

Your Style-age May Vary

By Corey Hoffstein

On August 12, 2019

In Portfolio Construction, Risk & Style Premia, Weekly Commentary

This post is available as a PDF download here.

Summary

New research from Axioma suggests that tilting less – through lower target tracking error – can actually create more academically pure factor implementation in long-only portfolios.
This research highlights an important question: how should long-only investors think about factor exposure in their portfolios? Is measuring against an academically-constructed long/short portfolio really appropriate?
We return to the question of style versus specification, plotting year-to-date excess returns for long-only factor ETFs. While the general style serves as an anchor, we find significant specification-driven performance dispersion.
We believe that the “right answer” to this dispersion problem largely depends upon the investor.

When quants speak about factor and style returns, we often do so with some sweeping generalizations. Typically, we’re talking about some long/short specification, but precisely how that portfolio is formed can vary.

For example, one firm might look at deciles while another looks at quartiles. One shop might equal-weight the holdings while another value-weights them. Some might include mid- and small-caps, while others may work on a more realistic liquidity-screened universe.

More often than not, the precision does not matter a great deal (with the exception of liquidity-screening) because the general conclusion is the same.

But for investors who are actually realizing these returns, the precision matters quite a bit. This is particularly true for long-only investors, who have adopted smart-beta ETFs to tap into the factor research.

As we have discussed in the past, any active portfolio can be decomposed into its benchmark plus a dollar-neutral long/short portfolio that encapsulates the active bets. The active bets, then, can actually approach the true long/short implementation.

To a point, at least. The “shorts” will ultimately be constrained by the amount the portfolio can under-weight a given security.
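The decomposition can be illustrated with a toy three-stock example. The tickers and weights are invented, and the helper names are hypothetical; active share is computed using the common definition of half the sum of absolute active weights.

```python
def active_weights(portfolio, benchmark):
    """Dollar-neutral long/short book: portfolio weight minus benchmark weight."""
    names = set(portfolio) | set(benchmark)
    return {n: portfolio.get(n, 0.0) - benchmark.get(n, 0.0) for n in names}


def active_share(active):
    """Half the sum of absolute active weights."""
    return 0.5 * sum(abs(w) for w in active.values())


# Hypothetical benchmark and long-only portfolio.
benchmark = {"A": 0.5, "B": 0.3, "C": 0.2}
portfolio = {"A": 0.7, "B": 0.3}            # stock C is not held at all

active = active_weights(portfolio, benchmark)
share = active_share(active)
# active is roughly {"A": +0.2, "B": 0.0, "C": -0.2} and sums to zero.
# The "short" in C cannot exceed its 20% benchmark weight, because a
# long-only portfolio cannot hold less than zero of a stock.
```

The constraint is visible directly in the numbers: the largest possible under-weight in any name is its benchmark weight, so small benchmark constituents can barely be "shorted" at all.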

For long-only portfolios, increasing active share often means having to lean more heavily into the highest quintile or decile holdings. This is not a problem in an idealized world where factor scores have a monotonically increasing relationship with excess returns. In this perfect world, increasing our allocation to high-ranking stocks creates just as much excess return as shorting low-ranking stocks does.

Unfortunately, we do not live in a perfect world and for some factors the premium found in long/short portfolios is mostly found on the short side.¹ For example, consider the Profitability Factor. The annualized spread between the top- and bottom-quintile portfolios is 410 basis points. The difference between the top quintile portfolio and the market, though, is just 154 basis points. Nothing to scoff at, but when appropriately discounted for data-mining risk, transaction costs, and management costs, there is not necessarily a whole lot left over.

Which leads to some interesting results for portfolio construction, at least according to a recent study by Axioma.² For factors where the majority of the premium arises from the short side, tilting less might mean achieving more.

For example, Axioma found that the excess returns of a portfolio optimized to maximize exposure to the profitability factor while targeting a tracking error to the market of just 10 basis points had a meaningfully higher correlation to the long/short factor returns than the excess returns of a long-only portfolio that simply bought the top quintile. In fact, the excess returns of the top quintile portfolio had zero correlation to the long/short factor returns. Let’s repeat that: the active returns of the top quintile portfolio had zero correlation to the returns of the profitability factor. Makes us sort of wonder what we’re actually buying…

Source: Kenneth French Data Library; Calculations by Newfound Research.

Cumulative Active Returns of Long-Only Portfolios

So, what does it actually mean for long-only investors when we plot long/short equity factor returns? When we see that the Betting-Against-Beta (“BAB”) factor is up 3% on the year, what does that imply for our low-volatility factor ETF? Momentum (“UMD”) was down nearly 10% earlier this year; were long-only momentum ETFs really under-performing by that much?

And what does this all mean for the results in those fancy factor decomposition reports the nice consultants from the big asset management firms have been running for me over the last couple of years?

Source: AQR. Calculations by Newfound Research.

We find ourselves back to a theme we’ve circled many times over the last few years: style versus specification. Choices such as how characteristics are measured, portfolio concentration, the existence or absence of position- and industry/sector-level constraints, weighting methodology, and rebalance frequency (and even date!) can have a profound impact on realized results. The little details compound to matter quite a bit.

To highlight this disparity, below we have plotted the excess return of an equally-weighted portfolio of long-only style ETFs versus the S&P 500 as well as a standard deviation cone for individual style ETF performance.

While most of the ETFs are ultimately anchored to their style, we can see that short-term performance can meaningfully deviate.

Source: CSI Analytics. Calculations by Newfound Research. Results are hypothetical. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes, with the exception of underlying ETF expense ratios. Past performance is not an indicator of future results. Year-to-Date returns are computed by assuming an equal-weight allocation to representative long-only ETFs for each style. Returns are net of underlying ETF expense ratios. Returns are calculated in excess of the SPDR S&P 500 ETF (“SPY”). The ETFs used for each style are (in alphabetical order): Value: FVAL, IWD, JVAL, OVLU, QVAL, RPV, VLU, VLUE; Size: IJR, IWM, OSIZ; Momentum: FDMO, JMOM, MMTM, MTUM, OMOM, QMOM, SPMO; Low Volatility: FDLO, JMIN, LGLV, OVOL, SPLV, SPMV, USLB, USMV; Quality: FQAL, JQUA, OQAL, QUAL, SPHQ; Yield: DVY, FDVV, JDIV, OYLD, SYLD, VYM; Growth: CACG, IWF, QGRO, RPG, SCHG, SPGP, SPYG; Trend: BEMO, FVC, LFEQ, PTLC. Newfound may hold positions in any of the above securities.
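For readers who want to reproduce this kind of chart, the calculation for a single date might look like the sketch below. It assumes the equal-weight portfolio is formed at the start of the year and not rebalanced (so its year-to-date return is the simple average of the constituents' year-to-date returns) and that the cone is one population standard deviation either side of that average. Both are assumptions, and the ETF labels and returns are invented.

```python
from statistics import mean, pstdev


def style_excess_cone(etf_ytd_returns, benchmark_ytd_return):
    """Return (average excess return, lower band, upper band) for one style.

    etf_ytd_returns: dict of ETF label -> year-to-date return.
    benchmark_ytd_return: year-to-date return of the benchmark.
    """
    excess = [r - benchmark_ytd_return for r in etf_ytd_returns.values()]
    center = mean(excess)        # equal-weight portfolio's excess return
    spread = pstdev(excess)      # dispersion across individual ETFs
    return center, center - spread, center + spread


# Hypothetical year-to-date returns for three ETFs in one style.
etf_ytd_returns = {"ETF A": 0.10, "ETF B": 0.14, "ETF C": 0.12}
center, lower, upper = style_excess_cone(etf_ytd_returns, benchmark_ytd_return=0.11)
```

Running the same function for each date in the year traces out the center line and the cone around it.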

Conclusion

In our opinion, the research and data outlined in this commentary suggest a few potential courses of action for investors.

For certain styles, we might consider embracing smaller tilts for purer factor exposure.
To avoid specification risk, we might embrace the potential benefits of multi-manager diversification.
Or, if there is a particular approach we prefer, simply acknowledge that it may not behave anything like the academic long/short definition – or even other long-only implementations – in the short-term.

Academically, we might be able to argue for one approach over another. Practically, the appropriate solution is whatever is most suitable for the investor and the approach that they will be able to stick with.

If a client measures their active returns with respect to academic factors, then understanding how portfolio construction choices deviate from the factor definitions will be critical.

An advisor trying to access a style but not wanting to risk choosing the wrong ETF might consider asking themselves, “why choose?” Buying a basket of a few ETFs will do wonders to reduce specification risk.

On the other hand, if an investor is simply trying to maximize their compound annualized return and nothing else, then a concentrated approach may very well be warranted.

Whatever the approach taken, it is important to remember that results between two strategies that claim to implement the same style can and will deviate significantly, especially in the short run.

Harvesting the Bond Risk Premium

By Nathan Faber

On August 5, 2019

In Craftsmanship, Portfolio Construction, Risk & Style Premia, Risk Management, Term, Weekly Commentary

This post is available as a PDF download here.

Summary

The bond risk premium is the return that investors earn by investing in longer duration bonds.
While the most common way that investors can access this return stream is through investing in bond portfolios, bonds often significantly de-risk portfolios and scale back returns.
Investors who desire more equity-like risk can tap into the bond risk premium by overlaying bond exposure on top of equities.
Through the use of a leveraged ETP strategy, we construct a long-only bond risk premium factor and investigate its characteristics in terms of rebalance frequency and timing luck.
By balancing the costs of trading with the risk of equity overexposure, investors can incorporate the bond risk premium as a complementary factor exposure to equities without sacrificing return potential from scaling back the overall risk level unnecessarily.

The discussion surrounding factor investing generally pertains to either equity portfolios or bond portfolios in isolation. We can calculate value, momentum, carry, and quality factors for each asset class and invest in the securities that exhibit the best characteristics of each factor or a combination of factors.

There are also ways to use these factors to shift allocations between stocks and bonds (e.g. trend and standardizing based on historical levels). However, we do not typically discuss bonds as their own standalone factor.

The bond risk premium – or term premium – can be thought of as the premium investors earn from holding longer duration bonds as opposed to cash. In a sense, it is a measure of carry. Its theoretical basis is generally seen to be related to macroeconomic factors such as inflation and growth expectations.¹

While timing the term premium using factors within bond duration buckets is definitely a possibility, this commentary will focus on the term premium in the context of an equity investor who wants long-term exposure to the factor.

The Term Premium as a Factor

For the term premium, we can take the usual approach and construct a self-financing long/short portfolio of 100% intermediate (7-10 year) U.S. Treasuries that borrows the entire portfolio value at the risk-free rate.
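As a minimal sketch of this construction, assume we have aligned series of periodic bond and risk-free returns. The function names and the sample figures below are hypothetical and are not the data behind the chart. The factor return each period is simply the bond return less the financing cost:

```python
def term_premium_returns(bond_returns, cash_returns):
    """Per-period return of a self-financing long-bond / short-cash portfolio."""
    if len(bond_returns) != len(cash_returns):
        raise ValueError("return series must be the same length")
    return [b - c for b, c in zip(bond_returns, cash_returns)]


def compound(returns):
    """Total compounded return of a series of periodic returns."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


# Hypothetical monthly returns, for illustration only.
bond = [0.010, -0.004, 0.007]
cash = [0.002, 0.002, 0.002]
factor = term_premium_returns(bond, cash)
print([round(r, 4) for r in factor], round(compound(factor), 4))
```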

This factor, shown in bold in the chart below, has exhibited a much tamer return profile than common equity factors.

Source: CSI Analytics, AQR, and Bloomberg. Calculations by Newfound Research. Data from 1/31/1992 to 6/28/2019. Results are hypothetical. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.

But over the entire time period, its returns have been higher than those of both the Size and Value factors. Its maximum drawdown has been less than 40% of that of the next best factor (Quality), and it is worth acknowledging that its volatility – which is generally correlated to drawdown for highly liquid assets with non-linear payoffs – has also been substantially lower.

The term premium also has exhibited very low correlation with the other equity factors.

Source: CSI Analytics, AQR, and Bloomberg. Calculations by Newfound Research. Data from 1/31/1992 to 6/28/2019. Results are hypothetical. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.

A Little Free Lunch

Whether we are treating bonds as a factor or not, they are generally the primary way investors seek to diversify equity portfolios.

The problem is that they are also a great way to reduce returns during most market environments through their inherently lower risk.

Anytime that an asset with lower volatility is added to a portfolio, the risk will be reduced. Unless the asset class also has a particularly high Sharpe ratio, maintaining the same level of return is virtually impossible even if risk-adjusted returns are improved.

In a 2016 paper², Salient broke down this reduction in risk into two components: de-risking and the “free lunch” effect.

The reduction in risk from the free lunch effect is desirable, but the risk reduction from de-risking may or may not be desirable, depending on the investor’s target risk profile.

The following chart shows the volatility breakdown of a range of portfolios of the S&P 500 (IVV) and 7-10 Year U.S. Treasuries (IEF).

Source: CSI Analytics and Bloomberg. Calculations by Newfound Research. Data from 1/31/1992 to 6/28/2019. Results are hypothetical. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.

Moving from an all-equity portfolio to a 50/50 equity/bond portfolio reduces the volatility from 14.2% to 7.4%. But only 150 bps of this reduction is from the free lunch effect that stems from the lower correlation between the two assets (-0.18). The remaining 530 bps of volatility reduction is simply due to lower risk.

In this case, annualized returns were dampened from 9.6% to 7.8%. While the Sharpe ratio climbed from 0.49 to 0.70, an investor seeking higher risk would not benefit without the use of leverage.
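To see why leverage is the missing piece, a rough sketch: back out the risk-free rate implied by the figures above and scale the 50/50 portfolio up to the all-equity volatility. This assumes borrowing at that implied rate with no frictions and a Sharpe ratio unchanged by leverage, neither of which holds exactly in practice. The function names are hypothetical.

```python
def implied_risk_free(ret, vol, sharpe):
    """Risk-free rate implied by Sharpe = (ret - rf) / vol."""
    return ret - sharpe * vol


def levered_return(ret, vol, rf, target_vol):
    """Return and leverage needed to scale a portfolio to target_vol,
    assuming financing at rf and no other costs (a simplification)."""
    leverage = target_vol / vol
    return rf + leverage * (ret - rf), leverage


rf = implied_risk_free(0.078, 0.074, 0.70)                 # 50/50 portfolio figures
lev_ret, lev = levered_return(0.078, 0.074, rf, 0.142)     # scale to equity volatility
print(f"implied rf: {rf:.2%}, leverage: {lev:.2f}x, levered return: {lev_ret:.2%}")
```

Under these simplifying assumptions, the levered 50/50 portfolio would have out-earned the all-equity portfolio at the same volatility, which is the motivation for the approach that follows.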

Despite the strong performance of the term premium factor, risk-seeking investors (e.g. those early in their careers) are generally reluctant to tap into this factor too much because of the de-risking effect.

How do investors who want to bear risk commensurate with equities tap into the bond risk premium without de-risking their portfolio?

One solution is using leveraged ETPs.

Long-Only Term Premium

By taking a 50/50 portfolio of the 2x Levered S&P 500 ETF (SSO) and the 2x Levered 7-10 Year U.S. Treasury ETF (UST), we can construct a portfolio that has 100% equity exposure and 100% of the term premium factor.³

But managing this portfolio takes some care.

Left alone to drift, the allocations can get very far away from their target 50/50, spanning the range from 85/15 to 25/75. Periodic rebalancing is a must.
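The drift is easy to illustrate. The sketch below applies one hypothetical period of returns to a 50/50 SSO/UST mix and reports the resulting weights and notional exposures. It treats each ETP as a simple 2x position and ignores the daily reset of leveraged products, so it is an approximation only.

```python
def drifted_weights(weights, asset_returns):
    """Portfolio weights after one period of returns with no rebalancing."""
    grown = [w * (1.0 + r) for w, r in zip(weights, asset_returns)]
    total = sum(grown)
    return [g / total for g in grown]


def notional_exposure(weights, leverage=2.0):
    """Notional exposure implied by holding 2x levered products."""
    return [w * leverage for w in weights]


start = [0.5, 0.5]                       # SSO, UST
print(notional_exposure(start))          # 100% equity, 100% Treasuries

# Hypothetical period: levered equity +40%, levered Treasuries -10%.
after = drifted_weights(start, [0.40, -0.10])
print([round(w, 3) for w in after], [round(e, 3) for e in notional_exposure(after)])
```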

Source: CSI Analytics and Bloomberg. Calculations by Newfound Research. Data from 1/31/1992 to 6/28/2019. Results are hypothetical. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.

Of course, now the question is, “How frequently should we rebalance the portfolio?”

This boils down to a balancing act between performance and costs (e.g. ticket charges, tax impacts, operational burden, etc.).

On one hand, we would like to remain as close to the 50/50 allocation as possible to maintain the desired exposure to each asset class. However, this could require a prohibitive amount of trading.

From a performance standpoint, we see improved results with longer holding periods (take note of the y-axes in the following charts; they were scaled to highlight the differences).

Source: CSI Analytics and Bloomberg. Calculations by Newfound Research. Data from 1/31/1992 to 6/28/2019. Results are hypothetical. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.

The returns do not show a definitive pattern based on rebalance frequency, but the volatility decreases with increasing time between rebalances. This seems like it would point to waiting longer between rebalances, which would be corroborated by a consideration of trading costs.

The issues with waiting longer between the rebalance are twofold:

Waiting longer is essentially a momentum trade. The better performing asset class garners a larger allocation as time progresses. This can be a good thing – especially in hindsight with how well equities have done – but it allows the portfolio to become overexposed to factors that we are not necessarily intending to exploit.
Longer rebalances are more exposed to timing luck. For example, a yearly rebalance may have done well from a performance perspective, but the short-term performance could vary by as much as 50,000 bps between the best performing rebalance month and the worst! The chart below shows the performance of each iteration relative to the median performance of the 12 different monthly rebalance strategies.

Source: CSI Analytics and Bloomberg. Calculations by Newfound Research. Data from 1/31/1992 to 6/28/2019. Results are hypothetical. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.

As the chart also shows, tranching can help mitigate timing luck. Tranching also gives the returns of the strategies over the range of rebalance frequencies a more discernible pattern, with longer rebalance period strategies exhibiting slightly higher returns due to their higher average equity allocations.
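The offset comparison and the tranche described above can be sketched as follows. The return series are hypothetical, deterministic patterns rather than market data, and the tranche is modeled simply as capital split equally across the twelve offset sleeves at inception with no rebalancing between sleeves.

```python
def fixed_mix_growth(returns_a, returns_b, rebalance_every, offset, target=0.5):
    """Growth of $1 in a two-asset mix reset to `target` at the start of every
    period t where t % rebalance_every == offset; it drifts otherwise."""
    value_a, value_b = target, 1.0 - target
    for t, (ra, rb) in enumerate(zip(returns_a, returns_b)):
        if t % rebalance_every == offset:
            total = value_a + value_b
            value_a, value_b = target * total, (1.0 - target) * total
        value_a *= 1.0 + ra
        value_b *= 1.0 + rb
    return value_a + value_b


# Hypothetical monthly returns (illustration only).
equity = [-0.06 if m % 5 == 0 else 0.03 for m in range(60)]
bonds = [0.01 if m % 3 == 0 else 0.002 for m in range(60)]

# Twelve annual-rebalance variations, one per rebalance month.
sleeves = [fixed_mix_growth(equity, bonds, 12, k) for k in range(12)]
spread = max(sleeves) - min(sleeves)

# Tranche: equal capital in each of the twelve sleeves.
tranched = sum(sleeves) / len(sleeves)
print(f"spread across offsets: {spread:.4f}, tranched growth: {tranched:.4f}")
```

By construction the tranched result sits inside the range of the individual sleeves, which is the sense in which it mitigates the risk of picking an unlucky rebalance month.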

Under the assumption that we can tranche any strategy that we choose, we can now compare only the tranched strategies at different rebalance frequencies to address our concern with taking bets on momentum.

Pausing for a minute, we should be clear that we do not actually know what the true factor construction should be; it is a moving target. We are more concerned with robustness than simply trying to achieve outperformance. So we will compare the strategies to the median performance of the previous monthly-offset annual rebalance strategies.

The following charts show the aggregate risk of short-term performance deviations from this benchmark.

The first one shows the aggregate deviations, both positive and negative, and the second focuses on only the downside deviation (i.e. performance that is worse than the median).⁴
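One plausible way to compute such measures is sketched below. The exact definitions behind the charts are not spelled out here, so the root-mean-square form, the function names, and the sample numbers are all assumptions for illustration.

```python
import math
from statistics import median


def median_benchmark(variations):
    """Per-period median across a list of equally long return series."""
    return [median(period) for period in zip(*variations)]


def aggregate_deviation(strategy, benchmark):
    """Root-mean-square of all deviations from the benchmark."""
    diffs = [s - b for s, b in zip(strategy, benchmark)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def downside_deviation(strategy, benchmark):
    """Root-mean-square of only the negative deviations from the benchmark."""
    diffs = [min(s - b, 0.0) for s, b in zip(strategy, benchmark)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical short-term returns for three offset variations over two periods.
variations = [[0.01, 0.02], [0.03, -0.01], [0.02, 0.00]]
benchmark = median_benchmark(variations)
strategy = [0.03, -0.02]
agg = aggregate_deviation(strategy, benchmark)
down = downside_deviation(strategy, benchmark)
print(round(agg, 4), round(down, 4))
```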

Both charts support a choice of rebalance frequency somewhere in the range of 3-6 months.

Source: CSI Analytics and Bloomberg. Calculations by Newfound Research. Data from 1/31/1992 to 6/28/2019. Results are hypothetical. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.

With the rebalance frequency set based on the construction of the factor, the last part is a consideration of costs.

Unfortunately, this is more situation-specific (e.g. what commissions does your platform charge for trades?).

From an asset manager point-of-view, where we can trade with costs proportional to the size of the trade, execute efficiently, and automate much of the operational burden, tranching is our preferred approach.

We also prefer this approach over simply rebalancing back to the static 50/50 allocation more frequently.

In our previous commentary on constructing value portfolios to mitigate timing luck, we described how tranching frequency and rebalance frequency are distinct decisions: tranching monthly is not the same as rebalancing monthly.

We see the same effect here where we plot the monthly tranched annually rebalanced strategy (blue line) and the strategy rebalanced back to 50/50 every month (orange line).

Source: CSI Analytics and Bloomberg. Calculations by Newfound Research. Data from 1/31/1992 to 6/28/2019. Results are hypothetical. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.

Tranching wins out.

However, since the target for the term premium factor is a 50/50 static allocation, running a simple allocation filter to keep the portfolio weights within a certain tolerance can be a way to implement a more dynamic rebalancing model while reducing costs.

For example, rebalancing when the allocations for SSO and UST were outside a 5% band (i.e. the portfolio was beyond a 55/45 or 45/55) achieved better performance metrics than the monthly rebalanced version with an average of only 3 rebalances per year.
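A tolerance-band rule of this kind can be sketched as follows. The function name is hypothetical, the check is made once per period after returns are applied, and trading costs are ignored; actual implementations may check and trade differently.

```python
def band_rebalanced_growth(returns_a, returns_b, target=0.5, band=0.05):
    """Growth of $1 in a two-asset mix that is reset to `target` only when the
    first asset's weight drifts more than `band` away from it.
    Returns (final value, number of rebalances)."""
    value_a, value_b = target, 1.0 - target
    rebalances = 0
    for ra, rb in zip(returns_a, returns_b):
        value_a *= 1.0 + ra
        value_b *= 1.0 + rb
        total = value_a + value_b
        if abs(value_a / total - target) > band:
            value_a, value_b = target * total, (1.0 - target) * total
            rebalances += 1
    return value_a + value_b, rebalances


# Hypothetical single-period examples: a large move triggers a rebalance,
# a small one does not.
big_move = band_rebalanced_growth([0.50], [0.0])
small_move = band_rebalanced_growth([0.01], [0.0])
print(big_move, small_move)
```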

Conclusion

The bond term premium does not have to be reserved for risk-averse investors. Investors desiring portfolios tilted heavily toward equities can also tap into this diversifying return stream as a factor within their portfolio.

Utilizing leveraged ETPs is one way to maintain exposure to equities while capturing a significant portion of the bond risk premium. However, it requires more oversight than investing in other factors such as value, momentum, and quality, which are typically packaged in easy-to-access ETFs.

If a fixed frequency rebalance approach is used, tranching is an effective way to reduce timing risk, especially when markets are volatile. Aside from tranching, we find that, historically, holding periods between 3 and 6 months yield results closely in line with the median rolling short-term performance of the individual strategies. Implementing a methodology like this can reduce the risk of poor luck in choosing the rebalance frequency or starting the strategy at an unfortunate time.

If frequent rebalances – like those seen with tranching – are infeasible, a dynamic schedule based on a drift in allocations is also a possibility.

Leveraged ETPs are often seen as risky trading instruments that are not fit for retail investors who are more focused on buy-and-hold systems. However, given the right risk management, these investment vehicles can be a way for investors to access the bond term premium, getting a larger free lunch and avoiding undesired de-risking along the way.

Timing Luck and Systematic Value

By Corey Hoffstein

On July 29, 2019

In Craftsmanship, Risk & Style Premia, Value, Weekly Commentary

This post is available as a PDF download here.

Summary

We have shown many times that timing luck – when a portfolio chooses to rebalance – can have a large impact on the performance of tactical strategies.
However, fundamental strategies like value portfolios are susceptible to timing luck, as well.
Once the rebalance frequency of a strategy is set, we can mitigate the risk of choosing a poor rebalance date by diversifying across all potential variations.
In many cases, this mitigates the risk of realizing poor performance from an unfortunate choice of rebalance date while achieving a risk profile similar to the top tier of potential strategy variations.
By utilizing strategies that manage timing luck, investors can more accurately assess performance differences arising from luck and skill.

On August 7th, 2013 we wrote a short blog post titled The Luck of Rebalance Timing. That means we have been prattling on about the impact of timing luck for over six years now (with apologies to our compliance department…).

(For those still unfamiliar with the idea of timing luck, we will point you to a recent publication from Spring Valley Asset Management that provides a very approachable introduction to the topic.¹)

While most of our earliest studies related to the impact of timing luck in tactical strategies, over time we realized that timing luck could have a profound impact on just about any strategy that rebalances on a fixed frequency. We found that even a simple fixed-mix allocation of stocks and bonds could see annual performance spreads exceeding 700bp due only to the choice of when they rebalanced in a given year.

In seeking to generalize the concept, we derived a formula that would estimate how much timing luck a strategy might have. The details of the derivation can be found in our paper recently published in the Journal of Index Investing, but the basic formula is:

Here T is strategy turnover, F is how many times per year the strategy rebalances, and S is the volatility of a long/short portfolio capturing the difference between what the strategy is currently invested in versus what it could be invested in.

We’re biased, but we think the intuition here works out fairly nicely:

The higher a strategy’s turnover, the greater the impact of our choice of rebalance dates. For example, if we have a value strategy that has 50% turnover per year, an implementation that rebalances in January versus one that rebalances in July might end up holding very different securities. On the other hand, if the strategy has just 1% turnover per year, we don’t expect the differences in holdings to be very large and therefore timing luck impact would be minimal.
The more frequently we rebalance, the lower the timing luck. Again, this makes sense as more frequent rebalancing limits the potential difference in holdings of different implementation dates. Again, consider a value strategy with 50% turnover. If our portfolio rebalances every other month, there are two potential implementations: one that rebalances January, March, May, etc. and one that rebalances February, April, June, etc. We would expect the difference in portfolio holdings to be much more limited than in the case where we rebalance only annually.²
The last term, S, is most easily explained with an example. If we have a portfolio that can hold either the Russell 1000 or the S&P 500, we do not expect there to be a large amount of performance dispersion regardless of when we rebalance or how frequently we do so. The volatility of a portfolio that is long the Russell 1000 and short the S&P 500 is so small, it drives timing luck near zero. On the other hand, if a portfolio can hold the Russell 1000 or be short the S&P 500, differences in holdings due to different rebalance dates can lead to massive performance dispersion. Generally speaking, S is larger for more highly concentrated strategies with large performance dispersion in their investable universe.
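A small numerical sketch of these three sensitivities follows. It assumes the formula takes the form L = T / (2F) × S; that functional form, the function name, and the inputs are assumptions for illustration, chosen to be consistent with the three observations above.

```python
def timing_luck(turnover, rebalances_per_year, long_short_vol):
    """Estimated timing luck under the ASSUMED form L = T / (2 * F) * S."""
    return turnover / (2.0 * rebalances_per_year) * long_short_vol


# Hypothetical value strategy: 50% turnover, annual rebalance, 10% long/short vol.
base = timing_luck(0.50, 1, 0.10)
low_turnover = timing_luck(0.01, 1, 0.10)      # lower T -> less timing luck
more_frequent = timing_luck(0.50, 6, 0.10)     # higher F -> less timing luck
similar_assets = timing_luck(0.50, 1, 0.01)    # lower S -> less timing luck
print(base, low_turnover, more_frequent, similar_assets)
```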

Timing Luck in Smart Beta

To date, we have not meaningfully tested timing luck in the realm of systematic equity strategies.³ In this commentary, we aim to provide a concrete example of the potential impact.

A few weeks ago, we introduced our Systematic Value portfolio, which seeks to deliver concentrated exposure to the value style while avoiding unintended process and timing luck bets.

To achieve this, we implement an overlapping portfolio process. Each month we construct a concentrated deep value portfolio, selecting just 50 stocks from the S&P 500. However, because we believe the evidence suggests that value is a slow-moving signal, we aim for a holding period between 3-to-5 years. To achieve this, our capital is divided across the prior 60 months of portfolios.⁴

Which all means that we have monthly snapshots of deep value⁵ portfolios going back to November 2012, providing us data to construct all sorts of rebalance variations.

The Luck of Annual Rebalancing

Given our portfolio snapshots, we will create annually rebalanced portfolios. With monthly portfolios, there are twelve variations we can construct: a portfolio that reconstitutes each January; one that reconstitutes each February; a portfolio that reconstitutes each March; et cetera.
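Mechanically, each variation just holds the most recent snapshot taken in its own reconstitution month. A minimal sketch, assuming snapshots are indexed by consecutive month numbers starting at 0 (the indexing scheme and function name are hypothetical):

```python
def snapshot_held(month, offset, period=12):
    """Index of the monthly snapshot held in `month` by the variation that
    reconstitutes whenever month % period == offset.
    Returns None before that variation's first reconstitution."""
    last = month - ((month - offset) % period)
    return last if last >= 0 else None


# Which snapshot each of the twelve annual variations holds over 24 months.
variations = {k: [snapshot_held(m, k) for m in range(24)] for k in range(12)}
print(variations[1][:15])
```

In any given month the twelve variations generally hold twelve different snapshots, and that difference in holdings is the source of the dispersion.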

Below we plot the equity curves for these twelve variations.

Source: CSI Analytics. Calculations by Newfound Research. Results are hypothetical. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.

We cannot stress enough that these portfolios are all implemented using a completely identical process. The only difference is when they run that process. The annualized returns range from 9.6% to 12.2%. And those two portfolios with the largest disparity rebalanced just a month apart: January and February.

To avoid timing luck, we want to diversify when we rebalance. The simplest way of achieving this goal is through overlapping portfolios. For example, we can build portfolios that rebalance annually, but allocate to two different dates. One portfolio could place 50% of its capital in the January rebalance index and 50% in the July rebalance index.

Another variation could place 50% of its capital in the February index and 50% in the August index.⁶ There are six possible variations, which we plot below.

The best performing variation (January and July) returned 11.7% annualized, while the worst (February and August) returned 9.7%. While the spread has narrowed, it would be dangerous to confuse 200bp annualized for alpha instead of rebalancing luck.

Source: CSI Analytics. Calculations by Newfound Research. Results are hypothetical. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.

We can go beyond just two overlapping portfolios, though. Below we plot the three variations that contain four overlapping portfolios (January-April-July-October, February-May-August-November, and March-June-September-December). The best variation now returns 10.9% annualized while the worst returns 10.1% annualized. We can see how overlapping portfolios are shrinking the variation in returns.

Finally, we can plot the variation that employs 12 overlapping portfolios. This variation returns 10.6% annualized; almost perfectly in line with the average annualized return of the underlying 12 variations. No surprise: diversification has neutralized timing luck.
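At the holdings level, an overlapping portfolio is an equal-weight average of its sleeves. The sketch below blends weight dictionaries. The tickers and weights are placeholders, and it ignores the drift in sleeve sizes between rebalances, which a full implementation would need to track.

```python
def blend_portfolios(portfolios):
    """Equal-weight average of several {ticker: weight} portfolios."""
    n = len(portfolios)
    blended = {}
    for portfolio in portfolios:
        for ticker, weight in portfolio.items():
            blended[ticker] = blended.get(ticker, 0.0) + weight / n
    return blended


# Hypothetical sleeves formed on two different rebalance dates.
january_sleeve = {"AAA": 0.5, "BBB": 0.5}
july_sleeve = {"BBB": 0.5, "CCC": 0.5}
overlapping = blend_portfolios([january_sleeve, july_sleeve])
print(overlapping)
```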

Source: CSI Analytics. Calculations by Newfound Research. Results are hypothetical. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.

But besides being “average by design,” how can we measure the benefits of diversification?

As with most ensemble approaches, we see a reduction in realized risk metrics. For example, below we plot the maximum realized drawdown for annual variations, semi-annual variations, quarterly variations, and the monthly variation. While the dispersion is limited to just a few hundred basis points, we can see that the diversification embedded in the monthly variation is able to reduce the bad luck of choosing an unfortunate rebalance date.
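For reference, maximum realized drawdown can be computed from an equity curve as the worst decline from a running peak. A minimal sketch with a hypothetical curve:

```python
def max_drawdown(values):
    """Worst peak-to-trough decline of an equity curve, as a negative fraction."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)
        worst = min(worst, v / peak - 1.0)
    return worst


# Hypothetical equity curve, for illustration only.
curve = [1.0, 1.2, 0.9, 1.1]
print(round(max_drawdown(curve), 4))
```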

Source: CSI Analytics. Calculations by Newfound Research. Results are hypothetical. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.

Just Rebalance more Frequently?

One of the major levers in the timing luck equation is how frequently the portfolio is rebalanced. However, we firmly believe that while rebalancing frequency impacts timing luck, timing luck should not be a driving factor in our choice of rebalance frequency.

Rather, rebalance frequency choices should be a function of the speed at which our signal decays (e.g. fast-changing signals such as momentum versus slow-changing signals like value) versus implementation costs (e.g. explicit trading costs, market impact, and taxes). Only after this choice is made should we seek to limit timing luck.

Nevertheless, we can ask the question, “how does rebalancing more frequently impact timing luck in this case?”

To answer this question, we will evaluate quarterly-rebalanced portfolios. The distinction here from the quarterly overlapping portfolios above is that the entire portfolio is rebalanced each quarter rather than only a quarter of the portfolio. Below, we plot the equity curves for the three possible variations.

Source: CSI Analytics. Calculations by Newfound Research. Results are hypothetical. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.

The best performing variation returns 11.7% annualized while the worst returns 9.7% annualized, for a spread of 200 basis points. This is actually larger than the spread we saw with the three quarterly overlapping portfolio variations, and likely due to the fact that turnover within the portfolios increased meaningfully.

While we can see that increasing the frequency of rebalancing can help, in our opinion the choice of rebalance frequency should be distinct from the choice of managing timing luck.

Conclusion

In our opinion, there are at least two meaningful conclusions here:

The first is for product manufacturers (e.g. index issuers) and is rather simple: if you’re going to have a fixed rebalance schedule, please implement overlapping portfolios. It isn’t hard. It is literally just averaging. We’re all better off for it.

The second is for product users: realize that performance dispersion between similarly-described systematic strategies can be heavily influenced by when they rebalance. The excess return may really just be a phantom of luck, not skill.

The solution to this problem, in our opinion, is to either: (1) pick an approach and just stick to it regardless of perceived dispersion, accepting the impact of timing luck; (2) hold multiple approaches that rebalance on different days; or (3) implement an approach that accounts for timing luck.

We believe the first approach is easier said than done. And without a framework for distinguishing between timing luck and alpha, we’re largely making arbitrary choices.

The second approach is certainly feasible but has the potential downside of requiring more holdings as well as potentially forcing an investor to purchase an approach they are less comfortable with. For example, while blending IWD (Russell 1000 Value), RPV (S&P 500 Pure Value), VLUE (MSCI U.S. Enhanced Value), and QVAL (Alpha Architect U.S. Quantitative Value) may create a portfolio that rebalances on many different dates (annual in May; annual in December; semi-annual in May and November; and quarterly, respectively), it also introduces significant process differences. That said, research suggests that investors may benefit from further manager/process diversification.

For investors with conviction in a single strategy implementation, the last approach is certainly the best. Unfortunately, as far as we are aware, there are only a few firms who actively implement overlapping portfolios (including Newfound Research, O’Shaughnessy Asset Management, AQR, and Research Affiliates). Until more firms adopt this approach, timing luck will continue to loom large.
