The Research Library of Newfound Research


The Dumb (Timing) Luck of Smart Beta

This post is available as a PDF download here.

Summary

  • In past research notes we have explored the impact of rebalance timing luck on strategic and tactical portfolios, even using our own Systematic Value methodology as a case study.
  • In this note, we generate empirical timing luck estimates for a variety of specifications for simplified value, momentum, low volatility, and quality style portfolios.
  • Relative results align nicely with intuition: higher concentration and less frequent rebalancing lead to increasing levels of realized timing luck.
  • For more reasonable specifications – e.g. 100 stock portfolios rebalanced semi-annually – timing luck ranges between 100 and 400 basis points depending upon the style under investigation, suggesting a significant risk of performance dispersion due only to when a portfolio is rebalanced and nothing else.
  • The large magnitude of timing luck suggests that any conclusions drawn from performance comparisons between smart beta ETFs or against a standard style index may be spurious.

We’ve written about the concept of rebalance timing luck a lot.  It’s a cowbell we’ve been beating for over half a decade, with our first article going back to August 7th, 2013.

As a reminder, rebalance timing luck is the performance dispersion that arises from the choice of a particular rebalance date (e.g. semi-annual rebalances that occur in June and December versus March and September).

We’ve empirically explored the impact of rebalance timing luck as it relates to strategic asset allocation, tactical asset allocation, and even used our own Systematic Value strategy as a case study for smart beta.  All of our results suggest that it has a highly non-trivial impact upon performance.

This summer we published a paper in the Journal of Index Investing that proposed a simple solution to the timing luck problem: diversification.  If, for example, we believe that our momentum portfolio should be rebalanced every quarter – perhaps as an optimal balance of cost and signal freshness – then we proposed splitting our capital across the three portfolios that spanned different three-month rebalance periods (e.g. JAN-APR-JUL-OCT, FEB-MAY-AUG-NOV, MAR-JUN-SEP-DEC).  This solution is referred to either as “tranching” or “overlapping portfolios.”

The paper also derived a formula for estimating timing luck ex-ante, with a simplified representation of:

$$L = \frac{T}{2F} \times S$$

Where L is the timing luck measure, T is turnover rate of the strategy, F is how many times per year the strategy rebalances, and S is the volatility of a long/short portfolio that captures the difference of what a strategy is currently invested in versus what it could be invested in if the portfolio was reconstructed at that point in time.

Without numbers, this equation still informs some general conclusions:

  • Higher turnover strategies have higher timing luck.
  • Strategies that rebalance more frequently have lower timing luck.
  • Strategies with a less constrained universe will have higher timing luck.
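As a quick sanity check of these three conclusions, the sketch below evaluates the simplified relationship under the assumption that it takes the form L = T / (2F) x S.  The function name and all input values are hypothetical and chosen only to show the direction of each effect.

```python
def timing_luck_estimate(turnover, rebalances_per_year, long_short_vol):
    """Ex-ante timing luck, assuming the simplified form L = T / (2F) * S."""
    if rebalances_per_year <= 0:
        raise ValueError("rebalances_per_year must be positive")
    return turnover / (2.0 * rebalances_per_year) * long_short_vol


# Hypothetical inputs: 100% annual turnover, semi-annual rebalancing,
# and 10% volatility for the long/short "difference" portfolio.
base = timing_luck_estimate(1.0, 2, 0.10)
higher_turnover = timing_luck_estimate(2.0, 2, 0.10)   # T doubles
more_frequent = timing_luck_estimate(1.0, 4, 0.10)     # F doubles
less_constrained = timing_luck_estimate(1.0, 2, 0.20)  # S doubles

print(f"base={base:.4f}, higher turnover={higher_turnover:.4f}, "
      f"more frequent={more_frequent:.4f}, less constrained={less_constrained:.4f}")
```

Doubling T or S doubles the estimate, while doubling F halves it, which is all the three bullet points claim.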

Bullet points 1 and 3 may seem similar but capture subtly different effects.  This is likely best illustrated with two examples on different extremes.  First consider a very high turnover strategy that trades within a universe of highly correlated securities.  Now consider a very low turnover strategy that is either 100% long or 100% short U.S. equities.  In the first case, the highly correlated nature of the universe means that differences in specific holdings may not matter as much, whereas in the second case the perfect inverse correlation means that small portfolio differences lead to meaningfully different performance.

L, in and of itself, is a bit tricky to interpret, but effectively attempts to capture the potential dispersion in performance between a particular rebalance implementation choice (e.g. JAN-APR-JUL-OCT) versus a timing-luck-neutral benchmark.

After half a decade, you’d think we’ve spilled enough ink on this subject.

But given that just about every single major index still does not address this issue, and since our passion for the subject clearly verges on fever pitch, here comes some more cowbell.

Equity Style Portfolio Definitions

In this note, we will explore timing luck as it applies to four simplified smart beta portfolios based upon holdings of the S&P 500 from 2000-2019:

  • Value: Sort on earnings yield.
  • Momentum: Sort on prior 12-1 month returns.
  • Low Volatility: Sort on realized 12-month volatility.
  • Quality: Sort on average rank-score of ROE, accruals ratio, and leverage ratio.

Quality is a bit more complicated only because the quality factor has far less consistency in accepted definition.  Therefore, we adopted the signals utilized by the S&P 500 Quality Index.

For each of these equity styles, we construct portfolios that vary across two dimensions:

  • Number of Holdings: 50, 100, 150, 200, 250, 300, 350, and 400.
  • Frequency of Rebalance: Quarterly, Semi-Annually, and Annually.

For the different rebalance frequencies, we also generate portfolios that represent each possible rebalance variation of that mix.  For example, Momentum portfolios with 50 stocks that rebalance annually have 12 possible variations: a January rebalance, February rebalance, et cetera.  Similarly, there are 12 possible variations of Momentum portfolios with 100 stocks that rebalance annually.
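To make the counting concrete, here is a small sketch that enumerates the rebalance variations for a given frequency.  The helper name and the month-label format are our own illustrative choices, not part of any index methodology.

```python
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def rebalance_variations(rebalances_per_year):
    """List every distinct monthly rebalance schedule for a given frequency."""
    if rebalances_per_year <= 0 or 12 % rebalances_per_year != 0:
        raise ValueError("rebalances_per_year must divide 12")
    step = 12 // rebalances_per_year  # months between rebalances
    return [
        "-".join(MONTHS[start + step * k] for k in range(rebalances_per_year))
        for start in range(step)      # one variation per possible starting month
    ]


for label, freq in [("Annual", 1), ("Semi-annual", 2), ("Quarterly", 4)]:
    variations = rebalance_variations(freq)
    print(label, len(variations), variations)
```

The number of variations is simply 12 divided by the number of rebalances per year: 12 for annual, 6 for semi-annual, and 3 for quarterly.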

By explicitly calculating the rebalance date variations of each Style x Holding x Frequency combination, we can construct an overlapping portfolios solution.  To estimate empirical annualized timing luck, we calculate the standard deviation of the monthly return differences between each rebalance date variation and the overlapping portfolio solution, and then annualize the result.
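A minimal sketch of this calculation follows.  It rests on two assumptions of ours: the overlapping portfolio is approximated as the equal-weight average of the variations' monthly returns (which ignores drift between tranches), and the return differences are pooled across variations before annualizing by the square root of 12.  The function names and sample returns are hypothetical.

```python
import math
import statistics


def overlapping_returns(variation_returns):
    """Equal-weight average of the variations' returns, month by month."""
    series = list(variation_returns.values())
    n_months = len(series[0])
    return [sum(s[t] for s in series) / len(series) for t in range(n_months)]


def empirical_timing_luck(variation_returns):
    """Annualized standard deviation of (variation - overlapping) monthly returns."""
    overlap = overlapping_returns(variation_returns)
    differences = [
        r - o
        for series in variation_returns.values()
        for r, o in zip(series, overlap)
    ]
    # The pooled differences average to zero by construction, so the
    # population standard deviation is used.
    return statistics.pstdev(differences) * math.sqrt(12)


# Hypothetical monthly returns for two rebalance variations.
sample = {
    "APR-OCT": [0.01, 0.03, 0.02, -0.01],
    "MAY-NOV": [0.03, 0.01, 0.02, 0.01],
}
print(f"{empirical_timing_luck(sample):.4f}")
```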

Empirical Timing Luck Results

Before looking at the results plotted below, we would encourage readers to hypothesize as to what they expect to see.  Perhaps not in absolute magnitude, but at least in relative magnitude.

For example, based upon our understanding of the variables affecting timing luck, would we expect an annually rebalanced portfolio to have more or less timing luck than a quarterly rebalanced one?

Should a more concentrated portfolio have more or less timing luck than a less concentrated variation?

Which factor has the greatest risk of exhibiting timing luck?

Source: Sharadar.  Calculations by Newfound Research.

To create a sense of scale across the styles, below we isolate the results for semi-annual rebalancing for each style and plot it.

Source: Sharadar.  Calculations by Newfound Research.

In relative terms, there is no great surprise in these results:

  • More frequent rebalancing limits the risk of portfolios changing significantly between rebalance dates, thereby decreasing the impact of timing luck.
  • More concentrated portfolios exhibit larger timing luck.
  • Faster-moving signals (e.g. momentum) tend to exhibit more timing luck than more stable, slower-moving signals (e.g. low volatility).

What is perhaps the most surprising is the sheer magnitude of timing luck.  Consider that the S&P 500 Enhanced Value, Momentum, Low Volatility, and Quality portfolios all hold 100 securities and are rebalanced semi-annually.  Our study suggests that timing luck for such approaches may be as large as 2.5%, 4.4%, 1.1%, and 2.0% respectively.

But what does that really mean?  Consider the realized performance dispersion of different rebalance date variations of a Momentum portfolio that holds the top 100 securities in equal weight and is rebalanced on a semi-annual basis.

Source: Sharadar.  Calculations by Newfound Research.  Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes.  Performance assumes the reinvestment of all distributions. 

The 4.4% estimate of annualized timing luck is a measure of dispersion between each underlying variation and the overlapping portfolio solution.  If we isolate two sub-portfolios and calculate rolling 12-month performance dispersion, we can see that the difference can be far larger, as one might exhibit positive timing luck while the other exhibits negative timing luck.  Below we do precisely this for the APR-OCT and MAY-NOV rebalance variations.

Source: Sharadar.  Calculations by Newfound Research.  Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes.  Performance assumes the reinvestment of all distributions. 

In fact, since these variations are identical in every which way except for the date on which they rebalance, a portfolio that is long the APR-OCT variation and short the MAY-NOV variation would explicitly capture the effects of rebalance timing luck.  If we assume the rebalance timing luck realized by these two portfolios is independent (which our research suggests it is), then the volatility of this long/short is approximately the rebalance timing luck estimated above scaled by the square-root of two.

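Derivation: For variations v_i and v_j and overlapping-portfolio solution V, where timing luck L is the volatility of a variation’s return relative to V:

$$\sigma^2(v_i - v_j) = \sigma^2\big((v_i - V) - (v_j - V)\big) = \sigma^2(v_i - V) + \sigma^2(v_j - V) - 2\,\mathrm{Cov}(v_i - V,\, v_j - V)$$

Under the independence assumption the covariance term is zero, so:

$$\sigma(v_i - v_j) = \sqrt{L^2 + L^2} = \sqrt{2}\,L$$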

Thus, if we are comparing two identically-managed 100-stock momentum portfolios that rebalance semi-annually, our 95% confidence interval for performance dispersion due to timing luck is +/- 12.4% (2 x SQRT(2) x 4.4%).

Even for more diversified, lower turnover portfolios, this remains an issue.  Consider a 400-stock low-volatility portfolio that is rebalanced quarterly.  Empirical timing luck is still 0.5%, suggesting a 95% confidence interval of +/- 1.4% (2 x SQRT(2) x 0.5%).
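The arithmetic behind both intervals can be reproduced in a few lines.  This sketch assumes, as the text does, independent timing luck between the two variations and a two-standard-deviation band as the 95% interval; the function names are ours.

```python
import math


def long_short_volatility(timing_luck):
    """Volatility of one variation minus another, assuming independent timing luck."""
    return math.sqrt(2) * timing_luck


def dispersion_interval_95(timing_luck):
    """Half-width of an approximate 95% interval (two standard deviations)."""
    return 2 * long_short_volatility(timing_luck)


momentum = dispersion_interval_95(0.044)   # 100-stock momentum, semi-annual
low_vol = dispersion_interval_95(0.005)    # 400-stock low volatility, quarterly

print(f"+/- {momentum:.1%}")  # +/- 12.4%
print(f"+/- {low_vol:.1%}")   # +/- 1.4%
```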

S&P 500 Style Index Examples

One critique of the above analysis is that it is purely hypothetical: the portfolios studied above aren’t really those offered in the market today.

We will take our analysis one step further and replicate (to the best of our ability) the S&P 500 Enhanced Value, Momentum, Low Volatility, and Quality indices.  We then create different rebalance schedule variations for each.  Note that the S&P 500 Low Volatility index rebalances quarterly, so there are only three possible rebalance variations to compute.

Source: Sharadar.  Calculations by Newfound Research.  Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes.  Performance assumes the reinvestment of all distributions. 

We see a meaningful dispersion in terminal wealth levels, even for the S&P 500 Low Volatility index, which appears at first glance in the graph to have little impact from timing luck.

Index             Minimum Terminal Wealth    Maximum Terminal Wealth
Enhanced Value    $4.45                      $5.45
Momentum          $3.07                      $4.99
Low Volatility    $6.16                      $6.41
Quality           $4.19                      $5.25

We should further note that there does not appear to be one set of rebalance dates that does significantly better than the others.  For Value, FEB-AUG looks best while JUN-DEC looks the worst; for Momentum it’s almost precisely the opposite.

Furthermore, we can see that even seemingly closely related rebalances can have significant dispersion: consider MAY-NOV and JUN-DEC for Momentum. Here is a real doozy of a statistic: at one point, the MAY-NOV implementation for Momentum is down 50.3% while the JUN-DEC variation is down just 13.8%.

These differences are even more evident if we plot the annual returns for each strategy’s rebalance variations.   Note, in particular, the extreme differences in Value in 2009, Momentum in 2017, and Quality in 2003.

Source: Sharadar.  Calculations by Newfound Research.  Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes.  Performance assumes the reinvestment of all distributions. 

Conclusion

In this study, we have explored the impact of rebalance timing luck on the results of smart beta / equity style portfolios.

We empirically tested this impact by designing a variety of portfolio specifications for four different equity styles (Value, Momentum, Low Volatility, and Quality).  The specifications varied by concentration as well as rebalance frequency.  We then constructed all possible rebalance variations of each specification to calculate the realized impact of rebalance timing luck over the test period (2000-2019).

In line with our mathematical model, we generally find that those strategies with higher turnover have higher timing luck and those that rebalance more frequently have less timing luck.

The sheer magnitude of timing luck, however, may come as a surprise to many.  For reasonably concentrated portfolios (100 stocks) with semi-annual rebalance frequencies (common in many index definitions), annual timing luck ranged from 1-to-4%, which translated to a 95% confidence interval in annual performance dispersion of about +/-3% to +/-12.5%.

The sheer magnitude of timing luck calls into question our ability to draw meaningful relative performance conclusions between two strategies.

We then explored more concrete examples, replicating the S&P 500 Enhanced Value, Momentum, Low Volatility, and Quality indices.  In line with expectations, we find that Momentum (a high turnover strategy) exhibits significantly higher realized timing luck than a lower turnover strategy rebalanced more frequently (i.e. Low Volatility).

For these four indices, the amount of rebalance timing luck leads to a staggering level of dispersion in realized terminal wealth.

“But Corey,” you say, “this only has to do with systematic factor managers, right?”

Consider that most of the major equity style benchmarks are managed with annual or semi-annual rebalance schedules.  Good luck to anyone trying to identify manager skill when your benchmark might be realizing hundreds of basis points of positive or negative performance luck a year.

 

Es-CAPE Velocity: Value-Driven Sector Rotation

This post is available as a PDF download here.

Summary

  • Systematic value strategies have struggled in the post-2008 environment, so one that has performed well catches our eye.
  • The Barclays Shiller CAPE sector rotation strategy – a value-based sector rotation strategy – has out-performed the S&P 500 by 267 basis points annualized since it launched in 2012.
  • The strategy applies a unique Relative CAPE metric to account for structural differences in sector valuations as well as a momentum filter that seeks to avoid “value traps.”
  • In an effort to derive the source of out-performance, we explore various other valuation metrics and model specifications.
  • We find that what has actually driven performance in the past may have little to do with value at all.

It is no secret that systematic value investing of all sorts has struggled as of late.  With the curious exception, that is, of the Barclays Shiller CAPE sector rotation strategy, a strategy explored by Bunn, Staal, Zhuang, Lazanas, Ural and Shiller in their 2014 paper Es-cape-ing from Overvalued Sectors: Sector Selection Based on the Cyclically Adjusted Price-Earnings (CAPE) Ratio.  Initial performance suggests that the idea has performed quite well out-of-sample, which stands out among many “smart-beta” strategies which have failed to live up to their backtests.

Source: CSI Data.  Calculations by Newfound Research.  Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.  

Why is this strategy finding success where other value strategies have not?  That is what we aim to explore in this commentary.

On a monthly basis, the Shiller CAPE sector rotation portfolio is rebalanced into an equal-weight allocation across four of the ten primary GICS sectors.  The four are selected first by ranking the 10 primary sectors based upon their Relative CAPE ratios and choosing the cheapest five sectors.  Of those cheapest five sectors, the sector with the worst trailing 12-month return (“momentum”) is removed.

The CAPE ratio – standing for Cyclically-Adjusted Price-to-Earnings ratio – is the current price divided by the 10-year moving average of inflation-adjusted earnings.  The purpose of this smoothing is to reduce the impact of business cycle fluctuations.

The potential problem with using the raw CAPE value for each sector is that certain sectors have structurally higher and lower CAPE ratios than their peers.  High growth sectors – e.g. Technology – tend to have higher CAPE ratios because they reinvest a substantial portion of their earnings while more stable sectors – e.g. Utilities – tend to have much lower CAPE ratios.  Were we to simply sort sectors based upon their current CAPE ratio, we would tend to create structural over- and under-weights towards certain sectors.

To adjust for this structural difference, the strategy uses the idea of a Relative CAPE ratio, which is calculated by taking the current CAPE ratio and dividing it by a rolling 20-year average CAPE ratio¹ for that sector.  The thesis behind this step is that dividing by a long-term mean normalizes the sectors and allows for better comparison.  Relative CAPE values above 1 mean that the sector is more expensive than it has historically been, while values less than 1 mean it is cheaper.
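The two ratios can be sketched as follows.  We assume annual observations and a rolling window that includes the current value; the actual index methodology may differ on both points.  All function names and numbers below are hypothetical.

```python
def cape_ratio(price, real_annual_earnings, years=10):
    """Price divided by the average of the last `years` inflation-adjusted earnings."""
    if len(real_annual_earnings) < years:
        raise ValueError("not enough earnings history")
    window = real_annual_earnings[-years:]
    return price / (sum(window) / years)


def relative_cape(cape_history, lookback_years=20):
    """Current CAPE divided by its own rolling average over the lookback."""
    if len(cape_history) < lookback_years:
        raise ValueError("not enough CAPE history")
    window = cape_history[-lookback_years:]
    return cape_history[-1] / (sum(window) / lookback_years)


# Hypothetical sector: a CAPE of 40, a two-year bubble at 180, then 30 ever since.
bubble_history = [40.0] * 5 + [180.0] * 2 + [30.0] * 13
# Hypothetical sector whose CAPE has always been 15.
stable_history = [15.0] * 20

print(round(relative_cape(bubble_history), 3))  # below 1: looks cheap versus its own past
print(relative_cape(stable_history))            # exactly 1: neither cheap nor expensive
```

Note that in this made-up example the bubble sector screens as cheaper than the stable sector despite trading at twice the CAPE, because the outlier years inflate its historical average.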

It is important to note here that the actual selection is still performed on a cross-sector basis.  It is entirely possible that all the sectors appear cheap or expensive on a historical basis at the same time.  The portfolio will simply pick the cheapest sectors available.
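The selection rule described above (five cheapest sectors by Relative CAPE, drop the one with the worst trailing 12-month return, equal-weight the remaining four) can be sketched as below.  The function name is ours and every input value is invented purely for illustration.

```python
def shiller_selection(relative_cape, trailing_12m_return, n_cheapest=5):
    """Equal-weight the cheapest sectors after dropping the worst-momentum one."""
    # Rank cross-sectionally: lowest Relative CAPE first.
    cheapest = sorted(relative_cape, key=relative_cape.get)[:n_cheapest]
    # Momentum filter: remove the cheap sector with the worst trailing return.
    worst_momentum = min(cheapest, key=trailing_12m_return.get)
    selected = [s for s in cheapest if s != worst_momentum]
    return {sector: 1.0 / len(selected) for sector in selected}


# Hypothetical Relative CAPE values and trailing 12-month returns.
rel_cape = {
    "Energy": 0.70, "Financials": 0.75, "Technology": 0.80,
    "Health Care": 0.85, "Industrials": 0.90, "Materials": 1.00,
    "Utilities": 1.05, "Consumer Staples": 1.10,
    "Consumer Discretionary": 1.15, "Telecommunications": 1.20,
}
momentum = {
    "Energy": -0.20, "Financials": 0.05, "Technology": 0.15,
    "Health Care": 0.10, "Industrials": 0.08, "Materials": 0.02,
    "Utilities": 0.04, "Consumer Staples": 0.06,
    "Consumer Discretionary": 0.12, "Telecommunications": -0.03,
}

weights = shiller_selection(rel_cape, momentum)
print(weights)
```

Because the ranking is cross-sectional, the function never checks whether a Relative CAPE value is above or below 1; it only compares sectors against one another.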

Poking and Prodding the Parameters

With an understanding of the rules, our first step is to poke and prod a bit to figure out what is really driving the strategy.

We begin by first exploring the impact of using the Relative CAPE ratio versus just the CAPE ratio.

For each of these ratios, we’ll plot two strategies.  The first is a naïve Value strategy, which will equally-weight the four cheapest sectors.  The second is the Shiller strategy, which chooses the top five cheapest sectors and drops the one with the worst momentum.  This should provide a baseline for comparing the impact of the momentum filter.

Strategy returns are plotted relative to the S&P 500.

Source: Siblis Research; Morningstar; CSI Data.  Calculations by Newfound Research.  Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.  

For the Relative CAPE ratio, we also vary the lookback period for calculating the rolling average CAPE from 5- to 20-years.

Source: Siblis Research; Morningstar; CSI Data.  Calculations by Newfound Research.  Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.  

A few things immediately stand out:

  • Interestingly, standard CAPE actually appears to perform better than Relative CAPE for both the traditional value and Shiller implementations.
  • The Relative CAPE approach fared much more poorly from 2004-2007 than the simple CAPE approach.
  • There is little difference in performance for the Value and Shiller strategy for standard CAPE, but a meaningful difference for Relative CAPE.
  • While standard CAPE value has had stagnant relative performance since 2007, Relative CAPE appears to continue to work for the Shiller approach.
  • A naïve value implementation seems to perform quite poorly for Relative CAPE, while the Shiller strategy appears to perform rather well.
  • There is meaningful performance dispersion based upon the lookback period, with longer-dated lookbacks (darker shades) appearing to perform better than shorter-period lookbacks (lighter shades) for the Relative CAPE variation.

The second-to-last point is particularly curious, as it implies that using momentum to “avoid the value trap” creates significant value (no pun intended; okay, pun intended) for the strategy.

Varying the Value Metric (in Vain)

To gain more insight, we next test the impact of the choice of the CAPE ratio. Below we plot the relative returns of different Shiller-based strategies (again varying lookbacks from 5- to 20-years), but use price-to-book, trailing 12-month price-to-earnings, and trailing 12-month EV/EBITDA as our value metrics.

A few things stand out:

  • Value-based sector rotation seems to have “worked” from 2000 to 2009, regardless of our metric of choice.
  • Almost all value-based strategies appear to exhibit significant relative out-performance during the dot-com and 2008 recessions.
  • After 2009, most value strategies appear to exhibit random relative performance versus the S&P 500.
  • All three approaches appear to suffer since 2016.

Source: Siblis Research; Morningstar; CSI Data.  Calculations by Newfound Research.  Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.  

At this point, we have to ask: is there something special about the Relative CAPE that makes it inherently superior to other metrics?

A Big Bubble-Based Bet?

If we take a step back for a moment, it is worth asking ourselves a simple question: what would it take for a sector rotation strategy to out-perform the S&P 500 over the last decade?

With the benefit of hindsight, we know Consumer Discretionary and Technology have led the pack, while traditionally stodgy sectors like Consumer Staples and Utilities have lagged behind (though not nearly as poorly as Energy).

As we mentioned earlier, a naïve rank on the CAPE ratio would almost certainly prefer Utilities and Staples over Technology and Discretionary.  Thus, for us to outperform the market, we must somehow construct a value metric that identifies the two most chronically expensive sectors (ignoring back-dated valuations for the new Communication Services sector) as being among the cheapest.

This is where dividing by the rolling 20-year average comes into play.  In spirit, it makes a certain degree of sense. In practice, however, this plays out perfectly for Technology, which went through such an enormous bubble in the late 1990s that the 20-year average was meaningfully skewed upward by an outlier event.  Thus, for almost the entire 20-year period after the dot-com bubble, Technology appears to be relatively cheap by comparison.  After all, you can buy for 30x earnings today what you used to be able to buy for 180x!

The result is a significant – and near-permanent – tilt towards Technology since the beginning of 2012, which can be seen in the graph of strategy weights below.

One way to explore the impact of this choice is to calculate the weight differences between a top-4 CAPE strategy and a top-4 Relative CAPE strategy, which we also plot below.  We can see that after early 2012, the Relative CAPE strategy is structurally overweight Technology and underweight Financials and Utilities.  Prior to 2008, we can see that it is structurally underweight Energy and overweight Consumer Staples.

If we take these weights and use them to construct a return stream, we can isolate the return impact the choice of using Relative CAPE versus CAPE has.  Interestingly, the long Technology / short Financials & Utilities trade did not appear to generate meaningful out-performance in the post-2012 era, suggesting that something else is responsible for post-2012 performance.
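As a sketch of how such a return stream might be built for a single period, the code below differences two sets of sector weights and applies the result to sector returns.  The holdings and returns are invented for illustration and do not reflect the strategies' actual positions.

```python
def weight_differences(weights_a, weights_b):
    """Sector-by-sector weights of strategy A minus strategy B."""
    sectors = sorted(set(weights_a) | set(weights_b))
    return {s: weights_a.get(s, 0.0) - weights_b.get(s, 0.0) for s in sectors}


def long_short_return(diff_weights, sector_returns):
    """One-period return of the long/short implied by the weight differences."""
    return sum(w * sector_returns[s] for s, w in diff_weights.items())


# Hypothetical top-4 holdings under each metric.
relative_cape_top4 = {"Technology": 0.25, "Health Care": 0.25,
                      "Consumer Staples": 0.25, "Industrials": 0.25}
cape_top4 = {"Financials": 0.25, "Utilities": 0.25,
             "Consumer Staples": 0.25, "Industrials": 0.25}

# Hypothetical one-month sector returns.
month_returns = {"Technology": 0.03, "Health Care": 0.01,
                 "Consumer Staples": 0.00, "Industrials": 0.02,
                 "Financials": 0.02, "Utilities": 0.01}

diff = weight_differences(relative_cape_top4, cape_top4)
print(diff)
print(round(long_short_return(diff, month_returns), 4))
```

Sectors held by both strategies net to zero, so only the positions where the two metrics disagree contribute to the long/short return.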

Source: Siblis Research; Morningstar; CSI Data.  Calculations by Newfound Research.  Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.  

The Miraculous Mojo of Momentum

This is where the 12-month momentum filter plays a crucial role.  Narratively, it is to avoid value traps.  Practically, it helps the strategy deftly dodge Financials in 2008, avoiding a significant melt-down in one of the S&P 500’s largest sectors.

Now, you might think that valuations alone should have allowed the strategy to avoid Technology in the dot-com fallout.  As it turns out, the Technology CAPE fell so precipitously that in using the Relative CAPE metric the Technology sector was still ranked as one of the top five cheapest sectors from 3/2001 to 11/2002.  The only way the strategy was able to avoid it?  The momentum filter.

Removing this filter makes the relative results a lot less attractive.  Below we re-plot the relative performance of a simple “top 4” Relative CAPE strategy.

Source: Siblis Research; Morningstar; CSI Data.  Calculations by Newfound Research.  Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.  

Just how much impact does the momentum filter have?  We can isolate the effect by taking the weights of the Shiller strategy and subtracting the weights of the Value strategy to construct a long/short index that isolates the effect.  Below we plot the returns of this index.

It should be noted that the legs of the long/short portfolio only have a notional exposure of 25%, as that is the most the Value and Shiller strategies can deviate by.  Nevertheless, even with this relatively small weight, when isolated the filter generates an annualized return of 1.8% per year with an annualized volatility of 4.8% and a maximum drawdown of 11.6%.

Scaled to a long/short with 100% notional per leg, annualized returns jump to 6.0%, though volatility and maximum drawdown climb to 20.4% and 52.6%, respectively.
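To illustrate the notional point, the sketch below measures the size of each leg of a weight-difference portfolio and rescales it to 100% per leg.  The example position (long the fifth-cheapest sector, short the sector the filter removed) and the function names are hypothetical.

```python
def leg_notionals(diff_weights):
    """Return (long notional, short notional) of a weight-difference portfolio."""
    long_leg = sum(w for w in diff_weights.values() if w > 0)
    short_leg = sum(-w for w in diff_weights.values() if w < 0)
    return long_leg, short_leg


def scale_to_unit_legs(diff_weights):
    """Rescale so the long and short legs each carry 100% notional."""
    long_leg, short_leg = leg_notionals(diff_weights)
    if long_leg == 0 or short_leg == 0:
        # The two strategies hold the same sectors, so there is nothing to scale.
        return {s: 0.0 for s in diff_weights}
    return {s: (w / long_leg if w > 0 else w / short_leg)
            for s, w in diff_weights.items()}


# Hypothetical month: the filter swaps Energy out for Industrials.
filter_diff = {"Industrials": 0.25, "Energy": -0.25, "Financials": 0.0}

print(leg_notionals(filter_diff))       # (0.25, 0.25)
print(scale_to_unit_legs(filter_diff))
```

When the worst-momentum sector happens to be the fifth cheapest, the two strategies hold identical sectors and the long/short is flat for that month.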

Source: Siblis Research; Morningstar; CSI Data.  Calculations by Newfound Research.  Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.  

Conclusion

Few, if any, systematic value strategies have performed well as of late.  When one does – as with the Shiller CAPE sector rotation strategy – it is worth further review.

As a brief summary of our findings:

  • Despite potential structural flaws in measuring cross-sectional sector value, CAPE outperformed Relative CAPE for a naïve rank-based value strategy.
  • There is significant dispersion in results using the Relative CAPE metric depending upon which lookback parameterization is selected.  Initial tests suggest that the longer lookbacks appear to have been more effective.
  • Valuation metrics other than CAPE – e.g. P/B, P/E (TTM), and EV/EBITDA (TTM) – do not appear as effective in recent years.
  • Longer lookbacks allow the Relative CAPE methodology to create a structural overweight to the Technology sector over the last 15 years.
  • The momentum filter plays a crucial role in avoiding the Technology sector in 2001-2002 and the Financial sector in 2008.

 

Taken all together, it is hard to not question whether these results are unintentionally datamined.  Unfortunately, we just do not have enough data to extend the tests further back in time for truly out-of-sample analysis.

What we can say, however, is that the backtested and live performance hinges almost entirely on a few key trades:

  • Avoiding Technology in 2001-2002 due to the momentum filter.
  • Avoiding Financials in 2008 due to the momentum filter.
  • Avoiding a Technology underweight in recent years due to an inflated “average” historical CAPE due to the dot-com bubble.
  • Avoiding Energy in 2014-2016 due to the momentum filter.

 

Three of these four trades are driven by the momentum filter.  When we further consider that the Shiller strategy is in effect the combination of the returns of the pure value implementation – which suffered in the dot-com run-up and was a mostly random walk thereafter – and the returns of the isolated momentum filter, it becomes rather difficult to call this a value strategy at all.


As of the date of this document, neither Newfound Research nor Corey Hoffstein holds a position in the securities discussed in this article and do not have any plans to trade in such securities.  Newfound Research and Corey Hoffstein do not take a position as to whether this security should be recommended for any particular investor.  


Timing Luck and Systematic Value

This post is available as a PDF download here.

Summary

  • We have shown many times that timing luck – when a portfolio chooses to rebalance – can have a large impact on the performance of tactical strategies.
  • However, fundamental strategies like value portfolios are susceptible to timing luck, as well.
  • Once the rebalance frequency of a strategy is set, we can mitigate the risk of choosing a poor rebalance date by diversifying across all potential variations.
  • In many cases, this mitigates the risk of realizing poor performance from an unfortunate choice of rebalance date while achieving a risk profile similar to the top tier of potential strategy variations.
  • By utilizing strategies that manage timing luck, investors can more accurately assess whether performance differences arise from luck or skill.

On August 7th, 2013 we wrote a short blog post titled The Luck of Rebalance Timing.  That means we have been prattling on about the impact of timing luck for over six years now (with apologies to our compliance department…).

(For those still unfamiliar with the idea of timing luck, we will point you to a recent publication from Spring Valley Asset Management that provides a very approachable introduction to the topic.¹)

While most of our earliest studies related to the impact of timing luck in tactical strategies, over time we realized that timing luck could have a profound impact on just about any strategy that rebalances on a fixed frequency.  We found that even a simple fixed-mix allocation of stocks and bonds could see annual performance spreads exceeding 700bp due only to the choice of when they rebalanced in a given year.

In seeking to generalize the concept, we derived a formula that would estimate how much timing luck a strategy might have.  The details of the derivation can be found in our paper recently published in the Journal of Index Investing, but the basic formula is:

$$L = \frac{T}{2F} \times S$$

Here T is strategy turnover, F is how many times per year the strategy rebalances, and S is the volatility of a long/short portfolio capturing the difference between what the strategy is currently invested in versus what it could be invested in.

We’re biased, but we think the intuition here works out fairly nicely:

  • The higher a strategy’s turnover, the greater the impact of our choice of rebalance dates. For example, if we have a value strategy that has 50% turnover per year, an implementation that rebalances in January versus one that rebalances in July might end up holding very different securities.  On the other hand, if the strategy has just 1% turnover per year, we don’t expect the differences in holdings to be very large and therefore timing luck impact would be minimal.
  • The more frequently we rebalance, the lower the timing luck. This makes sense, as more frequent rebalancing limits the potential difference in holdings of different implementation dates.  Consider again a value strategy with 50% turnover.  If our portfolio rebalances every other month, there are two potential implementations: one that rebalances January, March, May, etc. and one that rebalances February, April, June, etc. We would expect the difference in portfolio holdings to be much more limited than in the case where we rebalance only annually.²
  • The last term, S, is most easily explained with an example. If we have a portfolio that can hold either the Russell 1000 or the S&P 500, we do not expect there to be a large amount of performance dispersion regardless of when we rebalance or how frequently we do so.  The volatility of a portfolio that is long the Russell 1000 and short the S&P 500 is so small, it drives timing luck near zero.  On the other hand, if a portfolio can hold the Russell 1000 or be short the S&P 500, differences in holdings due to different rebalance dates can lead to massive performance dispersion. Generally speaking, S is larger for more highly concentrated strategies with large performance dispersion in their investable universe.
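To make the role of each term concrete, the estimate can be evaluated directly.  The following is a minimal sketch, assuming the formula takes the form L = T / (2F) × S (turnover divided by twice the rebalance frequency, times the long/short volatility).  The function name and the input values are hypothetical illustrations, not figures from our research.

```python
def timing_luck(turnover, rebalances_per_year, long_short_vol):
    """Estimated annualized timing luck, assuming L = T / (2F) * S."""
    if rebalances_per_year <= 0:
        raise ValueError("rebalances_per_year must be positive")
    return turnover / (2.0 * rebalances_per_year) * long_short_vol

# Hypothetical inputs: 50% annual turnover and 10% long/short volatility.
annual = timing_luck(0.50, 1, 0.10)     # one rebalance per year: 0.025 (250bp)
bimonthly = timing_luck(0.50, 6, 0.10)  # six rebalances per year: about 0.0042 (42bp)
```

Holding turnover and long/short volatility fixed, rebalancing six times as often cuts the estimate to one-sixth, which is the intuition described in the second bullet above.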

Timing Luck in Smart Beta

To date, we have not meaningfully tested timing luck in the realm of systematic equity strategies.3  In this commentary, we aim to provide a concrete example of the potential impact.

A few weeks ago, however, we introduced our Systematic Value portfolio, which seeks to deliver concentrated exposure to the value style while avoiding unintended process and timing luck bets.

To achieve this, we implement an overlapping portfolio process.  Each month we construct a concentrated deep value portfolio, selecting just 50 stocks from the S&P 500.  However, because we believe the evidence suggests that value is a slow-moving signal, we aim for a holding period of 3-to-5 years.  To achieve this, our capital is divided across the prior 60 months of portfolios.4

Which all means that we have monthly snapshots of deep value5 portfolios going back to November 2012, providing us data to construct all sorts of rebalance variations.

The Luck of Annual Rebalancing

Given our portfolio snapshots, we will create annually rebalanced portfolios.  With monthly portfolios, there are twelve variations we can construct: a portfolio that reconstitutes each January; one that reconstitutes each February; a portfolio that reconstitutes each March; et cetera.

Below we plot the equity curves for these twelve variations.

Source: CSI Analytics.  Calculations by Newfound Research.  Results are hypothetical.  Results assume the reinvestment of all distributions.   Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes.  Past performance is not an indicator of future results.  

We cannot stress enough that these portfolios are all implemented using a completely identical process.  The only difference is when they run that process.  The annualized returns range from 9.6% to 12.2%.  And those two portfolios with the largest disparity rebalanced just a month apart: January and February.
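The construction is mechanical enough to sketch in code.  The example below is an illustration only and not our production process: it assumes an array of monthly target-weight snapshots and a matching array of monthly asset returns (both simulated here).  The function names, the decision to let weights drift between rebalances, and the choice to hold the first snapshot until a variation's first scheduled rebalance are all assumptions.

```python
import numpy as np

def variation_returns(snapshots, asset_returns, offset, period=12):
    """Monthly returns of a portfolio that reconstitutes when t % period == offset.

    snapshots[t] holds the target weights available at the start of month t;
    asset_returns[t] holds each asset's simple return during month t.
    """
    snapshots = np.asarray(snapshots, dtype=float)
    asset_returns = np.asarray(asset_returns, dtype=float)
    held = snapshots[0].copy()  # assumption: start from the first snapshot
    out = np.empty(len(asset_returns))
    for t in range(len(asset_returns)):
        if t % period == offset:
            held = snapshots[t].copy()  # reconstitute to this month's snapshot
        grown = held * (1.0 + asset_returns[t])
        out[t] = grown.sum() / held.sum() - 1.0
        held = grown / grown.sum()  # weights drift until the next rebalance
    return out

def annualized_return(monthly_returns):
    """Compound annual growth rate from a series of monthly returns."""
    monthly_returns = np.asarray(monthly_returns, dtype=float)
    growth = np.prod(1.0 + monthly_returns)
    return growth ** (12.0 / len(monthly_returns)) - 1.0

# Simulated data purely for illustration: 60 months, 20 stocks.
rng = np.random.default_rng(0)
sim_returns = rng.normal(0.01, 0.05, size=(60, 20))
raw = rng.random((60, 20))
sim_snapshots = raw / raw.sum(axis=1, keepdims=True)

# Same process, twelve different reconstitution months.
cagrs = [annualized_return(variation_returns(sim_snapshots, sim_returns, k))
         for k in range(12)]
spread = max(cagrs) - min(cagrs)  # dispersion attributable only to timing
```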

To avoid timing luck, we want to diversify when we rebalance.  The simplest way of achieving this goal is through overlapping portfolios.  For example, we can build portfolios that rebalance annually, but allocate to two different dates.  One portfolio could place 50% of its capital in the January rebalance index and 50% in the July rebalance index.

Another variation could place 50% of its capital in the February index and 50% in the August index.6  There are six possible variations, which we plot below.
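As a rough sketch of what "overlapping" means in practice, the snippet below splits capital equally across a set of sub-index return series.  It assumes the capital is divided once at inception and each sleeve is then left to compound on its own; periodically resetting the sleeves to equal weight is another reasonable choice.  The function name and the sample returns are hypothetical.

```python
import numpy as np

def overlapping_curve(sub_index_returns):
    """Equity curve of an equal capital split across sub-indexes.

    sub_index_returns has shape (n_sub_indexes, n_months).  Assumption: capital
    is split equally at the start and the sleeves are never rebalanced against
    each other.
    """
    returns = np.asarray(sub_index_returns, dtype=float)
    curves = np.cumprod(1.0 + returns, axis=1)  # growth of $1 per sleeve
    return curves.mean(axis=0)                  # equal split is a plain average

# Hypothetical monthly returns for a January-rebalanced and a July-rebalanced index.
january_index = [0.02, -0.01, 0.03, 0.00]
july_index = [0.01, 0.01, -0.02, 0.02]
blended = overlapping_curve([january_index, july_index])
```

The same function handles four or twelve sleeves by passing more rows.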

The best performing variation (January and July) returned 11.7% annualized, while the worst (February and August) returned 9.7%.  While the spread has narrowed, it would be dangerous to confuse 200bp annualized for alpha instead of rebalancing luck.

Source: CSI Analytics.  Calculations by Newfound Research.  Results are hypothetical.  Results assume the reinvestment of all distributions.   Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes.  Past performance is not an indicator of future results.  

We can go beyond just two overlapping portfolios, though.  Below we plot the three variations that contain four overlapping portfolios (January-April-July-October, February-May-August-November, and March-June-September-December).  The best variation now returns 10.9% annualized while the worst returns 10.1% annualized.  We can see how overlapping portfolios are shrinking the variation in returns.

Finally, we can plot the variation that employs 12 overlapping portfolios.  This variation returns 10.6% annualized; almost perfectly in line with the average annualized return of the underlying 12 variations.  No surprise: diversification has neutralized timing luck.

Source: CSI Analytics.  Calculations by Newfound Research.  Results are hypothetical.  Results assume the reinvestment of all distributions.   Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes.  Past performance is not an indicator of future results.  

But besides being “average by design,” how can we measure the benefits of diversification?

As with most ensemble approaches, we see a reduction in realized risk metrics.  For example, below we plot the maximum realized drawdown for annual variations, semi-annual variations, quarterly variations, and the monthly variation.  While the dispersion is limited to just a few hundred basis points, we can see that the diversification embedded in the monthly variation is able to reduce the bad luck of choosing an unfortunate rebalance date.

Source: CSI Analytics.  Calculations by Newfound Research.  Results are hypothetical.  Results assume the reinvestment of all distributions.   Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes.  Past performance is not an indicator of future results.  
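For readers who want to reproduce this kind of comparison on their own equity curves, maximum drawdown can be computed as the worst decline from a running peak.  This is a generic sketch with a hypothetical function name and sample curve, not the code behind the chart above.

```python
import numpy as np

def max_drawdown(equity_curve):
    """Worst peak-to-trough decline, returned as a negative fraction."""
    curve = np.asarray(equity_curve, dtype=float)
    running_peak = np.maximum.accumulate(curve)
    drawdowns = curve / running_peak - 1.0
    return drawdowns.min()

# Hypothetical growth-of-$1 curve: peak of 1.2 followed by a trough of 0.9.
worst = max_drawdown([1.0, 1.2, 0.9, 1.1, 1.3])  # -0.25
```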

Just Rebalance More Frequently?

One of the major levers in the timing luck equation is how frequently the portfolio is rebalanced.  However, we firmly believe that while rebalancing frequency impacts timing luck, timing luck should not be a driving factor in our choice of rebalance frequency.

Rather, rebalance frequency choices should be a function of the speed at which our signal decays (e.g. fast-changing signals such as momentum versus slow-changing signals like value) versus implementation costs (e.g. explicit trading costs, market impact, and taxes).  Only after this choice is made should we seek to limit timing luck.

Nevertheless, we can ask the question, “how does rebalancing more frequently impact timing luck in this case?”

To answer this question, we will evaluate quarterly-rebalanced portfolios.  The distinction here from the quarterly overlapping portfolios above is that the entire portfolio is rebalanced each quarter rather than only a quarter of the portfolio.  Below, we plot the equity curves for the three possible variations.

Source: CSI Analytics.  Calculations by Newfound Research.  Results are hypothetical.  Results assume the reinvestment of all distributions.   Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes.  Past performance is not an indicator of future results.  

The best performing variation returns 11.7% annualized while the worst returns 9.7% annualized, for a spread of 200 basis points.  This is actually larger than the spread we saw with the three quarterly overlapping portfolio variations, and likely due to the fact that turnover within the portfolios increased meaningfully.

While we can see that increasing the frequency of rebalancing can help, in our opinion the choice of rebalance frequency should be distinct from the choice of managing timing luck.

Conclusion

In our opinion, there are at least two meaningful conclusions here:

The first is for product manufacturers (e.g. index issuers) and is rather simple: if you’re going to have a fixed rebalance schedule, please implement overlapping portfolios.  It isn’t hard.  It is literally just averaging.  We’re all better off for it.

The second is for product users: realize that performance dispersion between similarly-described systematic strategies can be heavily influenced by when they rebalance. The excess return may really just be a phantom of luck, not skill.

The solution to this problem, in our opinion, is to either: (1) pick an approach and just stick to it regardless of perceived dispersion, accepting the impact of timing luck; (2) hold multiple approaches that rebalance on different days; or (3) implement an approach that accounts for timing luck.

We believe the first approach is easier said than done.  And without a framework for distinguishing between timing luck and alpha, we’re largely making arbitrary choices.

The second approach is certainly feasible but has the potential downside of requiring more holdings as well as potentially forcing an investor to purchase an approach they are less comfortable with.   For example, while blending IWD (Russell 1000 Value), RPV (S&P 500 Pure Value), VLUE (MSCI U.S. Enhanced Value), and QVAL (Alpha Architect U.S. Quantitative Value) may create a portfolio that rebalances on many different dates (annual in May; annual in December; semi-annual in May and November; and quarterly, respectively), it also introduces significant process differences.  That said, research suggests that investors may benefit from further manager/process diversification.

For investors with conviction in a single strategy implementation, the last approach is certainly the best.  Unfortunately, as far as we are aware, there are only a few firms who actively implement overlapping portfolios (including Newfound Research, O’Shaughnessy Asset Management, AQR, and Research Affiliates). Until more firms adopt this approach, timing luck will continue to loom large.

 


 

Value and the Credit Spread

This post is available as a PDF download here.

Summary

  • We continue our exploration of quantitative signals in fixed income.
  • We use a measure of credit curve steepness as a valuation signal for timing exposure between corporate bonds and U.S. Treasuries.
  • The value signal generates a 0.84% annualized return from 1950 to 2019 but is highly regime dependent with meaningful drawdowns.
  • Introducing a naïve momentum strategy significantly improves the realized Sharpe ratio and drawdown profile, but does not reduce the regime-based nature of the returns.
  • With a combined return of just 1.0% annualized, this strategy may not prove effective after appropriate discounting for hindsight bias, costs, and manager fees. The signal itself, however, may be useful in other contexts.

In the last several weeks, we have been exploring the application of quantitative signals to fixed income.

Our recent cross-sectional studies also build on earlier research in which we applied trend, value, carry, and explicit measures of the bond risk premium as duration timing mechanisms (see Duration Timing with Style Premia; Timing Bonds with Value, Momentum, and Carry; and A Carry-Trend-Hedge Approach to Duration Timing).

Broadly, our studies have found:

  • Value (measured as deviation from real yield), momentum (prior 12-month returns), and carry (yield-to-worst) were all profitable factors in cross-sectional municipal bond sector long/short portfolios.
  • Value (measured as deviation from real yield), trend (measured as prior return), and carry (measured as term spread + roll yield) have historically been effective timing signals for U.S. duration exposure.
  • Prior short-term equity returns proved to be an effective signal for near-term returns in U.S. Treasuries (related to the “flight-to-safety premium”).
  • Short-term trend proved effective for high yield bond timing, but the results were largely determined by performance in 2000-2003 and 2008-2009. While the strategy appeared to still be able to harvest relative carry between high-yield bonds and core fixed income in other environments, a significant proportion of returns came from avoiding large drawdowns in high yield.
  • Short-term cross-sectional momentum (prior total returns), value (z-score of loss-adjusted yield-to-worst), carry (loss-adjusted yield-to-worst), and 3-year reversals all appeared to offer robust signals for relative selection in fixed income sectors. The time period covered in the study, however, was limited and mostly within a low-inflation regime.
  • Application of momentum, value, carry, and reversal as timing signals proved largely ineffective for generating excess returns.

In this week’s commentary, we want to further contribute to research by introducing a value timing signal for credit.

Finding Value in Credit

Identifying a value signal requires some measure or proxy of an asset’s “fair” value. What can make identifying value in credit so difficult is that there are a number of moving pieces.

Conceptually, credit spreads should be proportional to default rates, recovery rates, and aggregate risk appetite, making determining whether spreads are cheap or expensive rather complicated.  Prior literature typically tackles the problem with one of three major categories of models:

  • Econometric: “Fair value” of credit spreads is modeled through a regression that typically explicitly accounts for default and recovery rates. Inputs are often related to economic and market variables, such as equity market returns, 10-year minus 2-year spreads, corporate leverage, and corporate profitability.  Bottom-up analysis may use metrics such as credit quality, maturity, supply, and liquidity.
  • Merton Model: Based upon the idea that bondholders have sold a put on a company’s asset value. Therefore, options pricing models can be used to calculate a credit spread.  Inputs include the total asset value, asset volatility, and leverage of the firm under analysis.
  • Spread Signal: A simple statistical model derived from credit spreads themselves. For example, a rolling z-score of option-adjusted spreads or deviations from real yield.  Other models (e.g. Haghani and Dewey (2016)) have used spread plus real yield versus a long-run constant (e.g. “150 basis points”).

The first method requires a significant amount of economic modeling.  The second approach requires a significant amount of extrapolation from market data.  The third method, while computationally (and intellectually) less intensive, requires a meaningful historical sample that realistically needs to cover at least one full market cycle.

While attractive for its simplicity, there are a number of factors that complicate the third approach.

First, if spreads are measured against U.S. Treasuries, the metric may be polluted by information related to Treasuries due to their idiosyncratic behavior (e.g. scarcity effects and flight-to-safety premiums).  Structural shifts in default rates, recovery rates, and risk appetites may also cause a problem, as spreads may appear unduly thin or wide compared to past regimes.

In light of this, in this piece we will explore a similarly simple-to-calculate spread signal, but one that hopefully addresses some of these shortcomings.

Baa vs. Aaa Yields

In order to adjust for these problems, we propose looking at the steepness of the credit curve itself by comparing prime / high-grade yields versus lower-medium grade yields.  For example, we could compare Moody’s Seasoned Aaa Corporate Bond Yield and Moody’s Seasoned Baa Corporate Bond Yield.  In fact, we will use these yields for the remainder of this study.

We may be initially inclined to measure the steepness of the credit curve by taking the simple difference in yields, which we plot below.

Source: Federal Reserve of St. Louis.  Calculations by Newfound Research.

We can find a stronger mean-reverting signal, however, if we calculate the log-difference in yields.

Source: Federal Reserve of St. Louis.  Calculations by Newfound Research.

We believe this transformation is appropriate for two reasons.  First, the log transformation helps control for the highly heteroskedastic and skewed nature of credit spreads.

Second, it helps capture both the steepness and the level of the credit curve simultaneously.  For example, a 50-basis-point premium when Aaa yield is 1,000 basis points is very different than when Aaa yield is 100 basis points.  In the former case, investors may not feel any pressure to bear excess risk to achieve their return objectives, and therefore a 50-basis-point spread may be quite thin.  In the latter case, 50 basis points may represent a significant step-up in relative return level in an environment where investors have either low default expectations, high recovery expectations, high risk appetite, or some combination thereof.
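The 50-basis-point example can be checked numerically.  The short sketch below computes the log-difference for both yield environments; the function name is hypothetical and the yields are the illustrative levels from the paragraph above, not observed data.

```python
import math

def credit_steepness(baa_yield, aaa_yield):
    """Log-difference between lower-grade and higher-grade yields."""
    return math.log(baa_yield) - math.log(aaa_yield)

# The same 50bp premium in two different yield environments (yields in percent).
high_rate = credit_steepness(10.50, 10.00)  # about 0.049
low_rate = credit_steepness(1.50, 1.00)     # about 0.405
```

The simple difference is identical in both cases, while the log-difference is roughly eight times larger in the low-yield environment, which is the level effect the transformation is meant to capture.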

Another way of interpreting our signal is that it informs us about the relative decisions investors must make about their expected dispersion in terminal wealth.

Constructing the Value Strategy

With our signal in hand, we can now attempt to time credit exposure.  When our measure signals that the credit curve is historically steep, we will take credit risk.  When our signal indicates that the curve is historically flat we will avoid it.

Specifically, we will construct a dollar-neutral long/short portfolio using the Dow Jones Corporate Bond Index (“DJCORP”) and a constant maturity 5-year U.S. Treasury index (“FV”).   We will calculate a rolling z-score of our steepness measure and go long DJCORP and short FV when the z-score is positive and place the opposite trade when the z-score is negative.

In line with prior studies, we will apply an ensemble approach.  Portfolios are reformed monthly using formation periods ranging from 3-to-6 years with holding periods ranging from 1-to-6 months.  Portfolio weights for the resulting strategy are plotted below.

Source: Federal Reserve of St. Louis and Global Financial Data.  Calculations by Newfound Research.
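One way the ensemble described above might be assembled is sketched below.  This is an assumption-laden illustration rather than our exact implementation: it assumes the z-score is computed over a trailing window equal to the formation period, that positions are a simple +1 / -1 on the sign of the z-score, that a holding period of h months is modeled by averaging the last h monthly signals, and that the formation and holding ranges are stepped monthly.  All function and variable names are hypothetical.

```python
import numpy as np

def rolling_zscore(values, window):
    """Z-score of each observation against its trailing `window` observations."""
    x = np.asarray(values, dtype=float)
    z = np.full(x.shape, np.nan)
    for t in range(window - 1, len(x)):
        sample = x[t - window + 1 : t + 1]
        sd = sample.std(ddof=1)
        if sd > 0:
            z[t] = (x[t] - sample.mean()) / sd
    return z

def ensemble_position(steepness, formation_windows, holding_periods):
    """Average +1 / -1 position across every formation/holding pair.

    +1 means long corporates and short Treasuries; -1 is the opposite trade.
    """
    steepness = np.asarray(steepness, dtype=float)
    tranches = []
    for window in formation_windows:
        signal = np.sign(rolling_zscore(steepness, window))
        for hold in holding_periods:
            # Overlapping portfolios: average the last `hold` monthly signals.
            held = np.full(signal.shape, np.nan)
            for t in range(hold - 1, len(signal)):
                recent = signal[t - hold + 1 : t + 1]
                if not np.isnan(recent).any():
                    held[t] = recent.mean()
            tranches.append(held)
    stack = np.vstack(tranches)
    valid = ~np.isnan(stack)
    counts = valid.sum(axis=0)
    totals = np.where(valid, stack, 0.0).sum(axis=0)
    return np.where(counts > 0, totals / np.maximum(counts, 1), np.nan)

# Formation windows of 36 to 72 months and holding periods of 1 to 6 months,
# assuming the ranges in the text are stepped monthly.
FORMATION_WINDOWS = range(36, 73)
HOLDING_PERIODS = range(1, 7)
```

Calling `ensemble_position(steepness_series, FORMATION_WINDOWS, HOLDING_PERIODS)` on a monthly steepness series would return a position between -1 and +1 for each month with enough history.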

We should address the fact that while both corporate bond yield and index data are available back to the 1930s, we have truncated our study to ignore dates prior to 12/1949 to normalize for a post-war period.  It should be further acknowledged that the Dow Jones Corporate Bond index used in this study did not technically exist until 2002.  Prior to that date, the index return tracks a Dow Jones Bond Aggregate, which was based upon four sub-indices: high-grade rails, second-grade rails, public utilities, and industries.  This average existed from 1915 to 1976, when the number of railway bonds was no longer sufficient to maintain it and it was replaced with a new average.

Below we plot the returns of our long/short strategy.

Source: Federal Reserve of St. Louis and Global Financial Data.  Calculations by Newfound Research. Returns are hypothetical and backtested.  Returns are gross of all management fees, transaction fees, and taxes, but net of underlying fund fees.  Total return series assumes the reinvestment of all distributions.

The strategy has an annualized return of 0.84% with a volatility of 3.89%, generating a Sharpe ratio of 0.22.  Of course, long-term return statistics belie investor and manager experience, with this strategy exhibiting at least two periods of decade-plus-long drawdowns.  In fact, the strategy really has just four major return regimes: 1950 to 1970 (-0.24% annualized), 1970 to 1987 (2.59% annualized), 1987 to 2002 (-0.33% annualized), and 2002 to 2019 (1.49% annualized).

Try the strategy out in the wrong environment and we might be in for a lot of pain.
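For reference, summary statistics of the kind quoted above can be computed from a monthly return series along the following lines.  This sketch assumes that, because the portfolio is dollar-neutral, the Sharpe ratio is simply annualized return divided by annualized volatility with no risk-free rate subtracted; the function name and sample returns are hypothetical.

```python
import numpy as np

def annualized_stats(monthly_returns):
    """Return (annualized return, annualized volatility, Sharpe ratio)."""
    r = np.asarray(monthly_returns, dtype=float)
    years = len(r) / 12.0
    ann_return = np.prod(1.0 + r) ** (1.0 / years) - 1.0
    ann_vol = r.std(ddof=1) * np.sqrt(12.0)
    # Assumption: long/short returns are already excess returns.
    sharpe = ann_return / ann_vol if ann_vol > 0 else float("nan")
    return ann_return, ann_vol, sharpe

# Hypothetical year of alternating +1% / -1% months.
sample_stats = annualized_stats([0.01, -0.01] * 6)

# Consistency check on the reported figures: 0.84% / 3.89% rounds to 0.22.
reported_sharpe = 0.0084 / 0.0389
```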

Momentum to the Rescue?

It is no secret that value and momentum go together like peanut butter and jelly. Instead of tweaking our strategy to death in order to improve it, we may just find opportunity in combining it with a negatively correlated signal.

Using an ensemble model, we construct a dollar-neutral long/short momentum strategy that compares prior total returns of DJCORP and FV.  Rebalanced monthly, the portfolios use formation periods ranging from 9-to-15 months and holding periods ranging from 1-to-6 months.

Below we plot the growth of $1 in our value strategy, our momentum strategy, and a 50/50 combination of the two strategies that is rebalanced monthly.

Source: Federal Reserve of St. Louis and Global Financial Data.  Calculations by Newfound Research. Returns are hypothetical and backtested.  Returns are gross of all management fees, transaction fees, and taxes, but net of underlying fund fees.  Total return series assumes the reinvestment of all distributions.

The first thing we note is – even without calculating any statistics – the meaningful negative correlation we see in the equity curves of the value and momentum strategies.  This should give us confidence that there is the potential for significant improvement through diversification.

The momentum strategy returns 1.11% annualized with a volatility of 3.92%, generating a Sharpe ratio of 0.29.  The 50/50 combination strategy, however, returns 1.03% annualized with a volatility of just 2.16% annualized, resulting in a Sharpe ratio of 0.48.
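As a back-of-the-envelope check on the diversification benefit, we can ask what correlation between the two strategies would be consistent with the reported volatilities.  The sketch below inverts the standard two-asset portfolio variance formula and assumes constant 50/50 weights; it is an approximation from rounded summary figures, not a correlation estimated from the return series, and the function name is hypothetical.

```python
def implied_correlation(vol_a, vol_b, vol_combo, weight_a=0.5):
    """Correlation implied by two component vols and the combined vol.

    Solves var_combo = (wa*va)^2 + (wb*vb)^2 + 2*wa*wb*va*vb*rho for rho.
    """
    weight_b = 1.0 - weight_a
    own_variance = (weight_a * vol_a) ** 2 + (weight_b * vol_b) ** 2
    cross_term = 2.0 * weight_a * weight_b * vol_a * vol_b
    return (vol_combo ** 2 - own_variance) / cross_term

# Reported volatilities: value 3.89%, momentum 3.92%, 50/50 combination 2.16%.
rho = implied_correlation(0.0389, 0.0392, 0.0216)  # roughly -0.39
```

A meaningfully negative implied correlation is consistent with the visual impression from the equity curves.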

While we still see significant regime-driven behavior, the negative regimes now come at a far lower cost.

Conclusion

In this study we introduce a simple value strategy based upon the steepness of the credit curve.  Specifically, we calculated a rolling z-score on the log-difference between Moody’s Seasoned Baa and Aaa yields.  We interpreted a positive z-score as a historically steep credit curve and therefore likely one that would revert.  Similarly, when z-scores were negative, we interpreted the signal as a flat credit curve, and therefore a period during which taking credit risk is not well compensated.

Employing an ensemble approach, we generated a long/short strategy that would buy the Dow Jones Corporate Bond Index and short 5-year U.S. Treasuries when credit appeared cheap and place the opposite trade when credit appeared expensive.  We found that this strategy returned 0.84% annualized with a volatility of 3.89% from 1950 to 2019.

Unfortunately, our value signal generated significantly regime-dependent behavior with decade-long drawdowns.  This not only causes us to question the statistical validity of the signal, but also the practicality of implementing it.

Fortunately, a naively constructed momentum signal provides ample diversification.  While a combination strategy is still highly regime-driven, the drawdowns are significantly reduced.  Not only do returns meaningfully improve compared to the stand-alone value signal, but the Sharpe ratio more-than-doubles.

Unfortunately, our study leveraged a long/short construction methodology.  While this isolates the impact of active returns, long-only investors must cut return expectations of the strategy in half, as a tactical timing model can only half-implement this trade without leverage.  A long-only switching strategy, then, would only be expected to generate approximately 0.5% annualized excess return above a 50% Dow Jones Corporate Bond Index / 50% 5-Year U.S. Treasury index portfolio.

And that’s before adjustments for hindsight bias, trading costs, and manager fees.

Nevertheless, more precise implementation may lead to better results.  For example, our indices neither perfectly matched the credit spreads we evaluated, nor did they match each other’s durations.  Furthermore, while this particular implementation may not survive costs, this signal may still provide meaningful information for other credit-based strategies.

Quantitative Styles and Multi-Sector Bonds

This post is available as a PDF download here.

Summary

  • In this commentary we explore the application of several quantitative signals to a broad set of fixed income exposures.
  • Specifically, we explore value, momentum, carry, long-term reversals, and volatility signals.
  • We find that value, 3-month momentum, carry, and 3-year reversals all create attractive quantile profiles, potentially providing clues for how investors might consider pursuing higher returns or lower risk.
  • This study is by no means comprehensive and only intended to invite further research and conversation around the application of quantitative styles across fixed income exposures.

In Navigating Municipal Bonds with Factors, we employed momentum, value, carry, and low-volatility signals to generate a sector-based approach to navigating municipal bonds.

In this article, we will introduce an initial data dive into applying quantitative signals to a broader set of fixed income exposures.  Specifically, we will incorporate 17 different fixed income sectors, spanning duration, credit, and geographic exposure.

  • U.S. Treasuries: Near (3-Month), short (1-3 Year), mid (3-5 Year), intermediate (7-10 Year), and long (20+ Year).
  • Investment-Grade Corporates: Short-term, intermediate-term, and floating-rate corporate bonds.
  • High Yield: Short- and intermediate-term high yield.
  • International Government Bonds: Currency hedged and un-hedged government bonds.
  • Emerging Market: Local and US dollar denominated.
  • TIPS: Short- and intermediate-term TIPS.
  • Mortgage-Backed: Investment grade mortgage-backed bonds.

In this study, each exposure is represented by a corresponding ETF.  We extend our research prior to ETF launch by employing underlying index data the ETF seeks to track.

The quantitative styles we will explore are:

  • Momentum: Buy recent winners and sell recent losers.
  • Value: Buy cheap and sell expensive.
  • Carry: Buy high carry and sell low carry.
  • Reversal: Buy long-term losers and sell long-term winners.
  • Volatility: Buy high volatility and sell low volatility.1

The details of each style are explained in greater depth in each section below.

Note that the analysis herein is by no means meant to be prescriptive in any manner, nor is it a comprehensive review.  Rather, it is meant as a launching point for further commentaries we expect to write.

At the risk of spoiling the conclusion, below we plot the annualized returns and volatility profiles of dollar-neutral long-short portfolios.2  We can see that short-term Momentum, Value, Carry, and Volatility signals generate positive excess returns over the testing period.

Curiously, longer-term Momentum does not seem to be a profitable strategy, despite evidence of this approach being rather successful for many other asset classes.

Source: Bloomberg; Tiingo.  Calculations by Newfound Research.  Returns are hypothetical and backtested.  Returns are gross of all management fees, transaction fees, and taxes, but net of underlying fund fees.  Total return series assumes the reinvestment of all distributions.

However, these results are not achievable by most investors who may be constrained to a long-only implementation.  Even when interpreted as over- and under-weight signals, the allocations in the underlying long/short portfolios differ so greatly from benchmark exposures that they would be nearly impossible to implement.

For a long-only investor, then, what is more relevant is how these signals forecast performance of different rank orderings of portfolios.  For example, how does a portfolio of the best-ranking 3-month momentum exposures compare to a portfolio of the worst-ranking?

In the remainder of this commentary, we explore the return and risk profiles of quintile portfolios formed on each signal.  To construct these portfolios, we rank order our exposures based on the given quantitative signal and equally-weight the exposures falling within each quintile.
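The quintile construction can be sketched as follows.  With 17 exposures the quintiles cannot be equal in size, so this example assumes the extra names are assigned to the lowest-ranked groups first; that convention, the function name, and the sample scores are all hypothetical.

```python
def quintile_weights(signals, n_groups=5):
    """Rank exposures by signal and equal-weight within each group.

    signals maps exposure name -> signal value.  Returns a list of
    {name: weight} dicts ordered from lowest-ranked to highest-ranked group.
    """
    ranked = sorted(signals, key=signals.get)  # lowest signal first
    base, extra = divmod(len(ranked), n_groups)
    portfolios, start = [], 0
    for group in range(n_groups):
        # Assumption: leftover names go to the lowest-ranked groups.
        size = base + (1 if group < extra else 0)
        members = ranked[start:start + size]
        start += size
        portfolios.append({name: 1.0 / size for name in members})
    return portfolios

# Hypothetical signal values for 17 exposures.
example_signals = {"SECTOR_%02d" % i: float(i) for i in range(17)}
example_quintiles = quintile_weights(example_signals)
```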

Momentum

We generate momentum signals by computing 12-, 6-, and 3-month prior total returns to reflect slow, intermediate, and fast momentum signals.  Low-ranking exposures are those with the lowest prior total returns, while high-ranking exposures have the highest total returns.

The portfolios assume a 1-month holding period for momentum signals.  To avoid timing luck, four sub-indexes are used, each rebalancing on a different week of the month.

Annualized return and volatility numbers for the quintiles are plotted below.

A few interesting data-points stand out:

  • For 12-month prior return, the lowest quintile actually had the highest total return. However, it has a dramatically lower Sharpe ratio than the highest quintile, which only slightly underperforms it.
  • Total returns among the highest quintile increase by 150 basis points (“bps”) from 12-month to 3-month signals, and 3-month rankings create a more consistent profile of increasing total return and Sharpe ratio. This may imply that short-term signals are more effective for fixed income.

Source: Bloomberg; Tiingo.  Calculations by Newfound Research.  Returns are hypothetical and backtested.  Returns are gross of all management fees, transaction fees, and taxes, but net of underlying fund fees.  Total return series assumes the reinvestment of all distributions.

Carry

Carry is the expected excess return of an asset assuming price does not change.  For our fixed income universe, we proxy carry using yield-to-worst minus the risk-free rate.  For non-Treasury holdings, we adjust this figure for expected defaults and recovery.

For reasonably efficient markets, we would expect higher carry to imply higher return, but not necessarily higher risk-adjusted returns.  In other words, we earn higher carry as a reward for bearing more risk.

Therefore, we also calculate an alternate measure of carry: carry-to-risk.  Carry-to-risk is calculated by taking our carry measure and dividing it by recent realized volatility levels.  One way of interpreting this figure is as a forecast of the Sharpe ratio.  Our expectation is that this signal may be able to identify periods when carry is episodically cheap or rich relative to prevailing market risk.
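A minimal sketch of the carry-to-risk calculation is below.  The exact volatility lookback and the default/recovery adjustment are not specified here, so the `expected_loss` parameter, the annualization convention, and the sample inputs are assumptions made for illustration.

```python
import numpy as np

def carry_to_risk(yield_to_worst, risk_free_rate, recent_returns,
                  periods_per_year=52, expected_loss=0.0):
    """Carry divided by annualized realized volatility.

    expected_loss is a hypothetical stand-in for the default and recovery
    adjustment applied to non-Treasury holdings.
    """
    carry = yield_to_worst - risk_free_rate - expected_loss
    realized_vol = np.std(recent_returns, ddof=1) * np.sqrt(periods_per_year)
    return carry / realized_vol

# Hypothetical exposure: 5% yield-to-worst, 2% risk-free rate, weekly returns.
weekly_returns = [0.004, -0.003, 0.002, -0.005, 0.003, 0.001]
ratio = carry_to_risk(0.05, 0.02, weekly_returns)
```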

The portfolios assume a 12-month holding period for carry signals.  To avoid timing luck, 52 sub-indexes are used, each rebalancing on a different week of the year.

We see:

  • Higher carry implies a higher return as well as a higher volatility. As expected, no free lunch here.
  • Carry-to-risk does not seem to provide a meaningful signal. In fact, low carry-to-risk outperforms high carry-to-risk by 100bps annualized.
  • Volatility meaningfully declines for carry-to-risk quintiles, potentially indicating that this integrated carry/volatility signal is being too heavily driven by volatility.

Source: Bloomberg; Tiingo.  Calculations by Newfound Research.  Returns are hypothetical and backtested.  Returns are gross of all management fees, transaction fees, and taxes, but net of underlying fund fees.  Total return series assumes the reinvestment of all distributions.

Value

In past commentaries, we have used real yield as our value proxy in fixed income.  In this commentary, we deviate from that methodology slightly and use a time-series z-score of carry as our measure of value. Historically high carry levels are considered to be cheap while historically low carry levels are considered to be expensive.

The portfolios assume a 12-month holding period for value signals.  To avoid timing luck, 52 sub-indexes are used, each rebalancing on a different week of the year.

We see not only a significant increase in total return in buying cheap versus expensive holdings, but also an increase in risk-adjusted returns.

Source: Bloomberg; Tiingo.  Calculations by Newfound Research.  Returns are hypothetical and backtested.  Returns are gross of all management fees, transaction fees, and taxes, but net of underlying fund fees.  Total return series assumes the reinvestment of all distributions. 

Reversal

Reversal signals are the opposite of momentum: we expect past losers to outperform and past winners to underperform.  Empirically, reversals tend to occur over very short time horizons (e.g. 1 month) and longer-term time horizons (e.g. 3- to 5-years).  In many ways, long-term reversals can be thought of as a naive proxy for value, though there may be other behavioral and structural reasons for the historical efficacy of reversal signals.

We must be careful implementing reversal signals, however, as exposures in our universe have varying return dynamics (e.g. expected return and volatility levels).

To illustrate this problem, consider the simple two-asset example of equities and cash.  A 3-year reversal signal would sell the asset that has had the best performance over the prior 3-years and buy the asset that has performed the worst.  The problem is that we expect stocks to outperform cash due to the equity risk premium. Naively ranking on prior returns alone would have us out of equities during most bull markets.

Therefore, we must be careful in ranking assets with meaningfully different return dynamics.

(Why, then, can we do it for momentum?  In a sense, momentum is explicitly trying to exploit the relative time-series properties over a short-term horizon.  Furthermore, in a universe that contains low-risk, low-return assets, cross-sectional momentum can be thought of as an integrated process between time-series momentum and cross-sectional momentum, as the low-risk asset will bubble to the top when absolute returns are negative.)

To account for this, we use a time-series z-score of prior returns to create a reversal signal.  For example, at each point in time we calculate the current 3-year return and z-score it against all prior rolling 3-year periods.

Note that in this construction, high z-scores will reflect higher-than-normal 3-year returns and low z-scores will reflect lower-than-normal 3-year returns. Therefore, we negate the z-score to generate our signal such that low-ranked exposures reflect those we want to sell and high-ranked exposures reflect those we want to buy.
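A minimal sketch of this construction, assuming a list of evenly spaced prices and a `lookback` measured in the same units (e.g. 36 for a 3-year signal on monthly data), might look like the following. The function name, the `min_history` parameter, and the sample prices are hypothetical.

```python
from statistics import mean, pstdev

def reversal_signal(prices, lookback, min_history=2):
    """Negated z-score of the current rolling return versus prior rolling returns."""
    rolling = [
        prices[i] / prices[i - lookback] - 1.0
        for i in range(lookback, len(prices))
    ]
    signal = []
    for i, r in enumerate(rolling):
        prior = rolling[:i]
        if len(prior) < max(min_history, 1) or pstdev(prior) == 0:
            signal.append(None)  # not enough formation data yet
            continue
        z = (r - mean(prior)) / pstdev(prior)
        signal.append(-z)  # negate: unusually weak returns rank highest
    return signal

# Hypothetical price path with a one-period lookback for readability.
prices = [100.0, 110.0, 132.0, 132.0, 198.0]
signal = reversal_signal(prices, lookback=1)
```

Because each asset is compared with its own history, an asset with a structurally higher expected return (such as equities versus cash) is not penalized simply for having outperformed.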

The portfolios assume a 12-month holding period for reversal signals.  To avoid timing luck, 52 sub-indexes are used, each rebalancing on a different week of the year.

Plotting the results below for 1-, 3-, and 5-year reversal signals, we find that the 3- and 5-year signals show a meaningful increase in both total return and Sharpe ratio between the lowest and highest quintiles.

Source: Bloomberg; Tiingo.  Calculations by Newfound Research.  Returns are hypothetical and backtested.  Returns are gross of all management fees, transaction fees, and taxes, but net of underlying fund fees.  Total return series assumes the reinvestment of all distributions.

Volatility

Volatility signals are trivial to generate: we simply sort assets based on prior realized volatility.  Unfortunately, exploiting the low-volatility anomaly is difficult without leverage, as the empirically higher risk-adjusted return exhibited by low-volatility assets typically coincides with lower total returns.

For example, in the tests below the low quintile is composed mostly of short-term Treasuries and floating rate corporates.  The top quintile is allocated across local currency emerging market debt, long-dated Treasuries, high yield bonds, and unhedged international government bonds.

As a side note, for the same reason we z-scored reversal signals, we also hypothesized that z-scoring may work on volatility.  Beyond these two sentences, the results were nothing worth writing about.

Nevertheless, we can still attempt to confirm the existence of the low-volatility anomaly in our investable universe by ranking assets on their past volatility.
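Ranking on realized volatility can be sketched in a few lines. The example below assumes a dictionary of equally spaced return series per asset and assigns quintile 1 to the least volatile assets; the asset labels and return figures are made up for illustration.

```python
from statistics import stdev

def volatility_quintiles(returns_by_asset):
    """Assign each asset a quintile (1 = lowest realized volatility, 5 = highest)."""
    vols = {asset: stdev(rets) for asset, rets in returns_by_asset.items()}
    ranked = sorted(vols, key=vols.get)  # ascending by volatility
    n = len(ranked)
    return {asset: (i * 5) // n + 1 for i, asset in enumerate(ranked)}

# Hypothetical return histories for five exposures.
sample_returns = {
    "short_treasuries": [0.001, -0.001, 0.001, -0.001],
    "floating_rate": [0.002, -0.002, 0.002, -0.002],
    "core_bonds": [0.01, -0.01, 0.01, -0.01],
    "high_yield": [0.02, -0.02, 0.02, -0.02],
    "em_local": [0.05, -0.05, 0.05, -0.05],
}
quintiles = volatility_quintiles(sample_returns)
```

Since only the ordering matters for the sort, the volatilities do not need to be annualized before ranking.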

The portfolios assume a 1-month holding period for volatility signals.  To avoid timing luck, four sub-indexes are used, each rebalancing on a different week of the month.

Indeed, in plotting results we see that the lowest volatility quintiles have significantly higher realized Sharpe ratios.

Source: Bloomberg; Tiingo.  Calculations by Newfound Research.  Returns are hypothetical and backtested.  Returns are gross of all management fees, transaction fees, and taxes, but net of underlying fund fees.  Total return series assumes the reinvestment of all distributions.

Of the results plotted above, our eyes might be drawn to the results in the short-term volatility measure. It would appear that the top quintile has both a lower total return and much higher volatility than the 3rd and 4th quintiles.  This might suggest that we could improve our portfolio's risk-adjusted returns without sacrificing total return by avoiding those top-ranked assets.

Unfortunately, this is not so clear cut.  Unlike the other signals where the portfolios had meaningful turnover, these quintiles are largely stable.  This means that the results are driven more by the composition of the portfolios than the underlying signals.  For example, the 3rd and 4th quintiles combine both Treasuries and credit exposure, which allows the portfolio to realize lower volatility due to correlation.  The highest volatility quintile, on the other hand, holds both local currency emerging market debt and un-hedged international government bonds, introducing (potentially uncompensated) currency risk into the portfolio.

Thus, the takeaway may be more strategic than tactical: diversification is good and currency exposure is going to increase your volatility.

Oh – and allocating to zero-to-negatively yielding foreign bonds isn’t going to do much for your return unless currency changes bail you out.

Conclusion

In this study, we explored the application of value, momentum, carry, reversal, and volatility signals across fixed income exposures.  We found that value, 3-month momentum, carry, and 3-year reversal signals may all provide meaningful information about forward expected returns and risk.

Our confidence in this analysis, however, is potentially crippled by several points:

  • The time horizon covered is, at best, two decades, and several economic variables are constant throughout it.
  • The inflation regime over the time period was largely uniform.
  • A significant proportion of the period covered had near-zero short-term Treasury yields and negative yields in foreign government debt.
  • Reversal signals require a significant amount of formation data. For example, the 3-year reversal signal requires 6 years (i.e. 3-years of rolling 3-year returns) of data before a signal can be generated. This represents nearly 1/3rd of the data set.
  • The dispersion in return dynamics (e.g. volatility and correlation) of the underlying assets can lead to the emergence of unintended artifacts in the data that may speak more to portfolio composition than the value-add from the quantitative signal.
  • We did not test whether certain exposures or certain time periods had an outsized impact upon results.
  • We did not thoroughly test stability regions for different signals.
  • We did not test the impact of our holding period assumptions.
  • Holdings within quantile portfolios were assumed to be equally weighted.

Some of these points can be addressed simply.  Stability concerns, for example, can be addressed by testing the impact of varying signal parameterization.

Others are a bit trickier and require more creative thinking or more computational horsepower.

Testing for the outsized impact of a given exposure or a given time period, for example, can be done through sub-sampling and cross-validation techniques.  We can think of this as the application of randomness to efficiently cover our search space.

For example, below we re-create our 3-month momentum quintiles, but do so by randomly selecting only 10 of the exposures and 75% of the return period to test.   We repeat this resampling 10,000 times for each quintile and plot the distribution of annualized returns below.

Even without performing an official difference-in-means test, the separation between the low and high quintile annualized return distributions provides a clue that the performance difference between these two is more likely to be a pervasive effect rather than due to an outlier holding or outlier time period.

We can make this test more explicit by using this subset resampling technique to bootstrap a distribution of annualized returns for a top-minus-bottom quintile long/short portfolio.  Specifically, we randomly select a subset of assets and generate our 3-month momentum signals.  We construct a dollar-neutral long/short portfolio by going long assets falling in the top quintile and short assets falling in the bottom quintile.  We then select a random sub-period and calculate the annualized return.

Only 207 of the 10,000 samples fall below 0%, indicating a high statistical likelihood that the outperformance of recent winners over recent losers is not an effect dominated by a specific subset of assets or time-periods.
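The subset resampling procedure for the long/short portfolio can be sketched roughly as below. Several simplifications are assumptions of this example rather than details of our test: returns are monthly, the sub-period is a single contiguous window, the portfolio is rebalanced every month without tranching, and the synthetic data exists only to exercise the code.

```python
import random
from math import prod
from statistics import mean

def momentum_long_short_bootstrap(returns, formation=3, n_assets=10,
                                  period_frac=0.75, n_samples=1000, seed=0):
    """Annualized top-minus-bottom quintile returns over random asset/period subsets."""
    rng = random.Random(seed)
    assets = sorted(returns)
    n_periods = len(returns[assets[0]])
    window = int(n_periods * period_frac)
    if n_periods - window < formation:
        raise ValueError("not enough data for the formation period")
    k = max(1, n_assets // 5)  # number of assets per quintile
    samples = []
    for _ in range(n_samples):
        subset = rng.sample(assets, n_assets)
        start = rng.randint(formation, n_periods - window)
        growth = 1.0
        for t in range(start, start + window):
            # Trailing compounded return over the formation window.
            mom = {a: prod(1.0 + r for r in returns[a][t - formation:t]) - 1.0
                   for a in subset}
            ranked = sorted(subset, key=mom.get)
            long_ret = mean(returns[a][t] for a in ranked[-k:])
            short_ret = mean(returns[a][t] for a in ranked[:k])
            growth *= 1.0 + long_ret - short_ret  # dollar-neutral spread
        samples.append(growth ** (12.0 / window) - 1.0)
    return samples

# Synthetic data: 20 hypothetical assets with perfectly persistent returns.
synthetic = {f"A{i:02d}": [0.001 * i] * 60 for i in range(20)}
samples = momentum_long_short_bootstrap(synthetic, n_samples=200, seed=1)
share_below_zero = sum(s < 0 for s in samples) / len(samples)
```

The share of samples below zero plays the same role as the 207-of-10,000 count above: the smaller it is, the less likely the spread is driven by any one asset or time period.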

While this commentary provides a first step towards analyzing quantitative style signals across fixed income exposures, more tests need to be run to develop greater confidence in their efficacy.

Source: Bloomberg; Tiingo.  Calculations by Newfound Research.  Returns are hypothetical and backtested.  Returns are gross of all management fees, transaction fees, and taxes, but net of underlying fund fees.  Total return series assumes the reinvestment of all distributions.

 


 
