The Research Library of Newfound Research


Tactical Portable Beta


Summary

  • In this commentary, we revisit the idea of portable beta: utilizing leverage to overlay traditional risk premia on existing strategic allocations.
  • While a 1.5x levered 60/40 portfolio has historically out-performed an all equity blend with similar risk levels, it can suffer through prolonged periods of under-performance.
  • Positive correlations between stocks and bonds, inverted yield curves, and rising interest rate environments can make simply adding bond exposure on top of equity exposure a non-trivial pursuit.
  • We rely on prior research to introduce a tactical 90/60 model, which uses trend signals to govern equity exposure and value, momentum, and carry signals to govern bond exposure.
  • We find that such a model has historically exhibited returns in-line with equities with significantly lower maximum drawdown.

In November 2017, I was invited to participate in a Bloomberg roundtable discussion with Barry Ritholtz, Dave Nadig, and Ben Fulton about the future of ETFs.  I was quoted as saying,

Most of the industry agrees that we are entering a period of much lower returns for stocks and fixed income. That’s a problem for younger generations. The innovation needs to be around efficient use of capital. Instead of an ETF that holds intermediate-term Treasuries, I would like to see a U.S. Treasury ETF that uses Treasuries as collateral to buy S&P 500 futures, so you end up getting both stock and bond exposure.  By introducing a modest amount of leverage, you can take $1 and trade it as if the investor has $1.50. After 2008, people became skittish around derivatives, shorting, and leverage. But these aren’t bad things when used appropriately.

Shortly after the publication of the discussion, we penned a research commentary titled Portable Beta which extolled the potential virtues of employing prudent leverage to better exploit diversification opportunities.  For investors seeking to enhance returns, increasing beta exposure may be a more reliable approach than the pursuit of alpha.

In August 2018, WisdomTree introduced the 90/60 U.S. Balanced Fund (ticker: NTSX), which blends core equity exposure with a U.S. Treasury futures ladder to create the equivalent of a 1.5x levered 60/40 portfolio.  On March 27, 2019, NTSX was awarded ETF.com’s Most Innovative New ETF of 2018.

The idea of portable beta was not remotely unique to us.  Two anonymous Twitter users – “Jake” (@EconomPic) and “Unrelated Nonsense” (@Nonrelatedsense) – had discussed the idea several times prior to my roundtable in 2017.  They argued that such a product could be useful to free up space in a portfolio for alpha-generating ideas.  For example, an investor could hold 66.6% of their wealth in a 90/60 portfolio and use the other 33.3% of their portfolio for alpha ideas.  While the leverage is technically applied to the 60/40, the net effect would be a 60/40 portfolio with a set of alpha ideas overlaid on the portfolio. Portable beta becomes portable alpha.
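The arithmetic behind that example can be checked in a few lines. This is only an illustrative sketch: the function name and the convention of expressing weights as fractions of wealth are our own assumptions, not part of any product specification.

```python
# Hypothetical helper: exposures are expressed as fractions of total wealth.
def net_exposure(fund_weight, fund_stocks=0.90, fund_bonds=0.60):
    """Stock and bond exposure obtained by putting `fund_weight` of wealth
    into a 90/60 fund, plus the capital left over for other ideas."""
    stocks = fund_weight * fund_stocks
    bonds = fund_weight * fund_bonds
    freed_capital = 1.0 - fund_weight
    return stocks, bonds, freed_capital

stocks, bonds, freed = net_exposure(2 / 3)
print(f"stocks={stocks:.2f}, bonds={bonds:.2f}, freed capital={freed:.2f}")
# prints: stocks=0.60, bonds=0.40, freed capital=0.33
```

Two-thirds of wealth in the levered fund reproduces a 60/40 exposure while leaving one-third of capital unallocated.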

Even then, the idea was not new.  After NTSX launched, Cliff Asness, co-founder and principal of AQR Capital Management, commented on Twitter that even though he had a “22-year head start,” WisdomTree had beaten him to launching a fund.  In the tweet, he linked to an article he wrote in 1996, titled Why Not 100% Equities, wherein Cliff demonstrated that from 1926 to 1993 a 60/40 portfolio levered to the same volatility as equities achieved an excess return of 0.8% annualized above U.S. equities.  Interestingly, the amount of leverage required to match equity volatility was 155%, almost perfectly matching the 90/60 concept.

Source: Asness, Cliff. Why Not 100% Equities.  Journal of Portfolio Management, Winter 1996, Volume 22 Number 2.

Following up on Cliff’s Tweet, Jeremy Schwartz from WisdomTree extended the research out-of-sample, covering the quarter century that followed Cliff’s initial publishing date.  Over the subsequent 25 years, Jeremy found that a levered 60/40 outperformed U.S. equities by 2.6% annualized.

NTSX is not the first product to try to exploit the idea of diversification and leverage.  These ideas have been the backbone of managed futures and risk parity strategies for decades. PIMCO’s entire StocksPLUS suite – which traces its history back to 1986 – is built on these foundations.  The core strategy combines an actively managed portfolio of fixed income with 100% notional exposure in S&P 500 futures to create a 2x levered 50/50 portfolio.

The concept traces its roots back to the earliest eras of modern financial theory. Finding the maximum Sharpe ratio portfolio and gearing it to the appropriate risk level has always been considered to be the theoretically optimal solution for investors.

Nevertheless, since 2008, the words “leverage” and “derivatives” have largely been terms non grata in the realm of investment products. But that may be to the detriment of investors.

90/60 Through the Decades

While we are proponents of the foundational concepts of the 90/60 portfolio, frequent readers of our commentary will not be surprised to learn that we believe there may be opportunities to enhance the idea through tactical asset allocation.  After all, while a 90/60 may have out-performed over the long run, the short-run opportunities available to investors can deviate significantly.  The prudent allocation at the top of the dot-com bubble may have looked quite different than that at the bottom of the 2008 crisis.

To broadly demonstrate this idea, we can examine how the realized efficient frontier of stock/bond mixes has changed shape over time.  In the table below, we calculate the Sharpe ratio for different stock/bond mixes realized in each decade from the 1920s through present.

Source: Global Financial Data.  Calculations by Newfound Research.  Returns are hypothetical and backtested.  Returns are gross of all fees, transaction costs, and taxes.  Returns assume the reinvestment of all distributions.   Bonds are the GFD Indices USA 10-Year Government Bond Total Return Index and Stocks are the S&P 500 Total Return Index (with GFD Extension).  Sharpe ratios are calculated with returns excess of the GFD Indices USA Total Return T-Bill Index.  You cannot invest in an index.  2010s reflect a partial decade through 4/2019.

We should note here that the original research proposed by Asness (1996) assumed a bond allocation to an Ibbotson corporate bond series while we employ a constant maturity 10-year U.S. Treasury index.  While this leads to lower total returns in our bond series, we do not believe it meaningfully changes the conclusions of our analysis.

We can see that while the 60/40 portfolio has a higher realized Sharpe ratio than the 100% equity portfolio in eight of ten decades, it has a lower Sharpe ratio in two consecutive decades: the 1950s and 1960s.  And the 1970s were not a ringing endorsement.

In theory, a higher Sharpe ratio for a 60/40 portfolio would imply that an appropriately levered version would lead to higher realized returns than equities at the same risk level.  Knowing the appropriate leverage level, however, is non-trivial, requiring an estimate of equity volatility.  Furthermore, leverage requires margin collateral and the application of borrowing rates, which can create a drag on returns.

Even if we conveniently ignore these points and assume a constant 90/60, we can still see that such an approach can go through lengthy periods of relative under-performance compared to buy-and-hold equity.  Below we plot the annualized rolling 3-year returns of a 90/60 portfolio (assuming U.S. T-Bill rates for leverage costs) minus 100% equity returns.  We can clearly see that the 1950s through the 1980s were largely a period where applying such an approach would have been frustrating.

Source: Global Financial Data.  Calculations by Newfound Research.  Returns are hypothetical and backtested.  Returns are gross of all fees, transaction costs, and taxes.   Bonds are the GFD Indices USA 10-Year Government Bond Total Return Index and Stocks are the S&P 500 Total Return Index (with GFD Extension).  The 90/60 portfolio invests 150% each month in the 60/40 portfolio and -50% in the GFD Indices USA Total Return T-Bill Index.  You cannot invest in an index.
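The construction described in the note above (150% in the 60/40 portfolio, -50% in T-Bills) and the rolling 3-year comparison can be sketched as follows. This is a minimal illustration using randomly generated placeholder returns, not the Global Financial Data series; the function names and the simulated return parameters are assumptions made purely for the example.

```python
import numpy as np

def levered_6040_returns(stock, bond, tbill, leverage=1.5):
    """Monthly returns of `leverage` times a 60/40 mix, financed by a
    (leverage - 1) short position in T-Bills."""
    mix = 0.6 * stock + 0.4 * bond
    return leverage * mix - (leverage - 1.0) * tbill

def rolling_annualized(returns, window=36):
    """Annualized compound return over each trailing `window`-month span."""
    growth = np.concatenate(([1.0], np.cumprod(1.0 + returns)))
    total = growth[window:] / growth[:-window]
    return total ** (12.0 / window) - 1.0

# Synthetic placeholder data (NOT the series used in this commentary).
rng = np.random.default_rng(0)
n_months = 240
stock = rng.normal(0.008, 0.045, n_months)
bond = rng.normal(0.004, 0.020, n_months)
tbill = np.full(n_months, 0.003)

r_9060 = levered_6040_returns(stock, bond, tbill)
# Rolling 3-year annualized return of the 90/60 minus that of 100% equities.
spread = rolling_annualized(r_9060) - rolling_annualized(stock)
print(spread.min(), spread.max())
```

Negative values of `spread` correspond to 3-year windows in which the levered portfolio lagged buy-and-hold equity.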

Poor performance of the 90/60 portfolio in this era is due to two effects.

First, 10-year U.S. Treasury rates rose from approximately 4% to north of 15%.  While a constant maturity index would constantly roll into higher interest bonds, it would have to do so by selling old holdings at a loss.  Constantly harvesting price losses created a headwind for the index.

This is compounded in the 90/60 by the fact that the yield curve over this period spent significant time in an inverted state, meaning that the cost of leverage exceeded the yield earned on 40% of the portfolio, leading to negative carry. This is illustrated in the chart below, with –T-Bills– realizing a higher total return over the period than –Bonds–.

Source: Global Financial Data.  Calculations by Newfound Research.  Returns are hypothetical and backtested.  Returns are gross of all fees, transaction costs, and taxes.  Returns assume the reinvestment of all distributions.   T-Bills are the GFD Indices USA Total Return T-Bill Index, Bonds are the GFD Indices USA 10-Year Government Bond Total Return Index, and Stocks are the S&P 500 Total Return Index (with GFD Extension). You cannot invest in an index.

This is all arguably further complicated by the fact that while a 1.5x levered 60/40 may closely approximate the risk level of a 100% equity portfolio over the long run, it may be a far cry from it over the short-run.  This may be particularly true during periods where stocks and bonds exhibit positive realized correlations as they did during the 1960s through 1980s.  This can occur when markets are more pre-occupied with inflation risk than economic risk.  As inflationary fears abated and economic risk became the foremost concern in the 1990s, correlations between stocks and bonds flipped.

Thus, during the 1960s-1980s, a 90/60 portfolio exhibited realized volatility levels in excess of an all-equity portfolio, while in the 2000s it has been below.

This all invites the question: should our levered allocation necessarily be static?

Getting Tactical with a 90/60

We might consider two approaches to creating a tactical 90/60.

The first is to abandon the 90/60 model outright for a more theoretically sound approach. Specifically, we could attempt to estimate the maximum Sharpe ratio portfolio, and then apply the appropriate leverage such that we hit either (1) a constant target volatility or (2) the volatility of equities.  This would require us to not only accurately estimate the expected excess returns of stocks and bonds, but also their volatilities and correlations. Furthermore, when the Sharpe optimal portfolio is highly conservative, notional exposure far exceeding 200% may be necessary to hit target volatility levels.
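To make the mechanics of this first approach concrete, the sketch below computes a two-asset maximum Sharpe ratio portfolio and the leverage needed to bring it up to equity volatility. Every input (expected excess returns, volatilities, correlation) is a made-up assumption chosen for illustration, not an estimate from our data, and the function names are hypothetical.

```python
import numpy as np

def max_sharpe_weights(excess_returns, cov):
    """Unlevered tangency weights, proportional to inv(cov) @ excess_returns.
    Assumes the raw weights sum to a positive number."""
    raw = np.linalg.solve(cov, excess_returns)
    return raw / raw.sum()

def leverage_to_target(weights, cov, target_vol):
    """Notional multiple that scales the portfolio to `target_vol`."""
    port_vol = np.sqrt(weights @ cov @ weights)
    return target_vol / port_vol

# Illustrative assumptions only: (stocks, bonds).
mu = np.array([0.05, 0.02])     # expected annual excess returns
vol = np.array([0.16, 0.06])    # annual volatilities
corr = 0.2                      # stock/bond correlation
off_diag = corr * vol[0] * vol[1]
cov = np.array([[vol[0] ** 2, off_diag],
                [off_diag, vol[1] ** 2]])

w = max_sharpe_weights(mu, cov)
lev = leverage_to_target(w, cov, target_vol=vol[0])
print("weights:", np.round(w, 3), "leverage:", round(float(lev), 2))
```

With these particular inputs the optimal mix is bond-heavy, and matching equity volatility requires notional exposure above 200%, which is the practical concern raised above.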

In the second approach, equity and bond exposure would each be adjusted tactically, without regard for the other exposure.  While less theoretically sound, one might interpret this approach as saying, “we generally want exposure to the equity and bond risk premia over the long run, and we like the 60/40 framework, but there might be certain scenarios whereby we believe the expected return does not justify the risk.”  The downside to this approach is that it may sacrifice potential diversification benefits between stocks and bonds.

Given the original concept of portable beta is to increase exposure to the risk premia we’re already exposed to, we prefer the second approach.  We believe it more accurately reflects the notion of trying to provide long-term exposure to return-generating risk premia while trying to avoid the significant and prolonged drawdowns that can be realized with buy-and-hold approaches.

Equity Signals

To manage exposure to the equity risk premium, our preferred method is the application of trend following signals in an approach we call trend equity.  We will approximate this class of strategies with our Newfound Research U.S. Trend Equity Index.

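To determine whether our signals are able to achieve their goal of “protect and participate” with the underlying risk premia, we will plot their regime-conditional betas.  To do this, we construct a simple linear model in which strategy returns are regressed on the returns of the underlying asset, with a separate beta estimated for each regime.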

We define a bear regime as the worst 16% of monthly returns, a bull regime as the best 16% of monthly returns, and a normal regime as the remaining 68% of months. Note that the bottom and top 16th percentiles are selected to reflect one standard deviation.
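One plausible way to estimate regime-conditional betas under these definitions is sketched below. The specification is an assumption on our part for illustration (a single intercept plus one slope per regime, fit by ordinary least squares) and is not necessarily the exact model behind the charts; the data is synthetic.

```python
import numpy as np

def regime_betas(strategy, market, tail=0.16):
    """Fit strategy = alpha + sum over regimes of beta_r * market * 1[regime r].
    Regimes are defined by the bottom and top `tail` of market returns."""
    lo, hi = np.quantile(market, [tail, 1.0 - tail])
    bear = market <= lo
    bull = market >= hi
    normal = ~(bear | bull)
    X = np.column_stack([
        np.ones_like(market),   # intercept
        market * bear,          # market return in bear months, else 0
        market * normal,        # market return in normal months, else 0
        market * bull,          # market return in bull months, else 0
    ])
    coef, *_ = np.linalg.lstsq(X, strategy, rcond=None)
    return {"alpha": coef[0], "bear": coef[1], "normal": coef[2], "bull": coef[3]}

# Synthetic check: build a strategy with known betas and recover them.
rng = np.random.default_rng(1)
market = rng.normal(0.007, 0.04, 600)
tail = 0.16
lo, hi = np.quantile(market, [tail, 1.0 - tail])
true_beta = np.where((market <= lo) | (market >= hi), 0.5, 0.75)
strategy = true_beta * market

betas = regime_betas(strategy, market)
print({k: round(float(v), 3) for k, v in betas.items()})
```

Because the synthetic strategy is noise-free, the fit recovers the betas it was built with; on real data the estimates would carry sampling error.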

Below we plot the strategy’s conditional betas relative to U.S. equities.

We can see that trend equity has a normal regime beta to U.S. equities of approximately 0.75 and a bear regime beta of approximately 0.5, in-line with expectations that such a strategy might capture 70-80% of the upside of U.S. equities in a bull market and 40-50% of the downside in a prolonged bear market. Trend equity’s beta to U.S. equities in a bull regime is close to its bear regime beta, which is consistent with the idea that trend equity as a style has historically sacrificed the best returns to avoid the worst.

Bond Signals

To govern exposure to the bond risk premium, we prefer an approach based upon a combination of quantitative, factor-based signals.  We’ve written about many of these signals over the last two years; specifically in Duration Timing with Style Premia (June 2017), Timing Bonds with Value, Momentum, and Carry (January 2018), and A Carry-Trend-Hedge Approach to Duration Timing (October 2018).  In these three articles we explore various mixes of value, momentum, carry, flight-to-safety, and bond risk premium measures as potential signals for timing duration exposure.

We will not belabor this commentary unnecessarily by repeating past research.  Suffice it to say that we believe there is sufficient evidence that value (deviation in real yield), momentum (prior returns), and carry (term spread) can be utilized as effective timing signals and in this commentary are used to construct bond indices where allocations are varied between 0-100%.  Curious readers can pursue further details of how we construct these signals in the commentaries above.

As before, we can determine conditional regime betas for strategies based upon our signals.

We can see that our value, momentum, and carry signals all exhibit an asymmetric beta profile with respect to 10-year U.S. Treasury returns.  Carry and momentum exhibit an increase in bull market betas while value exhibits a decrease in bear market beta.

Combining Equity and Bond Signals into a Tactical 90/60

Given these signals, we will construct a tactical 90/60 portfolio comprising 90% trend equity, 20% bond value, 20% bond momentum, and 20% bond carry. When notional exposure exceeds 100%, leverage cost is assumed to be U.S. T-Bills.  Taken together, the portfolio has a large breadth of potential configurations, ranging from 100% T-Bills to a 1.5x levered 60/40 portfolio.
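The range of configurations can be illustrated with a small sketch. Here each sleeve is represented by a signal between 0 and 1 giving the fraction of that sleeve that is invested; that representation, and the function names, are simplifying assumptions for the example rather than a description of how the underlying indices are built.

```python
def tactical_9060_weights(equity_signal, bond_signals):
    """Map signals in [0, 1] to (stock, bond, T-Bill) portfolio weights.

    equity_signal: fraction of the 90% equity sleeve that is invested.
    bond_signals: three fractions (value, momentum, carry), each scaling
    a 20% bond sleeve.
    """
    stock_w = 0.90 * equity_signal
    bond_w = sum(0.20 * s for s in bond_signals)
    tbill_w = 1.0 - stock_w - bond_w  # negative means borrowing at the T-Bill rate
    return stock_w, bond_w, tbill_w

def portfolio_return(weights, r_stock, r_bond, r_tbill):
    """One-period return given weights and asset returns."""
    stock_w, bond_w, tbill_w = weights
    return stock_w * r_stock + bond_w * r_bond + tbill_w * r_tbill

fully_on = tactical_9060_weights(1.0, (1.0, 1.0, 1.0))
fully_off = tactical_9060_weights(0.0, (0.0, 0.0, 0.0))
print([round(w, 2) for w in fully_on])   # prints [0.9, 0.6, -0.5]
print([round(w, 2) for w in fully_off])  # prints [0.0, 0.0, 1.0]
```

With every signal on, the weights are those of a 1.5x levered 60/40 financed at the T-Bill rate; with every signal off, the portfolio sits entirely in T-Bills.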

But what is the appropriate benchmark for such a model?

In the past, we have argued that the appropriate benchmark for trend equity is a 50% stock / 50% cash benchmark, as it not only reflects the strategic allocation to equities empirically seen in return decompositions, but it also allows both positive and negative trend calls to contribute to active returns.

Similarly, we would argue that the appropriate benchmark for our tactical 90/60 model is not a 90/60 itself – which reflects the upper limit of potential capital allocation – but rather a 45% stock / 30% bond / 25% cash mix.  Though, for good measure we might also consider a bit of hand-waving and just use a 60/40 as a generic benchmark as well.

Below we plot the annualized returns versus maximum drawdown for different passive and active portfolio combinations from 1974 to present (reflecting the full period of time when strategy data is available for all tactical signals).  We can see that not only does the tactical 90/60 model (with both trend equity and tactical bonds) offer a return in line with U.S. equities over the period, it does so with significantly less drawdown (approximately half).  Furthermore, the tactical 90/60 exceeded trend equity and 60/40 annualized returns by 102 and 161 basis points respectively.

These improvements to the return and risk were achieved with the same amount of capital commitment as in the other allocations. That’s the beauty of portable beta.

Source: Federal Reserve of St. Louis, Kenneth French Data Library, and Newfound Research.  Calculations by Newfound Research.  Returns are hypothetical and backtested.  Returns are gross of all fees, transaction costs, and taxes.  Returns assume the reinvestment of all distributions.   You cannot invest in an index.

Of course, full-period metrics can obscure what an investor’s experience may actually be like.  Below we plot rolling 3-year annualized returns of U.S. equities, the 60/40 mix, trend equity, and the tactical 90/60.

Source: Federal Reserve of St. Louis, Kenneth French Data Library, and Newfound Research.  Calculations by Newfound Research.  Returns are hypothetical and backtested.  Returns are gross of all fees, transaction costs, and taxes.  Returns assume the reinvestment of all distributions.   You cannot invest in an index.

The tactical 90/60 model out-performed a 60/40 in 68% of rolling 3-year periods and the trend equity model in 71% of rolling 3-year periods.  The tactical 90/60, however, only out-performs U.S. equities in 35% of rolling 3-year periods, with the vast majority of relative out-performance emerging during significant equity drawdown periods.

For investors already allocated to trend equity strategies, portable beta – or portable tactical beta – may represent an alternative source of potential return enhancement.  Rather than seeking opportunities for alpha, portable beta allows for an overlay of more traditional risk premia, which may be more reliable from an empirical and academic standpoint.

The potential for increased returns is illustrated below in the rolling 3-year annualized return difference between the tactical 90/60 model and the Newfound U.S. Trend Equity Index.

Source: Federal Reserve of St. Louis, Kenneth French Data Library, and Newfound Research.  Calculations by Newfound Research.  Returns are hypothetical and backtested.  Returns are gross of all fees, transaction costs, and taxes.  Returns assume the reinvestment of all distributions.   You cannot invest in an index.

From Theory to Implementation

In practice, it may be easier to acquire leverage through the use of futures contracts. For example, applying portable bond beta on top of an existing trend equity strategy may be achieved through the use of 10-year U.S. Treasury futures.

Below we plot the growth of $1 in the Newfound U.S. Trend Equity Index and a tactical 90/60 model implemented with Treasury futures.  Annualized return increases from 7.7% to 8.9% and annualized volatility declines from 9.7% to 8.5%.  Finally, maximum drawdown decreases from 18.1% to 14.3%.

We believe the increased return reflects the potential return enhancement benefits from introducing further exposure to traditional risk premia, while the reduction in risk reflects the benefit achieved through greater portfolio diversification.

Source: Quandl and Newfound Research.  Calculations by Newfound Research.  Returns are hypothetical and backtested.  Returns are gross of all fees, transaction costs, and taxes.  Returns assume the reinvestment of all distributions.   You cannot invest in an index.

It should be noted, however, that a levered constant maturity 10-year U.S. Treasury index and 10-year U.S. Treasury futures are not the same.  The futures contracts are specified such that eligible securities for delivery include Treasury notes with a remaining term to maturity of between 6.5 and 10 years.  This means that the investor short the futures contract has the option of which Treasury note to deliver across a wide spectrum of securities with potentially varying characteristics.

In theory, this investor will always choose to deliver the bond that is cheapest. Thus, Treasury futures prices will reflect price changes of this so-called cheapest-to-deliver bond, which often does not reflect an actual on-the-run 10-year Treasury note.

Treasury futures therefore utilize a “conversion factor” invoicing system referenced to the 6% futures contract standard.  Pricing also reflects a basis adjustment that reflects the coupon income a cash bond holder would receive minus financing costs (i.e. the cost of carry) as well as the value of optionality provided to the futures seller.

Below we plot monthly returns of 10-year U.S. Treasury futures versus the excess returns of a constant maturity 10-year U.S. Treasury index.  We can see that the futures had a beta of approximately 0.76 over the nearly 20-year period, which closely aligns with the conversion factor over the period.

Source: Quandl and the Federal Reserve of St. Louis.  Calculations by Newfound Research.

Despite these differences, futures can represent a highly liquid and cost-effective means of implementing a portable beta strategy.  It should be further noted that having a lower “beta” over the last two decades has not necessarily implied a lower return as the basis adjustment can have a considerable impact.  We demonstrate this in the graph below by plotting the returns of continuously-rolled 10-year U.S. Treasury futures (rolled on open interest) and the excess return of a constant maturity 10-year U.S. Treasury index.

Source: Quandl and Newfound Research.  Calculations by Newfound Research.  Returns are hypothetical and backtested.  Returns are gross of all fees, transaction costs, and taxes.  Returns assume the reinvestment of all distributions.   You cannot invest in an index.

Conclusion

In a low return environment, portable beta may be a necessary tool for investors to generate the returns they need to hit their financial goals and reduce their risk of failing slow.

Historically, a 90/60 portfolio has outperformed equities with a similar level of risk. However, the short-term dynamics between stocks and bonds can make the volatility of a 90/60 portfolio significantly higher than a simple buy-and-hold equity portfolio. Rising interest rates and inverted yield curves can further confound the potential benefits versus an all-equity portfolio.

Since constant leverage is not a guarantee and we do not know how the future will play out, moving beyond standard portable beta implementations to tactical solutions may augment the potential for risk management and lead to a smoother ride over the short-term.

Getting over the fear of using leverage and derivatives may be an uphill battle for investors, but when used appropriately, these tools can make portfolios work harder. Risks that are known and compensated with premiums can be prudent to take for those willing to venture out and bear them.

If you are interested in learning how Newfound applies the concepts of tactical portable beta to its mandates, please reach out (info@thinknewfound.com).

Factor Fimbulwinter


Summary

  • Value investing continues to experience a trough of sorrow. In particular, the traditional price-to-book factor has failed to establish new highs since December 2006 and sits in a 25% drawdown.
  • While price-to-book has been the academic measure of choice for 25+ years, many practitioners have begun to question its value (pun intended).
  • We have also witnessed the turning of the tides against the size premium, with many practitioners no longer considering it to be a valid stand-alone anomaly. This comes 35+ years after being first published.
  • With this in mind, we explore the evidence that would be required for us to dismiss other, already established anomalies.  Using past returns to establish prior beliefs, we simulate out forward environments and use Bayesian inference to adjust our beliefs over time, recording how long it would take for us to finally dismiss a factor.
  • We find that for most factors, we would have to live through several careers to finally witness enough evidence to dismiss them outright.
  • Thus, while factors may be established upon a foundation of evidence, their forward use requires a bit of faith.

In Norse mythology, Fimbulvetr (commonly referred to in English as “Fimbulwinter”) is a great and seemingly never-ending winter.  It continues for three seasons – long, horribly cold years that stretch on longer than normal – with no intervening summers.  It is a time of bitterly cold, sunless days where hope is abandoned and discord reigns.

This winter-to-end-all-winters is eventually punctuated by Ragnarok, a series of events leading up to a great battle that results in the ultimate death of the major gods, destruction of the cosmos, and subsequent rebirth of the world.

Investment mythology is littered with Ragnarok-styled blow-ups and we often assume the failure of a strategy will manifest as sudden catastrophe.  In most cases, however, failure may more likely resemble Fimbulwinter: a seemingly never-ending winter in performance with returns blown to-and-fro by the harsh winds of randomness.

Value investors can attest to this.  In particular, the disciples of price-to-book have suffered greatly as of late, with “expensive” stocks having outperformed “cheap” stocks for over a decade.  The academic interpretation of the factor sits nearly 25% below its prior high-water mark seen in December 2006.

Expectedly, a large number of articles have been written about the death of the value factor.  Some question the factor itself, while others simply argue that price-to-book is a broken implementation.

But are these simply retrospective narratives, driven by a desire to have an explanation for a result that has defied our expectations?  Consider: if price-to-book had exhibited positive returns over the last decade, would we be hearing from nearly as large a number of investors explaining why it is no longer a relevant metric?

To be clear, we believe that many of the arguments proposed for why price-to-book is no longer a relevant metric are quite sound. The team at O’Shaughnessy Asset Management, for example, wrote a particularly compelling piece that explores how changes to accounting rules have led book value to become a less relevant metric in recent decades.1

Nevertheless, we think it is worth taking a step back, considering an alternate course of history, and asking ourselves how it would impact our current thinking.  Often, we look back on history as if it were the obvious course.  “If only we had better prior information,” we say to ourselves, “we would have predicted the path!”2  Rather, we find it more useful to look at the past as just one realized path of many that could have happened, none of which were preordained.  Randomness happens.

With this line of thinking, the poor performance of price-to-book can just as easily be explained by a poor roll of the dice as it can be by a fundamental break in applicability.  In fact, we see several potential truths based upon performance over the last decade:

  1. This is all normal course performance variance for the factor.
  2. The value factor works, but the price-to-book measure itself is broken.
  3. The price-to-book measure is over-crowded in use, and thus the “troughs of sorrow” will need to be deeper than ever to get weak hands to fold and pass the alpha to those with the fortitude to hold.
  4. The value factor never existed in the first place; it was an unfortunate false positive that saturated the investing literature and broad narrative.

The problem at hand is two-fold: (1) the statistical evidence supporting most factors is considerable and (2) the decade-to-decade variance in factor performance is substantial.  Taken together, you run into a situation where a mere decade of underperformance likely cannot undo the previously established significance.  Just as frustrating is the opposite scenario. Consider that these two statements are not mutually exclusive: (1) price-to-book is broken, and (2) price-to-book generates positive excess return over the next decade.

In investing, factor return variance is large enough that the proof is not in the eating of the short-term return pudding.

The small-cap premium is an excellent example of the difficulty in discerning, in real time, the integrity of an established factor.  The anomaly has failed to establish a meaningful new high since it was originally published in 1981.  Only in the last decade – nearly 30 years later – have the tides of the industry finally seemed to turn against it as an established anomaly and potential source of excess return.

Thirty years.

The remaining broadly accepted factors – e.g. value, momentum, carry, defensive, and trend – have all been demonstrated to generate excess risk-adjusted returns across a variety of economic regimes, geographies, and asset classes, creating a great depth of evidence supporting their existence. What evidence, then, would make us abandon our faith in the Church of Factors?

To explore this question, we ran a simple experiment for each factor.  Our goal was to estimate how long it would take to determine that a factor was no longer statistically significant.

Our assumption is that the salient features of each factor’s return pattern will remain the same (i.e. autocorrelation, conditional heteroskedasticity, skewness, kurtosis, et cetera), but the forward average annualized return will be zero since the factor no longer “works.”

Towards this end, we ran the following experiment: 

  1. Take the full history for the factor and calculate prior estimates for mean annualized return and standard error of the mean.
  2. De-mean the time-series.
  3. Randomly select a 12-month chunk of returns from the time series and use the data to perform a Bayesian update to our mean annualized return.
  4. Repeat step 3 until the annualized return is no longer statistically non-zero at a 99% confidence threshold.

For each factor, we ran this test 10,000 times, creating a distribution that tells us how many years into the future we would have to wait until we were certain, from a statistical perspective, that the factor is no longer significant.
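For readers who want to see the mechanics, the steps above can be sketched roughly as follows. The details here are assumptions made for illustration and may differ from the actual test: we use a conjugate normal update with the observation variance taken from the factor's own history, sum monthly returns within each 12-month chunk, apply a two-sided 99% threshold, and run the sketch on a synthetic factor history rather than real factor data.

```python
import numpy as np

Z_99 = 2.5758  # two-sided 99% critical value of the standard normal

def years_until_dismissal(monthly, rng, max_years=2000):
    """Years of zero-mean forward data needed before the posterior mean is
    no longer significantly non-zero. Returns `max_years` if never reached."""
    monthly = np.asarray(monthly, dtype=float)
    ann_var = monthly.var(ddof=1) * 12.0          # annual variance (simple scaling)
    mean = monthly.mean() * 12.0                  # step 1: prior annualized mean
    var = ann_var / (len(monthly) / 12.0)         # step 1: prior squared standard error
    if abs(mean) / np.sqrt(var) < Z_99:
        return 0                                  # not significant to begin with
    demeaned = monthly - monthly.mean()           # step 2: de-mean the series
    for year in range(1, max_years + 1):
        # Step 3: draw a random contiguous 12-month chunk and update beliefs.
        start = rng.integers(0, len(demeaned) - 11)
        obs = demeaned[start:start + 12].sum()
        post_precision = 1.0 / var + 1.0 / ann_var
        mean = (mean / var + obs / ann_var) / post_precision
        var = 1.0 / post_precision
        # Step 4: stop once the mean is no longer distinguishable from zero.
        if abs(mean) / np.sqrt(var) < Z_99:
            return year
    return max_years

# Synthetic factor history (NOT real factor data): 90 years of monthly returns.
rng = np.random.default_rng(42)
history = rng.normal(0.004, 0.035, 12 * 90)
trials = [years_until_dismissal(history, rng) for _ in range(200)]
median_years = float(np.median(trials))
print("median years until dismissal:", median_years)
```

Sampling contiguous 12-month chunks, rather than individual months, is what lets the resampled data retain short-horizon features of the original series such as autocorrelation and volatility clustering.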

Sixty-seven years.

Based upon this experiment, sixty-seven years is the median number of years we would have to wait before officially declaring price-to-book (“HML,” as it is known in the literature) to be dead.3  At the risk of being morbid, we’re far more likely to die before the industry finally sticks a fork in price-to-book.

We perform this experiment for a number of other factors – including size (“SMB” – “small-minus-big”), quality (“QMJ” – “quality-minus-junk”), low-volatility (“BAB” – “betting-against-beta”), and momentum (“UMD” – “up-minus-down”) – and see much the same result.  It will take decades before sufficient evidence mounts to dethrone these factors.

                              HML    SMB [4]    QMJ    BAB    UMD
Median Years-until-Failure     67      43       132    284    339

 

Now, it is worth pointing out that these figures for a factor like momentum (“UMD”) might be a bit skewed due to the design of the test.  If we examine the long-run returns, we see a fairly docile return profile punctuated by sudden and significant drawdowns (often called “momentum crashes”).

Since a large proportion of the cumulative losses are contained in these short but pronounced drawdown periods, de-meaning the time-series ultimately means that the majority of 12-month periods actually exhibit positive returns.  In other words, by selecting random 12-month samples, we actually expect a high frequency of those samples to have a positive return.

For example, using this process, 49.1%, 47.6%, 46.7%, and 48.8% of rolling 12-month periods are positive for the HML, SMB, QMJ, and BAB factors, respectively.  For UMD, that number is 54.7%.  Furthermore, if you drop the worst 5% of rolling 12-month periods for UMD, the average positive period is 1.4x larger than the average negative period.  Taken together, not only are you more likely to select a positive 12-month period, but those positive periods are, on average, 1.4x larger than the negative periods you will pick, except for the rare (<5%) cases.
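For readers who want to reproduce this kind of diagnostic on their own data, here is a rough sketch. It assumes monthly simple returns, approximates each 12-month return as the sum of de-meaned monthly returns, and uses a hypothetical function name; it is not the calculation behind the percentages quoted above.

```python
from statistics import mean


def rolling_12m_stats(monthly_returns, drop_worst=0.05):
    """Return (share of positive rolling 12-month windows,
    ratio of average gain to average loss after dropping the worst windows).

    Assumption: a window's return is the sum of its de-meaned monthly returns.
    """
    mu = mean(monthly_returns)
    demeaned = [r - mu for r in monthly_returns]
    windows = [sum(demeaned[i:i + 12]) for i in range(len(demeaned) - 11)]

    share_positive = sum(w > 0 for w in windows) / len(windows)

    # Drop the worst `drop_worst` fraction of windows before comparing sizes.
    kept = sorted(windows)[int(len(windows) * drop_worst):]
    gains = [w for w in kept if w > 0]
    losses = [-w for w in kept if w < 0]
    ratio = mean(gains) / mean(losses) if gains and losses else float("nan")
    return share_positive, ratio
```

A series with steady small gains and occasional sharp losses will show a positive share above 50% even after de-meaning, which is the skew described for momentum.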

The process of the test was selected to incorporate the salient features of each factor.  However, in the case of momentum, it may lead to somewhat outlandish results.

Conclusion

While an evidence-based investor should be swayed by the weight of the data, the simple fact is that most factors are so well established that the majority of current practitioners will likely go their entire careers without experiencing evidence substantial enough to dismiss any of the anomalies.

Therefore, in many ways, there is a certain faith required to use them going forward. Yes, these are ideas and concepts derived from the data.  Yes, we have done our best to test their robustness out-of-sample across time, geographies, and asset classes.  Yet we must also admit that there is a non-zero probability, however small it is, that these are false positives: a fact we may not have sufficient evidence to address until several decades hence.

And so a bit of humility is warranted.  Factors will not suddenly stand up and declare themselves broken.  And those that are broken will still appear to work from time-to-time.

Indeed, the death of a factor will be more Fimbulwinter than Ragnarok: not so violent as to be the end of days, but enough to cause pain and frustration among investors.

 

Addendum

We have received a large number of inbound notes about this commentary, which fall along two primary lines of questioning.  We want to address these points.

How were the tests impacted by the Bayesian inference process?

The results of the tests within this commentary are rather astounding.  We did seek to address some of the potential flaws of the methodology we employed, but by and large we feel the overarching conclusion remains on a solid foundation.

While we only presented the results of the Bayesian inference approach in this commentary, as a check we actually tested two other approaches:

  1. A Bayesian inference approach assuming that forward returns would be a random walk with constant variance (based upon historical variance) and zero mean.
  2. Forward returns were simulated using the same bootstrap approach, but the factor was being discovered for the first time and the entire history was being evaluated for its significance.

The two tests were an effort to isolate the effects of the different components of our test.

What we found was that while the reported figures changed, the overall  magnitude did not.  In other words, the median death-date of HML may not have been 67 years, but the order of magnitude remained much the same: decades.

Stepping back, these results were somewhat a foregone conclusion.  We would not expect an effect that has been determined to be statistically significant over a hundred year period to unravel in a few years.  Furthermore, we would expect a number of scenarios that continue to bolster the statistical strength just due to randomness alone.

Why are we defending price-to-book?

The point of this commentary was not to defend price-to-book as a measure.  Rather, it was to bring up a larger point.

As a community, quantitative investors often leverage statistical significance as a defense for the way we invest.

We think that is a good thing.  We should look at the weight of the evidence.  We should be data driven.  We should try to find ideas that have proven to be robust over decades of time and when applied in different markets or with different asset classes.  We should want to find strategies that are robust to small changes in parameterization.

Many quants would argue (including us among them), however, that there also needs to be a why.  Why does this factor work?  Without the why, we run the risk of glorified data mining.  With the why, we can choose for ourselves whether we believe the effect will continue going forward.

Of course, there is nothing that prevents the why from being pure narrative fallacy.  Perhaps we have simply woven a story into a pattern of facts.

With price-to-book, one might argue we have done the exact opposite.  The effect, technically, remains statistically significant and yet plenty of ink has been spilled as to why it shouldn’t work in the future.

The question we must answer, then, is, “when does statistically significant apply and when does it not?”  How can we use it as a justification in one place and completely ignore it in others?

Furthermore, if we are going to rely on hundreds of years of data to establish significance, how can we determine when something is “broken” if the statistical evidence does not support it?

Price-to-book may very well be broken.  But that is not the point of this commentary.  The point is simply that the same tools we use to establish and defend factors may prevent us from tearing them down.

 

Timing Bonds with Value, Momentum, and Carry

This post is available as a PDF download here.

Summary

  • Bond timing has been difficult for the past 35 years as interest rates have declined, especially since bonds started the period with high coupons.
  • With low current rates and higher durations, the stage may be set for systematic, factor-based bond investing.
  • Strategies such as value, momentum, and carry have done well historically, especially on a risk-adjusted basis.
  • Diversifying across these three strategies and employing prudent leverage takes advantage of differences in the processes and the information contained in their joint decisions.

This commentary is a slight re-visit and update to a commentary we wrote last summer, Duration Timing with Style Premia[1].  The models we use here are similar in nature, but have been updated with further details and discussion, warranting a new piece.

Historically Speaking, This is a Bad Idea

Let’s just get this out of the way up front: the results of this study are probably not going to look great.

Since interest rates peaked in September 1981, the excess return of a constant maturity 10-year U.S. Treasury bond index has been 3.6% annualized with only 7.3% volatility and a maximum drawdown of 16.4%.  In other words, about as close to a straight line up and to the right as you can get.

Source: Federal Reserve of St. Louis.  Calculations by Newfound Research.

With the benefit of hindsight, this makes sense.  As we demonstrated in Did Declining Rates Actually Matter?[2], the vast majority of bond index returns over the last 30+ years have been a result of the high average coupon rate.  High average coupons kept duration suppressed, meaning that changes in rates produced less volatile movements in bond prices.

Source: Federal Reserve of St. Louis.  Calculations by Newfound Research.

Ultimately, we estimate that roll return and benefits from downward shifts in the yield curve only accounted for approximately 30% of the annualized return.

Put another way, whenever you got “out” of bonds over this period, there was a very significant opportunity cost you were experiencing in terms of foregone interest payments, which accounted for 70% of the total return.

If we use this excess return as our benchmark, we’ve made the task nearly impossible for ourselves.  Consider that if we are making “in or out” tactical decisions (i.e. no leverage or shorting) and our benchmark is fully invested at all times, we can only outperform due to our “out” calls.  Relative to the long-only benchmark, we get no credit for correct “in” calls since correct “in” calls mean we are simply keeping up with the benchmark.  (Note: Broadly speaking, this highlights the problems with applying traditional benchmarks to tactical strategies.)  In a period of consistently positive returns, our “out” calls must be very accurate, in fact probably unrealistically accurate, to be able to outperform.

When you put this all together, we’re basically asking, “Can you create a tactical strategy that can only outperform based upon its calls to get out of the market over a period of time when there was never a good time to sell?”

The answer, barring some serious data mining, is probably, “No.”

This Might Now be a Good Idea

Yet this idea might have legs.

Since the 10-year rate peaked in 1981, the duration of a constant maturity 10-year U.S. bond index has climbed from 4.8 to 8.7.  In other words, bonds are now 1.8x more sensitive to changes in interest rates than they were 35 years ago.

If we decompose bond returns in the post-crisis era, we can see that shifts in the yield curve have played a large role in year-to-year performance.  The simple intuition is that as coupons get smaller, they are less effective as cushions against rate volatility.

Higher durations and lower coupons are a potential double whammy when it comes to fixed income volatility.

Source: Federal Reserve of St. Louis.  Calculations by Newfound Research.

With rates low and durations high, strategies like value, momentum, and carry may afford us more risk-managed access to fixed income.

Timing Bonds with Value

Following the standard approach taken in most literature, we will use real yields as our measure of value.  Specifically, we will estimate real yield by taking the current 10-year U.S. Treasury rate minus the 10-year forecasted inflation rate from Philadelphia Federal Reserve’s Survey of Professional Forecasters.[3]

To come up with our value timing signal, we will compare real yield to a 3-year exponentially weighted average of real yield.

Here we need to be a bit careful.  With a secular decline in real yields over the last 30 years, comparing current real yield against a trailing average of real yield will almost surely lead to an overvalued conclusion, as the trailing average will likely be higher.

Thus, we need to de-trend twice.  We first subtract real yield from the trailing average, and then subtract this difference from a trailing average of differences.  Note that if there is no secular change in real yields over time, this second step should have zero impact. What this is measuring is the deviation of real yields relative to any linear trend.
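The double de-trending can be sketched as follows. This is an illustration under our own assumptions: monthly observations, a 36-month span for both exponentially weighted averages, and the common `alpha = 2 / (span + 1)` smoothing convention. The function names are hypothetical.

```python
def ewma(values, span):
    """Exponentially weighted moving average, seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out, avg = [], values[0]
    for v in values:
        avg = alpha * v + (1 - alpha) * avg
        out.append(avg)
    return out


def value_deviation(real_yields, span=36):
    """Deviation of real yield from 'fair value' after de-trending twice.

    Negative output: real rates are below fair value (bonds look overvalued).
    """
    anchor = ewma(real_yields, span)
    # First difference: trailing average minus current real yield.
    first = [a - y for a, y in zip(anchor, real_yields)]
    # Second step: trailing average of those differences minus the difference.
    first_avg = ewma(first, span)
    return [fa - f for fa, f in zip(first_avg, first)]
```

With a constant real yield, or one that falls along a straight line, the output settles at zero once the averages have warmed up, which matches the statement that the measure captures deviations relative to a linear trend.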

After both of these steps, we are left with an estimate of how far our real rates are away from fair value, where fair value is defined by our particular methodology rather than any type of economic analysis.  When real rates are below our fair value estimate, we believe they are overvalued and thus expect rates to go up.  Similarly, when rates are above our fair value estimate, we believe they are undervalued and thus expect them to go down.

Source: Federal Reserve of St. Louis.  Philadelphia Federal Reserve.  Calculations by Newfound Research.

Before we can use this valuation measure as our signal, we need to take one more step.  In the graph above, we see that the deviation from fair value in September 1993 was approximately the same as it was in June 2003: -130bps (implying that rates were 130bps below fair value and therefore bonds were overvalued).  However, in 1993, rates were at about 5.3% while in 2003 rates were closer to 3.3%.  Furthermore, duration was about 0.5 higher in 2003 than it was in 1993.

In other words, a -130bps deviation from fair value does not mean the same thing in all environments.

One way of dealing with this is by forecasting the actual bond return over the next 12 months, including any coupons earned, by assuming real rates revert to fair value (and taking into account any roll benefits due to yield curve steepness).  This transformation leaves us with an actual forecast of expected return.

We need to be careful, however, as our question of whether to invest or not is not simply based upon whether the bond index has a positive expected return.  Rather, it is whether it has a positive expected return in excess of our alternative investment.  In this case, that is “cash.”  Here, we will proxy cash with a constant maturity 1-year U.S. Treasury index.

Thus, we need to net out the expected return from the 1-year position, which is just its yield. [4]
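As a rough sketch of this transformation, the function below combines yield, roll, and re-valuation into a 12-month expected excess return. It relies on first-order duration approximations that are our own simplification, and every parameter name (including the 9-year yield used for roll-down) is hypothetical.

```python
def expected_excess_return(yield_10y, yield_9y, yield_1y, deviation, duration):
    """First-order 12-month expected return of the 10-year index over cash.

    deviation: real rate minus fair value (e.g. -0.013 for -130bps).
    duration:  approximate duration applied to both roll and re-valuation.
    """
    carry = yield_10y
    # Roll-down: after a year the bond is priced off the 9-year point.
    roll = duration * (yield_10y - yield_9y)
    # Rates below fair value are expected to rise by |deviation|, a price loss.
    revaluation = duration * deviation
    return carry + roll + revaluation - yield_1y


# Illustrative inputs only: a 5.3% 10-year rate sitting 130bps below fair value.
example = expected_excess_return(0.053, 0.052, 0.035, -0.013, 7.5)
```

With these illustrative inputs the expected excess return is roughly -7.2%, so the same -130bps deviation would translate into a different forecast at a different yield level or duration.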

Source: Federal Reserve of St. Louis.  Philadelphia Federal Reserve.  Calculations by Newfound Research.

While the differences here are subtle, had our alternative position been something like a 5-year U.S. Treasury Index, we may see much larger swings as the impact of re-valuation and roll can be much larger.

Using this total expected return, we can create a simple timing model that goes long the 10-year index and short cash when expected excess return is positive and short the 10-year index and long cash when expected excess return is negative.  As we are forecasting our returns over a 1-year period, we will employ a 1-year hold with 52 overlapping portfolios to mitigate the impact of timing luck.
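One way to think about the overlapping portfolios: each week a new tranche is opened using that week's signal and held for 52 weeks, so the aggregate position is the average of the most recent 52 signals. The sketch below assumes weekly signals of +1 or -1 and treats unfilled tranches as cash during the warm-up period; the function name is hypothetical.

```python
from collections import deque


def overlapping_position(signals, n_tranches=52):
    """Aggregate position from equally sized, staggered tranches.

    signals: one signal per rebalance date (+1 long, -1 short), oldest first.
    """
    window = deque(maxlen=n_tranches)  # the signals of the live tranches
    positions = []
    for s in signals:
        window.append(s)  # open a new tranche; the oldest one expires
        positions.append(sum(window) / n_tranches)
    return positions
```

If the signal flips from +1 to -1, the aggregate position moves from fully long to fully short gradually over 52 weeks rather than on a single date, which is what reduces sensitivity to the choice of rebalance date.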

We plot the results of the strategy below.

Source: Federal Reserve of St. Louis.  Philadelphia Federal Reserve.  Calculations by Newfound Research.  Results are hypothetical and backtested.  Past performance is not a guarantee of future results.  Returns are gross of all fees (including management fees, transaction costs, and taxes).  Returns assume the reinvestment of all income and distributions.

The value strategy return matches the 10-year index excess return nearly exactly (2.1% vs 2.0%) with just 70% of the volatility (5.0% vs 7.3%) and 55% of the max drawdown (19.8% versus 36.2%).

Timing Bonds with Momentum

For all the hoops we had to jump through with value, the momentum strategy will be fairly straightforward.

We will simply look at the trailing 12-1 month total return of the index versus the alternative (e.g. the 10-year index vs. the 1-year index) and invest in the security that has outperformed and short the other.  For example, if the 12-1 month total return for the 10-year index exceeds that of the 1-year index, we will go long the 10-year and short the 1-year, and vice versa.
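A minimal sketch of the signal, assuming month-end total return index levels and reading "12-1" as the return from twelve months ago to one month ago (skipping the most recent month). The function names are hypothetical.

```python
def momentum_12_1(levels):
    """Return from 12 months ago to 1 month ago.

    levels: month-end total return index levels, oldest first (at least 13).
    """
    return levels[-2] / levels[-13] - 1.0


def momentum_signal(levels_10y, levels_1y):
    """+1: long the 10-year, short the 1-year. -1: the reverse."""
    spread = momentum_12_1(levels_10y) - momentum_12_1(levels_1y)
    return 1 if spread > 0 else -1
```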

Since momentum tends to decay quickly, we will use a 1-month holding period, implemented with four overlapping portfolios.

Source: Federal Reserve of St. Louis.  Philadelphia Federal Reserve.  Calculations by Newfound Research.  Results are hypothetical and backtested.  Past performance is not a guarantee of future results.  Returns are gross of all fees (including management fees, transaction costs, and taxes).  Returns assume the reinvestment of all income and distributions.

(Note that this backtest starts earlier than the value backtest because it only requires 12 months of returns to create a trading signal versus 6 years of data – 3 for the value anchor and 3 to de-trend the data – for the value score.)

Compared to the buy-and-hold approach, the momentum strategy increases annualized return by 0.5% (1.7% versus 1.2%) while closely matching volatility (6.7% versus 6.9%) and having less than half the drawdown (20.9% versus 45.7%).

Of course, it cannot be ignored that the momentum strategy has largely gone sideways since the early 1990s.  In contrast to how we created our bottom-up value return expectation, this momentum approach is a very blunt instrument.  In fact, using momentum this way means that returns due to differences in yield, roll yield, and re-valuation are all captured simultaneously.  We can really think of decomposing our momentum signal as:

10-Year Return – 1-Year Return = (10-Year Yield – 1-Year Yield) + (10-Year Roll – 1-Year Roll) + (10-Year Shift – 1-Year Shift)

Our momentum score is indiscriminately assuming momentum in all the components.  Yet when we actually go to put on our trade, we do not need to assume momentum will persist in the yield and roll differences: we have enough data to measure them explicitly.

With this framework, we can isolate momentum in the shift component by removing yield and roll return expectations from total returns.

Source: Federal Reserve of St. Louis.  Calculations by Newfound Research.

Ultimately, the difference in signals is minor for our use of 10-year versus 1-year, though it may be far less so in cases like trading the 10-year versus the 5-year.  The actual difference in resulting performance, however, is more pronounced.

Source: Federal Reserve of St. Louis.  Philadelphia Federal Reserve.  Calculations by Newfound Research.  Results are hypothetical and backtested.  Past performance is not a guarantee of future results.  Returns are gross of all fees (including management fees, transaction costs, and taxes).  Returns assume the reinvestment of all income and distributions.

Ironically, by doing worse mid-period, the adjusted momentum long/short strategy appears to be more consistent in its return from the early 1990s through present.  We’re certain this is more noise than signal, however.

Timing Bonds with Carry

Carry is the return we earn by simply holding the investment, assuming everything else stays constant.  For a bond, this would be the yield-to-maturity.  For a constant maturity bond index, this would be the coupon yield (assuming we purchase our bonds at par) plus any roll yield we capture.

Our carry signal, then, will simply be the difference in yields between the 10-year and 1-year rates plus the difference in expected roll return.

For simplicity, we will assume roll over a 1-year period, which makes the expected roll of the 1-year bond zero.  Thus, this really becomes, more or less, a signal to be long the 10-year when the yield curve is positively sloped, and long the 1-year when it is negatively sloped.
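In code, the carry signal might look like the sketch below. The roll-down term uses a first-order duration approximation and a 9-year yield input, both of which are our own assumptions for illustration; the names are hypothetical.

```python
def carry_signal(yield_10y, yield_9y, yield_1y, duration):
    """Carry of the 10-year over the 1-year across a 12-month horizon.

    Returns (carry, position), where position is +1 (long 10-year,
    short 1-year) when carry is positive and -1 otherwise.
    The 1-year bond's roll over a 1-year horizon is taken to be zero.
    """
    roll_10y = duration * (yield_10y - yield_9y)
    carry = (yield_10y - yield_1y) + roll_10y
    return carry, (1 if carry > 0 else -1)
```

For an upward-sloping curve both terms are positive and the signal is long the 10-year; for an inverted curve both terms are negative and the signal flips.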

As we are forecasting returns over the next 12-month period, we will use a 12-month holding period and implement with 52 overlapping portfolios.

Source: Federal Reserve of St. Louis.  Philadelphia Federal Reserve.  Calculations by Newfound Research.  Results are hypothetical and backtested.  Past performance is not a guarantee of future results.  Returns are gross of all fees (including management fees, transaction costs, and taxes).  Returns assume the reinvestment of all income and distributions.

Again, were we comparing the 10-year versus the 5-year instead of the 10-year versus the 1-year, the roll can have a large impact.  If the curve is fairly flat between the 5- and 10-year rates, but gets steep between the 5- and the 1-year rates, then the roll expectation from the 5-year can actually overcome the yield difference between the 5- and the 10-year rates.

Building a Portfolio of Strategies

With three separate methods for timing bonds, we can likely benefit from process diversification by constructing a portfolio of the approaches.  The simplest method to do so is to simply give each strategy an equal share.  Below we plot the results.

Source: Federal Reserve of St. Louis.  Philadelphia Federal Reserve.  Calculations by Newfound Research.  Results are hypothetical and backtested.  Past performance is not a guarantee of future results.  Returns are gross of all fees (including management fees, transaction costs, and taxes).  Returns assume the reinvestment of all income and distributions.

Indeed, by looking at per-strategy performance, we can see a dramatic jump in Information Ratio and an exceptional reduction in maximum drawdown.  In fact, the maximum drawdown of the equal weight approach is below that of any of the individual strategies, highlighting the potential benefit of diversifying away conflicting investment signals.

| Strategy | Annualized Return | Annualized Volatility | Information Ratio | Max Drawdown |
|---|---|---|---|---|
| 10-Year Index Excess Return | 2.0% | 7.3% | 0.27 | 36.2% |
| Value L/S | 2.0% | 5.0% | 0.41 | 19.8% |
| Momentum L/S | 1.9% | 6.9% | 0.27 | 20.9% |
| Carry L/S | 2.5% | 6.6% | 0.38 | 20.1% |
| Equal Weight | 2.3% | 4.0% | 0.57 | 10.2% |

Source: Federal Reserve of St. Louis.  Philadelphia Federal Reserve.  Calculations by Newfound Research.  Results are hypothetical and backtested.  Past performance is not a guarantee of future results.  Returns are gross of all fees (including management fees, transaction costs, and taxes).  Returns assume the reinvestment of all income and distributions.  Performance measured from 6/1974 to 1/2018, representing the full overlapping investment period of the strategies.

One potential way to improve upon the portfolio construction is by taking into account the actual covariance structure among the strategies (correlations shown in the table below).  We can see that, historically, momentum and carry have been fairly positively correlated while value has been independent, if not slightly negatively correlated.  Therefore, an equal-weight approach may not be taking full advantage of the diversification opportunities presented.

| | Value L/S | Momentum L/S | Carry L/S |
|---|---|---|---|
| Value L/S | 1.0 | -0.2 | -0.1 |
| Momentum L/S | -0.2 | 1.0 | 0.6 |
| Carry L/S | -0.1 | 0.6 | 1.0 |

To avoid making any assumptions about the expected returns of the strategies, we will construct a portfolio where each strategy contributes equally to the overall risk profile (“ERC”).  So as to avoid look-ahead bias, we will use an expanding window to compute our covariance matrix (seeding with at least 5 years of data).  While the weights vary slightly over time, the result is a portfolio where the average weights are 43% value, 27% momentum, and 30% carry.
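To make the ERC idea concrete, the sketch below solves for equal-risk-contribution weights with a simple cyclical coordinate descent. The solver choice is ours, as are the function names. The covariance matrix is built from the full-sample volatilities and rounded correlations quoted in this commentary, so it is a static illustration and not the expanding-window calculation described above.

```python
import math


def erc_weights(cov, sweeps=500):
    """Equal-risk-contribution weights via cyclical coordinate descent.

    Each update solves a*w_i^2 + b*w_i - c = 0 for w_i > 0, which is the
    first-order condition of 0.5*w'Cw - c*sum(log(w_i)).
    """
    n = len(cov)
    w = [1.0 / n] * n
    budget = 1.0 / n
    for _ in range(sweeps):
        for i in range(n):
            a = cov[i][i]
            b = sum(cov[i][j] * w[j] for j in range(n) if j != i)
            w[i] = (-b + math.sqrt(b * b + 4 * a * budget)) / (2 * a)
    total = sum(w)
    return [x / total for x in w]


def risk_contributions(w, cov):
    """Each asset's contribution to portfolio variance: w_i * (C w)_i."""
    n = len(w)
    return [w[i] * sum(cov[i][j] * w[j] for j in range(n)) for i in range(n)]


# Illustrative inputs: value, momentum, carry (full-sample, rounded figures).
vols = [0.050, 0.069, 0.066]
corr = [[1.0, -0.2, -0.1], [-0.2, 1.0, 0.6], [-0.1, 0.6, 1.0]]
cov = [[vols[i] * vols[j] * corr[i][j] for j in range(3)] for i in range(3)]
weights = erc_weights(cov)
```

With these inputs the value strategy receives the largest weight, since it has the lowest volatility and is negatively correlated with the other two.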

The ERC approach matches the equal-weight approach in annualized return, but reduces annualized volatility from 4.2% to 3.8%, thereby increasing the information ratio from 0.59 to 0.64.  The maximum drawdown also falls from 10.2% to 8.7%.

A second step we can take is to try to use the “collective intelligence” of the strategies to set our risk budget.  For example, we can have our portfolio target the long-term volatility of the 10-year Index Excess Return, but scale this target between 0-2x depending on how invested we are.

For example, if the strategies are, in aggregate, only 20% invested, then our target volatility would be 0.4x that of the long-term volatility.  If they are 100% invested, though, then we would target 2x the long-term volatility.  When the strategies are providing mixed signals, we will simply target the long-term volatility level.

Unfortunately, such an approach requires going beyond 100% notional exposure, often requiring 2x – if not 3x – leverage when current volatility is low.  That makes this system less useful in the context of “bond timing” since we are now placing a bet on current volatility remaining constant and saying that our long-term volatility is an appropriate target.

One way to limit the leverage is to increase how much we are willing to scale our risk target, but truncate our notional exposure at 100% per leg.  For example, we can scale our risk target between 0-4x.  This may seem very risky (indeed, an asymmetric bet), but since we are clamping our notional exposure to 100% per leg, we should recognize that we will only hit that risk level if current volatility is greater than 4x that of the long-term average and all the strategies recommend full investment.

With a little mental arithmetic, the approach is equivalent to saying: “multiply the weights by 4x and then scale based on current volatility relative to historical volatility.”  By clamping weights between -100% and +100%, the volatility targeting really does not come into play until current volatility is 4x that of long-term volatility.  In effect, we leg into our trades more quickly, but de-risk when volatility spikes to abnormally high levels.
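Our reading of this rule can be written as a one-line calculation. The sketch below is an interpretation and not a specification: `aggregate_weight` is the net position recommended by the strategies (between -1 and +1), and the function and parameter names are hypothetical.

```python
def scaled_exposure(aggregate_weight, long_term_vol, current_vol,
                    max_scale=4.0, cap=1.0):
    """Scale the aggregate signal, adjust for volatility, then clamp.

    aggregate_weight: net recommended position in [-1, +1].
    max_scale:        upper end of the risk-target multiplier (4 in the text).
    cap:              maximum notional exposure per leg (100%).
    """
    raw = max_scale * aggregate_weight * long_term_vol / current_vol
    return max(-cap, min(cap, raw))
```

When the strategies are fully invested, exposure stays at 100% until current volatility exceeds 4x the long-term level. Setting `max_scale=2.0` and `cap=float("inf")` gives the uncapped 0-2x variant, in which a 20% aggregate position at normal volatility produces a 0.4x exposure.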

Source: Federal Reserve of St. Louis.  Philadelphia Federal Reserve.  Calculations by Newfound Research.  Results are hypothetical and backtested.  Past performance is not a guarantee of future results.  Returns are gross of all fees (including management fees, transaction costs, and taxes).  Returns assume the reinvestment of all income and distributions.

Compared to the buy-and-hold model, the variable risk ERC model increases annualized returns by 90bps (2.4% to 3.3%), reduces volatility by 260bps (7.6% to 5.0%), doubles the information ratio (0.31 to 0.66) and halves the maximum drawdown (30% to 15%).

There is no magic to the choice of “4” above: it is just an example.  In general, we can say that as the number goes higher, the strategy will approach a binary in-or-out system and the volatility scaling will have less and less impact.

Conclusion

Bond timing has been hard for the past 35 years as interest rates have declined. Small current coupons do not provide nearly the cushion against rate volatility that investors have been used to, and these lower rates mean that bonds are also exposed to higher duration.

These two factors are a potential double whammy when it comes to fixed income volatility.

This can open the door for systematic, factor-based bond investing.

Value, momentum, and carry strategies have all historically outperformed a buy-and-hold bond strategy on a risk adjusted basis despite the bond bull market.  Diversifying across these three strategies and employing prudent leverage takes advantage of differences in the processes and the information contained in their joint decisions.

We should point out that in the application of this approach, there were multiple periods of time in the backtest where the strategy went years without being substantially invested.  A smooth, nearly 40-year equity curve tells us very little about what it is actually like to sit on the sidelines during these periods and we should not underestimate the emotional burden of using such a timing strategy.

Even with low rates and high rate movement sensitivity, bonds can still play a key role within a portfolio. Going forward, however, it may be prudent for investors to consider complementary risk-management techniques within their bond sleeve.

 


 

[1] https://blog.thinknewfound.com/2017/06/duration-timing-style-premia/

[2] https://blog.thinknewfound.com/2017/04/declining-rates-actually-matter/

[3] Prior to the availability of the 10-year inflation estimate, the 1-year estimate is utilized; prior to the 1-year inflation estimate availability, the 1-year GDP price index estimate is utilized.

[4] This is not strictly true, as it largely depends on how the constant maturity indices are constructed.  For example, if they are rebalanced on a monthly basis, we would expect that re-valuation and roll would have impact on the 1-year index return.  We would also have to alter the horizon we are forecasting over as we are assuming we are rolling into new bonds (with different yields) more frequently.

Duration Timing with Style Premia

This post is available as a PDF download here.

Summary

  • In a rising rate environment, conventional wisdom says to shorten duration in bond portfolios.
  • Even as rates rise in general, the influence of central banks and expectations for inflation can create short term movements in the yield curve that can be exploited using systematic style premia.
  • Value, momentum, carry, and an explicit measure of the bond risk premium all produce strong absolute and risk-adjusted returns for timing duration.
  • Since these methods are reasonably diversified to each other, combining factors using either a mixed or integrated approach can mitigate short-term underperformance in any given factor leading to more robust duration timing.

In past research commentaries, we have demonstrated that the current level of interest rates is much more important than the future change in interest rates when it comes to long-term bond index returns[1].

That said, short-term changes in rates may present an opportunity for investors to enhance return or mitigate risk.  Specifically, by timing our duration exposure, we can try to increase duration during periods of falling rates and decrease duration during periods of rising rates.

In timing our duration exposure, we are effectively trying to time the bond risk premium (“BRP”).  The BRP is the expected extra return earned from holding longer-duration government bonds over shorter-term government bonds.

In theory, if investors are risk neutral, the return an investor receives from holding a current long-duration bond to maturity should be equivalent to the expected return of rolling 1-period bonds over the same horizon.  For example, if we buy a 10-year bond today, our return should be equal to the return we would expect from annually rolling 1-year bond positions over the next 10 years.

Risk averse investors will require a premium for the uncertainty associated with rolling over the short-term bonds at uncertain future interest rates.

In an effort to time the BRP, we explore the tried-and-true style premia: value, carry, and momentum.  We also seek to explicitly measure BRP and use it as a timing mechanism.

To test these methods, we will create long/short portfolios that trade a 10-year constant maturity U.S. Treasury index and a 3-month constant maturity U.S. Treasury index.  While we do not expect most investors to implement these strategies in a long/short fashion, a positive return in the strategy will imply successful duration timing.  Therefore, instead of implementing these strategies directly, we can use them to inform how much duration risk we should take (e.g. if a strategy is long a 10-year index and short a 3-month index, it implies a long-duration position and would inform us to extend duration risk within our long-only portfolio).  In evaluating these results as a potential overlay, the average profit, volatility, and Sharpe ratio can be thought of as alpha, tracking error, and information ratio, respectively.

As a general warning, we should be cognizant of the fact that we know long duration was the right trade to make over the last three decades.  As such, hindsight bias can play a big role in this sort of research, as we may be subtly biased towards approaches that are naturally long duration.  In an effort to combat this effect, we will attempt to stick to standard academic measures of value, carry, and momentum within this space (see, for example, Ilmanen (1997)[2]).

Timing with Value

Following the standard approach in most academic literature, we will use “real yield” as our proxy of bond valuation.  To estimate real yield, we will use the current 10-year rate minus a survey-based estimate for 10-year inflation (from the Philadelphia Federal Reserve’s Survey of Professional Forecasters)[3].

If the real yield is positive (negative), we will go long (short) the 10-year and short (long) the 3-month.  We will hold the portfolio for 1 year (using 12 overlapping portfolios).

It is worth noting that the value model has been predominantly long duration for the first 25 years of the sample period.  While real yield may make an appropriate cross-sectional value measure, its applicability as a time-series value measure is questionable given the lack of trades made by this strategy.

One potential solution is to perform a rolling z-score on the value measure, to determine relative richness versus some normalized local history.  In at least one paper, we have seen a long-term “normal” level established as an anchor point.  With the complete benefit of hindsight, however, we know that such an approach would ultimately lead to a short-duration position over the last 15 years during the period of secular decline in real rates.
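For reference, a rolling z-score can be sketched as below. The window length is left as a parameter, and the choice to include the current observation in its own window is our assumption; the function name is hypothetical.

```python
from statistics import mean, stdev


def rolling_zscore(values, window):
    """Z-score of each observation against its trailing window.

    The window includes the current observation; entries without enough
    history are returned as None.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
            continue
        history = values[i + 1 - window:i + 1]
        sd = stdev(history)
        out.append((values[i] - mean(history)) / sd if sd > 0 else 0.0)
    return out
```

Note that in a steadily declining series every observation sits below its trailing mean, so the z-score stays persistently negative, which is the short-duration bias described above.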

For example, Ilmanen and Sayood (2002)[4] compare real yield versus its previous-decade average when trading 7- to 10-year German Bunds.  As expected, the result is unprofitable.

Timing with Momentum

How to measure momentum within fixed income seems to be up for some debate.  Some measures include:

  • Change in bond yields (e.g. Ilmanen (1997))
  • Total return of individual bonds (e.g. Kolanovic and Wei (2015)[5] and Brooks and Moskowitz (2017)[6])
  • Total return of bond indices (or futures) (e.g. Asness, Moskowitz, and Pedersen (2013)[7], Durham (2013)[8], and Hurst, Ooi, and Pedersen (2014)[9])

In our view, the approaches have varying trade-offs:

  • While empirical evidence suggests that nominal interest rates can exhibit secular trends, rate evolution is most frequently modeled as mean-reverting. Our research suggests that very short-term momentum can be effective, but leads to a significant amount of turnover.
  • The total return of individual bonds makes sense if we plan on running a cross-sectional bond model (i.e. identifying individual bonds), but is less applicable if we want to implement with a constant maturity index.
  • The total return of a bond index may capture past returns that are attributable to securities that have been recently removed.

We think it is worth noting that the latter two methods can capture yield curve effects beyond shift, including roll return, steepening and curvature changes.  In fact, momentum in general may even be able to capture other effects such as flight-to-safety and liquidity (supply-demand) factors.  This may be a positive or negative thing depending on your view of where momentum is originating from.

As our intention is to ultimately invest using products that follow constant maturity indices, we choose to compare the total return of bond indices.

Specifically, we will compute the 12-1 month return of the 10-year index and subtract the 12-1 month return of the 3-month index.  If the return is positive (negative), we will go long (short) the 10-year and short (long) the 3-month.
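As a rough sketch of this signal in Python: the convention assumed here is that the 12-1 month return runs from twelve months ago to one month ago (skipping the most recent month), and the function names and index levels are hypothetical.

```python
def twelve_minus_one_return(levels):
    """12-1 month return from month-end index levels (oldest first).

    Assumed convention: the return from 12 months ago to 1 month ago,
    i.e. the trailing year excluding the most recent month.
    """
    if len(levels) < 13:
        raise ValueError("need at least 13 month-end levels")
    return levels[-2] / levels[-13] - 1.0


def momentum_signal(long_levels, short_levels):
    """+1: long 10-year / short 3-month; -1: the reverse; 0: no position."""
    spread = twelve_minus_one_return(long_levels) - twelve_minus_one_return(short_levels)
    return (spread > 0) - (spread < 0)


# Hypothetical month-end total return index levels, oldest first
long_levels = [100 + i for i in range(13)]         # stand-in for the 10-year index
short_levels = [100 + 0.1 * i for i in range(13)]  # stand-in for the 3-month index
signal = momentum_signal(long_levels, short_levels)
```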

 

Timing with Carry

We define the carry to be the term spread (or slope) of the yield curve, measured as the 10-year rate minus the 3-month rate.

A steeper curve has two implications.  First, if there is a premium for bearing duration risk, longer-dated bonds should offer a higher yield than shorter-dated bonds.  Hence, we would expect a steeper curve to be correlated with a higher BRP.

Second, all else held equal, a steeper curve implies a higher roll return for the constant maturity index.  So long as the spread is positive, we will remain invested in the longer duration bonds.

Similar to the value strategy, we can see that term-spread was largely positive over the entire period, favoring a long-duration position.  Again, this calls into question the efficacy of using term spread as a timing model since we didn’t see much timing.

Similar to value, we could employ a z-scoring method or compare the measure to a long-term average.  Ilmanen and Sayood (2002) find such an approach profitable in 7- to 10-year German Bunds.  We similarly find comparing current term-spread versus its 10-year average to be a profitable strategy, though annualized return falls by 200bp.  The increased number of trades, however, may give us more confidence in the sustainability of the model.
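The difference between the raw and the demeaned carry rule can be sketched as follows. The function names are ours, the data are hypothetical, and excluding the current observation from the trailing average is an assumption.

```python
def sign(x):
    """Return +1, -1, or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def carry_signal(long_yield, short_yield):
    """Raw carry: stay long duration while the term spread is positive."""
    return sign(long_yield - short_yield)


def demeaned_carry_signal(spread_history, lookback=120):
    """Carry measured relative to its own trailing average.

    `spread_history` holds monthly term spreads, oldest first, ending with
    the current value. The current spread is compared with the mean of the
    prior `lookback` months (120 months = 10 years).
    """
    prior = spread_history[-(lookback + 1):-1]
    if len(prior) < lookback:
        raise ValueError("not enough history for the trailing average")
    return sign(spread_history[-1] - sum(prior) / len(prior))


# Hypothetical case: the spread is positive but below its 10-year average
history = [2.0] * 120 + [1.0]
raw = carry_signal(3.0, 2.0)                # +1: curve is upward sloping
relative = demeaned_carry_signal(history)   # -1: curve is flatter than usual
```

In this example the two rules disagree, which is exactly where the demeaned version generates the additional trades.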

One complicating factor to the carry strategy is that rate steepness simultaneously captures both the expectation of rising short rates as well as an embedded risk premium.  In particular, evidence suggests that mean-reverting rate expectations dominate steepness when short rates are exceptionally low or high.  Anecdotally, this may be due to the fact that the front end of the curve is determined by central bank policy while the back end is determined by inflation expectations.  In Expected Returns, Antti Ilmanen highlights that the steepness of the yield curve and a de-trended short-rate have an astoundingly high correlation of -0.79.

While a steep curve may be a positive sign for the roll return that can be captured (and our carry strategy), it may simultaneously be a negative sign if flattening is expected (which would erode the roll return).  The fact that the term spread simultaneously captures both of these effects can lead to confusing interpretations.

We can see that, generally, term spread does a good job of predicting forward 12-month realized returns for our carry strategy, particularly post 2000.  However, having two sets of expectations embedded into a single measure can lead to potentially poor interpretations in the extreme.

 

 

Explicitly Estimating the Bond Risk Premium

While value, momentum, and carry strategies employ different measures that seek to exploit the time-varying nature of the BRP, we can also try to explicitly measure the BRP itself.  We mentioned in the introduction that the BRP is compensation that an investor demands to hold a long-dated bond instead of simply rolling short-dated bonds.

One way of approximating the BRP, then, is to subtract the expected average 1-year rate over the next decade from the current 10-year rate.

While the current 10-year rate is easy to find, the expected average 1-year rate over the next decade is a bit more complicated.  Fortunately, the Philadelphia Federal Reserve’s Survey of Professional Forecasters asks for that explicit data point.  Using this information, we can extract the BRP.

When the BRP is positive (negative) – implying that we expect to earn a positive (negative) return for bearing term risk –  we will go long (short) the 10-year index and short (long) the 3-month index.  We will hold the position for one year (using 12 overlapping portfolios).
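A minimal sketch of this calculation, with hypothetical inputs standing in for the 10-year rate and the survey expectation; the function names are ours.

```python
def bond_risk_premium(yield_10y, expected_avg_short_rate):
    """Approximate BRP: current 10-year rate minus the expected average
    short rate over the next decade (both in percent)."""
    return yield_10y - expected_avg_short_rate


def brp_signal(yield_10y, expected_avg_short_rate):
    """+1: long 10-year / short 3-month; -1: the reverse; 0: no position."""
    brp = bond_risk_premium(yield_10y, expected_avg_short_rate)
    return (brp > 0) - (brp < 0)


# Hypothetical values in percent
brp = bond_risk_premium(2.9, 2.5)
signal = brp_signal(2.9, 2.5)
```

As with the value model, each monthly signal would then be held for a year through twelve overlapping portfolios.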

Diversifying Style Premia

A benefit of implementing multiple timing strategies is that we have the potential to benefit from process diversification.  A simple correlation matrix shows us, for example, that the returns of the BRP model are well diversified against those of the Momentum and Carry models.

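| | BRP | Momentum | Value | Carry |
|---|---|---|---|---|
| BRP | 1.00 | 0.35 | 0.76 | 0.37 |
| Momentum | 0.35 | 1.00 | 0.68 | 0.68 |
| Value | 0.76 | 0.68 | 1.00 | 0.73 |
| Carry | 0.37 | 0.68 | 0.73 | 1.00 |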

One simple method of embracing this diversification is simply using a composite multi-factor approach: just dividing our capital among the four strategies equally.

We can also explore combining the strategies through an integrated method.  In the composite method, weights are averaged together, often resulting in allocations canceling out, leaving the strategy less than fully invested.  In the integrated method, weights are averaged together and then the direction of the implied trade is fully implemented (e.g. if the composite method says be 25% long the 10-year index and 25% short the 3-month index, the integrated method would go 100% long the 10-year and 100% short the 3-month). If the weights fully cancel out, the integrated portfolio remains unallocated.
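The two combination methods can be sketched as below. Each signal is the position in the 10-year index (the 3-month leg is its negative); the signal values and function names are hypothetical.

```python
def composite_weight(signals):
    """Composite: average the individual long/short signals (each in [-1, 1])."""
    return sum(signals) / len(signals)


def integrated_weight(signals):
    """Integrated: fully implement the direction implied by the average.

    Returns +1.0, -1.0, or 0.0 when the signals cancel out exactly.
    """
    avg = composite_weight(signals)
    return float((avg > 0) - (avg < 0))


# Hypothetical signals from the BRP, momentum, value, and carry models
signals = [1, 1, -1, 0]
composite = composite_weight(signals)    # 0.25: 25% long the 10-year
integrated = integrated_weight(signals)  # 1.0: 100% long the 10-year
```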

We can see that while the integrated method significantly increases full-period returns (adding approximately 150bp per year), it does so with a commensurate amount of volatility, leading to nearly identical information ratios in the two approaches.

Did Timing Add Value?

In quantitative research, it pays to be skeptical of your own results.  A question worth asking ourselves is, “did timing actually add value or did we simply identify a process that happened to give us a good average allocation profile?”  In other words, is it possible we just data-mined our way to good average exposures?

For example, the momentum strategy had an average allocation that was 55% long the 10-year index and 55% short the 3-month index.  Knowing that long-duration was the right bet to make over the last 25 years, it is entirely possible that it was the average allocation that added the value: timing may actually be detrimental.

We can test for this by explicitly creating indices that represent the average long-term allocations.  Our timing models are labeled “Timing” while the average weight models are labeled “Strategic.”

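| Strategy | CAGR | Volatility | Sharpe Ratio | Max Drawdown |
|---|---|---|---|---|
| BRP Strategic | 2.75% | 3.36% | 0.82 | 7.17% |
| BRP Timing | 3.89% | 5.48% | 0.71 | 14.00% |
| Momentum Strategic | 3.54% | 4.32% | 0.82 | 9.09% |
| Momentum Timing | 3.62% | 7.20% | 0.50 | 17.68% |
| Value Strategic | 4.37% | 5.38% | 0.81 | 11.27% |
| Value Timing | 5.75% | 6.84% | 0.84 | 15.17% |
| Carry Strategic | 4.71% | 5.80% | 0.81 | 12.11% |
| Carry Timing | 5.47% | 6.97% | 0.79 | 12.03% |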

While timing appears to add value from an absolute return perspective, in many cases it significantly increases volatility, reducing the resulting risk-adjusted return.

Attempting to rely on process diversification does not alleviate the issue either.

| Strategy | CAGR | Volatility | Sharpe Ratio | Max Drawdown |
|---|---|---|---|---|
| Composite Strategic | 3.78% | 4.63% | 0.82 | 9.71% |
| Composite Timing | 4.03% | 5.26% | 0.77 | 9.15% |

 As a more explicit test, we can also construct a long/short portfolio that goes long the timing strategy and short the strategic strategy.  Statistically significant positive expectancy of this long/short would imply value added by timing above and beyond the average weights.

Unfortunately, in conducting such a test, we find that none of the timing models conclusively offer statistically significant benefits.
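One way to frame the timing-versus-strategic test is sketched below. It assumes the weights are already lagged so that each weight applies to the same period's return, and it uses a plain t-statistic that treats periods as independent, which overlapping holding periods would violate. The data and function names are hypothetical, and this is not necessarily the exact test run for this commentary.

```python
import math
import statistics


def timing_returns(timing_weights, spread_returns):
    """Returns of the timing model: weight times the long/short return."""
    return [w * r for w, r in zip(timing_weights, spread_returns)]


def strategic_returns(timing_weights, spread_returns):
    """Returns from holding the timing model's average weight at all times."""
    avg_weight = statistics.fmean(timing_weights)
    return [avg_weight * r for r in spread_returns]


def t_statistic(timing, strategic):
    """t-statistic of the mean of (timing - strategic) returns."""
    diffs = [a - b for a, b in zip(timing, strategic)]
    stderr = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.fmean(diffs) / stderr


# Hypothetical monthly data: weights in the 10-year vs. 3-month trade and
# the return of being 100% long the 10-year / 100% short the 3-month
weights = [1, 1, -1, 1, -1, 1]
spread = [0.010, 0.020, -0.010, 0.005, 0.010, -0.002]
t_stat = t_statistic(timing_returns(weights, spread),
                     strategic_returns(weights, spread))
```

A t-statistic well above roughly 2 would suggest timing added value beyond the average weights; the hypothetical sample here is far too short to conclude anything.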

We want to be clear here that this does not mean timing did not add value.  Rather, in this instance, timing does not seem to add value beyond the average strategic weights the timing models harvested.

One explanation for this result is that there was largely one regime over our testing period where long-duration was the correct bet.  Therefore, there was little room for models to add value beyond just being net long duration – and in that sense, the models succeeded.  However, this predominately long-duration position created strategic benchmark bogeys that were harder to beat.  This test could really only show if the models detracted significantly from a long-duration benchmark.  Ideally, we need to test these models in other market environments (geographies or different historical periods) to further assess their efficacy. 

Robustness Testing: International Markets

We can try to allay our fears of overfitting by testing these methods on a different dataset.  For example, we can run the momentum, value, and carry strategies on German Bund yields and see if the models are still effective.

Due to data accessibility, instead of switching between 10-year and 3-month indices, we will use 10-year and 2-year indices.  We also slightly alter our strategy definitions:

  • Momentum: 12-1 month 10-year index return versus 12-1 month 2-year index return.
  • Value: 10-year yield minus trailing 1-year CPI change
  • Carry: 10-year yield minus 2-year yield

Given the regime concerns highlighted above, we will also test two other measures:

  • Value #2: Demeaned (using prior 10-year average) 10-year yield minus trailing 1-year CPI change
  • Carry #2: Demeaned (using prior 10-year average) 10-year yield minus 2-year yield

We can see similar results applying these methods with German rates as we saw with U.S. rates: momentum and both carry strategies remain successful while value fails when demeaned.

However, given that developed rates around the globe post-2008 were largely dominated by similar policies and factors, a healthy dose of skepticism is still well deserved.

Robustness Testing: Different Time Period

While success of these methods in an international market may bolster our confidence, it would be useful to test them during a period with very different interest rate and inflation evolutions.  If we are again willing to slightly alter our definitions, we can take our U.S. tests back to the 1960s – 1980s.

Instead of switching between 10-year and 3-month indices, we will use 10-year and 1-year indices.  Furthermore, we use the following methodology definitions:

  • Momentum: 12-1 month 10-year index return versus 12-1 month 1-year index return.
  • Value: 10-year yield minus trailing 1-year CPI change
  • Carry: 10-year yield minus 1-year yield
  • Value #2: Demeaned (using prior 10-year average) 10-year yield minus trailing 1-year CPI change
  • Carry #2: Demeaned (using prior 10-year average) 10-year yield minus 1-year yield

Over this period, all of the strategies exhibit statistically significant (95% confidence) positive annualized returns.[10]

That said, the value strategy suffers out of the gate, realizing a drawdown exceeding 25% during the 1960s through June 1970, as 10-year rates climbed from 4% to nearly 8%.  Over that period, prior 1-year realized inflation climbed from less than 1% to over 5%.  With the nearly step-for-step increase in rates and inflation, the spread remained positive – and hence the strategy remained long duration.  Without a better estimate of expected inflation (e.g. 5-year, 5-year forward inflation expectations, TIPS, or survey estimates)[11], value may be a failed methodology.

On the other hand, there is nothing that says that inflation expectations would have necessarily been more accurate in forecasting actual inflation.  It is entirely plausible that future inflation was an unexpected surprise, and a more accurate model of inflation expectations would have kept real-yield elevated over the period.

Again, we find the power in diversification.  While value had a loss of approximately 25% during the initial hikes, momentum was up 12% and carry was flat.  Diversifying across all three methods would leave an investor with a loss of approximately 4.3%: certainly not a confidence builder for a decade of (mis-)timing decisions, but not catastrophic from a portfolio perspective.[12]
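The diversified figure is consistent with a simple equal-weight average of the three period returns quoted above. The short check below assumes exactly that and ignores compounding and rebalancing within the period.

```python
# Approximate period returns quoted in the text for the initial rate hikes
period_returns = {"value": -0.25, "momentum": 0.12, "carry": 0.00}

# Equal-weight average across the three methods (assumption: no rebalancing)
equal_weight_return = sum(period_returns.values()) / len(period_returns)
print(f"{equal_weight_return:.1%}")  # -4.3%
```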

Conclusion

With fear of rising rates high, shortening bond duration might be a gut reaction.  However, even as rates rise in general, the influence of central banks and expectations for inflation can create short term movements in the yield curve that can potentially be exploited using style premia.

We find that value, momentum, carry, and an explicit measure of the bond risk premium all produce strong absolute and risk-adjusted returns for timing duration. The academic and empirical evidence of these factors in a variety of asset classes gives us confidence that there are behavioral reasons to expect that style premia will persist over long enough periods. Combining multiple factors into a portfolio can harness the benefits of diversification and smooth out the short-term fluctuations that can lead to emotion-driven decisions.

Our in-sample testing period, however, leaves much to be desired.  Dominated largely by a single regime that benefited long-duration trades, all of the timing models harvested average weights that were net-long duration.  Our research shows that the timing models did not add any statistically meaningful value above-and-beyond these average weights.  Caveat emptor: without further testing in different geographies or interest rate regimes – and despite our best efforts to use simple, industry-standard models – these results may be the result of data mining.

As a robustness test, we run value, momentum, and carry strategies for German Bund yields and over the period of the 1960s-1980s within the United States.  While we continue to see success with momentum and carry, we find that the value method may prove to be too blunt an instrument for timing (or we may simply need a better measure as our anchor for value).

Nevertheless, we believe that utilizing systematic, factor-based methods for making duration changes in a portfolio can be a way to adapt to the market environment and manage risk without relying solely on our own judgements or those we hear in the media.

As inspiration for future research, Brooks and Moskowitz (2017)[13] recently demonstrated that style premia – i.e. momentum, value, and carry strategies – provide a better description of bond risk premia than traditional model factors.  Interestingly, they find that not only are momentum, value, and carry predictive when applied to the level of the yield curve, but also when applied to slope and curvature positions.  While this research focuses on the cross-section of government bond returns across 13 countries, there may be important implications for timing models as well.


[1] https://blog.thinknewfound.com/2017/04/declining-rates-actually-matter/

[2] https://www.aqr.com/library/journal-articles/forecasting-us-bond-returns

[3] https://www.philadelphiafed.org/research-and-data/real-time-center/survey-of-professional-forecasters

[4] https://www.aqr.com/library/journal-articles/quantitative-forecasting-models-and-active-diversification-for-international-bonds

[5] http://www.cmegroup.com/education/files/jpm-momentum-strategies-2015-04-15-1681565.pdf

[6] https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2956411

[7] https://www.aqr.com/library/journal-articles/value-and-momentum-everywhere

[8] https://www.newyorkfed.org/medialibrary/media/research/staff_reports/sr657.pdf

[9] https://www.aqr.com/library/aqr-publications/a-century-of-evidence-on-trend-following-investing

[10] While not done here, these strategies should be further tested against their average allocations as well.

[11] It is worth noting that The Cleveland Federal Reserve does offer model-based inflation expectations going back to 1982 (https://www.clevelandfed.org/our-research/indicators-and-data/inflation-expectations.aspx) and The New York Federal Reserve also offers model-based inflation expectations going back to the 1970s (http://libertystreeteconomics.newyorkfed.org/2013/08/creating-a-history-of-us-inflation-expectations.html).

[12] Though certainly a long enough period to get a manager fired.

[13] https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2956411

Navigating Municipal Bonds With Factors

This post is available as a PDF download here.

Summary

  • In this case study, we explore building a simple, low cost, systematic municipal bond portfolio.
  • The portfolio is built using the low volatility, momentum, value, and carry factors across a set of six municipal bond sectors. It favors sectors with lower volatility, better recent performance, cheaper valuations, and higher yields.  As with other factor studies, a multi-factor approach is able to harvest major benefits from active strategy diversification since the factors have low correlations to one another.
  • The factor tilts lead to over- and underweights to both credit and duration through time. Currently, the portfolio is significantly underweight duration and modestly overweight credit.
  • A portfolio formed with the low volatility, value, and carry factors has sufficiently low turnover that these factors may have value in setting strategic allocations across municipal bond sectors.

 

Recently, we’ve been working on building a simple, ETF-based municipal bond strategy.  Probably to the surprise of nobody who regularly reads our research, we are coming at the problem from a systematic, multi-factor perspective.

For this exercise, our universe consists of six municipal bond indices:

  • Bloomberg Barclays AMT-Free Short Continuous Municipal Index
  • Bloomberg Barclays AMT-Free Intermediate Continuous Municipal Index
  • Bloomberg Barclays AMT-Free Long Continuous Municipal Index
  • Bloomberg Barclays Municipal Pre-Refunded-Treasury-Escrowed Index
  • Bloomberg Barclays Municipal Custom High Yield Composite Index
  • Bloomberg Barclays Municipal High Yield Short Duration Index

These indices, all of which are tracked by VanEck Vectors ETFs, offer access to municipal bonds across a range of durations and credit qualities.

Source: VanEck

Before we get started, why are we writing another multi-factor piece after addressing factors in the context of a multi-asset universe just two weeks ago?

The simple answer is that we find the topic to be that pressing for today’s investors.  In a world of depressed expected returns and elevated correlations, we believe that factor-based strategies have a role as both return generators and risk mitigators.

Our confidence in what we view as the premier factors (value, momentum, low volatility, carry, and trend) stems largely from their robustness in out-of-sample tests across asset classes, geographies, and timeframes.  The results in this case study not only suggest that a factor-based approach is feasible in muni investing, but also, in our opinion, strengthen the case for factor investing in other contexts (e.g. equities, taxable fixed income, commodities, currencies, etc.).

Constructing Long/Short Factor Portfolios

For the municipal bond portfolio, we consider four factors:

  1. Value: Buy undervalued sectors, sell overvalued sectors
  2. Momentum: Buy strong recent performers, sell weak recent performers
  3. Low Volatility: Buy low risk sectors, sell high risk sectors
  4. Carry: Buy higher yielding sectors, sell lower yielding sectors

As a first step, we construct long/short single factor portfolios.  The weight on index i at time t in long/short factor portfolio f is equal to:

In this formula, c is a scaling coefficient,  S is index i’s time t score on factor f, and N is the number of indices in the universe at time t.

We measure each factor with the following metrics:

  1. Value: Normalized deviation of real yield from the 5-year trailing average yield[1]
  2. Momentum: Trailing twelve month return
  3. Low Volatility: Historical standard deviation of monthly returns[2]
  4. Carry: Yield-to-worst

For the value, momentum, and carry factors, the scaling coefficient c is set so that the portfolio is dollar neutral (i.e. we are long and short the same dollar amount of securities).  For the low volatility factor, the scaling coefficient is set so that the volatilities of the long and short portfolios are approximately equal.  This is necessary since a dollar neutral construction would be perpetually short “beta” to the overall municipal bond market.
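A dollar-neutral construction of this kind can be sketched as follows. The functional form is an assumption made for illustration: weights proportional to each index's score minus the cross-sectional average score, with the scaling coefficient chosen so that the long side sums to one. The sector names and scores are hypothetical.

```python
def long_short_weights(scores, gross_per_side=1.0):
    """Dollar-neutral weights proportional to demeaned factor scores.

    Assumed form: w_i = c * (S_i - mean(S)), with c chosen so that the long
    side and the short side each sum to `gross_per_side` in absolute value.
    """
    mean_score = sum(scores.values()) / len(scores)
    demeaned = {name: s - mean_score for name, s in scores.items()}
    long_total = sum(d for d in demeaned.values() if d > 0)
    if long_total == 0:
        # All scores identical: no factor view, so no position
        return {name: 0.0 for name in scores}
    c = gross_per_side / long_total
    return {name: c * d for name, d in demeaned.items()}


# Hypothetical carry scores (yield-to-worst, in percent) for four sectors
scores = {"short": 1.0, "intermediate": 2.0, "long": 3.0, "high_yield": 6.0}
weights = long_short_weights(scores)
```

Because demeaned scores sum to zero, the long and short sides are automatically equal in dollar terms. The low volatility factor would need a different choice of c, matching long and short volatilities rather than dollars.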

Data Source: Bloomberg. Calculations by Newfound Research. All returns are hypothetical and backtested. Returns reflect the reinvestment of all distributions and are gross of all fees (including any management fees and transaction costs). The hypothetical indices start on June 30, 1998. The start date was chosen based on data availability of the underlying indices and the time necessary to calibrate the factor models. Data is through April 30, 2017. The portfolios are reconstituted monthly. Past performance does not guarantee future results.

All four factors are profitable over the period from June 1998 to April 2017.  The value factor is the top performer both from an absolute return and risk-adjusted return perspective.

Data Source: Bloomberg. Calculations by Newfound Research. All returns are hypothetical and backtested. Returns reflect the reinvestment of all distributions and are gross of all fees (including any management fees and transaction costs). The hypothetical indices start on June 30, 1998. The start date was chosen based on data availability of the underlying indices and the time necessary to calibrate the factor models. Data is through April 30, 2017. The portfolios are reconstituted monthly. Past performance does not guarantee future results. The benchmark is an equal-weight portfolio of all indices in the universe adjusted for the indices that are calibrated and included in each long/short factor index based on data availability.

 

There is significant variation in performance over time.  All four factors have years where they are the best performing factor and years where they are the worst performing factor.  The average annual spread between the best performing factor and the worst performing factor is 11.3%.

Data Source: Bloomberg. Calculations by Newfound Research. All returns are hypothetical and backtested. Returns reflect the reinvestment of all distributions and are gross of all fees (including any management fees and transaction costs). The hypothetical indices start on June 30, 1998. The start date was chosen based on data availability of the underlying indices and the time necessary to calibrate the factor models. Data is through April 30, 2017. The portfolios are reconstituted monthly. Past performance does not guarantee future results. The benchmark is an equal-weight portfolio of all indices in the universe adjusted for the indices that are calibrated and included in each long/short factor index based on data availability. 1998 is a partial year beginning in June 1998 and 2017 is a partial year ending in April 2017.

 

The individual long/short factor portfolios are well diversified both relative to each other (average pairwise correlation of -0.11) and relative to the broad municipal bond market.

Moving From Single Factor to Multi-Factor Portfolios

The diversified nature of the long/short return streams makes a multi-factor approach hard to beat in terms of risk-adjusted returns.  This is another example of the type of strategy diversification that we have long lobbied for.

As evidence of these benefits, we have built two versions of a portfolio combining the low volatility, value, carry, and momentum factors.  The first version targets an equal dollar allocation to each factor.  The second version uses a naïve risk parity approach to target an approximately equal risk contribution from each factor.
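The two weighting schemes can be sketched as follows. The inverse volatility rule is the simple risk parity construction described in the data notes; of the volatilities below, only the 1.6% for Low Volatility comes from this commentary, and the rest are hypothetical placeholders.

```python
def equal_dollar_weights(names):
    """Equal dollar allocation to each factor."""
    return {name: 1.0 / len(names) for name in names}


def inverse_volatility_weights(vols):
    """Naive risk parity: weights proportional to 1 / volatility.

    This equalizes risk contributions exactly only when all pairwise
    correlations are equal, hence "approximately equal" in practice.
    """
    inverse = {name: 1.0 / v for name, v in vols.items()}
    total = sum(inverse.values())
    return {name: x / total for name, x in inverse.items()}


# Annualized volatilities; all except low_volatility are hypothetical
vols = {"low_volatility": 0.016, "value": 0.032, "carry": 0.032, "momentum": 0.064}
equal = equal_dollar_weights(list(vols))
risk_parity = inverse_volatility_weights(vols)
```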

Both approaches outperform all four individual factors on a risk-adjusted basis, delivering Sharpe Ratios of 1.19 and 1.23, respectively, compared to 0.96 for the top single factor (value).

Data Source: Bloomberg. Calculations by Newfound Research. All returns are hypothetical and backtested. Returns reflect the reinvestment of all distributions and are gross of all fees (including any management fees and transaction costs). The hypothetical indices start on June 30, 1998. The start date was chosen based on data availability of the underlying indices and the time necessary to calibrate the factor models. Data is through April 30, 2017. The portfolios are reconstituted monthly. Past performance does not guarantee future results. The benchmark is an equal-weight portfolio of all indices in the universe adjusted for the indices that are calibrated and included in each long/short factor index based on data availability. The factor risk parity construction uses a simple inverse volatility methodology. Volatility estimates are shrunk in the early periods when less data is available.

 

To stress this point, diversification is so plentiful across the factors that even the simplest portfolio construction methodologies outperform an investor who was able to identify the best performing factor with perfect foresight.  For additional context, we constructed a “Look Ahead Mean-Variance Optimization (“MVO”) Portfolio” by calculating the Sharpe optimal weights using actual realized returns, volatilities, and correlations.  The Look Ahead MVO Portfolio has a Sharpe Ratio of 1.43, not too far ahead of our two multi-factor portfolios.  The approximate weights in the Look Ahead MVO Portfolio are 49% to Low Volatility, 25% to Value, 15% to Carry, and 10% to Momentum.  While the higher Sharpe Ratio factors (Low Volatility and Value) do get larger allocations, Momentum and Carry are still well represented due to their diversification benefits.

From a risk perspective, both multi-factor portfolios have lower volatility than any of the individual factors and a maximum drawdown that is within 1% of the individual factor with the least amount of historical downside risk.  It’s also worth pointing out that the risk parity construction leads to a return stream that is very close to normally distributed (skew of 0.1 and kurtosis of 3.0).

In the graph on the next page, we present another lens through which we can view the tremendous amount of diversification that can be harvested between factors.  Here we plot how the allocation to a specific factor, using MVO, will change as we vary that factor’s Sharpe Ratio.  We perform this analysis for each factor individually, holding all other parameters fixed at their historical levels.

As an example, to estimate the allocation to the Low Volatility factor at a Sharpe Ratio of 0.1, we:

  1. Assume the covariance matrix is equal to the historical covariance over the full sample period.
  2. Assume the excess returns for the other three factors (Carry, Momentum, and Value) are equal to their historical averages.
  3. Assume the annualized excess return for the Low Volatility factor is 0.16% so that the Sharpe Ratio is equal to our target of 0.1 (Low Volatility’s annualized volatility is 1.6%).
  4. Calculate the MVO optimal weights using these excess return and risk assumptions.


As expected, Sharpe Ratios and allocation sizes are positively correlated.  Higher Sharpe Ratios lead to higher allocations.

That being said, three of the factors (Low Volatility, Carry, and Momentum) would receive allocations even if their Sharpe Ratios were slightly negative.

The allocations to carry and momentum are particularly insensitive to Sharpe Ratio level.  Momentum would receive an allocation of 4% with a 0.00 Sharpe, 9% with a 0.25 Sharpe, 13% with a 0.50 Sharpe, 17% with a 0.75 Sharpe, and 20% with a 1.00 Sharpe.  For the same Sharpe Ratios, the allocations to Carry would be 10%, 15%, 19%, 22%, and 24%, respectively.

Holding these factors provides a strong ballast within the multi-factor portfolio.

Moving From Long/Short to Long Only

Most investors have neither the space in their portfolio for a long/short muni strategy nor sufficient access to enough affordable leverage to get the strategy to an attractive level of volatility (and hence return).  A more realistic approach would be to layer our factor bets on top of a long only strategic allocation to muni bonds.

In a perfect world, we could slap one of our multi-factor long/short portfolios right on top of a strategic municipal bond portfolio.  The results of this approach (labeled “Benchmark + Equal Weight Factor Long/Short” in the graphics below) are impressive (Sharpe Ratio of 1.17 vs. 0.93 for the strategic benchmark and return to maximum drawdown of 0.72 vs. 0.46 for the strategic benchmark).  Unfortunately, this approach still requires just a bit of shorting. The size of the total short ranges from 0% to 19% with an average of 5%.

We can create a true long only portfolio (“Long Only Factor”) by removing all shorts and normalizing so that our weights sum to one.  Doing so modestly reduces risk, return, and risk-adjusted return, but still leads to outperformance vs. the benchmark.
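
As a rough sketch of the long only conversion described above, the snippet below adds a long/short overlay to a set of benchmark weights, removes any resulting shorts, and renormalizes. The sector names and all weights are illustrative assumptions rather than the backtested allocations, and `long_only` is a hypothetical helper name.

```python
def long_only(benchmark, overlay):
    """Add long/short overlay weights to benchmark weights, drop shorts, renormalize."""
    combined = {k: benchmark.get(k, 0.0) + overlay.get(k, 0.0)
                for k in set(benchmark) | set(overlay)}
    clipped = {k: max(w, 0.0) for k, w in combined.items()}  # remove all shorts
    total = sum(clipped.values())
    if total <= 0:
        raise ValueError("no long positions remain after removing shorts")
    return {k: w / total for k, w in clipped.items()}        # weights sum to one


# Illustrative numbers only.
benchmark = {"short_ig": 0.25, "intermediate_ig": 0.25, "long_ig": 0.25, "high_yield": 0.25}
overlay = {"short_ig": 0.30, "intermediate_ig": 0.05, "long_ig": -0.35, "high_yield": 0.00}

weights = long_only(benchmark, overlay)

# Size of the total short that the unconstrained "benchmark + long/short" version would need.
total_short = -sum(min(benchmark[k] + overlay[k], 0.0) for k in benchmark)

for sector in benchmark:
    print(f"{sector:16s} {weights[sector]:.1%}")
print(f"total short before clipping: {total_short:.1%}")
```

In this example the overlay pushes long-term investment grade to a 10% short, so the long only version sets that sector to zero and scales the remaining weights down proportionally.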

Data Source: Bloomberg. Calculations by Newfound Research. All returns are hypothetical and backtested. Returns reflect the reinvestment of all distributions and are gross of all fees (including any management fees and transaction costs). The hypothetical indices start on June 30, 1998. The start date was chosen based on data availability of the underlying indices and the time necessary to calibrate the factor models. Data is through April 30, 2017. The portfolios are reconstituted monthly. Past performance does not guarantee future results. The benchmark is an equal-weight portfolio of all indices in the universe adjusted for the indices that are calibrated and included in each long/short factor index based on data availability.

 


Below we plot both the historical and current allocations for the long only factor portfolio.  Currently, the portfolio would have approximately 25% in each of short-term investment grade, pre-refunded, and short-term high yield, with the remaining 25% split roughly 80/20 between high yield and intermediate-term investment grade. There is currently no allocation to long-term investment grade.

Data Source: Bloomberg. Calculations by Newfound Research. All allocations are backtested and hypothetical. The hypothetical indices start on June 30, 1998. The start date was chosen based on data availability of the underlying indices and the time necessary to calibrate the factor models. Data is through April 30, 2017. The portfolios are reconstituted monthly. Past performance does not guarantee future results.

 


A few interesting observations relating to the long only portfolio and muni factor investing in general:

  1. The factor tilts lead to clear duration and credit bets over time.  Below we plot the duration and a composite credit score for the factor portfolio vs. the benchmark over time.

    Data source: Calculations by Newfound Research. All allocations are backtested and hypothetical. The hypothetical indices start on June 30, 1998. The start date was chosen based on data availability of the underlying indices and the time necessary to calibrate the factor models. Data is through April 30, 2017. The portfolios are reconstituted monthly. Past performance does not guarantee future results. Weighted average durations are estimated using current constituent durations.

    Data source: Calculations by Newfound Research. All allocations are backtested and hypothetical. The hypothetical indices start on June 30, 1998. The start date was chosen based on data availability of the underlying indices and the time necessary to calibrate the factor models. Data is through April 30, 2017. The portfolios are reconstituted monthly. Past performance does not guarantee future results. Weighted average credit scores are estimated using current constituent credit scores. Credit scores use S&P’s methodology to aggregate scores based on the distribution of credit scores of individual bonds.

    Currently, the portfolio is near an all-time low in terms of duration and is slightly tilted towards lower credit quality sectors relative to the benchmark.  Historically, the factor portfolio was most often overweight both duration and credit, having this positioning in 53.7% of the months in the sample.  The second and third most common tilts were underweight duration / underweight credit (22.0% of sample months) and underweight duration / overweight credit (21.6% of sample months).  The portfolio was overweight duration / underweight credit in only 2.6% of sample months.

  2. Even for more passive investors, a factor-based perspective can be valuable in setting strategic allocations.  The long only portfolio discussed above has annualized turnover of 77%.  If we remove the momentum factor, which is by far the biggest driver of turnover, and restrict ourselves to a quarterly rebalance, we can reduce turnover to just 18%.  This does come at a cost, as the Sharpe Ratio drops from 1.12 to 1.04, but historical performance would still be strong relative to our benchmark. This suggests that carry, value, and low volatility may be valuable in setting strategic allocations across municipal bond ETFs with only periodic updates at a normal strategic rebalance frequency.
  3. We ran regressions with our long/short factors on all funds in the Morningstar Municipal National Intermediate category with a track record that extended over our full sample period from June 1998 to April 2017.  Below, we plot the betas of each fund to each of our four long/short factors.  Blue bars indicate that the factor beta was significant at a 5% level.  Gray bars indicate that the factor beta was not significant at a 5% level.  We find little evidence of the active managers following a factor approach similar to what we outline in this post.  Part of this is certainly the result of the constrained nature of the category with respect to duration and credit quality.  In addition, these results do not speak to whether any of the managers use a factor-based approach to pick individual bonds within their defined duration and credit quality mandates.

    Data source: Calculations by Newfound Research. Analysis over the period from June 1998 to April 2017.

    The average beta to the low volatility factor, ignoring non-statistically significant values, is -0.23.  This is most likely a function of category since the category consists of funds with both investment grade credit quality and durations ranging between 4.5 and 7.0 years.  In contrast, our low volatility factor on average has short exposure to the intermediate and long-term investment grade sectors.

    Only 14 of the 33 funds in the universe have statistically significant exposure to the value factor with an average beta of -0.03.

    The average beta to the carry factor, ignoring non-statistically significant values, is -0.23.  As described above with respect to low volatility, this is most likely a function of category, as our carry factor favors the long-term investment grade and high yield sectors.

    Only 9 of the 33 funds in the universe have statistically significant exposure to the momentum factor with an average beta of 0.02.
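
The fund regressions in the third observation above can be illustrated with a small ordinary least squares sketch. Everything here is an assumption made for illustration: the returns are synthetic, the regression includes an intercept, and significance at the 5% level is judged with a two-sided normal approximation (critical value 1.96) rather than an exact t-distribution. The helper names `invert` and `factor_betas` are hypothetical.

```python
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(a[i]) + [1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def factor_betas(fund, factors, t_crit=1.96):
    """OLS of fund returns on factor returns with an intercept.

    Returns one (beta, t_stat, significant) tuple per factor; the intercept is omitted.
    """
    n, k = len(fund), len(factors) + 1
    x = [[1.0] + [f[t] for f in factors] for t in range(n)]  # design matrix
    xtx = [[sum(x[t][i] * x[t][j] for t in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(x[t][i] * fund[t] for t in range(n)) for i in range(k)]
    inv = invert(xtx)
    coef = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [fund[t] - sum(coef[i] * x[t][i] for i in range(k)) for t in range(n)]
    s2 = sum(e * e for e in resid) / (n - k)                 # residual variance
    out = []
    for i in range(1, k):
        se = (s2 * inv[i][i]) ** 0.5                         # standard error of beta i
        t_stat = coef[i] / se
        out.append((coef[i], t_stat, abs(t_stat) > t_crit))
    return out


# Synthetic monthly data: a fund with a true beta of -0.25 to the first factor only.
random.seed(0)
n_months = 226
names = ["Low Volatility", "Value", "Carry", "Momentum"]
factors = [[random.gauss(0.0, 0.01) for _ in range(n_months)] for _ in names]
fund = [0.001 - 0.25 * factors[0][t] + random.gauss(0.0, 0.002) for t in range(n_months)]

for name, (beta, t_stat, sig) in zip(names, factor_betas(fund, factors)):
    label = "significant" if sig else "not significant"
    print(f"{name:15s} beta={beta:+.2f} t={t_stat:+.1f} ({label})")
```

Running the same function over each fund's return series and then averaging only the betas flagged as significant mirrors the way the averages above ignore non-statistically significant values.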

Conclusion

Multi-factor investing has generated significant press in the equity space due to the (poorly named) “smart beta” movement.  The popular factors in the equity space have historically performed well both within other asset classes (rates, commodities, currencies, etc.) and across asset classes.  The municipal bond market is no different.  A simple, systematic multi-factor process has the potential to improve risk-adjusted performance relative to static benchmarks.  The portfolio can be implemented with liquid, low cost ETFs.

Moving beyond active strategies, factors can also be valuable tools when setting strategic sector allocations within a municipal bond sleeve and when evaluating and blending municipal bond managers.

Perhaps more importantly, the out-of-sample evidence for the premier factors (momentum, value, low volatility, carry, and trend) across asset classes, geographies, and timeframes continues to mount.  In our view, this evidence can be crucial in getting investors comfortable with introducing systematic active premia into their portfolios as both return generators and risk mitigators.

 

[1] Computed using yield-to-worst.  Inflation estimates are based on 1-year and 10-year survey-based expected inflation.  We average the value score over the last 2.5 years, allowing the portfolio to realize a greater degree of valuation mean reversion before closing out a position.

[2] We use a rolling 5-year (60-month) window to calculate standard deviation.  We require at least 3 years of data for an index to be included in the low volatility portfolio.  The standard deviation is multiplied by -1 so that higher values are better across all four factor scores.
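
The low volatility score in footnote [2] could be computed along the following lines. This is a sketch, not the production calculation: it assumes monthly returns held in a plain list, uses the sample standard deviation (the footnote does not say whether sample or population is used), and `low_vol_score` is a hypothetical name.

```python
from statistics import stdev


def low_vol_score(returns, window=60, min_obs=36):
    """Negative trailing standard deviation of monthly returns.

    Uses at most the last `window` months and returns None when fewer than
    `min_obs` months are available, so the index is left out of the portfolio.
    """
    recent = returns[-window:]
    if len(recent) < min_obs:
        return None
    return -stdev(recent)  # multiplied by -1 so that higher scores are better


calm = [0.001, -0.001] * 30    # illustrative low volatility index
wild = [0.010, -0.010] * 30    # illustrative high volatility index
too_short = [0.002] * 35       # under 3 years of data

print(low_vol_score(calm), low_vol_score(wild), low_vol_score(too_short))
```

The calmer series receives the higher (less negative) score, which keeps the ranking direction consistent with the other factor scores.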

 

 
