The Research Library of Newfound Research

Month: July 2019

Timing Luck and Systematic Value

This post is available as a PDF download here.

Summary

  • We have shown many times that timing luck – when a portfolio chooses to rebalance – can have a large impact on the performance of tactical strategies.
  • However, fundamental strategies like value portfolios are susceptible to timing luck, as well.
  • Once the rebalance frequency of a strategy is set, we can mitigate the risk of choosing a poor rebalance date by diversifying across all potential variations.
  • In many cases, this mitigates the risk of realizing poor performance from an unfortunate choice of rebalance date while achieving a risk profile similar to the top tier of potential strategy variations.
  • By utilizing strategies that manage timing luck, investors can more accurately assess performance differences arising from luck and skill.

On August 7th, 2013 we wrote a short blog post titled The Luck of Rebalance Timing.  That means we have been prattling on about the impact of timing luck for over six years now (with apologies to our compliance department…).

(For those still unfamiliar with the idea of timing luck, we will point you to a recent publication from Spring Valley Asset Management that provides a very approachable introduction to the topic.1)

While most of our earliest studies related to the impact of timing luck in tactical strategies, over time we realized that timing luck could have a profound impact on just about any strategy that rebalances on a fixed frequency.  We found that even a simple fixed-mix allocation of stocks and bonds could see annual performance spreads exceeding 700bp due only to the choice of when they rebalanced in a given year.

In seeking to generalize the concept, we derived a formula that would estimate how much timing luck a strategy might have.  The details of the derivation can be found in our paper recently published in the Journal of Index Investing, but the basic formula is:

Timing Luck ≈ (T / f) × S

Here, T is strategy turnover, f is how many times per year the strategy rebalances, and S is the volatility of a long/short portfolio capturing the difference between what the strategy is currently invested in versus what it could be invested in.  (A small numerical sketch follows the intuition bullets below.)

We’re biased, but we think the intuition here works out fairly nicely:

  • The higher a strategy’s turnover, the greater the impact of our choice of rebalance dates. For example, if we have a value strategy that has 50% turnover per year, an implementation that rebalances in January versus one that rebalances in July might end up holding very different securities.  On the other hand, if the strategy has just 1% turnover per year, we don’t expect the differences in holdings to be very large and therefore timing luck impact would be minimal.
  • The more frequently we rebalance, the lower the timing luck. Again, this makes sense as more frequent rebalancing limits the potential difference in holdings of different implementation dates.  Again, consider a value strategy with 50% turnover.  If our portfolio rebalances every other month, there are two potential implementations: one that rebalances January, March, May, etc. and one that rebalances February, April, June, etc. We would expect the difference in portfolio holdings to be much more limited than in the case where we rebalance only annually.2
  • The last term, S, is most easily explained with an example. If we have a portfolio that can hold either the Russell 1000 or the S&P 500, we do not expect there to be a large amount of performance dispersion regardless of when we rebalance or how frequently we do so.  The volatility of a portfolio that is long the Russell 1000 and short the S&P 500 is so small, it drives timing luck near zero.  On the other hand, if a portfolio can hold the Russell 1000 or be short the S&P 500, differences in holdings due to different rebalance dates can lead to massive performance dispersion. Generally speaking, S is larger for more highly concentrated strategies with large performance dispersion in their investable universe.
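
As a quick numerical illustration, below is a minimal sketch in Python of the estimate above.  The function name and example inputs are hypothetical; the simplified form (T / f) × S mirrors the formula and intuition just described.

```python
# Minimal sketch of the timing luck estimate; the function name and
# example inputs are hypothetical illustrations.

def timing_luck_estimate(turnover: float, rebalances_per_year: float,
                         ls_vol: float) -> float:
    """Estimated timing luck: (T / f) * S."""
    return (turnover / rebalances_per_year) * ls_vol

# A value strategy with 50% annual turnover, rebalanced once per year,
# whose long/short "difference" portfolio runs at 10% volatility:
print(timing_luck_estimate(0.50, 1, 0.10))  # 0.05, i.e. ~500bps of timing luck
```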

Timing Luck in Smart Beta

To date, we have not meaningfully tested timing luck in the realm of systematic equity strategies.3  In this commentary, we aim to provide a concrete example of the potential impact.

A few weeks ago, however, we introduced our Systematic Value portfolio, which seeks to deliver concentrated exposure to the value style while avoiding unintended process and timing luck bets.

To achieve this, we implement an overlapping portfolio process.  Each month we construct a concentrated deep value portfolio, selecting just 50 stocks from the S&P 500.  However, because we believe the evidence suggests that value is a slow-moving signal, we aim for a holding period between 3 and 5 years.  Accordingly, our capital is divided across the prior 60 months of portfolios.4
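
As a rough sketch of this overlapping process, the hypothetical helper below averages the most recent 60 monthly snapshots into a single set of holdings; implementation details such as trading and drift are omitted.

```python
import pandas as pd

def overlapping_portfolio(monthly_snapshots: list, n_tranches: int = 60) -> pd.Series:
    """Average the most recent n_tranches monthly portfolio snapshots.

    Each snapshot is a Series mapping ticker -> weight (e.g. 50 names at
    2% each); tickers absent from a snapshot count as 0% positions.
    """
    tranches = monthly_snapshots[-n_tranches:]
    return pd.concat(tranches, axis=1).fillna(0.0).mean(axis=1)

# Toy example with three snapshots:
jan = pd.Series({"AAA": 0.5, "BBB": 0.5})
feb = pd.Series({"AAA": 0.5, "CCC": 0.5})
mar = pd.Series({"BBB": 0.5, "CCC": 0.5})
print(overlapping_portfolio([jan, feb, mar], n_tranches=3))
```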

This all means that we have monthly snapshots of deep value5 portfolios going back to November 2012, providing us with the data to construct all sorts of rebalance variations.

The Luck of Annual Rebalancing

Given our portfolio snapshots, we will create annually rebalanced portfolios.  With monthly portfolios, there are twelve variations we can construct: a portfolio that reconstitutes each January; one that reconstitutes each February; a portfolio that reconstitutes each March; et cetera.

Below we plot the equity curves for these twelve variations.

Source: CSI Analytics.  Calculations by Newfound Research.  Results are hypothetical.  Results assume the reinvestment of all distributions.   Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes.  Past performance is not an indicator of future results.  

We cannot stress enough that these portfolios are all implemented using a completely identical process.  The only difference is when they run that process.  The annualized returns range from 9.6% to 12.2%.  And those two portfolios with the largest disparity rebalanced just a month apart: January and February.

To avoid timing luck, we want to diversify when we rebalance.  The simplest way of achieving this goal is through overlapping portfolios.  For example, we can build portfolios that rebalance annually, but allocate to two different dates.  One portfolio could place 50% of its capital in the January rebalance index and 50% in the July rebalance index.

Another variation could place 50% of its capital in the February index and 50% in the August index.6  There are six possible variations, which we plot below.

The best performing variation (January and July) returned 11.7% annualized, while the worst (February and August) returned 9.7%.  While the spread has narrowed, it would be dangerous to confuse 200bp annualized for alpha instead of rebalancing luck.

Source: CSI Analytics.  Calculations by Newfound Research.  Results are hypothetical.  Results assume the reinvestment of all distributions.   Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes.  Past performance is not an indicator of future results.  

We can go beyond just two overlapping portfolios, though.  Below we plot the three variations that contain four overlapping portfolios (January-April-July-October, February-May-August-November, and March-June-September-December).  The best variation now returns 10.9% annualized while the worst returns 10.1% annualized.  We can see how overlapping portfolios are shrinking the variation in returns.

Finally, we can plot the variation that employs 12 overlapping portfolios.  This variation returns 10.6% annualized, almost perfectly in line with the average annualized return of the underlying 12 variations.  No surprise: diversification has neutralized timing luck.

Source: CSI Analytics.  Calculations by Newfound Research.  Results are hypothetical.  Results assume the reinvestment of all distributions.   Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes.  Past performance is not an indicator of future results.  

Source: CSI Analytics.  Calculations by Newfound Research.  Results are hypothetical.  Results assume the reinvestment of all distributions.   Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes.  Past performance is not an indicator of future results.  

But besides being “average by design,” how can we measure the benefits of diversification?

As with most ensemble approaches, we see a reduction in realized risk metrics.  For example, below we plot the maximum realized drawdown for the annual variations, semi-annual variations, quarterly variations, and the monthly variation.  While the dispersion is limited to just a few hundred basis points, we can see that the diversification embedded in the monthly variation is able to reduce the bad luck of choosing an unfortunate rebalance date.

Source: CSI Analytics.  Calculations by Newfound Research.  Results are hypothetical.  Results assume the reinvestment of all distributions.   Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes.  Past performance is not an indicator of future results.  

Just Rebalance More Frequently?

One of the major levers in the timing luck equation is how frequently the portfolio is rebalanced.  However, we firmly believe that while rebalancing frequency impacts timing luck, timing luck should not be a driving factor in our choice of rebalance frequency.

Rather, rebalance frequency choices should be a function of the speed at which our signal decays (e.g. fast-changing signals such as momentum versus slow-changing signals like value) versus implementation costs (e.g. explicit trading costs, market impact, and taxes).  Only after this choice is made should we seek to limit timing luck.

Nevertheless, we can ask the question, “how does rebalancing more frequently impact timing luck in this case?”

To answer this question, we will evaluate quarterly-rebalanced portfolios.  The distinction here from the quarterly overlapping portfolios above is that the entire portfolio is rebalanced each quarter rather than only a quarter of the portfolio.  Below, we plot the equity curves for the three possible variations.

Source: CSI Analytics.  Calculations by Newfound Research.  Results are hypothetical.  Results assume the reinvestment of all distributions.   Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes.  Past performance is not an indicator of future results.  

The best performing variation returns 11.7% annualized while the worst returns 9.7% annualized, for a spread of 200 basis points.  This is actually larger than the spread we saw with the three quarterly overlapping portfolio variations, and is likely due to the fact that turnover within the portfolios increased meaningfully.

While we can see that increasing the frequency of rebalancing can help, in our opinion the choice of rebalance frequency should be distinct from the choice of managing timing luck.

Conclusion

In our opinion, there are at least two meaningful conclusions here:

The first is for product manufacturers (e.g. index issuers) and is rather simple: if you’re going to have a fixed rebalance schedule, please implement overlapping portfolios.  It isn’t hard.  It is literally just averaging.  We’re all better off for it.

The second is for product users: realize that performance dispersion between similarly-described systematic strategies can be heavily influenced by when they rebalance. The excess return may really just be a phantom of luck, not skill.

The solution to this problem, in our opinion, is to either: (1) pick an approach and just stick to it regardless of perceived dispersion, accepting the impact of timing luck; (2) hold multiple approaches that rebalance on different days; or (3) implement an approach that accounts for timing luck.

We believe the first approach is easier said than done.  And without a framework for distinguishing between timing luck and alpha, we’re largely making arbitrary choices.

The second approach is certainly feasible but has the potential downside of requiring more holdings as well as potentially forcing an investor to purchase an approach they are less comfortable with.  For example, while blending IWD (Russell 1000 Value), RPV (S&P 500 Pure Value), VLUE (MSCI U.S. Enhanced Value), and QVAL (Alpha Architect U.S. Quantitative Value) may create a portfolio that rebalances on many different dates (annually in May; annually in December; semi-annually in May and November; and quarterly, respectively), it also introduces significant process differences.  That said, research suggests that investors may benefit from further manager/process diversification.

For investors with conviction in a single strategy implementation, the last approach is certainly the best.  Unfortunately, as far as we are aware, there are only a few firms who actively implement overlapping portfolios (including Newfound Research, O’Shaughnessy Asset Management, AQR, and Research Affiliates). Until more firms adopt this approach, timing luck will continue to loom large.

Ensemble Multi-Asset Momentum

This post is available as a PDF download here.

Summary

  • We explore a representative multi-asset momentum model that is similar to many bank-based indexes behind structured products and market-linked CDs.
  • With a monthly rebalance cycle, we find substantial timing luck risk.
  • Using the same basic framework, we build a simple ensemble approach, diversifying both process and rebalance timing risk.
  • We find that the virtual strategy-of-strategies is able to harvest diversification benefits, realizing a top-quartile Sharpe ratio with a bottom-quartile maximum drawdown.

Early in the 2010s, a suite of index-linked products came to market that raised billions of dollars.  These products – offered by just about every major bank – sought to simultaneously exploit the diversification benefits of modern portfolio theory and the potential for excess returns from the momentum anomaly.

While each index has its own bells and whistles, they generally follow the same approach:

  • A global, multi-asset universe covering equities, fixed income, and commodities.
  • Implemented using highly liquid ETFs.
  • Asset class and position-level allocation limits.
  • A monthly rebalance schedule.
  • A portfolio optimization that seeks to maximize weighted prior returns (e.g. prior 6 month returns) while limiting portfolio volatility to some maximum threshold (e.g. 5%).

And despite their differences, we can see in plotting their returns below that these indices generally share a common return pattern, indicating a common, driving style.

Source: Bloomberg.

Frequent readers will know that “monthly rebalance” is an immediate red flag for us here at Newfound: an indicator that timing luck is likely lurking nearby.

Replicating Multi-Asset Momentum

To test the impact of timing luck, we replicate a simple multi-asset momentum strategy based upon available index descriptions.

We rebalance the portfolio at the end of each month.  Our optimization process seeks to identify the portfolio with a realized volatility less than 5% that would have maximized returns over the prior six months, subject to a number of position and asset-level limits.  If the 5% volatility target is not achievable, the target is increased by 1% until a portfolio can be constructed that satisfies our constraints.
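
A minimal sketch of this optimization step is below.  The function and its inputs are hypothetical stand-ins for the actual index construction, but the logic mirrors the description: maximize weighted prior returns subject to a volatility cap, relaxing the cap in 1% steps if no feasible portfolio exists.

```python
import numpy as np
from scipy.optimize import minimize

def momentum_portfolio(prior_returns, cov, max_weights, vol_target=0.05):
    """Maximize weighted prior returns subject to a volatility cap.

    prior_returns: prior 6-month total returns for each asset.
    cov: annualized covariance matrix of asset returns.
    max_weights: hypothetical position/asset-class limits.
    """
    n = len(prior_returns)
    x0 = np.full(n, 1.0 / n)
    bounds = [(0.0, ub) for ub in max_weights]
    fully_invested = {"type": "eq", "fun": lambda w: np.sum(w) - 1.0}

    # If the volatility target is not achievable, increase it by 1%.
    for target in np.arange(vol_target, 0.25, 0.01):
        vol_cap = {"type": "ineq",
                   "fun": lambda w, t=target: t - np.sqrt(w @ cov @ w)}
        result = minimize(lambda w: -np.dot(w, prior_returns), x0,
                          method="SLSQP", bounds=bounds,
                          constraints=[fully_invested, vol_cap])
        if result.success:
            return result.x
    raise ValueError("no feasible portfolio found")
```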

We use the following ETFs and asset class limits:

As a naïve test for timing luck, rather than assuming the index rebalances at the end of each month, we will simply assume the index rebalances every 21 trading days. In doing so, we can construct 21 different variations of the index, each representing the results from selecting a different rebalance date.

Source: CSI Analytics; Calculations by Newfound Research.  Results are backtested and hypothetical.  Results assume the reinvestment of all distributions.  Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes, with the exception of underlying ETF expense ratios.  Past performance is not an indicator of future results. 

As expected, the choice of rebalance date has a meaningful impact.  Annualized returns range from 4.7% to 5.5%, Sharpe ratios range from 0.6 to 0.9, and maximum drawdowns range from 9.9% to 20.8%.

On a year-by-year basis, the only thing that is consistent is the large spread between the worst and best-performing rebalance date.  On average, the yearly spread exceeds 400 basis points.

Year        Min        Max
2008*      -9.91%      0.85%
2009        2.36%      4.59%
2010        6.46%      9.65%
2011        3.31%     10.15%
2012        6.76%     10.83%
2013        3.42%      6.13%
2014        5.98%     10.60%
2015       -5.93%     -2.51%
2016        4.18%      8.45%
2017        9.60%     11.62%
2018       -6.00%     -2.53%
2019 YTD    5.93%     10.01%

* Partial year starting 7/22/2008

We’ve said it in the past and we’ll say it again: timing luck can be the difference between hired and fired.  And while we’d rather be on the side of good luck, the lack of control means we’d rather just avoid this risk altogether.

If it isn’t nailed down for a reason, diversify it

The choice of when to rebalance is certainly not the only free variable of our multi-asset momentum strategy.  Without an explicit view as to why a choice is made, our preference is always to diversify so as to avoid specification risk.

We will leave the constraints (e.g. volatility target and weight constraints) well enough alone in this example, but we should consider the process by which we’re measuring past returns as well as the horizon over which we’re measuring it.  There is plenty of historical efficacy to using prior 6-month total returns for momentum, but no lack of evidence supporting other lookback horizons or measurements.

Therefore, we will use three models of momentum: prior total return, the distance of price from its moving average, and the distance of a short-term moving average from a longer-term moving average.  We will vary the parameterization of these signals to cover horizons ranging from 3 to 15 months in length.

We will also vary which day of the month the portfolio rebalances on.

By varying the signal, the lookback horizon, and the rebalance date, we can generate hundreds of different portfolios, all supported by the same theoretical evidence but having slightly different realized results due to their particular specification.

Our robust portfolio emerges by calculating the weights for all these different variations and averaging them together, in many ways creating a virtual strategy-of-strategies.
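
The sketch below illustrates this averaging.  The three signal functions are simplified stand-ins for the momentum measures described above, and optimize is assumed to be a function (such as the volatility-constrained optimization sketched earlier) that maps signal scores to portfolio weights returned as a pandas Series.

```python
import pandas as pd

# Simplified momentum measures over an n-month lookback (21 trading
# days per month); `prices` is a DataFrame of daily asset prices.
def total_return(prices: pd.DataFrame, n: int) -> pd.Series:
    return prices.iloc[-1] / prices.iloc[-n * 21] - 1.0

def price_vs_ma(prices: pd.DataFrame, n: int) -> pd.Series:
    return prices.iloc[-1] / prices.rolling(n * 21).mean().iloc[-1] - 1.0

def ma_vs_ma(prices: pd.DataFrame, n: int, short_days: int = 21) -> pd.Series:
    return (prices.rolling(short_days).mean().iloc[-1]
            / prices.rolling(n * 21).mean().iloc[-1] - 1.0)

def ensemble_weights(prices: pd.DataFrame, optimize) -> pd.Series:
    """Average optimal weights across signals and 3-to-15 month lookbacks."""
    weights = [optimize(signal(prices, n))
               for signal in (total_return, price_vs_ma, ma_vs_ma)
               for n in range(3, 16)]
    return pd.concat(weights, axis=1).mean(axis=1)
```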

Below we plot the result of this ensemble approach as compared to a random sample of the underlying specifications.  We can see that while there are specifications that do much better, there are also those that do much worse.  By employing an ensemble approach, we forgo the opportunity for good luck and avoid the risk of bad luck.   Along the way, though, we may pick up some diversification benefits: the Sharpe ratio of the ensemble approach fell in the top quartile of specifications and its maximum drawdown was in the bottom quartile (i.e. lower drawdown).

Source: CSI Analytics; Calculations by Newfound Research.  Results are backtested and hypothetical.  Results assume the reinvestment of all distributions.  Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes, with the exception of underlying ETF expense ratios.  Past performance is not an indicator of future results.

Conclusion

In this commentary, we again demonstrate the potential risk of needless specification and the potential power of diversification.

Using a popular multi-asset momentum model as our example, we again find a significant amount of timing luck lurking in a monthly rebalance specification.  By building a virtual strategy-of-strategies, we are able to manage this risk by partially rebalancing our portfolio on different days.

We go a step further, acknowledging that process represents another axis of risk. Specifically, we vary both how we measure momentum and the horizon over which it is measured.  Through the variation of rebalance days, model specifications, and lookback horizons, we generate over 500 different strategy specifications and combine them into a virtual strategy-of-strategies to generate our robust multi-asset momentum model.

As with prior commentaries, we find that the robust model is able to effectively reduce the risk of both specification and timing luck.  But perhaps most importantly, it was able to harvest the benefits of diversification, realizing a Sharpe ratio in the top quartile of specifications and a maximum drawdown in the lowest quartile.

Dynamic Spending in Retirement Monte Carlo

This post is available as a PDF download here.

Summary

  • Many retirement planning analyses rely on Monte Carlo simulations with static assumptions for withdrawals.
  • Incorporating dynamic spending rules can more closely align the simulations with how investors would likely behave during times when the plan looked like it was on a path to failure.
  • Even a modest reduction in withdrawals (e.g. 10%) can have a meaningful impact on reducing failure rates, nearly cutting them in half in a sample simulation.
  • Combining dynamic spending rules with other marginal improvements, such as supplemental income and active risk management, can lead to more robust retirement plans and give investors a better understanding of the variables that are within their realm of control.

Monte Carlo simulations are a prevalent tool in financial planning, especially pertaining to retirement success calculations.

Under a typical framework of normally distributed portfolio returns and constant inflation-adjusted withdrawals, calculating the success of a given retirement portfolio is straightforward. But as with most tools in finance, the art lies both in the assumptions that go into the calculation and in the proper interpretation of the result.

If a client is told they have a 10% chance of running out of money over their projected retirement horizon, what does that mean for them?

They cannot make 9 copies of themselves to live out separate lives, with one copy (hopefully not the original) unfortunately burning through the account prematurely.

They also cannot create 9 parallel universes and ensure they do not choose whichever one does not work out.

We wrote previously how investors follow a single path (You Are Not a Monte-Carlo Simulation). If that path hits zero, the other hypothetical simulation paths don’t mean a thing.

A simulation path is only as valuable as the assumptions that go into creating it, and fortunately, we can make our simulations align more closely with investor behavior.

The best way to interpret the 10% failure rate is to think of it as a 10% chance of having to make an adjustment before the account hits zero. Rarely would an investor stand by while their account went to zero. There are circumstances that are entirely out of investor control, but to the extent that there was something they could do to prevent that event, they would most likely do it.

Derek Tharp, on Michael Kitces’ blog, wrote a post a few years ago weighing the relative benefit of implementing small but permanent adjustments vs. large but temporary adjustments to retirement withdrawals and found that making small adjustments and leaving them in place led to greater likelihoods of success over retirement horizons (Dynamic Retirement Spending Adjustments: Small-But-Permanent Vs Large-But-Temporary).

In this week’s commentary, we want to dig a little deeper into some simple path dependent modifications that we can make to retirement Monte-Carlo simulations with the hope of creating a more robust toolset for financial planning.

The Initial Plan

Suppose an investor is 65 and holds a moderate portfolio of 60% U.S. stocks and 40% U.S. Treasuries. From 1871 until mid-2019, this portfolio would have returned an inflation-adjusted 5.1% per year with 10.6% volatility according to Global Financial Data.

Sticking with the rule-of-thumb 4% annual withdrawal of the initial portfolio balance and assuming a 30-year retirement horizon, this yields a predicted failure rate of 8% (plus or minus about 50 bps).

The financial plan is complete.

If you start with $1,000,000, simply withdraw $3,333/month and you should be fine 92% of the time.
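
For illustration, a minimal sketch of this style of Monte Carlo calculation is below, assuming i.i.d. normally distributed monthly real returns; the exact failure rate will vary with the return model and random seed.

```python
import numpy as np

def failure_rate(mu=0.051, sigma=0.106, start=1_000_000,
                 annual_withdrawal=40_000, years=30,
                 n_sims=10_000, seed=42):
    """Fraction of simulated paths that run out of money.

    mu and sigma are the annualized real return and volatility of the
    60/40 portfolio quoted above; withdrawals are made monthly.
    """
    rng = np.random.default_rng(seed)
    months = years * 12
    returns = rng.normal(mu / 12, sigma / np.sqrt(12), size=(n_sims, months))
    balance = np.full(n_sims, float(start))
    failed = np.zeros(n_sims, dtype=bool)
    for t in range(months):
        balance = balance * (1.0 + returns[:, t]) - annual_withdrawal / 12
        failed |= balance <= 0
        balance = np.maximum(balance, 0.0)  # stay at zero once depleted
    return failed.mean()

print(failure_rate())  # roughly 8% under these assumptions
```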

But what if the portfolio drops 5% in the first month? (It almost did that in October 2018).

The projected failure rate over the next 29 years and 11 months has gone up to 11%. That violates a 10% threshold that may have been a target in the planning process.

Or what if it drops 30% in the first 6 months, like it would have in the second half of 1931?

Now the projected failure rate is a staggering 46%. Retirement success has been reduced to a coin flip.

Admittedly, these are trying scenarios, but these numbers are a key driver for financial planning. If we can better understand the risks and spell out a course of action beforehand, then the risk of making a rash emotion-driven decision can be mitigated.

Aligning the Plan with Reality

When the market environment is challenging, investors can benefit by being flexible. The initial financial plan does not have to be jettisoned; rather, agreed-upon actions within it are implemented.

One of the simplest – and most impactful – modifications to make is an adjustment to spending. For instance, an investor might decide at the outset to scale back spending by a set amount when the probability of failure crosses a threshold.

Source: Global Financial Data. Calculations by Newfound.

This reduction in spending would increase the probability of success going forward through the remainder of the retirement horizon.

And if we know that this spending cut would happen when necessary, then we can quantify it as a rule in the initial Monte Carlo simulation used for financial planning.

Graphically, we can visualize this process by looking at the probabilities of failure for varying asset levels over time. For example, at 10 years after retirement, the orange line indicates that a portfolio value of ~80% of the initial value would have about a 5% failure rate.

Source: Global Financial Data. Calculations by Newfound.

As long as the portfolio value remains above a given line, no adjustment would be needed based on a standard Monte Carlo analysis. Once a line is crossed, the probability of success is below that threshold.

This chart presents a good illustration of sequence risk: the lines are flatter initially after retirement, and the slope progressively steepens as time progresses. A large drawdown early in retirement puts the portfolio below the threshold for making an adjustment.

For instance, at 5 years, the portfolio has more than a 10% failure rate if the value is below 86%. Assuming zero real returns, withdrawals alone would have reduced the value to 80%. Positive returns over this short time period would be necessary to feel secure in the plan.

Looking under the hood along the individual paths used for the Monte Carlo simulation, at 5 years, a quarter of them would be in a state requiring an adjustment to spending at this 10% failure level.

Source: Global Financial Data. Calculations by Newfound.

This masks the fact that some of the paths crossed this 10% failure threshold and then improved before the 5-year mark was hit. In fact, 75% of the paths were below this 10% failure rate at some point prior to the 5-year mark. Without more appropriate expectations of what these simulations mean, under this model, most investors would have felt like their plan’s failure rate was uncomfortable at some point in the first 5 years after retirement!

Dynamic Spending Rules

If the goal is ultimately not to run out of funds in retirement, even a simple spending adjustment rule can substantially improve those chances (aside from a large negative return in the final periods prior to the last withdrawals).

Each month, we will compare the portfolio value to the 90% success value. If the portfolio is below that cutoff, we will size the withdrawal to bring the odds of success back to that level, if possible.

The benefit of this approach is greatly improved success along the different paths. The cost is forgone income.

But this can mean forgoing a lot of income over the life of the portfolio in a particularly bad state of the world. The worst case in terms of this total forgone income is shown below.

Source: Global Financial Data. Calculations by Newfound.

The portfolio gives up withdrawals totaling 74% of the initial portfolio value, nearly 19 years’ worth. Most of this is given up in consecutive periods during the prolonged drawdown that occurs shortly after retirement.

This is an extreme case that illustrates how large of income adjustments could be required to ensure success under a Monte Carlo framework.

The median case foregoes 9 months of total income over the portfolio horizon, and the worst 5% of cases all give up 30% (7.5 years) of income based on the initial portfolio value.

That is still a bit extreme in terms of potential cutbacks.

As a more realistic scenario that is easier on the pocketbook, we will limit the total annual cutback to 30% of the withdrawal in the following manner (a minimal sketch in code follows the list):

  • If the current chance of failure is greater than 20%, cut spending by 30%. This equates to reducing the annual withdrawal by $12,000 assuming a $1,000,000 initial balance.
  • If the current chance of failure is between 15% and 20%, cut spending by 20%. This equates to reducing the annual withdrawal by $8,000 assuming a $1,000,000 initial balance.
  • If the current chance of failure is between 10% and 15%, cut spending by 10%. This equates to reducing the annual withdrawal by $4,000 assuming a $1,000,000 initial balance.
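
A minimal sketch of these tiered rules, assuming the current forecasted failure rate is supplied by a simulation like the one sketched earlier:

```python
def dynamic_withdrawal(base_annual: float, p_fail: float) -> float:
    """Tiered spending cut based on the current forecasted failure rate."""
    if p_fail > 0.20:
        return base_annual * 0.70  # 30% cut ($12,000 on a $40,000 withdrawal)
    if p_fail > 0.15:
        return base_annual * 0.80  # 20% cut ($8,000)
    if p_fail > 0.10:
        return base_annual * 0.90  # 10% cut ($4,000)
    return base_annual             # no adjustment needed

print(dynamic_withdrawal(40_000, 0.17))  # -> 32000.0
```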

These rules still increase the success rate to 99% but require substantially smaller reductions in income.

Looking again at the worst-case scenario, we see that this case still “fails” (even though it lasts another 4.5 years) but that its reduction in income is now less than half of what it was in the extreme cutback case. This pattern is in line with the “lower for longer” reductions that Derek had looked at in the blog post.

Source: Global Financial Data. Calculations by Newfound.

On the 66% of sample paths where there was a cut in spending at some point, the average total cut amounted to 5% of the portfolio (a little over a year of withdrawals spread over the life of the portfolio).

Moving to an even less extreme reduction regime where only 10% cuts are ever made if the probability of failure increases above 10%, the average reduction in the 66% of cases that required cuts was about 9 months of withdrawals over the 30-year period.

In these scenarios, the failure rate is reduced to 5% (from 8% with no dynamic spending rules).

Source: Global Financial Data. Calculations by Newfound.

Conclusion

Retirement simulations can be a powerful planning tool, but they are only as good as their inputs and assumptions. Making them align as closely with reality as possible is a way to quantify the impact of dynamic spending rules in retirement.

The magnitude of spending reductions necessary to guarantee success of a retirement plan in all potential states of the world is prohibitive. However, small modifications to spending can still have a large impact on success.

For example, reducing withdrawals by 10% when the forecasted failure rate increases above 10% nearly cut the failure rate of the entire plan in half.

But dynamic spending rules do not exist in a vacuum; they can be paired with other marginal improvements to boost the likelihood of success:

  • Seek out higher returns – small increases in portfolio returns can have a significant impact over the 30-year planning horizon.
  • Supplement income – having supplements to income, even small ones, can offset spending during any market environment, improving the success rate of the financial plan.
  • Actively manage risk – managing risk, especially early in retirement, is a key factor in not having to reduce withdrawals in retirement.
  • Plan for more flexibility – having the ability to reduce spending when necessary reduces the need to rely on the portfolio balance when the previous factors are not working.

While failure is certainly possible for investors, a “too big to fail” mentality is much more in line with the reality of retirement.

Even if absolute failure is unlikely, adjustments will likely be a requirement. These can be built into the retirement planning process and can shed light on stress testing scenarios and sensitivity.

From a retirement planning perspective, flexibility is simply another form of risk management.

Decomposing the Credit Curve

This post is available as a PDF download here.

Summary

  • In this research note, we continue our exploration of credit.
  • Rather than test a quantitative signal, we explore credit changes through the lens of statistical decomposition.
  • As with the Treasury yield curve, we find that changes in the credit spread curve can be largely explained by Level, Slope, and Curvature (so long as we adjust for relative volatility levels).
  • We construct stylized portfolios to reflect these factors, adjusting position weights such that they contribute an equal amount of credit risk. We then neutralize interest rate exposure such that the return of these portfolios represents credit-specific information.
  • We find that the Level trade suggests little-to-no realized credit premium over the last 25 years, and Slope suggests no realized premium of junk-minus-quality within credit either. However, results may be largely affected by idiosyncratic events (e.g.  LTCM in 1998) or unhedged risks (e.g. sector differences in credit indices).

In this week’s research note, we continue our exploration of credit with a statistical decomposition of the credit spread curve.  Just as the U.S. Treasury yield curve plots yields versus maturity, the credit spread curve plots excess yield versus credit quality, providing us insight into how much extra return we demand for the risks of declining credit quality.

Source: Federal Reserve of St. Louis; Bloomberg.  Calculations by Newfound Research. 

Our goal in analyzing the credit spread curve is to gain a deeper understanding of the principal drivers behind its changes.  In doing so, we hope to potentially gain intuition and ideas for trading signals between low- and high-quality credit.

To begin, we must first construct our credit spread curve.  We will use the following index data to represent our different credit qualities.

  • Aaa: Bloomberg U.S. Corporate Aaa Index (LCA3TRUU)
  • Aa: Bloomberg U.S. Corporate Aa Index (LCA2TRUU)
  • A: Bloomberg U.S. Corporate A Index (LCA1TRUU)
  • Baa: Bloomberg U.S. Corporate Baa Index (LCB1TRUU)
  • Ba: Bloomberg U.S. Corporate HY Ba Index (BCBATRUU)
  • B: Bloomberg U.S. Corporate HY B Index (BCBHTRUU)
  • Caa: Bloomberg U.S. Corporate HY Caa Index (BCAUTRUU)

Unfortunately, we cannot simply plot the yield-to-worst for each index, as a spread captures excess yield.  This raises the question: excess to what?  As we want to isolate the credit component of the yield, we need to remove the duration-equivalent Treasury rate.

Plotting the duration of each credit index over time, we can immediately see why incorporating this duration data will be important.  Not only do durations vary meaningfully over time (e.g. Aaa durations varying between 4.95 and 11.13), but they also deviate across quality (e.g. Caa durations currently sit near 3.3 while Aaa durations are north of 11.1).

Source: Bloomberg.

To calculate our credit spread curve, we must first calculate the duration-equivalent Treasury bond yield for each index at each point in time.  For each credit index at each point in time, we use the historical Treasury yield curve to numerically solve for the Treasury maturity that matches the credit index’s duration.  We then subtract that matching rate from the credit index’s reported yield-to-worst to estimate the credit spread.
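
A stylized sketch of this matching step is below.  In practice we numerically solve for the matching maturity along the full curve; linearly interpolating across a hypothetical grid of Treasury durations is a simplified stand-in.

```python
import numpy as np

def duration_matched_treasury_yield(curve_durations, curve_yields, index_duration):
    """Interpolate the Treasury yield at the duration matching the credit index."""
    return float(np.interp(index_duration, curve_durations, curve_yields))

# Hypothetical Treasury curve expressed as (duration, yield) pairs:
tsy_durations = [0.5, 1.9, 4.6, 6.5, 8.2, 12.0]
tsy_yields    = [0.020, 0.021, 0.022, 0.024, 0.025, 0.027]

# Credit spread = index yield-to-worst minus the duration-matched rate:
spread = 0.042 - duration_matched_treasury_yield(tsy_durations, tsy_yields, 6.2)
```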

We plot the spreads over time below.

Source: Federal Reserve of St. Louis; Bloomberg.  Calculations by Newfound Research.

Statistical Decomposition: Eigen Portfolios

With our credit spreads in hand, we can now attempt to extract the statistical drivers of change within the curve.  One method of achieving this is to (see the sketch after this list):

  • Calculate month-to-month differences in the curve.
  • Calculate the correlation matrix of the differences.
  • Calculate an eigenvalue decomposition of the correlation matrix.
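
A minimal sketch of these three steps, assuming a DataFrame of monthly credit spreads with one column per rating bucket:

```python
import numpy as np
import pandas as pd

def credit_eigen_portfolios(spreads: pd.DataFrame):
    """Decompose credit spread changes into statistical factor portfolios.

    Returns the fraction of variance explained by each factor and the
    factor weights (eigenvectors), sorted largest factor first.
    """
    changes = spreads.diff().dropna()      # 1. month-to-month differences
    corr = changes.corr().to_numpy()       # 2. correlation matrix
    evals, evecs = np.linalg.eigh(corr)    # 3. eigenvalue decomposition
    order = np.argsort(evals)[::-1]        # sort by variance explained
    return evals[order] / evals.sum(), evecs[:, order]
```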

Stopping after just the first two steps, we can begin to see some interesting visual patterns emerge in the correlation matrix.

  • There is not a monotonic decline in correlation between credit qualities. For example, Aaa is not more highly correlated to Aa than it is to Ba, and A is more correlated to B than it is to Aa.
  • Aaa appears to behave rather uniquely.
  • Baa, Ba, B, and to a lesser extent Caa, appear to visually cluster in behavior.
  • Ba, B, and Caa do appear to have more intuitive correlation behavior, with correlations increasing as credit qualities get closer.

Step 3 might seem foreign for those unfamiliar with the technique, but in this context eigenvalue decomposition has an easy interpretation.   The process will take our universe of credit indices and return a universe of statistically independent factor portfolios, where each portfolio is made up of a combination of credit indices.

As our eigenvalue decomposition was applied to the correlation matrix of credit spread changes, the factors will explain the principal vectors of variance in credit spread changes.  We plot the weights of the first three factors below.

Source: Federal Reserve of St. Louis; Bloomberg.  Calculations by Newfound Research.

For anyone who has performed an eigenvalue decomposition on the yield curve before, three familiar components emerge.

We can see that Factor #1 applies nearly equal weights across all the credit indices. Therefore, we label this factor “level” as it represents a level shift across the entire curve.

Factor #2 declines in weight from Aaa through Caa.  Therefore, we label this factor “slope,” as it controls steepening and flattening of the credit curve.

Factor #3 appears as a barbell: negative weights in the wings and positive weights in the belly.  Therefore, we call this factor “curvature,” as it will capture convexity changes in the curve.

Together, these three factors explain 80% of the variance in credit spread changes. Interestingly, the 4th factor – which brings variance explained up to 87.5% – also looks very much like a curvature trade, but places zero weight on Aaa and barbells Aa/Caa against A/Baa.  We believe this serves as further evidence of the unique behavior of Aaa credit.

Tracking Credit Eigen Portfolios

As we mentioned, each factor is constructed as a combination of exposure to our Aaa-Caa credit universe; in other words, they are portfolios!  This means we can track their performance over time and see how these different trades behave in different market regimes.

To avoid overfitting and estimation risk, we decided to simplify the factor portfolios into more stylized trades, whose weights are plotted below (though ignore, for a moment, the actual weights, as they are meant only to represent relative weighting within the portfolio and not absolute level).  Note that the Level trade has a cumulative positive weight while the Slope and Curvature trades sum to zero.

To actually implement these trades, we need to account for the fact that each credit index will have a different level of credit duration.

Akin to duration, which measures a bond’s sensitivity to interest rate changes, credit duration measures a bond’s sensitivity to changes in its credit spread. As with Treasuries, we need to adjust the weights of our trades to account for this difference in credit durations across our indices.

For example, if we want to place a trade that profits in a steepening of the Treasury yield curve, we might sell 10-year US Treasuries and buy 2-year US Treasuries. However, we would not buy and sell the same notional amount, as that would leave us with a significantly negative duration position.  Rather, we would scale each leg such that their durations offset.  In the end, this causes us to buy significantly more 2s than we sell 10s.

To continue, therefore, we must calculate credit spread durations.

Without this data on hand, we employ a statistical approach.  Specifically, we take monthly total return data and subtract yield return and the impact of interest rate changes (employing the duration-matched rates we calculated above).  What is left over is an estimate of return due to changes in credit spreads. We then regress these returns against changes in credit spreads to calculate credit spread durations, which we plot below.

Source: Federal Reserve of St. Louis; Bloomberg.  Calculations by Newfound Research.
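
The regression step just described might be sketched as follows, with hypothetical inputs: monthly credit-specific returns (total return net of yield and interest rate effects) regressed on monthly changes in the index’s credit spread.

```python
import numpy as np
import pandas as pd

def credit_spread_duration(credit_returns: pd.Series,
                           spread_changes: pd.Series) -> float:
    """Estimate credit spread duration as the (negated) slope from
    regressing credit-specific returns on credit spread changes."""
    slope, _intercept = np.polyfit(spread_changes.to_numpy(),
                                   credit_returns.to_numpy(), deg=1)
    return -slope
```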

The results are a bit of a head scratcher.  Unlike duration in the credit curve, which typically increases monotonically across maturities, we get a very different effect here.  Aaa credit spread duration is 10.7 today while Caa credit spread duration is 2.8.  How is that possible?  Why is lower-quality credit not more sensitive to credit changes than higher-quality credit?

Here we run into a very interesting empirical result in credit spreads: spread change is proportional to spread level.  Thus, a true “level shift” rarely occurs in the credit space; e.g. a 1bp change in the front-end of the credit spread curve may actually manifest as a 10bp change in the back end.  Therefore, the lower credit spread duration of the back end of the curve is offset by larger changes.

There is some common-sense intuition to this effect.  Credit has a highly non-linear return component: defaults.  If we enter an economic environment where we expect an increase in default rates, it tends to happen in a non-linear fashion across the curve.  To offset the larger increase in defaults in lower quality credit, investors will demand larger corresponding credit spreads.

(Side note: this is why we saw that the Baa–Aaa  spread did not appear to mean-revert as cleanly as the log-difference of spreads did in last week’s commentary, Value and the Credit Spread.)

While our credit spread durations may be correct, we still face a problem: weighting such that each index contributes equal credit spread duration will create an outsized weight to the Caa index.

DTS Scaling

Fortunately, some very smart folks thought about this problem many years ago. Recognizing the stability of relative spread changes, Dor, Dynkin, Hyman, Houweling, van Leeuwen, and Penninga (2007) recommend the measure of duration times spread (“DTS”) for credit risk.

With a more appropriate measure of credit sensitivity, we can now scale our stylized factor portfolio weights such that each position contributes an equal level of DTS.  This will have two effects: (1) the relative weights in the portfolios will change over time, and (2) the notional size of the portfolios will change over time.

We scale each position such that (1) they contribute an equal level of DTS to the portfolio and (2) each leg of the portfolio has a total DTS of 500bps.  The Level trade, therefore, represents a constant 500bps of DTS risk over time, while the Slope and Curvature trades represent 0bps, as the longs and short legs net out.
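
A minimal sketch of this DTS scaling is below; all inputs are hypothetical.  Each leg is scaled so that its positions contribute equal DTS and the leg totals 500bps of DTS.

```python
import numpy as np

def dts_scaled_weights(signs, spread_durations, spreads, leg_dts=0.05):
    """Scale stylized trade weights for equal DTS contribution per position.

    signs: +1 / -1 / 0 direction of each credit index in the trade.
    DTS for each index = credit spread duration * credit spread level.
    """
    signs = np.asarray(signs, dtype=float)
    dts = np.asarray(spread_durations, dtype=float) * np.asarray(spreads, dtype=float)
    weights = np.zeros_like(signs)
    for leg in (1.0, -1.0):
        mask = signs == leg
        if mask.any():
            weights[mask] = leg * leg_dts / mask.sum() / dts[mask]
    return weights

# Hypothetical slope trade: long Aaa through A, short Ba through Caa
print(dts_scaled_weights([1, 1, 1, 0, -1, -1, -1],
                         [10.7, 9.0, 8.0, 7.0, 4.5, 3.5, 2.8],
                         [0.005, 0.008, 0.010, 0.015, 0.025, 0.040, 0.080]))
```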

One problem still remains: interest rate risk.  As we plotted earlier in this piece, the credit indices have time-varying – and sometimes substantial – interest rate exposure.  This creates an unintended bet within our portfolios.

Fortunately, unlike the credit curve, true level shift does empirically apply in the Treasury yield curve.  Therefore, to simplify matters, we construct a 5-year zero-coupon bond, which provides us with a constant duration instrument.  At each point in time, we calculate the net duration of our credit trades and use the 5-year ZCB to neutralize the interest rate risk.  For example, if the Level portfolio has a duration of 1, we would take a -20% notional position in the 5-year ZCB.

Source: Federal Reserve of St. Louis; Bloomberg.  Calculations by Newfound Research.

Some things we note when evaluating the portfolios over time:

  • In all three portfolios, notional exposure to higher credit qualities is substantially larger than lower credit qualities. This captures the meaningfully higher exposure that lower credit quality indices have to credit risk than higher quality indices.
  • The total notional exposure of each portfolio varies dramatically over time as market regimes change. In tight spread environments, DTS is low, and therefore notional exposures increase. In wide spread environments – like 2008 – DTS levels expand dramatically and therefore only a little exposure is necessary to achieve the same risk target.
  • 2014 highlights a potential problem with our approach: as Aaa spreads reached just 5bps, DTS dipped as low as 41bps, causing a significant swing in notional exposure to maintain the same DTS contribution.

Conclusion

The fruit of all our labor is the graph plotted below, which shows the growth of $1 in our constant-DTS, stylized credit factor portfolios.

What can we see?

First and foremost, constant credit exposure has not provided much in the last 25 years until recently.  It would appear that investors did not demand a high enough premium for the risks that were realized over the period, which include the 1998 LTCM blow-up, the burst of the dot-com bubble, and the 2008 recession.

From 12/31/2008 lows through Q1 2019, however, a constant 500bps DTS exposure generated a 2.0% annualized return with 2.4% annualized volatility, reflecting a nice annual premium for investors willing to bear the credit risk.

Slope captures the high-versus-low-quality trade.  We can see that junk meaningfully out-performed quality in the 1990s, after which there really did not appear to be a meaningful difference in performance until 2013 when oil prices plummeted and high yield bond prices collapsed.  This result does highlight a potential problem in our analysis: the difference in sector composition of the underlying indices. High yield bonds had an outsized reaction compared to higher quality investment grade credit due to more substantial exposure to the energy sector, leading to a lop-sided reaction.

What is also interesting about the Slope trade is that the market did not seem to price a meaningful premium for holding low-quality credit over high-quality credit.

Finally, we can see that the Curvature (“barbell versus belly”) trade was rather profitable for the first decade, before deflating pre-2008 and going on a mostly-random walk ever since.  However, as mentioned when the curvature trade was initially introduced, the 4th factor in our decomposition also appeared to reflect a similar trade but shorts Aa and Caa versus a long position in A and Baa.  This trade has been a fairly consistent money-loser since the early 2000s, indicating that a barbell of high quality (just not Aaa) and junk might do better than the belly of the curve.

It is worth pointing out that these trades represent a significant amount of compounding estimation – from duration-matching Treasury rates to credit spread durations – which also means a significant risk of compounding estimation error.  Nevertheless, we believe there are a few takeaways worth exploring further:

  • The Level trade appears highly regime dependent (in positive and negative economic environments), suggesting a potential opportunity for on/off credit trades.
  • The 4th factor is a consistent loser, suggesting a potential structural tilt that can be made by investors by holding quality and junk (e.g. QLTA + HYG) rather than the belly of the curve (LQD).  Implementing this in a long-only fashion would require more substantial analysis of duration trade-offs, as well as a better intuition as to why the returns are emerging as they are.
  • Finally, a recognition that maintaining a constant credit risk level requires reducing notional exposure as rates go up, as rate changes are proportional to rate levels. This is an important consideration for strategic asset allocation.


Value and the Credit Spread

This post is available as a PDF download here.

Summary

  • We continue our exploration of quantitative signals in fixed income.
  • We use a measure of credit curve steepness as a valuation signal for timing exposure between corporate bonds and U.S. Treasuries.
  • The value signal generates a 0.84% annualized return from 1950 to 2019 but is highly regime dependent with meaningful drawdowns.
  • Introducing a naïve momentum strategy significantly improves the realized Sharpe ratio and drawdown profile, but does not reduce the regime-based nature of the returns.
  • With a combined return of just 1.0% annualized, this strategy may not prove effective after appropriate discounting for hindsight bias, costs, and manager fees. The signal itself, however, may be useful in other contexts.

In the last several weeks, we have been exploring the application of quantitative signals to fixed income.

These cross-sectional studies build upon research we’ve done in the past on applying trend, value, carry, and explicit measures of the bond risk premium as duration timing mechanisms (see Duration Timing with Style Premia; Timing Bonds with Value, Momentum, and Carry; and A Carry-Trend-Hedge Approach to Duration Timing).

Broadly, our studies have found:

  • Value (measured as deviation from real yield), momentum (prior 12-month returns), and carry (yield-to-worst) were all profitable factors in cross-section municipal bond sector long/short portfolios.
  • Value (measured as deviation from real yield), trend (measured as prior return), and carry (measured as term spread + roll yield) have historically been effective timing signals for U.S. duration exposure.
  • Prior short-term equity returns proved to be an effective signal for near-term returns in U.S. Treasuries (related to the “flight-to-safety premium”).
  • Short-term trend proved effective for high yield bond timing, but the results were largely driven by performance in 2000-2003 and 2008-2009. While the strategy appeared to still be able to harvest relative carry between high-yield bonds and core fixed income in other environments, a significant proportion of returns came from avoiding large drawdowns in high yield.
  • Short-term cross-section momentum (prior total returns), value (z-score of loss-adjusted yield-to-worst), carry (loss-adjusted yield-to-worst), and 3-year reversals all appeared to offer robust signals for relative selection in fixed income sectors. The time period covered in the study, however, was limited and mostly within a low-inflation regime.
  • Application of momentum, value, carry, and reversal as timing signals proved largely ineffective for generating excess returns.

In this week’s commentary, we want to further contribute to this research by introducing a value timing signal for credit.

Finding Value in Credit

Identifying a value signal requires some measure or proxy of an asset’s “fair” value. What can make identifying value in credit so difficult is that there are a number of moving pieces.

Conceptually, credit spreads should be proportional to default rates, recovery rates, and aggregate risk appetite, making determining whether spreads are cheap or expensive rather complicated.  Prior literature typically tackles the problem with one of three major categories of models:

  • Econometric: “Fair value” of credit spreads is modeled through a regression that typically explicitly accounts for default and recovery rates. Inputs are often related to economic and market variables, such as equity market returns, 10-year minus 2-year spreads, corporate leverage, and corporate profitability.  Bottom-up analysis may use metrics such as credit quality, maturity, supply, and liquidity.
  • Merton Model: Based upon the idea that bondholders have sold a put on a company’s asset value. Therefore, options pricing models can be used to calculate a credit spread.  Inputs include the total asset value, asset volatility, and leverage of the firm under analysis.
  • Spread Signal: A simple statistical model derived from credit spreads themselves. For example, a rolling z-score of option-adjusted spreads or deviations from real yield.  Other models (e.g. Haghani and Dewey (2016)) have used spread plus real yield versus a long-run constant (e.g. “150 basis points”).

The first method requires a significant amount of economic modeling.  The second approach requires a significant amount of extrapolation from market data.  The third method, while computationally (and intellectually) less intensive, requires a meaningful historical sample that realistically needs to cover at least one full market cycle.

While attractive for its simplicity, there are a number of factors that complicate the third approach.

First, if spreads are measured against U.S. Treasuries, the metric may be polluted by information related to Treasuries due to their idiosyncratic behavior (e.g. scarcity effects and flight-to-safety premiums).  Structural shifts in default rates, recovery rates, and risk appetites may also cause a problem, as spreads may appear unduly thin or wide compared to past regimes.

In light of this, in this piece we will explore a similarly simple-to-calculate spread signal, but one that hopefully addresses some of these shortcomings.

Baa vs. Aaa Yields

In order to adjust for these problems, we propose looking at the steepness of the credit curve itself by comparing prime / high-grade yields versus lower-medium grade yields.  For example, we could compare Moody’s Seasoned Aaa Corporate Bond Yield and Moody’s Seasoned Baa Corporate Bond Yield.  In fact, we will use these yields for the remainder of this study.

We may be initially inclined to measure the steepness of the credit curve by taking the difference in yield spreads, which we plot below.

Source: Federal Reserve of St. Louis.  Calculations by Newfound Research.

We can find a stronger mean-reverting signal, however, if we calculate the log-difference in yields.

Source: Federal Reserve of St. Louis.  Calculations by Newfound Research.

We believe this transformation is appropriate for two reasons.  First, the log transformation helps control for the highly heteroskedastic and skewed nature of credit spreads.

Second, it helps capture both the steepness and the level of the credit curve simultaneously.  For example, a 50-basis-point premium when Aaa yield is 1,000 basis points is very different than when Aaa yield is 100 basis points.  In the former case, investors may not feel any pressure to bear excess risk to achieve their return objectives, and therefore a 50-basis-point spread may be quite thin.  In the latter case, 50 basis points may represent a significant step-up in relative return level in an environment where investors have either low default expectations, high recovery expectations, high risk appetite, or some combination thereof.

Another way of interpreting our signal is that it informs us about the relative decisions investors must make about their expected dispersion in terminal wealth.

Constructing the Value Strategy

With our signal in hand, we can now attempt to time credit exposure.  When our measure signals that the credit curve is historically steep, we will take credit risk.  When our signal indicates that the curve is historically flat, we will avoid it.

Specifically, we will construct a dollar-neutral long/short portfolio using the Dow Jones Corporate Bond Index (“DJCORP”) and a constant maturity 5-year U.S. Treasury index (“FV”).   We will calculate a rolling z-score of our steepness measure and go long DJCORP and short FV when the z-score is positive and place the opposite trade when the z-score is negative.
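
A single-specification sketch of this signal (before the ensemble described below) might look like the following; the 48-month formation window is one hypothetical choice within the ranges we use.

```python
import numpy as np
import pandas as pd

def credit_value_signal(aaa: pd.Series, baa: pd.Series, window: int = 48) -> pd.Series:
    """+1 (long credit / short Treasuries) when the credit curve is
    historically steep; -1 when it is historically flat.

    aaa and baa are monthly Moody's Seasoned Aaa / Baa yields.
    """
    steepness = np.log(baa) - np.log(aaa)  # log-difference in yields
    z = ((steepness - steepness.rolling(window).mean())
         / steepness.rolling(window).std())
    return np.sign(z)
```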

In line with prior studies, we will apply an ensemble approach.  Portfolios are reformed monthly using formation periods ranging from 3 to 6 years and holding periods ranging from 1 to 6 months.  Portfolio weights for the resulting strategy are plotted below.

Source: Federal Reserve of St. Louis and Global Financial Data.  Calculations by Newfound Research.

We should address the fact that while both corporate bond yield and index data are available back to the 1930s, we have truncated our study to ignore dates prior to 12/1949 to focus on the post-war period.  It should be further acknowledged that the Dow Jones Corporate Bond Index used in this study did not technically exist until 2002.  Prior to that date, the index return tracks a Dow Jones Bond Aggregate, which was based upon four sub-indices: high-grade rails, second-grade rails, public utilities, and industrials.  This aggregate existed from 1915 to 1976, when it was replaced because the number of railway bonds was no longer sufficient to maintain the average.

Below we plot the returns of our long/short strategy.

Source: Federal Reserve of St. Louis and Global Financial Data.  Calculations by Newfound Research. Returns are hypothetical and backtested.  Returns are gross of all management fees, transaction fees, and taxes, but net of underlying fund fees.  Total return series assumes the reinvestment of all distributions.

The strategy has an annualized return of 0.84% with a volatility of 3.89%, generating a Sharpe ratio of 0.22.  Of course, long-term return statistics belie investor and manager experience, with this strategy exhibiting at least two periods of decade-plus-long drawdowns.  In fact, the strategy really has just four major return regimes: 1950 to 1970 (-0.24% annualized), 1970 to 1987 (2.59% annualized), 1987 to 2002 (-0.33% annualized), and 2002 to 2019 (1.49% annualized).

Try the strategy out in the wrong environment and we might be in for a lot of pain.

Momentum to the Rescue?

It is no secret that value and momentum go together like peanut butter and jelly. Instead of tweaking our strategy to death in order to improve it, we may just find opportunity in combining it with a negatively correlated signal.

Using an ensemble model, we construct a dollar-neutral long/short momentum strategy that compares prior total returns of DJCORP and FV.  Rebalanced monthly, the portfolios use formation periods ranging from 9-to-15 months and holding periods ranging from 1-to-6 months.

Below we plot the growth of $1 in our value strategy, our momentum strategy, and a 50/50 combination of the two strategies that is rebalanced monthly.

Source: Federal Reserve of St. Louis and Global Financial Data.  Calculations by Newfound Research. Returns are hypothetical and backtested.  Returns are gross of all management fees, transaction fees, and taxes, but net of underlying fund fees.  Total return series assumes the reinvestment of all distributions.

The first thing we note is – even without calculating any statistics – the meaningful negative correlation we see in the equity curves of the value and momentum strategies.  This should give us confidence that there is the potential for significant improvement through diversification.

The momentum strategy returns 1.11% annualized with a volatility of 3.92%, generating a Sharpe ratio of 0.29.  The 50/50 combination strategy, however, returns 1.03% annualized with a volatility of just 2.16% annualized, resulting in a Sharpe ratio of 0.48.

While we still see significant regime-driven behavior, the negative regimes now come at a far lower cost.

Conclusion

In this study we introduce a simple value strategy based upon the steepness of the credit curve.  Specifically, we calculated a rolling z-score on the log-difference between Moody’s Seasoned Baa and Aaa yields.  We interpreted a positive z-score as a historically steep credit curve and therefore likely one that would revert.  Similarly, when z-scores were negative, we interpreted the signal as a flat credit curve, and therefore a period during which taking credit risk is not well compensated.

Employing an ensemble approach, we generated a long/short strategy that would buy the Dow Jones Corporate Bond Index and short 5-year U.S. Treasuries when credit appeared cheap and place the opposite trade when credit appeared expensive.  We found that this strategy returned 0.84% annualized with a volatility of 3.89% from 1950 to 2019.

Unfortunately, our value signal generated significantly regime-dependent behavior with decade-long drawdowns.  This not only causes us to question the statistical validity of the signal, but also the practicality of implementing it.

Fortunately, a naively constructed momentum signal provides ample diversification.  While a combination strategy is still highly regime-driven, the drawdowns are significantly reduced.  Not only do returns meaningfully improve compared to the stand-alone value signal, but the Sharpe ratio more-than-doubles.

Unfortunately, our study leveraged a long/short construction methodology.  While this isolates the impact of active returns, long-only investors must cut return expectations of the strategy in half, as a tactical timing model can only half-implement this trade without leverage.  A long-only switching strategy, then, would only be expected to generate approximately 0.5% annualized excess return above a 50% Dow Jones Corporate Bond Index / 50% 5-Year U.S. Treasury index portfolio.

And that’s before adjustments for hindsight bias, trading costs, and manager fees.

Nevertheless, more precise implementation may lead to better results.  For example, our indices neither perfectly matched the credit spreads we evaluated, nor did they match each other’s durations.  Furthermore, while this particular implementation may not survive costs, this signal may still provide meaningful information for other credit-based strategies.
