Drawdowns and Portfolio Longevity

By Corey Hoffstein

On January 22, 2019

In Risk Management, Sequence Risk

This post is available as a PDF download here.

Summary

While retirement planning is often performed with Monte Carlo simulations, investors only experience a single path.
Large or prolonged drawdowns early in retirement can have a significant impact upon the probability of success.
We explore this idea by simulating returns of a 60/40 portfolio and measuring the probability of portfolio failure based upon a quantitative measure of risk called the Ulcer Index.
We find that a high Ulcer Index reading early in an investor’s retirement can dramatically increase the probability of failure as well as decrease the expected longevity of a portfolio.

Introduction

At Newfound we often say, “while other asset managers focus on alpha, our first focus is on risk.”

Not that there is anything wrong with the pursuit of alpha. We’d argue that the pursuit of alpha is actually a necessary component for well-functioning financial markets.

It’s simply that we have never met a financial advisor who has built a financial plan that assumed any sort of alpha. Alpha is great if we can harvest it, but the empirical evidence suggesting how difficult that can be (both for the manager net-of-fees as well as the investor behaviorally) would make the presumption of achieving alpha rather bold.

Furthermore, alpha is a zero-sum game: we can’t all plan for it.

Risk, however, is a crucial element of every investor’s plan. Bearing too little risk can lead to a portfolio that “fails slowly,” falling short of achieving the escape velocity required to outpace inflation. Bearing too much risk, however, can lead to sudden and catastrophic ruin: a case of “failing fast.”

When investors hit retirement, the usual portfolio math changes. While we’re taught in Finance 101 that the order of returns does not matter, the introduction of portfolio withdrawals makes the order of returns a large determinant of plan success. This phenomenon is known as “sequence risk” and it peaks in the years just before and after retirement.

Typically, we look at returns through the lens of the investment. In retirement, however, what really matters is the returns of the investor.

We’re often told that our primitive brain, trained on the African veldt, is unsuited for investing. Yet our brain seems to understand quite well that we do not get to live our lives as the average of a Monte Carlo simulation.

If we lose our arm to a lion because we did not flee when we heard a rustle in the bushes, we do not end up with half of an arm because of all the other parallel universes where we did flee. On the timeline we live, the situation is binary.

As investors, the same is true. We live but a single path and there are very real, very permanent knock-out conditions we need to be aware of. Prolonged and significant drawdowns during the first years of retirement rank among the most dangerous.

Drawdowns and the Risk of Ruin

A retirement plan typically establishes a safe withdrawal rate. This is the amount of inflation-adjusted money an investor can withdraw from their portfolio every year and still retain a sufficiently high probability that they will not run out of money before they die.

A well-established (albeit controversial) rule is that 4% of an investor’s portfolio level at retirement is usually an appropriate withdrawal amount. For example, if an investor retires with a $1,000,000 portfolio, they can theoretically safely withdraw $40,000 a year. Another way to think of this is that the portfolio reflects 25 years of spending assuming growth matches inflation.

The problem with portfolio drawdowns is that the withdrawal rate now reflects a larger proportion of capital unless it is commensurately adjusted downward. For example, if the portfolio falls to $700,000, a $40,000 withdrawal is now 5.7% of capital and the portfolio reflects just 17.5 years of spending units.
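The arithmetic behind these figures can be sketched in a few lines. The function names are ours, chosen for illustration; the dollar amounts are the ones used in the example above.

```python
def withdrawal_rate(portfolio_value: float, annual_withdrawal: float) -> float:
    """Annual withdrawal expressed as a fraction of current capital."""
    return annual_withdrawal / portfolio_value


def spending_units(portfolio_value: float, annual_withdrawal: float) -> float:
    """Years of spending the portfolio covers if growth merely matches inflation."""
    return portfolio_value / annual_withdrawal


for value in (1_000_000, 700_000):
    rate = withdrawal_rate(value, 40_000)
    units = spending_units(value, 40_000)
    print(f"${value:,}: {rate:.1%} of capital, {units:.1f} years of spending")
# $1,000,000: 4.0% of capital, 25.0 years of spending
# $700,000: 5.7% of capital, 17.5 years of spending
```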

Even shallow, prolonged drawdowns can have a damaging effect. If the portfolio falls to $900,000 and stays stagnant for the next five years, the $40,000 withdrawals grow from representing 4% of the portfolio to nearly 5.5% of the portfolio. If we do not adjust the withdrawal, at five years into retirement we have gone from 25 spending units to 18.5, losing a year and a half of portfolio longevity.

As sudden and steep drawdowns can be just as damaging as shallow and prolonged ones, we prefer to use a quantitative measure known as the Ulcer Index to measure this risk. Specifically, the Ulcer Index is calculated as the root mean square of monthly drawdowns, capturing both severity and duration simultaneously.
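A minimal sketch of the calculation follows. It assumes drawdowns are expressed in percent and measured from the running peak of a wealth index that starts at 1.0; both conventions are our assumptions for illustration, as is the function name.

```python
import numpy as np


def ulcer_index(returns) -> float:
    """Root mean square of percentage drawdowns from the running peak."""
    # Wealth index starting at 1.0 so the starting value counts as the first peak.
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + np.asarray(returns, dtype=float))))
    running_peak = np.maximum.accumulate(wealth)
    drawdown_pct = 100.0 * (wealth / running_peak - 1.0)
    return float(np.sqrt(np.mean(drawdown_pct ** 2)))


# Hypothetical monthly return paths: one sudden loss versus a slow grind lower.
steep_and_brief = [-0.20] + [0.02] * 11
shallow_and_prolonged = [-0.01] * 12
print(ulcer_index(steep_and_brief), ulcer_index(shallow_and_prolonged))
```

Because every month spent below the prior peak contributes to the average, a path that never recovers is penalized for its duration, while squaring the drawdowns penalizes depth.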

In an effort to demonstrate the damaging impact of drawdowns early in retirement, we will run the following experiment:

Generate 250,000 simulations, each block-bootstrapped from monthly real U.S. equity and real U.S. 5-year Treasury bond returns from 1918 – 2018.
Assume a 65 year old investor with a $1,000,000 starting portfolio and a fixed real $3,333 withdrawal monthly ($40,000 annual).
Assume the investor holds a 60/40 portfolio at all times.
For each simulation:
- Calculate the Ulcer Index of the first five years of portfolio returns (ignoring withdrawals).
- Determine how many years until the portfolio runs out of money.
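The steps above can be sketched as follows. This is a scaled-down illustration, not the code behind our results: the return history is synthetic (random draws standing in for the 1918 – 2018 real 60/40 returns), the 12-month block length is an assumption, only 2,000 paths are generated, and all function names are hypothetical.

```python
import numpy as np


def block_bootstrap(history, n_months, block_len, rng):
    """Stitch together randomly chosen contiguous blocks of the history."""
    n_blocks = -(-n_months // block_len)  # ceiling division
    starts = rng.integers(0, len(history) - block_len + 1, size=n_blocks)
    path = np.concatenate([history[s:s + block_len] for s in starts])
    return path[:n_months]


def ulcer_index(returns):
    """Root mean square of percentage drawdowns from the running peak."""
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + returns)))
    drawdown_pct = 100.0 * (wealth / np.maximum.accumulate(wealth) - 1.0)
    return float(np.sqrt(np.mean(drawdown_pct ** 2)))


def months_until_ruin(returns, start_value=1_000_000.0, withdrawal=40_000.0 / 12):
    """Months of withdrawals survived; len(returns) if the money never runs out."""
    value = start_value
    for month, r in enumerate(returns):
        value = (value - withdrawal) * (1.0 + r)  # withdraw, then earn the month's return
        if value <= 0.0:
            return month
    return len(returns)


rng = np.random.default_rng(0)
# Stand-in for a century of monthly real 60/40 returns (synthetic, NOT real data).
history = rng.normal(0.004, 0.03, size=1200)

n_sims, horizon, age_at_death = 2_000, 35 * 12, 84
ulcers = np.empty(n_sims)
ruin_month = np.empty(n_sims)
for i in range(n_sims):
    path = block_bootstrap(history, horizon, block_len=12, rng=rng)
    ulcers[i] = ulcer_index(path[:60])       # first five years, ignoring withdrawals
    ruin_month[i] = months_until_ruin(path)  # path starts at age 65

failed = ruin_month < (age_at_death - 65) * 12
calm_start = ulcers <= np.median(ulcers)
print("P(fail | low early Ulcer Index): ", failed[calm_start].mean())
print("P(fail | high early Ulcer Index):", failed[~calm_start].mean())
```

Splitting paths at the median Ulcer Index is a coarse version of the finer bins used in the graphs; any binning scheme could be substituted.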

Based upon this data, below we plot the probability of failure – i.e. the probability we run out of money before we die – given an assumed age of death as well as the Ulcer Index realized by the portfolio in the first five years of retirement.

As an example of how to read this graph, consider the darkest blue line in the middle of the graph, which reflects an assumed age of death of 84. Along the x-axis are different bins of Ulcer Index levels, with lower numbers reflecting fewer and less severe drawdowns, while higher numbers reflect steeper and more frequent ones.

As we trace the line, we can see that the probability of failure – i.e. running out of money before death – increases dramatically as the Ulcer Index increases. While for shallow and infrequent drawdowns the probability of failure is <5%, we can see that the probability approaches 50% for more severe, frequent losses.

Beyond the binary question of failure, it is also important to consider when a portfolio runs out of money relative to when we die. Below we plot how many years prior to death a portfolio runs out of money, on average, based upon the Ulcer Index.

Once again using the darkest blue line as an example, we can see that for most minor-to-moderate Ulcer Index levels, the portfolio would only run out of money a year or two before we die in the case of failure. For more extreme losses, however, the portfolio can run out of money a full decade before we kick the bucket.

It is worth stressing here that these Ulcer Index readings are derived using simulations based upon prior realized U.S. equity and fixed income returns. In other words, while improbable (see the histogram below), extreme readings are not impossible.

It is worth further acknowledging that U.S. assets have experienced some of the highest realized risk premia in the world, and more conservative estimates may put a higher probability mass on more extreme Ulcer Index readings.

Conclusion

For early retirees, large or prolonged drawdowns early in retirement can have a significant impact on the probability of success.

In this commentary, we capture both the depth and duration of drawdowns using a single metric known as the Ulcer Index. We simulate 250,000 possible return paths for a 60/40 portfolio and calculate the Ulcer Index in the first five years of returns. We then plot the probability of failure as well as expected portfolio longevity conditional upon the Ulcer Index level realized.

We clearly see a positive relationship between failure and Ulcer Index, with larger and more prolonged drawdowns earlier in retirement leading to a higher probability of failure. This phenomenon is precisely why investors tend to de-risk their portfolios over time.

While the right risk profile and a well-diversified portfolio make for a strong foundation, we believe that investors should also consider expanding their investment palette to include alternative assets and style premia that may be more defensively oriented in nature. For example, defensive equities (e.g. low-volatility and quality approaches) have historically demonstrated an ability to reduce drawdown risk. Diversified, multi-asset style premia also tend to exhibit low correlation to traditional risk factors and a low intrinsic style premia.

Here at Newfound, we focus on trend equity strategies, which seek to overlay trend-following approaches on top of equity exposures in an effort to reduce left-tail risk and create a higher quality of return profile.

However an investor chooses to build their portfolio, it should be risk that is at the forefront of their mind.

Fragility Case Study: Dual Momentum GEM

By Corey Hoffstein

On January 14, 2019

In Craftsmanship, Momentum, Popular, Portfolio Construction, Risk Management, Trend

This post is available as a PDF download here.

Summary

Recent market volatility has caused many tactical models to make sudden and significant changes in their allocation profiles.
Periods such as Q4 2018 highlight model specification risk: the sensitivity of a strategy’s performance to specific implementation decisions.
We explore this idea with a case study, using the popular Dual Momentum GEM strategy and a variety of lookback horizons for portfolio formation.
We demonstrate that the year-to-year performance difference can span hundreds, if not thousands, of basis points between the implementations.
By simply diversifying across multiple implementations, we can dramatically reduce model specification risk and even potentially see improvements in realized metrics such as Sharpe ratio and maximum drawdown.

Introduction

Among do-it-yourself tactical investors, Gary Antonacci’s Dual Momentum is the strategy we tend to see implemented the most. The Dual Momentum approach is simple: by combining both relative momentum and absolute momentum (i.e. trend following), Dual Momentum seeks to rotate into areas of relative strength while preserving the flexibility to shift entirely to safety assets (e.g. short-term U.S. Treasury bills) during periods of pervasive, negative trends.

In our experience, the precise implementation of Dual Momentum tends to vary (with various bells-and-whistles applied) from practitioner to practitioner. The most popular benchmark model, however, is the Global Equities Momentum (“GEM”), with some variation of Dual Momentum Sector Rotation (“DMSR”) a close second.

Recently, we’ve spoken to several members in our extended community who have bemoaned the fact that Dual Momentum kept them mostly aggressively positioned in Q4 2018 and signaled a defensive shift at the beginning of January 2019, at which point the S&P 500 was already in a -14% drawdown (having peaked at over -19% on December 24th). Several DIYers even decided to override their signal in some capacity, either ignoring it entirely, waiting a few days for “confirmation,” or implementing some sort of “half-and-half” rule where they are taking a partially defensive stance.

Ignoring the fact that a decision to override a systematic model somewhat defeats the whole point of being systematic in the first place, this sort of behavior highlights another very important truth: there is a significant gap of risk that exists between the long-term supporting evidence of an investment style (e.g. momentum and trend) and the precise strategy we use to implement it (e.g. Dual Momentum GEM).

At Newfound, we call that gap model specification risk. There is significant evidence supporting both momentum and trend as quantitative styles, but the precise means by which we measure these concepts can lead to dramatically different portfolios and outcomes. When a portfolio’s returns are highly sensitive to its specification – i.e. slight variations in returns or model parameters lead to dramatically different return profiles – we label the strategy as fragile.

In this brief commentary, we will use the Global Equities Momentum (“GEM”) strategy as a case study in fragility.

Global Equities Momentum (“GEM”)

To implement the GEM strategy, an investor merely needs to follow the decision tree below at the end of each month.
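As a rough sketch of what such a rule looks like in code, the function below assumes the commonly published form of GEM: hold U.S. equities or international equities, whichever has the stronger trailing return, provided U.S. equities have beaten T-bills, and otherwise hold aggregate bonds. The function name, argument names, and asset labels are hypothetical, and the lookback used to compute the trailing returns is left to the caller.

```python
def gem_signal(us_equity_ret: float, intl_equity_ret: float, tbill_ret: float) -> str:
    """Return the single holding for the coming month given trailing lookback returns."""
    if us_equity_ret > tbill_ret:            # absolute momentum: is the trend positive?
        if us_equity_ret > intl_equity_ret:  # relative momentum: which equity market leads?
            return "US_EQUITY"
        return "INTL_EQUITY"
    return "AGG_BONDS"                       # negative trend: rotate to safety


# Hypothetical trailing returns at a month-end.
print(gem_signal(us_equity_ret=-0.05, intl_equity_ret=-0.12, tbill_ret=0.02))
# AGG_BONDS
```

Note that the only free parameter is the lookback horizon behind the three inputs, which is precisely the specification choice examined in this commentary.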

From a practitioner standpoint, there are several attractive features of this model. First, it is based upon the long-run evidence of both trend-following and momentum. Second, it is very easy to model and generate signals for. Finally, it is fairly lightweight from an implementation perspective: at most twelve potential rebalances a year (and often far fewer), with the portfolio only holding one ETF at a time.

Despite the evidence that “simple beats complex,” the simplicity of GEM belies its inherent fragility. Below we plot the equity curves for GEM implementations that employ different lookback horizons for measuring trend and momentum, ranging from 6- to 12-months.

Source: CSI Analytics. Calculations by Newfound Research. Returns are backtested and hypothetical. Returns assume the reinvestment of all distributions. Returns are gross of all fees except for underlying ETF expense ratios. None of the strategies shown reflect any portfolio managed by Newfound Research and were constructed solely for demonstration purposes within this commentary. You cannot invest in an index.

We can see a significant dispersion in potential terminal wealth. That dispersion, however, is not necessarily consistent with the notion that one formation period is inherently better than another. While we would argue, ex-ante, that there should be little performance difference between a 9-month and 10-month lookback – they both, after all, capture the notion of “intermediate-term trends” – the former returned just 43.1% over the period while the latter returned 146.1%.

These total return figures further hide the year-to-year disparity that exists. The 9-month model, for example, was not a consistent loser. Below we plot these results, highlighting both the best (blue) and worst (orange) performing specifications. We see that the yearly spread between these strategies can be hundreds-to-thousands of basis points; consider that in 2010, the strategy formed using a 10-month lookback returned 12.2% while the strategy formed using a 9-month lookback returned -9.31%.

Same thesis. Same strategy. Slightly different specification. Dramatically different outcomes. That single year is likely the difference between hired and fired for most advisors and asset managers.

For those bemoaning their 2018 return, note that the 10-month specification would have netted a positive result! That specification turned defensive at the end of October.

Now, some may cry “foul” here. The evidence for trend and momentum is, after all, centuries in length and the efficacy of all these horizons is supported. Surely the noise we see over this ten-year period would average out over the long run, right?

The unfortunate reality is that these performance differences are not expected to mean-revert. The gambler’s fallacy would have us believe that bad luck in one year should be offset by good luck in another and vice versa. Unfortunately, this is not the case. While we would expect, at any given point in time, that each strategy has equal likelihood of experiencing good or bad luck going forward, that luck is expected to occur completely independently from what has happened in the past.

The implication is that performance differences due to model specification are not expected to mean-revert and are therefore expected to be random, but very permanent, return artifacts.¹

The larger problem at hand is that none of us have a hundred years to invest. In reality, most investors have a few decades. And we act with the temperament of having just a few years. Therefore, bad luck can have very permanent and very scarring effects not only upon our psyche, but upon our realized wealth.

But consider what happens if we try to neutralize the role of model specification risk and luck by diversifying across the seven different models equally (rebalanced annually). We see returns closer in line with the median result, a boost to realized Sharpe ratio, and a reduction in the maximum realized drawdown.
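The mechanics of this naïve mix can be sketched as below. The seven return streams here are synthetic (a shared component plus independent specification noise), so the printed numbers illustrate the bookkeeping only and are not the backtested results discussed in this commentary; the function name is hypothetical.

```python
import numpy as np


def annually_rebalanced_equal_weight(monthly_returns: np.ndarray) -> np.ndarray:
    """Monthly returns of an equal-weight mix, reset to equal weights every 12 months.

    monthly_returns has shape (n_months, n_strategies).
    """
    n_months, n_strategies = monthly_returns.shape
    portfolio = np.empty(n_months)
    sleeves = np.full(n_strategies, 1.0 / n_strategies)
    for t in range(n_months):
        if t % 12 == 0:  # start of each year: reset every sleeve to an equal share
            sleeves = np.full(n_strategies, sleeves.sum() / n_strategies)
        before = sleeves.sum()
        sleeves = sleeves * (1.0 + monthly_returns[t])  # sleeves drift within the year
        portfolio[t] = sleeves.sum() / before - 1.0
    return portfolio


rng = np.random.default_rng(0)
common = rng.normal(0.006, 0.03, size=(120, 1))   # shared style return (synthetic)
specific = rng.normal(0.0, 0.015, size=(120, 7))  # specification noise (synthetic)
spec_returns = common + specific                  # stand-ins for the 6- to 12-month lookbacks
mix = annually_rebalanced_equal_weight(spec_returns)
print("terminal wealth by specification:", np.round(np.prod(1 + spec_returns, axis=0), 2))
print("terminal wealth of the mix:      ", round(float(np.prod(1 + mix)), 2))
```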

These are impressive results given that all we employed was naïve diversification.

Conclusion

The odd thing about strategy diversification is that it guarantees we will be wrong. Each and every year, we will, by definition, allocate at least part of our capital to the worst performing strategy. The potential edge, however, is in being vaguely wrong rather than precisely wrong. The former is annoying. The latter can be catastrophic.

In this commentary we use the popular Dual Momentum GEM strategy as a case study to demonstrate how model specification choices can lead to performance differences that span hundreds, if not thousands, of basis points a year. Unfortunately, we should not expect these performance differences to mean revert. The realizations of good and bad luck are permanent, and potentially very significant, artifacts within our track records.

By simply diversifying across the different models, however, we can dramatically reduce specification risk and thereby reduce strategy fragility.

To be clear, no amount of diversification will protect you from the risk of the style. As we like to say, “risk cannot be destroyed, only transformed.” In that vein, trend following strategies will always incur some sort of whipsaw risk. The question is whether it is whipsaw related to the style as a whole or to the specific implementation.

For example, in the graphs above we can see that Dual Momentum GEM implemented with a 10-month formation period experienced whipsaw in 2011 when few of the other implementations did. This is more specification whipsaw than style whipsaw. On the other hand, we can see that almost all the specifications exhibited whipsaw in late 2015 and early 2016, an indication of style whipsaw, not specification whipsaw.

Specification risk we can attempt to control for; style risk is just something we have to bear.

At Newfound, evidence such as this informs our own trend-following mandates. We seek to diversify ourselves across the axes of what (“what are we investing in?”), how (“how are we making the decisions?”), and when (“when are we making those decisions?”) in an effort to reduce specification risk and provide the greatest style consistency possible.

Is Multi-Manager Diversification Worth It?

By Corey Hoffstein

On January 7, 2019

This post is available as a PDF download here.

Summary

Portfolio risk is traditionally quantified by volatility. The benefits of diversification are measured in how portfolio volatility is changed with the addition or subtraction of different investments.
Another measure of portfolio risk is the dispersion in terminal wealth: a measure that attempts to capture the potential difference in realized returns. For example, two equity managers that each hold 30-stock portfolios may exhibit similar volatility levels but will likely have very different realized results.
In this commentary we explore existing literature covering the potential diversification benefits that can arise from combining multiple managers together.
Empirical evidence suggests that in heterogeneous categories (e.g. many hedge fund styles), combining managers can reduce portfolio volatility. Yet even in homogenous categories (e.g. equity style boxes), combining managers can have a pronounced effect on reducing the dispersion in terminal wealth.
Finally, we address the question as to whether manager diversification leads to dilution, arguing that a combination of managers will reduce idiosyncratic process risks but maintain overall style exposure.

Introduction

In their 2014 paper The Free Lunch Effect: The Value of Decoupling Diversification and Risk, Croce, Guinn, and Robinson draw a distinction between the risk reduction effects that occur due to de-risking and those that occur due to diversification benefits.

To illustrate the distinction, the authors compare the volatility of an all equity portfolio versus a balanced stock/bond mix. In the 1984-2014 sample period, they find that the all equity portfolio has an annualized volatility of 15.25% while the balanced portfolio has an annualized volatility of just 9.56%.

Over 75% of this reduction in volatility, however, is due simply to the fact that bonds were much less volatile than stocks over the period. In fact, of the 568-basis-point reduction, only 124 basis points were due to actual diversification benefits.
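One way to make this split concrete is to compare the all-equity volatility, the weighted-average volatility of the mix, and the realized volatility of the mix. The sketch below assumes that definition (de-risking is the drop from all-equity to the weighted average; diversification is the further drop to the portfolio's actual volatility) and uses hypothetical inputs rather than the paper's data.

```python
import math


def volatility_reduction_split(w_stock, vol_stock, vol_bond, correlation):
    """Split the volatility reduction of a stock/bond mix into two pieces."""
    w_bond = 1.0 - w_stock
    weighted_avg_vol = w_stock * vol_stock + w_bond * vol_bond
    portfolio_vol = math.sqrt(
        (w_stock * vol_stock) ** 2
        + (w_bond * vol_bond) ** 2
        + 2.0 * w_stock * w_bond * correlation * vol_stock * vol_bond
    )
    de_risking = vol_stock - weighted_avg_vol        # from holding a less volatile asset
    diversification = weighted_avg_vol - portfolio_vol  # from imperfect correlation
    return de_risking, diversification


# Hypothetical inputs: 60/40 mix, 15% stock vol, 5% bond vol, 0.2 correlation.
de_risking, diversification = volatility_reduction_split(0.6, 0.15, 0.05, 0.2)
print(f"de-risking: {de_risking:.2%}, diversification: {diversification:.2%}")
```

With a correlation of 1.0 the diversification term collapses to zero and the entire reduction is de-risking, which is the distinction the authors draw.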

Why does this matter?

Because de-risking carries none of the benefits of diversification. If there is a commensurate trade-off between expected return and risk, then all we have done is reduced the expected return of our portfolio.¹

It is only by combining assets of like volatility – and, it is assumed, like expected return – that we can enjoy the free lunch of diversification.

Unfortunately, unless you are willing to apply leverage (e.g. risk parity), such free lunch opportunities across assets are limited in practice. The classic example of inter-asset diversification, though, is taught in Finance 101: as we add more stocks to a portfolio, we drive the contribution of idiosyncratic volatility towards zero.
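The Finance 101 result can be illustrated with a stylized calculation that assumes every stock has the same volatility and every pair of stocks the same correlation; the 30% volatility and 0.25 correlation below are hypothetical.

```python
import math


def equal_weight_portfolio_vol(n_stocks: int, stock_vol: float, avg_correlation: float) -> float:
    """Volatility of an equal-weight portfolio of identical, equally correlated stocks."""
    variance = stock_vol ** 2 * (1.0 / n_stocks + (1.0 - 1.0 / n_stocks) * avg_correlation)
    return math.sqrt(variance)


for n in (1, 5, 30, 500):
    print(n, round(equal_weight_portfolio_vol(n, stock_vol=0.30, avg_correlation=0.25), 4))
```

The 1/N term is the idiosyncratic contribution, which vanishes as stocks are added; what remains is the systematic floor set by the average correlation.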

Yet volatility is only one way to measure risk. If we build a portfolio of 30 stocks and you build a portfolio of 30 stocks, the portfolios may have nearly identical levels of volatility, but we almost assuredly will end up with different realized results. This difference between the expected and the realized is captured by a measure known as terminal wealth dispersion, first introduced by Robert Radcliffe in his book Investment: Concepts, Analysis, Strategy.

This form of risk naturally arises when we select between investment managers. Two managers may both select securities from the same universe using the same investment thesis, but the realized results of their portfolios can be starkly different. In rare cases, the specific choice of one manager over another can even lead to catastrophic results.

The selection of a manager reflects not only an allocation to an asset class, but also reflects an allocation to a process. In this commentary, we ask: how much diversification benefit exists in process diversification?

The Theory Behind Manager Diversification

In Factors from Scratch, the research team at O’Shaughnessy Asset Management (OSAM), in partnership with anonymous blogger Jesse Livermore, digs into the driving elements behind value and momentum equity strategies.

They find that value stocks do tend to exhibit negative EPS growth, but this decay in fundamentals is offset by multiple expansion. In other words, markets do appear to correctly identify companies with contracting fundamentals, but they also exaggerate and over-extrapolate that weakness. The historical edge for the strategy has been that the re-rating – measured via multiple expansion – tends to overcompensate for the contraction in fundamentals.

For momentum, OSAM finds a somewhat opposite effect. The strategy correctly identifies companies with strengthening fundamentals, but during the holding period a valuation contraction occurs as the market recognizes that its outlook might have been too optimistic. Historically, however, the growth outweighed the contraction to create a net positive effect.

These are the true, underlying economic and behavioral effects that managers are trying to capture when they implement value and momentum strategies.

These are not, however, effects we can observe directly in the market; they are effects that we have to forecast. To do so, we have to utilize semi-noisy signals that we believe are correlated with them. Therefore, every manager’s strategy will be somewhat inefficient at capturing these effects.

For example, there are a number of quantitative measures we may apply in our attempt to identify value opportunities; e.g. price-to-book, price-to-earnings, and EBITDA-to-enterprise-value to name a few. Two different noisy signals might end up with different performance just due to randomness.

This noise between signals is further compounded when we consider all the other decisions that must be made in the portfolio construction process. Two managers may use the same signals and still end up with very different portfolios based upon how the signals are translated into allocations.

Consider this: Morningstar currently² lists 1,217 large-cap value funds in its mutual fund universe and trailing 1-year returns ranged from 1.91% to -22.90%. This is not just a case of extreme outliers, either: the spread between the 10th and 90th percentile returning funds was 871 basis points.

It bears repeating that these are funds that, in theory, are all trying to achieve the same goal: large-cap value exposure.

Yet this result is not wholly surprising to us. In Separating Ingredients and Recipe in Factor Investing we demonstrated that the performance dispersion between different momentum strategy definitions (e.g. momentum measure, look-back length, rebalance frequency, weighting scheme, et cetera) was larger than the performance dispersion between the traditional Fama-French factors themselves in 90% of rolling 1-year periods. As it turns out, intra-factor differences can cause greater dispersion than inter-factor differences.

Without an ex-ante view as to the superiority of one signal, one process, or one fund versus another, it seems prudent for a portfolio to have diversified exposure to a broad range of signals that seem plausibly related to the underlying phenomenon.

Literature Review

While foundational literature on modern portfolio diversification extends back to the 1950s, little has been written in the field of manager diversification. While it is a well-established teaching that a portfolio of 25-40 stocks is typically sufficient to reduce idiosyncratic risk, there is no matching rule for how many managers to combine together.

One of the earliest articles on the topic was written by Edward O’Neal in 1997, titled How Many Mutual Funds Constitute a Diversified Mutual Fund Portfolio?

Published in the Financial Analysts Journal, this article explores risk across two different dimensions: the volatility of returns over time and the dispersion in terminal period wealth. Again, the idea behind the latter measure is that two investors with identical horizons and different investments will achieve different terminal wealth levels, even if those investments have the same volatility.

Exploring equity mutual fund returns from 1986 to 1997, the study adopts a simulation-based approach to constructing portfolios and tracking returns. Multi-manager portfolios of varying sizes are randomly constructed and compared against other multi-manager portfolios of the same size.

O’Neal finds that while combining managers has little-to-no effect on volatility (manager returns were too homogeneous), it has a significant effect upon the dispersion of terminal wealth. To quote the article,

Holding more than a single mutual fund in a portfolio appears to have substantial diversification benefits. The traditional measure of volatility, the time-series standard deviation, is not greatly influenced by holding multiple funds. Measures of the dispersion in terminal-wealth levels, however, which are arguably more important to long-term investors than time-series risk measures, can be reduced significantly. The greatest portion of the reduction occurs with the addition of small numbers of funds. This reduction in terminal-period wealth dispersion is evident for all holding periods studied. Two out of three downside risk measures are also substantially reduced by including multiple funds in a portfolio. These findings are especially important for investors who use mutual funds to fund fixed-horizon investment goals, such as retirement and college savings.

Allocating to three managers instead of just one could reduce the dispersion in terminal wealth by nearly 50%, an effect found to be quite consistent across the different time horizons measured.
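The flavor of this simulation-based approach can be reproduced with a toy model. The sketch below is not O’Neal’s methodology or data: it assumes a hypothetical universe in which every manager earns the same style return plus independent noise, then measures how the standard deviation of terminal wealth changes as more managers are held.

```python
import numpy as np

rng = np.random.default_rng(0)
n_managers, n_years, n_trials = 200, 10, 5_000

# Hypothetical universe: one shared style return path plus manager-specific noise.
style = rng.normal(0.07, 0.15, size=n_years)
noise = rng.normal(0.0, 0.05, size=(n_managers, n_years))
manager_returns = style + noise


def terminal_wealth_dispersion(n_held: int) -> float:
    """Std. dev. of terminal wealth across random equal-weight n_held-manager portfolios."""
    terminal = np.empty(n_trials)
    for i in range(n_trials):
        picks = rng.choice(n_managers, size=n_held, replace=False)
        yearly = manager_returns[picks].mean(axis=0)  # rebalanced to equal weight each year
        terminal[i] = np.prod(1.0 + yearly)
    return float(terminal.std())


for n_held in (1, 3, 10):
    print(n_held, round(terminal_wealth_dispersion(n_held), 3))
```

Because every simulated portfolio lives through the same style return path, time-series volatility barely changes with the number of managers, while the spread of ending wealth across portfolios shrinks quickly, with most of the benefit coming from the first few additions.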

In 1999, O’Neal teamed up with L. Franklin Fant to publish Do You Need More than One Manager for a Given Equity Style? Adopting a similar simulation-based approach, Fant and O’Neal explored multi-manager equity portfolios in the context of the style-box framework.

And, as before, they find that taking a multi-manager approach has little effect upon portfolio volatility.

It does, however, again prove to have a significant effect on the deviation in terminal wealth.

To quote the paper,

Regardless of the style category considered, the variability in terminal wealth levels is significantly reduced by using more managers. The first few additional managers make the most difference, as terminal wealth standard deviation declines at a decreasing rate with the number of managers. Concentrating on the variability of periodic portfolio returns fails to document the advantage of using multiple managers within style categories.
Second, some categories benefit more from additional managers than others. Plan sponsors would do well to allocate relatively more managers to the styles that display the greatest diversification benefits. Growth styles and small-cap styles appear to offer the greatest potential for diversification.

In 2002, François-Serge Lhabitant and Michelle Learned pursued a similar vein of research in the realm of hedge funds in their article Hedge Fund Diversification: How Much is Enough? They employ the same simulation-based approach but evaluate diversification effects within the different hedge fund styles.

They find that while diversification does little to affect the expected return for a given style, it does appear to help reduce portfolio volatility: sometimes quite significantly so. This somewhat contradictory result to the prior research is likely due to the fact that hedge funds within a given category exhibit far more heterogeneity in process and returns than do equity managers in the same style box.

(Note that while the graphs below only show the period 1990-1993, the paper explores three time periods: 1990-1993, 1994-1997, and 1998-2001 and finds a similar conclusion in all three).

Perhaps most importantly, however, they find a rather significant reduction in risk characteristics like a portfolio’s realized maximum drawdown.

To quote the article,

We find that naively adding more funds to a portfolio tends to leave returns stable, decrease the standard deviation, and reduce downside risk. Thus, diversification should be increased as long as the marginal benefits of adding a new asset to a portfolio exceeds the marginal cost.
…
If a sample of managers is relatively style pure, then a fewer number of managers will minimize the unsystematic risk of that style. On the contrary, if the sample is really heterogeneous, increasing the number of managers may still provide important diversification benefits.

Taken together, this literature paints an important picture:

Diversifying across managers in the same category will likely do little to reduce portfolio volatility, except in the cases where categories are broad enough to capture many heterogeneous managers.
Diversifying across managers appears to significantly reduce the potential dispersion in terminal wealth.

But why is minimizing “the dispersion of terminal wealth” important? The answer is the same reason why we diversify in the first place: risk management.

The potential for high dispersion in terminal wealth means that we can have dramatically different outcomes based upon the choices we are making, placing significant emphasis on our skill in manager selection. Choosing just one manager is “more right” style thinking rather than our preferred “less wrong.”

But What About Dilution?

The number one response we hear when we talk about manager diversification is: “when we combine managers, won’t we just dilute our exposure back to the market?”

The answer, as with all things, is: “it depends.” For the sake of brevity, we’re just going to leave it with, “no.”

No?

No.

If we identify three managers as providing exposure to value, then it makes little logical sense that somehow a combination of them would suddenly remove that exposure. Subtraction through addition only works if there is a negative involved; i.e. one of the managers would have to provide anti-value exposure to offset the others.

Remember that an active manager’s portfolio can always be decomposed into two pieces: the benchmark and a dollar-neutral long/short portfolio that isolates the active over/under-weights that manager has made.

To “dilute back to the benchmark,” we’d have to identify managers and then weight them such that all of their over/under-weights net out to equal zero.

Candidly, we’d be impressed if you managed to do that. Especially if you combine managers within the same style who should all be, at least directionally, taking similar bets. The dilution that occurs is only across those bets which they disagree on and therefore reflect the idiosyncrasies of their specific process.
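To make the decomposition concrete, here is a minimal sketch. The four-stock benchmark and the three manager portfolios are hypothetical numbers chosen for illustration, and the helper names are our own. It splits an equal-weight blend of the managers into the benchmark plus a dollar-neutral active overlay.

```python
# Hypothetical benchmark and manager weights (illustrative, not from the article).
benchmark = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
managers = [
    {"A": 0.40, "B": 0.30, "C": 0.20, "D": 0.10},
    {"A": 0.35, "B": 0.20, "C": 0.35, "D": 0.10},
    {"A": 0.45, "B": 0.25, "C": 0.15, "D": 0.15},
]


def active_weights(portfolio, benchmark):
    """Dollar-neutral overlay: portfolio weight minus benchmark weight.

    Assumes every holding also appears in the benchmark.
    """
    return {name: portfolio.get(name, 0.0) - w for name, w in benchmark.items()}


def blend(portfolios):
    """Equal-weight combination of several manager portfolios."""
    names = sorted(set().union(*portfolios))
    n = len(portfolios)
    return {name: sum(p.get(name, 0.0) for p in portfolios) / n for name in names}


combined_active = active_weights(blend(managers), benchmark)
for name, w in combined_active.items():
    print(f"{name}: {w:+.4f}")
```

In this toy example all three managers overweight A and underweight D, so those bets survive the blend. The bets they disagree on (B and C) largely cancel, which is the only place "dilution" occurs.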

What a multi-manager implementation allows us to diversify is our selection risk, leading to a return profile more “in-line” with a given style or category. In fact, Lhabitant and Learned (2002) demonstrated this exact notion with a graph that plots the correlation of multi-manager portfolios with their broad category. While somewhat tautological, an increase in manager diversification leads to a return profile closer to the given style than to the idiosyncrasies of those managers.

We can also see this with a practical example. Below we take several available ETFs that implement quantitative value strategies and plot their rolling 52-week return relative to the S&P 500. We also construct a multi-manager index (“MM_IDX”) that is a naïve, equal-weight portfolio. The only wrinkle to this portfolio is that ETFs are not introduced immediately, but rather slowly over a 12-month period.³

Source: CSI Analytics. Calculations by Newfound Research. It is not possible to invest in an index. Returns are total returns (i.e. assume the reinvestment of all distributions) and are gross of all fees except for underlying expense ratios of ETFs. Past performance does not guarantee future results.

We can see that while the multi-manager blend is never the best performing strategy, it is also never the worst. Never the hero; never a zero.

It should be noted that while manager diversification may be able to reduce the idiosyncratic returns that result from process differences, it will not prevent losses (or relative underperformance) of the underlying style itself. In other words, we might avoid the full brunt of losses specific to the Sequoia Fund, but no amount of diversification would prevent the relative drag seen by the quantitative value style in general over the last decade.

We can see this in the graph above by the fact that all the lines generally tend to move together. 2015 was bad for value managers. 2016 was much better. But we can also see that every once in a while, a specific implementation will hit a rough patch that is idiosyncratic to that approach; e.g. IWD in 2017 and most of 2018.

Multi-manager diversification is the tool that allows us to avoid the full brunt of this risk.

Conclusion

Taken together, the research behind manager diversification suggests:

In heterogeneous categories (e.g. many hedge fund styles), manager diversification may reduce portfolio volatility.
In more homogeneous categories (e.g. equity style boxes), manager diversification may reduce the dispersion in terminal wealth.
Multi-manager implementations appear to reduce realized portfolio risk metrics such as maximum drawdown. This is likely partially due to the reduction in portfolio volatility, but also due to a reduction in exposure to funds that exhibit catastrophic losses.
Multi-manager implementations do not necessarily “dilute” the portfolio back to market exposure, but rather “dilute” the portfolio back to the style exposure, reducing exposure to idiosyncratic process risk.

For advisors and investors, this evidence may cause a sigh of relief. Instead of having to spend time trying to identify the best manager or the best process, there may be significant advantages to simply “avoiding the brain damage”⁴ and allocating equally among a few. In other words, if you don’t know which low-volatility ETF to pick, just buy a couple and move on with your life.

But what are the cons?

A multi-manager approach may be tax inefficient, as we will need to rebalance allocations back to parity between the exposures.
A multi-manager approach may lead to fund bloat within a portfolio, doubling or tripling the number of holdings we have. While this is merely optical, except possibly in small portfolios, we recognize there exists an aversion to it.
By definition, performance will be middling: the cost of avoiding the full brunt of losers is that we also give up the full benefit of winners. We’re reluctant to label this as a con, as it is arguably the whole point of diversification, but it is worth pointing out that the same behavioral biases that emerge in portfolio reviews of asset allocation will likely re-emerge in reviews of manager selection, especially over short time horizons.

For investment managers, a natural interpretation of this research is that approaches blending different signals and portfolio construction methods together should lead to more consistent outcomes. It should be no surprise, then, that asset managers adopting machine learning are finding significant advantages with ensemble techniques. After all, they invoke the low-hanging fruit of manager diversification.

Perhaps most interesting is that this research suggests that fund-of-funds really are not such bad ideas so long as costs can be kept under control. As the asset management business continues to be more competitive, perhaps there is an edge – and a better client result – found in cooperation.

Dart-Throwing Monkeys and Process Diversification

By Corey Hoffstein

On December 24, 2018

In Portfolio Construction, Risk Management, Weekly Commentary

This post is available as a PDF download here.

Summary

This week’s commentary is a short addendum to last week’s piece, attempting to serve as a (very) brief and simplified summary of process diversification.
Volatility is only one way of measuring risk; dispersion in terminal wealth is another.
Using simulations of dart-throwing monkeys, we plot the dispersion in terminal wealth for different levels of portfolio and manager diversification.
We find that increased diversification within a portfolio as well as increased diversification across managers can lead to more consistent portfolio outcomes.

Introduction

In last week’s commentary (What do portfolios and teacups have in common?), we explored at great length the potential benefits of diversification in the domains of what, how, and when.

The crux of our argument is that for investors, return dispersions across time (i.e. “volatility”) can be a potentially misleading risk characteristic and that it is important to consider the potential dispersion in terminal wealth as well.

These are by no means original or unique thoughts. Often the advisors and institutions we work with intuitively understand them: they just have not been presented with the math to justify them.

Therefore, in contrast to last week’s rather expansive note, we aim to keep this week’s note short, simple, and punchy in an effort to drive home how manager / process diversification can help deliver more consistent outcomes.

Dart-Throwing Monkeys

Consider the following experiment.

We begin with thousands and thousands of dart-throwing monkeys. Every month, the monkeys throw their darts at a board that determines how they will be invested for the next month. In this hypothetical scenario, we will assume that the monkeys are investing in different industry groups.¹

Some monkeys are “concentrated managers,” throwing just a single dart and holding that pick for the next month. Other monkeys are more diversified, throwing up to 30 darts each month and equally allocating their portfolio across their investments. Portfolio sizes can be either 1, 5, 10, 15, 20, 25, or 30 equally-allocated investments.

It is our job, as an allocator, to choose different monkeys to invest with. Do we invest with just 1 concentrated monkey manager? Five different diversified managers? How much difference does it really make at the end of the day?

We learn in Finance 101 that once we diversify our portfolio sufficiently, we have eliminated nonsystematic risk. But does that mean we expect the portfolios to necessarily end up in the same place?

As an example, if we pick 10 dart-throwing monkeys who each pick 10 investments per month, how different would we expect our final wealth level to be from another allocator who picks 10 different dart-throwing monkeys who each pick 10 investments per month?

Process Diversification and Terminal Wealth Dispersion

Below we plot the dispersion in terminal wealth² as a function of (1) the number of securities picked by each monkey manager and (2) the number of monkey managers we allocate to.

As an example of how to read this graph, the orange line tells us about portfolios comprised of monkey managers who pick five investments each. As we move from left to right, we learn about the dispersion in terminal wealth based upon the number of managers we allocate to.

We can think of this two ways. First, we can think of it as potential dispersion in results among our peers who make the same type of decision (e.g. picking 5 managers who pick 5 investments each) but different specific choices (e.g. might pick different managers). Second, we can think of this as the dispersion in possible results if we were able to live across infinite universes simultaneously.

Source: Kenneth French Data Library. Calculations by Newfound Research.

Unfortunately, we cannot live across infinite universes and this graph tells us that choosing a single, highly concentrated manager can lead to wildly different outcomes depending upon the manager we select.

As the managers further diversify and we further diversify among managers, this dispersion in potential outcomes decreases.³
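The experiment can be sketched in a few lines. This is a rough illustration under stated assumptions, not the calculation behind the graph: industry returns are simulated as independent normal draws rather than taken from the Kenneth French Data Library, dispersion is measured as the standard deviation of log terminal wealth (the article's own measure is defined in its footnote), and all function names are hypothetical.

```python
import math
import random
import statistics


def make_returns(n_months=120, n_assets=30, seed=42):
    """Simulated monthly industry returns (assumption: i.i.d. normal draws)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.008, 0.05) for _ in range(n_assets)] for _ in range(n_months)]


def simulate_dispersion(picks_per_manager, n_managers, returns, n_trials=200, seed=0):
    """Std dev of log terminal wealth across allocators facing the same history.

    Each trial is one allocator who equal-weights `n_managers` monkeys; every
    month each monkey throws `picks_per_manager` darts and equal-weights its picks.
    """
    rng = random.Random(seed)
    n_assets = len(returns[0])
    log_wealth = []
    for _ in range(n_trials):
        total = 0.0
        for month in returns:
            portfolio_return = 0.0
            for _ in range(n_managers):
                picks = rng.sample(range(n_assets), picks_per_manager)
                portfolio_return += sum(month[i] for i in picks) / picks_per_manager
            total += math.log1p(portfolio_return / n_managers)
        log_wealth.append(total)
    return statistics.pstdev(log_wealth)


if __name__ == "__main__":
    history = make_returns()
    for picks, mgrs in [(1, 1), (5, 1), (5, 5), (10, 10)]:
        d = simulate_dispersion(picks, mgrs, history)
        print(f"{picks:>2} picks x {mgrs:>2} managers: dispersion {d:.3f}")
```

Because every allocator faces the same return history, any spread in outcomes comes purely from which monkeys were chosen and where their darts landed.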

Conclusion

The intuition behind these results is simple:

More diversified managers are more likely to overlap in portfolio holdings with one another, and therefore are likely to have more similar returns.
Similarly, as the number of managers we choose goes up, so does the likelihood of overlap in holdings with a peer who also selects the same number of managers.

It is equally valid to interpret this analysis as saying there is greater opportunity for out-performance in taking concentrated bets in highly concentrated managers. We would argue this is “more right” thinking: the win condition requires both that we pick the right managers and the managers pick the right stocks. While a little bit of diversification can go a long way here in clipping outlier events, the dispersion can still far exceed a more diversified approach.

At Newfound, we prefer the “less wrong” approach. Allocations to a few diversified managers each taking a different approach can lead to significantly less dispersion in outcomes and, therefore, allow for better financial planning.

What do portfolios and teacups have in common?

By Corey Hoffstein

On December 17, 2018

In Portfolio Construction, Risk Management, Weekly Commentary

This post is available as a PDF download here.

Summary

Portfolio risk is often measured as the variance of returns over time. Another form of risk is the variance of terminal wealth that can arise from small variations in strategy inputs or asset returns.
Strategies or portfolios that are more sensitive to small changes in inputs are inherently “fragile.”
Fragile strategy design makes it difficult to rely upon backtests or historical results in setting forward expectations.
We explore how diversification across the “what,” “how,” and “when,” axes of portfolio construction can help reduce strategy fragility.

Introduction

At Newfound, we spend a lot less time trying to figure out how to be more right than we spend trying to figure out how to be less wrong. One area of particular interest for us is the idea of unintended bets: the exposures in a portfolio we may not even be aware of. And if we knew we had the exposure, we might not even want it.

For example, consider a portfolio that invests in either broad U.S., broad international, or broad emerging market equities based upon valuations. A significant tilt towards non-U.S. assets may be a valuation-driven decision, but for U.S. investors it creates significant exposure to fluctuations in the U.S. dollar versus foreign currencies.

Of course, exposures are not limited only to assets. Exposures may be broader macro-economic, stylistic, thematic, geographic, or even political factors.

These unintended bets can go far beyond explicit and implicit exposures. In our example, the choice of how to measure value may lead to meaningfully different portfolios, despite the same overarching thesis. For example, a naïve CAPE ratio versus adjusting for differences in relative sector composition dramatically alters the view of whether international equities are significantly cheaper than U.S. equities. These potential differences capture what we like to call “model specification risk.”

Finally, we can be subject to unintended bets based upon when the portfolio is re-evaluated and reconstituted. Evaluating valuations in January, for example, may lead to a different decision versus evaluating them in July.

How can we avoid these unintended bets? At Newfound, we believe that the answer falls back to diversification: not only in the traditional sense of what we invest in, but also across how we make decisions and when we make them.

When left uncontrolled, unintended bets can make a strategy incredibly fragile.

What, precisely, does it mean for a strategy to be fragile? A strategy is fragile when small variations of strategy inputs – be it asset returns or other measures – lead to meaningful dispersion in realized results.

Now we want to distinguish between volatility and fragility. Volatility is the dispersion of strategy returns across time, while fragility is the dispersion in end-of-period wealth across variations of the strategy.

As an example, a portfolio that invests only in the S&P 500 is very volatile but not particularly fragile. Given the last ten years of returns for the S&P 500, slight variations in annual returns would not lead to significant dispersion in end-of-period wealth. On the other hand, a strategy that flips a coin every December and invests for the next year in the S&P 500 when it lands on heads or short-term U.S. Treasuries when it lands on tails would have lower expected volatility than the S&P 500 but would be much more fragile. We need simply consider a few scenarios (e.g. all heads or all tails) to understand the potential dispersion such a strategy is subject to.

In the remainder of this commentary, we will demonstrate how diversification across the what, how, and when axes can reduce strategy fragility.

The Experiment Setup

Since a large degree of our focus at Newfound is on managing trend equity mandates, we will explore fragility through the lens of the style of measuring trends. For those unfamiliar with the approach, trend equity strategies aim to capture a significant portion of equity market growth while avoiding substantial and prolonged drawdowns through the application of trend following. A naïve implementation of such an idea would be to invest in the S&P 500 when its prior 12-month return has been positive and invest in short-term U.S. Treasuries otherwise.

To learn something about the fragility of a strategy, we are going to have to inject some randomness. After all, no amount of history will tell us about the fragility of a teacup that has spent its entire life sitting on a shelf; we will need to see it fall on the floor to actually learn something.

As with our recent commentary When Simplicity Met Fragility, we will inject randomness by adding white noise to asset returns. Specifically, we will add to daily returns a draw from a random normal distribution with mean 0% and standard deviation 0.025%. Using this slightly altered history, we will then run our investment strategy.

By performing this process a large number of times (10,000 in this commentary), we can explore how the outcome of the strategy is impacted by these slight variations in return history. The greater the dispersion in results, the more fragile the strategy is.
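A minimal sketch of this perturbation loop follows. The function names are hypothetical, fragility is summarized as the standard deviation of log terminal wealth (our assumption, one of several reasonable dispersion measures), and the default simulation count is kept well below the 10,000 used in this commentary so the example runs quickly.

```python
import math
import random
import statistics


def perturb(daily_returns, rng, noise_sd=0.00025):
    """Add N(0, noise_sd) white noise to each daily return (0.025% = 0.00025)."""
    return [r + rng.gauss(0.0, noise_sd) for r in daily_returns]


def terminal_log_wealth(returns):
    """Log of terminal wealth from compounding a return series."""
    return sum(math.log1p(r) for r in returns)


def fragility(strategy, daily_returns, n_sims=1000, seed=0):
    """Dispersion of log terminal wealth across slightly perturbed histories.

    `strategy` maps a list of asset returns to a list of strategy returns.
    """
    rng = random.Random(seed)
    outcomes = [
        terminal_log_wealth(strategy(perturb(daily_returns, rng)))
        for _ in range(n_sims)
    ]
    return statistics.pstdev(outcomes)


def buy_and_hold(returns):
    """Baseline 'strategy': simply hold the asset."""
    return returns


if __name__ == "__main__":
    rng = random.Random(1)
    history = [rng.gauss(0.0004, 0.01) for _ in range(2520)]  # simulated, not real data
    print(f"buy-and-hold dispersion: {fragility(buy_and_hold, history):.4f}")
```

Any strategy with the same signature can be passed to `fragility`, which makes it easy to compare a candidate strategy against the buy-and-hold baseline on identical perturbations.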

To demonstrate how diversification across the three different axes can affect fragility, we will start with a naïve trend equity strategy – investing in broad U.S. equities using a single trend model that is rebalanced on a monthly basis – and vary the three components in isolation.

The What

The “what” axis simply asks, “what are we invested in?”

How can our choice of “what” affect fragility? Consider a slight variation to our coin-flip strategy from before. Instead of flipping a single coin, we will now flip two coins. The first coin determines whether we invest 50% of the portfolio in either the S&P 500 or short-term U.S. Treasuries, while the second coin determines whether we invest the other 50% of the portfolio in either the Russell 1000 or short-term U.S. Treasuries.

In our single coin example, each year we expected to invest in the S&P 500 50% of the time and in short-term U.S. Treasuries 50% of the time. With two coins, we now expect to be fully invested 25% of the time, partially invested 50% of the time, and divested 25% of the time.

Let’s take this notion to further limits. Consider now flipping 100 coins where each determines the allocation decision for 1% of our portfolio, where heads leads to an investment in a large-cap U.S. equity portfolio and tails means invest in short-term U.S. Treasuries. Now being fully invested or divested is an infinitesimally small probability event; in fact, for a given year there is a 95% chance that your allocation to equities falls between 40% and 60%.¹
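These probabilities follow directly from the binomial distribution, assuming fair and independent coins. The short sketch below (the function name is our own) reproduces the two-coin split and checks the 100-coin band.

```python
from math import comb


def prob_heads_between(n_coins, lo, hi):
    """P(lo <= number of heads <= hi) for n fair, independent coins."""
    return sum(comb(n_coins, k) for k in range(lo, hi + 1)) / 2 ** n_coins


# Two coins: divested (0 heads), partially invested (1), fully invested (2).
two_coins = [prob_heads_between(2, k, k) for k in range(3)]

# One hundred coins: equity allocation between 40% and 60%, and fully invested.
p_band = prob_heads_between(100, 40, 60)
p_fully_invested = prob_heads_between(100, 100, 100)

print(two_coins)
print(f"P(40% to 60% in equities) = {p_band:.4f}")
print(f"P(fully invested)         = {p_fully_invested:.3e}")
```

The band probability comes out slightly above the 95% quoted, since 40 to 60 heads spans two standard deviations on either side of the mean and includes both endpoints.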

Even though we’ve applied the exact same process to each investment, diversifying across more investments has dramatically reduced the fragility of our coin-flipping strategy.

Now let’s translate this from the theoretical to the practical. We will begin with a simple trend following strategy that invests in the underlying asset when prior 12-1 month returns have been positive or invests in the risk-free rate, re-evaluating the trend at the end of each month.
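As a sketch of the rule just described, the snippet below computes a 12-1 month signal from a list of monthly returns and applies it with a one-month lag. The indexing convention (measure the eleven months ending one month before the evaluation date) is our reading of "12-1", and the function names are hypothetical.

```python
def trend_signal_12_1(monthly_returns, t):
    """Signal evaluated at the end of month t.

    Compounds the returns of months t-11 .. t-1, skipping the most recent
    month t. Returns None when there is not enough history.
    """
    if t < 11:
        return None
    growth = 1.0
    for r in monthly_returns[t - 11 : t]:
        growth *= 1.0 + r
    return growth > 1.0


def run_trend_strategy(asset_returns, rf_returns):
    """Hold the asset next month if the signal is positive, else the risk-free rate."""
    strategy_returns = []
    for t in range(11, len(asset_returns) - 1):
        invested = trend_signal_12_1(asset_returns, t)
        strategy_returns.append(asset_returns[t + 1] if invested else rf_returns[t + 1])
    return strategy_returns


if __name__ == "__main__":
    rising = [0.01] * 24
    cash = [0.001] * 24
    print(run_trend_strategy(rising, cash))
```

The signal formed at the end of month t only ever earns the return of month t+1, which keeps the backtest free of look-ahead.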

To explore the impact of diversifying our what, we will implement this strategy five different ways:

A single in-or-out decision on broad U.S. equities.
Applied across 5 equally-weighted U.S. equity industry groups.
Applied across 12 equally-weighted U.S. equity industry groups.
Applied across 30 equally-weighted U.S. equity industry groups.
Applied across 48 equally-weighted U.S. equity industry groups.

The graph below plots the distribution of log difference in terminal wealth against the median outcome for each of these five approaches. Lines within each “violin” show the 25th, 50th, and 75th percentiles.

The graph clearly demonstrates that by increasing our exposure across the “what” axis, the dispersion in terminal wealth is dramatically reduced.

Source: Kenneth French Data Library. Calculations by Newfound Research.

But why is reduced dispersion in terminal wealth necessarily better?

It implies a greater consistency in outcome, which is not only important for setting forward expectations, but is also important for evaluating past performance (whether backtested or live). This evidence tells us that if we are evaluating a trend equity strategy that employs a single model to make in-or-out decisions on broad U.S. equities on a monthly basis, it will be nearly impossible to tell whether the realized results are in line with reasonable expectations or overly optimistic (we can probably guess that they aren’t overly pessimistic, as those sorts of returns typically aren’t marketed).

To justify a concentration in the “what” axis, we would have to demonstrate that the worst-case scenarios would still represent a meaningful improvement in expected terminal wealth versus a more diversified approach.

It should be noted that our experiment design prohibits dispersion from ever being fully reduced, as we are injecting randomness into past returns. Even if no strategy is applied, there will be some inherent dispersion in final wealth. For example, below we plot the dispersion that occurs simply from adding randomness to past returns with a buy-and-hold approach.

Increasing the number of assets in the portfolio inherently reduces dispersion for buy-and-hold because diversification helps drive the expected impact of the injected randomness towards its mean: zero. With only one asset, on the other hand, outlier events are free to wreak havoc on results.

Source: Kenneth French Data Library. Calculations by Newfound Research.

Note that adding a strategy on top of buy-and-hold can exacerbate the fragility issue, making diversification that much more important.

The How

The “how” axis asks, “how are we making investment decisions?”

Many investors are already somewhat familiar with diversification along the “how” axis, often diversifying their active exposures across multiple managers who might have similar investment mandates but slightly different processes.

We like to call this “process diversification” and think of it as akin to the parable of the blind men and the elephant. Each blind man touches a different part of the elephant and pronounces his belief in what he is touching based upon his isolated view. The blind man touching the leg, for example, might think he is touching a sturdy tree while the blind man touching the tail might believe he is grabbing a rope.

None is correct in isolation but taken together we may gain a more well-rounded picture.

Similarly, two managers may claim to invest based upon valuations, but the manner in which they do so gives them a very different picture of where value can be found.

The idea of process diversification was explored in the 1999 paper “Do You Need More than One Manager for a Given Equity Style?” by Franklin Fant and Edward O’Neal. Fant and O’Neal found that while a multi-manager approach does very little for return variability across time (i.e. portfolio volatility), it does a lot for end-of-period wealth variability. They find this to be true across almost all equity style box categories. In other words: taking a multi-manager approach can reduce fragility.

Let us return to our prior coin flip example. Instead of making a choice to invest in the S&P 500 based upon a coin-flip, however, we will combine a number of different signals. For example, we might flip a coin, roll a die, measure the weather, and look at the second hand of a clock. Each signal gives us some sort of in-or-out decision, and we average these decisions together to get our allocation. As with before, as we incorporate more signals, we decrease the probability that we end up with extreme allocations, leading to a more consistent terminal wealth distribution.

Again, we should stress here that the objective is not just outright elimination of dispersion in terminal wealth. After all, if that were our sole pursuit, we could simply stuff our money under our mattress. Rather, assuming we will be implementing some active investment strategy that we hope has a positive long-term expected return, our aim should be to reduce the dispersion in terminal wealth for that strategy.

Of course, in investing we would not expect the processes to be entirely independent. With trend following, for example, most popular models are actually mathematically linked to one another, and therefore generate signals that are highly correlated. Nevertheless, even modest diversification can have meaningful benefits with respect to strategy fragility.
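Averaging several in-or-out models into a single allocation can be sketched as follows. The three lookbacks and the simple price-versus-past-price rule are hypothetical stand-ins chosen for illustration; they are not the specific models used in this commentary.

```python
def momentum_signal(prices, t, lookback):
    """1.0 if the price at month t is above the price `lookback` months earlier."""
    return 1.0 if prices[t] > prices[t - lookback] else 0.0


def ensemble_allocation(prices, t, lookbacks=(6, 9, 12)):
    """Equity weight equal to the average vote of several trend models."""
    votes = [momentum_signal(prices, t, lb) for lb in lookbacks]
    return sum(votes) / len(votes)


if __name__ == "__main__":
    # Illustrative price path: down over 9 and 12 months, up over the last 6.
    prices = [120, 118, 117, 115, 110, 105, 100, 102, 104, 106, 108, 109, 110]
    print(ensemble_allocation(prices, t=12))                   # blended view
    print(ensemble_allocation(prices, t=12, lookbacks=(6,)))   # single model
```

On this path a single 6-month model is fully invested while the blend holds only a partial position, so no one model's view dictates the entire allocation.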

To explore the impact of diversification along the how axis, we implement our trend following strategy six different ways. Each invests in broad U.S. equities and rebalances monthly but differs in the number of trend-following models employed.²

The results are plotted below.

Source: Kenneth French Data Library. Calculations by Newfound Research.

Again, we can see that increased diversification across the how axis dramatically reduces dispersion in terminal wealth. Our takeaway is largely the same: without an ex-ante view as to which particular model (or group of models) is best (i.e. a view of how to be more right), diversification can lead to greater consistency in results. We will be less wrong.

A subtler conclusion of this analysis is that it should be very, very difficult to conclude that one model is better than another. We can see that if we risk selecting just one model to govern our process, seemingly minor variations in historical returns can lead to dramatically different terminal wealth results, as evidenced by the bulging distribution. Inverting this line of thinking, we should also be suspect of any backtest that seeks to demonstrate the superiority of a given model using a single backtest. For example, just because a 12-1 month total return model performs better than a 10-month moving average model on historical S&P 500 returns, we should be highly skeptical as to the robustness of the conclusion that the 12-1 model is best.

The When

The “when” axis asks, “when are we making our investment decision?”

This is an oft overlooked question in public markets, but it is commonly addressed in the world of private equity and venture capital. Due to the illiquid nature of those markets, investors will often attempt to diversify their business cycle risk by establishing positions in multiple funds over time, giving them exposure to different “vintages.” The idea here is simple: the opportunity set available at different points in time can vary and if we allocate all of our earmarked capital to a particular year, we may miss out on later opportunities.

Consider our original coin-flipping example where we flipped a single coin every December to determine whether we would buy the S&P 500 or hold our capital in short-term Treasuries. But why was it necessary that we make the decision in December? Why not July? Or January? Or September?

While we would not expect there to be point-in-time risk for coin flipping, we can still consider the net effect of a vintage-based allocation methodology. Here we will assume that we flip a coin each month and rebalance 1/12th of our capital based upon the result.

Again, the probability of allocating to the extremes (100% invested or 100% divested) is dramatically reduced (each has approximately a 0.02% chance of occurring) and we reduce strategy fragility to any specific coin flip.
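The quoted figure is simply the chance that twelve fair coins all land the same way, and the tranche mechanics are easy to simulate. The sketch below assumes fair, independent monthly flips; the function name is hypothetical.

```python
import random

# Probability that all 12 tranches are invested (or, equally, all divested).
p_extreme = 0.5 ** 12


def tranche_allocations(n_months, n_tranches=12, seed=0):
    """Equity allocation path when one tranche is re-flipped each month."""
    rng = random.Random(seed)
    tranches = [rng.random() < 0.5 for _ in range(n_tranches)]
    path = []
    for month in range(n_months):
        # Only the tranche that comes due this month is re-decided.
        tranches[month % n_tranches] = rng.random() < 0.5
        path.append(sum(tranches) / n_tranches)
    return path


if __name__ == "__main__":
    path = tranche_allocations(20000)
    share_extreme = sum(a in (0.0, 1.0) for a in path) / len(path)
    print(f"P(all heads) = {p_extreme:.4%}")
    print(f"simulated share of months at 0% or 100%: {share_extreme:.4%}")
```

Because only one tranche changes per month, the allocation can move by at most 1/12th at a time, which is what insulates the strategy from any single flip.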

But just how impactful is this notion? Below we plot the rolling 1-year total return difference between two 60% S&P 500 / 40% 5-year U.S. Treasury fixed-mix portfolios, with one being rebalanced in February and one in August. Even for this highly simplified example, we can see that the total return spread between the two portfolios blows out to over 700 basis points in March 2010 due to the fact that the February portfolio rebalanced back into equities at nearly the exact bottom of the crisis.

Source: Global Financial Data. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.

To increase diversification across the “when” axis, we want to increase the number of vintages we deploy. For our trend following example, we will assume that the portfolio allocates between broad U.S. equities and the risk-free rate based upon a single model, but with an increasing number of evenly-spaced vintages. Again, we will run 10,000 simulations that each slightly perturb historical U.S. equity market returns and compare the terminal wealth variation for approaches that employ a different number of vintages.
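One way to picture evenly-spaced vintages is as a set of sub-portfolios that each act on the model's signal only on their own rebalance day. The sketch below assumes a 20-trading-day month and a daily series of 0/1 signals; both the convention and the function name are illustrative assumptions rather than a description of our implementation.

```python
def tranched_allocation(signals, period=20, n_tranches=4):
    """Blended equity weight across evenly-spaced vintages.

    `signals` is a daily list of 0/1 model outputs. Tranche j refreshes its
    position every `period` days, offset by j * (period // n_tranches) days.
    """
    spacing = period // n_tranches
    held = [signals[0]] * n_tranches
    path = []
    for day, signal in enumerate(signals):
        for j in range(n_tranches):
            if day % period == j * spacing:
                held[j] = signal
        path.append(sum(held) / n_tranches)
    return path


if __name__ == "__main__":
    # The model turns negative on day 10 and stays negative.
    signals = [1] * 10 + [0] * 30
    print(tranched_allocation(signals, n_tranches=1)[18:22])
    print(tranched_allocation(signals, n_tranches=4)[8:27:5])
```

With a single monthly tranche, the portfolio stays fully invested until day 20 and then exits all at once. With four weekly tranches it steps down in quarters, so the outcome depends far less on where the rebalance date happens to fall relative to the signal change.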

We can see in the graph below that, as with the other axes of diversification, as we increase the number of vintages employed, the variance decreases. While the 25th and 75th percentiles do not decrease as dramatically as for the other axes, we can see that the extreme variations are reined in substantially when we move from 1 monthly tranche to 4 weekly tranches.

Source: Kenneth French Data Library. Calculations by Newfound Research.

Conclusion

We see two critical conclusions from this analysis:

To develop confidence in achieving our objective we have to consider our sensitivity to unintended bets that may be included within the portfolio.

Fragility makes it incredibly difficult to distinguish between luck and skill, particularly as strategy fragility increases. This is true for both backtested and live performance.

To conclude our analysis, below we present a graph that combines diversification across all three axes. We again run 10,000 samples, randomly perturbing returns. For each sample, we then run four variations:

A single, randomly selected model run in broad U.S. equities that is rebalanced monthly.
A random selection of 3 models run on 5 industry groups in 2 bi-weekly tranches.
A random selection of 6 models run on 12 industry groups in 4 weekly tranches.
A random selection of 9 models run on 30 industry groups in 20 daily tranches.

It should come as no surprise that as we increase the amount of diversification across all three axes, the dispersion in terminal wealth is dramatically reduced.³

Source: Kenneth French Data Library. Calculations by Newfound Research.

It is also important to note that while our analysis focused on trend following strategies, this same line of thinking applies across all investment approaches. As an example, consider a quantitative value manager who buys the top five cheapest stocks, as measured by price-to-book, in the S&P 500 each December and then holds them for the next year. Questions worth pondering are:

What does it say about our conviction when the 6th stock in the list is incredibly close to the 5th stock?
What happens if some of our measures of book value are incorrect (or even just outdated)?
How different would the portfolio look if we ranked on another value measure (e.g. price-to-earnings)?
How different would the opportunity set be if we ranked every June versus every December?

While low levels of diversification across the what, how, and when axes are not necessarily an indicator that a model is inherently fragile, it should be a red flag that more effort is required to demonstrate that it is not fragile.

Flirting with Models

The Research Library of Newfound Research

