This post is available as a PDF download here.
Summary
- While retirement planning is often performed with Monte Carlo simulations, investors only experience a single path.
- Large or prolonged drawdowns early in retirement can have a significant impact upon the probability of success.
- We explore this idea by simulating returns of a 60/40 portfolio and measuring the probability of portfolio failure based upon a quantitative measure of risk called the Ulcer Index.
- We find that a high Ulcer Index reading early in an investor’s retirement can dramatically increase the probability of failure as well as decrease the expected longevity of a portfolio.
Introduction
At Newfound we often say, “while other asset managers focus on alpha, our first focus is on risk.”
Not that there is anything wrong with the pursuit of alpha. We’d argue that the pursuit of alpha is actually a necessary component for well-functioning financial markets.
It’s simply that we have never met a financial advisor who has built a financial plan that assumed any sort of alpha. Alpha is great if we can harvest it, but the empirical evidence on how difficult that can be (both for the manager, net of fees, and for the investor, behaviorally) makes presuming we will achieve alpha rather bold.
Furthermore, alpha is a zero-sum game: we can’t all plan for it.
Risk, however, is a crucial element of every investor’s plan. Bearing too little risk can lead to a portfolio that “fails slowly,” falling short of achieving the escape velocity required to outpace inflation. Bearing too much risk, however, can lead to sudden and catastrophic ruin: a case of “failing fast.”
When investors hit retirement, the usual portfolio math changes. While we’re taught in Finance 101 that the order of returns does not matter, the introduction of portfolio withdrawals makes the order of returns a large determinant of plan success. This phenomenon is known as “sequence risk” and it peaks in the years just before and after retirement.
Typically, we look at returns through the lens of the investment. In retirement, however, what really matters is the returns of the investor.
We’re often told that our primitive brain, trained on the African veldt, is unsuited for investing. Yet our brain seems to understand quite well that we do not get to live our lives as the average of a Monte Carlo simulation.
If we lose our arm to a lion because we did not flee when we heard a rustle in the bushes, we do not end up with half of an arm because of all the other parallel universes where we did flee. On the timeline we live, the situation is binary.
As investors, the same is true. We live but a single path and there are very real, very permanent knock-out conditions we need to be aware of. Prolonged and significant drawdowns during the first years of retirement rank among the most dangerous.
Drawdowns and the Risk of Ruin
A retirement plan typically establishes a safe withdrawal rate. This is the amount of inflation-adjusted money an investor can withdraw from their portfolio every year and still retain a sufficiently high probability that they will not run out of money before they die.
A well-established (albeit controversial) rule is that 4% of an investor’s portfolio level at retirement is usually an appropriate withdrawal amount. For example, if an investor retires with a $1,000,000 portfolio, they can theoretically safely withdraw $40,000 a year. Another way to think of this is that the portfolio reflects 25 years of spending assuming growth matches inflation.
The problem with portfolio drawdowns is that the withdrawal rate now reflects a larger proportion of capital unless it is commensurately adjusted downward. For example, if the portfolio falls to $700,000, a $40,000 withdrawal is now 5.7% of capital and the portfolio reflects just 17.5 years of spending units.
Even shallow, prolonged drawdowns can have a damaging effect. If the portfolio falls to $900,000 and stays stagnant for the next five years, the $40,000 withdrawals grow from representing 4% of the portfolio to nearly 5.5% of the portfolio. If we do not adjust the withdrawal, at five years into retirement we have gone from 25 spending units to 18.5, losing a year and a half of portfolio longevity.
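The spending-unit arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch assuming zero real return after each drawdown. For the shallow-drawdown case we assume, as one reading that reproduces the quoted figures, that the portfolio sits at $900,000 after the first year's withdrawal and that four further $40,000 withdrawals follow by the end of year five. The helper names are illustrative.

```python
def withdrawal_rate(portfolio, withdrawal=40_000):
    """Annual withdrawal as a fraction of the current portfolio."""
    return withdrawal / portfolio

def spending_units(portfolio, withdrawal=40_000):
    """Years of spending the portfolio covers if growth merely matches inflation."""
    return portfolio / withdrawal

# At retirement: 4% withdrawal rate, 25 spending units.
print(withdrawal_rate(1_000_000), spending_units(1_000_000))         # 0.04 25.0

# After the portfolio falls to $700,000.
print(round(withdrawal_rate(700_000), 4), spending_units(700_000))   # 0.0571 17.5

# Shallow, prolonged drawdown (assumed timeline, see lead-in):
# $900,000 at the end of year 1, zero real return, four more withdrawals.
stagnant = 900_000 - 4 * 40_000        # $740,000 at the end of year 5
baseline = 1_000_000 - 5 * 40_000      # $800,000 with no drawdown and zero real return
print(round(withdrawal_rate(stagnant), 4))                    # 0.0541
print(spending_units(baseline) - spending_units(stagnant))    # 1.5
```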
As sudden and steep drawdowns can be just as damaging as shallow and prolonged ones, we prefer to capture this risk with a quantitative measure known as the Ulcer Index. Specifically, the Ulcer Index is calculated as the root mean square of monthly drawdowns, capturing both severity and duration simultaneously.
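A minimal Python sketch of the Ulcer Index as described above, assuming simple monthly returns as input and drawdowns measured as a fraction below the running peak of the wealth index (rather than in percentage points). The function name `ulcer_index` is our own.

```python
import numpy as np

def ulcer_index(returns):
    """Ulcer Index of a series of periodic (e.g. monthly) simple returns.

    Drawdowns are measured from the running peak of the wealth index,
    squared, averaged and square-rooted, so both deeper and longer
    drawdowns raise the value.
    """
    wealth = np.cumprod(1.0 + np.asarray(returns, dtype=float))
    # Seed the running peak with the starting value of 1.0 so that an
    # immediate loss counts as a drawdown.
    peak = np.maximum.accumulate(np.concatenate(([1.0], wealth)))[1:]
    drawdowns = wealth / peak - 1.0          # 0 at new highs, negative below
    return float(np.sqrt(np.mean(drawdowns ** 2)))

# Same 10% loss, different duration underwater.
print(round(ulcer_index([-0.1, 0.0, 0.0, 0.0]), 4))          # 0.1
print(round(ulcer_index([-0.1, 1 / 0.9 - 1, 0.0, 0.0]), 4))  # 0.05
```

The path that stays underwater scores twice as high as the one that recovers the following month, which is the duration sensitivity the measure is chosen for.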
In an effort to demonstrate the damaging impact of drawdowns early in retirement, we will run the following experiment:
- Generate 250,000 simulations, each block-bootstrapped from monthly real U.S. equity and real U.S. 5-year Treasury bond returns from 1918 – 2018.
- Assume a 65 year old investor with a $1,000,000 starting portfolio and a fixed real $3,333 withdrawal monthly ($40,000 annual).
- Assume the investor holds a 60/40 portfolio at all times.
- For each simulation:
- Calculate the Ulcer Index of the first five years of portfolio returns (ignoring withdrawals).
- Determine how many years until the portfolio runs out of money.
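The experiment can be sketched for a single path as below. This is not our production code: the 12-month block length, the 50-year horizon cap, month-end withdrawals and monthly rebalancing to 60/40 are all assumptions, and the placeholder return series are synthetic stand-ins for the real U.S. equity and bond data from 1918 to 2018, not the data themselves.

```python
import numpy as np

def block_bootstrap(n_months, equity, bonds, block=12, rng=None):
    """Stitch together random contiguous blocks of monthly returns.

    Equity and bond returns are drawn from the same months, preserving their
    cross-correlation; contiguous blocks preserve some serial dependence.
    """
    rng = np.random.default_rng() if rng is None else rng
    idx = []
    while len(idx) < n_months:
        first = rng.integers(0, len(equity) - block + 1)
        idx.extend(range(first, first + block))
    idx = np.array(idx[:n_months])
    return equity[idx], bonds[idx]

def ulcer_index(returns):
    """Root mean square of drawdowns from the running peak of the wealth index."""
    wealth = np.cumprod(1.0 + returns)
    peak = np.maximum.accumulate(np.concatenate(([1.0], wealth)))[1:]
    return float(np.sqrt(np.mean((wealth / peak - 1.0) ** 2)))

def simulate_path(equity, bonds, rng, years=50, initial=1_000_000.0,
                  monthly_withdrawal=3_333.0, w_equity=0.6):
    """Return (Ulcer Index of the first 60 months, years until depletion or inf)."""
    eq, bd = block_bootstrap(years * 12, equity, bonds, rng=rng)
    port = w_equity * eq + (1.0 - w_equity) * bd   # rebalanced to 60/40 monthly
    ui = ulcer_index(port[:60])                      # ignores withdrawals
    value = initial
    for month, r in enumerate(port, start=1):
        value = value * (1.0 + r) - monthly_withdrawal  # real withdrawal at month end
        if value <= 0:
            return ui, month / 12
    return ui, np.inf

# Placeholder inputs: synthetic stand-ins for ~101 years of monthly real returns,
# NOT the historical series used in the commentary.
rng = np.random.default_rng(0)
equity = rng.normal(0.005, 0.045, 1212)
bonds = rng.normal(0.001, 0.012, 1212)
results = [simulate_path(equity, bonds, rng) for _ in range(1_000)]
```

Scaling the list comprehension up to 250,000 paths and swapping in the historical series would follow the same pattern.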
Based upon this data, below we plot the probability of failure – i.e. the probability we run out of money before we die – given an assumed age of death as well as the Ulcer Index realized by the portfolio in the first five years of retirement.
As an example of how to read this graph, consider the darkest blue line in the middle of the graph, which reflects an assumed age of death of 84. Along the x-axis are different bins of Ulcer Index levels, with lower numbers reflecting fewer and less severe drawdowns, while higher numbers reflect steeper and more frequent ones.
As we trace the line, we can see that the probability of failure – i.e. running out of money before death – increases dramatically as the Ulcer Index increases. While for shallow and infrequent drawdowns the probability of failure is <5%, we can see that the probability approaches 50% for more severe, frequent losses.
Beyond the binary question of failure, it is also important to consider when a portfolio runs out of money relative to when we die. Below we plot how many years prior to death a portfolio runs out of money, on average, based upon the Ulcer Index.
Once again using the darkest blue line as an example, we can see that for most minor-to-moderate Ulcer Index levels, the portfolio would only run out of money a year or two before we die in the case of failure. For more extreme losses, however, the portfolio can run out of money a full decade before we kick the bucket.
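Given the Ulcer Index and depletion time of each simulated path, both charts reduce to a group-by over Ulcer Index bins. A sketch, assuming retirement at 65, a path counted as failed when its depletion time is shorter than the years to the assumed age of death, and hypothetical bin edges and toy inputs:

```python
import numpy as np

def failure_stats(ui, depletion_years, death_age, bin_edges, retire_age=65):
    """Per Ulcer Index bin: (P(run out before death), mean years short if failed)."""
    ui = np.asarray(ui, dtype=float)
    depletion_years = np.asarray(depletion_years, dtype=float)
    horizon = death_age - retire_age
    failed = depletion_years < horizon          # money ran out before death
    bins = np.digitize(ui, bin_edges)           # bin k: bin_edges[k-1] <= ui < bin_edges[k]
    stats = {}
    for k in np.unique(bins):                   # only bins that contain paths
        in_bin = bins == k
        short = horizon - depletion_years[in_bin & failed]
        stats[int(k)] = (float(failed[in_bin].mean()),
                         float(short.mean()) if short.size else float("nan"))
    return stats

# Toy inputs: four paths, assumed death at 84 (19 years after retiring at 65).
stats = failure_stats([0.02, 0.03, 0.12, 0.15], [np.inf, 25, 15, 9],
                      death_age=84, bin_edges=[0.05, 0.10])
print(stats)   # {0: (0.0, nan), 2: (1.0, 7.0)}
```

Bins with no failures report `nan` for years short rather than zero, so they are not mistaken for portfolios that ran out exactly at death.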
It is worth stressing here that these Ulcer Index readings are derived using simulations based upon prior realized U.S. equity and fixed income returns. In other words, while improbable (see the histogram below), extreme readings are not impossible.
It is worth further acknowledging that U.S. assets have experienced some of the highest realized risk premia in the world, and more conservative estimates may put a higher probability mass on more extreme Ulcer Index readings.
Conclusion
For retirees, large or prolonged drawdowns early in retirement can have a significant impact on the probability of success.
In this commentary, we capture both the depth and duration of drawdowns using a single metric known as the Ulcer Index. We simulate 250,000 possible return paths for a 60/40 portfolio and calculate the Ulcer Index in the first five years of returns. We then plot the probability of failure as well as expected portfolio longevity conditional upon the Ulcer Index level realized.
We clearly see a positive relationship between failure and Ulcer Index, with larger and more prolonged drawdowns earlier in retirement leading to a higher probability of failure. This phenomenon is precisely why investors tend to de-risk their portfolios over time.
While the right risk profile and a well-diversified portfolio make for a strong foundation, we believe that investors should also consider expanding their investment palette to include alternative assets and style premia that may be more defensively oriented in nature. For example, defensive equities (e.g. low-volatility and quality approaches) have historically demonstrated an ability to reduce drawdown risk. Diversified, multi-asset style premia also tend to exhibit low correlation to traditional risk factors and a low intrinsic style premia.
Here at Newfound, we focus on trend equity strategies, which seek to overlay trend-following approaches on top of equity exposures in an effort to reduce left-tail risk and create a higher-quality return profile.
However an investor chooses to build their portfolio, risk should be at the forefront of their mind.
Measuring the Benefit of Diversification
By Nathan Faber
On November 5, 2018
In Portfolio Construction, Risk Management, Weekly Commentary
Summary
Introduction
Diversification is a standard risk management tool in any portfolio. Reducing the impact of idiosyncratic risks in individual investments by holding a suite of stocks, asset classes, strategies, etc. produces a smoother investment ride most of the time and reduces the risk of negative surprises.
But in a world where we only experience one outcome out of a multitude of possibilities, gauging the benefit of diversification is difficult. It is hard to do even in hindsight, not so much because we can't but more often because we won't: the results have already happened.
Over a single time period with no rebalancing, a diversified portfolio will underperform the best asset that it holds. This is a mathematical fact whenever there is any dispersion in the returns of the assets, and it is why we have said that diversification will always disappoint. Our natural behavioral tendencies can get the better of us even when diversification might be doing a great job, which is most apparent when its results are examined through the appropriate lens and measured in the context of what could have happened.
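The single-period claim is easy to verify: without rebalancing, each holding compounds at its own rate, so the portfolio's return is the initial-weight average of its assets' returns. A sketch with hypothetical returns:

```python
import numpy as np

def buy_and_hold_return(weights, asset_returns):
    """Single-period return of a portfolio that is never rebalanced.

    Each position compounds at its own return, so the portfolio return is the
    initial-weight average of the asset returns and cannot exceed the largest.
    """
    return float(np.dot(weights, asset_returns))

# Hypothetical one-period returns for three assets, equally weighted at the start.
asset_returns = [0.15, 0.02, -0.05]
port = buy_and_hold_return([1 / 3, 1 / 3, 1 / 3], asset_returns)
print(round(port, 4), max(asset_returns))   # 0.04 0.15
```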
Last summer, we published a presentation entitled Building an Unconstrained Sleeve. In it, we looked at ways to combine traditional and non-traditional assets and strategies to target specific objectives: equity hedging, absolute return, and equity-like with downside management.
Now that we have 15 months of subsequent data for all the underlying strategies, we want to revisit that piece and explore the benefit of diversification in the context of hindsight.
A Recap of the Process
As a quick refresher, we included seven strategies and asset classes in the construction of our unconstrained sleeves:
While these strategies are surely not exhaustive, they cover a range of factors (value, momentum, low volatility, etc.) and a global set of asset classes (equities, bonds, commodities, and currencies) commonly included in unconstrained sleeves. They were also selected because many of these strategies are conveniently packaged as ETFs or mutual funds, making the resulting sleeves more easily implementable.
Source: St. Louis Federal Reserve, MSCI, Salient, HFRI, CSI Analytics. Calculations by Newfound Research. It is not possible to invest in an index. Past performance does not guarantee future results. Index returns are total returns and are gross of all fees.
Over the 15 months, world equity was by far the best performer and the spread between best-performing and worst-performing positions exceeded 20 percentage points. If you wanted high returns – and going back to our statement about how diversification will always disappoint – you could have just held world equities and been quite content.
But putting ourselves back in June 2017, we did not know a priori that simply holding equities would have generated the highest returns. Looking at this type of chart in November 2008 would have led to a very different emotional conclusion.
The aim of our original study was to develop unconstrained sleeves that would meet their objectives regardless of how the future played out. Therefore, we employed a simulation-based method that aimed to preserve some of the unique correlation structure between the strategies across different market environments and reduce the risk of overfitting to a single realization of history. With this approach, we constructed portfolios that targeted three different objectives that investors might be interested in:
(Note: Greater detail about portfolio construction process, strategy descriptions, and performance attributes of each strategy can be found in our original presentation.)
But were our constructed portfolios successful in achieving their objectives out-of-sample? To analyze this question, as well as explore the benefits and drawbacks of diversification for each objective, we will calculate the distribution of what could have happened. The hope is that each sleeve would perform well relative to all other possible portfolios that could have been chosen for it.
Saying exactly what portfolios we could have chosen is where a little art comes into play. For example, in the equity-like strategies, it is difficult to say that a 100% bond portfolio would have ever been a viable option and therefore may not be an apt out-of-sample comparison.
However, since our original process did not have any specific override for these intuitive constraints, and since we do not wish to assert after-the-fact which portfolios would have been rejected, we will allow the entire potential allocation space to be fair game in our comparison.
There are a number of ways to sample the set of allocations over the 7 asset classes that could have formed the portfolios for each sleeve. Perhaps the most obvious choice would be to sample uniformly over the possible allocations. The trade-off in this case is between coverage of the space (a 6-dimensional simplex) and the number of samples: to be 95% confident of drawing even one allocation with more than 95% in a given asset class would require nearly 200 million samples. We have used modified Sobol sequences in the past to cover more of the space with fewer points. However, in the current case, to mimic the rounding that is often found in portfolio allocations, we will use a lattice of points spaced 2.5% apart covering the entire space. This requires just under 10 million points in the simulations.
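Both sample counts can be checked directly. This sketch assumes a uniform distribution over the simplex (under which each of the K = 7 weights has a Beta(1, 6) marginal) and reads "a given asset class" as one specific asset class. The small `lattice` generator is a hypothetical helper meant only to confirm the counting formula on a coarse grid, since enumerating all ~9.4 million points in pure Python would be slow.

```python
import math
from itertools import combinations

def samples_for_confidence(p_hit, confidence=0.95):
    """Independent draws needed to see an event of probability p_hit at least
    once with the given confidence."""
    return math.ceil(math.log(1 - confidence) / math.log1p(-p_hit))

# Uniform sampling over allocations to K = 7 assets (a 6-dimensional simplex):
# each weight is Beta(1, K - 1), so P(weight > t) = (1 - t) ** (K - 1).
K = 7
p_over_95 = (1 - 0.95) ** (K - 1)
n_uniform = samples_for_confidence(p_over_95)

def lattice_size(n_assets, step=0.025):
    """Allocations on a grid with the given step whose weights sum to 1
    (stars and bars: place n_assets - 1 dividers among 1/step units)."""
    units = round(1 / step)
    return math.comb(units + n_assets - 1, n_assets - 1)

def lattice(n_assets, units):
    """Yield every allocation with weights in multiples of 1/units (small cases only)."""
    slots = units + n_assets - 1
    for cuts in combinations(range(slots), n_assets - 1):
        bounds = (-1,) + cuts + (slots,)
        yield tuple((bounds[i + 1] - bounds[i] - 1) / units for i in range(n_assets))

print(f"{n_uniform:,}")          # roughly 192 million
print(f"{lattice_size(K):,}")    # 9,366,819
```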
Equity Hedge
This sleeve was designed to offset significant equity losses by limiting downside capture. The resulting optimized portfolio was relatively concentrated in two main positions that have historically exhibited low-to-negative correlations to equities and potential crisis alpha during significant and prolonged drawdowns.
Source: St. Louis Federal Reserve, MSCI, Salient, HFRI, CSI Analytics. Calculations by Newfound Research.
The down capture of this portfolio during the out-of-sample period was 0.44. This result falls in the 70th percentile (that is, better than 70% of the other sampled portfolios, where lower down capture is better) when compared to the 10 million other possible portfolios we could have originally selected. Not surprisingly, the 100% intermediate-term Treasury portfolio had the best down capture (-0.05) over the out-of-sample period. Of the portfolios with better down capture, Intermediate Treasuries and Macro – Income were generally the highest allocations.
This does not come as much of a surprise to anyone who has followed the managed futures space for the last 15 months. The category largely remains in a multi-year drawdown (from a peak in early 2014), and it has done little to offset the rapid sell-offs seen in equities in 2018. Therefore, with the full benefit of hindsight, any allocation to Macro – Trend in the original portfolio would have been a detriment to realizing our out-of-sample objective.
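Down capture has several conventions (arithmetic versus geometric averages, monthly versus daily data), and the commentary does not say which was used. The sketch below assumes the arithmetic, monthly version and ranks a portfolio the way the percentile above is described: the share of alternatives it beats, with lower down capture counting as better. The inputs are hypothetical.

```python
import numpy as np

def down_capture(portfolio, benchmark):
    """Arithmetic down capture: mean portfolio return in months when the
    benchmark fell, divided by the benchmark's mean return in those months."""
    portfolio = np.asarray(portfolio, dtype=float)
    benchmark = np.asarray(benchmark, dtype=float)
    down = benchmark < 0
    return float(portfolio[down].mean() / benchmark[down].mean())

def percentile_better_than(value, others, lower_is_better=True):
    """Percent of alternative portfolios that `value` beats."""
    others = np.asarray(others, dtype=float)
    beaten = others > value if lower_is_better else others < value
    return 100.0 * float(beaten.mean())

# Hypothetical monthly returns and candidate down captures.
dc = down_capture([-0.02, 0.01, -0.01, 0.0], [-0.04, 0.03, -0.02, 0.01])
print(round(dc, 4))                                                  # 0.5
print(percentile_better_than(0.44, [0.2, 0.5, 0.6, 0.9, -0.05]))     # 60.0
```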
Yet even with this lackluster performance, an out-of-sample realized 70th percentile result over a short, 15-month horizon is a result to be pleased with.
Source: St. Louis Federal Reserve, MSCI, Salient, HFRI, CSI Analytics. Calculations by Newfound Research. It is not possible to invest in an index. Past performance does not guarantee future results. Index returns are total returns and are gross of all fees.
Absolute Return
This sleeve was designed to seek a stable and consistent return stream in all market environments. We aimed to accomplish this by utilizing a risk parity approach. As expected, this sleeve holds all asset classes and is very well diversified across them.
Source: St. Louis Federal Reserve, MSCI, Salient, HFRI, CSI Analytics. Calculations by Newfound Research.
To measure the success of the risk parity approach over the live period, we will look at the Gini coefficient of the risk contributions for each of the ten million potential portfolios we could have initially selected. The Gini coefficient quantifies the equality of a distribution, with a value of 1 representing 100% concentration and 0 representing perfect equality.
The Gini coefficient of the actual portfolio was 0.25, which was in the 99.8th percentile of possible outcomes (i.e. highly diversified on a relative basis). Here, the percentile estimate is padded by the fact that many of the simulated portfolios (e.g. the 100% ones) would clearly not be close to equal risk contribution.
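A sketch of the two ingredients, assuming risk contributions are measured as shares of portfolio variance and that the Gini coefficient is rescaled by n/(n - 1) so a fully concentrated portfolio scores exactly 1, as described above (the unscaled version tops out at (n - 1)/n). Both choices are our assumptions rather than a documented part of the original analysis. With a diagonal covariance, inverse-volatility weights equalize risk contributions, which gives a quick sanity check.

```python
import numpy as np

def risk_contributions(weights, cov):
    """Share of portfolio variance from each position: w_i (Cov w)_i / (w' Cov w).

    Shares sum to 1; with correlated assets some can be negative, in which
    case the Gini coefficient below is not well defined.
    """
    w = np.asarray(weights, dtype=float)
    marginal = np.asarray(cov, dtype=float) @ w
    return w * marginal / (w @ marginal)

def gini(shares):
    """Gini coefficient of non-negative shares, rescaled by n / (n - 1) so that
    0 is perfect equality and 1 is everything in a single position."""
    x = np.asarray(shares, dtype=float)
    n = x.size
    mean_abs_diff = np.abs(x[:, None] - x[None, :]).mean()
    return float(mean_abs_diff / (2.0 * x.mean()) * n / (n - 1))

# Sanity check: with uncorrelated assets, inverse-volatility weights equalize risk.
vols = np.array([0.05, 0.10, 0.20])          # hypothetical volatilities
w = (1 / vols) / np.sum(1 / vols)
rc = risk_contributions(w, np.diag(vols ** 2))
print(np.round(rc, 4), round(gini(rc), 6))    # equal shares, Gini close to 0
print(gini([1, 0, 0, 0, 0, 0, 0]))            # fully concentrated, close to 1
```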
Source: St. Louis Federal Reserve, MSCI, Salient, HFRI, CSI Analytics. Calculations by Newfound Research. It is not possible to invest in an index. Past performance does not guarantee future results. Index returns are total returns and are gross of all fees.
Did our original portfolio achieve its out-of-sample goal? Here, we can judge success by whether the realized contribution to risk of each exposure was close to equal; i.e. did we actually achieve risk parity as desired? We can see below that indeed we did, with the main exception of Macro – Trend, which was the most volatile asset class over the period.
Source: St. Louis Federal Reserve, MSCI, Salient, HFRI, CSI Analytics. Calculations by Newfound Research.
Over the sample space of potential portfolios, the portfolio with the minimum out-of-sample Gini coefficient (0.08) was tilted toward the less volatile and more diversifying asset classes (Intermediate Treasuries and Macro – Income). Even so, due to the limited granularity of the sampled portfolios, the risk contribution of Macro – Income was still half of that for each of the other strategies.
It is also worth noting how similar this solution is – generated with the complete benefit of hindsight – to our originally constructed portfolio.
Source: St. Louis Federal Reserve, MSCI, Salient, HFRI, CSI Analytics. Calculations by Newfound Research.
Equity-like with Downside Management
This sleeve was designed in an effort to capture equity market growth while managing the risk of severe and prolonged drawdowns. It was tilted toward the equity-like exposures with a split among risk management styles (trend, minimum volatility, macro strategies, etc.). The allocation to U.S. Treasuries is very small.
Source: St. Louis Federal Reserve, MSCI, Salient, HFRI, CSI Analytics. Calculations by Newfound Research.
For this portfolio, we have two variables to analyze: the up capture relative to global equities and the Ulcer index, a measure of the severity and duration of drawdowns. In the construction of the sleeve, the target was to keep the Ulcer index less than 25% of the value for global equities. The joint distribution of these quantities over the live period is shown below with the actual values over the live period for the sleeve indicated.
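Both quantities can be sketched in a few lines, assuming arithmetic monthly up capture and the same root-mean-square drawdown definition of the Ulcer index. The return series are hypothetical, and the "sleeve" simply holds half the equity exposure.

```python
import numpy as np

def up_capture(portfolio, benchmark):
    """Arithmetic up capture: mean portfolio return in benchmark up months
    divided by the benchmark's mean return in those months."""
    portfolio = np.asarray(portfolio, dtype=float)
    benchmark = np.asarray(benchmark, dtype=float)
    up = benchmark > 0
    return float(portfolio[up].mean() / benchmark[up].mean())

def ulcer_index(returns):
    """Root mean square of drawdowns from the running peak of the wealth index."""
    wealth = np.cumprod(1.0 + np.asarray(returns, dtype=float))
    peak = np.maximum.accumulate(np.concatenate(([1.0], wealth)))[1:]
    return float(np.sqrt(np.mean((wealth / peak - 1.0) ** 2)))

def relative_ulcer(portfolio, benchmark):
    """Ulcer index of the sleeve as a fraction of the benchmark's."""
    return ulcer_index(portfolio) / ulcer_index(benchmark)

# Hypothetical monthly returns, not the live-period data.
world_equity = np.array([0.03, -0.05, 0.02, -0.04, 0.06])
sleeve = 0.5 * world_equity
print(round(up_capture(sleeve, world_equity), 4))          # 0.5
print(round(relative_ulcer(sleeve, world_equity), 3))
```

A sleeve meeting the original construction target would show a relative Ulcer index below 0.25.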
Source: St. Louis Federal Reserve, MSCI, Salient, HFRI, CSI Analytics. Calculations by Newfound Research. It is not possible to invest in an index. Past performance does not guarantee future results. Index returns are total returns and are gross of all fees.
The realized Ulcer level was 68% of that of world equity – a far cry from the 25% that the portfolio was optimized for – and was in the 42nd percentile, while the up capture of 0.60 was in the 93rd percentile.
With the explicit goal of achieving a target relative Ulcer level, a comparison against the entire potential allocation space of 10 million portfolios is not appropriate. Therefore, we reduce the set of 10 million comparative portfolios to only those that would have given a relative Ulcer index less than 25% compared to world equities, eliminating approximately 40% of possible portfolios.
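Filtering the comparison set before ranking can be sketched as follows. The arrays are hypothetical toy values; only the 25% historical relative-Ulcer threshold comes from the text, and the percentile is the share of retained candidates that the portfolio beats.

```python
import numpy as np

def percentile_within_subset(value, metric, keep, higher_is_better=True):
    """Percent of the retained candidate portfolios that `value` beats."""
    candidates = np.asarray(metric, dtype=float)[np.asarray(keep, dtype=bool)]
    beaten = candidates < value if higher_is_better else candidates > value
    return 100.0 * float(beaten.mean())

# Hypothetical candidates: historical relative Ulcer and live up capture.
hist_rel_ulcer = np.array([0.10, 0.20, 0.30, 0.40, 0.15, 0.22])
live_up_capture = np.array([0.40, 0.55, 0.90, 0.95, 0.50, 0.70])
keep = hist_rel_ulcer < 0.25   # only portfolios that met the 25% target historically

print(percentile_within_subset(0.60, live_up_capture, np.ones_like(keep)))  # 50.0
print(percentile_within_subset(0.60, live_up_capture, keep))                # 75.0
```

Dropping candidates that never met the objective changes the comparison set, which is how the same up capture can rank higher within the subset.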
The distributions of allocations to each of the strategies in the acceptable subset are shown below. We can see that the more diversifying strategies take on a larger range of allocations.
Source: St. Louis Federal Reserve, MSCI, Salient, HFRI, CSI Analytics. Calculations by Newfound Research. It is not possible to invest in an index. Past performance does not guarantee future results. Index returns are total returns and are gross of all fees.
Interestingly, looking only over this subset of the original 10 million portfolios raises the percentile of our originally constructed portfolio's out-of-sample up capture to the 99th percentile but does not change the percentile of its Ulcer index over the live period. Why is this?
The correlation of the relative Ulcer index over the live period with that over the historical period is only 0.1, indicating that, at first glance, the out-of-sample data did not line up with our expectations. However, this makes sense when we recall that the optimization was carried out using data from much more extreme market environments (think 2001 and 2008). It is a good reminder that just because you optimize for a certain parameter value does not mean you will get it over the live data.
Higher up-capture typically goes hand-in-hand with a higher Ulcer index, as higher return often requires bearing more risk. Therefore, one way to standardize our measures across the potential set of portfolios is to calculate the ratio of up-capture to the Ulcer index. With this transformation, the risk-adjusted up capture falls in the 87th percentile over the set of sample allocations, indicating a very high realized risk-adjusted return.
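The standardization is a simple ratio followed by the same percentile ranking. A sketch with hypothetical candidates, assuming the denominator is each portfolio's own Ulcer index over the live period:

```python
import numpy as np

def risk_adjusted_up_capture(up_capture, ulcer):
    """Up capture per unit of Ulcer index (higher is better)."""
    return np.asarray(up_capture, dtype=float) / np.asarray(ulcer, dtype=float)

# Hypothetical candidates in which higher up capture comes with a higher Ulcer index.
up = np.array([0.3, 0.5, 0.7, 0.9])
ulcer = np.array([0.02, 0.05, 0.06, 0.10])
ours = risk_adjusted_up_capture(0.6, 0.05)
others = risk_adjusted_up_capture(up, ulcer)
pct = 100.0 * float(np.mean(others < ours))   # share of candidates beaten
print(pct)
```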
Source: St. Louis Federal Reserve, MSCI, Salient, HFRI, CSI Analytics. Calculations by Newfound Research. It is not possible to invest in an index. Past performance does not guarantee future results. Index returns are total returns and are gross of all fees.
Conclusion
We only experience one path of the world and do not know the infinite alternate courses history could have taken. But it is exactly this infinitude of alternate states that diversification is meant to address.
Diversification generally has no apparent benefit unless we envision what could have happened. Unfortunately, our innate nature makes this difficult. We do not often value our realized path in this context. After all, none of these alternate states actually happened, so it is difficult to picture what we did not experience.
A quantitative approach can yield a systematic way to evaluate the benefit (or detriment) of diversification. This way, we are not relying as much on intuition – how did our performance feel? – and are looking through a more objective lens at our initial decisions.
In the examples using the Unconstrained Sleeves, diversification focused on more than just returns. The objectives that initially went into the portfolio construction were the parameters of interest.
Taking a systematic approach does not fully remove the art of the analysis, as was evident in the construction of the potential sample of portfolios used in the comparisons, but having a process can remove some of the behavioral biases that make sticking with a portfolio difficult in the first place.