This post is available as a PDF download here.
Summary
- The Perfect Withdrawal Rate (PWR) is the rate of regular portfolio withdrawals that leads to a zero balance over a given time frame.
- 4% is the commonly accepted lower bound for safe withdrawal rates, but this is only based on one realization of history and the actual risk investors take on by using this number may be uncertain.
- Using simulation techniques, we aim to explore how different assumptions match the historical experience of retirement portfolios.
- We find that simple assumptions commonly used in financial planning Monte Carlo simulations do not seem to reflect as much variation as we have seen in the historical PWR.
- Including more stress testing and utilizing richer simulation methods may be necessary to successfully gauge the risk in a proposed PWR, especially as it pertains to the risk of failure in the financial plan.
Financial planning for retirement is a combination of art and science. The problem is highly multidimensional, requiring estimates of cash flows, investment returns and risk, taxation, life events, and behavioral effects. Reduction along the dimensions can simplify the analysis, but introduces consequences in the applicability and interpretation of the results. This is especially true for investors who are close to the line between success and failure.
One of the primary simplifying assumptions is the 4% rule. This heuristic was derived using worst-case historical data for portfolio withdrawals under a set of assumptions, such as constant inflation-adjusted withdrawals, a fixed mix of stocks and bonds, and a set time horizon.
Below we construct a monthly-rebalanced, fixed-mix 60/40 portfolio using the S&P 500 index for U.S. equities and the Dow Jones Corporate Bond index for U.S. bonds. Using historical data from 12/31/1940 through 12/31/2018, we can evaluate the margin for error the 4% rule has historically provided and how much opportunity for higher withdrawal rates was sacrificed in “better” market environments.
Source: Global Financial Data and Shiller Data Library. Calculations by Newfound Research. Returns are backtested and hypothetical. Past performance is not a guarantee of future results. Returns are gross of all fees. Returns assume the reinvestment of all distributions. None of the strategies shown reflect any portfolio managed by Newfound Research and were constructed solely for demonstration purposes within this commentary. You cannot invest in an index.
But history is only a single realization of the world, which makes the risk hard to gauge.
Perfect Withdrawal Rates
The formula (in plain English) for the perfect withdrawal rate (“PWR”) in a portfolio, assuming an ending value of zero, is relatively simple since it is just a function of portfolio returns:
PWR = Portfolio Value / Sequence Risk
The portfolio value in the numerator is the final value of the portfolio over the entire period, assuming no withdrawals. The sequence risk in the denominator is a term that accounts for both the order and magnitude of the returns.
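For readers who prefer symbols, one standard way to write this ratio is sketched below. The notation is our assumption for illustration and does not appear in the original post: r_i is the portfolio return in period i, n is the number of periods, the starting portfolio value is normalized to 1, and withdrawals are assumed to occur at the start of each period.

```latex
\mathrm{PWR} \;=\; \frac{\prod_{i=1}^{n} (1 + r_i)}{\sum_{i=1}^{n} \prod_{j=i}^{n} (1 + r_j)}
```

Under these assumptions the numerator is the growth of one dollar with no withdrawals, and each term in the denominator is the growth that a withdrawal taken at the start of period i would have earned had it stayed invested to the end.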
Larger negative returns earlier on in the period increase the sequence risk term and therefore reduce the PWR.
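To make the ordering effect concrete, here is a minimal Python sketch. It assumes withdrawals at the start of each period, a starting value of 1, and an ending value of zero; the function name and the four sample returns are hypothetical and chosen only for illustration.

```python
import numpy as np

def perfect_withdrawal_rate(returns):
    """PWR as a fraction of starting wealth for a sequence of simple period returns.

    Assumes withdrawals at the start of each period and an ending value of zero.
    """
    growth = 1.0 + np.asarray(returns, dtype=float)
    # Reverse cumulative product: element i is the growth from period i to the end.
    growth_to_end = np.cumprod(growth[::-1])[::-1]
    final_value = growth_to_end[0]       # growth of $1 with no withdrawals
    sequence_risk = growth_to_end.sum()  # path-dependent denominator
    return final_value / sequence_risk

# Same four returns, opposite order: identical final value, different PWR.
bad_early = [-0.20, 0.05, 0.05, 0.30]
bad_late = bad_early[::-1]
pwr_bad_early = perfect_withdrawal_rate(bad_early)
pwr_bad_late = perfect_withdrawal_rate(bad_late)
```

Both orderings compound to the same final value, yet the path with the loss up front supports a lower withdrawal rate because the later, stronger returns inflate the sequence risk term.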
From a calculation perspective, the final portfolio value in the equation is typically described (e.g. when using Monte Carlo techniques) as a log-normal random variable, i.e. the log-returns of the portfolio are assumed to be normally distributed. This type of random variable lends itself well to analytic solutions that do not require numerical simulations.
The sequence risk term, however, is not so friendly to closed-form methods. The path-dependent, additive structure of returns within the sequence risk term means that we must rely on numerical simulations.
To get a feel for some features of this equation, we can look at the PWR in the context of the historical portfolio return and volatility.
Source: Global Financial Data and Shiller Data Library. Calculations by Newfound Research.
The relationship is difficult to pin down.
As we saw in the equation shown before, the annualized return of the portfolio does appear to impact the PWR (correlation of 0.51), but there are periods (e.g. those starting in the 1940s) that had higher PWRs with lower returns than in the 1960s. Therefore, investors beginning withdrawals in the 1960s must have had higher sequence risk.
Correlation between annualized volatility and PWR was slightly negative (-0.35).
The Risk in Withdrawal Rates
Since our goal is to assess the risk in the historical PWR with a focus on the sequence risk, we will use the technique of Brownian Bridges to match the return of all simulation paths to the historical return of the 60/40 portfolio over rolling 30-year periods. For the simulation volatility, we will use the portfolio’s historical volatility over the same period.
This is essentially a conditional PWR risk based on assuming we know the full-period return of the path beforehand.
To more explicitly describe the process, consider a given 30-year period. We begin by computing the full-period annualized return and volatility of the 60/40 portfolio over that period. We will then generate 10,000 simulations over this 30-year period, using the Brownian Bridge technique to ensure that all of the simulations have the exact same full-period annualized return and intrinsic volatility. In essence, this approach allows us to vary the path of portfolio returns without altering the final return. As PWR is a path-dependent metric, we should gain insight into the distribution of PWRs.
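The procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the code behind the charts: the 7% return and 10% volatility inputs are hypothetical placeholders, returns are modeled as monthly log returns, withdrawals are assumed at the start of each month, and the monthly PWR is annualized by simply multiplying by 12.

```python
import numpy as np

def bridge_log_returns(total_log_return, monthly_vol, n_months, n_paths, seed=0):
    """Monthly log returns whose sum along every path equals total_log_return."""
    rng = np.random.default_rng(seed)
    shocks = rng.standard_normal((n_paths, n_months))
    # Removing each path's average shock pins the driving Brownian motion
    # back to zero at the horizon, which is the Brownian bridge construction.
    bridged = shocks - shocks.mean(axis=1, keepdims=True)
    return total_log_return / n_months + monthly_vol * bridged

def pwr_from_log_returns(log_returns):
    """PWR per path (start-of-period withdrawals, zero ending value)."""
    # Cumulative log growth from each month to the end of the path.
    log_growth_to_end = np.cumsum(log_returns[:, ::-1], axis=1)[:, ::-1]
    growth_to_end = np.exp(log_growth_to_end)
    return growth_to_end[:, 0] / growth_to_end.sum(axis=1)

# Hypothetical inputs: 7% annualized log return, 10% annualized volatility, 30 years.
n_months = 360
paths = bridge_log_returns(0.07 * 30, 0.10 / np.sqrt(12), n_months, n_paths=10_000)
monthly_pwr = pwr_from_log_returns(paths)
# Simple annualization of the monthly rate (an approximation).
annual_pwr_bands = np.percentile(12 * monthly_pwr, [1, 5, 50, 95, 99])
```

Every simulated path ends at the same cumulative return, so any spread in the resulting percentile bands comes purely from the ordering of returns along the way.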
The percentile bands for the simulations using this method are shown below with the actual PWR in each period overlaid.
Source: Global Financial Data and Shiller Data Library. Calculations by Newfound Research.
From this chart, we see two items of note: the percentile bands in the distribution roughly track the historical return over each of the periods, and the actual PWR fluctuates into the left and right tails of the distribution rather frequently. Below we plot where the actual PWR falls within the simulated PWR distribution.
Source: Global Financial Data and Shiller Data Library. Calculations by Newfound Research.
The actual PWR is below the 5th percentile 12% of the time, below the 1st percentile 4% of the time, above the 95th percentile 11% of the time, and above the 99th percentile 7% of the time. Had our model been better calibrated, we would expect the percentiles to align; e.g. the PWR should be below the 5th percentile 5% of the time and above the 99th percentile 1% of the time.
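A calibration check of this kind can be sketched as follows. The helper names are hypothetical, and the evenly spaced ranks are placeholder data standing in for a perfectly calibrated model, not results from this study.

```python
import numpy as np

def percentile_rank(simulated, actual):
    """Fraction of simulated PWRs that fall below the actual PWR."""
    return float(np.mean(np.asarray(simulated, dtype=float) < actual))

def tail_frequencies(ranks, levels=(0.01, 0.05, 0.95, 0.99)):
    """Share of periods whose rank lies beyond each tail level."""
    ranks = np.asarray(ranks, dtype=float)
    freq = {}
    for level in levels:
        if level < 0.5:
            freq[level] = float(np.mean(ranks < level))  # left tail
        else:
            freq[level] = float(np.mean(ranks > level))  # right tail
    return freq

# Placeholder: ranks spread evenly across (0, 1), as a calibrated model would give.
calibrated_ranks = np.linspace(0.005, 0.995, 100)
calibrated_freq = tail_frequencies(calibrated_ranks)
```

For a well-calibrated model the tail frequencies come out close to 1%, 5%, 5%, and 1%; the 4%, 12%, 11%, and 7% reported above are what signal that the simulated distribution is too narrow.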
This seems odd until we realize that our model for the portfolio returns was likely too simplistic. We are assuming Geometric Brownian Motion for the returns. And while we are fixing the return over the entire simulation path to match that of the actual portfolio, the path to get there is assumed to have constant volatility and independent returns from one month to the next.
In reality, returns do not always follow these rules. For example, the skew of the monthly returns over the entire history is -0.36 and the excess kurtosis is 1.30. This tendency toward larger magnitude returns and returns that are skewed to the left can obscure some of the risk that is inherent in the PWRs.
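For reference, skew and excess kurtosis can be computed from standardized returns as in the sketch below. It uses the population (biased) moment estimators, which is an assumption on our part since the post does not say which estimator was used, and the normal sample is placeholder data.

```python
import numpy as np

def skew_and_excess_kurtosis(returns):
    """Sample skewness and excess kurtosis using population moments."""
    x = np.asarray(returns, dtype=float)
    z = (x - x.mean()) / x.std()         # standardize with ddof=0
    skew = float(np.mean(z ** 3))
    excess_kurtosis = float(np.mean(z ** 4) - 3.0)  # zero for a normal distribution
    return skew, excess_kurtosis

# Placeholder data: normally distributed returns should score near zero on both.
rng = np.random.default_rng(0)
normal_skew, normal_kurt = skew_and_excess_kurtosis(rng.standard_normal(200_000))
```

A negative skew together with positive excess kurtosis, as in the historical data, means large losses occur more often than a normal model with the same volatility would suggest.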
Additionally, returns are not totally independent. While this is good for trend following strategies, it can lead to an understatement of risk as we explored in our previous commentary on Accounting for Autocorrelation in Assessing Drawdown Risk.
Over the full period, monthly returns exhibit autocorrelation at lags 1, 4, and 5 that is significant at the 95% confidence level.
Source: Global Financial Data and Shiller Data Library. Calculations by Newfound Research.
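One common way to flag significant lags is to compare each sample autocorrelation with an approximate 95% band of ±1.96/√N. The post does not state which test was used, so the band below is an assumption, and the AR(1) series is synthetic placeholder data.

```python
import numpy as np

def significant_lags(returns, max_lag=12):
    """Sample autocorrelations and the lags that fall outside an approximate 95% band."""
    x = np.asarray(returns, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    acf = np.array([np.dot(x[lag:], x[:-lag]) / denom
                    for lag in range(1, max_lag + 1)])
    bound = 1.96 / np.sqrt(len(x))  # band under the null of no autocorrelation
    flagged = [lag for lag, rho in enumerate(acf, start=1) if abs(rho) > bound]
    return acf, bound, flagged

# Synthetic AR(1) series with coefficient 0.5 (placeholder, not market data).
rng = np.random.default_rng(1)
noise = rng.standard_normal(5000)
series = np.empty(5000)
series[0] = noise[0]
for t in range(1, 5000):
    series[t] = 0.5 * series[t - 1] + noise[t]
acf, bound, flagged = significant_lags(series)
```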
To incorporate some of these effects in our simulations, we must move beyond the simplistic assumption of normally distributed returns.
First, we will fit a skewed normal distribution to the rolling historical data and use that to draw our random variables for each period. This is essentially what was done in the previous section for the normally distributed returns.
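Drawing from a skewed normal distribution can be sketched with NumPy alone. The fitting step is not shown, and the location, scale, and shape values below are hypothetical placeholders rather than fitted parameters from this study.

```python
import numpy as np

def skew_normal_draws(loc, scale, shape, size, seed=0):
    """Skew-normal samples built from two independent standard normals."""
    rng = np.random.default_rng(seed)
    delta = shape / np.sqrt(1.0 + shape ** 2)
    u0 = rng.standard_normal(size)
    u1 = rng.standard_normal(size)
    # A negative shape parameter pushes the half-normal term to the left.
    z = delta * np.abs(u0) + np.sqrt(1.0 - delta ** 2) * u1
    return loc + scale * z

# Hypothetical parameters: shape = -4 gives a clearly left-skewed distribution.
draws = skew_normal_draws(loc=0.0, scale=1.0, shape=-4.0, size=200_000)
expected_mean = (-4.0 / np.sqrt(17.0)) * np.sqrt(2.0 / np.pi)
```

Because the shape parameter also shifts the mean and shrinks the variance, fitted location and scale values generally differ from the sample mean and standard deviation of the data.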
Then, to account for some autocorrelation, we will use the same adjustment to volatility as we used in the previously referenced commentary on autocorrelation risk. For positive autocorrelations (which we saw in the previous graphs), this results in a higher volatility for the simulations (typically around 10% – 25% higher).
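The exact adjustment from the referenced commentary is not reproduced in this post. As an illustration only, one common choice assumes returns follow an AR(1) process with lag-1 autocorrelation rho and scales volatility by the long-horizon factor sqrt((1 + rho) / (1 - rho)). The function name and the sample rho values are assumptions.

```python
import numpy as np

def ar1_vol_multiplier(rho):
    """Long-horizon volatility scaling for an AR(1) process (assumed adjustment)."""
    rho = np.asarray(rho, dtype=float)
    if np.any(np.abs(rho) >= 1.0):
        raise ValueError("rho must lie strictly between -1 and 1")
    return np.sqrt((1.0 + rho) / (1.0 - rho))

# Hypothetical lag-1 autocorrelations of 0.0, 0.1, and 0.2.
multipliers = ar1_vol_multiplier([0.0, 0.1, 0.2])
```

Under this assumed form, autocorrelations of roughly 0.1 to 0.2 raise volatility by about 10% to 22%, which is broadly in line with the range quoted above.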
The two graphs below show the same analysis as before under this modified framework.
Source: Global Financial Data and Shiller Data Library. Calculations by Newfound Research.
The historical PWRs now fall more within the bounds of our simulated results.
Additionally, the 5th percentile band now shows that there were periods where a 4% withdrawal rule may not have made the cut.
Source: Global Financial Data and Shiller Data Library. Calculations by Newfound Research.
Conclusion
Heuristics can be a great way to distill complex data into actionable insights, and the perfect withdrawal rate in retirement portfolios is no exception.
The 4% rule is a classic example where we may not be aware of the risk in using it. It is the commonly accepted lower bound for safe withdrawal rates, but this is only based on one realization of history.
The actual risk investors take on by using this number may be uncertain.
Using simulation techniques, we explored how different assumptions match the historical experience of retirement portfolios.
The simple assumptions (expected return and volatility) commonly used in financial planning Monte Carlo simulations do not seem to reflect as much variation as we have seen in the historical PWR. Therefore, relying on these assumptions can be risky for investors who are close to the “go-no-go” point; they do not have much room for failure and will be more likely to have to make cash flow adjustments in retirement.
Utilizing richer simulation methods (e.g. accounting for negative skew and autocorrelation like we did here or using a downside shocking method like we explored in A Shock to the Covariance System) may be necessary to successfully gauge the risk in a proposed PWR, especially as it pertains to the risk of failure in the financial plan.
Having a number to base planning calculations on makes life easier in the moment, but knowing the risk in using that number makes life easier going forward.
Style Surfing the Business Cycle
By Corey Hoffstein
On April 29, 2019
In Risk & Style Premia, Weekly Commentary
Summary
Just as soon as the market began to meaningfully adopt factor investing, someone had to go and ask, “yeah, but can they be timed?” After all, while the potential opportunity to harvest excess returns is great, who wants to live through a decade of relative drawdowns like we’re seeing with the value factor?
And thus the great valuation-spread factor timing debates of 2017 were born and from the ensuing chaos emerged new, dynamic factor rotation products.
There is no shortage of ways to test factor rotation: valuation-spreads, momentum, and mean-reversion to name a few. We have even found mild success using momentum and mean reversion, though we ultimately question whether the post-cost headache is worth the potential benefit above a well-diversified portfolio.
Another potential idea is to time factor exposure based upon the state of the economic or business cycle.
It is easy to construct a narrative for this approach. For example, it sounds logical that you might want to hold higher quality, defensive stocks during a recession to take advantage of the market’s flight-to-safety. On the other hand, it may make sense to overweight value during a recovery to exploit larger mispricings that might have occurred during the contraction period.
An easy counter-example, however, is the performance of value during the last two recessions. During the dot-com fall-out, cheap out-performed expensive by a wide margin. This fit a wonderful narrative of value as a defensive style of investing, as we are buying assets at a discount to intrinsic value and therefore establishing a margin of safety.
Of course, we need only look towards 2008 to see a very different scenario. From peak to trough, AQR’s HML Devil factor had a drawdown of nearly 40% during that crisis.
Two recessions with two very different outcomes for a single factor. But perhaps there is still hope for this approach if we diversify across enough factors and apply it over the long run.
The problem we face with business cycle style timing is really two-fold. First, we have to be able to identify the factors that will do well in a given market environment. Equally important, however, is our ability to predict the future economic environment.
Philosophically, there are limitations in our ability to accurately identify both simultaneously. After all, if we could predict both perfectly, we could construct an arbitrage.
If we believe the markets are at all efficient, then being able to identify the factors that will out-perform in a given state of the business cycle should lead us to conclude that we cannot predict the future state of the business cycle. Similarly, if we believe we can predict the future state of the business cycle, we should not be able to predict which factors will necessarily do well.
Philosophical arguments aside, we wanted to test the efficacy of this approach.
Which Factors and When?
Rather than simply perform a data-mining exercise to determine which factors have done well in each economic environment, we wanted to test prevalent beliefs about factor performance and economic cycles. To do this, we identified marketing and research materials from two investment institutions that tie factor allocation recommendations to the business cycle.
Both models expressed a view using four stages of the economic environment: a slowdown, a contraction, a recovery, and an economic expansion.
Model #1
Model #2
Defining the Business Cycle
Given these models, our next step was to build a model to identify the current economic environment. Rather than build a model, however, we decided to dust off our crystal ball. After all, if business-cycle-based factor rotation does not work with perfect foresight of the economic environment, what hope do we have for when we have to predict the environment?
We elected to use the National Bureau of Economic Research’s (“NBER”) listed history of US business cycle expansions and contractions. With the benefit of hindsight, they label a recession as the period from the peak of the business cycle to the subsequent trough.
Unfortunately, NBER only provides a simple indicator as to whether a given month is in a recession or not. We were left to fill in the blanks around what constitutes a slowdown, a contraction, a recovery, and an expansionary period. Here we settled on two definitions:
Definition #1
Definition #2
For definition #2, in the case where two recessions were 12 or fewer months apart (as was the case in the 1980s), the intermediate period was split equally into recovery and slowdown.
Implementing Factor Rotation
After establishing the rotation rules and using our crystal ball to identify the different periods of the business cycle, our next step was to build the factor rotation portfolios.
We first sourced monthly long/short equity factor returns for size, value, momentum, and quality from AQR’s data library. To construct a low-volatility factor, we used portfolios sorted on variance from the Kenneth French library and subtracted bottom-quintile returns from top-quintile returns.
As the goal of our study is to identify the benefit of factor timing, we de-meaned the monthly returns by the average of all factor returns in that month to identify relative performance.
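The de-meaning step amounts to subtracting each month's cross-sectional average from every factor's return. A minimal sketch, with a hypothetical function name and placeholder numbers:

```python
import numpy as np

def demean_cross_section(factor_returns):
    """Subtract each month's average factor return (rows = months, columns = factors)."""
    r = np.asarray(factor_returns, dtype=float)
    return r - r.mean(axis=1, keepdims=True)

# Placeholder data: two months of returns for three factors.
raw = np.array([[0.02, -0.01, 0.05],
                [0.00, 0.03, -0.03]])
relative = demean_cross_section(raw)
```

After this step each row sums to zero, so a positive value means the factor beat the equal-weight average of all factors in that month.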
We constructed four portfolios using the two factor rotation definitions and the two economic cycle definitions. Generically, at the end of each month, we would use the next month’s economic cycle label to identify which factors to hold in our portfolio. Identified factors were held in equal weight.
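The portfolio construction can be sketched as below. The stage-to-factor mapping is a hypothetical placeholder, not either of the two institutional models, and the return figures are made up. Because the label for month t is assumed known at the end of month t-1, each month's return is simply paired with its own label.

```python
import numpy as np

FACTORS = ["size", "value", "momentum", "quality", "low_vol"]

def rotation_returns(demeaned, stages, holdings):
    """Equal-weight return of the factors held in each month.

    demeaned: array of shape (months, factors), columns ordered as FACTORS.
    stages:   business cycle label for each month (known in advance).
    holdings: dict mapping a stage label to the list of factors to hold.
    """
    column = {name: i for i, name in enumerate(FACTORS)}
    out = np.empty(len(stages))
    for t, stage in enumerate(stages):
        idx = [column[name] for name in holdings[stage]]
        out[t] = demeaned[t, idx].mean()  # identified factors held in equal weight
    return out

# Placeholder inputs for illustration only.
demeaned = np.array([[0.01, -0.02, 0.03, 0.00, -0.02],
                     [0.02, 0.01, -0.01, -0.01, -0.01]])
stages = ["recovery", "expansion"]
holdings = {"recovery": ["size", "value"], "expansion": ["momentum"]}
strategy = rotation_returns(demeaned, stages, holdings)
```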
Below we plot the four equity curves. Remember that these series are generated using de-meaned return data, so they reflect the out-performance against an equal-weight factor benchmark.
Source: NBER, AQR, and Kenneth French Data Library. Calculations by Newfound Research. Returns are backtested and hypothetical. Returns assume the reinvestment of all distributions. Returns are gross of all fees. None of the strategies shown reflect any portfolio managed by Newfound Research and were constructed solely for demonstration purposes within this commentary. You cannot invest in an index.
It would appear that even with a crystal ball, conventional wisdom about style rotation and business cycles may not hold. And even where it might, we can see multi-decade periods where it adds little-to-no value.
Data-Mining Our Way to Success
If we are going to use a crystal ball, we might as well just blatantly data-mine our way to success and see what we learn along the way.
To achieve this goal, we can simply look at the annualized de-meaned returns of each factor during each period of the business cycle.
Source: NBER, AQR, and Kenneth French Data Library. Calculations by Newfound Research.
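The data-mining step is a group-by on the business cycle label. In the sketch below, annualization is assumed to be 12 times the average monthly return, which the post does not specify, and the two-factor data set is a placeholder.

```python
import numpy as np

def annualized_by_stage(demeaned, stages):
    """Average de-meaned monthly return per stage, scaled by 12 (assumed annualization)."""
    demeaned = np.asarray(demeaned, dtype=float)
    stages = np.asarray(stages)
    return {str(stage): 12.0 * demeaned[stages == stage].mean(axis=0)
            for stage in np.unique(stages)}

# Placeholder data: four months, two factors.
names = ["value", "momentum"]
demeaned = np.array([[0.01, -0.01],
                     [0.03, -0.03],
                     [-0.02, 0.02],
                     [0.00, 0.00]])
stages = ["recovery", "recovery", "expansion", "expansion"]
by_stage = annualized_by_stage(demeaned, stages)
# Factor with the highest annualized relative return in each stage.
best = {stage: names[int(np.argmax(values))] for stage, values in by_stage.items()}
```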
Despite two different definitions of the business cycle, we can see a strong alignment in which factors work when. Slow-downs / pre-recessionary periods are tilted towards momentum and defensive factors like quality and low-volatility. Momentum may seem like a curious factor, but its high turnover may give it a chameleon-like nature that can tilt it defensively in certain scenarios.
In a recession, momentum is replaced with value while quality and low-volatility remain. In the initial recovery, small-caps, value, and momentum are favored. In this case, while value may actually be benefiting from multiple expansion, small-caps may simply be a way to play higher beta. Finally, momentum is strongly favored during an expansion.
Yet even a data-mined solution is not without its flaws. Below we plot rolling 3-year returns for our data-mined timing strategies. Again, remember that these series are generated using de-meaned return data, so they reflect the out-performance against an equal-weight factor benchmark.
Source: NBER, AQR, and Kenneth French Data Library. Calculations by Newfound Research.
Despite a crystal ball telling us what part of the business cycle we are in and completely data-mined results, there are still a number of 3-year periods with low-to-negative results. And we have not even considered manager costs, transaction costs, or taxes yet.
A few more important things to note.
Several of these factors exhibit strong negative performance during certain parts of the market cycle, indicating a potential for out-performance by taking the opposite side of the factor. For example, value appears to do poorly during pre-recession and expansion periods. One hypothesis is that during expansionary periods, markets tend to over-extrapolate earnings growth potential, favoring growth companies that appear more expensive.
We should also remember that our test is on long/short portfolios and may not necessarily be relevant for long-only investors. While we can think of a long-only portfolio as a market-cap portfolio plus a long/short portfolio, the implicit long/short is not necessarily identical to academic factor definitions.
Finally, it is worth considering that these results are data-mined over a 50+ year period, which may allow outlier events to dramatically skew the results. Momentum, for example, famously exhibited dramatic crashes during the Great Depression and in the 2008 crisis, but may actually have out-performed on a relative basis in other recessions.
Conclusion
In this commentary we sought to answer the question, “can we use the business cycle to time factor exposures?” Assuming access to a crystal ball that could tell us where we stood precisely in the business cycle, we found that conventional wisdom about factor timing did not add meaningful value over time. We do not hold out much hope, based on this conventional wisdom, that someone without a crystal ball would fare much better.
Despite explicitly trying to select models that reflected conventional wisdom, we found a significant degree of similarity in these recommendations with those that came from blindly data-mining optimal results. Nevertheless, even slight differences in the recommendations led to lackluster results.
The similarities between data-mined results and conventional wisdom, however, should give us pause. While the argument for conventional wisdom is often a well-articulated economic rationale, we have to wonder whether we have simply fooled ourselves with a narrative that has been inherently constructed with the benefit of hindsight.