Prior research and empirical investment results have shown that portfolio construction choices related to rebalance schedules may have non-trivial impacts on realized performance. We construct long-only indices that provide exposures to popular U.S. equity factors (value, size, momentum, quality, and low volatility) and vary their rebalance schedules to isolate the effects of “rebalance timing luck.” Our constructed indices exhibit high levels of rebalance timing luck, often exceeding 100 basis points annualized, with total impact dependent upon the frequency of rebalancing, portfolio concentration, and the nature of the underlying strategy. As a case study, we replicate popular factor-based index funds and similarly find meaningful performance impacts due to rebalance timing luck. For example, a strategy replicating the S&P Enhanced Value index saw calendar year return differentials above 40% strictly due to the rebalance schedule implemented. Our results suggest substantial problems for analyzing any investment when the strategy, its peer group, or its benchmark is susceptible to performance impacts driven by the choice of rebalance schedule.

Defensive equity strategies are composed of stocks that lose less than the market during bear markets while keeping pace with the market during bull markets.

Coarse sorts on metrics such as volatility, beta, value, and momentum lead to diversified portfolios but have mixed results in terms of their defensive characteristics, especially through different crisis periods that may favor one metric over another.

Non-linear machine learning techniques offer a way to identify combinations of factors that lead to better defensive equity strategies across multiple periods.

By applying techniques such as random forests and gradient boosting to two sample defensive equity metrics, we find that machine learning does not add significant value over a low volatility sort, given the features included in the model.

While this by no means rules out the benefits of machine learning techniques, it shows that a blanket application of them is not a panacea for investing during crisis periods.

There is no shortage of hypotheses as to what characteristics define a stock that will outperform in a bear market. Some argue that value stocks should perform well, given their relative valuation buffer (the “less far to fall” argument). Some argue for a focus on balance sheet strength while others argue that cash flow is the ultimate lifeblood of a company and should be prioritized. There are even arguments for industry preferences based upon economic cyclicality.

Each recession and crisis is unique, however, and therefore the characteristics of stocks that fare best will likely change. For example, the dot-com run-up caused a large number of real-economy businesses to be sorted into the “cheap” bucket of the value factor. These companies also tended to have higher quality earnings and lower beta / volatility than the dot-com stocks.

Common sense would indicate that unconstrained value may be a natural counter-hedge towards large, speculative bubbles, but we need only look towards 2008 – a credit and liquidity event – to see that value is not a panacea for every type of crisis.

It is for this reason that some investors prefer to take their cues from market-informed metrics such as beta, volatility, momentum, or trading volume.

Regardless of approach, there are some philosophical limitations we should consider when it comes to expectations with defensive equity portfolios. First, if we were able to identify an approach that could avoid market losses, then we would expect that strategy to also have negative alpha.^{1} If this were not the case, we could construct an arbitrage.

Therefore, in designing a defensive equity portfolio, our aim should be to provide ample downside protection against market losses while minimizing the relative upside participation cost of doing so.

Traditional linear sorts – such as buying the lowest volatility stocks – are coarse by design. They aim to robustly capture a general truth and hedge missed subtleties through diversification. For example, while some stocks deserve to be cheap and some stocks are expensive for good reason, naïve value sorts will do little to distinguish them from those that are unjustifiably cheap or rich.

For a defensive equity portfolio, however, this coarseness may not only reduce effectiveness, but it may also increase the implicit cost. Therefore, in this note we implement non-linear techniques in an effort to more precisely identify combinations of characteristics that may create a more effective defensive equity strategy.

The Strategy Objective

We must begin by defining precisely what we mean by a “defensive equity strategy.” What are the characteristics that would make us label one security as defensive and another as not? Or, better still, is there a characteristic that allows us to rank securities on a gradient of defensiveness?

This is not a trivial decision, as our entire exercise will attempt to maximize the probability of correctly identifying securities with this characteristic.

As our goal is to find those securities which provide the most protection during equity market routs but bleed the least during equity market rallies, we chose a metric that scored how closely a stock’s return reflected the payoff of a call option on the S&P 500 over the next 63 trading days (approximately 3 months).

In other words, if the S&P 500 is positive over the next 63 trading days, the score of a security is equal to the squared difference between its return and the S&P 500’s return. If the market’s return is negative, the score of a security is simply its squared return.
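As a rough sketch, the scoring rule described above could be implemented as follows (the function name and conventions are illustrative, not from the original methodology; lower scores indicate more defensive behavior):

```python
def defensiveness_score(stock_return, market_return):
    """Forward-looking score of how closely a stock's 63-day return tracks
    the payoff of a call option on the market. Lower is more defensive."""
    if market_return > 0:
        # Market up: penalize deviation from the market's positive return
        return (stock_return - market_return) ** 2
    # Market down: penalize any movement away from a flat return
    return stock_return ** 2

# A stock that falls only -2% while the market drops -20% scores far better
# than one that falls with the market
assert defensiveness_score(-0.02, -0.20) < defensiveness_score(-0.20, -0.20)
```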

To determine whether this metric reflects the type of profile we want, we can create a long/short portfolio. Each month we rank securities by their scores and select the quintile with the lowest scores. Securities are then weighted by their market capitalization. Securities are held for three months and the portfolio is implemented with three tranches. The short leg of the portfolio is the market rather than the highest quintile, as we are explicitly trying to identify defense against the market.

To create a scalable solution, we restrict our investable universe to the top 1,000 securities by market capitalization.

We plot the performance below.

Source: Sharadar Fundamentals. Calculations by Newfound Research. Returns are hypothetical and backtested. Returns are gross of all fees including, but not limited to, management fees, transaction fees, and taxes. Returns assume the reinvestment of all distributions.

We can see that the strategy is relatively flat during bull markets (1998-2000, 2003-2007, 2011-2015, 2016-2018), but rallies during bear markets and sudden market shocks (2000-2003, 2008, 2011, 2015/2016, Q4 2018, and 2020).

Interestingly, despite having no sector constraints and not explicitly targeting tracking error at the portfolio level, the resulting portfolio ends up well diversified across sectors, though it does appear to make significant short-term jumps in sector weights. We can also see an increasing tilt towards Technology over the last 3 years in the portfolio. In recent months, positions in Financials and Industrials have been almost outright eliminated.

Source: Sharadar Fundamentals. Calculations by Newfound Research.

Of course, this metric is explicitly forward looking. We’re using a crystal ball to peer into the future and identify those stocks that track the best on the way up and protect the best on the way down. Our goal, then, is to use a variety of company and security characteristics to accurately forecast this score.

We will include a variety of characteristics and features, including:

Size: Market Capitalization.

Valuation: Book-to-Price, Earnings-to-Price, Free Cash Flow-to-Price, Revenue-to-EV, and EBITDA-to-EV.

Momentum: 12-1 Month Return and 1-Month Return.

Risk: Beta, Volatility, Idiosyncratic Volatility, and Ulcer Index.

Quality: Accruals, ROA, ROE, CFOA, GPOA, Net Margin, Asset Turnover, Leverage, and Payout Ratio.

These 24 features are all cross-sectionally ranked at each point in time. We also include dummy variables for each security to represent sector inclusion as well as whether the company has positive Net Income and whether the company has positive Operating Cash Flow.
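As an illustration of the cross-sectional ranking step, here is one way it might be done with pandas (the panel layout, dates, and column names are hypothetical):

```python
import pandas as pd

# Hypothetical long-format panel: one row per (date, ticker) pair with raw
# feature values.
panel = pd.DataFrame({
    "date":   ["2020-01-31"] * 3 + ["2020-02-29"] * 3,
    "ticker": ["AAA", "BBB", "CCC"] * 2,
    "volatility": [0.15, 0.35, 0.22, 0.18, 0.40, 0.25],
    "earnings_to_price": [0.08, 0.02, 0.05, 0.07, 0.03, 0.06],
})

feature_cols = ["volatility", "earnings_to_price"]

# Rank each feature within each date (cross-sectionally) and scale to (0, 1]
# so features are comparable across time and robust to outliers.
ranked = panel.copy()
ranked[feature_cols] = panel.groupby("date")[feature_cols].rank(pct=True)
```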

Note that we are not including any market regime characteristics, such as information about market returns, volatility, interest rates, credit spreads, sentiment, or monetary or fiscal policy. Had we included such features, our resulting model might have ended up as a factor-switching approach, changing which characteristics it selects based upon the market environment. That may be an interesting model in its own right, but our goal herein is simply to design a static, non-linear factor sort.

Random Forests

Our first approach will be to apply a random forest algorithm, which is an ensemble learning method. The approach uses a training data set to build a number of individual decision trees whose results are then re-combined to create the ultimate decision. By training each tree on a subset of data and considering only a subset of features for each node, we can create trees that may individually have high variance, but as an aggregate forest reduce variance without necessarily increasing bias.

As an example, this means that one tree may be built using a mixture of low volatility and quality features, while another may be built using valuation and momentum features. Each tree is able to model a non-linear relationship, but by restricting tree depth and building trees using random subsets of data and features, we can prevent overfitting.

There are a number of hyperparameters that can be set to govern the model fit. For example, we can set the maximum depth of the individual trees as well as the number of trees we want to fit. Fitting hyperparameters is an art unto itself, and rather than go down the rabbit hole of tuning them via cross-validation, we did our best to select reasonable values. We elected to train the model on 50% of our data (March 1998 to March 2009), with a total of 100 trees, each with a maximum depth of 2.
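A minimal scikit-learn sketch of such a fit, with random data standing in for the actual feature panel and target scores (the `max_features` choice is an assumption; the text specifies only 100 trees of maximum depth 2):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Stand-in data: rows are (date, stock) pairs, columns are cross-sectionally
# ranked characteristics, and the target is the forward defensiveness score.
X = rng.random((5000, 24))
y = rng.random(5000)

# 100 shallow trees (max_depth=2): each tree is a weak learner, but averaging
# trees built on random subsets of rows and features reduces variance without
# necessarily increasing bias.
forest = RandomForestRegressor(
    n_estimators=100,
    max_depth=2,
    max_features="sqrt",  # random feature subset per split (an assumption)
    random_state=0,
)
forest.fit(X, y)
predicted_scores = forest.predict(X)
```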

The results of the exercise are plotted below.

Source: Sharadar Fundamentals. Calculations by Newfound Research.

The performance does appear to provide defensive properties both in- and out-of-sample, with meaningful returns generated in 2000-2002, 2008, Q3 and Q4 of 2011, June 2015 through June 2016, and Q4 2018.

We can see that the allocations also express a number of static sector concentrations (e.g. Consumer Defensive) as well as some cyclical changes (e.g. Financials pre- and post-2007).

We can also gain insight into how the portfolio composition changes by looking at the weighted characteristic scores of the long leg of the portfolio over time.

Source: Sharadar Fundamentals. Calculations by Newfound Research.

It is important to remember that characteristics are cross-sectionally ranked across stocks. For some characteristics, higher is considered better (e.g. a higher earnings-to-price ratio is considered cheaper), whereas for others, lower is better (e.g. lower volatility implies less risk).

We can see that some characteristics are static tilts: higher market capitalization, positive operating cash flow, positive net income, and lower risk characteristics. Other characteristics are more dynamic. By 12/2008, the portfolio has tilted heavily towards high momentum stocks. A year later, the portfolio has tilted heavily towards low momentum stocks.

What is somewhat difficult to disentangle is whether these static and dynamic effects are due to the non-linear model we have developed, or whether the static tilts alone would naturally produce the dynamic tilts. For example, if we only applied a low volatility tilt, is it possible that the momentum tilts would emerge naturally?

Unfortunately, the answer appears to be the latter. If we plot a long/short portfolio that goes long the bottom quintile of stocks ranked on realized 1-year volatility and short the broad market, we see a very familiar equity curve.

Source: Sharadar Fundamentals. Calculations by Newfound Research. Returns are hypothetical and backtested. Returns are gross of all fees including, but not limited to, management fees, transaction fees, and taxes. Returns assume the reinvestment of all distributions.

It would appear that the random forest model effectively identified the benefits of low volatility securities. And while out-of-sample performance does appear to provide more ample defense during 2011, 2015-2016, and 2018 than the low volatility tilt, it also has significantly greater performance drag.

Gradient Boosting

One potential improvement we might consider is to apply a gradient boosting model. Rather than simply building our decision trees independently in parallel, we can build them sequentially such that each tree is built on a modified version of the original data set (e.g. increasing the weights of those data points that were harder to classify and decreasing the weights on those that were easier).

Rather than just generalizing to a low-volatility proxy, gradient boosting may allow our decision tree process to pick up on greater subtleties and conditional relationships in the data. For comparison purposes, we’ll assume the same maximum tree depth and number of trees as the random forest method.

In initially evaluating the importance of features, it does appear that low volatility remains a critical factor, but other characteristics – such as momentum, free cash flow yield, and payout ratio – are close seconds. This may be a hint that gradient boosting was able to identify more subtle relationships.

Unfortunately, in evaluating the sector characteristics over time, we see a very similar pattern, though sectors like Technology do receive a meaningfully higher allocation with this methodology than with the random forest approach.

Source: Sharadar Fundamentals. Calculations by Newfound Research.

If we compare long/short portfolios, we find little meaningful difference to our past results. Our model simply seems to identify a (historically less effective) low volatility model.

Source: Sharadar Fundamentals. Calculations by Newfound Research. Returns are hypothetical and backtested. Returns are gross of all fees including, but not limited to, management fees, transaction fees, and taxes. Returns assume the reinvestment of all distributions.

Re-Defining Defensiveness

When we set out on this problem, we made a key decision to define a stock’s defensiveness by how closely it is able to replicate the payoff of a call option on the S&P 500. What if we had elected another definition, though? For example, we could define defensive stocks as those that minimize the depth and frequency of drawdowns using a measure like the Ulcer Index.

Below we replicate the above tests but use forward 12-month Ulcer Index as our target score (or, more precisely, a security’s forward 12-month cross-sectional Ulcer Index rank).
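For reference, the Ulcer Index can be computed as the root-mean-square of percentage drawdowns from the running price maximum; a minimal sketch:

```python
import numpy as np

def ulcer_index(prices):
    """Root-mean-square of percentage drawdowns from the running maximum,
    capturing both the depth and the duration of drawdowns."""
    prices = np.asarray(prices, dtype=float)
    running_max = np.maximum.accumulate(prices)
    drawdown_pct = 100.0 * (prices - running_max) / running_max
    return float(np.sqrt(np.mean(drawdown_pct ** 2)))

# A steadily rising price series has no drawdowns and an Ulcer Index of zero
assert ulcer_index([100, 101, 102, 103]) == 0.0
```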

We again begin by constructing an index that has perfect foresight, buying a market-capitalization weighted portfolio of securities that rank in the lowest quintile of forward 12-month Ulcer Index. We see a very different payoff profile than before, with strong performance exhibited in both bull and bear markets.

By focusing on forward 12-month scores rather than 3-month scores, we also see a far steadier sector allocation profile. Interestingly, we still see meaningful sector tilts, with sectors like Technology, Financials, and Consumer Defensive coming in and out of favor over time.

Source: Sharadar Fundamentals. Calculations by Newfound Research. Returns are hypothetical and backtested. Returns are gross of all fees including, but not limited to, management fees, transaction fees, and taxes. Returns assume the reinvestment of all distributions.

We again use a gradient boosted random forest model to try to model our target scores. We find that five of the top six most important features are price return related, either measuring return or risk.

Despite the increased emphasis on momentum, the resulting long/short index still echoes a naïve low-volatility sort. This is likely because negative momentum and high volatility have become reasonably correlated proxies for one another in recent years.

While returns appear improved from prior attempts, the out-of-sample performance (March 2009 and onward) is almost identical to that of the low-volatility long/short.

Source: Sharadar Fundamentals. Calculations by Newfound Research. Returns are hypothetical and backtested. Returns are gross of all fees including, but not limited to, management fees, transaction fees, and taxes. Returns assume the reinvestment of all distributions.

Conclusion

In this research note we sought to apply machine learning techniques to factor portfolio construction. Our goal was to exploit the ability of machine learning models to model non-linear relationships, hoping to come up with a more nuanced definition of a defensive equity portfolio.

In our first test, we defined a security’s defensiveness by how closely it was able to replicate the payoff of a call option on the S&P 500 over rolling 63-day (approximately 3-month) periods. If the market was up, we wanted to pick stocks that closely matched the market’s performance; if the market was down, we wanted to pick stocks that minimized drawdown.

After pre-engineering a set of features to capture both company and stock dynamics, we first turned to a random forest model. We chose this model as the decision tree structure would allow us to model conditional feature dynamics. By focusing on generating a large number of shallow trees we aimed to avoid overfitting while still reducing overall model variance.

Training the model on data from March 1998 to March 2009, we found that the results strongly favored companies exhibiting positive operating cash flow, positive earnings, and low realized risk characteristics (e.g. volatility and beta). Unfortunately, the model did not appear to provide any meaningful advantage versus a simple linear sort on volatility.

We then turned to applying gradient boosting to our random forest. This approach builds trees in sequence such that each tree seeks to improve upon the last. We hoped that such an approach would allow the random forest to build more nuance than simply scoring on realized volatility.

Unfortunately, the results remained largely the same.

Finally, we decided to change our definition of defensiveness by focusing on the depth and frequency of drawdowns with the Ulcer Index. Again, after re-applying the gradient boosted random forest model, we found little difference in realized results versus a simple sort on volatility (especially out-of-sample).

One answer for these similar results may be that our objective function is highly correlated to volatility measures. For example, if stocks follow a geometric Brownian motion process, those with higher levels of volatility should have deeper drawdowns. And if the best predictor of future realized volatility is past realized volatility, then it is no huge surprise that the models ultimately fell back towards a naïve volatility sort.

Interestingly, value, quality, and growth characteristics seemed largely ignored. We see two potential reasons for this.

The first possibility is that they were simply subsumed by low volatility with respect to our objective. If this were the case, however, we would see little feature importance placed upon them, but would still expect their weighted average characteristic scores within our portfolios to be higher (or lower). While this is true for select features (e.g. payout ratio), the importance of others appears largely cyclical (e.g. earnings-to-price). In fact, during the fallout of the dot-com bubble, weighted average value scores remained between 40 and 70.

The second reason is that the fundamental drivers behind each market sell-off are different. Factors tied to company metrics (e.g. valuation, quality, or growth), therefore, may be ill-suited to navigate different types of sell offs. For example, value was the natural antithesis to the speculative dot-com bubble. However, during the recent COVID-19 crisis, it has been the already richly priced technology stocks that have fared the best. Factors based upon security characteristics (e.g. volatility, returns, or volume) may be better suited to dynamically adjust to market changes.

While our results were rather lackluster, we should acknowledge that we have really only scratched the surface of machine learning techniques. Furthermore, our results are intrinsically linked to how we’ve defined our problem and the features we engineered. A more thoughtful target score or a different set of features may lead to substantially different results.

In past research notes we have explored the impact of rebalance timing luck on strategic and tactical portfolios, even using our own Systematic Value methodology as a case study.

In this note, we generate empirical timing luck estimates for a variety of specifications for simplified value, momentum, low volatility, and quality style portfolios.

Relative results align nicely with intuition: higher concentration and less frequent rebalancing leads to increasing levels of realized timing luck.

For more reasonable specifications – e.g. 100 stock portfolios rebalanced semi-annually – timing luck ranges between 100 and 400 basis points depending upon the style under investigation, suggesting a significant risk of performance dispersion due only to when a portfolio is rebalanced and nothing else.

The large magnitude of timing luck suggests that any conclusions drawn from performance comparisons between smart beta ETFs or against a standard style index may be spurious.

We’ve written about the concept of rebalance timing luck a lot. It’s a cowbell we’ve been beating for over half a decade, with our first article going back to August 7^{th}, 2013.

As a reminder, rebalance timing luck is the performance dispersion that arises from the choice of a particular rebalance date (e.g. semi-annual rebalances that occur in June and December versus March and September).

We’ve empirically explored the impact of rebalance timing luck as it relates to strategic asset allocation, tactical asset allocation, and even used our own Systematic Value strategy as a case study for smart beta. All of our results suggest that it has a highly non-trivial impact upon performance.

This summer we published a paper in the Journal of Index Investing that proposed a simple solution to the timing luck problem: diversification. If, for example, we believe that our momentum portfolio should be rebalanced every quarter – perhaps as an optimal balance of cost and signal freshness – then we proposed splitting our capital across the three portfolios that spanned different three-month rebalance periods (e.g. JAN-APR-JUL-OCT, FEB-MAY-AUG-NOV, MAR-JUN-SEP-DEC). This solution is referred to either as “tranching” or “overlapping portfolios.”

The paper also derived a formula for estimating timing luck ex-ante, with a simplified representation of:

L ≈ (T / F) x S

Where L is the timing luck measure, T is the turnover rate of the strategy, F is how many times per year the strategy rebalances, and S is the volatility of a long/short portfolio that captures the difference between what the strategy currently holds and what it would hold if the portfolio were reconstructed at that point in time.
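If we take the simplified representation to be L ≈ (T / F) x S (our reading of the relationship described in the text), a small helper makes the qualitative conclusions easy to check numerically:

```python
def timing_luck_estimate(turnover, rebalances_per_year, dispersion_vol):
    """Ex-ante timing luck under the assumed simplified form L = (T / F) * S.

    turnover            -- T, annual turnover rate of the strategy
    rebalances_per_year -- F, number of rebalances per year
    dispersion_vol      -- S, volatility of the long/short "difference" portfolio
    """
    return turnover / rebalances_per_year * dispersion_vol

# Doubling turnover doubles timing luck; doubling rebalance frequency halves it
assert timing_luck_estimate(2.0, 2, 0.09) == 2 * timing_luck_estimate(1.0, 2, 0.09)
assert timing_luck_estimate(1.0, 4, 0.09) == timing_luck_estimate(1.0, 2, 0.09) / 2
```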

Without numbers, this equation still informs some general conclusions:

Higher turnover strategies have higher timing luck.

Strategies that rebalance more frequently have lower timing luck.

Strategies with a less constrained universe will have higher timing luck.

Bullet points 1 and 3 may seem similar but capture subtly different effects. This is likely best illustrated with two examples on different extremes. First consider a very high turnover strategy that trades within a universe of highly correlated securities. Now consider a very low turnover strategy that is either 100% long or 100% short U.S. equities. In the first case, the highly correlated nature of the universe means that differences in specific holdings may not matter as much, whereas in the second case the perfect inverse correlation means that small portfolio differences lead to meaningfully different performance.

L, in and of itself, is a bit tricky to interpret, but effectively attempts to capture the potential dispersion in performance between a particular rebalance implementation choice (e.g. JAN-APR-JUL-OCT) versus a timing-luck-neutral benchmark.

After half a decade, you would think we have spilled enough ink on this subject.

But given that just about every single major index still does not address this issue, and since our passion for the subject clearly verges on fever pitch, here comes some more cowbell.

Equity Style Portfolio Definitions

In this note, we will explore timing luck as it applies to four simplified smart beta portfolios based upon holdings of the S&P 500 from 2000-2019:

Value: Sort on earnings yield.

Momentum: Sort on prior 12-1 month returns.

Low Volatility: Sort on realized 12-month volatility.

Quality: Sort on average rank-score of ROE, accruals ratio, and leverage ratio.

Quality is a bit more complicated only because the quality factor has far less consistency in its accepted definition. Therefore, we adopted the signals utilized by the S&P 500 Quality Index.

For each of these equity styles, we construct portfolios that vary across two dimensions:

Number of Holdings: 50, 100, 150, 200, 250, 300, 350, and 400.

Frequency of Rebalance: Quarterly, Semi-Annually, and Annually.

For the different rebalance frequencies, we also generate portfolios that represent each possible rebalance variation of that mix. For example, Momentum portfolios with 50 stocks that rebalance annually have 12 possible variations: a January rebalance, February rebalance, et cetera. Similarly, there are 12 possible variations of Momentum portfolios with 100 stocks that rebalance annually.

By explicitly calculating the rebalance date variations of each Style x Holding x Frequency combination, we can construct an overlapping portfolios solution. To estimate empirical annualized timing luck, we calculate the standard deviation of monthly return dispersion between the different rebalance date variations of the overlapping portfolio solution and annualize the result.
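One possible reading of this estimation procedure, sketched with pandas on hypothetical return data (the distributional parameters are arbitrary stand-ins):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Hypothetical monthly returns for the 12 rebalance-date variations of one
# Style x Holdings x Frequency combination: 240 months x 12 variations.
variations = pd.DataFrame(rng.normal(0.008, 0.04, size=(240, 12)))

# The overlapping-portfolio ("tranched") solution equally weights all
# variations and therefore neutralizes rebalance timing.
neutral = variations.mean(axis=1)

# Dispersion of each variation around the timing-neutral benchmark,
# annualized by sqrt(12) and averaged across variations.
monthly_dispersion = variations.sub(neutral, axis=0)
timing_luck = monthly_dispersion.std().mean() * np.sqrt(12)
```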

Empirical Timing Luck Results

Before looking at the results plotted below, we would encourage readers to hypothesize as to what they expect to see. Perhaps not in absolute magnitude, but at least in relative magnitude.

For example, based upon our understanding of the variables affecting timing luck, would we expect an annually rebalanced portfolio to have more or less timing luck than a quarterly rebalanced one?

Should a more concentrated portfolio have more or less timing luck than a less concentrated variation?

Which factor has the greatest risk of exhibiting timing luck?

Source: Sharadar. Calculations by Newfound Research.

To create a sense of scale across the styles, below we isolate the results for semi-annual rebalancing for each style and plot them.

Source: Sharadar. Calculations by Newfound Research.

In relative terms, there is no great surprise in these results:

More frequent rebalancing limits the risk of portfolios changing significantly between rebalance dates, thereby decreasing the impact of timing luck.

More concentrated portfolios exhibit larger timing luck.

Faster-moving signals (e.g. momentum) tend to exhibit more timing luck than more stable, slower-moving signals (e.g. low volatility).

What is perhaps the most surprising is the sheer magnitude of timing luck. Consider that the S&P 500 Enhanced Value, Momentum, Low Volatility, and Quality portfolios all hold 100 securities and are rebalanced semi-annually. Our study suggests that timing luck for such approaches may be as large as 2.5%, 4.4%, 1.1%, and 2.0% respectively.

But what does that really mean? Consider the realized performance dispersion of different rebalance date variations of a Momentum portfolio that holds the top 100 securities in equal weight and is rebalanced on a semi-annual basis.

Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.

The 4.4% estimate of annualized timing luck is a measure of dispersion between each underlying variation and the overlapping portfolio solution. If we isolate two sub-portfolios and calculate rolling 12-month performance dispersion, we can see that the difference can be far larger, as one might exhibit positive timing luck while the other exhibits negative timing luck. Below we do precisely this for the APR-OCT and MAY-NOV rebalance variations.

Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.

In fact, since these variations are identical in every way except the date on which they rebalance, a portfolio that is long the APR-OCT variation and short the MAY-NOV variation would explicitly capture the effects of rebalance timing luck. If we assume the rebalance timing luck realized by these two portfolios is independent (which our research suggests it is), then the volatility of this long/short is approximately the rebalance timing luck estimated above scaled by the square root of two.

Derivation: For variations v_{i} and v_{j} and overlapping-portfolio solution V, assuming (v_{i} - V) and (v_{j} - V) are independent, each with volatility L:

Var(v_{i} - v_{j}) = Var[(v_{i} - V) - (v_{j} - V)] = Var(v_{i} - V) + Var(v_{j} - V) = 2L^{2}

The volatility of the long/short portfolio is therefore SQRT(2) x L.

Thus, if we are comparing two identically-managed 100-stock momentum portfolios that rebalance semi-annually, our 95% confidence interval for performance dispersion due to timing luck is +/- 12.4% (2 x SQRT(2) x 4.4%).

Even for more diversified, lower turnover portfolios, this remains an issue. Consider a 400-stock low-volatility portfolio that is rebalanced quarterly. Empirical timing luck is still 0.5%, suggesting a 95% confidence interval of 1.4%.
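The confidence-interval arithmetic in the two examples above can be checked directly with a small helper:

```python
from math import sqrt

def dispersion_interval(timing_luck, z=2.0):
    """Half-width of the ~95% confidence interval for performance dispersion
    between two independent rebalance variations: the long/short volatility
    is timing_luck * sqrt(2), scaled by z standard deviations."""
    return z * sqrt(2) * timing_luck

assert round(dispersion_interval(0.044), 3) == 0.124  # the +/-12.4% case
assert round(dispersion_interval(0.005), 3) == 0.014  # the +/-1.4% case
```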

S&P 500 Style Index Examples

One critique of the above analysis is that it is purely hypothetical: the portfolios studied above aren’t really those offered in the market today.

We take our analysis one step further and replicate (to the best of our ability) the S&P 500 Enhanced Value, Momentum, Low Volatility, and Quality indices, and then create different rebalance schedule variations for each. Note that the S&P 500 Low Volatility index rebalances quarterly, so there are only three possible rebalance variations to compute.

Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.

We see a meaningful dispersion in terminal wealth levels, even for the S&P 500 Low Volatility index, which appears at first glance in the graph to have little impact from timing luck.

                   Minimum Terminal Wealth    Maximum Terminal Wealth
Enhanced Value     $4.45                      $5.45
Momentum           $3.07                      $4.99
Low Volatility     $6.16                      $6.41
Quality            $4.19                      $5.25

We should further note that there does not appear to be one set of rebalance dates that does significantly better than the others. For Value, FEB-AUG looks best while JUN-DEC looks the worst; for Momentum it’s almost precisely the opposite.

Furthermore, we can see that even seemingly closely related rebalances can have significant dispersion: consider MAY-NOV and JUN-DEC for Momentum. Here is a real doozy of a statistic: at one point, the MAY-NOV implementation for Momentum is down 50.3% while the JUN-DEC variation is down just 13.8%.

These differences are even more evident if we plot the annual returns for each strategy’s rebalance variations. Note, in particular, the extreme differences in Value in 2009, Momentum in 2017, and Quality in 2003.

Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.

Conclusion

In this study, we have explored the impact of rebalance timing luck on the results of smart beta / equity style portfolios.

We empirically tested this impact by designing a variety of portfolio specifications for four different equity styles (Value, Momentum, Low Volatility, and Quality). The specifications varied by concentration as well as rebalance frequency. We then constructed all possible rebalance variations of each specification to calculate the realized impact of rebalance timing luck over the test period (2000-2019).

In line with our mathematical model, we generally find that those strategies with higher turnover have higher timing luck and those that rebalance more frequently have less timing luck.

The sheer magnitude of timing luck, however, may come as a surprise to many. For reasonably concentrated portfolios (100 stocks) with semi-annual rebalance frequencies (common in many index definitions), annual timing luck ranged from 1-to-4%, which translated to a 95% confidence interval in annual performance dispersion of about +/-1.5% to +/-12.5%.

The sheer magnitude of timing luck calls into question our ability to draw meaningful relative performance conclusions between two strategies.

We then explored more concrete examples, replicating the S&P 500 Enhanced Value, Momentum, Low Volatility, and Quality indices. In line with expectations, we find that Momentum (a high turnover strategy) exhibits significantly higher realized timing luck than a lower turnover strategy rebalanced more frequently (i.e. Low Volatility).

For these four indices, the amount of rebalance timing luck leads to a staggering level of dispersion in realized terminal wealth.

“But Corey,” you say, “this only has to do with systematic factor managers, right?”

Consider that most of the major equity style benchmarks are managed with annual or semi-annual rebalance schedules. Good luck to anyone trying to identify manager skill when your benchmark might be realizing hundreds of basis points of positive or negative performance luck a year.

Since the Great Financial Crisis, the Momentum factor has exhibited positive returns, but those returns have been largely driven by the short side of the portfolio.

One research note suggests that this is driven by increased risk aversion among investors, using the correlation of high volatility and low momentum baskets as evidence.

In contradiction to this point, the iShares Momentum ETF (MTUM) has generated positive excess annualized returns against its benchmark since inception. The same note suggests that this is due to the use of risk-adjusted momentum measures.

We explore whether risk-adjusting momentum scores introduces a meaningful and structural tilt towards low-volatility equities.

For the examples tested, we find that it does not, and risk-adjusted momentum portfolios behave very similarly to momentum portfolios.

A research note recently crossed my desk that aimed to undress the post-Global Financial Crisis (GFC) performance of the momentum factor in U.S. equities. Not only have we witnessed a significant reduction in the factor’s return, but the majority of the return has been generated by the short side of the strategy, which can be more difficult for long-only investors to access.

Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. The Long (Alpha) strategy is a monthly rebalanced portfolio that goes long, in equal weight, the top 50 securities in the S&P 500 ranked on 12-1 month momentum and shorts an equal-weight S&P 500 portfolio. The Short (Alpha) strategy is a monthly rebalanced portfolio that goes long an equal-weight S&P 500 portfolio and shorts, in equal weight, the bottom 50 securities in the S&P 500 ranked on 12-1 month momentum.

The note makes the narratively-appealing argument that the back-to-back recessions of the dot-com bubble and the Great Financial Crisis amplified investor risk aversion to downside losses. The proposed evidence of this fact is the correlation of the cumulative alpha generated from shorting low momentum stocks and the cumulative alpha generated from shorting high volatility stocks.

While correlation does not imply causation, one argument might be that in a heightened period of risk aversion, investors may consistently punish higher risk stocks, causing them to become persistent losers. Or, conversely, losers may be rapidly sold, creating both persistence and high levels of volatility. We can arguably see this in the convergence of holdings in low momentum and high volatility stocks during “risk off” regimes.

Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. The HI VOL (Alpha) strategy is a monthly rebalanced portfolio that goes long an equal-weight S&P 500 portfolio and shorts, in equal weight, the bottom 50 securities in the S&P 500 ranked on trailing 252-day realized volatility. The LO MOM (Alpha) strategy is a monthly rebalanced portfolio that goes long an equal-weight S&P 500 portfolio and shorts, in equal weight, the bottom 50 securities in the S&P 500 ranked on 12-1 month momentum.

Given these facts, we would expect long-only momentum investors to have harvested little out-performance in recent years. Yet we find that the popular iShares Momentum ETF (MTUM) has out-performed the S&P 500 by 290 basis points per year since its inception in 2013.

The answer to this conundrum, as proposed by the research note, is that MTUM’s use of risk-adjusted momentum is the key.

If we think of risk-adjusted momentum as simply momentum divided by volatility (which is how MTUM defines it), we might interpret it as an integrated signal of both the momentum and low-volatility factors. Therefore, risk-adjusting creates a multi-factor portfolio that tilts away from high volatility stocks.

And hence the out-performance.

Except if we actually create a risk-adjusted momentum portfolio, that does not appear to really be the case at all.

Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. The alpha of the risk-adjusted momentum strategy is defined as the return of a monthly rebalanced portfolio that goes long, in equal weight, the top 50 securities in the S&P 500 ranked on risk-adjusted momentum (12-1 month momentum divided by 252-day realized volatility) and shorts an equal-weight S&P 500 portfolio.

To be fair, MTUM’s construction methodology differs quite a bit from that employed herein. We are simply equally-weighting the top 50 stocks in the S&P 500 when ranked by risk-adjusted momentum, whereas MTUM uses a blend of 6- and 12-month risk-adjusted momentum scores and then tilts market-capitalization weights based upon those scores.

Nevertheless, if we look at actual holdings overlap over time of our Risk-Adjusted Momentum portfolio versus Momentum and Low Volatility portfolios, not only do we see persistently higher overlap with the Momentum portfolio, but we see fairly low average overlap with the Low Volatility portfolio.

For the latter point, it is worth first anchoring ourselves to the standard overlap between Momentum and Low Volatility (green line below). While we can see that the Risk-Adjusted Momentum portfolio does indeed have a higher average overlap with Low Volatility than does the Momentum portfolio, the excess tilt to Low Volatility due to the use of risk-adjusted momentum (i.e. the orange line minus the green line) appears rather small. In fact, on average, it is just 10%.

Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. The risk-adjusted momentum strategy is a monthly rebalanced portfolio that goes long, in equal weight, the top 50 securities in the S&P 500 ranked on risk-adjusted momentum (12-1 month momentum divided by 252-day realized volatility). The momentum strategy is a monthly rebalanced portfolio that goes long, in equal weight, the top 50 securities in the S&P 500 ranked on 12-1 month momentum. The low volatility strategy is a monthly rebalanced portfolio that goes long, in equal weight, the top 50 securities in the S&P 500 ranked on trailing 252-day realized volatility.
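The overlap statistic above can be computed as the sum of minimum weights across names, a standard measure we assume here since the note does not spell out its exact definition (hypothetical tickers and equal weights for illustration):

```python
def holdings_overlap(weights_a, weights_b):
    """Portfolio overlap as the sum of minimum weights across all
    names: 1.0 for identical portfolios, 0.0 for disjoint ones.
    Works for both equal-weight and rank-weight portfolios."""
    names = set(weights_a) | set(weights_b)
    return sum(min(weights_a.get(n, 0.0), weights_b.get(n, 0.0)) for n in names)

# Hypothetical equal-weight top-4 lists for illustration:
risk_adj_mom = {t: 0.25 for t in ["AAA", "BBB", "CCC", "DDD"]}
momentum     = {t: 0.25 for t in ["AAA", "BBB", "CCC", "EEE"]}
low_vol      = {t: 0.25 for t in ["CCC", "FFF", "GGG", "HHH"]}

print(holdings_overlap(risk_adj_mom, momentum))  # 0.75
print(holdings_overlap(risk_adj_mom, low_vol))   # 0.25
```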

This is further evident by looking at the actual returns of the strategies themselves:

Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. The risk-adjusted momentum strategy is a monthly rebalanced portfolio that goes long, in equal weight, the top 50 securities in the S&P 500 ranked on risk-adjusted momentum (12-1 month momentum divided by 252-day realized volatility). The momentum strategy is a monthly rebalanced portfolio that goes long, in equal weight, the top 50 securities in the S&P 500 ranked on 12-1 month momentum. The low volatility strategy is a monthly rebalanced portfolio that goes long, in equal weight, the top 50 securities in the S&P 500 ranked on trailing 252-day realized volatility.

The Risk-Adjusted Momentum portfolio performance tracks that of the Momentum portfolio very closely.

As it turns out, the step of adjusting for risk creates far less of a low volatility factor tilt in our top-decile portfolio than one might initially suspect. (Or, at least, I’ll speak for myself: it created far less of a tilt than I expected.)

To understand this point, we will first re-write our risk-adjusted momentum signal as:

RISKADJMOM = MOM / VOL = MOM x INVVOL

where MOM is prior 12-1 month return, VOL is realized volatility, and INVVOL = 1 / VOL.

Though the algebra is trivial, re-writing risk-adjusted momentum as the product of momentum and inverse volatility is informative for understanding why risk-adjusted momentum appears to load much more heavily on momentum than on low volatility.

At a given point in time, it would appear as if Momentum and Low Volatility should have an equal influence on the rank of a given security. However, we need to dig a level deeper and consider how changes in these variables impact change in risk-adjusted momentum.

Fortunately, the product makes this a trivial exercise: holding INVVOL constant, changes in MOM are scaled by INVVOL and vice versa. This scaling effect can cause large changes in risk-adjusted momentum – and therefore ordinal ranking – particularly as MOM crosses the zero level.

Consider a trivial example where INVVOL is a very large number (e.g. 20) due to a security having a very low volatility profile (e.g. 5%). This would appear, at first glance, to give a security a structural advantage and hence create a low volatility tilt in the portfolio. However, a move from positive prior returns to negative prior returns would shift the security from ranking among the best to ranking among the worst in risk-adjusted momentum.^{1}
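This sign-crossing effect is easy to see numerically (hypothetical figures, not actual holdings data):

```python
def risk_adjusted_momentum(mom, vol):
    """Risk-adjusted momentum: 12-1 month return divided by
    realized volatility (equivalently, MOM x INVVOL)."""
    return mom / vol

# A low-volatility stock (5% vol, INVVOL = 20) with slightly positive
# prior returns scores very highly...
high_rank = risk_adjusted_momentum(0.02, 0.05)

# ...but the same small return gone negative flips the large INVVOL
# multiplier against it, sending the score toward the bottom:
low_rank = risk_adjusted_momentum(-0.02, 0.05)

# A higher-volatility stock (25% vol) is far less sensitive to the
# same swing in prior returns:
print(high_rank, low_rank)
print(risk_adjusted_momentum(0.02, 0.25), risk_adjusted_momentum(-0.02, 0.25))
```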

A first order estimate of the change in risk-adjusted momentum is:

ΔRISKADJMOM ≈ INVVOL x ΔMOM + MOM x ΔINVVOL

So which term ultimately has more influence on the change in scores over time?

To get a sense of relative scale, we plot the cross-sectional mean absolute difference between the two terms over time. This should, at least partially, capture interaction effects between the two terms.

Source: Sharadar. Calculations by Newfound Research.
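As a sketch of that comparison, randomly generated cross-sections stand in for real data. The scales below are illustrative assumptions on our part, chosen only so that momentum levels change faster than inverse-volatility levels, as described above; they are not estimates from the data:

```python
import random

random.seed(0)
n = 500  # hypothetical cross-section of stocks

# Illustrative levels and one-month changes (assumed scales):
mom      = [random.gauss(0.10, 0.25) for _ in range(n)]   # 12-1 momentum
invvol   = [1.0 / random.uniform(0.10, 0.50) for _ in range(n)]
d_mom    = [random.gauss(0.0, 0.08) for _ in range(n)]    # momentum moves fast
d_invvol = [random.gauss(0.0, 0.30) for _ in range(n)]    # 1/vol moves slowly

# The two first-order terms: INVVOL x dMOM versus MOM x dINVVOL.
term_mom = [abs(iv * dm) for iv, dm in zip(invvol, d_mom)]
term_vol = [abs(m * dv) for m, dv in zip(mom, d_invvol)]

print(sum(term_mom) / n, sum(term_vol) / n)  # momentum term dominates
```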

We can see that the term including the change in MOM has a much more significant influence on changes in risk-adjusted momentum than changes in INVVOL do. Thus, we might expect a portfolio driven entirely by changes in momentum to share more in common with our risk-adjusted momentum portfolio than one driven entirely by changes in volatility.

This is somewhat evident when we plot the return of MTUM versus our top 50 style portfolios. The correlation of daily returns between MTUM and our Momentum, Low Volatility, and Risk-Adjusted Momentum portfolios is 0.93, 0.72, and 0.93 respectively, further suggesting that MTUM is driven more by momentum than volatility.

Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. The risk-adjusted momentum strategy is a monthly rebalanced portfolio that goes long, in equal weight, the top 50 securities in the S&P 500 ranked on risk-adjusted momentum (12-1 month momentum divided by 252-day realized volatility). The momentum strategy is a monthly rebalanced portfolio that goes long, in equal weight, the top 50 securities in the S&P 500 ranked on 12-1 month momentum. The low volatility strategy is a monthly rebalanced portfolio that goes long, in equal weight, the top 50 securities in the S&P 500 ranked on trailing 252-day realized volatility.

This is only one part of the equation, however, as it is possible that changes to the risk-adjusted momentum score are so small – despite being largely driven by momentum – that relative rankings never actually change. Or, because we have constructed our portfolios by choosing only the top 50 ranked securities, that momentum does drive the majority of change across the entire universe, but the top 50 are always structurally advantaged by the non-linear scaling of low volatility.

To create a more accurate picture, we can rank-weight the entire S&P 500 and evaluate the holdings overlap over time.

Source: Sharadar. Calculations by Newfound Research.

Note that by now including all securities, and not just selecting the top 50, the overlap with both the Momentum and Low Volatility portfolios naturally appears higher on average. Nonetheless, we can see that the overlap with the Momentum portfolio is consistently higher than that of the Low Volatility portfolio, again suggesting that momentum has a larger influence on the overall portfolio composition than volatility does.

Conclusion

Without much deep thought, it would be easy to assume that a risk-adjusted momentum measure – i.e. prior returns divided by realized volatility – would tilt a portfolio towards both prior winners and low-volatility securities, resulting in a momentum / low-volatility barbell.

Upon deeper consideration, however, the picture complicates quickly. For example, momentum can be both positive and negative; dividing by volatility creates a non-linear impact; and momentum tends to change more rapidly than volatility.

We do not attempt to derive a precise, analytical equation that determines which of the two variables ultimately drives portfolio composition, but we do construct long-only example portfolios for empirical study. We find that a high-concentration risk-adjusted momentum portfolio has significantly more overlap in holdings with a traditional momentum portfolio than a low-volatility portfolio, resulting in a more highly correlated return stream.

The most important takeaway from this note is that intuition can be deceiving: it is important to empirically test our assumptions to ensure we truly understand the impact of our strategy construction choices.

In this commentary we explore the application of several quantitative signals to a broad set of fixed income exposures.

Specifically, we explore value, momentum, carry, long-term reversals, and volatility signals.

We find that value, 3-month momentum, carry, and 3-year reversals all create attractive quantile profiles, potentially providing clues for how investors might consider pursuing higher returns or lower risk.

This study is by no means comprehensive and only intended to invite further research and conversation around the application of quantitative styles across fixed income exposures.

In Navigating Municipal Bonds with Factors, we employed momentum, value, carry, and low-volatility signals to generate a sector-based approach to navigating municipal bonds.

In this article, we will introduce an initial data dive into applying quantitative signals to a broader set of fixed income exposures. Specifically, we will incorporate 17 different fixed income sectors, spanning duration, credit, and geographic exposure.

U.S. Treasuries: Near (3-Month), short (1-3 Year), mid (3-5 Year), intermediate (7-10 Year), and long (20+ Year).

Investment-Grade Corporates: Short-term, intermediate-term, and Floating Rate corporate bonds.

High Yield: Short- and intermediate-term high yield.

International Government Bonds: Currency hedged and un-hedged government bonds.

In this study, each exposure is represented by a corresponding ETF. To extend our research to the period prior to each ETF's launch, we employ data for the underlying index the ETF seeks to track.

The quantitative styles we will explore are:

Momentum: Buy recent winners and sell recent losers.

Value: Buy cheap and sell expensive.

Carry: Buy high carry and sell low carry.

Reversal: Buy long-term losers and sell long-term winners.

Volatility: Buy high volatility and sell low volatility.^{1}

The details of each style are explained in greater depth in each section below.

Note that the analysis herein is by no means meant to be prescriptive in any manner, nor is it a comprehensive review. Rather, it is meant as a launching point for further commentaries we expect to write.

At the risk of spoiling the conclusion, below we plot the annualized returns and volatility profiles of dollar-neutral long-short portfolios.^{2} We can see that short-term Momentum, Value, Carry, and Volatility signals generate positive excess returns over the testing period.

Curiously, longer-term Momentum does not seem to be a profitable strategy, despite evidence of this approach being rather successful for many other asset classes.

Source: Bloomberg; Tiingo. Calculations by Newfound Research. Returns are hypothetical and backtested. Returns are gross of all management fees, transaction fees, and taxes, but net of underlying fund fees. Total return series assumes the reinvestment of all distributions.

However, these results are not achievable by most investors, who may be constrained to a long-only implementation. Even when interpreted as over- and under-weight signals, the allocations in the underlying long/short portfolios differ so greatly from benchmark exposures that they would be nearly impossible to implement.

For a long-only investor, then, what is more relevant is how these signals forecast performance of different rank orderings of portfolios. For example, how does a portfolio of the best-ranking 3-month momentum exposures compare to a portfolio of the worst-ranking?

In the remainder of this commentary, we explore the return and risk profiles of quintile portfolios formed on each signal. To construct these portfolios, we rank order our exposures based on the given quantitative signal and equally-weight the exposures falling within each quintile.
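The quintile construction described above can be sketched as follows (ticker names and signal values are made up for illustration):

```python
def quintile_portfolios(signals):
    """Rank-order exposures on a signal and equally weight within
    each quintile.  Returns a list of five {ticker: weight} dicts,
    from lowest-ranked (Q1) to highest-ranked (Q5)."""
    ranked = sorted(signals, key=signals.get)  # ascending signal
    n = len(ranked)
    quintiles = []
    for q in range(5):
        members = ranked[q * n // 5 : (q + 1) * n // 5]
        quintiles.append({t: 1.0 / len(members) for t in members})
    return quintiles

# 15 hypothetical exposures with made-up 3-month momentum signals:
signals = {f"ETF{i:02d}": s for i, s in enumerate(
    [0.021, -0.004, 0.013, 0.002, -0.011, 0.030, 0.006, -0.002,
     0.017, 0.009, -0.007, 0.024, 0.001, 0.011, -0.015])}
qs = quintile_portfolios(signals)
print([sorted(q) for q in qs])
```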

Momentum

We generate momentum signals by computing 12-, 6-, and 3-month prior total returns to reflect slow, intermediate, and fast momentum signals. Low-ranking exposures are those with the lowest prior total returns, while high-ranking exposures have the highest prior total returns.

The portfolios assume a 1-month holding period for momentum signals. To avoid timing luck, four sub-indexes are used, each rebalancing on a different week of the month.
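The sub-index ("tranching") approach can be sketched as follows; averaging sub-index returns is a simple equal-capital approximation of the overlapping-portfolio technique (the weekly returns below are hypothetical):

```python
def tranched_returns(sub_index_returns):
    """Average the period returns of sub-indexes that follow the
    same strategy but rebalance on different weeks, neutralizing
    rebalance timing luck.

    `sub_index_returns` is a list of equal-length return series,
    one per tranche (four weekly tranches for a monthly holding
    period, 52 for an annual holding period)."""
    n_tranches = len(sub_index_returns)
    n_periods = len(sub_index_returns[0])
    return [
        sum(tranche[t] for tranche in sub_index_returns) / n_tranches
        for t in range(n_periods)
    ]

# Hypothetical weekly returns for four tranches of the same strategy:
tranches = [
    [0.004, -0.002, 0.010],
    [0.006, -0.001, 0.008],
    [0.002, -0.004, 0.012],
    [0.008, -0.005, 0.006],
]
print([round(r, 4) for r in tranched_returns(tranches)])
```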

Annualized return and volatility numbers for the quintiles are plotted below.

A few interesting data-points stand out:

For 12-month prior return, the lowest quintile actually had the highest total return. However, it has a dramatically lower Sharpe ratio than the highest quintile, which only slightly underperforms it.

Total returns among the highest quintile increase by 150 basis points (“bps”) from 12-month to 3-month signals, and 3-month rankings create a more consistent profile of increasing total return and Sharpe ratio. This may imply that short-term signals are more effective for fixed income.

Source: Bloomberg; Tiingo. Calculations by Newfound Research. Returns are hypothetical and backtested. Returns are gross of all management fees, transaction fees, and taxes, but net of underlying fund fees. Total return series assumes the reinvestment of all distributions.

Carry

Carry is the expected excess return of an asset assuming price does not change. For our fixed income universe, we proxy carry using yield-to-worst minus the risk-free rate. For non-Treasury holdings, we adjust this figure for expected defaults and recovery.

For reasonably efficient markets, we would expect higher carry to imply higher return, but not necessarily higher risk-adjusted returns. In other words, we earn higher carry as a reward for bearing more risk.

Therefore, we also calculate an alternate measure of carry: carry-to-risk. Carry-to-risk is calculated by taking our carry measure and dividing it by recent realized volatility levels. One way of interpreting this figure is as a forecast of Sharpe ratio. Our expectation is that this signal may be able to identify periods when carry is episodically cheap or rich relative to prevailing market risk.
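A minimal sketch of these two carry measures, with hypothetical inputs. The default-and-recovery adjustment below is a simplified expected-loss haircut, an assumption on our part, since the exact adjustment is not specified above:

```python
def carry(ytw, risk_free, default_prob=0.0, recovery=0.0):
    """Carry proxy: yield-to-worst minus the risk-free rate, less a
    simplified expected-loss adjustment for non-Treasury exposures
    (expected loss = default probability x (1 - recovery))."""
    expected_loss = default_prob * (1.0 - recovery)
    return ytw - risk_free - expected_loss

def carry_to_risk(carry_value, realized_vol):
    """Carry divided by recent realized volatility: a rough
    forecast of Sharpe ratio."""
    return carry_value / realized_vol

# Hypothetical high-yield exposure: 6.5% YTW, 2% risk-free rate,
# 3% annual default rate with 40% recovery, 8% realized vol.
hy_carry = carry(0.065, 0.02, default_prob=0.03, recovery=0.40)
print(round(hy_carry, 4))  # 0.027
print(carry_to_risk(hy_carry, 0.08))
```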

The portfolios assume a 12-month holding period for carry signals. To avoid timing luck, 52 sub-indexes are used, each rebalancing on a different week of the year.

We see:

Higher carry implies a higher return as well as a higher volatility. As expected, no free lunch here.

Carry-to-risk does not seem to provide a meaningful signal. In fact, low carry-to-risk outperforms high carry-to-risk by 100bps annualized.

Volatility meaningfully declines for carry-to-risk quintiles, potentially indicating that this integrated carry/volatility signal is being too heavily driven by volatility.

Source: Bloomberg; Tiingo. Calculations by Newfound Research. Returns are hypothetical and backtested. Returns are gross of all management fees, transaction fees, and taxes, but net of underlying fund fees. Total return series assumes the reinvestment of all distributions.

Value

In past commentaries, we have used real yield as our value proxy in fixed income. In this commentary, we deviate from that methodology slightly and use a time-series z-score of carry as our measure of value. Historically high carry levels are considered cheap, while historically low carry levels are considered expensive.

The portfolios assume a 12-month holding period for value signals. To avoid timing luck, 52 sub-indexes are used, each rebalancing on a different week of the year.

We see not only a significant increase in total return in buying cheap versus expensive holdings, but also an increase in risk-adjusted returns.

Source: Bloomberg; Tiingo. Calculations by Newfound Research. Returns are hypothetical and backtested. Returns are gross of all management fees, transaction fees, and taxes, but net of underlying fund fees. Total return series assumes the reinvestment of all distributions.

Reversal

Reversal signals are the opposite of momentum: we expect past losers to outperform and past winners to underperform. Empirically, reversals tend to occur over very short time horizons (e.g. 1 month) and longer-term time horizons (e.g. 3- to 5-years). In many ways, long-term reversals can be thought of as a naive proxy for value, though there may be other behavioral and structural reasons for the historical efficacy of reversal signals.

We must be careful implementing reversal signals, however, as exposures in our universe have varying return dynamics (e.g. expected return and volatility levels).

To illustrate this problem, consider the simple two-asset example of equities and cash. A 3-year reversal signal would sell the asset that has had the best performance over the prior 3-years and buy the asset that has performed the worst. The problem is that we expect stocks to outperform cash due to the equity risk premium. Naively ranking on prior returns alone would have us out of equities during most bull markets.

Therefore, we must be careful in ranking assets with meaningfully different return dynamics.

(Why, then, can we do it for momentum? In a sense, momentum is explicitly trying to exploit relative time-series properties over a short-term horizon. Furthermore, in a universe that contains low-risk, low-return assets, cross-sectional momentum can be thought of as an integrated process that blends in time-series momentum, as the low-risk asset will bubble to the top when absolute returns are negative.)

To account for this, we use a time-series z-score of prior returns to create a reversal signal. For example, at each point in time we calculate the current 3-year return and z-score it against all prior rolling 3-year periods.

Note that in this construction, high z-scores will reflect higher-than-normal 3-year numbers and low z-scores will reflect lower-than-normal 3-year returns. Therefore, we negate the z-score to generate our signal such that low-ranked exposures reflect those we want to sell and high-ranked exposures reflect those we want to buy.
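The z-scored reversal signal can be sketched as follows (the rolling returns below are hypothetical, for illustration only):

```python
import statistics

def reversal_signal(rolling_returns):
    """Negated time-series z-score of the current rolling N-year
    return against all prior rolling N-year returns.  High values
    flag lower-than-normal trailing returns (candidates to buy);
    low values flag higher-than-normal returns (candidates to sell)."""
    *history, current = rolling_returns
    mu = statistics.mean(history)
    sigma = statistics.stdev(history)
    return -(current - mu) / sigma

# Hypothetical rolling 3-year returns for one exposure; the most
# recent observation is well below its own history, so the signal
# is strongly positive (buy):
rolling = [0.21, 0.18, 0.25, 0.19, 0.22, 0.20, 0.05]
print(round(reversal_signal(rolling), 2))
```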

The portfolios assume a 12-month holding period for reversal signals. To avoid timing luck, 52 sub-indexes are used, each rebalancing on a different week of the year.

Plotting the results below for 1-, 3-, and 5-year reversal signals, we see that the 3- and 5-year signals exhibit a meaningful increase in both total return and Sharpe ratio between the lowest and highest quintiles.

Source: Bloomberg; Tiingo. Calculations by Newfound Research. Returns are hypothetical and backtested. Returns are gross of all management fees, transaction fees, and taxes, but net of underlying fund fees. Total return series assumes the reinvestment of all distributions.

Volatility

Volatility signals are trivial to generate: we simply sort assets based on prior realized volatility. Unfortunately, exploiting the low-volatility anomaly is difficult without leverage, as the empirically higher risk-adjusted return exhibited by low-volatility assets typically coincides with lower total returns.

For example, in the tests below the low quintile is mostly comprised of short-term Treasuries and floating rate corporates. The top quintile is allocated across local currency emerging market debt, long-dated Treasuries, high yield bonds, and unhedged international government bonds.

As a side note, for the same reason we z-scored reversal signals, we also hypothesized that z-scoring may work on volatility. Beyond these two sentences, the results were nothing worth writing about.

Nevertheless, we can still attempt to confirm the existence of the low-volatility anomaly in our investable universe by ranking assets on their past volatility.

The portfolios assume a 1-month holding period for volatility signals. To avoid timing luck, four sub-indexes are used, each rebalancing on a different week of the month.

Indeed, in plotting results we see that the lowest volatility quintiles have significantly higher realized Sharpe ratios.

Source: Bloomberg; Tiingo. Calculations by Newfound Research. Returns are hypothetical and backtested. Returns are gross of all management fees, transaction fees, and taxes, but net of underlying fund fees. Total return series assumes the reinvestment of all distributions.

Of the results plotted above, our eyes might be drawn to the results for the short-term volatility measure. It would appear that the top quintile has both a lower total return and much higher volatility than the 3rd and 4th quintiles. This might suggest that we could improve our portfolio's risk-adjusted returns without sacrificing total return by avoiding those top-ranked assets.

Unfortunately, this is not so clear cut. Unlike the other signals, where the portfolios had meaningful turnover, these quintiles are largely stable. This means that the results are driven more by the composition of the portfolios than by the underlying signals. For example, the 3rd and 4th quintiles combine both Treasuries and credit exposure, which allows the portfolio to realize lower volatility due to the low correlation between those exposures. The highest volatility quintile, on the other hand, holds both local currency emerging market debt and un-hedged international government bonds, introducing (potentially uncompensated) currency risk into the portfolio.

Thus, the takeaway may be more strategic than tactical: diversification is good and currency exposure is going to increase your volatility.

Oh – and allocating to zero-to-negatively yielding foreign bonds isn’t going to do much for your return unless currency changes bail you out.

Conclusion

In this study, we explored the application of value, momentum, carry, reversal, and volatility signals across fixed income exposures. We found that value, 3-month momentum, carry, and 3-year reversal signals may all provide meaningful information about forward expected returns and risk.

Our confidence in this analysis, however, is potentially crippled by several points:

The time horizon covered is, at best, two decades, and several economic variables are constant throughout it.

The inflation regime over the time period was largely uniform.

A significant proportion of the period covered had near-zero short-term Treasury yields and negative yields in foreign government debt.

Reversal signals require a significant amount of formation data. For example, the 3-year reversal signal requires 6 years (i.e. 3 years of rolling 3-year returns) of data before a signal can be generated. This represents nearly one-third of the data set.

The dispersion in return dynamics (e.g. volatility and correlation) of the underlying assets can lead to the emergence of unintended artifacts in the data that may speak more to portfolio composition than the value-add from the quantitative signal.

We did not test whether certain exposures or certain time periods had an outsized impact upon results.

We did not thoroughly test stability regions for different signals.

We did not test the impact of our holding period assumptions.

Holdings within quantile portfolios were assumed to be equally weighted.

Some of these points can be addressed simply. Stability concerns, for example, can be addressed by testing the impact of varying signal parameterization.

Others are a bit trickier and require more creative thinking or more computational horsepower.

Testing for the outsized impact of a given exposure or a given time period, for example, can be done through sub-sampling and cross-validation techniques. We can think of this as the application of randomness to efficiently cover our search space.

For example, below we re-create our 3-month momentum quintiles, but do so by randomly selecting only 10 of the exposures and 75% of the return period to test. We repeat this resampling 10,000 times for each quintile and plot the distribution of annualized returns below.

Even without performing an official difference-in-means test, the separation between the low and high quintile annualized return distributions provides a clue that the performance difference between these two is more likely to be a pervasive effect rather than due to an outlier holding or outlier time period.

We can make this test more explicit by using this subset resampling technique to bootstrap a distribution of annualized returns for a top-minus-bottom quintile long/short portfolio. Specifically, we randomly select a subset of assets and generate our 3-month momentum signals. We construct a dollar-neutral long/short portfolio by going long assets falling in the top quintile and short assets falling in the bottom quintile. We then select a random sub-period and calculate the annualized return.

Only 207 of the 10,000 samples fall below 0%, indicating a high statistical likelihood that the outperformance of recent winners over recent losers is not an effect dominated by a specific subset of assets or time-periods.
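The subset resampling procedure above can be sketched in a few lines of Python. The function below is our own illustrative implementation, not the exact code used here: it repeatedly draws a random subset of assets and a random contiguous sub-period, builds a top-minus-bottom quintile long/short portfolio on a pre-computed signal, and records the annualized return.

```python
import numpy as np

def bootstrap_long_short(returns, signal, n_samples=10_000, n_assets=10,
                         period_frac=0.75, seed=0):
    """Subset-resampling bootstrap of a top-minus-bottom quintile
    long/short portfolio's annualized return.

    returns: (T, N) array of daily asset returns
    signal:  (N,) array of signal values (e.g. 3-month momentum)
    """
    rng = np.random.default_rng(seed)
    T, N = returns.shape
    ann = np.empty(n_samples)
    for i in range(n_samples):
        # Randomly select a subset of the assets...
        assets = rng.choice(N, size=n_assets, replace=False)
        # ...and a random contiguous sub-period of the history.
        length = int(period_frac * T)
        start = rng.integers(0, T - length + 1)
        sub = returns[start:start + length][:, assets]
        # Long the top signal quintile, short the bottom, equally weighted.
        k = max(1, n_assets // 5)
        order = np.argsort(signal[assets])
        ls = sub[:, order[-k:]].mean(axis=1) - sub[:, order[:k]].mean(axis=1)
        # Annualize the compounded long/short return (252 trading days).
        ann[i] = (1 + ls).prod() ** (252 / length) - 1
    return ann
```

The fraction of the resulting distribution falling below 0% then serves as a rough bootstrap p-value for the long/short effect.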

While this commentary provides a first step towards analyzing quantitative style signals across fixed income exposures, more tests need to be run to develop greater confidence in their efficacy.

Source: Bloomberg; Tiingo. Calculations by Newfound Research. Returns are hypothetical and backtested. Returns are gross of all management fees, transaction fees, and taxes, but net of underlying fund fees. Total return series assumes the reinvestment of all distributions.

## Defensive Equity with Machine Learning

By Corey Hoffstein

On May 25, 2020

In Defensive, Risk & Style Premia, Weekly Commentary

## Summary

There is no shortage of hypotheses as to what characteristics define a stock that will outperform in a bear market. Some argue that value stocks should perform well, given their relative valuation buffer (the “less far to fall” argument). Some argue for a focus on balance sheet strength while others argue that cash-flow is the ultimate life blood of a company and should be prioritized. There are even arguments for industry preferences based upon economic cyclicality.

Each recession and crisis is unique, however, and therefore the characteristics of stocks that fare best will likely change. For example, the dot-com run-up caused a large number of real-economy businesses to be sorted into the “cheap” bucket of the value factor. These companies also tended to have higher quality earnings and lower beta / volatility than the dot-com stocks.

Common sense would indicate that unconstrained value may be a natural counter-hedge towards large, speculative bubbles, but we need only look towards 2008 – a credit and liquidity event – to see that value is not a panacea for every type of crisis. It is for this reason that some investors prefer to take their cues from market-informed metrics such as beta, volatility, momentum, or trading volume.

Regardless of approach, there are some philosophical limitations we should consider when it comes to expectations with defensive equity portfolios. First, if we were able to identify an approach that could avoid market losses, then we would expect that strategy to also have negative alpha; if this were not the case, we could construct an arbitrage. Therefore, in designing a defensive equity portfolio, our aim should be to provide ample downside protection against market losses while minimizing the relative upside participation cost of doing so.

Traditional linear sorts – such as buying the lowest volatility stocks – are coarse by design. They aim to robustly capture a general truth and hedge missed subtleties through diversification. For example, while some stocks deserve to be cheap and some stocks are expensive for good reason, naïve value sorts will do little to distinguish them from those that are unjustifiably cheap or rich.

For a defensive equity portfolio, however, this coarseness may not only reduce effectiveness, but it may also increase the implicit cost. Therefore, in this note we implement non-linear techniques in an effort to more precisely identify combinations of characteristics that may create a more effective defensive equity strategy.

## The Strategy Objective

To start, we must begin by defining precisely what we mean by a “defensive equity strategy.” What are the characteristics that would make us label one security as defensive and another as not? Or, potentially better, is there a characteristic that allows us to rank securities on a gradient of defensiveness?

This is not a trivial decision, as our entire exercise will attempt to maximize the probability of correctly identifying securities with this characteristic.

As our goal is to find those securities which provide the most protection during equity market routs but bleed the least during equity market rallies, we chose a metric that scored how closely a stock’s return reflected the payoff of a call option on the S&P 500 over the next 63 trading days (approximately 3 months).

In other words, if the S&P 500 is positive over the next 63 trading days, the score of a security is equal to the squared difference between its return and the S&P 500’s return. If the market’s return is negative, the score of a security is simply its squared return.
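This scoring rule translates directly into code. The sketch below is a minimal Python rendering of the definition above (the function name is ours; lower scores indicate more defensive securities):

```python
def defensiveness_score(stock_ret, market_ret):
    """Score how closely a stock's forward 63-day return matches the
    payoff of a call option on the market (lower = more defensive).

    If the market is up, penalize the squared deviation from the
    market's return; if the market is down, penalize the stock's
    squared return (i.e. reward returns near zero).
    """
    if market_ret > 0:
        return (stock_ret - market_ret) ** 2
    return stock_ret ** 2
```

A stock that matches the market in rallies and is flat in sell-offs scores zero, replicating the call-option payoff exactly.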

To determine whether this metric reflects the type of profile we want, we can create a long/short portfolio. Each month we rank securities by their scores and select the quintile with the lowest scores. Securities are then weighted by their market capitalization. Securities are held for three months and the portfolio is implemented with three overlapping tranches. The short leg of the portfolio is the market rather than the highest quintile, as we are explicitly trying to identify defense against the market. To create a scalable solution, we restrict our investable universe to the top 1,000 securities by market capitalization.

We plot the performance below.

Source: Sharadar Fundamentals. Calculations by Newfound Research. Returns are hypothetical and backtested. Returns are gross of all fees including, but not limited to, management fees, transaction fees, and taxes. Returns assume the reinvestment of all distributions.

We can see that the strategy is relatively flat during bull markets (1998-2000, 2003-2007, 2011-2015, 2016-2018), but rallies during bear markets and sudden market shocks (2000-2003, 2008, 2011, 2015/2016, Q4 2018, and 2020).

Interestingly, despite having no sector constraints and not explicitly targeting tracking error at the portfolio level, the resulting portfolio ends up well diversified across sectors, though it does appear to make significant short-term jumps in sector weights. We can also see an increasing tilt towards Technology over the last 3 years in the portfolio. In recent months, positions in Financials and Industrials have been almost outright eliminated.

Source: Sharadar Fundamentals. Calculations by Newfound Research.

Of course, this metric is explicitly forward looking. We’re using a crystal ball to peer into the future and identify those stocks that track the best on the way up and protect the best on the way down. Our goal, then, is to use a variety of company and security characteristics to accurately forecast this score.

We will include a variety of characteristics and features, including:

Size: Market Capitalization.

Valuation: Book-to-Price, Earnings-to-Price, Free Cash Flow-to-Price, Revenue-to-EV, and EBITDA-to-EV.

Momentum: 12-1 Month Return and 1-Month Return.

Risk: Beta, Volatility, Idiosyncratic Volatility, and Ulcer Index.

Quality: Accruals, ROA, ROE, CFOA, GPOA, Net Margin, Asset Turnover, Leverage, and Payout Ratio.

Growth: Internal Growth Rate, EPS Growth, and Revenue Growth.

These 24 features are all cross-sectionally ranked at each point in time. We also include dummy variables for each security to represent sector inclusion as well as whether the company has positive Net Income and whether the company has positive Operating Cash Flow.
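As a minimal sketch of the cross-sectional ranking and dummy-variable step, assuming a pandas feature table at a single point in time (the tickers and values below are made up purely for illustration):

```python
import pandas as pd

# Hypothetical raw feature table: rows are securities, columns are features.
features = pd.DataFrame({
    "market_cap": [500.0, 120.0, 80.0, 310.0],
    "volatility": [0.18, 0.35, 0.22, 0.27],
    "book_to_price": [0.4, 0.9, 0.6, 0.2],
}, index=["AAA", "BBB", "CCC", "DDD"])

# Cross-sectional percentile ranks (0 to 1) across securities.
ranked = features.rank(pct=True)

# Dummy variable example: flag companies with positive net income.
net_income = pd.Series([2.0, -0.5, 1.0, 3.0], index=features.index)
ranked["positive_net_income"] = (net_income > 0).astype(int)
```

In practice this ranking would be repeated at each rebalance date so that features are always comparable across the cross-section rather than across time.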

Note that we are not including any market regime characteristics, such as information about market returns, volatility, interest rates, credit spreads, sentiment, or monetary or fiscal policy. Had we included such features, our resulting model may have ended up as a factor switching approach, changing which characteristics it selects based upon the market environment. This may be an interesting model in its own right, but our goal herein is simply to design a static, non-linear factor sort.

## Random Forests

Our first approach will be to apply a random forest algorithm, which is an ensemble learning method. The approach uses a training data set to build a number of individual decision trees whose results are then re-combined to create the ultimate decision. By training each tree on a subset of data and considering only a subset of features for each node, we can create trees that may individually have high variance, but as an aggregate forest reduce variance without necessarily increasing bias.

As an example, this means that one tree may be built using a mixture of low volatility and quality features, while another may be built using valuation and momentum features. Each tree is able to model a non-linear relationship, but by restricting tree depth and building trees using random subsets of data and features, we can prevent overfitting.

There are a number of hyperparameters that can be set to govern the model fit. For example, we can set the maximum depth of the individual trees as well as the number of trees we want to fit. Fitting hyperparameters is an art unto itself, and rather than go down the rabbit hole of tuning them via cross-validation, we did our best to select reasonable values. We elected to train the model on 50% of our data (March 1998 to March 2009), with a total of 100 trees, each with a maximum depth of 2.
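A sketch of this setup using scikit-learn's `RandomForestRegressor` is below. The synthetic features and target are purely illustrative (our actual inputs are the ranked features and forward defensiveness scores described above), though the 100-tree, depth-2 hyperparameters mirror those chosen in the text:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Illustrative stand-in for the training data: 500 observations of the
# 24 cross-sectionally ranked features (X) and a forward score (y).
X = rng.random((500, 24))
# Synthetic target driven non-linearly by the first feature
# (think of it as a "volatility" rank), plus noise.
y = X[:, 0] ** 2 + 0.1 * rng.standard_normal(500)

# A forest of 100 shallow trees (max_depth=2): each tree is high-bias
# on its own, but the ensemble reduces variance without overfitting.
model = RandomForestRegressor(n_estimators=100, max_depth=2, random_state=0)
model.fit(X, y)

# Feature importances hint at which characteristics drive the fit.
importances = model.feature_importances_
```

In a setup like this, the importances should concentrate on the feature the target actually depends on, which is the same diagnostic we rely on when interpreting the fitted model.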

The results of the exercise are plotted below.

Source: Sharadar Fundamentals. Calculations by Newfound Research.

The performance does appear to provide defensive properties both in- and out-of-sample, with meaningful returns generated in 2000-2002, 2008, Q3 and Q4 of 2011, June 2015 through June 2016, and Q4 2018.

We can see that the allocations also express a number of static sector concentrations (e.g. Consumer Defensive) as well as some cyclical changes (e.g. Financials pre- and post-2007).

We can also gain insight into how the portfolio composition changes by looking at the weighted characteristic scores of the long leg of the portfolio over time.

Source: Sharadar Fundamentals. Calculations by Newfound Research.

It is important to remember that characteristics are cross-sectionally ranked across stocks. For some characteristics, higher is often considered better (e.g. a higher earnings-to-price is considered cheaper), whereas for other factors lower is better (e.g. lower volatility is considered to have less risk).

We can see that some characteristics are static tilts: higher market capitalization, positive operating cash flow, positive net income, and lower risk characteristics. Other characteristics are more dynamic. By 12/2008, the portfolio has tilted heavily towards high momentum stocks. A year later, the portfolio has tilted heavily towards low momentum stocks.

What is somewhat difficult to disentangle is whether these static and dynamic effects are due to the non-linear model we have developed, or whether simply applying static tilts results in the dynamic tilts. For example, if we only applied a low volatility tilt, is it possible that the momentum tilts would emerge naturally?

Unfortunately, the answer appears to be the latter. If we plot a long/short portfolio that goes long the bottom quintile of stocks ranked on realized 1-year volatility and short the broad market, we see a very familiar equity curve.

Source: Sharadar Fundamentals. Calculations by Newfound Research. Returns are hypothetical and backtested. Returns are gross of all fees including, but not limited to, management fees, transaction fees, and taxes. Returns assume the reinvestment of all distributions.

It would appear that the random forest model effectively identified the benefits of low volatility securities. And while out-of-sample performance does appear to provide more ample defense during 2011, 2015-2016, and 2018 than the low volatility tilt, it also has significantly greater performance drag.

## Gradient Boosting

One potential improvement we might consider is to apply a gradient boosting model. Rather than simply building our decision trees independently in parallel, we can build them sequentially such that each tree is built on a modified version of the original data set (e.g. increasing the weights of those data points that were harder to classify and decreasing the weights on those that were easier).

Rather than just generalize to a low-volatility proxy, gradient boosting may allow our decision tree process to pick up on greater subtleties and conditional relationships in the data. For comparison purposes, we’ll assume the same maximum tree depth and number of trees as the random forest method.
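A comparable sketch using scikit-learn's `GradientBoostingRegressor` is below, again with illustrative synthetic data rather than our actual feature set. Here the synthetic target embeds a conditional relationship (one feature only matters when another is high), the kind of subtlety we hoped boosting would capture:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Illustrative data with a conditional relationship: the first feature
# only predicts the target when the second feature exceeds 0.5.
X = rng.random((1000, 2))
y = np.where(X[:, 1] > 0.5, X[:, 0], 0.5) + 0.05 * rng.standard_normal(1000)

# Unlike the random forest, trees are built sequentially: each tree is
# fit to the residual errors of the ensemble so far.
model = GradientBoostingRegressor(n_estimators=100, max_depth=2,
                                  learning_rate=0.1, random_state=0)
model.fit(X, y)
```

Depth-2 trees can express exactly one conditional split ("if feature 2 is high, then split on feature 1"), so a well-fit boosted model should recover this interaction where a linear sort could not.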

In initially evaluating the importance of features, it does appear that low volatility remains a critical factor, but other characteristics – such as momentum, free cash flow yield, and payout ratio – are close seconds. This may be a hint that gradient boosting was able to identify more subtle relationships.

Unfortunately, in evaluating the sector characteristics over time, we see a very similar pattern. Though we can notice that sectors like Technology have received a meaningfully higher allocation with this methodology versus the random forest approach.

Source: Sharadar Fundamentals. Calculations by Newfound Research.

If we compare long/short portfolios, we find little meaningful difference to our past results. Our model simply seems to identify a (historically less effective) low volatility model.

Source: Sharadar Fundamentals. Calculations by Newfound Research. Returns are hypothetical and backtested. Returns are gross of all fees including, but not limited to, management fees, transaction fees, and taxes. Returns assume the reinvestment of all distributions.

## Re-Defining Defensiveness

When we set out on this problem, we made a key decision to define a stock’s defensiveness by how closely it is able to replicate the payoff of a call option on the S&P 500. What if we had elected another definition, though? For example, we could define defensive stocks as those that minimize the depth and frequency of drawdowns using a measure like the Ulcer Index.

Below we replicate the above tests but use forward 12-month Ulcer Index as our target score (or, more precisely, a security’s forward 12-month cross-sectional Ulcer Index rank).
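The Ulcer Index is commonly computed as the root-mean-square percentage drawdown from the running peak, which is the formulation we sketch below; because drawdowns persist until a new peak is made, the measure penalizes both the depth and the duration of losses.

```python
import numpy as np

def ulcer_index(prices):
    """Ulcer Index: root-mean-square percentage drawdown from the
    running peak over the measurement window."""
    prices = np.asarray(prices, dtype=float)
    # Running maximum defines the peak from which drawdowns are measured.
    peak = np.maximum.accumulate(prices)
    # Percentage drawdown at each point (zero at new highs, negative below).
    drawdown_pct = 100.0 * (prices - peak) / peak
    # Root-mean-square of the drawdowns.
    return np.sqrt(np.mean(drawdown_pct ** 2))
```

A steadily rising price series scores zero, while a series that spends long stretches well below its prior high scores much worse than one with a brief dip of the same depth.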

We again begin by constructing an index that has perfect foresight, buying a market-capitalization weighted portfolio of securities that rank in the lowest quintile of forward 12-month Ulcer Index. We see a very different payoff profile than before, with strong performance exhibited in both bull and bear markets.

By focusing on forward 12-month scores rather than 3-month scores, we also see a far steadier sector allocation profile over time. Interestingly, we still see meaningful sector tilts over time, with sectors like Technology, Financials, and Consumer Defensives coming in and out of favor over time.

We again use a gradient boosted random forest model to try to model our target scores. We find that five of the top six most important features are price return related, either measuring return or risk.

Despite the increased emphasis on momentum, the resulting long/short index still echoes a naïve low-volatility sort. This is likely because negative momentum and high volatility have become reasonably correlated proxies for one another in recent years.

While returns appear improved from prior attempts, the out-of-sample performance (March 2009 and onward) is almost identical to that of the low-volatility long/short.

## Conclusion

In this research note we sought to apply machine learning techniques to factor portfolio construction. Our goal was to exploit the ability of machine learning models to model non-linear relationships, hoping to come up with a more nuanced definition of a defensive equity portfolio.

In our first test, we defined a security’s defensiveness by how closely it was able to replicate the payoff of a call option on the S&P 500 over rolling 63-day (approximately 3-month) periods. If the market was up, we wanted to pick stocks that closely matched the market’s performance; if the market was down, we wanted to pick stocks that minimized drawdown.

After pre-engineering a set of features to capture both company and stock dynamics, we first turned to a random forest model. We chose this model as the decision tree structure would allow us to model conditional feature dynamics. By focusing on generating a large number of shallow trees we aimed to avoid overfitting while still reducing overall model variance.

Training the model on data from March 1998 to March 2009, we found that the results strongly favored companies exhibiting positive operating cash flow, positive earnings, and low realized risk characteristics (e.g. volatility and beta). Unfortunately, the model did not appear to provide any meaningful advantage versus a simple linear sort on volatility.

We then turned to applying gradient boosting to our random forest. This approach builds trees in sequence such that each tree seeks to improve upon the last. We hoped that such an approach would allow the random forest to build more nuance than simply scoring on realized volatility.

Unfortunately, the results remained largely the same.

Finally, we decided to change our definition of defensiveness by focusing on the depth and frequency of drawdowns with the Ulcer Index. Again, after re-applying the gradient boosted random forest model, we found little difference in realized results versus a simple sort on volatility (especially out-of-sample).

One answer for these similar results may be that our objective function is highly correlated to volatility measures. For example, if stocks follow a geometric Brownian motion process, those with higher levels of volatility should have deeper drawdowns. And if the best predictor of future realized volatility is past realized volatility, then it is no huge surprise that the models ultimately fell back towards a naïve volatility sort.

Interestingly, value, quality, and growth characteristics seemed largely ignored. We see two potential reasons for this.

The first possibility is that they were simply subsumed by low volatility with respect to our objective. If this were the case, however, we would see little feature importance placed upon them, but would still expect their weighted average characteristic scores within our portfolios to be higher (or lower). While this is true for select features (e.g. payout ratio), the importance of others appears largely cyclical (e.g. earnings-to-price). In fact, during the fall out of the dot-com bubble, weighted average value scores remained between 40 and 70.

The second reason is that the fundamental drivers behind each market sell-off are different. Factors tied to company metrics (e.g. valuation, quality, or growth), therefore, may be ill-suited to navigate different types of sell-offs. For example, value was the natural antithesis to the speculative dot-com bubble. However, during the recent COVID-19 crisis, it has been the already richly priced technology stocks that have fared the best. Factors based upon security characteristics (e.g. volatility, returns, or volume) may be better suited to dynamically adjust to market changes.

While our results were rather lackluster, we should acknowledge that we have really only scratched the surface of machine learning techniques. Furthermore, our results are intrinsically linked to how we’ve defined our problem and the features we engineered. A more thoughtful target score or a different set of features may lead to substantially different results.