Author: Corey Hoffstein

Corey is co-founder and Chief Investment Officer of Newfound Research.

Corey holds a Master of Science in Computational Finance from Carnegie Mellon University and a Bachelor of Science in Computer Science, cum laude, from Cornell University.

You can connect with Corey on LinkedIn or Twitter.

Should I Stay or Should I Growth Now?

By Corey Hoffstein

On January 21, 2020

In Value, Weekly Commentary


Summary

Naïve value factor portfolios have been in a drawdown since 2007.
More thoughtful implementations performed well after 2008, with many continuing to generate excess returns versus the market through 2016.
Since 2017, however, most value portfolios have experienced a steep drawdown in their relative performance, significantly underperforming glamour stocks and the market as a whole.
Many investors are beginning to point to the relative fundamental attractiveness of value versus growth, arguing that value is well poised to out-perform going forward.
In this research note, we aim to provide further data for the debate, constructing two different value indices (a style-box driven approach and a factor-driven approach) and measuring the relative attractiveness of fundamental measures versus both the market and growth stocks.

“Should I stay or should I go now?
If I go, there will be trouble
And if I stay it will be double”

— The Clash

It is no secret that quantitative value strategies have struggled as of late. Naïve sorts – like the Fama-French HML factor – peaked around 2007, but most quants would stick their noses up and say, “See? Craftsmanship matters.” Composite metrics, industry-specific scoring, sector-neutral constraints, factor-neutral constraints, and quality screens all helped quantitative value investors stay in the game.

Even a basket of long-only value ETFs didn’t peak against the S&P 500 until mid-2014.

Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. The Value ETF basket is an equal-weight portfolio of FVAL, IWD, JVAL, OVLU, QVAL, RPV, VLU, and VLUE, with each ETF being included when it is first available. Performance of the long/short portfolio is calculated as the monthly return of the Value ETF Basket minus the monthly return of the S&P 500 (“SPY”).

Many strategies were able to keep the mojo going until 2016 or so. But at that point, the wheels came off for just about everyone.

A decade of under-performance for the most naïve approaches and three-plus years of under-performance for some of the most thoughtful has many people asking, “Is quantitative value an outdated idea? Should we throw in the towel and just buy growth?”

Of course, it should come as no surprise that many quantitative value managers are now clamoring that this is potentially the best time to invest in value since the dot-com bubble. “No pain, no premium,” as we like to say.

Nevertheless, the question of value’s attractiveness itself is muddied for a variety of reasons:

How are we defining value?
Are we talking about long/short factors or long-only implementations?
Are we talking about the style-box definition or the factor definition of value?

By no means will this commentary be a comprehensive evaluation as to the attractiveness of Value, but we do hope to provide some more data for the debate.

Replicating Style-Box Growth and Value

If you want the details of how we are defining Growth and Value, read on. Otherwise, you can skip ahead to the next section.

Morningstar invented the style box back in the early 1990s. Originally, value was simply defined based upon price-to-book and price-to-earnings. But somewhere along the line, things changed. Not only was the definition of value expanded to include more metrics, but growth was given an explicit set of metrics to quantify it, as well.

The subtle difference here is that, rather than measuring cheap versus expensive, the new model more explicitly attempts to capture value versus growth. The problem (at least in my opinion) is that the model makes it such that the growth-iest fund is now the one that simultaneously ranks the highest on growth metrics and the lowest on value metrics. Similarly, the value-iest fund is the one that ranks the highest on value metrics and the lowest on growth metrics. So growth is growing but expensive and value is cheap but contracting.

The index providers took the same path Morningstar did. For example, while MSCI originally defined value and growth based only upon price-to-book, they later amended it to include not only other value metrics, but growth metrics as well. S&P Dow Jones and FTSE Russell follow this same general scheme. Which is all a bit asinine if you ask me.¹

Nevertheless, it is relevant to the discussion as to whether value is attractive or not, as value defined by a style-box methodology can differ from value as defined by a factor methodology. Therefore, to dive under the hood, we created our own “Frankenstein’s style-box” by piecing together different components of S&P Dow Jones’, FTSE Russell’s, and MSCI’s methodologies.

The parent universe is the S&P 500.
Growth metrics are 3-year earnings-per-share growth, 3-year revenue-per-share growth, internal growth rate², and 12-month price momentum.³
Value metrics are book-to-price⁴, earnings-to-price⁵, free-cash-flow-to-price, and sales-to-enterprise-value⁶.
Metrics are all winsorized at the 90th percentile.
Z-scores for each Growth and Value metric are calculated using market-capitalization weighted means and standard deviations.
An aggregate Growth and Value score is calculated for each security as the sum of the underlying style z-scores.
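A minimal Python sketch of the scoring steps above, assuming each style's metrics arrive as an (N, K) array aligned with a vector of market capitalizations. The text only says metrics are "winsorized at the 90th percentile"; treating that as a symmetric 10th/90th clip is our assumption, as are the function names and the placeholder data.

```python
import numpy as np


def winsorize(x, lower=10.0, upper=90.0):
    """Clip a cross-section at the given percentiles (the symmetric clip is an assumption)."""
    lo, hi = np.percentile(x, [lower, upper])
    return np.clip(x, lo, hi)


def cap_weighted_zscore(x, mcap):
    """Z-score using a market-cap-weighted mean and standard deviation."""
    w = mcap / mcap.sum()
    mu = np.sum(w * x)
    sigma = np.sqrt(np.sum(w * (x - mu) ** 2))
    return (x - mu) / sigma


def style_score(metrics, mcap):
    """Aggregate style score: sum of cap-weighted z-scores of each winsorized metric.

    metrics: (N, K) array with one column per metric (e.g. the four Value ratios).
    """
    zs = [cap_weighted_zscore(winsorize(col), mcap) for col in metrics.T]
    return np.sum(zs, axis=0)


rng = np.random.default_rng(0)
mcap = rng.uniform(1.0, 100.0, size=50)
value_metrics = rng.normal(size=(50, 4))   # placeholder data, not real fundamentals
value_score = style_score(value_metrics, mcap)
```

By construction each metric's z-score has a cap-weighted mean of zero and a cap-weighted standard deviation of one, so the aggregate score is centered on the cap-weighted market.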

From this point, we basically follow MSCI’s methodology. Each security is plotted onto a “style space” (see image below) and assigned value and growth inclusion factors based upon the region it falls into. These inclusion factors represent the proportion of a security’s market cap that can be allocated to the Value or Growth index.

Securities are then sorted by their distance from the origin point. Starting with the securities that are furthest from the origin (i.e. those with more extreme style scores), market capitalizations are proportionally allocated to Value and Growth based upon their inclusion factors. Once one style hits 50%, the remaining securities are allocated to the other style regardless of inclusion factors.
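The allocation loop can be sketched as follows. This is a simplified illustration, not MSCI's full rule set: the inclusion factors (which in practice come from the style-space regions) are taken as given inputs, and the security that pushes a style past 50% is not split, so a style can overshoot the target slightly.

```python
import numpy as np


def allocate_to_styles(mcap, distance, value_inclusion, target_share=0.5):
    """Split each security's market cap between Value and Growth.

    mcap: market caps; distance: distance from the style-space origin;
    value_inclusion: fraction of each security's cap eligible for Value
    (Growth gets the remainder). Inclusion factors are assumed to be given.
    """
    mcap = np.asarray(mcap, dtype=float)
    value_inclusion = np.asarray(value_inclusion, dtype=float)
    value_cap = np.zeros_like(mcap)
    growth_cap = np.zeros_like(mcap)
    limit = target_share * mcap.sum()
    value_total = growth_total = 0.0
    for i in np.argsort(-np.asarray(distance)):   # most extreme style scores first
        if value_total >= limit:          # Value is full: the rest goes to Growth
            growth_cap[i] = mcap[i]
        elif growth_total >= limit:       # Growth is full: the rest goes to Value
            value_cap[i] = mcap[i]
        else:                             # otherwise split by inclusion factors
            value_cap[i] = value_inclusion[i] * mcap[i]
            growth_cap[i] = (1.0 - value_inclusion[i]) * mcap[i]
        value_total += value_cap[i]
        growth_total += growth_cap[i]
    return value_cap, growth_cap
```

Because every security's cap is split between the two outputs, `value_cap + growth_cap` always equals the original market cap: a name can be partially in both styles but is never double counted.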

Source: MSCI.

The result of this process is that each style represents approximately 50% of the total market capitalization of the S&P 500. The market capitalization for each security will be fully represented in the combination of growth and value and may even be represented in both Value and Growth as a partial weight (though never double counted).

Portfolios are rebalanced semi-annually using six overlapping portfolios.
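One way to read "semi-annually using six overlapping portfolios" is six sub-portfolios, each rebalanced every six months but staggered by one month, with the strategy holding their average. A minimal sketch under that assumption (weight drift between rebalances is ignored):

```python
import numpy as np


def overlapping_portfolio(monthly_targets, n_tranches=6):
    """Average of n staggered sub-portfolios (tranches).

    monthly_targets: (T, N) target weights computed at each month-end.
    Tranche k refreshes to the latest targets in months where t % n_tranches == k
    and otherwise holds its prior weights, so each tranche rebalances every
    n_tranches months (semi-annually when n_tranches = 6).
    """
    T, N = monthly_targets.shape
    tranches = np.tile(monthly_targets[0], (n_tranches, 1))
    combined = np.empty((T, N))
    for t in range(T):
        tranches[t % n_tranches] = monthly_targets[t]
        combined[t] = tranches.mean(axis=0)
    return combined
```

Averaging staggered tranches means only one sixth of the portfolio trades each month, which dampens the impact of the specific rebalance dates chosen.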

How Attractive is Value?

To evaluate the relative attractiveness of Growth versus Value, we will evaluate two approaches.

In the first approach, we will make the assumption that fundamentals will not change but prices will revert. In this approach, we will plot the ratio of price-to-fundamental measures (e.g. price-to-earnings of Growth over price-to-earnings of Value) minus 1. This can be thought of as how far price would have to revert between the two indices before valuations are equal.

As an example, consider the following two cases. First, Value has an earnings yield of 2% and Growth has an earnings yield of 1%. In this case, both are expensive (Value has a P/E of 50 and Growth has a P/E of 100), but the price of Value would have to double (or the price of Growth would have to get cut in half) for their valuations to meet. As a second case, Value has an earnings yield of 100% and Growth has an earnings yield of 50%. Both are very cheap, but we would still have to see the same price moves for their fundamentals to meet.
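The arithmetic in both examples can be checked directly: converting earnings yields to P/E ratios and taking the ratio minus 1 gives the same answer, a 100% relative price move, regardless of the absolute valuation level. The helper names are illustrative.

```python
def relative_valuation(pe_numerator, pe_denominator):
    """Ratio of two price-to-fundamental measures, minus 1."""
    return pe_numerator / pe_denominator - 1.0


def pe_from_earnings_yield(earnings_yield):
    """P/E is the reciprocal of the earnings yield."""
    return 1.0 / earnings_yield


# Case 1: Growth yields 1% (P/E 100), Value yields 2% (P/E 50)
case_1 = relative_valuation(pe_from_earnings_yield(0.01), pe_from_earnings_yield(0.02))
# Case 2: Growth yields 50% (P/E 2), Value yields 100% (P/E 1)
case_2 = relative_valuation(pe_from_earnings_yield(0.50), pe_from_earnings_yield(1.00))
```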

For our second approach, we will assume prices and fundamentals remain constant and ask the question, “how much carry do I earn for this trade?” Specifically, we will measure shareholder yield (dividend yield plus buyback yield) for each index and evaluate the spread.
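A minimal sketch of the carry measure, assuming per-constituent trailing-12-month dividends and net buybacks in dollars. Summing dollars before dividing by total market cap is equivalent to a cap-weighted average of constituent yields. The numbers below are purely illustrative.

```python
import numpy as np


def index_shareholder_yield(dividends, net_buybacks, market_caps):
    """(Total dividends + total net buybacks) / total market cap for an index."""
    dividends, net_buybacks, market_caps = map(
        np.asarray, (dividends, net_buybacks, market_caps))
    return (dividends.sum() + net_buybacks.sum()) / market_caps.sum()


# Illustrative two-stock indices (not real data)
value_sy = index_shareholder_yield([2.0, 1.0], [1.0, 1.0], [50.0, 50.0])    # 5%
growth_sy = index_shareholder_yield([0.5, 0.5], [0.5, 0.5], [50.0, 50.0])   # 2%
carry_spread = value_sy - growth_sy                                          # 3%
```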

In both cases, we will decompose our analysis into Growth versus the Market and the Market versus Value to gain a better perspective as to how each leg of the trade is influencing results.

Below we plot the relative ratio for price-to-book, price-to-earnings, price-to-free-cash-flow, and price-to-sales.

Source: Sharadar. Calculations by Newfound Research.

A few things stand out:

The ratio of Growth’s price-to-book versus the S&P 500’s price-to-book appears to be at 2000-level highs. Even the ratio of the S&P 500’s price-to-book versus Value’s price-to-book appears extreme. However, the interpretation of this data is heavily reliant upon whether we believe price-to-book is still a relevant valuation metric. If not, this result may simply be a byproduct of naïve value construction loading up on financials and ignoring technology companies, leading to an artificially high spread. The fact that Growth versus the S&P 500 has far out-stripped the S&P 500 versus Value in this metric might suggest that this result is simply caused by Growth loading up on industries where the market feels book value is no longer relevant.
The ratio of price-to-earnings has certainly increased in the past year for both Growth versus the S&P 500 and the S&P 500 versus Value, suggesting an even larger spread for Growth versus Value. We can see, however, that we are still a long way off from 2000-era highs.
Ratios for free cash flows actually look to be near 20-year lows.
Finally, we can see that ratios in price-to-sales have meaningfully increased in the last few years. Interestingly, Growth versus the S&P 500 has climbed much faster than the S&P 500 versus Value, suggesting that moving from Growth to the S&P 500 may be sufficient for de-risking against reversion. Again, while these numbers sit at decade highs, they are still well below 2000-era levels.

Below we plot our estimate of carry (i.e. our return expectation given no change in prices): shareholder yield. Again, we see recent-era highs, but levels still well below 2000 and 2008 extremes.

Source: Sharadar. Calculations by Newfound Research.

Taken all together, value certainly appears cheaper – and a trade we likely would be paid more to sit on than we had previously – but a 2000s-era opportunity seems a stretch.

Growth is not Glamour

One potential flaw in the above analysis is that we are evaluating “Value 1.0” indices. More modern factor indices drop the “not Growth” aspect of defining value, preferring to focus only on valuation metrics. Therefore, to acknowledge that investors today may be evaluating the choice of a Growth 1.0 index versus a modern Value factor index, we repeat the above analysis using a Value strategy more consistent with current smart-beta products.

Specifically, we winsorize earnings yield, free-cash-flow yield, and sales yield and then compute market-cap-weighted z-scores. A security’s Value score is then equal to its average z-score across all three metrics with no mention of growth scores. The strategy selects the securities in the top quintile of Value scores and weights them in proportion to their value-score-scaled market capitalization. The strategy is rebalanced semi-annually using six overlapping portfolios.
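A minimal sketch of this factor-style Value construction for a single rebalance. Several details are our assumptions rather than the note's: the 5th/95th winsorization bounds, and the mapping of a (possibly negative) z-score into a positive multiplier (1 + z when z ≥ 0, 1 / (1 − z) otherwise), similar to a convention used in some MSCI factor-weighted indexes, so that "value-score-scaled market capitalization" never produces a negative weight.

```python
import numpy as np


def winsorize(x, lower=5.0, upper=95.0):
    lo, hi = np.percentile(x, [lower, upper])
    return np.clip(x, lo, hi)


def cap_weighted_z(x, mcap):
    w = mcap / mcap.sum()
    mu = w @ x
    sd = np.sqrt(w @ (x - mu) ** 2)
    return (x - mu) / sd


def positive_score(z):
    # Assumed transform: 1 + z for z >= 0, 1 / (1 - z) = 1 / (1 + |z|) for z < 0
    return np.where(z >= 0, 1.0 + z, 1.0 / (1.0 + np.abs(z)))


def factor_value_weights(yields, mcap, quantile=0.8):
    """yields: (N, 3) array of earnings, free-cash-flow, and sales yields."""
    z = np.column_stack([cap_weighted_z(winsorize(col), mcap) for col in yields.T])
    score = z.mean(axis=1)                              # average z-score, no growth term
    selected = score >= np.quantile(score, quantile)    # top quintile
    raw = np.where(selected, positive_score(score) * mcap, 0.0)
    return raw / raw.sum()
```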

Source: Sharadar. Calculations by Newfound Research.

We can see:

In the Value 1.0 approach, Growth appeared much more expensive versus the S&P 500 than the S&P 500 did versus Value. With a more concentrated approach, the S&P 500 now appears far more expensive versus Value than Growth does versus the S&P 500.
Relative price-to-book (despite price-to-book no longer being a focus metric) still appears historically high. While it peaked in Q3 2019, meaningful reversion could still occur. All the same caveats as before apply, however.
Relative price-to-earnings did appear to hit multi-decade highs (excluding the dot-com era) in early 2019. If the prior 6/2016-to-2/2018 reversion is the playbook, then we appear to be halfway home.
Relative price-to-free-cash-flow and price-to-sales are both near recent highs, but both below 2008 and dot-com era levels.

Plotting our carry for this trade, we do see a more meaningful divergence between Value and Growth. Furthermore, the carry for bearing Value risk does appear to be at decade highs; however, it is certainly not at extreme levels, and it has actually reverted from Q3 2019 highs.

Source: Sharadar. Calculations by Newfound Research.

Conclusion

In this research note, we sought to explore the current value-of-value. Unfortunately, it proves to be an elusive question, as the very definition of value is difficult to pin down.

For our first approach, we build a style-box driven definition of Value. We then plot the relative ratio of four fundamental measures – price-to-book, price-to-earnings, price-to-sales, and price-to-free-cash-flow – of Growth versus the S&P 500 and the S&P 500 versus Value. We find that both Growth and the S&P 500 look historically expensive on price-to-book and price-to-earnings metrics (implying that Value is very, very cheap), whereas just Growth looks particularly expensive for price-to-sales (implying that Value may not be cheap relative to the Market). However, none of the metrics look particularly cheap compared to the dot-com era.

We also evaluate Shareholder Yield as a measure of carry, finding that Value minus Growth reached a 20-year high in 2019 if the dot-com and 2008 periods are excluded.

Recognizing that many investors may prefer a more factor-based definition of value, we run the same analysis for a more concentrated value portfolio. Whereas the first analysis generally pointed to Growth versus the S&P 500 being more expensive than the S&P 500 versus Value trade, the factor-based approach finds the opposite conclusion. Similar to the prior results, Value appears historically cheap for price-to-book, price-to-earnings, and price-to-sales metrics, though it appears to have peaked in Q3 2019.

Finally, the Shareholder Yield spread for the factor approach also appears to be at multi-decade highs ignoring the dot-com and 2008 extremes.

Directionally, this analysis suggests that Value may indeed be cheaper-than-usual. Whether that cheapness is rational or not, however, is only something we’ll know with the benefit of hindsight.

For further reading on style timing, we highly recommend Style Timing: Value vs Growth (AQR). For more modern interpretations: Value vs. Growth: The New Bubble (QMA), It’s Time for a Venial Value-Timing (AQR), and Reports of Value’s Death May Be Greatly Exaggerated (Research Affiliates).

Pursuing Factor Purity

By Corey Hoffstein

On January 6, 2020

In Risk & Style Premia, Weekly Commentary


Summary

Factors play an important role for quantitative portfolio construction.
How a factor is defined and how a factor portfolio is constructed play important roles in the results achieved.
Naively constructed portfolios – such as most “academic” factors – can lead to latent style exposures and potentially large unintended bets.
Through numerical techniques, we can seek to develop pure factors that provide targeted exposure to one style while neutralizing exposure to the rest.
In this research note, we implement a regression-based and an optimization-based approach to achieving pure factor portfolios and report the results achieved.

Several years ago, we penned a note titled Separating Ingredients and Recipe in Factor Investing (May 21, 2018). In the note we discussed why we believe it is important for investors and allocators to consider not just what ingredients are going into their portfolios – i.e. securities, styles, asset classes, et cetera – but the recipe by which those ingredients are combined. Far too often the ingredients are given all the attention, but mistake salt for sugar and I can guarantee that you’re not going to enjoy your cake, regardless of the quality of the salt.

As an example, the note focused on constructing momentum portfolios. By varying the momentum measure, lookback period, rebalance frequency, portfolio construction, weighting scheme, and sector constraints we constructed over 1,000 momentum strategies. The resulting dispersion between the momentum strategies was more-often-than-not larger than the dispersion between generic value (top 30% price-to-book) and momentum (top 30% by 12-1 prior returns).

Yet having some constant definition for factor portfolios is desirable for a number of reasons, including both alpha signal generation and return attribution.

One potential problem for naïve factor construction – e.g. a simple characteristic rank-sort – is that it can lead to time-varying correlations between factors.

For example, below we plot the correlation between momentum and value, size, growth, and low volatility factors. We can see significant time-varying behavior; for example, in 2018 momentum and low volatility exhibited moderate negative correlation, while in 2019 they exhibited significant positive correlation.
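Time-varying correlation of this kind is typically measured with a trailing window. A minimal sketch, assuming monthly factor returns and a hypothetical 12-month window:

```python
import numpy as np


def rolling_correlation(x, y, window=12):
    """Trailing-window Pearson correlation between two factor return series."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    out = np.full(len(x), np.nan)       # undefined until a full window exists
    for t in range(window - 1, len(x)):
        xs = x[t - window + 1:t + 1]
        ys = y[t - window + 1:t + 1]
        out[t] = np.corrcoef(xs, ys)[0, 1]
    return out
```

Plotting this series for, say, momentum against low volatility is what reveals the sign flips described above.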

The risk of time-varying correlations is that they can potentially lead to the introduction of unintended bets within single- or multi-factor portfolios or make it more difficult to accurately determine a portfolio’s sensitivity to different factors.

More broadly, low and stable correlations are preferable – assuming they can be achieved without meaningfully sacrificing expected returns – because they should allow investors to develop portfolios with lower volatility and higher information ratios.

Naively constructed equity styles can also exhibit time-varying correlations to traditional economic factors (e.g. interest rate risk), risk premia (e.g. market beta) or risk factors (e.g. sector or country exposure).

But equity styles can even exhibit time-varying sensitivities to themselves. For example, below we multiply the weights of naively constructed long/short style portfolios against the characteristic z-scores for the underlying holdings. As the characteristics of the underlying securities change, so does the actual weighted characteristic score of the portfolio. While some signals stay quite steady (e.g. size), others can vary substantially; sometimes value is just more value-y.

Source: Sharadar. Calculations by Newfound Research. Factor portfolios are self-financing long/short portfolios that are long the top quintile and short the bottom quintile of securities, equally weighted and rebalanced monthly, ranked based upon their specific characteristics (see below).
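The weighted characteristic score is simply the dot product of portfolio weights and holdings' z-scores at each date. The toy numbers below (four names, not re-standardized) illustrate how an unchanged long/short rule can carry a larger exposure when the characteristics of its holdings become more extreme.

```python
import numpy as np


def weighted_characteristic_score(weights, zscores):
    """Portfolio characteristic exposure at each date: sum_i w[t, i] * z[t, i]."""
    return np.einsum("tn,tn->t", np.asarray(weights, float), np.asarray(zscores, float))


# Same positions on two dates: long two names, short two names ...
weights = np.array([[0.5, 0.5, -0.5, -0.5],
                    [0.5, 0.5, -0.5, -0.5]])
# ... but the holdings' characteristics are more extreme on the second date.
zscores = np.array([[2.0, 1.0, -1.0, -2.0],
                    [3.0, 2.0, -2.0, -3.0]])
exposure = weighted_characteristic_score(weights, zscores)   # [3.0, 5.0]
```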

In the remainder of this note, we will explore two approaches to constructing “pure” factor portfolios that neutralize exposure to risk factors and other style premia.

Using the S&P 500 as our parent universe, we will construct five different factors defined by the security characteristics below:

Value (VAL): Earnings yield, free cash flow yield, and revenue yield.
Size (SIZE): Negative log market capitalization.
Momentum (MOM): 12-1 month total return.
Quality (QUAL): Return on equity¹, negative accruals ratio, negative leverage ratio².
Low Volatility (VOL): Negative 12-month realized volatility.

All characteristics are first cross-sectionally winsorized at the 5th and 95th percentiles, then cross-sectionally z-scored, and finally averaged (if a style is represented by multiple scores) to create a single score for each security.
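A minimal sketch of this winsorize, z-score, then average pipeline for one style on one date. We assume characteristics are already signed so that higher is better (e.g. negative log market cap for SIZE) and that the z-score uses an equal-weighted cross-sectional mean and standard deviation, since no weighting is specified here.

```python
import numpy as np


def composite_style_score(characteristics, lower=5.0, upper=95.0):
    """Winsorize each characteristic, z-score it cross-sectionally, then average.

    characteristics: (N, K) array for one style on one date.
    """
    zs = []
    for col in np.asarray(characteristics, float).T:
        lo, hi = np.percentile(col, [lower, upper])
        clipped = np.clip(col, lo, hi)
        zs.append((clipped - clipped.mean()) / clipped.std())
    return np.mean(zs, axis=0)


rng = np.random.default_rng(2)
quality_inputs = rng.lognormal(size=(200, 3))   # placeholder, e.g. ROE, -accruals, -leverage
quality_score = composite_style_score(quality_inputs)
```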

Naively constructed style benchmarks are 100% long the top-ranked quintile of securities and 100% short the bottom-ranked quintile, with securities receiving equal weights.
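The naive benchmark can be sketched in a few lines. Using the floor of N/5 as the quintile size is our assumption; any ties or remainders would need an explicit rule in practice.

```python
import numpy as np


def naive_long_short(score):
    """100% long the top quintile, 100% short the bottom quintile, equal weights."""
    score = np.asarray(score, float)
    n = len(score)
    ranks = np.argsort(np.argsort(score))   # 0 = lowest score
    q = n // 5                              # quintile size (floor is an assumption)
    weights = np.zeros(n)
    weights[ranks >= n - q] = 1.0 / q       # long leg sums to +1
    weights[ranks < q] = -1.0 / q           # short leg sums to -1
    return weights
```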

Factor Mimicry with Fama-MacBeth

Our first approach to designing “pure” factor portfolios is inspired by Fama-MacBeth (1973)³. Fama-MacBeth regression is a two-step approach:

Regress each security’s returns against the proposed risk factors (a time-series regression) to determine the security’s beta to each risk factor;
Regress all security returns for each time period against the betas (a cross-sectional regression) to determine the risk premium for each factor.

Similarly, we will assume a factor model where the return for a given security can be defined as:

$$R_i = \beta_{i,m} R_m + \sum_{j=1}^{J} \beta_{i,j} RF_j + \epsilon_i$$

Where R_m is the return of the market and RF_j is the return for some risk factor. In this equation, the betas define a security’s sensitivity to a given risk factor. However, instead of using the Fama-MacBeth two-step approach to solve for the factor betas, we can replace the betas with factor characteristic z-scores.

Using these known scores, we can both estimate the factor returns using standard regression⁴ and extract the weights of the factor mimicking portfolios. The upside to this approach is that each factor mimicking portfolio will, by design, have constant unit exposure to its specific factor characteristic and zero exposure to the others.
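A minimal sketch of the mechanics, leaving out the market term for brevity: when z-scores replace betas, the cross-sectional OLS factor returns are f = (Z'Z)⁻¹Z'r, so the rows of (Z'Z)⁻¹Z' are the factor mimicking portfolios. Multiplying them by Z gives the identity matrix, which is the "unit exposure to its own characteristic, zero to the others" property. The data below are placeholders.

```python
import numpy as np


def factor_mimicking_weights(Z):
    """Rows of W = (Z'Z)^{-1} Z' are the factor mimicking portfolios.

    Z: (N, K) characteristic z-scores used in place of betas. Cross-sectional
    OLS factor returns for a return vector r are then f = W @ r.
    """
    return np.linalg.solve(Z.T @ Z, Z.T)


rng = np.random.default_rng(3)
Z = rng.normal(size=(100, 5))          # placeholder z-scores for VAL, SIZE, MOM, QUAL, VOL
r = rng.normal(scale=0.05, size=100)   # placeholder one-period security returns
W = factor_mimicking_weights(Z)
factor_returns = W @ r                 # identical to the OLS coefficients
```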

Here we should note that unless an intercept is added to the regression equation, the factor mimicking portfolios will be beta-neutral but not dollar-neutral. This can have a substantial impact on factors like low volatility (VOL), where we expect our characteristics to be informative about risk-adjusted returns but not absolute returns. We can see the impact of this choice in the factor return graphs plotted below.⁵

Furthermore, by utilizing factor z-scores, this approach will neutralize characteristic exposure, but not necessarily return exposure. In other words, correlations between factor returns may not be zero. A further underlying assumption of this construction is that an equal-weight portfolio of all securities is style neutral. Given that equal-weight portfolios are generally considered to embed positive size and value tilts, this is an assumption we should be cognizant of.
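The intercept point above can be illustrated numerically. Each style portfolio has zero exposure to every other regressor column, so including an estimated market beta column makes it beta-neutral, while adding a column of ones forces its weights to sum to zero (dollar-neutral). In this sketch the betas are hypothetical and deliberately tied to the VOL score, mimicking low-volatility names having low beta; without an intercept, the VOL portfolio then carries a sizable net long position.

```python
import numpy as np


def style_mimicking_weights(beta, Z, intercept=False):
    """Style mimicking weights from a cross-sectional regression on [1?, beta, Z].

    beta: (N,) estimated market betas; Z: (N, K) style z-scores.
    """
    cols = [np.ones(len(beta))] if intercept else []
    X = np.column_stack(cols + [beta, Z])
    W = np.linalg.solve(X.T @ X, X.T)
    return W[-Z.shape[1]:]          # keep only the K style portfolios


rng = np.random.default_rng(4)
Z = rng.normal(size=(200, 5))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)        # cross-sectional z-scores
beta = 1.0 - 0.3 * Z[:, 4] + rng.normal(scale=0.1, size=200)   # hypothetical betas
W_no_intercept = style_mimicking_weights(beta, Z)
W_intercept = style_mimicking_weights(beta, Z, intercept=True)
net_exposure = W_no_intercept.sum(axis=1)       # the VOL row is clearly non-zero
```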

Attempting to compare these mimic portfolios versus our original naïve construction is difficult as they target a constant unit of factor exposure, varying their total notional exposure to do so. Therefore, to create an apples-to-apples comparison, we adjust both sets of factors to target a constant volatility of 5%.
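Volatility targeting for comparison purposes amounts to a single scaling factor per series. This sketch assumes monthly returns and full-sample (ex-post) scaling, which is adequate for comparing equity-curve shapes but uses look-ahead information; a trailing volatility estimate would avoid that.

```python
import numpy as np


def scale_to_target_vol(returns, target_vol=0.05, periods_per_year=12):
    """Rescale a return series so its annualized volatility equals target_vol."""
    returns = np.asarray(returns, float)
    realized = returns.std(ddof=1) * np.sqrt(periods_per_year)
    return returns * (target_vol / realized)


rng = np.random.default_rng(8)
raw_factor = rng.normal(0.01, 0.03, size=240)   # placeholder monthly factor returns
scaled_factor = scale_to_target_vol(raw_factor)
```

Because the scaling is a positive constant, the information ratio of each factor is unchanged; only its leverage differs.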

We can see that neutralizing market beta and other style factors leads to an increase in annualized return for value, size, momentum, and quality factors, leading to a corresponding increase in information ratio. Unfortunately, none of these results are statistically significant at a 5% threshold.

Nevertheless, it may still be informative to take a peek under the hood to see how the weights shook out. Below we plot the average weight by security characteristic percentile (at each rebalance, securities are sorted into percentile score bins and their weights are summed together; weights in each bin are then averaged over time).
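The binning procedure can be sketched as follows, assuming (T, N) arrays of weights and characteristic scores and using mid-rank percentiles to assign bins; the number of bins is a parameter.

```python
import numpy as np


def average_weight_by_bin(weights, scores, n_bins=100):
    """Sum weights within characteristic-percentile bins at each rebalance, then average.

    weights, scores: (T, N) arrays. Bin b collects securities whose cross-sectional
    percentile rank falls in [b / n_bins, (b + 1) / n_bins).
    """
    weights = np.asarray(weights, float)
    scores = np.asarray(scores, float)
    T, N = weights.shape
    binned = np.zeros((T, n_bins))
    for t in range(T):
        pct = (np.argsort(np.argsort(scores[t])) + 0.5) / N   # mid-rank percentile
        bins = np.minimum((pct * n_bins).astype(int), n_bins - 1)
        binned[t] = np.bincount(bins, weights=weights[t], minlength=n_bins)
    return binned.mean(axis=0)
```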

Before reviewing the weights, however, it is important to recall that each portfolio is designed to capture a constant unit exposure to a style and therefore total notional exposure will vary over time. To create a fairer comparison across factors, then, we scale the weights such that each leg has constant 100% notional exposure.
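Scaling each leg to 100% notional is a per-leg normalization. A minimal sketch, assuming both a long and a short leg exist:

```python
import numpy as np


def scale_legs_to_full_notional(weights):
    """Rescale long/short weights so the long leg sums to +1 and the short leg to -1."""
    weights = np.asarray(weights, float)
    long_leg = np.clip(weights, 0.0, None)
    short_leg = np.clip(weights, None, 0.0)
    return long_leg / long_leg.sum() + short_leg / -short_leg.sum()


scaled = scale_legs_to_full_notional([0.3, 0.1, -0.2, 0.0, -0.6])
```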

As we would generally expect, all the factors are overweight high-scoring securities and underweight low-scoring securities. What is interesting to note, however, is that the shapes by which they achieve their exposure are different. Value, for example, leans strongly into top-decile securities, whereas quality leans heavily away from (i.e. shorts) the bottom decile. Unlike the other factors, which are largely positively sloped in their weights, low volatility exhibits fairly constant positive exposure above the 50th percentile.

What may come as a surprise to many is how diversified the portfolios appear to be across securities. This is because the regression result is equivalent to minimizing the sum of squared weights subject to target exposure constraints.

Source: Sharadar. Calculations by Newfound Research.
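The equivalence between the regression weights and minimum sum-of-squared-weights portfolios can be checked numerically: the rows of (Z'Z)⁻¹Z' match the pseudo-inverse (minimum-norm) solutions of Z'w = e_k, and any other portfolio with the same exposures has a larger sum of squared weights. Placeholder data are used throughout.

```python
import numpy as np

rng = np.random.default_rng(5)
Z = rng.normal(size=(100, 5))   # placeholder characteristic z-scores

# Regression-implied mimicking weights: rows of (Z'Z)^{-1} Z'
W_regression = np.linalg.solve(Z.T @ Z, Z.T)

# Minimum-norm solutions of Z' w = e_k for every k, via the pseudo-inverse
W_min_norm = np.linalg.pinv(Z.T).T

# Any other portfolio with identical exposures differs by some v with Z' v = 0
u = rng.normal(size=100)
v = u - Z @ np.linalg.solve(Z.T @ Z, Z.T @ u)   # remove u's component in span(Z)
w_alt = W_regression[0] + v
```

Spreading exposure across many names is exactly what minimizing the sum of squared weights rewards, which is why the mimicking portfolios look so diversified.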

While we focused specifically on neutralizing style exposure, this approach can be extended to also neutralize industry / sector exposure (e.g. with dummy variables), region exposure, and even economic factor exposure. Special care must be taken, however, to address potential issues of multi-collinearity.
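One common multicollinearity pitfall when adding sector dummies is that a full set of dummies sums to a column of ones and is perfectly collinear with an intercept. A minimal sketch of a design matrix that drops one dummy when an intercept is present; the sector labels are hypothetical.

```python
import numpy as np


def design_matrix(Z, sectors, intercept=True):
    """Stack an optional intercept, sector dummies, and style z-scores.

    With an intercept, one sector dummy is dropped to avoid perfect collinearity.
    """
    sectors = np.asarray(sectors)
    labels = np.unique(sectors)
    dummies = (sectors[:, None] == labels[None, :]).astype(float)
    if intercept:
        return np.column_stack([np.ones(len(sectors)), dummies[:, 1:], Z])
    return np.column_stack([dummies, Z])


rng = np.random.default_rng(6)
sectors = np.array(["Energy", "Financials", "Technology"] * 20)   # hypothetical labels
Z = rng.normal(size=(60, 2))
X = design_matrix(Z, sectors)
```

Comparing `np.linalg.matrix_rank(X)` to the number of columns is a quick check that the regression remains identifiable.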

Pure Quintile Portfolios with Optimization

Liu (2016)⁶ proposes an alternative means for constructing pure factor portfolios using an optimization-based approach. Specifically, long-only quintile portfolios are constructed such that:

They minimize the sum of squared weights;
Their weighted characteristic exposure for the target style is equal to the weighted characteristic exposure of a naïve, equally-weighted, matching quintile portfolio; and
Weighted characteristic exposure for non-targeted styles equals zero.
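The three conditions above define a quadratic program. The sketch below is a relaxed version with clearly stated assumptions: every security in the universe may receive weight, the weights sum to one (fully invested), and the long-only bound is dropped so the problem has a closed-form minimum-norm solution. A QP solver would be needed to enforce non-negative weights as in Liu (2016).

```python
import numpy as np


def pure_quintile_weights(Z, target, quintile, n_quintiles=5):
    """Min sum-of-squared-weights portfolio meeting Liu-style exposure targets.

    Equality constraints: weights sum to 1 (assumption); exposure to the target
    style equals the naive equal-weight quintile's exposure; exposure to every
    other style is 0. Long-only is NOT enforced, so shorts can appear.
    """
    N, K = Z.shape
    ranks = np.argsort(np.argsort(Z[:, target]))
    members = ranks * n_quintiles // N == quintile     # naive quintile membership
    A = np.vstack([np.ones(N), Z.T])                   # budget row + K exposure rows
    b = np.zeros(K + 1)
    b[0] = 1.0
    b[1 + target] = Z[members, target].mean()          # naive quintile exposure
    return A.T @ np.linalg.solve(A @ A.T, b)           # min ||w||^2 s.t. A w = b


rng = np.random.default_rng(7)
Z = rng.normal(size=(100, 5))                          # placeholder z-scores
top = pure_quintile_weights(Z, target=0, quintile=4)
bottom = pure_quintile_weights(Z, target=0, quintile=0)
long_short = top - bottom                              # dollar-neutral by construction
```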

While the regression-based approach was fast due to its closed-form solution, an optimization-based approach can potentially allow for greater flexibility in objectives and constraints.

Below we replicate the approach proposed in Liu (2016) and then create dollar-neutral long/short factor portfolios by going long the top quintile portfolio and short the bottom quintile portfolio. Portfolios are re-optimized and rebalanced monthly. Unlike the regression-based approach, however, these portfolios do not seek to be beta-neutral.

We can see that the general shapes of the factor equity curves remain largely similar to the naïve implementations. Unlike the results reported in Liu (2016), however, we measure a decline in return among several factors (e.g. value and size). We also find that annualized volatility is meaningfully reduced for all the optimized portfolios; taken together, information ratio differences are statistically indistinguishable from zero.

As with the regression-based approach, we can also look at the average portfolio exposures over time to characteristic ranks. Below we plot these results for both the naïve and optimized Value quintiles. We can see that the top and bottom quintiles lean heavily into top- and bottom-decile securities, while the 2nd, 3rd, and 4th quintiles had more diversified security exposure on average. Similar weighting profiles are displayed by the other factors.

Source: Sharadar. Calculations by Newfound Research.

Conclusion

Factors are easy to define in general but difficult to define explicitly. Commonly accepted academic definitions are easy to construct and track, but often at the cost of inconsistent style exposure and the risk of latent, unintended bets. Such impure construction may lead to time-varying correlations between factors, making it more difficult for managers to manage risk as well as disentangle the true source of returns.

In this research note we explored two approaches that attempt to correct for these issues: a regression-based approach and an optimization-based approach. With each approach, we sought to eliminate non-target style exposure, resulting in a pure factor implementation.

Despite a seemingly well-defined objective, we still find that how “purity” is defined can lead to different results. For example, in our regression-based approach we targeted unit style exposure and beta-neutrality, allowing total notional exposure to vary. In our optimization-based approach, we constructed long-only quintiles independently, targeting the same weighted-average characteristic exposure as a naïve, equal-weight factor portfolio. We then built a long/short implementation from the top and bottom quintiles. The results between the regression-based and optimization-based approaches were markedly different.

And, statistically, not any better than the naïve approaches.

This is to say nothing of other potential choices we could make about defining “purity.” For example, what assumptions should we make about industry, sector, or regional exposures?

More broadly, is “purity” even desirable?

In Do Factors Market Time? (June 5, 2017) we demonstrated that beta timing was an unintentional byproduct of naïve value, size, and momentum portfolios and had actually been a meaningful tailwind for value from 1927-1957. Some factors might actually be priced across industries rather than just within them (Vyas and van Baren (2019)⁷). Is the chameleon-like nature of momentum to rapidly tilt towards whatever style, sector, or theme has been recently outperforming a feature or a bug?

And this is all to say nothing of the actual factor definitions we selected.

While impurity may be a latent risk for factor portfolios, we believe this research suggests that purity is in the eye of the beholder.

Timing Trend Model Specification with Momentum

By Corey Hoffstein

On December 23, 2019

In Craftsmanship, Risk & Style Premia, Trend, Weekly Commentary


Summary

Over the last several years, we have written several research notes demonstrating the potential benefits of diversifying “specification risk.”
Specification risk occurs when an investment strategy is overly sensitive to the outcome of a single investment process or parameter choice.
Adopting an ensemble approach is akin to creating a virtual fund-of-funds of stylistically similar managers, exhibiting many of the same advantages of traditional multi-manager diversification.
In this piece, we briefly explore whether model specification choices can be timed using momentum within the context of a naïve trend strategy.
We find little evidence that momentum-based parameter specification leads to meaningful or consistent improvements beyond a naively diversified approach.

Over the last several years, we’ve advocated on numerous occasions for a more holistic view of diversification: one that goes beyond just what we invest in, but also considers how those decisions are made and when they are made.

We believe that this style of thinking can be applied “all the way down” our process. For example, how-based diversification would advocate for the inclusion of both value and momentum processes, as well as for different approaches to capturing value and momentum.

Unlike correlation-based what diversification, how-based diversification often does little for traditional portfolio risk metrics. For example, in Is Multi-Manager Diversification Worth It? we demonstrated that within most equity categories, allocating across multiple managers does almost nothing to reduce portfolio volatility. It does, however, have a profound impact on the dispersion of terminal wealth that is achieved, often by avoiding manager-specific tail-risks. In other words, our certainty of achieving a given outcome may be dramatically improved by taking a multi-manager approach.

Ensemble techniques for portfolio construction can be thought of as adopting this same multi-manager approach by creating a set of virtual managers to allocate across.

In late 2018, we wrote two notes that touched upon this: When Simplicity Met Fragility and What Do Portfolios and Teacups Have in Common? In both studies we injected a bit of randomness into asset returns to measure the stability of trend-following strategies. We found that highly simplistic models tended to exhibit significant deviations in results with just slightly modified inputs, suggesting that they are highly fragile. Increasing diversification across what, how, and when axes led to a significant improvement in outcome stability.

As empirical evidence, we studied the real-time results of the popular Dual Momentum GEM strategy in our piece Fragility Case Study: Dual Momentum GEM, finding that slight deviations in model specification lead to significantly different allocation conclusions and therefore meaningfully different performance results. This was particularly pronounced over short horizons.

Tying trend-following to option theory, we then demonstrated how an ensemble of trend following models and specifications could be used to increase outcome certainty in Tightening the Uncertain Payout of Trend-Following.

Yet while more diversification appears to make portfolios more consistent in the outcomes they achieve, empirical evidence also suggests that certain specifications can lead to superior results for prolonged periods of time. For example, slower trend following signals appear to have performed much, much better than fast trend following signals over the last two decades.

One of the benefits of being a quant is that it is easy to create thousands of virtual managers, all of whom may follow the same style (e.g. “trend”) but implement it with a different model (e.g. prior total return, price-minus-moving-average, etc.) and specification (e.g. 10 month, 200 day, 13 week / 34 week cross, etc.). An ancillary benefit is that it is also easy to re-allocate capital among these virtual managers.

Given this ease, and knowing that certain specifications can go through prolonged periods of out-performance, we might ask: can we time specification choices with momentum?

Timing Trend Specification

In this research note, we will explore whether momentum signals can help us time specification choices as they relate to a simple long/flat U.S. trend equity strategy.

Using data from the Kenneth French library, our strategy will hold broad U.S. equities when the trend signal is positive and shift to the risk-free asset when trends are negative. We will develop 1023 different strategies by employing three different models – prior total return, price-minus-moving-average, and dual-moving-average-cross-over – with lookback choices spanning from 20-to-360 days in length.
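As a rough illustration of the three model families named above, here is a minimal Python sketch of each long/flat signal on a list of daily prices. The function names, the use of simple (rather than exponential) moving averages, and comparing raw price return against zero (rather than against the risk-free rate) are our assumptions, not details confirmed by the study.

```python
from statistics import fmean

def sma(prices, t, n):
    """Simple moving average of the n prices ending at index t (inclusive)."""
    return fmean(prices[t - n + 1 : t + 1])

def prior_total_return_signal(prices, t, n):
    """Long (1) if the price has risen over the last n days, else flat (0)."""
    return 1 if prices[t] / prices[t - n] - 1 > 0 else 0

def price_minus_ma_signal(prices, t, n):
    """Long if the latest price sits above its n-day simple moving average."""
    return 1 if prices[t] - sma(prices, t, n) > 0 else 0

def dual_ma_crossover_signal(prices, t, fast, slow):
    """Long if the fast moving average sits above the slow moving average."""
    return 1 if sma(prices, t, fast) > sma(prices, t, slow) else 0

def long_flat_return(signal, equity_return, risk_free_return):
    """Next-period return: equities when the signal is on, the risk-free asset otherwise."""
    return equity_return if signal == 1 else risk_free_return

rising = [100.0 + i for i in range(400)]  # hypothetical steadily rising price series
t = len(rising) - 1
print(prior_total_return_signal(rising, t, 200),
      price_minus_ma_signal(rising, t, 200),
      dual_ma_crossover_signal(rising, t, 50, 200))  # 1 1 1
```

Varying the lookback arguments (and the fast/slow pair for the crossover) is what turns these three functions into a large family of virtual managers.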

After constructing the 1023 different strategies, we will then apply a momentum model that ranks the models based upon prior returns and equally-weights our portfolio across the top 10%. These choices are made daily and implemented with 21 overlapping portfolios to reduce the impact of rebalance timing luck.
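One way to read “daily choices implemented with 21 overlapping portfolios” is that each of 21 sub-portfolios refreshes its selection every 21 trading days, staggered by one day, and the combined portfolio averages them. The sketch below assumes equal capital per tranche and ignores drift between refreshes; the function name and data layout are hypothetical.

```python
def overlapping_weights(daily_targets, t, n_tranches=21):
    """Aggregate weights at day t across n_tranches staggered sub-portfolios.

    daily_targets[d] is the target weight vector (a list of floats) computed on day d.
    Tranche k only refreshes on days d where d % n_tranches == k, so on day t it
    still holds the target chosen on its most recent refresh day at or before t.
    Each tranche is assumed to hold equal capital (drift is ignored).
    """
    held = []
    for k in range(n_tranches):
        last_refresh = t - ((t - k) % n_tranches)
        if last_refresh >= 0:  # tranche k has started trading
            held.append(daily_targets[last_refresh])
    n_assets = len(held[0])
    return [sum(w[i] for w in held) / len(held) for i in range(n_assets)]

# Toy check: if the "target" on day d is just [d], day 40 averages days 20..40.
targets = [[float(d)] for d in range(41)]
print(overlapping_weights(targets, 40))  # [30.0]
```

Because any single day's selection only ever controls 1/21 of capital, a poorly timed signal on one particular day has a correspondingly muted effect.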

It should be noted that because the underlying strategies are only allocating between U.S. equities and a risk-free asset, they can go through prolonged periods where they have identical returns or where more than 10% of models share the highest prior return. In these cases, we select all models whose returns are equal to or greater than that of the model ranked at the 10th percentile (i.e. the cutoff for the top 10%).
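The tie-handling rule above can be sketched in a few lines. This is a minimal illustration, assuming the cutoff position is 10% of the model count rounded to the nearest integer (at least one); the exact rounding used in the study is not stated, and the helper names are hypothetical.

```python
def select_top_fraction(prior_returns, frac=0.10):
    """Indices of models whose prior return is >= the return of the cutoff model.

    The cutoff model is the one ranked k-th best, where k is frac * n rounded
    (at least 1). Because ties at the cutoff are all admitted, more than k models
    can be selected when many strategies share identical returns.
    """
    n = len(prior_returns)
    k = max(1, round(frac * n))
    cutoff = sorted(prior_returns, reverse=True)[k - 1]
    return [i for i, r in enumerate(prior_returns) if r >= cutoff]

def equal_weight(selected, n_models):
    """Equal-weight the selected models; everything else gets zero."""
    weights = [0.0] * n_models
    for i in selected:
        weights[i] = 1.0 / len(selected)
    return weights

returns = [0.05] * 5 + [0.01] * 15  # 20 hypothetical models, 5 tied at the top
print(select_top_fraction(returns))  # [0, 1, 2, 3, 4]
```

With 20 models the top 10% would nominally be two, but all five tied models are admitted and equally weighted.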

Before comparing performance results, we think it is worthwhile to take a quick look under the hood to see whether the momentum-based approach is actually creating meaningful tilts in specification selection. Below we plot both aggregate model and lookback weights for the 126-day momentum strategy.

Source: Kenneth French Data Library. Calculations by Newfound Research.

We can see that while the model selection remains largely balanced, with the exception of a few periods, the lookback horizon selection is far more volatile. On average, the strategy preferred intermediate-to-long-term signals (i.e. 181-to-360 day), but we can see intermittent periods where short-term models found favor.

Did this extra effort generate value, though? Below we plot the ratio of the momentum strategies’ equity curves versus the naïve diversified approach.

We see little consistency in relative performance and four of the five strategies end up flat-to-worse. Only the 252-day momentum strategy out-performs by the end of the testing period and this is only due to a stretch of performance from 1950-1964. In fact, since 1965 the relative performance of the 252-day momentum model has been negative versus the naively diversified approach.

Source: Kenneth French Data Library. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.
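The relative performance plotted above is the ratio of two equity curves: a rising line means the momentum strategy is gaining on the naïve diversified approach. A minimal sketch, with hypothetical function names and toy return series:

```python
from itertools import accumulate
import operator

def equity_curve(returns):
    """Cumulative growth of $1 from a sequence of periodic returns."""
    return list(accumulate((1.0 + r for r in returns), operator.mul))

def relative_equity_curve(strategy_returns, benchmark_returns):
    """Ratio of two equity curves: rising means the strategy is out-performing."""
    return [s / b for s, b in zip(equity_curve(strategy_returns),
                                  equity_curve(benchmark_returns))]

print(relative_equity_curve([0.10, 0.00], [0.00, 0.10]))  # [1.1, 1.0]
```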

This analysis suggests that naïve, momentum-based specification selection does not appear to have much merit against a diversified approach for our simple trend equity strategy.

The Potential Benefits of Virtual Rebalancing

One potential benefit of an ensemble approach is that rebalancing across virtual managers can generate growth under certain market conditions. Similar to a strategically rebalanced portfolio, we find that when returns across virtual managers are expected to be similar, consistent rebalancing can harvest excess returns above a buy-and-hold approach.

The trade-off, of course, is that when there is autocorrelation in specification performance, rebalancing creates a drag. However, given that the evidence above suggests that relative performance between specifications is not persistent, we might expect that continuously rebalancing across our ensemble of virtual managers may actually allow us to harvest returns above and beyond what might be possible with just selecting an individual manager.
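The trade-off can be seen with two stylized virtual managers. In the first case, leadership flips every period, so there is no persistence and each manager ends where it started; in the second, one manager wins every period. The return paths are hypothetical and chosen to make the effect obvious, not to resemble any trend specification.

```python
def rebalanced_growth(returns_a, returns_b):
    """Growth of $1 split 50/50 and rebalanced back to equal weight every period."""
    wealth = 1.0
    for ra, rb in zip(returns_a, returns_b):
        wealth *= 1.0 + 0.5 * ra + 0.5 * rb
    return wealth

def buy_and_hold_growth(returns_a, returns_b):
    """Growth of $1 split 50/50 once and then left to drift."""
    wa = wb = 0.5
    for ra, rb in zip(returns_a, returns_b):
        wa *= 1.0 + ra
        wb *= 1.0 + rb
    return wa + wb

# No persistence: leadership flips every period.
a = [1.0, -0.5] * 5  # doubles, then halves
b = [-0.5, 1.0] * 5  # halves, then doubles
print(buy_and_hold_growth(a, b))  # 1.0
print(rebalanced_growth(a, b))    # about 9.31 (1.25 ** 10)

# Persistence: one manager wins every period, so rebalancing away from it is a drag.
winner, laggard = [0.10] * 10, [0.0] * 10
print(rebalanced_growth(winner, laggard) < buy_and_hold_growth(winner, laggard))  # True
```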

Conclusion

In this study, we explored whether we could time model specification choices in a simple trend equity strategy using momentum signals.

Testing different lookback horizons of 21-through-378 days, we found little evidence of meaningful persistence in the returns of different model specifications. In fact, four of the five momentum models we studied actually under-performed a naïve, diversified approach. The one model that did out-perform only seemed to do so due to strong performance realized over the 1950-1964 period, and it has relatively under-performed ever since.

While this evidence suggests that timing specification with momentum may not be a fruitful approach, it does suggest that the lack of return persistence may benefit diversification for a second reason: rebalancing. Indeed, barring any belief that one specification would necessarily do better than another, consistently re-pooling and distributing resources through rebalancing may actually lead to the growth-optimal solution.¹ This potentially implies an even higher hurdle rate for specification-timers to overcome.

Diversification: More Than “What”

By Corey Hoffstein

On November 25, 2019

In Risk Management, Weekly Commentary


The Dumb (Timing) Luck of Smart Beta

By Corey Hoffstein

On November 18, 2019

In Craftsmanship, Defensive, Momentum, Popular, Portfolio Construction, Risk & Style Premia, Value, Weekly Commentary


Summary

In past research notes we have explored the impact of rebalance timing luck on strategic and tactical portfolios, even using our own Systematic Value methodology as a case study.
In this note, we generate empirical timing luck estimates for a variety of specifications for simplified value, momentum, low volatility, and quality style portfolios.
Relative results align nicely with intuition: higher concentration and less frequent rebalancing lead to increasing levels of realized timing luck.
For more reasonable specifications – e.g. 100-stock portfolios rebalanced semi-annually – timing luck ranges between 100 and 400 basis points depending upon the style under investigation, suggesting a significant risk of performance dispersion due only to when a portfolio is rebalanced and nothing else.
The large magnitude of timing luck suggests that any conclusions drawn from performance comparisons between smart beta ETFs or against a standard style index may be spurious.

We’ve written about the concept of rebalance timing luck a lot. It’s a cowbell we’ve been beating for over half a decade, with our first article going back to August 7th, 2013.

As a reminder, rebalance timing luck is the performance dispersion that arises from the choice of a particular rebalance date (e.g. semi-annual rebalances that occur in June and December versus March and September).

We’ve empirically explored the impact of rebalance timing luck as it relates to strategic asset allocation, tactical asset allocation, and even used our own Systematic Value strategy as a case study for smart beta. All of our results suggest that it has a highly non-trivial impact upon performance.

This summer we published a paper in the Journal of Index Investing that proposed a simple solution to the timing luck problem: diversification. If, for example, we believe that our momentum portfolio should be rebalanced every quarter – perhaps as an optimal balance of cost and signal freshness – then we proposed splitting our capital across the three portfolios that spanned different three-month rebalance periods (e.g. JAN-APR-JUL-OCT, FEB-MAY-AUG-NOV, MAR-JUN-SEP-DEC). This solution is referred to either as “tranching” or “overlapping portfolios.”
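The set of overlapping portfolios follows mechanically from the rebalance frequency. A minimal sketch that enumerates every distinct month schedule and assigns equal capital to each tranche; the equal split is our assumption about the simplest tranching scheme, and the function names are hypothetical.

```python
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def rebalance_schedules(rebalances_per_year):
    """All distinct month schedules for a strategy rebalanced F times per year."""
    if 12 % rebalances_per_year != 0:
        raise ValueError("rebalances_per_year must divide 12")
    step = 12 // rebalances_per_year
    return ["-".join(MONTHS[start::step]) for start in range(step)]

def tranche_weights(rebalances_per_year):
    """Equal capital across every schedule, one tranche per schedule."""
    schedules = rebalance_schedules(rebalances_per_year)
    return {s: 1.0 / len(schedules) for s in schedules}

print(rebalance_schedules(4))  # ['JAN-APR-JUL-OCT', 'FEB-MAY-AUG-NOV', 'MAR-JUN-SEP-DEC']
```

The same function yields six semi-annual schedules (JAN-JUL through JUN-DEC) and twelve annual ones.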

The paper also derived a formula for estimating timing luck ex-ante, with a simplified representation of:

Where L is the timing luck measure, T is the turnover rate of the strategy, F is how many times per year the strategy rebalances, and S is the volatility of a long/short portfolio that captures the difference between what a strategy is currently invested in and what it could be invested in if the portfolio were reconstructed at that point in time.

Even without plugging in numbers, this equation supports some general conclusions:

Higher turnover strategies have higher timing luck.
Strategies that rebalance more frequently have lower timing luck.
Strategies with a less constrained universe will have higher timing luck.

Bullet points 1 and 3 may seem similar but capture subtly different effects. This is likely best illustrated with two examples on different extremes. First consider a very high turnover strategy that trades within a universe of highly correlated securities. Now consider a very low turnover strategy that is either 100% long or 100% short U.S. equities. In the first case, the highly correlated nature of the universe means that differences in specific holdings may not matter as much, whereas in the second case the perfect inverse correlation means that small portfolio differences lead to meaningfully different performance.
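The role of correlation in these two examples can be made concrete with the standard formula for the volatility of the difference between two positions. The 20% volatility and 0.95 correlation below are hypothetical inputs chosen for illustration.

```python
import math

def long_short_vol(vol_held, vol_alt, corr):
    """Volatility of (what we hold) minus (what we could hold instead)."""
    variance = vol_held ** 2 + vol_alt ** 2 - 2.0 * corr * vol_held * vol_alt
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding

# Case 1: swapping between two highly correlated stocks (hypothetical inputs).
print(round(long_short_vol(0.20, 0.20, 0.95), 4))  # 0.0632
# Case 2: 100% long U.S. equities versus 100% short, i.e. perfectly inverse positions.
print(round(long_short_vol(0.20, 0.20, -1.0), 4))  # 0.4
```

Even with high turnover, the first case produces a small S; the second case produces a large S even though the strategy rarely trades.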

L, in and of itself, is a bit tricky to interpret, but effectively attempts to capture the potential dispersion in performance between a particular rebalance implementation choice (e.g. JAN-APR-JUL-OCT) versus a timing-luck-neutral benchmark.

After half a decade, you would think we’ve spilled enough ink on this subject.

But given that just about every single major index still does not address this issue, and since our passion for the subject clearly verges on fever pitch, here comes some more cowbell.

Equity Style Portfolio Definitions

In this note, we will explore timing luck as it applies to four simplified smart beta portfolios based upon holdings of the S&P 500 from 2000-2019:

Value: Sort on earnings yield.
Momentum: Sort on prior 12-1 month returns.
Low Volatility: Sort on realized 12-month volatility.
Quality: Sort on average rank-score of ROE, accruals ratio, and leverage ratio.

Quality is a bit more complicated only because the quality factor has a far less consistent accepted definition. Therefore, we adopted the signals utilized by the S&P 500 Quality Index.
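A minimal sketch of an average rank-score and top-N selection for the Quality style. We assume that higher ROE, lower accruals, and lower leverage are better and use simple ordinal ranks; the actual S&P 500 Quality Index methodology may score signals differently, and the three-stock universe is hypothetical.

```python
def ranks(values, higher_is_better=True):
    """Ordinal ranks from 1 (worst) to n (best); ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not higher_is_better)
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def quality_scores(roe, accruals, leverage):
    """Average rank-score: high ROE, low accruals, and low leverage score best (assumed)."""
    per_signal = [ranks(roe, True), ranks(accruals, False), ranks(leverage, False)]
    return [sum(r[i] for r in per_signal) / len(per_signal) for i in range(len(roe))]

def top_n(scores, n):
    """Indices of the n highest-scoring stocks, for an equal-weight portfolio."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]

scores = quality_scores(roe=[0.30, 0.10, 0.20],
                        accruals=[0.01, 0.05, 0.03],
                        leverage=[0.5, 2.0, 1.0])
print(scores, top_n(scores, 2))  # [3.0, 1.0, 2.0] [0, 2]
```

The same top_n helper covers the single-signal styles: rank earnings yield with higher_is_better=True for Value, or realized volatility with higher_is_better=False for Low Volatility.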

For each of these equity styles, we construct portfolios that vary across two dimensions:

Number of Holdings: 50, 100, 150, 200, 250, 300, 350, and 400.
Frequency of Rebalance: Quarterly, Semi-Annually, and Annually.

For the different rebalance frequencies, we also generate portfolios that represent each possible rebalance variation of that mix. For example, Momentum portfolios with 50 stocks that rebalance annually have 12 possible variations: a January rebalance, February rebalance, et cetera. Similarly, there are 12 possible variations of Momentum portfolios with 100 stocks that rebalance annually.

By explicitly calculating the rebalance date variations of each Style x Holding x Frequency combination, we can construct an overlapping portfolios solution. To estimate empirical annualized timing luck, we calculate the standard deviation of monthly return dispersion between the different rebalance date variations of the overlapping portfolio solution and annualize the result.
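A minimal sketch of the empirical timing luck calculation described above. It assumes the overlapping-portfolio return in each month can be proxied by the equal-weight average of the variations (ignoring drift between tranches) and pools every variation's monthly deviation before taking a single standard deviation; the return series are hypothetical.

```python
import math
from statistics import fmean, pstdev

def empirical_timing_luck(variation_returns):
    """Annualized timing luck from monthly returns of every rebalance variation.

    variation_returns[i][t] is the month-t return of rebalance variation i.
    The overlapping-portfolio return is proxied by the equal-weight average of
    the variations each month (an assumption that ignores tranche drift).
    """
    n_months = len(variation_returns[0])
    deviations = []
    for t in range(n_months):
        month = [v[t] for v in variation_returns]
        overlap = fmean(month)
        deviations.extend(r - overlap for r in month)
    # Pool all deviations, take their standard deviation, annualize by sqrt(12).
    return pstdev(deviations) * math.sqrt(12)

# Two hypothetical variations that alternately lead by one percentage point.
luck = empirical_timing_luck([[0.01, 0.03], [0.03, 0.01]])
print(round(luck, 4))  # 0.0346
```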

Empirical Timing Luck Results

Before looking at the results plotted below, we would encourage readers to hypothesize as to what they expect to see. Perhaps not in absolute magnitude, but at least in relative magnitude.

For example, based upon our understanding of the variables affecting timing luck, would we expect an annually rebalanced portfolio to have more or less timing luck than a quarterly rebalanced one?

Should a more concentrated portfolio have more or less timing luck than a less concentrated variation?

Which factor has the greatest risk of exhibiting timing luck?

Source: Sharadar. Calculations by Newfound Research.

To create a sense of scale across the styles, below we isolate the results for semi-annual rebalancing for each style and plot them.

Source: Sharadar. Calculations by Newfound Research.

In relative terms, there is no great surprise in these results:

More frequent rebalancing limits the risk of portfolios changing significantly between rebalance dates, thereby decreasing the impact of timing luck.
More concentrated portfolios exhibit larger timing luck.
Faster-moving signals (e.g. momentum) tend to exhibit more timing luck than more stable, slower-moving signals (e.g. low volatility).

What is perhaps the most surprising is the sheer magnitude of timing luck. Consider that the S&P 500 Enhanced Value, Momentum, Low Volatility, and Quality portfolios all hold 100 securities and are rebalanced semi-annually. Our study suggests that timing luck for such approaches may be as large as 2.5%, 4.4%, 1.1%, and 2.0% respectively.

But what does that really mean? Consider the realized performance dispersion of different rebalance date variations of a Momentum portfolio that holds the top 100 securities in equal weight and is rebalanced on a semi-annual basis.

Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.

The 4.4% estimate of annualized timing luck is a measure of dispersion between each underlying variation and the overlapping portfolio solution. If we isolate two sub-portfolios and calculate rolling 12-month performance dispersion, we can see that the difference can be far larger, as one might exhibit positive timing luck while the other exhibits negative timing luck. Below we do precisely this for the APR-OCT and MAY-NOV rebalance variations.
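The rolling 12-month comparison described above can be sketched as follows; the monthly return series here are hypothetical placeholders for the APR-OCT and MAY-NOV variations.

```python
def rolling_12m_returns(monthly_returns):
    """Trailing 12-month compounded returns, one value per complete window."""
    out = []
    for end in range(12, len(monthly_returns) + 1):
        growth = 1.0
        for r in monthly_returns[end - 12:end]:
            growth *= 1.0 + r
        out.append(growth - 1.0)
    return out

def rolling_dispersion(returns_a, returns_b):
    """Rolling 12-month return of variation A minus that of variation B."""
    return [x - y for x, y in zip(rolling_12m_returns(returns_a),
                                  rolling_12m_returns(returns_b))]

# Hypothetical: variation A earns 1% a month, variation B earns nothing.
gaps = rolling_dispersion([0.01] * 14, [0.0] * 14)
print(len(gaps), round(gaps[0], 4))  # 3 0.1268
```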

In fact, since these variations are identical in every way except for the date on which they rebalance, a portfolio that is long the APR-OCT variation and short the MAY-NOV variation would explicitly capture the effects of rebalance timing luck. If we assume the rebalance timing luck realized by these two portfolios is independent (which our research suggests it is), then the volatility of this long/short is approximately the rebalance timing luck estimated above scaled by the square-root of two.

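Derivation: For variations v_i and v_j and overlapping-portfolio solution V, assume the deviations v_i - V and v_j - V are independent and each has volatility L. Then:

$$\sigma(v_i - v_j) = \sqrt{\mathrm{Var}\big((v_i - V) - (v_j - V)\big)} = \sqrt{\mathrm{Var}(v_i - V) + \mathrm{Var}(v_j - V)} = \sqrt{2L^2} = \sqrt{2}\,L$$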

Thus, if we are comparing two identically-managed 100-stock momentum portfolios that rebalance semi-annually, our 95% confidence interval for performance dispersion due to timing luck is +/- 12.4% (2 x SQRT(2) x 4.4%).
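The arithmetic behind these interval figures is easy to reproduce. This sketch uses the same rounded multiplier of 2 for a 95% interval (rather than 1.96) that the text uses; the function name is hypothetical.

```python
import math

def timing_luck_interval(timing_luck, z=2.0):
    """Half-width of an approximate 95% interval for the gap between two variations.

    If each variation's deviation from the overlapping portfolio has volatility L and
    the deviations are independent, the long/short gap has volatility sqrt(2) * L.
    """
    return z * math.sqrt(2.0) * timing_luck

print(round(100 * timing_luck_interval(0.044), 1))  # 12.4
print(round(100 * timing_luck_interval(0.005), 1))  # 1.4
```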

Even for more diversified, lower turnover portfolios, this remains an issue. Consider a 400-stock low-volatility portfolio that is rebalanced quarterly. Empirical timing luck is still 0.5%, suggesting a 95% confidence interval of +/- 1.4%.

S&P 500 Style Index Examples

One critique of the above analysis is that it is purely hypothetical: the portfolios studied above aren’t really those offered in the market today.

We take our analysis one step further by replicating (to the best of our ability) the S&P 500 Enhanced Value, Momentum, Low Volatility, and Quality indices and then creating different rebalance schedule variations for each. Note that the S&P 500 Low Volatility index rebalances quarterly, so there are only three possible rebalance variations to compute.

We see a meaningful dispersion in terminal wealth levels, even for the S&P 500 Low Volatility index, which at first glance appears in the graph to be little affected by timing luck.

Index	Minimum Terminal Wealth	Maximum Terminal Wealth
Enhanced Value	$4.45	$5.45
Momentum	$3.07	$4.99
Low Volatility	$6.16	$6.41
Quality	$4.19	$5.25
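To put the spread in relative terms, here is a small sketch that takes the minimum and maximum terminal wealth figures from the table above and computes the best-to-worst ratio and dollar gap for each index (the helper name is hypothetical).

```python
terminal_wealth = {  # (minimum, maximum) across rebalance variations, from the table above
    "Enhanced Value": (4.45, 5.45),
    "Momentum": (3.07, 4.99),
    "Low Volatility": (6.16, 6.41),
    "Quality": (4.19, 5.25),
}

def wealth_spread(table):
    """Best-to-worst ratio and dollar gap in terminal wealth for each index."""
    return {name: (hi / lo, hi - lo) for name, (lo, hi) in table.items()}

for name, (ratio, gap) in wealth_spread(terminal_wealth).items():
    print(f"{name}: best/worst = {ratio:.2f}x, gap = ${gap:.2f}")
```

With the table's figures, the best Momentum variation ends with roughly 1.63 times the wealth of the worst, versus about 1.04 times for Low Volatility.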

We should further note that there does not appear to be one set of rebalance dates that does significantly better than the others. For Value, FEB-AUG looks best while JUN-DEC looks the worst; for Momentum it’s almost precisely the opposite.

Furthermore, we can see that even seemingly closely related rebalances can have significant dispersion: consider MAY-NOV and JUN-DEC for Momentum. Here is a real doozy of a statistic: at one point, the MAY-NOV implementation for Momentum is down 50.3% while the JUN-DEC variation is down just 13.8%.
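The text does not say whether “down” refers to a peak-to-trough drawdown or to a cumulative loss since inception. Assuming the former, a minimal sketch of how such a figure is computed from a variation's periodic returns (toy inputs, hypothetical function name):

```python
def max_drawdown(returns):
    """Largest peak-to-trough decline of an equity curve built from periodic returns."""
    wealth = peak = 1.0
    worst = 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = min(worst, wealth / peak - 1.0)
    return worst

print(max_drawdown([0.5, -0.5, 0.2]))  # -0.5
```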

These differences are even more evident if we plot the annual returns for each strategy’s rebalance variations. Note, in particular, the extreme differences in Value in 2009, Momentum in 2017, and Quality in 2003.

Conclusion

In this study, we have explored the impact of rebalance timing luck on the results of smart beta / equity style portfolios.

We empirically tested this impact by designing a variety of portfolio specifications for four different equity styles (Value, Momentum, Low Volatility, and Quality). The specifications varied by concentration as well as rebalance frequency. We then constructed all possible rebalance variations of each specification to calculate the realized impact of rebalance timing luck over the test period (2000-2019).

In line with our mathematical model, we generally find that those strategies with higher turnover have higher timing luck and those that rebalance more frequently have less timing luck.

The sheer magnitude of timing luck, however, may come as a surprise to many. For reasonably concentrated portfolios (100 stocks) with semi-annual rebalance frequencies (common in many index definitions), annual timing luck ranged from 1-to-4%, which translates to a 95% confidence interval in annual performance dispersion of about +/-3% to +/-12.5%.

This magnitude calls into question our ability to draw meaningful relative performance conclusions between two strategies.

We then explored more concrete examples, replicating the S&P 500 Enhanced Value, Momentum, Low Volatility, and Quality indices. In line with expectations, we find that Momentum (a high turnover strategy) exhibits significantly higher realized timing luck than a lower turnover strategy rebalanced more frequently (i.e. Low Volatility).

For these four indices, the amount of rebalance timing luck leads to a staggering level of dispersion in realized terminal wealth.

“But Corey,” you say, “this only has to do with systematic factor managers, right?”

Consider that most of the major equity style benchmarks are managed with annual or semi-annual rebalance schedules. Good luck to anyone trying to identify manager skill when their benchmark might be realizing hundreds of basis points of positive or negative performance luck a year.