Re-specifying the Fama French 3-Factor Model

On December 16, 2019

In Craftsmanship, Portfolio Construction, Risk & Style Premia, Risk Management, Value, Weekly Commentary

This post is available as a PDF download here.

Summary

The Fama French three-factor model provides a powerful tool for assessing exposures to equity risk premia in investment strategies.
In this note, we explore alternative specifications of the value (HML) and size (SMB) factors using price-to-earnings, price-to-cash flow, and dividend yield.
Running factor regressions using these alternate specifications on a suite of value ETFs and Newfound’s Systematic Value strategy leads to a wide array of results, both numerically and directionally.
While many investors consider the uncertainty of the parameter estimates from the regression using the three-factor model, most do not consider the uncertainty that comes from the assumption of how you construct the equity factors in the first place.
Understanding the additional uncertainty is crucial for managers and investors, who must consider what risks they are trying to measure and control by using tools like factor regression and make sure their assumptions align with their goals.

In their 1992 paper, The Cross-Section of Expected Stock Returns, Eugene Fama and Kenneth French outlined their three-factor model to explain stock returns.

While the Capital Asset Pricing Model (CAPM) only describes asset returns in relation to their exposure to the market’s excess return through the stock’s beta and identifies any return beyond that as alpha, Fama and French’s three-factor model reattributed some of that supposed alpha to exposures to a value factor (High-minus-low or HML) based on returns stratified by price-to-book ratios and a size factor (small-minus-big or SMB) based on returns stratified by market capitalization.

This gave investors a tool to judge investment strategies based on the loadings to these risk factors. A manager with a seemingly high alpha may have simply been investing in value and small-cap stocks historically.

The notion of compensated risk premia has also opened the floodgate of many additional factors from other researchers (such as momentum, quality, low beta, etc.) and even two more factors from Fama and French (investment and profitability).

A richer factor universe opens up a wide realm of possibilities for analysis and attribution. However, setting further developments aside and going back to the original three-factor model, we would be remiss if we didn’t dive a bit further into its specification.

At the highest level, we agree with treating “value” and “size” as risk factors, but there is more than one way to skin a factor.

What is “value”?

Fama and French define it using the price-to-book ratio of a stock. This seems legitimate for a broad swath of stocks, especially those that are very capital intensive – such as energy, manufacturing, and financial firms – but what about industries that have structurally lower book values and may have other potential price drivers? For example, a technology company might have significant intangible intellectual property and some utility companies might employ leverage, which decreases their book value substantially.

To determine value in these sectors, we might utilize ratios that account for sales, dividends, or earnings. But then if we analyzed these strategies using the Fama French three-factor model as it is specified, we might misjudge the loading on the value factor.

“Size” seems more straightforward. Companies with low market capitalizations are small. However, when we consider how the size factor is defined based on the value factor, there might even be some differences in SMB using different value metrics.

In this commentary, we will explore what happens when we alter the definition of value for the value factor (and hence the size factor) and see how this affects factor regressions of a sample of value ETFs along with our Systematic Value strategy.

HML Factor Definitions

In the standard version of the Fama French 3-factor model, HML is constructed as a self-financing long/short portfolio using a 2×3 sort on size and value. The investment universe is split in half based on market capitalization and in three parts (30%/40%/30%) based on valuation, in this base case, price-to-book ratio.

Using additional data from the Kenneth French Data Library and the same methodology, we will construct HML factors using sorts based on size and:

Price-to-earnings ratios
Price-to-cash flow ratios
Dividend yields

The common inception date for all the factors is June 1951.
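As a minimal sketch of the 2×3 construction, the snippet below combines the six size/value portfolios into HML and SMB, assuming the standard Fama French averaging (HML as the average of the two cheap portfolios minus the average of the two expensive ones, SMB as the average of the three small portfolios minus the average of the three big ones). The dictionary keys and the single-month returns are hypothetical labels and numbers for illustration only.

```python
import numpy as np

def factors_from_six_portfolios(portfolios):
    """Build HML and SMB from the six 2x3 size/value portfolio return series.

    `portfolios` maps hypothetical labels ('small_value', 'small_neutral',
    'small_growth', 'big_value', 'big_neutral', 'big_growth') to return arrays.
    """
    r = {k: np.asarray(v, dtype=float) for k, v in portfolios.items()}
    # HML: average of the two cheap portfolios minus average of the two expensive ones.
    hml = 0.5 * (r["small_value"] + r["big_value"]) \
        - 0.5 * (r["small_growth"] + r["big_growth"])
    # SMB: average of the three small portfolios minus average of the three big ones.
    smb = (r["small_value"] + r["small_neutral"] + r["small_growth"]) / 3.0 \
        - (r["big_value"] + r["big_neutral"] + r["big_growth"]) / 3.0
    return hml, smb

# Illustrative (made-up) one-month returns for the six portfolios.
example = {
    "small_value": [0.03], "small_neutral": [0.02], "small_growth": [0.00],
    "big_value": [0.01], "big_neutral": [0.01], "big_growth": [0.02],
}
hml, smb = factors_from_six_portfolios(example)
```

Because SMB is averaged over the value buckets, swapping the value metric changes which stocks land in each of the six portfolios and therefore changes SMB as well as HML.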

The chart below shows the growth of each of the four value factor portfolios.

Source: Kenneth French Data Library. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.

Over the entire time period – and for many shorter time horizons – the standard HML factor using price-to-book does not even have the most attractive returns. Price-to-earnings and price-to-cash flow often beat it out.

On the other hand, the HML factor formed using dividend yields doesn’t look so hot.

One of the reasons behind this is that the small, low dividend yield companies performed much better than the small companies that were ranked poorly by the other value factors. We can see this effect borne out in the SMB chart for each factor, as the SMB factor for dividend yield performed the best.

(Recall that we mentioned previously how the Fama French way of defining the size factor is dependent on which value metric we use.)

Source: Kenneth French Data Library. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.

Looking at the statistical significance of each factor through its t-statistic, we can see that Price-to-Earnings and Price-to-Cash Flow yielded higher significance for the HML factor than Price-to-Book. And those two along with Dividend Yield all eclipsed the Price-to-Book construction of the SMB factor.

T-Statistics for HML and SMB Using Various Value Metrics

	Price-to-Book	Dividend Yield	Price-to-Earnings	Price-to-Cash Flow
HML	2.9	0.0	3.7	3.4
SMB	1.0	2.4	1.6	1.9
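For readers who want to reproduce this kind of figure, here is a minimal sketch of a t-statistic for a factor's mean return. It assumes the statistic tests whether the average periodic return differs from zero using the ordinary standard error; the exact method behind the table above is not spelled out, so treat this as one plausible reading. The sample returns are made up.

```python
import numpy as np

def mean_return_t_stat(returns):
    """t-statistic for the null hypothesis that the mean return is zero."""
    r = np.asarray(returns, dtype=float)
    # Standard error of the mean uses the sample standard deviation (ddof=1).
    standard_error = r.std(ddof=1) / np.sqrt(r.size)
    return r.mean() / standard_error

# Made-up monthly factor returns, for illustration only.
t_stat = mean_return_t_stat([0.01, 0.03, -0.01, 0.02, 0.00])
```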

Assuming that we do consider all metrics to be appropriate ways to assess the value of companies, even if possibly under different circumstances, how do different variants of the Fama French three-factor model change for each scenario with regression analysis?

The Impact on Factor Regressions

Using a sample of U.S. value ETFs and our Systematic Value strategy, we plot the loadings for the different versions of HML. The regressions are carried out using the trailing three years of monthly data ending in October 2019.

Source: Tiingo, Kenneth French Data Library. Calculations by Newfound Research. Past performance is not an indicator of future results. Returns represent live strategy results. Returns for the Newfound Systematic Value strategy are gross of all management fees and taxes, but net of execution fees. Returns for ETFs included in study are gross of any management fees, but net of underlying ETF expense ratios. Returns assume the reinvestment of all distributions.

For each different specification of HML, the differences in the loadings between investments are generally directionally consistent. For instance, DVP has higher loadings than FTA for all forms of HML.

However, sometimes this is not the case.

VLUE looks more attractive than VTV based on price-to-cash flow but not dividend yield. FTA is roughly equivalent to QVAL in terms of loading when price-to-book is used for HML, but it varies wildly when other metrics are used.

The tightest range for the four models for any of the investments is 0.09 (PWV) and the widest is 0.52 (QVAL). When we factor in that these estimates each have their own uncertainty, distinguishing which investment has the better value characteristic is tough. Decisions are commonly made on much smaller differences.
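The comparison above can be sketched as follows: run the same three-factor regression once per factor specification and look at the spread of the resulting HML loadings. This is a simplified illustration using ordinary least squares via NumPy; the function names are our own for this example, and the data are random numbers rather than actual fund or factor returns.

```python
import numpy as np

def three_factor_loadings(excess_returns, mkt, smb, hml):
    """OLS loadings of a strategy's excess returns on MKT, SMB, and HML."""
    y = np.asarray(excess_returns, dtype=float)
    X = np.column_stack([np.ones_like(y), mkt, smb, hml])  # intercept = alpha
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["alpha", "mkt", "smb", "hml"], coef))

def hml_loading_range(excess_returns, mkt, specs):
    """specs maps a value-metric name to its (smb, hml) return series."""
    loadings = {
        name: three_factor_loadings(excess_returns, mkt, smb, hml)["hml"]
        for name, (smb, hml) in specs.items()
    }
    # Range = highest HML loading minus lowest across the specifications.
    return loadings, max(loadings.values()) - min(loadings.values())

# Synthetic 36-month example (random numbers, not real fund or factor data).
rng = np.random.default_rng(0)
mkt = rng.normal(0.006, 0.04, 36)
specs = {m: (rng.normal(0.0, 0.02, 36), rng.normal(0.0, 0.02, 36))
         for m in ["P/B", "P/E", "P/CF", "D/P"]}
fund = 1.0 * mkt + 0.4 * specs["P/B"][1] + rng.normal(0.0, 0.005, 36)
loadings, spread = hml_loading_range(fund, mkt, specs)
```

With only 36 monthly observations, each loading also carries its own standard error, so the range across specifications sits on top of the usual estimation uncertainty.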

We see similar dispersion in the SMB loadings for the various constructions.

Source: Tiingo, Kenneth French Data Library. Calculations by Newfound Research. Past performance is not an indicator of future results. Returns represent live strategy results. Returns for the Newfound Systematic Value strategy are gross of all management fees and taxes, but net of execution fees. Returns for ETFs included in study are gross of any management fees, but net of underlying ETF expense ratios. Returns assume the reinvestment of all distributions.

Many of these values are not statistically significantly different from zero, so someone who has a thorough understanding of uncertainty in regression would likely not draw a strict comparison between most of these investments.

However, one implication of this is that if a metric is chosen that does ascribe significant size exposure to one of these investments, an investor may make a decision based on not wanting to bear that risk in what they desire to be a large-cap investment.

Can We Blend Our Way Out?

One way we often mitigate model specification risk is by blending a number of models together into one.

By averaging all of our HML and SMB factors, respectively, we arrive at blended factors for the three-factor model.

Source: Tiingo, Kenneth French Data Library. Calculations by Newfound Research. Past performance is not an indicator of future results. Returns represent live strategy results. Returns for the Newfound Systematic Value strategy are gross of all management fees and taxes, but net of execution fees. Returns for ETFs included in study are gross of any management fees, but net of underlying ETF expense ratios. Returns assume the reinvestment of all distributions.
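A blended factor of this kind can be sketched as a simple equal-weight average of the individual factor return series in each period. The equal weighting is our reading of "averaging"; the function name and sample numbers are hypothetical.

```python
import numpy as np

def blend_factors(factor_returns):
    """Equal-weight average, period by period, of several factor return series.

    `factor_returns` maps a specification name to its return array; all
    arrays are assumed to cover the same dates.
    """
    stacked = np.vstack([np.asarray(v, dtype=float) for v in factor_returns.values()])
    return stacked.mean(axis=0)

# Made-up two-month HML returns under four value metrics.
blended_hml = blend_factors({
    "P/B": [0.010, -0.020],
    "P/E": [0.020, -0.010],
    "P/CF": [0.015, -0.015],
    "D/P": [-0.005, 0.005],
})
```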

All of the investments now have HML loadings at the top of their range of individual model loadings, and many (FTA, PWV, RPV, SPVU, VTV, and the Systematic Value strategy) have loadings to the blended HML factor that exceed the loadings for all of the individual models.

The opposite is the case for the blended SMB factor: the loadings are at the low end of the range of the individual model loadings.

Source: Tiingo, Kenneth French Data Library. Calculations by Newfound Research. Past performance is not an indicator of future results. Returns represent live strategy results. Returns for the Newfound Systematic Value strategy are gross of all management fees and taxes, but net of execution fees. Returns for ETFs included in study are gross of any management fees, but net of underlying ETF expense ratios. Returns assume the reinvestment of all distributions.

So which is the correct method?

That’s a good question.

For some investments, it is situation-specific. If a strategy only uses price-to-earnings as its value metric, then putting it up against a three-factor model using the P/E ratio to construct the factors is appropriate for judging the efficacy of harvesting that factor.

However, if we are concerned more generally about the abstract concept of “value”, then the blended model may be the best way to go.

Conclusion

In this study, we have explored the impact of model specification for the value and size factor in the Fama French three-factor model.

We empirically tested this impact by designing a variety of HML and SMB factors based on three additional value metrics (price-to-earnings, price-to-cash flow, and dividend yield). These factors were constructed using the same rules as for the standard method using price-to-book ratios.

Each factor, with the possible exception of the dividend yield-based HML, has performance that could make it a legitimate specification for the three-factor model over the time that common data is available.

Running factor regressions using these alternate specifications on a suite of value ETFs and Newfound’s Systematic Value strategy led to a wide array of results, both numerically and directionally.

While many investors consider the uncertainty of the parameter estimates from the regression using the three-factor model, most do not consider the uncertainty that comes from the assumption of how you construct the equity factors in the first place.

Understanding the additional uncertainty is crucial for decision-making. Managers and investors alike must consider what risks they are trying to measure and control by using tools like factor regression and make sure their assumptions align with their goals.

“Value” is in the eye of the beholder, and blind applications of two different value factors may lead to seeing double conclusions.

The Dumb (Timing) Luck of Smart Beta

By Corey Hoffstein

On November 18, 2019

In Craftsmanship, Defensive, Momentum, Popular, Portfolio Construction, Risk & Style Premia, Value, Weekly Commentary

Summary

In past research notes we have explored the impact of rebalance timing luck on strategic and tactical portfolios, even using our own Systematic Value methodology as a case study.
In this note, we generate empirical timing luck estimates for a variety of specifications for simplified value, momentum, low volatility, and quality style portfolios.
Relative results align nicely with intuition: higher concentration and less frequent rebalancing lead to increasing levels of realized timing luck.
For more reasonable specifications – e.g. 100 stock portfolios rebalanced semi-annually – timing luck ranges between 100 and 400 basis points depending upon the style under investigation, suggesting a significant risk of performance dispersion due only to when a portfolio is rebalanced and nothing else.
The large magnitude of timing luck suggests that any conclusions drawn from performance comparisons between smart beta ETFs or against a standard style index may be spurious.

We’ve written about the concept of rebalance timing luck a lot. It’s a cowbell we’ve been beating for over half a decade, with our first article going back to August 7th, 2013.

As a reminder, rebalance timing luck is the performance dispersion that arises from the choice of a particular rebalance date (e.g. semi-annual rebalances that occur in June and December versus March and September).

We’ve empirically explored the impact of rebalance timing luck as it relates to strategic asset allocation, tactical asset allocation, and even used our own Systematic Value strategy as a case study for smart beta. All of our results suggest that it has a highly non-trivial impact upon performance.

This summer we published a paper in the Journal of Index Investing that proposed a simple solution to the timing luck problem: diversification. If, for example, we believe that our momentum portfolio should be rebalanced every quarter – perhaps as an optimal balance of cost and signal freshness – then we proposed splitting our capital across the three portfolios that spanned different three-month rebalance periods (e.g. JAN-APR-JUL-OCT, FEB-MAY-AUG-NOV, MAR-JUN-SEP-DEC). This solution is referred to either as “tranching” or “overlapping portfolios.”

The paper also derived a formula for estimating timing luck ex-ante, with a simplified representation of:

L = (T / (2F)) x S

where L is the timing luck measure, T is the turnover rate of the strategy, F is how many times per year the strategy rebalances, and S is the volatility of a long/short portfolio that captures the difference between what a strategy is currently invested in and what it could be invested in if the portfolio were reconstructed at that point in time.

Without numbers, this equation still informs some general conclusions:

Higher turnover strategies have higher timing luck.
Strategies that rebalance more frequently have lower timing luck.
Strategies with a less constrained universe will have higher timing luck.
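To make these relationships concrete, here is a small sketch that assumes the simplified form L = (T / (2F)) x S, with T as annual turnover, F as rebalances per year, and S as the volatility of the long/short difference portfolio. The functional form should be checked against the paper, and the input values below are hypothetical, chosen only to show the direction of each effect.

```python
def timing_luck_estimate(turnover, rebalances_per_year, long_short_vol):
    """Ex-ante timing luck under the assumed form L = T / (2F) * S."""
    if rebalances_per_year <= 0:
        raise ValueError("rebalances_per_year must be positive")
    return turnover / (2.0 * rebalances_per_year) * long_short_vol

# Hypothetical inputs: 100% annual turnover, 10% long/short volatility.
semi_annual = timing_luck_estimate(1.0, 2, 0.10)
quarterly = timing_luck_estimate(1.0, 4, 0.10)       # more frequent rebalancing
high_turnover = timing_luck_estimate(2.0, 2, 0.10)   # double the turnover
```

Holding the other inputs fixed, doubling the rebalance frequency halves the estimate, while doubling turnover or the long/short volatility doubles it.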

The first and third points may seem similar but capture subtly different effects. This is likely best illustrated with two examples on different extremes. First consider a very high turnover strategy that trades within a universe of highly correlated securities. Now consider a very low turnover strategy that is either 100% long or 100% short U.S. equities. In the first case, the highly correlated nature of the universe means that differences in specific holdings may not matter as much, whereas in the second case the perfect inverse correlation means that small portfolio differences lead to meaningfully different performance.

L, in and of itself, is a bit tricky to interpret, but effectively attempts to capture the potential dispersion in performance between a particular rebalance implementation choice (e.g. JAN-APR-JUL-OCT) versus a timing-luck-neutral benchmark.

After half a decade, you’d think we’ve spilled enough ink on this subject.

But given that just about every single major index still does not address this issue, and since our passion for the subject clearly verges on fever pitch, here comes some more cowbell.

Equity Style Portfolio Definitions

In this note, we will explore timing luck as it applies to four simplified smart beta portfolios based upon holdings of the S&P 500 from 2000-2019:

Value: Sort on earnings yield.
Momentum: Sort on prior 12-1 month returns.
Low Volatility: Sort on realized 12-month volatility.
Quality: Sort on average rank-score of ROE, accruals ratio, and leverage ratio.

Quality is a bit more complicated only because the quality factor has far less consistency in accepted definition. Therefore, we adopted the signals utilized by the S&P 500 Quality Index.

For each of these equity styles, we construct portfolios that vary across two dimensions:

Number of Holdings: 50, 100, 150, 200, 250, 300, 350, and 400.
Frequency of Rebalance: Quarterly, Semi-Annually, and Annually.

For the different rebalance frequencies, we also generate portfolios that represent each possible rebalance variation of that mix. For example, Momentum portfolios with 50 stocks that rebalance annually have 12 possible variations: a January rebalance, February rebalance, et cetera. Similarly, there are 12 possible variations of Momentum portfolios with 100 stocks that rebalance annually.
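The set of rebalance variations for a given frequency can be enumerated mechanically. The sketch below assumes evenly spaced rebalances within the calendar year and labels each variation by its rebalance months; the function name is our own for this illustration.

```python
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def rebalance_variations(rebalances_per_year):
    """List every evenly spaced rebalance schedule for a given frequency."""
    if rebalances_per_year <= 0 or 12 % rebalances_per_year != 0:
        raise ValueError("rebalances_per_year must divide 12")
    step = 12 // rebalances_per_year  # months between rebalances
    # Each possible starting month within one step defines a distinct variation.
    return [
        "-".join(MONTHS[start + k * step] for k in range(rebalances_per_year))
        for start in range(step)
    ]

quarterly = rebalance_variations(4)    # 3 variations
semi_annual = rebalance_variations(2)  # 6 variations
annual = rebalance_variations(1)       # 12 variations
```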

By explicitly calculating the rebalance date variations of each Style x Holding x Frequency combination, we can construct an overlapping portfolios solution. To estimate empirical annualized timing luck, we calculate the standard deviation of monthly return dispersion between each rebalance date variation and the overlapping portfolio solution and annualize the result.
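One way to sketch this calculation is shown below. It makes two simplifying assumptions that may differ from the actual study: the overlapping portfolio is approximated as the equal-weight average of the variations' monthly returns, and the per-variation tracking volatilities are averaged into a single number. The return series are made up.

```python
import numpy as np

def empirical_timing_luck(variation_returns):
    """Annualized timing luck from monthly returns of rebalance variations.

    `variation_returns` has shape (n_months, n_variations).
    """
    r = np.asarray(variation_returns, dtype=float)
    # Assumption: overlapping portfolio = equal-weight average of the variations.
    overlap = r.mean(axis=1, keepdims=True)
    # Monthly volatility of each variation's return difference versus the overlap.
    monthly_vol = (r - overlap).std(axis=0, ddof=1)
    # Assumption: average across variations, then annualize by sqrt(12).
    return float(monthly_vol.mean() * np.sqrt(12))

# Made-up monthly returns for two rebalance variations.
luck = empirical_timing_luck([
    [0.010, -0.010],
    [-0.010, 0.010],
    [0.010, -0.010],
    [-0.010, 0.010],
])
```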

Empirical Timing Luck Results

Before looking at the results plotted below, we would encourage readers to hypothesize as to what they expect to see. Perhaps not in absolute magnitude, but at least in relative magnitude.

For example, based upon our understanding of the variables affecting timing luck, would we expect an annually rebalanced portfolio to have more or less timing luck than a quarterly rebalanced one?

Should a more concentrated portfolio have more or less timing luck than a less concentrated variation?

Which factor has the greatest risk of exhibiting timing luck?

Source: Sharadar. Calculations by Newfound Research.

To create a sense of scale across the styles, below we isolate the results for semi-annual rebalancing for each style and plot it.

Source: Sharadar. Calculations by Newfound Research.

In relative terms, there is no great surprise in these results:

More frequent rebalancing limits the risk of portfolios changing significantly between rebalance dates, thereby decreasing the impact of timing luck.
More concentrated portfolios exhibit larger timing luck.
Faster-moving signals (e.g. momentum) tend to exhibit more timing luck than more stable, slower-moving signals (e.g. low volatility).

What is perhaps the most surprising is the sheer magnitude of timing luck. Consider that the S&P 500 Enhanced Value, Momentum, Low Volatility, and Quality portfolios all hold 100 securities and are rebalanced semi-annually. Our study suggests that timing luck for such approaches may be as large as 2.5%, 4.4%, 1.1%, and 2.0% respectively.

But what does that really mean? Consider the realized performance dispersion of different rebalance date variations of a Momentum portfolio that holds the top 100 securities in equal weight and is rebalanced on a semi-annual basis.

Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.

The 4.4% estimate of annualized timing luck is a measure of dispersion between each underlying variation and the overlapping portfolio solution. If we isolate two sub-portfolios and calculate rolling 12-month performance dispersion, we can see that the difference can be far larger, as one might exhibit positive timing luck while the other exhibits negative timing luck. Below we do precisely this for the APR-OCT and MAY-NOV rebalance variations.

Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.

In fact, since these variations are identical in every which way except for the date on which they rebalance, a portfolio that is long the APR-OCT variation and short the MAY-NOV variation would explicitly capture the effects of rebalance timing luck. If we assume the rebalance timing luck realized by these two portfolios is independent (which our research suggests it is), then the volatility of this long/short is approximately the rebalance timing luck estimated above scaled by the square-root of two.

Derivation: For variations v_i and v_j and overlapping-portfolio solution V, then:
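A sketch of this step, assuming each variation's deviation from V has volatility equal to the timing luck L and that the two deviations are uncorrelated (the independence assumption stated above):

```latex
\begin{aligned}
\operatorname{Var}(v_i - v_j)
  &= \operatorname{Var}\big((v_i - V) - (v_j - V)\big) \\
  &= \operatorname{Var}(v_i - V) + \operatorname{Var}(v_j - V)
     - 2\,\operatorname{Cov}(v_i - V,\; v_j - V) \\
  &= L^2 + L^2 - 0 = 2L^2 \\
\sigma(v_i - v_j) &= \sqrt{2}\,L
\end{aligned}
```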

Thus, if we are comparing two identically-managed 100-stock momentum portfolios that rebalance semi-annually, our 95% confidence interval for performance dispersion due to timing luck is +/- 12.4% (2 x SQRT(2) x 4.4%).

Even for more diversified, lower turnover portfolios, this remains an issue. Consider a 400-stock low-volatility portfolio that is rebalanced quarterly. Empirical timing luck is still 0.5%, suggesting a 95% confidence interval of +/- 1.4%.
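The arithmetic behind these intervals is easy to check. The sketch below applies the 2 x SQRT(2) scaling to a timing luck estimate, where the multiplier of 2 is the usual rough stand-in for a 95% interval; the function name is our own.

```python
import math

def dispersion_interval(timing_luck, z=2.0):
    """Half-width of the interval for dispersion between two variations.

    Scales timing luck by sqrt(2) (two independent variations) and by z
    (roughly 2 for a 95% interval).
    """
    return z * math.sqrt(2.0) * timing_luck

momentum_100_semi_annual = dispersion_interval(4.4)  # timing luck in percent
low_vol_400_quarterly = dispersion_interval(0.5)
```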

S&P 500 Style Index Examples

One critique of the above analysis is that it is purely hypothetical: the portfolios studied above aren’t really those offered in the market today.

We will take our analysis one step further and replicate (to the best of our ability) the S&P 500 Enhanced Value, Momentum, Low Volatility, and Quality indices. We then create different rebalance schedule variations. Note that the S&P 500 Low Volatility index rebalances quarterly, so there are only three possible rebalance variations to compute.

Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.

We see a meaningful dispersion in terminal wealth levels, even for the S&P 500 Low Volatility index, which appears at first glance in the graph to have little impact from timing luck.

	Minimum Terminal Wealth	Maximum Terminal Wealth
Enhanced Value	$4.45	$5.45
Momentum	$3.07	$4.99
Low Volatility	$6.16	$6.41
Quality	$4.19	$5.25

We should further note that there does not appear to be one set of rebalance dates that does significantly better than the others. For Value, FEB-AUG looks best while JUN-DEC looks the worst; for Momentum it’s almost precisely the opposite.

Furthermore, we can see that even seemingly closely related rebalances can have significant dispersion: consider MAY-NOV and JUN-DEC for Momentum. Here is a real doozy of a statistic: at one point, the MAY-NOV implementation for Momentum is down 50.3% while the JUN-DEC variation is down just 13.8%.

These differences are even more evident if we plot the annual returns for each strategy’s rebalance variations. Note, in particular, the extreme differences in Value in 2009, Momentum in 2017, and Quality in 2003.

Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.

Conclusion

In this study, we have explored the impact of rebalance timing luck on the results of smart beta / equity style portfolios.

We empirically tested this impact by designing a variety of portfolio specifications for four different equity styles (Value, Momentum, Low Volatility, and Quality). The specifications varied by concentration as well as rebalance frequency. We then constructed all possible rebalance variations of each specification to calculate the realized impact of rebalance timing luck over the test period (2000-2019).

In line with our mathematical model, we generally find that those strategies with higher turnover have higher timing luck and those that rebalance more frequently have less timing luck.

The sheer magnitude of timing luck, however, may come as a surprise to many. For reasonably concentrated portfolios (100 stocks) with semi-annual rebalance frequencies (common in many index definitions), annual timing luck ranged from 1-to-4%, which translated to a 95% confidence interval in annual performance dispersion of about +/-3% to +/-12.5%.

The sheer magnitude of timing luck calls into question our ability to draw meaningful relative performance conclusions between two strategies.

We then explored more concrete examples, replicating the S&P 500 Enhanced Value, Momentum, Low Volatility, and Quality indices. In line with expectations, we find that Momentum (a high turnover strategy) exhibits significantly higher realized timing luck than a lower turnover strategy rebalanced more frequently (i.e. Low Volatility).

For these four indices, the amount of rebalance timing luck leads to a staggering level of dispersion in realized terminal wealth.

“But Corey,” you say, “this only has to do with systematic factor managers, right?”

Consider that most of the major equity style benchmarks are managed with annual or semi-annual rebalance schedules. Good luck to anyone trying to identify manager skill when your benchmark might be realizing hundreds of basis points of positive or negative performance luck a year.

The Limit of Factor Timing

By Nathan Faber

On November 11, 2019

In Craftsmanship, Momentum, Risk & Style Premia, Weekly Commentary

Summary

We have shown previously that it is possible to time factors using value and momentum but that the benefit is not large.
By constructing a simple model for factor timing, we examine what accuracy would be required to do better than a momentum-based timing strategy.
While the accuracy required is not high, finding the system that achieves that accuracy may be difficult.
For investors focused on managing the risks of underperformance – both in magnitude and frequency – a diversified factor portfolio may be the best choice.
Investors seeking outperformance will have to bear more concentration risk and may be open to more model risk as they forego the diversification among factors.

A few years ago, we began researching factor timing – moving among value, momentum, low volatility, quality, size, etc. – with the hope of earning returns in excess not only of the equity market, but also of buy-and-hold factor strategies.

To time the factors, our natural first course of action was to exploit the behavioral biases that may create the factors themselves. We examined value and momentum across the factors and used these metrics to allocate to factors that we expected to outperform in the future.

The results were positive. However, taking into account transaction costs led to the conclusion that investors were likely better off simply holding a diversified factor portfolio.

We then looked at ways to time the factors using the business cycle.

The results in this case were even less convincing and were a bit too similar to a data-mined optimal solution to instill much faith going forward.

But this evidence does not necessarily remove the temptation to take a stab at timing the factors, especially since explicit transactions costs have been slashed for many investors accessing long-only factors through ETFs.

Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.

After all, there is a lot to gain by choosing the right factors. For example, in the first 9 months of 2019, the spread between the best (Quality) and worst (Value) performing factors was nearly 1,000 basis points (“bps”). One month prior, that spread had been double!

In this research note, we will move away from devising a systematic approach to timing the factors (as AQR asserts, this is deceptively difficult) and instead focus on what a given method would have to overcome to achieve consistent outperformance.

Benchmarking Factor Timing

With all equity factor strategies, the goal is usually to outperform the market-cap weighted equity benchmark.

Since all factor portfolios can be thought of as a market cap weighted benchmark plus a long/short component that captures the isolated factor performance, we can focus our study solely on the long/short portfolio.

Using the common definitions of the factors (from Kenneth French and AQR), we can look at periods over which these self-financing factor portfolios generate positive returns to see if overlaying them on a market-cap benchmark would have added value over different lengths of time.¹

We will also include the performance of an equally weighted basket of the four factors (“Blend”).

Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.

The persistence of factor outperformance over one-month periods is transient. If the goal is to outperform the most often, then the blended portfolio satisfies this requirement, and any timing strategy would have to be accurate enough to overcome this already existing spread.

Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.
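The frequency-of-outperformance statistic can be sketched as the share of rolling periods in which a long/short factor portfolio compounds to a positive return. The helper names and the sample returns below are hypothetical, and the blend is taken as the equal-weight average of the factors' monthly returns.

```python
import numpy as np

def rolling_compound_returns(monthly_returns, window):
    """Compounded return over every rolling `window`-month period."""
    r = np.asarray(monthly_returns, dtype=float)
    growth = np.concatenate([[1.0], np.cumprod(1.0 + r)])
    return growth[window:] / growth[:-window] - 1.0

def hit_rate(monthly_returns, window):
    """Fraction of rolling periods with a positive compounded return."""
    rolled = rolling_compound_returns(monthly_returns, window)
    return float((rolled > 0).mean())

# Made-up monthly long/short returns for two factors and their blend.
factor_a = np.array([0.03, -0.02, 0.01, -0.01, 0.02, -0.03])
factor_b = np.array([-0.01, 0.03, -0.02, 0.02, -0.01, 0.04])
blend = (factor_a + factor_b) / 2.0
one_month_hit_rates = {name: hit_rate(series, 1)
                       for name, series in [("A", factor_a), ("B", factor_b), ("Blend", blend)]}
```

Changing `window` to 12 or 36 gives the corresponding one-year and three-year versions of the same statistic.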

The results for the blended portfolio are so much better than the stand-alone factors because the factors have correlations much lower than many other asset classes, allowing even naïve diversification to add tremendous value.

The blended portfolio also cuts downside risk in terms of returns. If the timing strategy is wrong, and chooses, for example, momentum in an underperforming month, then it could take longer for the strategy to climb back to even. But investors are used to short periods of underperformance and often (we hope) realize that some short-term pain is necessary for long-term gains.

Looking at the same analysis over rolling 1-year periods, we do see some longer periods of factor outperformance. Some examples are quality in the 1980s, value in the mid-2000s, momentum in the 1960s and 1990s, and size in the late-1970s.

However, there are also decent stretches where the factors underperform. For example, the recent decade for value, quality in the early 2010s, momentum sporadically in the 2000s, and size in the 1980s and 1990s. If the timing strategy gets stuck in these periods, then there can be a risk of abandoning it.

Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.

Again, a blended portfolio would have addressed many of these underperforming periods, giving up some of the upside with the benefit of reducing the risk of choosing the wrong factor in periods of underperformance.

Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.

And finally, if we extend our holding period to three years, which may be used for a slower moving signal based on either value or the business cycle, we see that the diversified portfolio still exhibits outperformance over the most rolling periods and has a strong ratio of upside to downside.

Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.

The diversified portfolio stands up to scrutiny against the individual factors but could a generalized model that can time the factors with a certain degree of accuracy lead to better outcomes?

Generic Factor Timing

To construct a generic factor timing model, we will consider a strategy that decides to hold each factor or not with a certain degree of accuracy.

For example, if the accuracy is 50%, then the strategy would essentially flip a coin for each factor. Heads and that factor is included in the portfolio; tails and it is left out. If the accuracy is 55%, then the strategy will hold the factor with a 55% probability when the factor return is positive and not hold the factor with the same probability when the factor return is negative. Just to be clear, this strategy is constructed with look-ahead bias as a tool for evaluation.

All factors included in the portfolio are equally weighted, and if no factors are included, then the return is zero for that period.
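The rules above can be sketched in a few lines. This is a minimal illustration under our own assumptions: the function name `timed_returns` is hypothetical, a factor return of exactly zero is treated as a negative month, and the sample returns are made up.

```python
import random

def timed_returns(factor_returns, accuracy, rng):
    """Simulate one path of the toy timing model.

    factor_returns: list of periods, each a list with one return per factor.
    accuracy: probability of making the "right" call for each factor.
    rng: a random.Random instance, passed in so that runs are reproducible.
    """
    path = []
    for period in factor_returns:
        held = []
        for r in period:
            correct = rng.random() < accuracy
            # A correct call holds a winner or skips a loser;
            # an incorrect call does the opposite.
            hold = correct if r > 0 else not correct
            if hold:
                held.append(r)
        # Equal-weight the held factors; earn zero if none are held.
        path.append(sum(held) / len(held) if held else 0.0)
    return path

# Made-up returns for four factors over three months.
example = [
    [0.02, -0.01, 0.03, -0.02],
    [-0.01, 0.01, 0.00, 0.02],
    [0.01, 0.02, -0.03, -0.01],
]
print(timed_returns(example, accuracy=0.55, rng=random.Random(42)))
```

Because the rule looks at the sign of the return it is about to earn, it is usable only as an evaluation tool, exactly as noted above.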

This toy model will allow us to construct distributions to see where the blended portfolio of all the factors falls in terms of frequency of outperformance (hit rate), average outperformance, and average underperformance. The following charts show the percentiles of the diversified portfolio for the different metrics and model accuracies using 1,000 simulations.²

Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.
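To make the three metrics and the percentile ranking concrete, here is a small sketch. The helper names (`summary`, `percentile_rank`) and the simulated hit rates are hypothetical and purely illustrative; a real run would feed in the 1,000 simulated paths.

```python
def summary(returns):
    """Hit rate, average outperformance, and average underperformance."""
    wins = [r for r in returns if r > 0]
    losses = [r for r in returns if r < 0]
    return {
        "hit_rate": len(wins) / len(returns),
        "avg_out": sum(wins) / len(wins) if wins else 0.0,
        "avg_under": sum(losses) / len(losses) if losses else 0.0,
    }

def percentile_rank(value, simulated):
    """Share of simulated values that `value` matches or beats."""
    return sum(s <= value for s in simulated) / len(simulated)

# Made-up numbers: the blend's returns and a few simulated hit rates.
blend_stats = summary([0.02, -0.01, 0.03, -0.03])
simulated_hit_rates = [0.40, 0.45, 0.50, 0.55]
print(blend_stats)
print(percentile_rank(blend_stats["hit_rate"], simulated_hit_rates))
```

For average underperformance, a higher (less negative) value is better, so a high percentile still means the blend compares favorably.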

In terms of hit rate, the diversified portfolio behaves in the top tier of the models over all time periods for accuracies up to about 57%. Even with a model that is 60% accurate, the diversified portfolio was still above the median.

Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.

For average underperformance, the diversified portfolio also did very well in the context of these factor timing models. The low correlation between the factors leads to opportunities for the blended portfolio to limit the downside of individual factors.

Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.

For average outperformance, the diversified portfolio did much worse than the timing model over all time horizons. We can attribute this also to the low correlation between the factors, as choosing only a subset of factors and equally weighting them often leads to more extreme returns.

Overall, the diversified portfolio manages the risks of underperformance, both in magnitude and in frequency, at the expense of sacrificing outperformance potential. We saw this in the first section when we compared the diversified portfolio to the individual factors.

But if we want to have increased return potential, we will have to introduce some model risk to time the factors.

Checking in on Momentum

Momentum is one model-based way to time the factors. Under our definition of accuracy in the toy model, a 12-1 momentum strategy on the factors has an accuracy of about 56%. While the diversified portfolio exhibited some metrics in line with strategies that were even more accurate than this, it never bore concentration risk: it always held all four factors.

Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.
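For readers who want to see how an accuracy figure like this could be measured, the sketch below assumes a time-series version of 12-1 momentum: hold the factor when its compounded return over the prior twelve months, excluding the most recent month, is positive. The commentary does not spell out the exact construction, so this reading, along with the names `momentum_signal` and `signal_accuracy`, is an assumption.

```python
from math import prod

def momentum_signal(returns, t, lookback=12, skip=1):
    """True if the compounded return over months t-lookback .. t-skip-1
    is positive. Month t is the month being predicted."""
    window = returns[t - lookback:t - skip]
    return prod(1.0 + r for r in window) - 1.0 > 0

def signal_accuracy(returns, lookback=12, skip=1):
    """Share of months in which the signal made the 'right' call:
    held before a positive month or stood aside before a non-positive one."""
    if len(returns) <= lookback:
        raise ValueError("need more than `lookback` months of returns")
    correct = 0
    total = 0
    for t in range(lookback, len(returns)):
        hold = momentum_signal(returns, t, lookback, skip)
        correct += hold == (returns[t] > 0)
        total += 1
    return correct / total

# Made-up monthly returns for a single factor.
sample = [0.01, -0.02, 0.03, 0.01, -0.01, 0.02, 0.01,
          -0.03, 0.02, 0.01, 0.00, 0.01, 0.02, -0.01, 0.01]
print(round(signal_accuracy(sample), 2))
```

Pooling the correct and total counts across all four factors would give a single accuracy number comparable to the toy model's.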

For the hit rate percentiles of the momentum strategy, we see a more subdued response. Momentum does not win as much as the diversified portfolio over the different time periods.

But not winning as much can be fine if you win bigger when you do win.

The charts below show that momentum does indeed have a higher outperformance percentile but with a worse underperformance percentile, especially for 1-month periods, likely due to mean reversionary whipsaw.

Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions. Data from July 1957 – September 2019.

While momentum is definitely not the only way to time the factors, it is a good baseline to see what is required for higher average outperformance.

Now, turning back to our generic factor timing model, what accuracy would you need to beat momentum?

Sharpening our Signal

The answer is: not a whole lot. Most of the time, we only need to be about 53% accurate to beat the momentum-based factor timing.

Source: Kenneth French Data Library, AQR. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.

The caveat is that this is the median performance of the simulations. The accuracy figure climbs closer to 60% if we use the 25th percentile as our target.
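One way to frame this search is sketched below: for each accuracy level, take a chosen percentile of the simulated metric and return the lowest accuracy at which it exceeds the momentum strategy's value. The names `percentile` and `required_accuracy`, the nearest-rank percentile convention, and all of the numbers are illustrative assumptions rather than the figures behind the chart.

```python
from math import ceil

def percentile(values, q):
    """Nearest-rank percentile, with q between 0 (exclusive) and 100."""
    ordered = sorted(values)
    rank = max(1, ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

def required_accuracy(simulated, target, q=50):
    """Lowest accuracy whose q-th percentile outcome beats `target`.

    simulated: dict mapping accuracy -> list of simulated metric values.
    Returns None if no accuracy level clears the target.
    """
    for accuracy in sorted(simulated):
        if percentile(simulated[accuracy], q) > target:
            return accuracy
    return None

# Made-up simulated outcomes for three accuracy levels.
simulated = {
    0.50: [0.0, 0.1, 0.2, 0.3],
    0.55: [0.2, 0.3, 0.4, 0.5],
    0.60: [0.4, 0.5, 0.6, 0.7],
}
momentum_metric = 0.25
print(required_accuracy(simulated, momentum_metric, q=50))
print(required_accuracy(simulated, momentum_metric, q=25))
```

Targeting a lower percentile asks more of the model, which is why the required accuracy rises as the target moves from the median toward the 25th percentile.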

While these may not seem like extremely high requirements for running a successful factor timing strategy, it is important to observe that not many investors are doing this. True accuracy may be hard to discover, and sticking with the system may be even harder when the true accuracy can never be known.

Conclusion

If you made it this far looking for some rosy news on factor timing or the Holy Grail of how to do it skillfully, you may be disappointed.

However, for most investors looking to generate some modest benefits relative to market-cap equity, there is good news. Any signal for timing factors does not have to be highly accurate to perform well, and in the absence of a signal for timing, a diversified portfolio of the factors can lead to successful results by the metrics of average underperformance and frequency of underperformance.

For those investors looking for higher outperformance, concentration risk will be necessary.

Any timing strategy on low correlation investments will generally forego significant diversification in the pursuit of higher returns.

While this may be the goal when constructing the strategy, we should always pause and determine whether the potential benefits outweigh the costs. Transaction costs may be lower now. However, there are still operational burdens and the potential stress caused by underperformance when a system is not automated or when results are tracked too frequently.

Factor timing may be possible, but timing and tactical rotation may be better suited to scenarios where some of the model risk can be mitigated.

Global Growth-Trend Timing

By Steven Braun

On November 4, 2019

In Portfolio Construction, Trend, Weekly Commentary

This post is available as a PDF download here.

Summary

While trend following may help investors avoid prolonged drawdowns, it is susceptible to whipsaw where false signals cause investors to either buy high and sell low (realizing losses) or sell low and buy high (a missed opportunity).
Empirical evidence suggests that using economic data in the United States as a filter of when to employ trend-following – a “growth-trend timing” model – has historically been fruitful.
When evaluated in other countries, growth-trend timing has been historically successful in mitigating whipsaw losses without sacrificing the ability to avoid large drawdowns. However, we see mixed results on whether this actually improves upon naïve trend-following.
We find that countries that can be influenced by factors originating outside of their borders might not benefit from an introspective economic signal.

We apologize in advance, as this commentary will be fairly graph- and table-heavy.

We have written fairly extensively on the topic of factor timing in the past, and much of the apparent success has proven hard both to implement and to recreate out of sample.

One of the inherent pains of trend following is the existence of whipsaws, or more precisely, the misidentification of perceived market trends, which turn out to be more noise than signal. An article from Philosophical Economics proposed using several economic indicators to tune down the noise that might affect price-driven signals such as trend following. Generally, this strategy imposed an overlay that turned trend following “on” when the change in the economic indicators was negative year-over-year, signaling a higher likelihood of recession, and conversely, adopted a buy-and-hold stance when the economic indicators were not flashing warning lights.

This strategy presents a certain appeal as leading economic indicators may, as their name implies, lead the market for some time until capital preservation is warranted. Switching to a trend-following approach may allow a strategy to continue to participate in market appreciation while it lasts. On the other hand, using economic confirmation as a filter may help a strategy avoid the whipsaw costs generated from noisy market dips while positive economic conditions persist.

In an effort to test such a strategy out-of-sample, we took the approach global, hoping to capture a broader cross-section of economic and market environments.

First, we will consider trend following with no timing using the economic indicators.¹

Below we plot the equity curves for Australia, Germany, Italy, Japan, Singapore, the United Kingdom, and the United States, alongside a strategy that is long the market when the market is above the trailing twelve-month average (“12 Month average”) and steps to cash when the price is below it. The ratio between the two is also included to show the relative cumulative performance between the trend strategy and the respective market. An increasing ratio means that the trend following strategy is adding value over buy-and-hold.

Source: MSCI, Global Financial Data. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.
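The trend rule described above can be sketched as follows. This is a simplified illustration that assumes month-end prices, a zero return on cash, and a signal measured at the end of one month and applied to the next month's return; the function names and the prices are hypothetical.

```python
def trend_following(prices, window=12):
    """Returns of a strategy that holds the market over the next period
    when the current price is above its trailing `window`-period average,
    and otherwise sits in cash (assumed here to earn zero)."""
    strategy = []
    for t in range(window - 1, len(prices) - 1):
        average = sum(prices[t - window + 1:t + 1]) / window
        market_return = prices[t + 1] / prices[t] - 1.0
        strategy.append(market_return if prices[t] > average else 0.0)
    return strategy

def relative_ratio(strategy_returns, market_returns):
    """Cumulative strategy growth divided by cumulative market growth."""
    ratio, s, m = [], 1.0, 1.0
    for rs, rm in zip(strategy_returns, market_returns):
        s *= 1.0 + rs
        m *= 1.0 + rm
        ratio.append(s / m)
    return ratio

# Made-up prices; a 3-month window keeps the example short.
prices = [100, 102, 104, 106, 100, 90, 95]
window = 3
strategy = trend_following(prices, window)
market = [prices[t + 1] / prices[t] - 1.0
          for t in range(window - 1, len(prices) - 1)]
print([round(x, 3) for x in relative_ratio(strategy, market)])
```

A rising ratio corresponds to the trend strategy adding value over buy-and-hold, and a falling ratio to periods in which it lags.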

Through the graphs above, it becomes clear that much of the trend premium is realized by avoiding the large, prolonged bear markets that tend to occur during economic distress. In between these periods, however, the trend strategy lags the market. It makes sense, then, that a potential improvement to this strategy would be to implement an augmentation that could better distinguish between real price break-outs and those that lead to a whipsaw in the portfolio.

Growth-Trend Timing

For each country, we look at a number of economic indicators, including: corporate earnings growth, employment, housing starts, industrial production, and retail sales growth.² The strategy then followed the same rules as described above: if the economic indicator in question displays a negative percentage change over the previous twelve-month period, a position is taken in a trend following strategy utilizing a twelve-month moving average signal. Otherwise, a buy-and-hold position is established.

To ensure that we are not benefitting from look-ahead bias, a lag of three months was imposed on each of the economic indicators, as it would be unrealistic to assume that the economic levels would be known at the end of each month.
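A minimal sketch of the switching rule, including the three-month lag, might look like the following. The index alignment (the reading used for month `t` is the one from month `t - lag`), the assumption that the indicator is a positive level series, and the function names are ours.

```python
def indicator_is_negative(indicator, t, lookback=12, lag=3):
    """True if the lagged indicator fell over the prior `lookback` months.

    indicator: list of monthly levels (assumed positive).
    t: index of the month for which the position is being chosen.
    """
    latest = t - lag  # most recent reading assumed available for month t
    if latest - lookback < 0:
        raise ValueError("not enough indicator history")
    change = indicator[latest] / indicator[latest - lookback] - 1.0
    return change < 0

def gtt_return(indicator, market_return, trend_return, t, lookback=12, lag=3):
    """Earn the trend-following return when the indicator is negative,
    otherwise earn the buy-and-hold market return."""
    if indicator_is_negative(indicator, t, lookback, lag):
        return trend_return
    return market_return

# Made-up indicator levels: one rising series and one falling series.
rising = [100 + i for i in range(16)]
falling = [100 - i for i in range(16)]
print(gtt_return(rising, market_return=0.02, trend_return=0.0, t=15))
print(gtt_return(falling, market_return=-0.05, trend_return=0.0, t=15))
```

With a rising indicator the strategy simply earns the market return; with a falling one it hands control to the trend rule, which may or may not be invested.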

Unfortunately, some of the economic data could not be found for the entire period over which prices are available. The analysis can still prove beneficial, though, by indicating the economic regimes in which trend following benefits from growth-trend timing and by identifying cases where one indicator may work when another does not.³

In the charts below, we plot the growth-trend timing (referred to as GTT for the remainder of this commentary) for each country utilizing the available signals. The charts represent the relative cumulative performance over the respective country’s market return. For example, when the lines remain flat, the GTT approach has adopted buy-and-hold exposure and therefore matches the respective market’s returns. Any changes in the ratios are due to the GTT strategy investing in the trend following strategy.

Source: MSCI, Global Financial Data, St. Louis Fed, Bloomberg. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.

What we see from the above figures is a mixed bag of results.

The overlay of economic indicators was largely successful in mitigating whipsaw losses, as each country reaped the benefits of being primarily long the market during bull markets. As the 12-month moving average strategy tended to slowly give up a portion of the gains realized from severe market environments, the majority of the GTT strategies remained relatively stagnant until the next major correction.

There are some instances, however, where the indicator was late to the economic party. It is worth remembering that the market is, in theory, a forward-looking measure, and therefore sudden economic shocks may not be captured in economic data as quickly as they are in market returns. This created cases where the strategy either missed the chance to be out of the market during a correction or was sitting on the sidelines during the subsequent recoveries. Notably, the employment signal in Australia, Italy, Singapore, and the United Kingdom tended to be a poor leading indicator as the strategy tended to be invested longer in the bear markets than the trend strategy.

A Candidate for Ensembling

The implicit assumption in the analysis above is that the included indicators behave in similar ways. For example, by using a twelve-month lookback period for the indicators, we are assuming that each indicator will begin to trend in roughly the same way.

That may not be a particularly fair assumption. Whereas housing starts and retail sales are generally considered leading indicators, employment (unemployment) rates are normally categorized as lagging indicators. For this reason, it may be more beneficial to use a shorter lookback period so as to pick up on potential problems in the economy as they begin to present themselves. Further, some signals tend to be more erratic than others, suggesting that a meaningful lookback period for one indicator may not be meaningful for another. With no perfect reason to prefer one lookback over another, we might consider different lookback periods so as to diversify any specification risk that may exist within the strategy.

With the benefit of hindsight, we know that not all recessions occur for the same reasons, so being reliant on one signal that has worked in the past may not be as beneficial in the future. With this in mind, we should consider that all indicators hold some information as to the state of the economy since one indicator may be signaling the all-clear while another may be flashing warning lights.

For the same reason medical professionals take multiple readings to gain insight into the state of the body, we should also consider any available signals to ascertain the health of the economy.

To ensemble this strategy, we will vary the lookbacks from six to eighteen months, while holding the lag at three months, as well as combine the available economic signals for each country. For the sake of brevity, we will hold the trend-following strategy the same with a twelve-month moving average.

Remember, if the economic signal is negative, it does not mean that we are immediately out of the market: a negative economic signal simply moves the strategy into a trend-following approach. With 5 economic indicators and 13 lookback periods, we have 65 possible strategies for each country. As an example, if 40 of these 65 models were positive and 25 were negative, we would hold 62% in the market and 38% in the trend following strategy.
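The arithmetic in this example can be reproduced with a short sketch. The indicator labels and the `ensemble_weights` helper are our own naming, and marking the first 40 models as positive is simply a way to recreate the 40-versus-25 split.

```python
from itertools import product

INDICATORS = ["earnings", "employment", "housing_starts",
              "industrial_production", "retail_sales"]
LOOKBACKS = range(6, 19)  # six to eighteen months, inclusive

def ensemble_weights(signals):
    """Split the portfolio by the share of positive signals.

    signals: dict mapping (indicator, lookback) -> True if the signal is
    positive. Positive models vote for buy-and-hold; negative models
    vote for the trend-following strategy.
    """
    positive = sum(1 for is_positive in signals.values() if is_positive)
    market_weight = positive / len(signals)
    return market_weight, 1.0 - market_weight

models = list(product(INDICATORS, LOOKBACKS))
# Hypothetical state of the world: 40 positive models, 25 negative.
signals = {model: i < 40 for i, model in enumerate(models)}
market_weight, trend_weight = ensemble_weights(signals)
print(len(models), round(market_weight, 2), round(trend_weight, 2))
# prints: 65 0.62 0.38
```

Where an indicator is unavailable for a country, its models would simply be left out of `signals`, so the weights are always based on the models that exist at that time.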

The resulting performance statistics can be seen in the table below.

Source: MSCI, Global Financial Data, St. Louis Fed, Bloomberg. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.

From the table above, we see that there are, again, mixed results. One country that particularly stands out is Italy in that the sign on its return flipped to negative and the drawdown was actually deeper with GTT than with a simple buy-and-hold strategy.

Source: MSCI, Global Financial Data, St. Louis Fed, Bloomberg. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.

Digging deeper, it appears that the GTT strategy for Italy was actually whipsawed by more than just trend-following. Housing start data for Italy was not readily available until December 2008, so Italy may have been at a relative disadvantage when compared against the other countries. Since the reliable data we could find begins at the end of 2008 and the majority of the whipsaw losses occur post-Great Financial Crisis, we can run the analysis again, but with housing start data being added in upon its availability.

Source: MSCI, Global Financial Data, St. Louis Fed, Bloomberg. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.

Adding housing starts in as an indicator did not meaningfully alter the results over the period. One hypothesis is that the indicators included could not fully encapsulate the complex state of Italy’s economy over the period. Italy has weathered three technical recessions over the past decade, so this could be a regime where the market is looking to sources outside the country for indications of distress or where the economic indicator is not reflective of the pressures driving the market.

Source: MSCI, St. Louis Fed. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.

Above, we can see several divergences between the market movement and changes in real GDP. Specifically, in the past decade, we see that the market reacted to information that didn’t materialize in the country’s real GDP. More likely, the market was reacting to regional financial distress driven by debt concerns.

The MSCI Italy index is currently composed of 24 constituents with multinational business operations. Additionally, the index maintains large concentrations in financials, utilities, and energy: 33%, 25%, and 14%, respectively.⁴ Because of this sector concentration, utilizing the economic indicators may overly focus on the economic health of Italy while ignoring external factors such as energy prices or broader financial distress that could be swaying the market needle.

A parallel explanation could be that the Eurozone is entangled enough that signals could be interfering with each other between countries. Further research could seek to disaggregate signals between the Eurozone and the member-countries, attempting to differentiate between zone, regional, and country signals to ascertain further meaning.

Additionally, economic indicators are influenced by both the private and public sector so this could represent a disconnect between public company health and private company health.

Conclusion

In this commentary, we sought to answer the question, “can we improve trend-following by drawing information from a country’s economy?” It intuitively makes sense that an investor would generally opt for remaining in the market unless there are systemic issues that may lead to market distress. A strategy that successfully differentiates between market choppiness and periods of potential recession would drastically mitigate any losses incurred from whipsaw, thereby capturing a majority of the equity premium as well as the trend premium.

We find that growth-trend timing has been relatively successful in countries such as the United States, Germany, and Japan. However, each country being analyzed should be considered in light of its specific circumstances.

Peeking under the hood of Italy, it becomes clear that market movements may be influenced by more than a country’s implicit economic health. In such a case, we should pause and ask ourselves whether a macroeconomic indicator is truly reflective of that country’s economy or if there are other market forces pulling the strings.

Factor Orphans

By Corey Hoffstein

On October 28, 2019

In Risk & Style Premia, Weekly Commentary

This post is available as a PDF download here.

Summary

To generate returns that are different than the market, we must adopt a positioning that is different than the market.
With the increasing adoption of systematic factor portfolios, we explore whether an anti-factor stance can generate contrarian-based profits.
Specifically, we explore the idea of factor orphans: stocks that are not included in any factor portfolio at a given time.
To identify these stocks, we replicate four popular factor indices: the S&P 500 Enhanced Value index, the S&P 500 Momentum index, the S&P 500 Low Volatility index, and the S&P 500 Quality index.
On average, there are over 200 stocks in the S&P 500 that are orphaned at any given time.
Generating an equal-weight portfolio of these stocks does not exhibit meaningfully different performance than a naïve equal-weight S&P 500 portfolio.

Contrarian investing is nothing new. Holding a variant perception to the market is often cited as a critical component to generating differentiated performance. The question in the details is, however, “contrarian to what?”

In the last decade, we’ve witnessed a dramatic rise in the popularity of systematically managed active strategies. These so-called “smart beta” portfolios seek to harvest documented risk premia and market anomalies and implement them with ruthless discipline.

But when massively adopted, do these strategies become the commonly-held view and therefore more efficiently priced into the market? Would this mean that the variant perception would actually be buying those securities totally ignored by these strategies?

This is by no means a new idea. Morningstar has long maintained its Unloved strategy that purchases the three equity categories that have witnessed the largest outflows at the end of the year. A few years ago, Vincent Deluard constructed a “DUMB” beta portfolio that included all the stocks shunned by popular factor ETFs. In the short out-of-sample period the performance of the strategy was tested, it largely kept pace with an equal-factor portfolio. More recently, a Bank of America research note claimed that a basket of most-hated securities – as defined by companies neglected by mutual funds and shorted by hedge funds – had tripled the S&P 500’s return over the past year.

The approach certainly has an appealing narrative: as the crowd zigs to adopt smart beta, we zag. But has it worked?

To test this concept, we wanted to identify what we call “factor orphans”: those securities not held by any factor portfolio. Once identified, we can build a portfolio holding these stocks and track its performance over time.

As quants, this idea strikes us as a little crazy. A stock not held in a value, momentum, low volatility, or quality index is likely one that is expensive, highly volatile, with poor fundamentals and declining performance. Precisely the type of stock factor investing would tell us not to own.

But perhaps the fact that these securities are orphaned means that there are no more sellers: the major cross-section of market strategies have already abandoned the stock. Thus, stepping in to buy them may allow us to offload them later when they are picked back up by these systematic approaches.

Perhaps this idea is crazy enough it just might work…

To test this idea, we first sought to replicate four common factor benchmarks: the S&P 500 Enhanced Value index, the S&P 500 Momentum index, the S&P 500 Low Volatility index and the S&P 500 Quality index. Once replicated, we can use the underlying baskets as being representative of the holdings for factor portfolios in general.

Results of our replication efforts are plotted below. We can see that our models fit the shape of most of the indices closely, with very close fits for the Momentum and Low Volatility portfolios.

Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.

The Quality replication represents the largest deviation from the underlying index, but still approximates the shape of the total return profile rather closely. This gives us confidence that the portfolio we constructed is a quality portfolio (which should come as no surprise, as securities were selected based upon common quality metrics), but the failure to more closely replicate this index may represent a thorn in our ability to identify truly orphaned stocks.

At the end of each month, we identify the set of all securities held by any of the four portfolios. The securities in the S&P 500 (at that point in time) but not in the factor basket are the orphaned stocks. Somewhat surprisingly, we find that approximately 200 names are orphaned at any given time, with the number reaching as high as 300 during periods when underlying factors converge.
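The identification step is a set difference, as the sketch below illustrates. The tickers are placeholders and `factor_orphans` is a hypothetical helper name; the real exercise would use the point-in-time S&P 500 constituents and the replicated factor baskets.

```python
def factor_orphans(universe, factor_holdings):
    """Securities in the universe that no factor portfolio holds.

    universe: iterable of tickers in the index at this point in time.
    factor_holdings: dict mapping factor name -> iterable of tickers held.
    """
    held_by_any = set().union(*factor_holdings.values())
    return set(universe) - held_by_any

# Placeholder tickers, for illustration only.
universe = ["AAA", "BBB", "CCC", "DDD", "EEE"]
factor_holdings = {
    "value": ["AAA", "BBB"],
    "momentum": ["BBB", "CCC"],
    "low_volatility": ["AAA"],
    "quality": ["CCC"],
}
print(sorted(factor_orphans(universe, factor_holdings)))
```

Repeating this at each month-end produces the time series of orphan counts.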

Also interesting is that the actual overlap in holdings in the factor portfolios is quite low, rarely exceeding 30%. This is likely due to the rather concentrated nature of the indices selected, which hold only 100 stocks at a given time.

Source: Sharadar. Calculations by Newfound Research.

Once our orphaned stocks are identified, we construct a portfolio that holds them in equal weight. We rebalance our portfolio monthly to sell those stocks that have been acquired by a factor portfolio and roll into those securities that have been abandoned.

We plot the results of our exercise below as well as an equally weighted S&P 500 benchmark.

Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.

While the total return is modestly less (but certainly not statistically significantly so), what is most striking is how little deviation there is in the orphaned stock portfolio versus the equal-weight benchmark.

However, as we have demonstrated in the past, the construction choices in a portfolio can have a significant impact upon the realized results. As we look at the factor portfolios themselves, we must acknowledge that they represent relative tilts to the benchmark, and that the absence of one security might actually represent a significantly smaller relative underweight to the benchmark than the absence of another. Or the absence of one security may actually represent a smaller relative underweight than another that is actually included.

Therefore, as an alternative test we construct an equal-weight factor portfolio and subtract the S&P 500 market-capitalization weights. The result is the implied over- and under-weights of the combined factor portfolios. We then rank securities to select the 100 most under-weight securities each month and hold them in equal weight.
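As a rough sketch of this alternative test, the implied active weights can be computed by blending the factor portfolios equally and subtracting benchmark weights. The function names, the input layout (each factor portfolio supplied as a dict of ticker to weight), and the toy tickers and weights are assumptions made for illustration. The original does not specify how weights within each factor portfolio enter the blend.

```python
def combined_factor_weights(factor_weights):
    """Equal-weight blend across factor portfolios (each a ticker->weight dict)."""
    share = 1.0 / len(factor_weights)
    combined = {}
    for weights in factor_weights.values():
        for ticker, weight in weights.items():
            combined[ticker] = combined.get(ticker, 0.0) + share * weight
    return combined


def active_weights(factor_weights, benchmark_weights):
    """Blended factor weight minus benchmark weight for every benchmark name."""
    combined = combined_factor_weights(factor_weights)
    return {t: combined.get(t, 0.0) - bw for t, bw in benchmark_weights.items()}


def most_underweight(factor_weights, benchmark_weights, n=100):
    """Equal-weight portfolio of the n most under-weight benchmark names."""
    active = active_weights(factor_weights, benchmark_weights)
    # Sort by active weight, breaking ties by ticker for reproducibility.
    picks = sorted(active, key=lambda t: (active[t], t))[:n]
    return {t: 1.0 / len(picks) for t in picks}


# Hypothetical weights (placeholders, not real data).
benchmark = {"AAA": 0.4, "BBB": 0.3, "CCC": 0.2, "DDD": 0.1}
factors = {
    "value": {"CCC": 0.5, "DDD": 0.5},
    "momentum": {"AAA": 0.5, "CCC": 0.5},
}

active = active_weights(factors, benchmark)
underweight_portfolio = most_underweight(factors, benchmark, n=2)
```

Here AAA is held by a factor portfolio yet still lands in the most underweight basket alongside the orphan BBB, because its blended weight (0.25) falls short of its benchmark weight (0.40).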

Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.

Of course, had we stepped back to think for a moment, we would not have needed to perform this exercise at all. We generally know that these (backtested) factors have out-performed the benchmark. Therefore, selecting stocks that they are underweight means we’re taking the opposite side of the factor trade, which we know has not worked.

This does, however, draw an important distinction between the most underweight stocks and the orphaned stocks. It would appear that factor orphans do not necessarily create the strong anti-factor tilt that the most underweight portfolio does.

For the sake of completeness, we can also evaluate the portfolios containing securities held in just one of the factor portfolios, two of the factor portfolios, three of the factor portfolios, or all of the factor portfolios at a given time.
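One way to sketch this bucketing is to count, for each index constituent, how many factor portfolios hold it and then group names by that count. The function names and tickers are hypothetical, and equal weighting within each bucket is our assumption, since the weighting scheme for these portfolios is not stated.

```python
def membership_counts(index_members, factor_holdings):
    """Number of factor portfolios holding each index constituent."""
    counts = {ticker: 0 for ticker in index_members}
    for holdings in factor_holdings.values():
        for ticker in holdings:
            if ticker in counts:
                counts[ticker] += 1
    return counts


def portfolios_by_count(counts, n_factors):
    """Equal-weight portfolio (assumed) for each membership count 0..n_factors.

    A bucket with no names maps to an empty dict, i.e. it holds nothing.
    """
    buckets = {k: [] for k in range(n_factors + 1)}
    for ticker, k in sorted(counts.items()):
        buckets[k].append(ticker)
    return {
        k: {ticker: 1.0 / len(names) for ticker in names}
        for k, names in buckets.items()
    }


# Hypothetical month-end holdings (tickers are placeholders, not real data).
index_members = {"AAA", "BBB", "CCC", "DDD", "EEE"}
factor_holdings = {
    "value": {"AAA", "BBB"},
    "momentum": {"AAA", "CCC"},
    "low_vol": {"AAA", "BBB"},
    "quality": {"AAA"},
}

counts = membership_counts(index_members, factor_holdings)
portfolios = portfolios_by_count(counts, n_factors=len(factor_holdings))
```

The bucket for a count of zero is the orphan portfolio itself, and an empty bucket corresponds to a month in which that portfolio holds no securities.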

Below we plot the count of securities in such portfolios over time. We can see that it is very uncommon to identify securities that are simultaneously held by all the factors, or even three of the factors, at once.

Source: Sharadar. Calculations by Newfound Research.

Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.

We can see that the portfolio built from stocks held in just one factor (“In One”) closely mimics the portfolio built from stocks held in no factor (“In Zero”), which in turn mimics the S&P 500 Equal Weight portfolio. This is likely because the portfolios include so many securities that they effectively bring you back to the index.

On the other end of the spectrum, we see the considerable risks of concentration manifest in the portfolios built from stocks held in three or four of the factors. The portfolio composed of stocks held in all four factors simultaneously (“In Four”) not only goes long stretches of holding nothing at all, but is also subject to large bouts of volatility due to the extreme concentration.

We also see this for the portfolio that holds stocks held by three of the factors simultaneously (“In Three”). While this portfolio has modestly more diversification – and even appears to out-perform the equal-weight benchmark – the concentration risk finally materializes in 2018-2019, causing a dramatic drawdown.

The portfolio holding stocks held in just two of the factors (“In Two”), though, appears to offer some out-performance opportunity. Perhaps by forcing just two factors to agree, we strike a balance between confirmation among signals and portfolio diversification.

Unfortunately, our enthusiasm quickly wanes when we realize that this portfolio closely matches the results achieved by naively equal-weighting exposure among the four factor portfolios themselves, which is far easier to implement.

Source: Sharadar. Calculations by Newfound Research. Past performance is not an indicator of future results. Performance is backtested and hypothetical. Performance figures are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Performance assumes the reinvestment of all distributions.

Conclusion

To achieve differentiated results, we must take a differentiated stance from the market. As systematic factor portfolios are more broadly adopted, we should consider asking ourselves if taking an anti-factor stance might lead to contrarian-based profits.

In this study, we explore the idea of factor orphans: stocks not held by any factor portfolio at a given time. Our hypothesis is that these orphaned securities may be systematically over-sold, leading to an opportunity for future out-performance if they are re-acquired by the factor portfolios at a later date.

We begin by replicating four factor indices: the S&P 500 Enhanced Value index, the S&P 500 Momentum index, the S&P 500 Low Volatility index, and the S&P 500 Quality index. Replicating these processes allows us to identify historical portfolio holdings, which in turn allows us to identify stocks not held by the factors.

We are able to closely replicate the S&P 500 Momentum and Low Volatility portfolios, create meaningful overlap with the S&P 500 Enhanced Value method, and generally capture the S&P 500 Quality index. The failure to more closely replicate the S&P 500 Quality index may have a meaningful impact on the results herein, though we believe our methodology still captures the generic return of a quality strategy.

We find that, on average, there are over 200 factor orphans at a given time. Constructing an equal-weight portfolio of these orphans, however, only seems to lead us back to an S&P 500 Equal Weight benchmark. While there does not appear to be an edge in this strategy, it is interesting that there does not appear to be a negative edge either.

Recognizing that long-only factor portfolios represent active bets expressed as over- and underweights relative to the S&P 500, we also construct a portfolio of the most underweight stocks. Not surprisingly, as this portfolio actively captures a negative factor tilt, the strategy meaningfully underperforms the S&P 500 Equal Weight benchmark, though the relative underperformance meaningfully dissipates in recent years.

Finally, we develop portfolios to capture stocks held in just one, two, three, or all four of the factors simultaneously. We find that the portfolios composed of stocks held in either three or four of the factors at once exhibit significant concentration risk. As with the orphan portfolio, the portfolio of stocks held by just one of the factors closely tracks the S&P 500 Equal Weight benchmark, suggesting that it might be over-diversified.

The portfolio holding stocks held by just two factors at a time appears to be the Goldilocks portfolio, with enough concentration to be differentiated from the benchmark but not so much as to create significant concentration risk.

Unfortunately, this portfolio also almost perfectly replicates a naïve equal-weight portfolio among the four factors, suggesting that the approach is likely a wasted effort.

In conclusion, we find no evidence that factor orphans have historically offered a meaningful excess return opportunity. Nor do they appear to have been a drag on portfolio returns. We should acknowledge, however, that the adoption of factor portfolios accelerated rapidly after the Great Financial Crisis, and that backtests may not capture current market dynamics. More recent event studies of orphaned stocks being added to factor portfolios may provide more insight into the current environment.

