
This post is available as a PDF download here.

Summary

Can the monetary policy environment be used to predict global equity market returns? Should we overweight/buy countries with expansionary monetary policy regimes and underweight/sell countries with contractionary monetary policy regimes?
In twelve of the fourteen countries studied, both nominal and real equity returns are higher (lower) when the central bank’s most recent action was to cut (hike) rates. For example, nominal U.S. equity returns are 1.8% higher during expansionary environments. Real U.S. equity returns are 3.6% higher during expansionary environments. The gap is even larger outside the United States.
However, the monetary policy regime explains very little of the overall variation in equity returns from a statistical standpoint.
While many of the return differentials during expansionary vs. contractionary regimes seem large at first glance, few are statistically significant once we realistically account for the salient features of equity returns and monetary policy. In other words, we can’t be sure the return differentials didn’t arise simply due to luck.
As a result, evidence suggests that making buy/sell decisions on the equity markets of a given country using monetary policy regime as the lone signal is overly ambitious.

Can the monetary policy environment be used to predict global equity market returns? Should we overweight/buy countries with expansionary monetary policy and underweight/sell countries with contractionary monetary policy?

Such are the softball questions that our readers tend to send in.

Intuitively, it’s clear that monetary policy has some type of impact on equity returns. After all, if the Fed raised rates to 10% tomorrow, that would clearly impact stocks.

The more pertinent question, though, is whether these impacts consistently point in one direction. It’s relatively straightforward to build a narrative around why this could be the case. After all, the Fed’s primary tool for managing its unemployment and inflation mandates is its short-term policy rate. Typically, we think about the Fed hiking interest rates when the economy gets “too hot” and cutting them when it gets “too cold.” If hiking (cutting) rates has the goal of slowing (stimulating) the economy, it’s plausible to think that equity returns would be pushed lower (higher).

There are a number of good academic papers on the subject. Ioannidis and Kontonikas (2006) is a good place to start. The paper investigates the impact of monetary policy shifts on equity returns in thirteen OECD countries¹ from 1972 to 2002.

Their analysis can be split into two parts. First, they explore whether there is a contemporaneous relationship between equity returns and short-term interest rates (i.e. how do equity returns respond to interest rate changes?)². If there is a relationship, are returns likely to be higher or lower in months where rates increase?

Source: “Monetary Policy and the Stock Market: Some International Evidence” by Ioannidis and Kontonikas (2006).

In twelve of the thirteen countries, there is a negative relationship between interest rate changes and equity returns. Equity returns tend to be lower in months where short-term rates increase. The relationship is statistically significant at the 5% level in eight of the countries, including the United States.

While these results are interesting, they aren’t of much direct use for investors because, as mentioned earlier, they are contemporaneous. Knowing that equity returns are lower in months where short-term interest rates rise is actionable only if we can accurately predict the interest rate movements ahead of time.

As an aside, if there is one predictive interest rate model we subscribe to, it’s that height matters.

Fortunately, this is where the authors’ second avenue of analysis comes into play. In this section, they first classify each month as being part of either a contractionary or an expansionary monetary policy regime. A month is part of a contractionary regime if the last change in the discount rate was positive (i.e. the last action by that country’s central bank was a hike). Similarly, a month is part of an expansionary regime if the last central bank action was a rate cut.
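The classification rule above can be sketched in Python. This is a minimal illustration, not the authors’ code: it assumes a monthly series of policy rate changes (zero in months with no action) and labels each month using only changes announced before that month, so the regime is known heading into it. The function name and the `None` label for months before the first observed change are our own choices.

```python
from typing import List, Optional


def classify_regimes(rate_changes: List[float]) -> List[Optional[str]]:
    """Label each month by the sign of the most recent non-zero rate change.

    rate_changes[t] is the policy rate change announced during month t.
    The label for month t uses only changes from months before t.
    Months before the first change are labeled None.
    """
    labels: List[Optional[str]] = []
    last: Optional[str] = None
    for change in rate_changes:
        labels.append(last)  # regime known at the start of this month
        if change > 0:
            last = "contractionary"  # most recent action was a hike
        elif change < 0:
            last = "expansionary"  # most recent action was a cut
    return labels
```

For example, a hike in month 1 followed by a cut in month 4 marks months 2 through 4 as contractionary and month 5 onward as expansionary.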

We illustrate this classification for the United States below. Orange shading indicates contractionary regimes and gray shading indicates expansionary regimes.

The authors then regress monthly equity returns on a dummy variable representing which regime a month belongs to. Importantly, this is not a contemporaneous analysis: we know whether the last rate change was positive or negative heading into the month. Quoting the paper:
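A minimal sketch of this regression, assuming the dummy equals 1 in contractionary months and 0 otherwise. That coding is our choice; under it, a negative coefficient means lower returns under tightening, which matches the sign the paper reports. With a single 0/1 regressor, the slope is simply the difference in mean monthly returns between the two regimes, and the R² measures how much of the month-to-month variation the regime explains.

```python
import numpy as np


def regime_regression(returns, contractionary):
    """OLS of monthly returns on a constant and a contractionary dummy.

    Returns (alpha, beta, r_squared). alpha is the mean return in
    expansionary months; beta is the contractionary-minus-expansionary
    difference in mean monthly returns.
    """
    y = np.asarray(returns, dtype=float)
    d = np.asarray(contractionary, dtype=float)
    X = np.column_stack([np.ones_like(d), d])  # intercept plus dummy
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r_squared = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return float(coef[0]), float(coef[1]), float(r_squared)
```

The conventional standard errors attached to a regression like this assume independent, identically distributed errors.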

“The estimated beta coefficients associated with the local monetary environment variable are negative and statistically significant in six countries (Finland, France, Italy, Switzerland, UK, US). Hence, for those countries our measure of the stance of monetary policy contains significant information, which can be used to forecast expected stock returns. Particularly, we find that restrictive (expansive) monetary policy stance decreases (increases) expected stock returns.”

Do we agree?

Partially. When we analyze the data using a similar methodology and with data updated through 2018³, we indeed find a negative relationship between monetary policy environment and forward 1-month equity returns. For example, annualized nominal returns in the United States were 10.6% and 8.8% in expansionary and contractionary regimes, respectively. The gap is larger for real returns – 7.5% in expansionary environments and 3.9% in contractionary environments.

Source: Bloomberg, MSCI, Newfound Research. Past performance does not guarantee future results. Return data relies on hypothetical indices and is exclusive of all fees and expenses. Returns assume the reinvestment of dividends.

A similar, albeit more pronounced, pattern emerges when we go outside the United States and consider thirteen other countries.

The results are especially striking in ten of the fourteen countries examined. The effect in the U.S. was smaller than in many of them.

That being said, we think the statistical significance (and therefore investing merit) is less obvious. Now, it is certainly the case that many of these differences are statistically significant when measured traditionally. In this sense, our results agree with Ioannidis and Kontonikas (2006).

However, there are two issues to consider. First, the R² values for the regressions are very low. For example, the highest R² in the paper is 0.037, for Finland. In other words, the monetary regime models do not do a particularly good job of explaining stock returns.

Second, it’s important to take a step back and think about how monetary regimes evolve. Central banks, especially today, typically don’t raise rates one month, cut the next, raise the next, etc. Instead, these regimes tend to last multiple months or years. The traditional significance testing assumes the former type of behavior, when the latter better reflects reality.

Now, this wouldn’t be a major issue if stock returns were what statisticians call “IID” (independent and identically distributed). The results of a coin flip are IID. The probabilities of heads and tails are unchanged across trials, and the result of one flip doesn’t affect the odds for the next.

Daily temperatures are not IID. The distribution of temperatures is very different for a day in December than for a day in July, at least for most of us, so they are not identically distributed. Nor are they independent: today’s high temperature tells us that tomorrow’s high has a good chance of landing near the same value.

Needless to say, stock returns behave more like temperatures than they do coin flips. This combination of facts – stock returns being non-IID (exhibiting both heteroskedasticity⁴ and autocorrelation) and monetary policy regimes having the tendency to persist over the medium term – leads to false positives. What at first glance look like statistically significant relationships are no longer up to snuff because the model was poorly constructed in the first place.
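A rough way to see both features in data is the lag-1 autocorrelation. The sketch below is a simple diagnostic added for illustration, not part of the original analysis. Applied to a 0/1 regime series, a value near 1 indicates persistent regimes. Applied to squared returns, a positive value is an informal sign of volatility clustering (conditional heteroskedasticity).

```python
import numpy as np


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a series (mean-adjusted)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))
```

A regime series that switches rarely scores close to 1, while one that flips every month, like a run of alternating coin flips, scores negative.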

To flush out these issues, we used two different simulation-based approaches to test for the significance of return differences across regimes.⁵

The first approach works as follows for each country:

1. Compute the probability of expansionary and contractionary regimes using that country’s actual history.
2. Randomly classify each month into one of the two regimes using the probabilities from #1.
3. Compute the difference between annualized returns in expansionary vs. contractionary regimes using that country’s actual equity returns.
4. Return to #2, repeating 10,000 times total.
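The first approach might look like the following Python sketch. It is an illustration under stated assumptions rather than the exact code behind the table: annualized returns are computed geometrically from monthly returns, months before a country’s first policy action are assumed to be dropped beforehand, and draws that leave one regime empty are discarded. Function names are hypothetical.

```python
import numpy as np


def annualized_return(monthly):
    """Geometric annualized return of a set of monthly returns."""
    monthly = np.asarray(monthly, dtype=float)
    return float(np.prod(1.0 + monthly) ** (12.0 / len(monthly)) - 1.0)


def regime_spread(returns, expansionary):
    """Annualized return in expansionary months minus contractionary months."""
    returns = np.asarray(returns, dtype=float)
    expansionary = np.asarray(expansionary, dtype=bool)
    return annualized_return(returns[expansionary]) - annualized_return(returns[~expansionary])


def simulate_iid_regimes(returns, expansionary, n_sims=10_000, seed=0):
    """Spreads obtained when regimes are re-drawn independently each month."""
    rng = np.random.default_rng(seed)
    returns = np.asarray(returns, dtype=float)
    p_exp = float(np.mean(expansionary))  # step 1: historical share of expansionary months
    spreads = []
    while len(spreads) < n_sims:  # step 4: repeat
        fake = rng.random(len(returns)) < p_exp  # step 2: independent monthly draws
        if fake.all() or not fake.any():
            continue  # skip draws that leave one regime empty
        spreads.append(regime_spread(returns, fake))  # step 3: actual returns, fake regimes
    return np.array(spreads)
```

Because each month’s label is drawn independently, the simulated regimes switch far more often than real policy regimes do.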

This approach assumes that today’s monetary policy regime says nothing about what tomorrow’s may be. We have transformed monetary policy into an IID variable. Below, we plot the regime produced by a single iteration of the simulation. Clearly, this is not realistic.

Source: Newfound Research

The second approach is similar to the first in all ways except how the monetary policy regimes are simulated. The algorithm is:

1. Compute the transition matrix for each country using that country’s actual history of monetary policy shifts. A transition matrix specifies the likelihood of moving to each regime state given that we were in a given regime the prior month. For example, if last month was contractionary, we may have a 95% probability of staying contractionary and a 5% probability of moving to an expansionary state.
2. Randomly classify each month into one of the two regimes using the transition matrix from #1. We have to determine how to seed the simulation (i.e. which state we start in). We do this randomly using the overall historical probability of contractionary/expansionary regimes for that country.
3. Compute the difference between annualized returns in expansionary vs. contractionary regimes using that country’s actual equity returns.
4. Return to #2, repeating 10,000 times total.
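Steps 3 and 4 are unchanged from the first approach, so the sketch below covers only steps 1 and 2: estimating a two-state transition matrix from the historical regime sequence and simulating a new sequence from it. It assumes both regimes appear in the history with at least one observed transition out of each; otherwise a row of the matrix is undefined. Names are hypothetical.

```python
import numpy as np


def transition_matrix(expansionary):
    """P[i, j] = Pr(regime j this month | regime i last month).

    State 0 = contractionary, state 1 = expansionary.
    """
    states = np.asarray(expansionary, dtype=int)
    counts = np.zeros((2, 2))
    for prev, curr in zip(states[:-1], states[1:]):
        counts[prev, curr] += 1
    return counts / counts.sum(axis=1, keepdims=True)


def simulate_markov_regimes(expansionary, n_months, rng):
    """Simulate a regime path (True = expansionary) from the fitted matrix."""
    P = transition_matrix(expansionary)
    p_exp = float(np.mean(expansionary))
    states = np.empty(n_months, dtype=int)
    states[0] = int(rng.random() < p_exp)  # seed from the unconditional frequency
    for t in range(1, n_months):
        states[t] = int(rng.random() < P[states[t - 1], 1])  # stay or switch
    return states.astype(bool)
```

Replacing the independent monthly draws of step 2 with a path from `simulate_markov_regimes` preserves the tendency of regimes to persist, which is what widens the simulated distribution.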

The regimes produced by this simulation look much more realistic.

Source: Newfound Research

When we compare the distribution of return differentials produced by each of the simulation approaches, we see that the second produces a wider range of outcomes.

Source: Newfound Research

In the table below, we present the confidence intervals for return differentials using each algorithm. We see that the differentials are statistically significant at the 5% level in six of the fourteen countries when we use the first methodology that produces unrealistic monetary regimes. Only four countries show statistically significant results with the improved second method.

Country	Real Return Spread (Expansionary minus Contractionary, Annualized)	95% CI (First Method)	P-Value (First Method)	95% CI (Second Method)	P-Value (Second Method)
Australia	+9.8%	-1.1% to +20.7%	7.8%	-1.5% to +21.1%	8.9%
Belgium	+14.6%	+4.1% to +25.1%	0.6%	+0.7% to +28.5%	3.9%
Canada	-0.7%	-12.2% to +10.8%	90.5%	-14.2% to +12.8%	91.9%
Finland	+29.0%	+6.5% to +51.5%	1.2%	-2.4% to +60.4%	7.1%
France	+17.3%	-0.5% to +35.1%	5.7%	-10.8% to +45.4%	22.7%
Germany	+10.8%	-1.1% to +22.7%	7.5%	-2.8% to +24.4%	12.0%
Italy	+17.3%	+3.6% to +31.0%	1.3%	-0.2% to +34.8%	5.3%
Japan	+26.5%	+12.1% to +40.9%	0.0%	+3.4% to +49.6%	2.5%
Netherlands	+16.8%	-1.8% to +35.4%	7.6%	-11.6% to +45.2%	24.7%
Spain	+23.8%	+11.3% to +36.3%	0.0%	+9.9% to +37.7%	0.1%
Sweden	+30.4%	+12.7% to +48.1%	0.1%	+4.7% to +56.1%	2.1%
Switzerland	+2.3%	-11.5% to +16.1%	74.4%	-26.3% to +30.9%	87.5%
United Kingdom	-0.6%	-11.5% to +10.3%	91.4%	-12.0% to +10.8%	91.8%
United States	+3.6%	-5.0% to +12.2%	41.1%	-6.0% to +13.2%	46.2%

Source: Bloomberg, MSCI, Newfound Research
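The table does not state how the intervals and p-values were computed. However, each interval is symmetric around the observed spread, which is consistent with a normal approximation that uses the standard deviation of the simulated differentials as the standard error. The sketch below implements that reading. Treat it as an assumption, not the documented method.

```python
import math

import numpy as np


def ci_and_p_value(observed_spread, simulated_spreads, z_crit=1.959964):
    """Normal-approximation 95% CI and two-sided p-value.

    Assumes the standard deviation of the simulated null spreads serves
    as the standard error of the observed spread.
    """
    se = float(np.std(simulated_spreads, ddof=1))
    z = observed_spread / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return (observed_spread - z_crit * se, observed_spread + z_crit * se), p_value
```

For Australia under the first method (a spread of +9.8% and an interval half-width of 10.9%), this approximation gives a p-value of about 7.8%, in line with the table. A fully empirical alternative would be the share of simulated differentials whose magnitude is at least as large as the observed one.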

Conclusion

We find that real equity returns across the fourteen countries studied have been, on average, more than 10% per year higher during expansionary regimes. At first glance, such a large differential suggests there may be an opportunity to profitably trade stocks based on what regime a given country is in.

Unfortunately, the return differentials, while large, are generally not statistically significant when we account for the realistic features of equity returns and monetary policy regimes. In plain English, we can’t be sure that the return differentials didn’t arise simply due to randomness.

This result isn’t too surprising when we consider the complexity of the relationship between equity returns and interest rates (despite what financial commentators may have you believe). Interest rate changes can impact both the numerator (dividends/dividend growth) and denominator (discount rate) of the dividend discount model in complex ways. In addition, there are numerous other factors that impact equity returns and are unrelated / only loosely related to interest rates.

When such complexity reigns, it is probably a bit ambitious to rely on a standalone measure of monetary policy regime as a predictor of equity returns.

Momentum’s Magic Number

By Corey Hoffstein

On July 15, 2018

In Momentum, Risk & Style Premia, Weekly Commentary


Summary

In HIMCO’s May 2018 Quantitative Insight, they published a figure suggesting that the optimal holding length of a momentum strategy is a function of the formation period.
Specifically, the result suggests that the optimal holding period is one selected such that the formation period plus the holding period is equal to 14-to-18 months: a somewhat “magic” result that makes little intuitive, statistical, or economic sense.
To investigate this result, we construct momentum strategies for country indices as well as industry groups.
We find similar results, with performance peaking when the formation period plus the holding period is equal to 12-to-14 months.
While lacking a specific reason why this effect exists, it suggests that investors looking to leverage shorter-term momentum signals may benefit from longer investment horizons, particularly when costs are considered.

A few weeks ago, we came across a study published by HIMCO on momentum investing¹. Contained within this research note was a particularly intriguing exhibit.

Source: HIMCO Quantitative Insights, May 2018

What this figure demonstrates is that the excess cumulative return for U.S. equity momentum strategies peaks as a function of both formation period and holding period. Specifically, the returns appear to peak when the sum of the formation and holding period is between 14-18 months.

For example, if you were to form a portfolio based upon trailing 6-1 momentum – i.e. ranking on the prior 6-month total returns and skipping the most recent month (labeled in the figure above as “2_6”) – this evidence suggests that you would want to hold such a portfolio for 8-to-12 months (labeled in the figure above as 14-to-18 months since the beginning of the uptrend).
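The signal and the arithmetic in this example can be sketched as follows. We assume the “2_6” label means compounding the monthly returns at lags 2 through 6, skipping lag 1 (the most recent month). That reading, and the function names, are our own assumptions.

```python
import numpy as np


def momentum_score(monthly_returns, formation=6, skip=1):
    """Compounded return over lags skip+1 .. formation (lags 2..6 for "6-1").

    monthly_returns is ordered oldest to newest; the last element is lag 1.
    Requires at least `formation` months of history.
    """
    r = np.asarray(monthly_returns, dtype=float)
    window = r[len(r) - formation : len(r) - skip]  # drop the most recent `skip` months
    return float(np.prod(1.0 + window) - 1.0)


def implied_holding_range(formation, total_low=14, total_high=18):
    """Holding periods implied by "formation + holding = 14-to-18 months"."""
    return total_low - formation, total_high - formation
```

`implied_holding_range(6)` returns `(8, 12)`, the 8-to-12 month hold described above.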

Which is a rather odd conclusion. Firstly, we would intuitively expect that we should employ holding periods that are shorter than our formation periods. The notion here is that we want to use enough data to harvest information that will be stationary over the next, smaller time-step. So, for example, we might use 36 months of returns to create a covariance matrix that we might hold constant for the next month (i.e. a 36-month formation period with a 1-month hold). Given that correlations are non-stable, we would likely find the idea of using 1-month of data to form a correlation matrix we hold for the next 36-months rather ludicrous.

And, yet, here we are in a similar situation, finding that if we use a formation period of 5 months, we should hold our portfolio steady for the next 8-to-10 months. And this is particularly weird in the world of momentum, which we typically expect to be a high turnover strategy. How in the world can having a holding period longer than our formation period make sense when we expect information to quickly decay in value?

Perhaps the oddest thing of all is the fact that all these results center around 14-18 months. It would be one thing if the conclusion was simply, “holding for six months after formation is optimal”; here the conclusion is that the optimal holding period is a function of formation period. Nor is the conclusion something intuitive, like “the holding period should be half the formation period.”

Rather, the result – that the holding period should be 14-to-18 months minus the length of the formation period – makes little intuitive, statistical, or economic sense.

Out-of-Sample Testing with Countries and Sectors

In an effort to explore this result further, we wanted to determine whether similar results were found when cross-sectional momentum was applied to country indices and industry groups.

Specifically, we ran three tests.

In the first, we constructed momentum portfolios using developed country index returns (U.S. dollar denominated; net of withholding taxes) from MSCI. The countries included in the test are: Australia, Austria, Belgium, Canada, Denmark, Finland, France, Germany, Hong Kong, Ireland, Israel, Italy, Japan, Netherlands, New Zealand, Norway, Portugal, Singapore, Spain, Sweden, Switzerland, the United Kingdom, and the United States of America. The data extends back to 12/1969.

In the second, we constructed momentum portfolios using the 12 industry group data set from the Kenneth French Data Library. The data extends back to 7/1926.

In the third, we constructed momentum portfolios using the 49 industry group data set from the Kenneth French Data Library. The data extends back to 7/1926.

For each data set, we ran the same test:

Vary formation periods from 5-1 to 12-1 months.
Vary holding periods from 1-to-26 months.
Using this data, construct dollar-neutral long/short portfolios that go long, in equal-weight, the top third ranking holdings and go short, in equal-weight, the bottom third.

Note that for holding periods exceeding 1 month, we employed an overlapping portfolio construction process.
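A minimal sketch of this construction, assuming a universe of at least three assets and rounding each leg down to floor(n/3) names. For the overlapping portfolios, it follows the common convention of giving each of the last `holding` monthly cohorts 1/`holding` of capital, with slots not yet filled in the first months left in cash. That convention is our assumption, not a detail stated in the text.

```python
import numpy as np


def tercile_long_short_weights(scores):
    """Equal-weight long the top third, short the bottom third; weights sum to zero."""
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    if n < 3:
        raise ValueError("need at least three assets to form terciles")
    k = n // 3  # names per leg (assumed rounding convention)
    order = np.argsort(scores)
    w = np.zeros(n)
    w[order[-k:]] = 1.0 / k  # winners
    w[order[:k]] = -1.0 / k  # losers
    return w


def overlapping_weights(weight_history, holding):
    """Combine the portfolios formed in each of the last `holding` months.

    weight_history has shape (months, assets), one freshly formed portfolio
    per month. Row t of the result is the combined book after month t's
    cohort is added; each cohort carries 1/holding of capital.
    """
    W = np.asarray(weight_history, dtype=float)
    out = np.zeros_like(W)
    for t in range(len(W)):
        start = max(0, t - holding + 1)
        out[t] = W[start : t + 1].sum(axis=0) / holding  # missing early cohorts sit in cash
    return out
```

With a 1-month holding period the overlapping book is just the latest tercile portfolio. Longer holds average several cohorts, which lowers turnover.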

Below we plot the results.

Source: MSCI and Kenneth French Data Library. Calculations by Newfound Research. Past performance is not a predictor of future results. All information is backtested and hypothetical and does not reflect the actual strategy managed by Newfound Research. Performance is net of all fees except for underlying ETF expense ratios. Returns assume the reinvestment of all dividends, capital gains, and other earnings.

While the results are not as clear as those published by HIMCO, we still see an intriguing effect: returns peak as a function of both formation and holding period. For the country strategy, returns appear to peak when formation plus holding period totals 12-to-14 months, indicating that an investor using 5-1 month signals would want to hold for 7 months while an investor using 12-1 signals would only want to hold for 1 month.

For the industry data, the results are less clear. Where the HIMCO and country results exhibited a clear “peak,” the industry results simply seem to “decay slower.” In particular, we can see in the results for the 12-industry group test that almost all strategies peak with a 1-month holding period. However, they all appear to fall off rapidly, and uniformly, after the time where formation plus holding period exceeds 16 months.

While less pronounced, it is worth pointing out that this result is achieved without considering trading costs or taxes. So, while the return of the 5-1, 12-industry group strategy may peak with a 1-month hold, we can see that it later forms a second peak at a 9-month hold (“14 months since beginning uptrend”). Given that we would expect a nine-month hold to exhibit considerably less trading, an analysis that includes trading cost estimates may exhibit even greater peakedness in the results.

Does the Effect Persist for Long-Only Portfolios?

In analyzing factors, it is often important to try to determine whether a given result is arising from an effect found in the long leg or the short leg. After all, most investors implement strategies in a long-only capacity. While long-only strategies are, technically, equal to a benchmark plus a dollar-neutral long/short portfolio², the long/short portfolio rarely reflects the true factor definition.
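The decomposition mentioned above can be checked numerically. The sketch below is our illustration, with hypothetical names: a long-only portfolio is split into a benchmark plus an active book. Because both are fully invested, the active book is dollar-neutral. Its short side, however, is spread across every asset the portfolio does not own rather than concentrated in the bottom third, which is one reason the implied long/short portfolio rarely matches the factor definition.

```python
import numpy as np


def decompose_long_only(long_only_weights, benchmark_weights):
    """Split a long-only portfolio into benchmark + active (long/short) weights."""
    p = np.asarray(long_only_weights, dtype=float)
    b = np.asarray(benchmark_weights, dtype=float)
    active = p - b  # sums to zero when both p and b are fully invested
    return b, active
```

With a portfolio holding the top two of six assets at 50% each, `p = [0.5, 0, 0, 0, 0, 0.5]`, and an equal-weight benchmark, the active weights are +1/3 on the two holdings and -1/6 on each of the other four.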

Therefore, we want to evaluate long-only construction to determine whether the same result holds, or whether it is a feature of the short-leg.

We find incredibly similar results. Again, country indices appear to peak between 12-to-14 months after the beginning of the uptrend. Industry group results, while not as strong as country results, still appear to offer fairly flat results until 12-to-14 months after the beginning of the uptrend. Taken together, it appears that this result is sustained for long-only portfolio implementations as well.

Conclusion

Traditionally, momentum is considered a high turnover factor. Relative ranking of recent returns can vary substantially over time and our intuition would lead us to expect that the shorter the horizon we use to measure returns, the shorter the time we expect the relative ranking to persist.

Yet recent research published by HIMCO finds this intuition may not be true. Rather, they find that momentum portfolio performance tends to peak 14-to-18 months after the beginning of the measured uptrend. In other words, a portfolio formed on prior 5-month returns should be held for 9-to-13 months, while a portfolio formed on the prior 12 months of returns should only be held for 2-to-6 months.

This result is rather counter-intuitive, as we would expect that shorter formation periods would require shorter holding periods.

We test this result out-of-sample, constructing momentum portfolios using country indices, 12-industry group indices, and 49-industry group indices, and find a similar result in this data. We then test whether the result is an artifact found only in long/short implementations or whether this information is also useful for long-only investors. Indeed, we find very similar results for long-only implementations.

Precisely why this result exists is still up in the air. One argument may be that the trade-off is ultimately centered around win rate versus the size of winners. If relative momentum tends to persist for only 12-to-18 months in total, then using 12-month formation may give us a higher win rate but reduce the size of the winners we pick. Conversely, using a shorter formation period may reduce the number of winners we pick correctly (i.e. lower win rate), but those we pick have further to run. Selecting a formation period and a holding period such that their sum equals approximately 14 months may simply be a hack to find the balance of win rate and win size that maximizes return.

Factor Fimbulwinter

By Corey Hoffstein

On June 11, 2018

In Carry, Defensive, Momentum, Popular, Risk & Style Premia, Trend, Value, Weekly Commentary


Summary

Value investing continues to experience a trough of sorrow. In particular, the traditional price-to-book factor has failed to establish new highs since December 2006 and sits in a 25% drawdown.
While price-to-book has been the academic measure of choice for 25+ years, many practitioners have begun to question its value (pun intended).
We have also witnessed the turning of the tides against the size premium, with many practitioners no longer considering it to be a valid stand-alone anomaly. This comes 35+ years after it was first published.
With this in mind, we explore the evidence that would be required for us to dismiss other, already established anomalies. Using past returns to establish prior beliefs, we simulate out forward environments and use Bayesian inference to adjust our beliefs over time, recording how long it would take for us to finally dismiss a factor.
We find that for most factors, we would have to live through several careers to finally witness enough evidence to dismiss them outright.
Thus, while factors may be established upon a foundation of evidence, their forward use requires a bit of faith.

In Norse mythology, Fimbulvetr (commonly referred to in English as “Fimbulwinter”) is a great and seemingly never-ending winter. It continues for three seasons – long, horribly cold years that stretch on longer than normal – with no intervening summers. It is a time of bitterly cold, sunless days where hope is abandoned and discord reigns.

This winter-to-end-all-winters is eventually punctuated by Ragnarok, a series of events leading up to a great battle that results in the ultimate death of the major gods, destruction of the cosmos, and subsequent rebirth of the world.

Investment mythology is littered with Ragnarok-styled blow-ups and we often assume the failure of a strategy will manifest as sudden catastrophe. In most cases, however, failure may more likely resemble Fimbulwinter: a seemingly never-ending winter in performance with returns blown to-and-fro by the harsh winds of randomness.

Value investors can attest to this. In particular, the disciples of price-to-book have suffered greatly as of late, with “expensive” stocks having outperformed “cheap” stocks for over a decade. The academic interpretation of the factor sits nearly 25% below its prior high-water mark seen in December 2006.

Expectedly, a large number of articles have been written about the death of the value factor. Some question the factor itself, while others simply argue that price-to-book is a broken implementation.

But are these simply retrospective narratives, driven by a desire to have an explanation for a result that has defied our expectations? Consider: if price-to-book had exhibited positive returns over the last decade, would we be hearing from nearly as large a number of investors explaining why it is no longer a relevant metric?

To be clear, we believe that many of the arguments proposed for why price-to-book is no longer a relevant metric are quite sound. The team at O’Shaughnessy Asset Management, for example, wrote a particularly compelling piece that explores how changes to accounting rules have led book value to become a less relevant metric in recent decades.¹

Nevertheless, we think it is worth taking a step back, considering an alternate course of history, and asking ourselves how it would impact our current thinking. Often, we look back on history as if it were the obvious course. “If only we had better prior information,” we say to ourselves, “we would have predicted the path!”² Rather, we find it more useful to look at the past as just one realized path of many that could have happened, none of which were preordained. Randomness happens.

With this line of thinking, the poor performance of price-to-book can just as easily be explained by a poor roll of the dice as it can be by a fundamental break in applicability. In fact, we see several potential truths based upon performance over the last decade:

This is all normal course performance variance for the factor.
The value factor works, but the price-to-book measure itself is broken.
The price-to-book measure is over-crowded in use, and thus the “troughs of sorrow” will need to be deeper than ever to get weak hands to fold and pass the alpha to those with the fortitude to hold.
The value factor never existed in the first place; it was an unfortunate false positive that saturated the investing literature and broad narrative.

The problem at hand is two-fold: (1) the statistical evidence supporting most factors is considerable and (2) the decade-to-decade variance in factor performance is substantial. Taken together, you run into a situation where a mere decade of underperformance likely cannot undo the previously established significance. Just as frustrating is the opposite scenario. Consider that these two statements are not mutually exclusive: (1) price-to-book is broken, and (2) price-to-book generates positive excess return over the next decade.

In investing, factor return variance is large enough that the proof is not in the eating of the short-term return pudding.

The small-cap premium is an excellent example of the difficulty in discerning, in real time, the integrity of an established factor. The anomaly has failed to establish a meaningful new high since it was originally published in 1981. Only in the last decade – nearly 30 years later – have the tides of the industry finally seemed to turn against it as an established anomaly and potential source of excess return.

Thirty years.

The remaining broadly accepted factors – e.g. value, momentum, carry, defensive, and trend – have all been demonstrated to generate excess risk-adjusted returns across a variety of economic regimes, geographies, and asset classes, creating a great depth of evidence supporting their existence. What evidence, then, would make us abandon our faith in the Church of Factors?

To explore this question, we ran a simple experiment for each factor. Our goal was to estimate how long it would take to determine that a factor was no longer statistically significant.

Our assumption is that the salient features of each factor’s return pattern will remain the same (i.e. autocorrelation, conditional heteroskedasticity, skewness, kurtosis, et cetera), but the forward average annualized return will be zero since the factor no longer “works.”

Towards this end, we ran the following experiment:

1. Take the full history for the factor and calculate prior estimates for mean annualized return and standard error of the mean.
2. De-mean the time-series.
3. Randomly select a 12-month chunk of returns from the time series and use the data to perform a Bayesian update to our mean annualized return.
4. Repeat step 3 until the annualized return is no longer statistically different from zero at a 99% confidence threshold.
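The experiment can be sketched in Python. The source does not specify the form of the Bayesian update. This sketch assumes a normal prior on the mean monthly return, centered on the historical mean with variance equal to the squared standard error, and a normal likelihood whose variance is fixed at the historical sample variance, so each update is a closed-form conjugate step. It works in monthly units, since annualizing the mean and its standard error by the same factor leaves the test statistic unchanged. The 12-month chunk is drawn as a contiguous block so that its serial structure is preserved. The `max_years` cap is a safeguard we added.

```python
from statistics import NormalDist

import numpy as np


def years_until_insignificant(monthly_returns, confidence=0.99, block=12,
                              max_years=1000, seed=None):
    """One run of the 'how long until we dismiss the factor' experiment."""
    rng = np.random.default_rng(seed)
    r = np.asarray(monthly_returns, dtype=float)
    sigma2 = r.var(ddof=1)  # likelihood variance, held fixed (assumption)
    # Step 1: prior mean and precision (1 / SE^2) from the full history
    post_mean = r.mean()
    post_prec = len(r) / sigma2
    demeaned = r - r.mean()  # Step 2: the factor no longer "works"
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 2.576 for 99%
    for year in range(1, max_years + 1):
        # Step 3: random contiguous 12-month chunk, then a conjugate update
        start = rng.integers(0, len(demeaned) - block + 1)
        chunk = demeaned[start : start + block]
        new_prec = post_prec + block / sigma2
        post_mean = (post_prec * post_mean + chunk.sum() / sigma2) / new_prec
        post_prec = new_prec
        # Step 4: stop once the mean is no longer significantly non-zero
        if abs(post_mean) * np.sqrt(post_prec) < z_crit:
            return year
    return max_years
```

Repeating the call with different seeds (10,000 times in the original experiment) and taking the median of the results produces the kind of figure reported for each factor.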

For each factor, we ran this test 10,000 times, creating a distribution that tells us how many years into the future we would have to wait until we were certain, from a statistical perspective, that the factor is no longer significant.

Sixty-seven years.

Based upon this experiment, sixty-seven years is the median number of years we will have to wait until we officially declare price-to-book (“HML,” as it is known in the literature) to be dead.³ At the risk of being morbid, we’re far more likely to die before the industry finally sticks a fork in price-to-book.

We perform this experiment for a number of other factors – including size (“SMB” – “small-minus-big”), quality (“QMJ” – “quality-minus-junk”), low-volatility (“BAB” – “betting-against-beta”), and momentum (“UMD” – “up-minus-down”) – and see much the same result. It will take decades before sufficient evidence mounts to dethrone these factors.

	HML	SMB⁴	QMJ	BAB	UMD
Median Years-until-Failure	67	43	132	284	339

Now, it is worth pointing out that these figures for a factor like momentum (“UMD”) might be a bit skewed due to the design of the test. If we examine the long-run returns, we see a fairly docile return profile punctuated by sudden and significant drawdowns (often called “momentum crashes”).

Since a large proportion of the cumulative losses are contained in these short but pronounced drawdown periods, demeaning the time-series ultimately means that the majority of 12-month periods actually exhibit positive returns. In other words, by selecting random 12-month samples, we actually expect a high frequency of those samples to have a positive return.

For example, using this process, 49.1%, 47.6%, 46.7%, and 48.8% of rolling 12-month periods are positive for the HML, SMB, QMJ, and BAB factors, respectively. For UMD, that number is 54.7%. Furthermore, if you drop the worst 5% of rolling 12-month periods for UMD, the average positive period is 1.4x larger than the average negative period. Taken together, not only are you more likely to select a positive 12-month period, but those positive periods are, on average, 1.4x larger than the negative periods you will pick, except for the rare (<5%) cases.
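These statistics can be computed along the lines of the sketch below. The source does not say whether rolling 12-month returns were compounded or summed, or exactly how the worst 5% were trimmed. This sketch compounds the demeaned monthly returns and drops the lowest 5% of windows before comparing the average positive and negative windows. It assumes at least one negative window survives the trim.

```python
import numpy as np


def rolling_12m_stats(monthly_returns, window=12, drop_worst=0.05):
    """Share of positive rolling windows in the demeaned series, and the ratio
    of the average positive window to the average negative window after
    dropping the worst `drop_worst` fraction of windows."""
    r = np.asarray(monthly_returns, dtype=float)
    d = r - r.mean()  # demean, as in the experiment
    # compounded return of each overlapping window
    growth = np.array([np.prod(1.0 + d[i : i + window]) - 1.0
                       for i in range(len(d) - window + 1)])
    share_positive = float(np.mean(growth > 0))
    kept = np.sort(growth)[int(np.floor(drop_worst * len(growth))):]  # trim worst windows
    ratio = float(kept[kept > 0].mean() / -kept[kept < 0].mean())
    return share_positive, ratio
```

A series with steady small gains and rare large losses, the pattern described for momentum, shows a majority of positive windows even after demeaning.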

The process of the test was selected to incorporate the salient features of each factor. However, in the case of momentum, it may lead to somewhat outlandish results.

Conclusion

While an evidence-based investor should be swayed by the weight of the data, the simple fact is that most factors are so well established that the majority of current practitioners will likely go their entire careers without experiencing evidence substantial enough to dismiss any of the anomalies.

Therefore, in many ways, there is a certain faith required to use them going forward. Yes, these are ideas and concepts derived from the data. Yes, we have done our best to test their robustness out-of-sample across time, geographies, and asset classes. Yet we must also admit that there is a non-zero probability, however small it is, that these are false positives: a fact we may not have sufficient evidence to address until several decades hence.

And so a bit of humility is warranted. Factors will not suddenly stand up and declare themselves broken. And those that are broken will still appear to work from time-to-time.

Indeed, the death of a factor will be more Fimbulwinter than Ragnarok: not so violent as to be the end of days, but enough to cause pain and frustration among investors.

Addendum

We have received a large number of inbound notes about this commentary, which fall along two primary lines of questioning. We want to address these points.

How were the tests impacted by the Bayesian inference process?

The results of the tests within this commentary are rather astounding. We did seek to address some of the potential flaws of the methodology we employed, but by and large we feel the overarching conclusion remains on a solid foundation.

While we only presented the results of the Bayesian inference approach in this commentary, as a check we actually tested two other approaches:

A Bayesian inference approach assuming that forward returns would be a random walk with constant variance (based upon historical variance) and zero mean.
Forward returns were simulated using the same bootstrap approach, but the factor was treated as if it were being discovered for the first time, with the entire history evaluated for its significance.
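The first alternative check (zero-mean forward returns with constant, historical variance) can be approximated with a simple Monte Carlo. This is a sketch under our own assumptions, not the authors' code: it swaps the Bayesian update for a plain t-test, assumes i.i.d. normal monthly returns, defines "failure" as the full-sample t-statistic of the mean falling below 1.96, censors paths at 600 years, and uses a synthetic 90-year history rescaled to a 0.3% monthly mean and 3% volatility (t ≈ 3.3). The function name is hypothetical.

```python
import numpy as np

def years_until_failure(hist_returns, t_threshold=1.96, n_paths=100,
                        max_years=600, seed=0):
    """Hypothetical helper: median years until the full-sample t-stat of the mean
    return drops below t_threshold, if future monthly returns are i.i.d. normal
    with zero mean and the historical standard deviation."""
    rng = np.random.default_rng(seed)
    h = np.asarray(hist_returns, dtype=float)
    n0, s0, ss0 = len(h), h.sum(), np.sum(h ** 2)
    sigma = h.std(ddof=1)  # constant variance, taken from history

    horizon = 12 * max_years
    future = rng.normal(0.0, sigma, size=(n_paths, horizon))  # zero-mean forward returns

    n = n0 + np.arange(1, horizon + 1)         # sample size after each simulated month
    s = s0 + np.cumsum(future, axis=1)         # running sum of returns
    ss = ss0 + np.cumsum(future ** 2, axis=1)  # running sum of squared returns
    mean = s / n
    var = (ss - n * mean ** 2) / (n - 1)
    t_stat = mean / np.sqrt(var / n)

    failed = t_stat < t_threshold
    # First failing month per path; paths that never fail are censored at the horizon
    first = np.where(failed.any(axis=1), failed.argmax(axis=1) + 1, horizon)
    return np.median(first) / 12.0

# Synthetic history with an exact 0.3% monthly mean and 3% monthly volatility
z = np.random.default_rng(42).normal(size=1080)
hist = (z - z.mean()) / z.std(ddof=1) * 0.03 + 0.003
print(f"median years until failure: {years_until_failure(hist):.0f}")
```

Even with no premium at all going forward, the accumulated history keeps the t-statistic above the threshold for a long time, which is the order-of-magnitude point made below.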

The two tests were an effort to isolate the effects of the different components of our test.

What we found was that while the reported figures changed, the overall magnitude did not. In other words, the median death-date of HML may not have been 67 years, but the order of magnitude remained much the same: decades.

Stepping back, these results were somewhat of a foregone conclusion. We would not expect an effect that has been determined to be statistically significant over a hundred-year period to unravel in a few years. Furthermore, we would expect a number of simulated scenarios to continue bolstering the statistical strength due to randomness alone.

Why are we defending price-to-book?

The point of this commentary was not to defend price-to-book as a measure. Rather, it was to bring up a larger point.

As a community, quantitative investors often leverage statistical significance as a defense for the way we invest.

We think that is a good thing. We should look at the weight of the evidence. We should be data driven. We should try to find ideas that have proven to be robust over decades of time and when applied in different markets or with different asset classes. We should want to find strategies that are robust to small changes in parameterization.

Many quants would argue (including us among them), however, that there also needs to be a why. Why does this factor work? Without the why, we run the risk of glorified data mining. With the why, we can choose for ourselves whether we believe the effect will continue going forward.

Of course, there is nothing that prevents the why from being pure narrative fallacy. Perhaps we have simply woven a story into a pattern of facts.

With price-to-book, one might argue we have done the exact opposite. The effect, technically, remains statistically significant and yet plenty of ink has been spilled as to why it shouldn’t work in the future.

The question we must answer, then, is, “when does statistical significance apply and when does it not?” How can we use it as a justification in one place and completely ignore it in others?

Furthermore, if we are going to rely on hundreds of years of data to establish significance, how can we determine when something is “broken” if the statistical evidence does not support it?

Price-to-book may very well be broken. But that is not the point of this commentary. The point is simply that the same tools we use to establish and defend factors may prevent us from tearing them down.

How to Benchmark Trend-Following

By Nathan Faber

On May 14, 2018

In Risk & Style Premia, Risk Management, Trend, Weekly Commentary


Summary

Benchmarking a trend-following strategy can be a difficult exercise in managing behavioral biases.
While the natural tendency is often to benchmark equity trend-following to all-equities (e.g. the S&P 500), this does not accurately give the strategy credit for choosing to be invested when the market is going up.
A 50/50 portfolio of equities and cash is generally an appropriate benchmark for long/flat trend-following strategies, both for setting expectations and for gauging current relative performance.
If we acknowledge that for a strategy to outperform over the long run, it must undergo shorter periods of underperformance, then using this symmetric benchmark can isolate the market environments in which underperformance should be expected.
Diversifying risk-management approaches (e.g. pairing strategic allocation with tactical trend-following) can manage events that are unfavorable to one strategy, and benchmarking is a tool to set expectations around the level of risk management necessary in different market environments.

Any strategy that deviates from the most basic approach gets compared to a benchmark. But how do you choose an appropriate benchmark?

The complicated nature of benchmarking can be easily seen by considering something as simple as a value stock strategy.

You may pit the concentrated value manager you currently use against the more diversified value manager you used previously. At that time, you may have compared that value manager to a systematic smart-beta ETF like the iShares S&P 500 Value ETF (ticker: IVE). And if you were invested in that ETF, you might compare its performance to the S&P 500.

What prevents you from benchmarking them all to the S&P 500? Or from benchmarking the concentrated value strategy to all of the other three?

Benchmark choices are not unique and are highly dependent on what aspect of performance you wish to measure.

Benchmarking is one of the most frequently abused facets of investing. It can be extremely useful when applied in the correct manner, but most of the time, it is simply a hurdle to sticking with an investment plan.

In an ideal world, the only benchmark for an investor would be whether or not they are on track for hitting their financial goals. However, in an industry obsessed with relative performance, choosing a benchmark is a necessary exercise.

This commentary will explore some of the important considerations when choosing a benchmark for trend-following strategies.

The Purpose of a Trend-Following Benchmark

As an investment manager, our goal with benchmarking is to check that a strategy’s performance is in line with our expectations. Performance versus a benchmark can answer questions such as:

Is the out- or underperformance appropriate for the given market environment?
Is the magnitude of out- or underperformance typical?
How is the strategy behaving in the context of other ways of managing risk?

With long/flat trend-following strategies, the appropriate benchmark should gauge when the manager is making correct or incorrect calls in either direction.

Unfortunately, we frequently see long/flat equity trend-following strategies benchmarked to an all-equity index like the S&P 500. This is similar to the coinflip game we outlined in our previous commentary about protecting and participating with trend-following.[1]

The behavioral implications of this kind of benchmarking are summarized in the table below.

The two cases with wrong calls – to move to cash when the market goes up or remain invested when the market goes down – are appropriately labeled, as is the correct call to move to cash when the market is going down. However, when the market is going up and the strategy is invested, it is merely keeping up with its benchmark even though it is behaving just as one would want it to.

To reward the strategy in either correct call case, the benchmark should consist of allocations to both equity and cash.

A benchmark like this can provide objective answers to the questions outlined above.

Deriving a Trend-Following Benchmark

Sticking with the trend-following strategy example we outlined in our previous commentary [2], we can look at some of the consequences of choosing different benchmarks in terms of how much the trend-following strategy deviates from them over time.

The chart below shows the annualized tracking error of the strategy relative to static benchmarks spanning the full range of strategic equity/cash proportions.

Source: Kenneth French Data Library. Data from July 1926 – February 2018. Calculations by Newfound Research. Returns are gross of all fees, including transaction fees, taxes, and any management fees. Returns assume the reinvestment of all distributions. This document does not reflect the actual performance results of any Newfound investment strategy or index. All returns are backtested and hypothetical. Past performance is not a guarantee of future results.

The benchmark that minimizes the tracking error is a 47% allocation to equities and 53% to cash. This 0.47 is also the beta of the trend-following strategy, so we can think of this benchmark as accounting for the risk profile of the strategy over the entire 92-year period.
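The tracking error minimization can be sketched as a grid search over static equity/cash blends. It also shows why the minimizing weight matches the strategy's beta: the tracking error to w*equity + (1 - w)*cash is a quadratic in w that is minimized at Cov(s - c, e - c) / Var(e - c), the beta of excess returns (close to the raw beta when cash returns barely move). The function names and the synthetic, randomly invested "strategy" are placeholders of our own.

```python
import numpy as np

def te_minimizing_weight(strategy, equity, cash, grid=None):
    """Hypothetical helper: grid-search the static equity weight w whose benchmark
    w*equity + (1 - w)*cash has the lowest annualized tracking error (monthly data)."""
    s, e, c = (np.asarray(x, dtype=float) for x in (strategy, equity, cash))
    grid = np.linspace(0.0, 1.0, 101) if grid is None else grid
    te = np.array([np.std(s - (w * e + (1 - w) * c), ddof=1) * np.sqrt(12) for w in grid])
    best = int(np.argmin(te))
    return grid[best], te[best]

def excess_beta(strategy, equity, cash):
    """Closed form of the same minimum: slope of strategy excess returns on equity excess returns."""
    xs = np.asarray(strategy, dtype=float) - np.asarray(cash, dtype=float)
    xe = np.asarray(equity, dtype=float) - np.asarray(cash, dtype=float)
    return np.cov(xs, xe, ddof=1)[0, 1] / np.var(xe, ddof=1)

# Synthetic long/flat strategy: invested in roughly 60% of months, chosen at random
rng = np.random.default_rng(7)
n = 1200
equity = rng.normal(0.008, 0.045, n)
cash = 0.003 + rng.normal(0.0, 0.0005, n)
strategy = np.where(rng.random(n) < 0.6, equity, cash)

w, te = te_minimizing_weight(strategy, equity, cash)
print(f"grid weight {w:.2f}, excess-return beta {excess_beta(strategy, equity, cash):.3f}")
```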

But what if we took a narrower view by constraining this analysis to recent performance?

The chart below shows the equity allocation of the benchmark that minimizes the tracking error to the trend-following strategy over rolling 1-year periods.

A couple of features stand out here.

First, if we constrain our lookback period to one year, a time-period over which many investors exhibit anchoring bias, then the “benchmark” that we may think we will closely track – the one we are mentally tied to – might be the one that we deviate the most from over the next year.

Second, the approximately 50/50 benchmark calculated using the entire history of the strategy is rarely the one that minimizes tracking error over the short term.

The median equity allocation in these benchmarks is 80%, the average is 67%, and the data is highly clustered at the extremes of 100% equity and 100% cash.
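The rolling version can be sketched the same way, restricting the benchmark to between 0% and 100% equity. Because tracking error is a convex quadratic in the weight, the constrained optimum is simply the window's excess-return beta clipped to [0, 1]. For a long/flat strategy that spent an entire 12-month window either fully invested or fully in cash, that beta is 1 or 0, which is one plausible reason the rolling estimates cluster at the extremes. The helper name and the 12-month window are our assumptions.

```python
import numpy as np

def rolling_benchmark_weights(strategy, equity, cash, window=12):
    """Hypothetical helper: for each trailing `window`-month period, the equity weight
    in [0, 1] that minimizes tracking error to the strategy (beta clipped to [0, 1])."""
    xs = np.asarray(strategy, dtype=float) - np.asarray(cash, dtype=float)
    xe = np.asarray(equity, dtype=float) - np.asarray(cash, dtype=float)
    weights = []
    for t in range(window, len(xs) + 1):
        a, b = xs[t - window:t], xe[t - window:t]
        beta = np.cov(a, b, ddof=1)[0, 1] / np.var(b, ddof=1)
        weights.append(np.clip(beta, 0.0, 1.0))
    return np.array(weights)

# Synthetic example: fully invested for 24 months, then in cash for 24 months
n = 48
equity = np.random.default_rng(3).normal(0.01, 0.05, n)
cash = np.full(n, 0.002)
strategy = np.where(np.arange(n) < 24, equity, cash)
weights = rolling_benchmark_weights(strategy, equity, cash)
print(np.round(weights, 2))
```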

The Intuitive Trend-Following Benchmark

Is there a problem in determining a benchmark using the tracking error over the entire period?

One issue is that it is being calculated with the benefit of hindsight. If you had started a trend-following strategy back in the 1930s, you would have arrived at a different equity allocation for the benchmark based on this analysis given the available data (e.g. using data up until the end of 1935 yields an equity allocation of 37%).

To remove this reliance on having a sufficiently long backtest, our preference is to rely more on the strategy’s rules and how we would use it in a portfolio to determine our trend-following benchmarks.

For a trend following strategy that pivots between stocks and cash, a 50/50 benchmark is a natural choice.

It is broad enough to include the assets in the trend-following strategy’s investment universe while being neutral to the calls to be long or flat.

That a roughly 50/50 portfolio emerges as the solution to the tracking error minimization problem over the entire data set simply provides empirical evidence for its use.

One argument against using a 50/50 blend could focus on the fact that the market is generally up more frequently than it is down, at least historically. While this is true, the magnitude of down moves has often been larger than the magnitude of up moves. Since this strategy is explicitly meant as a risk management tool, accounting for both the magnitude and the frequency is prudent.

Another argument against its use could be the belief that we are entering a different market environment where history will not be an accurate guide going forward. However, given the random nature of market moves coupled with the behavioral tendencies of investors to overreact, herd, and anchor, a benchmark close to a 50/50 is likely still a fitting choice.

Setting Expectations with a Trend-Following Benchmark

Now that we have a benchmark to use, how do we use it to set our expectations?

Neglecting the historical data for the moment, from the ex-ante perspective, it is helpful to decompose a typical market cycle into four different segments and assess how we expect trend-following to behave:

Initial decline – Equity markets begin to sell off, and the fully invested trend-following strategy underperforms the 50/50 benchmark.
Prolonged drawdown – The trend-following strategy adapts to the decline and moves to cash. The trend-following strategy outperforms.
Initial recovery – The trend-following strategy is still in cash and lags the benchmark as prices rebound off the bottom.
Sustained recovery – The trend-following strategy reinvests and captures more of the upside than the benchmark.

Of course, this is a somewhat ideal scenario that rarely plays out perfectly. Whipsaw events occur as prices recover (decline) before declining (recovering) again.

But it is important to note how the level of risk relative to this 50/50 benchmark varies over time.

Contrast this with something like an all-equity strategy benchmarked to the S&P 500, where the risk is likely to be similar during most market environments.

Now, if we look at the historical data, we can see this borne out in the graph of the drawdowns for trend-following and the 50/50 benchmark.

In most prolonged and major (>20%) drawdowns, trend-following first underperforms the benchmark, then outperforms, then lags as equities improve, and then outperforms again.

Using the most recent example of the Financial Crisis, we can see the capture ratios versus the benchmark in each regime.

Source: Kenneth French Data Library. Data from October 2007 – February 2018. Calculations by Newfound Research. Returns are gross of all fees, including transaction fees, taxes, and any management fees. Returns assume the reinvestment of all distributions. This document does not reflect the actual performance results of any Newfound investment strategy or index. All returns are backtested and hypothetical. Past performance is not a guarantee of future results.
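Regime capture ratios like these can be computed by compounding each series over the regime and dividing. This sketch assumes one common definition (strategy cumulative return divided by benchmark cumulative return) and a monthly-rebalanced 50/50 equity/cash benchmark; the helper names are hypothetical, and any regime boundaries you slice with are your own choice, not the dates the authors used.

```python
import numpy as np

def fifty_fifty(equity, cash, equity_weight=0.5):
    """Monthly-rebalanced static benchmark (50/50 equity/cash by default)."""
    e = np.asarray(equity, dtype=float)
    c = np.asarray(cash, dtype=float)
    return equity_weight * e + (1 - equity_weight) * c

def cumulative_return(monthly_returns):
    """Compounded return over the whole slice."""
    return np.prod(1.0 + np.asarray(monthly_returns, dtype=float)) - 1.0

def capture_ratio(strategy, benchmark):
    """Strategy cumulative return divided by benchmark cumulative return over one regime."""
    return cumulative_return(strategy) / cumulative_return(benchmark)
```

For a regime spanning hypothetical month indices `a` to `b`, the call would be `capture_ratio(strategy[a:b], fifty_fifty(equity, cash)[a:b])`. In a falling regime, a ratio below 1 means the strategy lost less than the benchmark; in a rising regime, a ratio above 1 means it gained more.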

The underperformance of the trend-following strategy versus the benchmark is in line with expectations based on how the strategy is designed to work.

Another way to use the benchmark to set expectations is to look at rolling returns historically. This gives context for the current out- or underperformance relative to the benchmark.

From this we can see which percentile the current return falls into or check to see how many standard deviations it is away from the average level of relative performance.
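A minimal sketch of that percentile and standard-deviation check follows. It assumes monthly data, a 12-month window, and that the current observation is included in the history it is ranked against; all three are our choices, and the function names are hypothetical.

```python
import numpy as np

def rolling_relative_returns(strategy, benchmark, window=12):
    """Trailing `window`-month compounded strategy return minus compounded benchmark return."""
    cs = np.concatenate([[0.0], np.cumsum(np.log1p(np.asarray(strategy, dtype=float)))])
    cb = np.concatenate([[0.0], np.cumsum(np.log1p(np.asarray(benchmark, dtype=float)))])
    return np.expm1(cs[window:] - cs[:-window]) - np.expm1(cb[window:] - cb[:-window])

def current_context(relative):
    """Percentile rank and z-score of the latest relative return versus its own history."""
    rel = np.asarray(relative, dtype=float)
    current = rel[-1]
    percentile = 100.0 * np.mean(rel <= current)
    z_score = (current - rel.mean()) / rel.std(ddof=1)
    return percentile, z_score
```

A very low percentile or a z-score several standard deviations from zero flags relative performance worth investigating, rather than proving something is broken.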

In all this, there are a few important points to keep in mind:

Price moves that occur faster than the trend-following lookback can register are a source of some of the largest underperformance events.
In a similar vein, whipsaw is a key risk of trend-following. Highly oscillatory markets will not be favorable to trend-following. In these scenarios, trend-following can underperform even fully invested equities.
With percentile analysis, there is always a first time for anything. Having a rich data history covering a variety of market scenarios mitigates this, but setting new percentiles, either on the low end or high end, is always possible.
Sometimes a strategy is expected to lag its benchmark in a given market environment. A primary goal of benchmarking is to accurately set expectations for the potential magnitude of relative performance and to design the portfolio accordingly.

Conclusion

Benchmarking a trend-following strategy can be a difficult exercise in managing behavioral biases. With the tendency to benchmark all equity-based strategies to an all-equity index, investors often set themselves up for a let-down in a bull market with trend-following.

With benchmarking, the focus is often on lagging the benchmark by “too much.” This is what an all-equity benchmark can do to trend-following. However, the issue is symmetric: beating the benchmark by “too much” can also signal either an issue with the strategy or with the benchmark choice. This is why we would not benchmark a long/flat trend-following strategy to cash.

A 50/50 portfolio of equities and cash is generally an appropriate benchmark for long/flat trend-following strategies. This benchmark allows us to measure the strategy’s ability to allocate correctly whether equities are increasing or decreasing.

Too often, investors use benchmarking solely to see which strategy is beating the benchmark by the most. While this can be a use for very similar strategies (e.g. a set of different value managers), we must always be careful not to compare apples to oranges.

A benchmark should not conjure up an image of a dog race where the set of investment strategies are the dogs and the benchmark is the bunny out ahead, always leading the way.

We must always acknowledge that for a strategy to outperform over the long-run, it must undergo shorter periods of underperformance. Diversifying approaches can manage events that are unfavorable to one strategy, and benchmarking is a tool to set expectations around the level of risk management necessary in different market environments.

[1] https://blog.thinknewfound.com/2018/05/leverage-and-trend-following/

[2] https://blog.thinknewfound.com/2018/03/protect-participate-managing-drawdowns-with-trend-following/

Leverage and Trend Following

By Corey Hoffstein

On May 7, 2018

In Risk & Style Premia, Risk Management, Trend, Weekly Commentary


Summary

We typically discuss trend following in the context of risk management for investors looking to diversify their diversifiers.
While we believe that trend following is most appropriate for investors concerned about sequence risk, levered trend following may have use for investors pursuing growth.
In a simple back-test, a naïve levered trend following strategy considerably increases annualized returns and reduces negative skew and kurtosis (“fat tails”).
The introduced leverage, however, significantly increases annualized volatility, meaning that the strategy still exhibits large drawdowns.
Nevertheless, trend following may be a way to allow for the incorporation of leverage with reduced risk of permanent portfolio impairment that would otherwise occur from large drawdowns.

In an industry obsessed with alpha, our view here at Newfound has long been to take a risk-first approach to investing. In light of this, when we discuss trend following techniques, it is often with an eye towards explicitly managing drawdowns. Our aim is to help investors diversify their diversifiers and better manage the potential devastation that sequence risk can wreak upon their portfolios.

Thus, we often discuss the application of trend following for soon-to-be and recent retirees who are in peak sequence risk years.

Empirical evidence suggests that trend following can be a highly effective means of limiting exposure to significant and prolonged drawdowns.
Trend following is complementary to other diversifiers like fixed income, which can theoretically increase the Sharpe ratio of the diversification bucket as a whole.
Instead of acting as a static hedge, the dynamic approach of trend following can also help investors take advantage of market tailwinds. This may be particularly important if real interest rates remain low.
The potential tax inefficiency of trend following is significantly lower when the alternative risk management technique is fixed income.

Despite our focus on using trend following to manage sequence risk, we often receive questions from investors still within their accumulation phase asking whether trend following can be appropriate for them as well. Most frequently, the question is, “If trend following can manage downside risk, can I use a levered approach to trend following in hopes of boosting returns?”

This commentary explores that idea, specifically in the context of available levered ETFs.

Does Naïve Levered Trend Following Work?

In an effort to avoid overfitting our results to any one particular model or parameterization of trend following, we have constructed our signals employing a model-of-models approach.[1] Specifically, we use four different definitions of trend for a given N-period lookback:

Price-Minus-Moving-Average: When price is above its N-period simple moving average, invest. Otherwise, divest.
EWMA Cross-Over: When the (N/4)-length exponentially-weighted moving average is above the (N/2)-length exponentially-weighted moving average, invest. Otherwise, divest.
EWMA Slope: When the (N/2)-length exponentially-weighted moving average is positively sloped, invest. Otherwise, divest.
Percentile Channel: When price crosses above the trailing 75th percentile over the prior N periods, invest. Stay invested until it crosses below its trailing 25th percentile over the prior N periods. Stay divested until it crosses back above the 75th percentile.
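The four trend definitions can be sketched with pandas. This is an illustrative reading of the rules, not Newfound's implementation: we assume month-end prices, interpret an EWMA's "length" as its pandas `span`, treat a signal as 0 (divested) during the warm-up period, and exclude the current price from the percentile channel's trailing window. The function names are hypothetical.

```python
import numpy as np
import pandas as pd

def price_minus_sma(price: pd.Series, n: int) -> pd.Series:
    """Invest (1) when price is above its n-period simple moving average."""
    return (price > price.rolling(n).mean()).astype(float)

def ewma_crossover(price: pd.Series, n: int) -> pd.Series:
    """Invest when the (n/4)-span EWMA is above the (n/2)-span EWMA."""
    fast = price.ewm(span=max(n // 4, 1), adjust=False).mean()
    slow = price.ewm(span=max(n // 2, 1), adjust=False).mean()
    return (fast > slow).astype(float)

def ewma_slope(price: pd.Series, n: int) -> pd.Series:
    """Invest when the (n/2)-span EWMA rose over the last period."""
    ewma = price.ewm(span=max(n // 2, 1), adjust=False).mean()
    return (ewma.diff() > 0).astype(float)

def percentile_channel(price: pd.Series, n: int) -> pd.Series:
    """Enter above the prior-n 75th percentile; exit below the prior-n 25th percentile."""
    upper = price.rolling(n).quantile(0.75).shift(1)  # prior n periods, excluding today
    lower = price.rolling(n).quantile(0.25).shift(1)
    state, out = 0.0, []
    for p, up, lo in zip(price, upper, lower):
        if not np.isnan(up):
            if state == 0.0 and p > up:
                state = 1.0
            elif state == 1.0 and p < lo:
                state = 0.0
        out.append(state)
    return pd.Series(out, index=price.index)
```

Each function returns a 0/1 series, so running all four over several lookbacks yields a table of signals that can be averaged into a single allocation.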

For each of these four models, we also run a number of parameterizations covering 6-to-18-month lookbacks. In grand total, there are 4 models with 5 parameterizations each, giving us 20 variations of trend signals.

Using these signals, we construct three models. In the first model, we simply invest in U.S. equities in proportion to the number of signals that are positive. For example, if 75% of the trend following signals are positive, the portfolio is 75% invested in U.S. equities and 25% in the risk-free asset.

For our leveraged model, we simply multiply the percentage of signals by 2x and invest that proportion of our portfolio in U.S. equities and the remainder in the risk-free asset. In those cases where the amount invested in U.S. equities exceeds 100% of the portfolio, we assume a negative allocation to the risk-free asset (e.g. if we invest 150% of our assets in U.S. equities, we assume a -50% allocation to the risk-free asset).
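The aggregation into the unlevered and levered models can be sketched as follows. We assume each signal column is 1 (invest) or 0 (divest), that the negative risk-free allocation represents borrowing at the risk-free rate, and that weights set at the end of one month earn the next month's returns to avoid look-ahead bias; the helper names are hypothetical.

```python
import pandas as pd

def trend_allocation(signals: pd.DataFrame, leverage: float = 1.0) -> pd.DataFrame:
    """Equity weight = leverage * share of positive signals; the rest goes to the
    risk-free asset, which turns negative (borrowing) when equity exceeds 100%."""
    frac = signals.mean(axis=1)
    equity_w = leverage * frac
    return pd.DataFrame({"equity": equity_w, "risk_free": 1.0 - equity_w})

def strategy_returns(weights: pd.DataFrame, equity_ret: pd.Series, rf_ret: pd.Series) -> pd.Series:
    """Apply weights known at the end of month t to the returns of month t+1."""
    w = weights.shift(1)
    return (w["equity"] * equity_ret + w["risk_free"] * rf_ret).dropna()

# Example: 3 of 4 signals positive -> 75% equities unlevered, 150% / -50% at 2x
signals = pd.DataFrame({"a": [1, 1], "b": [1, 0], "c": [1, 1], "d": [0, 1]})
print(trend_allocation(signals, leverage=2.0))
```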

With the benefit of hindsight, we should not be surprised at the results. If we know that trend following is effective at limiting severe and prolonged drawdowns (the kryptonite to levered investors), then it should come as no surprise that a levered trend following strategy does quite well.

It is well worth pointing out, however, that a highly levered strategy can be quickly wiped out by a sudden and immediate drawdown that trend following is unable to sidestep. Assuming a 2x levered position, our portfolio would be quickly wiped out by a sharp 50% correction. While such an event did not happen during the 1900s for U.S. equities, that does not mean it cannot happen in the future. Caveat emptor.

Logarithmically-plotted equity curves can be deceiving, so it is important that we also compare the annual return characteristics.

Source: Kenneth French Data Library. Calculations by Newfound Research. Returns are gross of all fees, including transaction fees, taxes, and any management fees. Returns assume the reinvestment of all distributions. Past performance is not a guarantee of future results.

While we can see that a simple trend following approach effectively “clips” the tails of the underlying distribution – giving up both the best and the worst annual returns – the levered strategy still has significant mass in both directions. Evaluating the first several moments of the distributions, however, we see that both simple and levered trend following significantly reduce the skew and kurtosis of the return distribution.

	Mean	Standard Deviation	Skew	Kurtosis
U.S. Equities	9.4%	19%	-1.01	1.36
Trend Following	9.5%	13%	0.09	-0.92
Levered Trend Following	14.4%	26%	0.11	-0.78
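The moments in the table can be computed as below. The negative kurtosis figures imply that the table reports excess kurtosis (a normal distribution scores 0), since raw kurtosis can never fall below 1. We assume population (biased) skew and kurtosis estimators, since the authors' exact estimator is not stated; the function name is hypothetical.

```python
import numpy as np

def return_moments(annual_returns):
    """Mean, sample standard deviation, skew, and excess kurtosis (normal = 0)."""
    x = np.asarray(annual_returns, dtype=float)
    mu, sd = x.mean(), x.std(ddof=1)
    z = (x - mu) / x.std(ddof=0)  # standardize with the population std for the shape moments
    skew = np.mean(z ** 3)
    excess_kurtosis = np.mean(z ** 4) - 3.0
    return mu, sd, skew, excess_kurtosis
```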

Nevertheless, the standard deviation of the levered trend following strategy exceeds even that of the underlying asset, a potential indication that expectations for the approach may be less about, “Can I avoid large drawdowns?” and more about, “Can I use leverage for growth and still avoid catastrophe?” We can see this by plotting the joint annual log-return distributions.

We can see that for U.S. equity returns between 0% and -20%, the Levered Trend Following strategy can exhibit returns between -20% and -40%. About 11% of the observations fall into this category, making it an occurrence that a levered trend follower should expect to experience multiple times in their investment lifecycle. We can even see one year where U.S. equities are slightly positive and the levered model exhibits a near -30% return. It is in the most extreme U.S. equity years – those exceeding -20% – that the trend following aspect appears to come into play.

We must also ask the question, “can this idea survive associated fees?” If investors are looking to apply this approach using levered ETFs, they must consider the expense ratios of the ETFs themselves, transaction costs, and bid/ask spreads. Here we will use the ProShares Ultra S&P 500 ETF (“SSO”) as a data proxy. The expense ratio is 0.90% and the average bid/ask spread is 0.03%. Since transaction costs vary, we will assume an added annual 0.20% fee for asset-based pricing.

In comparison, for the naïve model, we will use the SPDR S&P 500 ETF (“SPY”) as the data proxy and assume an expense ratio of 0.09% and an average bid/ask spread of 0.004%. Since most platforms have a vanilla S&P 500 ETF on their no-transaction fee list, we will not add any explicit transaction costs.
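One way to apply cost assumptions like these to a monthly back-test is sketched below. The cost model is ours, not the authors': expense ratios are charged pro rata on the weight held in the ETF, an asset-based platform fee is charged on the whole account, and each trade is assumed to cost half the quoted bid/ask spread per unit of ETF weight traded. When a 2x ETF such as SSO is used, the ETF weight is roughly half the desired equity exposure. The function name is hypothetical.

```python
import numpy as np

def net_returns(gross, etf_weights, expense_ratio, account_fee=0.0, spread=0.0):
    """Hypothetical cost model for monthly returns: expense ratio on the ETF weight,
    account-level fee on the whole portfolio, and half the bid/ask spread paid
    on each unit of ETF weight traded."""
    gross = np.asarray(gross, dtype=float)
    w = np.asarray(etf_weights, dtype=float)
    holding_cost = np.abs(w) * expense_ratio / 12.0 + account_fee / 12.0
    turnover = np.abs(np.diff(w, prepend=w[0]))  # no trade assumed in the first month
    trading_cost = turnover * spread / 2.0
    return gross - holding_cost - trading_cost
```

Under these assumptions, the levered model might be run as `net_returns(gross, equity_exposure / 2, expense_ratio=0.0090, account_fee=0.0020, spread=0.0003)` and the naïve model as `net_returns(gross, equity_exposure, expense_ratio=0.0009, spread=0.00004)`, where `gross` and `equity_exposure` are your own monthly series.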

We plot the strategy equity curves below net of these assumed fees.

The annualized return for the Levered Trend Following strategy declines from 15.9% to 14.5%, while the unlevered version only falls from 10.1% to 10.0%. While the overall return of the levered version declines by 140 basis points per year, it still far exceeds the total return performance of the unlevered version.

Conclusion

Based upon this initial analysis, it would appear that a simple, levered trend following approach may be worth further consideration for investors in the accumulation phase of their investment lifecycle.

Do-it-yourself investors may have no problem implementing this idea on their own using levered ETFs, but other investors may prefer a simple, packaged approach. Unfortunately, as far as we are aware, no such packaged product exists in the marketplace today.

However, one workaround may be to utilize levered ETFs to “make room” for an unlevered trend following strategy. For example, if a growth-oriented investor currently holds an 80/20 stock/bond mix and wanted to introduce a 20% allocation to trend following, they could re-orient their portfolio to be 60% stocks, 10% 2x levered stocks, 10% 2x levered bonds, and 20% trend following. This would have the effect of being an 80/20 stock/bond portfolio with 20% leverage applied to introduce the trend following strategy. While there are the nuances of daily reset to consider in the levered ETF solutions, this approach may allow for the modest introduction of levered trend following into the portfolio.
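The exposure arithmetic of this example can be checked in a few lines, ignoring daily-reset effects and treating each 2x ETF as exactly twice its index exposure (both simplifying assumptions). The names `HOLDINGS` and `exposures` are illustrative.

```python
# Capital weight of each sleeve, and its exposure per dollar invested (illustrative)
HOLDINGS = {
    "stocks": (0.60, {"stocks": 1.0}),
    "2x levered stocks": (0.10, {"stocks": 2.0}),
    "2x levered bonds": (0.10, {"bonds": 2.0}),
    "trend following": (0.20, {"trend following": 1.0}),
}

def exposures(holdings):
    """Sum each sleeve's capital weight times its per-dollar exposure, by asset."""
    total = {}
    for weight, per_dollar in holdings.values():
        for asset, x in per_dollar.items():
            total[asset] = total.get(asset, 0.0) + weight * x
    return total

exp = exposures(HOLDINGS)
capital = sum(weight for weight, _ in HOLDINGS.values())
print({asset: round(x, 2) for asset, x in exp.items()}, "capital:", round(capital, 2))
```

The capital still sums to 100%, while total exposure is 120%: the original 80/20 stock/bond mix plus a 20% trend following sleeve.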

It is worth noting that while we employed up to 2x leverage in this commentary, there is no reason investors could not apply a lower amount, either by mixing levered and unlevered ETFs, or by using a solution like the new Portfolio+ line-up from Direxion, which applies 1.25x leverage to underlying indices.

As we like to say here at Newfound, “risk cannot be destroyed, only transformed.” While this commentary explored levered trend following in comparison to unlevered exposure, a more apt comparison might simply be to levered market exposure. We suspect that the trend following overlay creates the same transformation: a reduction of the best and worst years at the cost of whipsaw. However, the introduction of leverage further heightens the risk of sudden and immediate drawdowns: the exact loss profile trend following is ill-suited to avoid.

[1] Nothing in this commentary reflects an actual investment strategy or model managed by Newfound and any investment strategies or investment approaches reflected herein are constructed solely for purposes of analyzing and evaluating the topics herein.

The Research Library of Newfound Research


Timing Equity Returns Using Monetary Policy

Summary

Conclusion

Momentum’s Magic Number

Summary

Out-of-Sample Testing with Countries and Sectors

Does the Effect Persist for Long-Only Portfolios?

Conclusion

Factor Fimbulwinter

Summary

Conclusion

Addendum