
# Summary

- Last week we introduced a signal that appeared to generate statistically significant performance results for performing country rotation.
- This week, we walk through the steps taken to explore the robustness of the signal.
- We first explore out-of-sample data with sector and emerging market country indices. Unfortunately, definitional differences and limited data impact our ability to pass judgement.
- We then use simulation methods to test the robustness of the original strategy.
- We find that just four countries are responsible for nearly 100% of the original strategy’s performance: Australia, Austria, Denmark, and Spain. Removing these countries from the eligible universe renders the signal insignificant.
- In performing a similar analysis for a standard momentum signal, we do not find the same impact from a limited subset of countries, indicating that our signal is likely an artifact of data-mining.

In last week’s commentary (*Country Rotation with Growth/Value Sentiment*) we introduced a signal we had stumbled upon for country rotation that had a pretty attractive backtest.

*Source: MSCI. Calculations by Newfound Research. Returns are hypothetical and backtested. Returns are gross of all fees, transaction costs, and taxes. Returns assume the reinvestment of all distributions. You cannot invest in an index.*

The signal was generated by calculating the prior return spread between a country’s growth and value indices (which we’ll call “GMV” for the remainder of this commentary) and using the cross-sectional rank of the return spread to allocate to countries. Countries for which growth had dramatically out-performed value were likely to receive an overweight, while countries for which value had dramatically out-performed growth would receive an underweight.
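As a rough illustration of that construction, the sketch below computes each country's trailing growth-minus-value spread and turns the cross-sectional ranks into dollar-neutral weights. It is a minimal sketch under our own assumptions, not the original implementation: the 12-period lookback, the linear rank-to-weight mapping, and the scaling (longs sum to +1, shorts to −1) are all illustrative choices, and the country labels are hypothetical.

```python
from math import prod

def trailing_return(returns, lookback):
    """Compounded return over the most recent `lookback` periods."""
    return prod(1.0 + r for r in returns[-lookback:]) - 1.0

def gmv_scores(growth, value, lookback=12):
    """Prior growth-minus-value (GMV) return spread for each country.

    growth, value: dicts mapping country -> list of periodic returns, oldest first.
    lookback=12 is an illustrative assumption, not a parameter stated in the post.
    """
    return {
        c: trailing_return(growth[c], lookback) - trailing_return(value[c], lookback)
        for c in growth
    }

def rank_weights(scores):
    """Dollar-neutral weights from cross-sectional ranks of the scores.

    The highest GMV spread gets the largest overweight and the lowest gets the
    largest underweight; longs sum to +1 and shorts sum to -1.
    """
    ordered = sorted(scores, key=scores.get)  # ascending; ties keep input order
    mid = (len(ordered) + 1) / 2
    demeaned = {c: (i + 1) - mid for i, c in enumerate(ordered)}
    gross = sum(abs(v) for v in demeaned.values())
    if gross == 0:  # a single country cannot be ranked against anything
        return {c: 0.0 for c in ordered}
    return {c: 2.0 * v / gross for c, v in demeaned.items()}

# Usage with hypothetical countries A, B, C:
growth = {"A": [0.02] * 12, "B": [0.01] * 12, "C": [0.00] * 12}
value = {"A": [0.00] * 12, "B": [0.01] * 12, "C": [0.01] * 12}
weights = rank_weights(gmv_scores(growth, value))
# A: +1.0 (growth led), B: 0.0, C: -1.0 (value led)
```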

What made this signal even more interesting was that it seemed to exhibit little-to-no meaningful correlation with traditional factors like momentum and value, perhaps indicating that it captured some additional source of information.

*Source: MSCI. Calculations by Newfound Research. Returns are hypothetical and backtested. Returns are gross of all fees, transaction costs, and taxes. Returns assume the reinvestment of all distributions. You cannot invest in an index.*

The danger of such a signal was that we had discovered it by accident. Well, worse. We discovered it by inverting a hypothesis that had failed spectacularly. This left us grasping for narratives that might rationalize what the data was showing us. We postulated, in line with the initial hypothesis, that the growth-minus-value signal was some sort of sentiment indicator.

In publishing our commentary, we hoped to invite responses from the community as to whether there was prior literature about similar signals or how this signal might be capturing other effects.

**“It’s Just Momentum”**

By and large, the most consistent feedback we heard was, “it has to be related to momentum.”

The connection seems apparent, given that we are calculating the prior return spread of growth minus value, much like a momentum signal which relies upon the prior return of the country itself.

We hypothesized that the most likely way this could occur was if growth indices had higher country betas than their value counterparts. In that case, a dollar-neutral growth-minus-value position would carry a positive residual beta to the country index. The prior total return of this spread would then be highly correlated with the prior total return of the country itself, and therefore with momentum.
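To make the hypothesized mechanism concrete, the sketch below assumes each style index follows a simple one-factor model against its country index, and approximates multi-period returns by summing single-period returns (ignoring compounding). The symbols are our own illustrative notation: $r^{C}_{t}$ is the country return, $r^{G}_{t}$ and $r^{V}_{t}$ the growth and value returns, and $L$ the lookback.

$$
r^{G}_{t} = \alpha_G + \beta_G\, r^{C}_{t} + \varepsilon^{G}_{t}, \qquad
r^{V}_{t} = \alpha_V + \beta_V\, r^{C}_{t} + \varepsilon^{V}_{t}
$$

$$
\mathrm{GMV}_{t} = r^{G}_{t} - r^{V}_{t} = (\alpha_G - \alpha_V) + (\beta_G - \beta_V)\, r^{C}_{t} + (\varepsilon^{G}_{t} - \varepsilon^{V}_{t})
$$

$$
\sum_{s=t-L+1}^{t} \mathrm{GMV}_{s} = L\,(\alpha_G - \alpha_V) + (\beta_G - \beta_V) \sum_{s=t-L+1}^{t} r^{C}_{s} + \sum_{s=t-L+1}^{t} \left(\varepsilon^{G}_{s} - \varepsilon^{V}_{s}\right)
$$

If $\beta_G - \beta_V$ were consistently positive, the trailing GMV spread would load positively on the trailing country return, which is exactly what a momentum signal ranks on. If the difference changes sign over time, that link breaks down.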

Unfortunately, this was not the case. The relative betas of growth-minus-value indices were not only meaningfully time-varying, but also took both positive and negative values, indicating that growth-minus-value may, in some cases, carry negative beta.
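One way to check for the beta effect is to regress the growth-minus-value return spread on the country index over a rolling window and watch how the slope evolves. The sketch below is a minimal version of that idea; the 36-period window and the function names are our own assumptions, not necessarily how the original analysis was run.

```python
def ols_beta(y, x):
    """Slope of an OLS regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var

def rolling_gmv_beta(growth, value, country, window=36):
    """Rolling beta of the growth-minus-value spread on the country index.

    growth, value, country: equal-length lists of periodic returns, oldest first.
    window=36 is an illustrative assumption. One beta per full window is returned.
    """
    spread = [g - v for g, v in zip(growth, value)]
    return [
        ols_beta(spread[t - window:t], country[t - window:t])
        for t in range(window, len(country) + 1)
    ]
```

A consistently positive series of betas would support the “hidden momentum” explanation; betas that drift across zero, as we observed, argue against it.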

Given the low realized correlations and the lack of evidence of consistent beta effects, we did not feel that this signal was just cloaked momentum.

**Out-of-Sample Tests**

Our first test for robustness was to perform out-of-sample tests. We evaluated two different datasets: U.S. sectors and emerging market countries.

To cut right to the chase: the signal did not appear to work on the sector data. Unfortunately, it was difficult to tell whether this reflected a failure of the signal itself or the fact that Russell’s construction methodology for sector growth and value indices was meaningfully different from the MSCI methodology behind the country data we were employing.

While we would hope our signal would be robust to minor definitional tweaks of growth and value, we observed significant disparities in how sectors were decomposed versus countries. For example, below we plot total return, growth, and value data for the Russell U.S. Utility Sector and the MSCI USA indices.

*Source: Morningstar and Russell. Returns are hypothetical. Returns are gross of all fees, transaction costs, and taxes. Returns assume the reinvestment of all distributions. You cannot invest in an index.*

*Source: MSCI. Returns are hypothetical. Returns are gross of all fees, transaction costs, and taxes. Returns assume the reinvestment of all distributions. You cannot invest in an index.*

We can see that while the Utility sector tends to behave much like its value index, the MSCI USA total return index sits more evenly between its growth and value counterparts. This was a consistent artifact in the data: sectors tended to skew towards either growth or value, while country indices bisected the two.

We hypothesize that this is because the Russell methodology likely defines a sector’s value and growth indices as those securities in the market-wide value or growth index that fall within that sector, rather than bisecting the sector itself. Country data, on the other hand, represents a more even market-capitalization split, such that the growth and value indices together approximately reconstitute the total index.
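A simple diagnostic for this skew (our own illustration, not a test from the original analysis) is to ask what blend of the growth and value indices best reproduces the total index. Solving total ≈ w·growth + (1 − w)·value by least squares gives an implied growth share `w`: a bisected index should land somewhere in the middle, while a value-heavy sector like Utilities should land near zero.

```python
def implied_growth_share(total, growth, value):
    """Least-squares w in total ≈ w * growth + (1 - w) * value.

    Rearranging as (total - value) ≈ w * (growth - value) turns this into a
    one-variable regression through the origin with a closed-form solution.
    Inputs are equal-length lists of periodic returns.
    """
    d = [g - v for g, v in zip(growth, value)]
    y = [t - v for t, v in zip(total, value)]
    return sum(di * yi for di, yi in zip(d, y)) / sum(di * di for di in d)
```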

Whether the failure of the GMV signal on sectors was a failure of the signal or an artifact of the data is difficult to determine.

Fortunately, we could look towards another data set: emerging market country data from MSCI. This data would avoid the potential definitional issues of growth and value. Dropping the data in, we calculated the same long/short index returns.

*Source: MSCI. Calculations by Newfound Research. Returns are hypothetical and backtested. Returns are gross of all fees, transaction costs, and taxes. Returns assume the reinvestment of all distributions. You cannot invest in an index.*

Despite the positive total return over the period, the realized volatility of the Emerging Market strategy was so high that we cannot reject the null hypothesis that the positive return was due to luck alone.

Unfortunately, over this same period, the same can be said for the original Foreign Developed data. This leaves us in a bit of a tough spot: the signal failed in two out-of-sample scenarios, but each failure comes with a highly plausible excuse: (1) a data mismatch and (2) too short a time horizon for evaluation.
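The “cannot reject the null hypothesis” judgment comes down to a t-statistic on the strategy’s mean return. A minimal sketch, assuming monthly long/short returns (so no risk-free adjustment) and a simple i.i.d. t-test rather than whatever specific test was used for the charts:

```python
from math import sqrt
from statistics import mean, stdev

def mean_return_t_stat(returns):
    """t-statistic for the null hypothesis that the expected periodic return is zero."""
    return mean(returns) / (stdev(returns) / sqrt(len(returns)))

def annualized_sharpe(returns, periods_per_year=12):
    """Annualized Sharpe ratio, assuming returns are already excess returns."""
    return mean(returns) / stdev(returns) * sqrt(periods_per_year)
```

For monthly data the two are tied together by t = Sharpe × √years. A strategy with an annualized Sharpe ratio of 0.5, for example, needs roughly 16 years of data before its t-statistic reaches 2, which is why a volatile strategy over a short sample struggles to clear the bar.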

**Deriving the Big Muscle Movements**

Left somewhat high and dry by our out-of-sample tests, we wanted to turn our attention to evaluating the robustness of the original signal we developed.

One of the key questions to ask when evaluating signals such as these is, “where is the return coming from?”

Not only should we ask this from a qualitative perspective, but we should also evaluate the return impact of underlying holdings. We can already see from the time series that the results are not due to a single trade, but it is possible that they are due to a structural overweight to just a few holdings.

While it sounds like a trivial exercise to back out return attribution, it turns out to be rather complicated when you have to take into account the effects of compounding (look up “Frongello linking” if you want to go down that rabbit hole). Fortunately, this is a case where we need directional guidance and not exact precision.
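To see why compounding complicates attribution, note that single-period contributions (weight × return) do not add up to the cumulative return over many periods. In the simplest case, linking one portfolio’s contributions to its own total return, one consistent fix is to scale each period’s contribution by the portfolio’s growth up to the start of that period. The sketch below shows only that simple case; benchmark-relative linking schemes such as Frongello’s handle the harder problem and are not reproduced here.

```python
def linked_contributions(contribs):
    """Link per-period contributions so they sum to the cumulative return.

    contribs: list of dicts, one per period, mapping asset -> weight * return.
    Each period's contributions are scaled by the portfolio's cumulative growth
    before that period, so the linked totals add up exactly to
    prod(1 + R_t) - 1, where R_t is the sum of that period's contributions.
    """
    linked = {}
    growth = 1.0  # cumulative portfolio growth before the current period
    for period in contribs:
        for asset, c in period.items():
            linked[asset] = linked.get(asset, 0.0) + c * growth
        growth *= 1.0 + sum(period.values())
    return linked
```

For example, an asset that contributes 10% in the second period of a portfolio that already gained 10% in the first is credited with 11% of the final 21% cumulative return.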

Thus, we elected to just blindly throw some computing horsepower at the problem.

Specifically, we performed the following exercise:

- Of the 16 possible countries, select 10 randomly
- Run the strategy on the subset of 10 countries
- Calculate and store annualized return of the strategy as well as associated country weights over time
- Repeat 10,000 times

We then sorted trials by their annualized returns. Trials falling in the worst 5% of realizations had their corresponding weights averaged together, as did those trials falling in the top 5%.

The purpose of this exercise is to look at the weight differences between trials falling in the best- and worst-case scenarios. The *differences* in weights tell us how these scenarios differ.
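The subset simulation described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the code behind the charts: the strategy is a generic signal-ranked, dollar-neutral long/short backtest on monthly data, `signals` and `returns` are hypothetical per-period dicts of country values, a country left out of a trial counts as zero weight when averaging, and a fixed seed keeps runs reproducible.

```python
import random

def rank_weights(scores):
    """Dollar-neutral weights from cross-sectional ranks (longs +1, shorts -1)."""
    ordered = sorted(scores, key=scores.get)
    mid = (len(ordered) + 1) / 2
    demeaned = {c: (i + 1) - mid for i, c in enumerate(ordered)}
    gross = sum(abs(v) for v in demeaned.values())
    return {c: 2.0 * v / gross for c, v in demeaned.items()}

def run_strategy(universe, signals, returns, periods_per_year=12):
    """Backtest a signal-ranked long/short strategy on a subset of countries.

    signals[t][c]: signal known at the start of period t (hypothetical input).
    returns[t][c]: country return over period t.
    Returns (annualized return, list of weight dicts over time).
    """
    weights, growth = [], 1.0
    for sig_t, ret_t in zip(signals, returns):
        w = rank_weights({c: sig_t[c] for c in universe})
        growth *= 1.0 + sum(w[c] * ret_t[c] for c in universe)
        weights.append(w)
    years = len(returns) / periods_per_year
    ann = growth ** (1 / years) - 1 if growth > 0 else -1.0
    return ann, weights

def subset_simulation(countries, signals, returns, k=10, trials=10_000, tail=0.05, seed=0):
    """Average weights of the best `tail` of trials minus those of the worst `tail`."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        universe = rng.sample(countries, k)  # e.g. 10 of the 16 countries
        results.append(run_strategy(universe, signals, returns))
    results.sort(key=lambda r: r[0])  # sort trials by annualized return
    n_tail = max(1, int(trials * tail))
    bottom, top = results[:n_tail], results[-n_tail:]
    periods = len(returns)

    def avg_weights(group):
        # Countries excluded from a trial contribute zero weight to the average
        return [
            {c: sum(w[t].get(c, 0.0) for _, w in group) / len(group) for c in countries}
            for t in range(periods)
        ]

    top_w, bot_w = avg_weights(top), avg_weights(bottom)
    return [{c: top_w[t][c] - bot_w[t][c] for c in countries} for t in range(periods)]
```

Positive values in the output mark countries that were held more heavily in the best trials at that date; negative values mark heavier holdings in the worst trials.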

*Source: MSCI. Calculations by Newfound Research.*

We’ll be the first to admit that this graph is not the easiest to interpret. Generally speaking, it says, “weights above the zero line imply larger allocations in top 5% trials, while weights below the zero line imply larger allocations in the bottom 5% of trials.”

Two things are apparent.

First, the relative weight in the top 5% versus the bottom 5% is highly time varying, with countries spending time in both. This implies that the strategy’s success was not likely due to a structural over- or under-weight towards certain countries.

Second, certain countries do appear to maintain large absolute weights, implying that they might have an outsized impact on performance. We can get a better idea of this by looking at the normalized absolute value of weights over time.

*Source: MSCI. Calculations by Newfound Research.*
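A minimal sketch of the normalization, assuming it is applied to a time series of per-country weight dicts (such as the top-minus-bottom weight differences): at each date, absolute weights are scaled to sum to one, and the time average then ranks which countries dominate. The function names are our own.

```python
def normalized_abs_weights(weight_series):
    """Scale |w| at each date so the countries' absolute weights sum to 1."""
    out = []
    for w in weight_series:
        gross = sum(abs(v) for v in w.values())
        out.append({c: (abs(v) / gross if gross else 0.0) for c, v in w.items()})
    return out

def average_share(weight_series):
    """Time-averaged normalized absolute weight per country, largest first."""
    norm = normalized_abs_weights(weight_series)
    avg = {c: sum(w[c] for w in norm) / len(norm) for c in norm[0]}
    return sorted(avg.items(), key=lambda kv: kv[1], reverse=True)
```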

We can see that Australia, Austria, Denmark, and Spain appear to maintain consistently high weights relative to other countries. What happens, then, if we simply remove them from the original 16 and re-run our strategy?

The performance entirely disappears. This indicates that the entirety of the result in the original strategy can be attributed to allocation decisions on just four countries: Australia, Austria, Denmark, and Spain.
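The exclusion test itself is simple: re-run the same backtest on the universe minus the flagged countries and compare. In the sketch below, `backtest` is a hypothetical stand-in for any function that maps a list of countries to an annualized return; the validation step guards against silently “excluding” a country that was never in the universe.

```python
def exclusion_test(countries, excluded, backtest):
    """Compare a backtest on the full universe with one that drops `excluded`.

    backtest: hypothetical callable mapping a list of countries to an
    annualized return (e.g. a wrapper around a strategy simulation).
    """
    drop = set(excluded)
    missing = drop - set(countries)
    if missing:
        raise ValueError(f"not in universe: {sorted(missing)}")
    reduced = [c for c in countries if c not in drop]
    full, without = backtest(list(countries)), backtest(reduced)
    return {"full": full, "reduced": without, "change": without - full}
```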

The fact that just four countries drove the entire performance should give us significant pause. Not least because, even if we did believe this signal was robust, implementing it in a long-only manner would be nearly impossible: these countries already carry small weights in an ACWI benchmark, so “under-weighting” them would have little-to-no impact.

But is it all that unusual? Consider the same analysis for a standard momentum signal.

*Source: MSCI. Calculations by Newfound Research.*

We can see that Australia, Austria, Finland, and the United Kingdom (at least in early years) are stand-out weights.¹ If we remove them from the eligible universe, what happens to the momentum factor?

We can see that the momentum factor largely maintains its mojo. This indicates that our GMV signal is likely just an artifact of data-mining, while MOM has a higher likelihood of being a genuine anomaly (though 20 years of sideways performance does give cause for concern, especially considering that not only does this analysis ignore costs, but prior to 20 years ago, implementing this sort of strategy would have been highly difficult for most investors).
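For comparison with the GMV scores, a standard cross-sectional momentum score can be sketched as each country’s own trailing return. The post does not specify its momentum definition; the sketch assumes the common “12-1” convention (trailing 12 months, skipping the most recent month). The resulting scores would be ranked cross-sectionally in the same way as the GMV spread.

```python
from math import prod

def momentum_scores(country_returns, lookback=12, skip=1):
    """Trailing compounded return over `lookback` months, excluding the last `skip`.

    country_returns: dict mapping country -> list of monthly returns, oldest first.
    lookback=12, skip=1 ("12-1" momentum) is an assumed, commonly used convention.
    """
    scores = {}
    for c, r in country_returns.items():
        if len(r) < lookback:
            raise ValueError(f"{c}: need at least {lookback} returns")
        window = r[-lookback:len(r) - skip]  # drop the most recent `skip` months
        scores[c] = prod(1.0 + x for x in window) - 1.0
    return scores
```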

**Conclusion**

If you stare at the same data long enough, you’re bound to find spurious patterns. In this case, the pattern we found gave us extra pause because we discovered it by doing the exact opposite of our initial hypothesis. That alone is not enough to dismiss a signal, but we should be wary of trying to fit a narrative to a signal after we discover it, particularly when the signal flies in the face of the original narrative we were testing.

In this case, our country growth-minus-value signal proved to be an artifact of data-mining. While out-of-sample testing left us with no solid conclusion, using simulation techniques to back out the driving allocations of performance allowed us to identify that only 25% of the investible universe was responsible for 100% of the strategy returns.

While we would expect *some* degradation in performance when we remove exposures identified as high impact, the complete elimination of returns tells us that the remaining 75% of the universe made no meaningful contribution. By comparison, when we perform a similar analysis on a standard momentum signal, we find that removing the three most impactful countries only reduces annualized returns from 7.3% to 6.3%.

Even if we did believe in the efficacy of the signal, deriving the meaningful drivers of performance from an allocation perspective is important because it informs us about implementation. In this case, the large drivers all have relatively small market-capitalizations, which would make it difficult to create meaningful “underweights” in a long-only portfolio. Thus, if we were to pursue this idea further, it would likely have to be in a long/short capacity.

Of course, there is no need to pursue it further. Forcing something to work in a backtest is unlikely to end well in live trading. Like most ideas we explore, the lack of robustness lands it in our research graveyard.
