It can be difficult to disentangle luck from skill by examining performance alone.
We simulate the returns of investors with different prediction accuracy levels and find that an investor with the skill of a fair coin (i.e. 50%) would likely under-perform a simple buy-and-hold investor, even before costs are considered.
It is not until an investor exhibits accuracy in excess of 60% that a buy-and-hold investor is meaningfully “beaten” over rolling 5-year evaluation periods.
In the short term, however, luck can make a strategy with a known accuracy rate masquerade as one that is far more or far less accurate.
Further confounding the analysis is the role of skewness of the return distribution. Positively skewed strategies, like trend following, can actually exhibit accuracy rates lower than 50% and still be successful over the long run.
Relying on perceptions of accuracy alone may lead to highly misguided conclusions.
The only thing sure about luck is that it will change. — Bret Harte1
The distinction between luck and skill in investing can be extremely difficult to measure. Seemingly good or bad strategies can be attributable to either luck or skill, and the truth has important implications for the future prospects of the strategy.
Source: Grinold and Kahn, Active Portfolio Management (New York: McGraw-Hill, 1999).
Time is one of the surest ways to weed out lucky strategies, but the amount of time needed to make this decision with a high degree of confidence can be longer than we are willing to wait. Or, sometimes, even longer than the data we have.
For example, in order to be 95% confident that a strategy with a 7% historical return and a volatility of 15% has a true expected return that is greater than a 2% risk-free rate, we would need 27 years of data. While this is possible for equity and bond strategies, we would have a long time to wait in order to be confident in a Bitcoin strategy with these specifications.
Even after passing that test, however, that same strategy could easily return less than the risk-free rate over the next 5 years (the probability is 25%).
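As a rough check on the arithmetic above, the sketch below uses a normal approximation with independent annual returns. Both are simplifying assumptions of ours and not necessarily the method behind the quoted figures, and the function names are hypothetical. Under these simplifications the required history comes out a little shorter than 27 years and the 5-year shortfall probability lands a little under one quarter. The 27-year figure appears consistent with using Student-t critical values in place of normal ones, which is not done here in order to stay within the standard library.

```python
import math
from statistics import NormalDist


def years_needed(mean, vol, hurdle, confidence=0.95):
    """Smallest whole number of years T for which the excess return
    (mean - hurdle) is at least the one-sided critical value times the
    standard error vol / sqrt(T). Normal approximation (an assumption)."""
    z = NormalDist().inv_cdf(confidence)
    return math.ceil((z * vol / (mean - hurdle)) ** 2)


def shortfall_probability(mean, vol, hurdle, years):
    """Probability that the average annual return over `years` falls below
    `hurdle`, assuming i.i.d. normal annual returns with the given mean and vol."""
    standard_error = vol / math.sqrt(years)
    return NormalDist(mean, standard_error).cdf(hurdle)


# Inputs taken from the text: 7% return, 15% volatility, 2% risk-free rate.
print(years_needed(0.07, 0.15, 0.02))
print(round(shortfall_probability(0.07, 0.15, 0.02, 5), 3))
```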
Regardless of the skill, would you continue to hold a strategy that underperformed for that long?
In this commentary, we will use a sample U.S. sector strategy that isolates luck and skill to explore the impacts of varying accuracy and how even increased accuracy may only be an idealized goal.
The (In)Accurate Investor
To investigate the historical impact of luck and skill in the arena of U.S. equity investing, we will consider a strategy that invests in the 30 industries from the Kenneth French Data Library.
Each month, the strategy independently evaluates each sector and either holds it or invests the capital at the risk-free rate. The term “evaluates” is used loosely here; the evaluation can be as simple as flipping a (potentially biased) coin.
The allocation allotted to each sector is 1/30th of the portfolio (3.33%). We are purposely not reallocating capital among the sectors chosen so that the sector calls based on the accuracy straightforwardly determine the performance.
To get an idea for the bounds of how well – or poorly – this strategy would have performed over time, we can consider three investors:
The Plain Investor – This investor simply holds all 30 sectors, equally weighted, all the time.
The Perfect Investor – This investor allocates with 100% accuracy. Using a crystal ball to look into the future, if a sector will go up in the subsequent month, this investor will allocate to it. If the sector will go down, this investor will invest the capital in cash.
The Anti-Perfect Investor – This investor is not merely imperfect; they are the complete opposite of the Perfect Investor. They make the wrong call to invest or not without fail. Their accuracy is 0%. They are so reliably bad that if you could short their strategy, you would be the Perfect Investor.
The Perfect and Anti-Perfect investors set the bounds for what performance is possible within this framework, and the Plain Investor denotes the performance of not making any decisions.
The growth of each boundary strategy over the entire time period is a little outrageous.
                         Annualized Return   Annualized Volatility   Maximum Drawdown
Plain Investor           10.5%               19.3%                   83.9%
Perfect Investor         42.6%               11.0%                   0.0%
Anti-Perfect Investor    -20.0%              12.1%                   100.0%
Source: Kenneth French Data Library. Calculations by Newfound Research. Past performance is not a guarantee of future results. All returns are hypothetical and backtested. Returns are gross of all fees. This does not reflect any investment strategy offered or managed by Newfound Research and was constructed exclusively for the purposes of this commentary. It is not possible to invest in an index.
A more informative illustration is the rolling annualized 5-year return for each strategy.
While the spread between the Perfect and Anti-Perfect investors ebbs and flows, its median value is 5,900 basis points (“bps”). Between the Perfect and Plain investors, there is still 2,900 bps of annualized outperformance to be had. A natural wish is to make calls that harvest some of this spread.
Accounting for Accuracy
Now we will look at a set of investors who are able to evaluate each sector with some known degree of accuracy.
For each accuracy level between 0% and 100% (i.e. our Anti-Perfect and Perfect investors, respectively), we simulate 1,000 trials and look at how the historical results have played out.
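A minimal sketch of how such a simulation could be set up is shown below. It assumes the "correct" call is to hold a sector whenever its next-month return is positive (as with the Perfect Investor) and that each call is right independently with the stated probability. The function names and the normally distributed sample data are hypothetical stand-ins; the commentary itself uses the Kenneth French industry returns.

```python
import random


def simulate_investor(sector_returns, risk_free, accuracy, rng):
    """Monthly portfolio returns for an investor whose hold-or-cash call on
    each sector is correct with probability `accuracy`.

    sector_returns: list of months, each a list of per-sector returns.
    risk_free: list of monthly risk-free returns, one per month.
    """
    portfolio = []
    for month, rf in zip(sector_returns, risk_free):
        total = 0.0
        for r in month:
            correct_call_is_hold = r > 0           # the crystal-ball call
            is_correct = rng.random() < accuracy   # biased coin flip
            hold = correct_call_is_hold if is_correct else not correct_call_is_hold
            total += r if hold else rf
        portfolio.append(total / len(month))       # fixed 1/N sleeve per sector
    return portfolio


def annualized_return(monthly_returns):
    """Geometric annualized return from a list of monthly returns."""
    growth = 1.0
    for r in monthly_returns:
        growth *= 1.0 + r
    return growth ** (12.0 / len(monthly_returns)) - 1.0


# Hypothetical data: 60 months of 30 sectors (NOT the French library data).
rng = random.Random(0)
returns = [[rng.gauss(0.008, 0.05) for _ in range(30)] for _ in range(60)]
cash = [0.002] * 60

# 1,000 trials of the 50% Accurate Investor over one 5-year window.
trials = [annualized_return(simulate_investor(returns, cash, 0.5, rng))
          for _ in range(1000)]
print(min(trials), max(trials))
```

Setting the accuracy to 1.0 or 0.0 recovers the Perfect and Anti-Perfect investors, since the coin flip then always or never confirms the correct call.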
A natural starting point is the investor who merely flips a fair coin for each sector. Their accuracy is 50%.
The chart below shows the rolling 5-year performance range of the simulated trials for the 50% Accurate Investor.
In 59% of the rolling periods, the buy-and-hold Plain Investor beat even the best 50% Accurate Investor. The Plain Investor was only worse than the worst performing coin flip strategy in 6% of rolling periods.
Beating buy-and-hold is hard to do reliably if you rely only on luck.
In this case, having a neutral hit rate with the negative skew of the sector equity returns leads to negative information coefficients. Taking more bets over time and across sectors did not help offset this distributional disadvantage.
So, let’s improve the accuracy slightly to see if the rolling results improve. Even with negative skew (-0.42 median value for the 30 sectors), an improvement in the accuracy to 60% is enough to bring the theoretical information coefficient back into the positive realm.
The worst of these more skilled investors is now beating the Plain Investor in 41% of the rolling periods, and the best is losing to the buy-and-hold investor in 13% of the periods.
Going the other way, to a 40% Accurate Investor, we find that the best one was beaten by the Plain Investor 93% of the time, and the worst one never beat the buy-and-hold investor.
If we only require a modest increase in our accuracy to beat buy-and-hold strategies over shorter time horizons, why isn’t diligently focusing on increasing our accuracy an easy approach to success?
In order to increase our accuracy, we must first find a reliable way to do so: a task easier said than done due to the inherent nature of probability. Something having a 60% probability of being right does not preclude it from being wrong for a long time. The Law of Large Numbers can require larger numbers than our portfolios can stand.
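To put rough numbers on this point, the sketch below computes the chance that a strategy with a true 60% accuracy shows an observed hit rate of 50% or less over a given number of calls. It assumes the calls are independent, which is a simplification (calls on 30 sectors in the same month are likely correlated), and the function name is hypothetical.

```python
from math import comb


def prob_hit_rate_at_most(true_accuracy, n_calls, observed_rate):
    """Binomial probability that the observed hit rate over `n_calls`
    independent calls is at most `observed_rate`."""
    max_hits = int(observed_rate * n_calls + 1e-9)  # guard against float rounding
    return sum(
        comb(n_calls, k) * true_accuracy ** k * (1 - true_accuracy) ** (n_calls - k)
        for k in range(max_hits + 1)
    )


# A truly 60% accurate strategy looking no better than a coin flip.
for n in (12, 60, 240):
    print(n, round(prob_hit_rate_at_most(0.60, n, 0.50), 3))
```

The probability shrinks as the number of calls grows, but only slowly for small samples, which is the sense in which the Law of Large Numbers can demand more observations than an investor can tolerate.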
Thus, even if we have found a way that will reliably lead to a 60% accuracy, we may not be able to establish confidence in that accuracy rate. This uncertainty in the accuracy can be unnerving. And it can cut both ways.
A strategy with a hit rate of less than 50% can masquerade as a more accurate strategy simply for lack of sufficient data to sniff out the true probability.
You may think you have an edge when you do not. And if you do not have an edge, repeatedly applying it will lead to worse and worse outcomes.2
Accuracy Schmaccuracy
Our preference is to rely on systematic bets, which generally fall under the umbrella of factor investing. Even slight improvements to the accuracy can lead to better results when applied over a sufficient breadth of investments. Some of these factors also alter the distribution of returns (i.e. the skew) so that accuracy improvements have a larger impact.
Consider two popular measures of trend, used as the signals to determine the allocations in our 30 sector US equity strategy from the previous sections:
12-1 Momentum: We calculate the return over an 11-month period, starting one month ago to account for mean reversionary effects. If this number is positive, we hold the sector; if it is negative, we invest that capital at the risk-free rate.
10-month Simple Moving Average (SMA): We average the prices over the prior 10 months and compare that value to the current price. If the current price is greater than or equal to the average, we hold the sector; if it is less than, we invest that capital at the risk-free rate.
These strategies have volatilities in line with the Perfect and Anti-Perfect Investors and returns similar to the Plain Investor.
Using our measure of accuracy as correctly calling the direction of the sector returns over the subsequent month, it might come as a surprise that the accuracies for the 12-1 Momentum and 10-month SMA signals are only 42% and 41%, respectively.
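The two signals and the accuracy measure can be sketched as follows. This is an illustration under stated assumptions rather than the exact implementation behind the results: prices are month-end values ordered oldest first, the "prior 10 months" in the SMA rule is read as excluding the current month, and all function names are hypothetical.

```python
def momentum_12_1(prices):
    """Hold (True) if the return from 12 months ago to 1 month ago is positive.
    `prices` needs at least 13 month-end values, oldest first."""
    return prices[-2] / prices[-13] - 1.0 > 0.0


def sma_10(prices):
    """Hold (True) if the latest price is at or above the average of the
    10 month-end prices that precede it (needs at least 11 values)."""
    average = sum(prices[-11:-1]) / 10.0
    return prices[-1] >= average


def hit_rate(prices, signal, warmup=13):
    """Fraction of months in which the signal's hold/cash call matched the
    direction of the following month's price change.
    Requires len(prices) > warmup."""
    hits = calls = 0
    for t in range(warmup, len(prices)):
        hold = signal(prices[:t])              # information through month t-1
        went_up = prices[t] > prices[t - 1]    # what happened in month t
        hits += hold == went_up
        calls += 1
    return hits / calls


# Hypothetical, steadily rising price series: every call is trivially right.
rising = [100 * 1.01 ** i for i in range(36)]
print(hit_rate(rising, momentum_12_1), hit_rate(rising, sma_10))
```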
Even with this low accuracy, the following chart shows that over the entire time period, the returns of these strategies more closely resemble those of the 55% Accurate Investor and have even looked like those of the 70% Accurate Investor over some time periods. What gives?
This is an example of how addressing the negative skew in the underlying asset returns can offset a sacrifice in accuracy. These trend following strategies may have overall accuracy of less than 50%, but they have been historically right when it counts.
Consistently removing large negative returns – at the expense of giving up some large positive returns – is enough to generate a return profile that looks much like a strategy that picks sectors with above average accuracy.
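A stylized expectation makes the trade-off concrete. Treating each call as a simple bet with hit rate p, average gain W when right, and average loss L when wrong, the expected result per call depends on payoff asymmetry as much as on accuracy. The 42% hit rate echoes the momentum figure above; the gain and loss sizes are purely hypothetical and chosen only for illustration.

```latex
\begin{align*}
\mathbb{E}[r] &= p\,\bar{W} - (1 - p)\,\bar{L} \\
\text{low accuracy, positive skew:}\quad
  & 0.42 \times 3\% - 0.58 \times 1\% = +0.68\% \\
\text{high accuracy, negative skew:}\quad
  & 0.60 \times 1\% - 0.40 \times 2\% = -0.20\%
\end{align*}
```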
Whether investors can stick with a strategy that exhibits below 50% accuracy, however, is another question entirely.
Conclusion
While most investors expect the proof to be in the eating of the pudding, in this commentary we demonstrate how luck can have a meaningful impact in the determination of whether skill exists. While skill should eventually differentiate itself from luck, the horizon over which it will do so may be far, far longer than most investors suspect.
To explore this idea, we construct portfolios composed of all thirty industry groups. We then simulate the results of investors with known accuracy rates, comparing their outcomes to 100% Accurate, 100% Inaccurate, and Buy-and-Hold benchmarks.
Perhaps counter-intuitively, we find that an investor exhibiting 50% accuracy would have fairly reliably underperformed a Buy-and-Hold Investor. This result makes more sense once we acknowledge that equity returns have historically exhibited negative skew, with the left tail of their return distribution (“losses”) being longer and fatter than the right (“gains”). Combining a neutral hit rate with negative skew creates negative information coefficients.
To offset this negative skew, we require increased accuracy. Unfortunately, even in the case where an investor exhibits 60% accuracy, there are a significant number of 5-year periods where it might masquerade as a strategy with a much higher or lower hit-rate, inviting false conclusions.
This is all made somewhat more confusing when we consider that a strategy can have an accuracy rate below 50% and still be successful. Trend following strategies are a perfect example of this phenomenon. The positive skew that has been historically exhibited by these strategies means that frequently inaccurate trades of small magnitude are offset by infrequent but very large accurate trades.
Yet if we measure success by short-term accuracy rates, we will almost certainly dismiss this type of strategy as one with no skill.
When taken together, this evidence suggests that not only might it be difficult for investors to meaningfully determine the difference between skill and luck over seemingly meaningful time horizons (e.g. 5 years), but also that short-term perceptions of accuracy can be woefully misleading for long-term success. Highly accurate strategies can still lead to catastrophe if there is significant negative skew lurking in the shadows (e.g. an ETF like XIV), while inaccurate strategies can be successful with enough positive skew (e.g. trend following).
Long/flat trend-following strategies have historically delivered payout profiles similar to those of call options, with positive payouts for larger positive underlying asset returns and slightly negative payouts for near-zero or negative underlying returns.
However, this functional relationship contains a fair amount of uncertainty for any given trend-following model and lookback period.
In portfolio construction, we tend to favor assets that have high expected returns, diversifying return profiles, or both.
Since broad investor behavior provides a basis for systematic trend-following models to have positive expected returns, taking a multi-model approach to trend-following can be used to reduce the variance around the expected payout profile.
Introduction
Over the past few months, we have written much about model diversification as a tactic for managing specification risk, even with specific case studies. When we consider the three axes of diversification, model diversification pertains to the “how” axis, which focuses on strategies that have the same overarching objective but go about achieving it in different ways.
Long/flat trend-following, especially with equity investments, aims to protect capital on the downside while maintaining participation in positive markets. This leads to a payout profile that looks similar to that of a call option.1
However, while a call option offers a defined payout based on the price of an underlying asset and a specific maturity date, a trend-following strategy does not provide such a guarantee. There is a degree of uncertainty.
The good news is that uncertainty can potentially be diversified given the right combinations of assets or strategies.
In this commentary, we will dive into a number of trend-following strategies to see what has historically led to this benefit and the extent that diversification would reduce the uncertainty around the expected payoff.
Diversification in Trend-Following
The justification for a multi-model approach boils down to a simple diversification argument.
Say you would like to include trend-following in a portfolio as a way to manage risk (e.g. sequence risk for a retiree). There is academic and empirical evidence that trend-following works over a variety of time horizons, generally ranging from 3 to 12 months. And there are many ways to measure trends, such as moving average crossovers, trailing returns, deviations from moving averages, risk adjusted returns, etc.
The basis for deciding ex-ante which variant will be the best over our own investment horizon is tenuous at best. Backtests can show one iteration outperforming over a given time horizon, but most of the differences between strategies are either noise from a statistical point of view or realized over a longer time period than any investor has the lifespan (or mettle) to endure.
However, we expect each one to generate positive returns over a sufficiently long time horizon. Whether this is one year, three years, five years, 10 years, 50 years… we don’t know. What we do know is that out of the multitude of variations of trend-following, we are very likely to pick one that is not the best or even in the top segment of the pack in the short term.
From a volatility standpoint, when the strategies are fully invested, they will have volatility equal to the underlying asset. Determining exactly when the diversification benefits will come into play – that is, when some strategies are invested and others are not – is a fool’s errand.
Modern portfolio theory has done a disservice in making correlation seem like an inherent trait of an investment. It is not.
Multiple trend-following strategies can coincide precisely for stretches of time before behaving completely differently from each other, which makes many portfolio construction techniques useless. We do not expect correlation benefits to always be present. These are nonlinear strategies, and fitting them into a linear world does not make sense.
Source: ReSolve Asset Management. Reprinted with permission.
From this simple framework, we can break the different performance regimes down as follows:
The Math Behind the Diversification
The expected value of a trend-following strategy can be thought of as a function f_i of the underlying security return, where the subscript i indicates that the function depends on the specific trend-following strategy.
If we combine multiple trend-following strategies into a portfolio, then the expectation is the average of these functions (assuming an equal-weight portfolio per the ReSolve chart above).
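In symbols, these two statements can be written as below. The notation is assumed here rather than taken from the original: r is the underlying security return, r_i the return of trend-following strategy i, r_P the return of the blended portfolio, and N the number of strategies.

```latex
\begin{align*}
\mathbb{E}[\,r_i \mid r\,] &= f_i(r) \\
\mathbb{E}[\,r_P \mid r\,] &= \frac{1}{N} \sum_{i=1}^{N} f_i(r)
\end{align*}
```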
What’s left to determine is the functional form of f.
Continuing in the vein of the call option payoff profile, we can use the Black-Scholes equation as the functional form (with the risk-free rate set to 0). This leaves three parameters with which to fit the formula to the data: the volatility (with the time to expiration term lumped in, i.e. sigma * sqrt(T-t)), the strike, and the initial cost of the option.
Here d1 and d2 are defined in the standard fashion, N is the cumulative normal distribution function, and rK is the strike price in the option formula expressed as a percent relative to the current value of the underlying security.
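One plausible way to write the fitted function, under the assumption that the underlying is worth 1 + r, the strike is 1 + r_K, and the cost c is subtracted from the zero-rate Black-Scholes call value, is sketched below. The symbol c and this exact arrangement are our reading of the description, not a formula confirmed by the source.

```latex
\begin{align*}
f(r) &= (1 + r)\,N(d_1) - (1 + r_K)\,N(d_2) - c \\
d_1 &= \frac{\ln\!\left(\dfrac{1 + r}{1 + r_K}\right) + \tfrac{1}{2}\,\sigma^2 (T - t)}{\sigma\sqrt{T - t}} \\
d_2 &= d_1 - \sigma\sqrt{T - t}
\end{align*}
```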
In the following example, we will attempt to provide some meaning to the fitted parameters. However, keep in mind that any mapping is not necessarily one-to-one with the option parameters. The functional form may apply, but the parameters are not ones that were set in stone ex-ante.2
An Example: Trend-Following on the S&P 500
As an example, we will consider a trend-following model on the S&P 500 using monthly time-series momentum with lookback windows ranging from 4 to 16 months. The risk-free rate was used when the trends were negative.
The graph below shows an example of the option price fit to the data using a least-squares regression for the 15-month time series momentum strategy using rolling 3-year returns from 1927 to 2018.
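A fit of this kind could be sketched as follows. The payoff form (a zero-rate Black-Scholes call on an underlying worth 1 + r with strike 1 + strike) is an assumed reading of the description above, and the coarse grid search is a standard-library stand-in for whatever least-squares routine was actually used. All names are hypothetical, and the data in the example is synthetic.

```python
from math import log
from statistics import NormalDist, fmean

N = NormalDist().cdf  # cumulative standard normal distribution


def call_value(r, vol, strike):
    """Zero-rate Black-Scholes call value for an underlying worth 1 + r and a
    strike of 1 + strike. `vol` already includes sqrt(time to expiry) and must
    be positive; r must be greater than -1."""
    spot, k = 1.0 + r, 1.0 + strike
    d1 = (log(spot / k) + 0.5 * vol ** 2) / vol
    return spot * N(d1) - k * N(d1 - vol)


def fit_payoff(underlying, strategy, vols, strikes):
    """Least-squares fit over a grid of (vol, strike) pairs. For each pair the
    best cost is the mean gap between option value and observed strategy
    return. Returns the best (vol, strike, cost)."""
    best = None
    for vol in vols:
        for strike in strikes:
            values = [call_value(r, vol, strike) for r in underlying]
            cost = fmean(v - y for v, y in zip(values, strategy))
            sse = sum((v - cost - y) ** 2 for v, y in zip(values, strategy))
            if best is None or sse < best[0]:
                best = (sse, vol, strike, cost)
    return best[1:]


# Synthetic check: generate data from known parameters and recover them.
xs = [-0.5 + 0.02 * i for i in range(100)]
ys = [call_value(r, 0.10, 0.02) - 0.015 for r in xs]
print(fit_payoff(xs, ys, vols=[0.05, 0.10, 0.15], strikes=[0.0, 0.02, 0.04]))
```

Under this form, evaluating the fitted curve at r = 0 (the option value minus the cost) gives the expected strategy return in a flat market.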
Source: Global Financial Data and Kenneth French Data Library. Calculation by Newfound. Returns are backtested and hypothetical. Returns assume the reinvestment of all distributions. Returns are gross of all fees. None of the strategies shown reflect any portfolio managed by Newfound Research and were constructed solely for demonstration purposes within this commentary. You cannot invest in an index.
The volatility parameter was 9.5%, the strike was 2.3%, and the cost was 1.7%.
What do these parameters mean?
As we said before, this can be a bit tricky. Painting in broad strokes:
The volatility parameter describes how “elbowed” the payoff profile is. Small values are akin to an option close to expiry where the payoff profile changes abruptly around the strike price. Larger values yield a more gentle change in slope.
The strike represents the point at which the payoff profile changes from participation to protection using trend-following lingo. In the example where the strike is 2.3%, this means that the strategy would be expected to start protecting capital when the S&P 500 return is less than 2.3%. There is some cost associated with this value being high.
The cost is the vertical shift of the payoff profile, but it is not good to think of it as the insurance premium of the trend-following strategy. It is only one piece. To see why this is the case, consider that the fitted volatility may be large and that the option price curve may be significantly above the final payout curve (i.e. if the time-scaled volatility went to zero).
So what is the actual “cost” of the strategy?
With trend-following, since whipsaw is generally the largest potential detractor, we will look at the expected return on the strategy when the S&P 500 is flat, that is, an absence of an average trend. It is possible for the cost to be negative, indicating a positive expected trend-following return when the market was flat.
Looking at the actual fit of the data from a statistical perspective, the largest deviations from the expected value (the residuals from the regression) are seen during large positive returns for the S&P 500, mainly coming out of the Great Depression. This characteristic of individual trend-following models is generally attributable to the delay in getting back into the market after a prolonged, severe drawdown due to the time it takes for a new positive trend to be established.
Part of the seemingly large number of outliers is simply due to the fact that these returns exhibit autocorrelation since the periods are rolling, which means that the data points have some overlap. If we filtered the data down into non-overlapping periods, some of these outliers would be removed.
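The overlap can be made explicit with a small sketch. It assumes monthly returns and a 36-month window, and it computes cumulative rather than annualized 3-year returns, which is our simplification. The function names are hypothetical.

```python
def rolling_returns(monthly_returns, window=36):
    """Cumulative return over every `window`-month span; adjacent
    observations share window - 1 months of data."""
    out = []
    for end in range(window, len(monthly_returns) + 1):
        growth = 1.0
        for r in monthly_returns[end - window:end]:
            growth *= 1.0 + r
        out.append(growth - 1.0)
    return out


def non_overlapping(rolling, window=36):
    """Keep every `window`-th rolling observation so no two share a month."""
    return rolling[::window]


# Hypothetical 10 years of monthly returns.
monthly = [0.01] * 120
rolled = rolling_returns(monthly)
print(len(rolled), len(non_overlapping(rolled)))
```

The sharp drop in the number of observations is the price of independence: the filtered sample has far fewer points, each of which carries distinct information.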
The outliers that remain are a fact of trend-following strategies. While this fact of trend-following cannot be totally removed, some of the outliers may be managed using multiple lookback periods.
The following chart illustrates the expected values for the trend-following strategies over all the lookback periods.
The shorter-term lookback windows have expected value curves that are less horizontal on the left side of the chart (higher volatility parameter).
As we said before, the cost of the trend-following strategy can be represented by the strategy’s expected return when the S&P 500 is flat. This can be thought of as the premium for the insurance policy of the trend-following strategies.
The blend does not have the lowest cost, but this cost is only one part of the picture. The parameters for the expected value functions do nothing to capture the distribution of the data around – either above or below – these curves.
The diversification benefits are best seen in the distribution of the rolling returns around the expected value functions.
Now with a more comprehensive picture of the potential outcomes, a cost difference of even 3% is less than one standard deviation, making the blended strategy much more robust to whipsaw for the potential range of S&P 500 returns.
As a side note, the cost of the short window (4 and 5 month) strategies is relatively high. However, since there are many rolling periods when these models are the best performing of the group, there can still be a benefit to including them. With them in the blend, we still see a reduction in the dispersion around the expected value function.
Expanding the Multitude of Models
To take the example even further down the multi-model path, we can look at the same analysis for varying lookback windows for a price-minus-moving-average model and an exponentially weighted moving average model.
And finally, we can combine all three trend-following measurement style blends into a final composite blend.
As with nearly every study on diversification, the overall blend is not the best by all metrics. In this case, its cost is higher than the EWMA blended model and its dispersion is higher than the TS blended model. But it exhibits the type of middle-of-the-road characteristics that lead to results that are robust to an uncertain future.
Conclusion
Long/flat trend-following strategies have payoff profiles similar to call options, with larger upsides and limited downsides. Unlike call options (and all derivative securities) that pay a deterministic amount based on the underlying security’s price, the payoff of a trend-following strategy is uncertain.
Using historical data, we can calculate the expected payoff profile and the dispersion around it. We find that by blending a variety of trend-following models, both in how they measure trend and the length of the lookback window, we can often reduce the implied cost of the call option and the dispersion of outcomes.
A backtest of an individual trend-following model can look the best over a given time period, but there are many factors that play into whether that performance will be valid going forward. The assets have to behave similarly, potentially both on an absolute and relative basis, and an investor has to hold the investment for a long enough time to weather short-term underperformance.
A multi-model approach can address both of these.
It will reduce the model specification risk that is present ex-ante. It will not pick the best model, but then again, it will not pick the worst.
From an investor perspective, this diversification reduces the spread of outcomes which can lead to an easier product to hold as a long-term investment. Diversification among the models may not always be present (i.e. when style risk dominates and all trend-following strategies do poorly), but when it is, it reduces the chance of taking on uncompensated risks.
Taking on compensated risks is a necessary part of investing, and in the case of trend-following, the style risk is something we desire. Removing as many uncompensated risks as possible leads to more pure forms of this style risk and strategies that are robust to unfavorable specifications.
Recent market volatility has caused many tactical models to make sudden and significant changes in their allocation profiles.
Periods such as Q4 2018 highlight model specification risk: the sensitivity of a strategy’s performance to specific implementation decisions.
We explore this idea with a case study, using the popular Dual Momentum GEM strategy and a variety of lookback horizons for portfolio formation.
We demonstrate that the year-to-year performance difference can span hundreds, if not thousands, of basis points between the implementations.
By simply diversifying across multiple implementations, we can dramatically reduce model specification risk and even potentially see improvements in realized metrics such as Sharpe ratio and maximum drawdown.
Introduction
Among do-it-yourself tactical investors, Gary Antonacci’s Dual Momentum is the strategy we tend to see implemented the most. The Dual Momentum approach is simple: by combining both relative momentum and absolute momentum (i.e. trend following), Dual Momentum seeks to rotate into areas of relative strength while preserving the flexibility to shift entirely to safety assets (e.g. short-term U.S. Treasury bills) during periods of pervasive, negative trends.
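The combination of relative and absolute momentum can be sketched generically as below. This is an illustration of the idea only, not the GEM decision tree or Antonacci's exact rules: the function name, the asset labels, and the choice to measure absolute momentum against the safe asset's return are all assumptions.

```python
def dual_momentum_pick(trailing_returns, safe_asset, safe_return):
    """Return the name of the asset to hold for the next month.

    trailing_returns: dict mapping each risky asset's name to its trailing
    return over the chosen lookback (hypothetical inputs).
    """
    # Relative momentum: find the strongest risky asset.
    leader = max(trailing_returns, key=trailing_returns.get)
    # Absolute momentum: hold it only if it has beaten the safe asset.
    if trailing_returns[leader] > safe_return:
        return leader
    return safe_asset


print(dual_momentum_pick({"U.S. Equities": 0.10, "International Equities": 0.05},
                         "T-Bills", 0.02))
print(dual_momentum_pick({"U.S. Equities": -0.03, "International Equities": -0.08},
                         "T-Bills", 0.01))
```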
In our experience, the precise implementation of Dual Momentum tends to vary (with various bells-and-whistles applied) from practitioner to practitioner. The most popular benchmark model, however, is the Global Equities Momentum (“GEM”), with some variation of Dual Momentum Sector Rotation (“DMSR”) a close second.
Recently, we’ve spoken to several members in our extended community who have bemoaned the fact that Dual Momentum kept them mostly aggressively positioned in Q4 2018 and signaled a defensive shift at the beginning of January 2019, at which point the S&P 500 was already in a -14% drawdown (having peaked at over -19% on December 24th). Several DIYers even decided to override their signal in some capacity, either ignoring it entirely, waiting a few days for “confirmation,” or implementing some sort of “half-and-half” rule where they are taking a partially defensive stance.
Ignoring the fact that a decision to override a systematic model somewhat defeats the whole point of being systematic in the first place, this sort of behavior highlights another very important truth: there is a significant gap of risk that exists between the long-term supporting evidence of an investment style (e.g. momentum and trend) and the precise strategy we attempt to implement with (e.g. Dual Momentum GEM).
At Newfound, we call that gap model specification risk. There is significant evidence supporting both momentum and trend as quantitative styles, but the precise means by which we measure these concepts can lead to dramatically different portfolios and outcomes. When a portfolio’s returns are highly sensitive to its specification – i.e. slight variations in returns or model parameters lead to dramatically different return profiles – we label the strategy as fragile.
In this brief commentary, we will use the Global Equities Momentum (“GEM”) strategy as a case study in fragility.
Global Equities Momentum (“GEM”)
To implement the GEM strategy, an investor merely needs to follow the decision tree below at the end of each month.
From a practitioner standpoint, there are several attractive features about this model. First, it is based upon the long-run evidence of both trend-following and momentum. Second, it is very easy to model and generate signals for. Finally, it is fairly light-weight from an implementation perspective: only twelve potential rebalances a year (and often far fewer), with the portfolio only holding one ETF at a time.
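As a rough sketch of how such a signal might be generated for a given lookback horizon, the snippet below follows the commonly described GEM logic: an absolute momentum check of U.S. equities against bills, then a relative momentum choice between U.S. and international equities. The function name `gem_signal`, the asset labels, and the toy index levels are all hypothetical, and the rule should be read as an assumption rather than the exact specification used in this commentary.

```python
import numpy as np

def gem_signal(us, intl, bills, lookback=12):
    """Pick one holding from month-end total-return index levels.

    us, intl, bills: 1-D sequences of index levels (hypothetical inputs).
    lookback: formation period in months.
    """
    def trailing_return(levels):
        levels = np.asarray(levels, dtype=float)
        return levels[-1] / levels[-1 - lookback] - 1.0

    r_us, r_intl, r_bills = (trailing_return(x) for x in (us, intl, bills))
    if r_us <= r_bills:
        # absolute momentum (trend) check fails: move to the safety asset
        return "bonds"
    # relative momentum: hold the stronger equity market
    return "us" if r_us >= r_intl else "intl"

# Toy data: 13 month-end levels per asset, made up for illustration only.
months = np.arange(13)
us = np.array([110, 108, 106, 104, 102, 100, 98, 100, 102, 104, 105, 106, 105.0])
intl = 100 * 1.005 ** months
bills = 100 * 1.001 ** months

# One decision per lookback horizon, from 6 to 12 months.
choices = {k: gem_signal(us, intl, bills, lookback=k) for k in range(6, 13)}
print(choices)
```

Even on this toy path the decision depends on the formation period, which is the sensitivity the rest of this section explores.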
Despite the evidence that “simple beats complex,” the simplicity of GEM belies its inherent fragility. Below we plot the equity curves for GEM implementations that employ different lookback horizons for measuring trend and momentum, ranging from 6- to 12-months.
Source: CSI Analytics. Calculations by Newfound Research. Returns are backtested and hypothetical. Returns assume the reinvestment of all distributions. Returns are gross of all fees except for underlying ETF expense ratios. None of the strategies shown reflect any portfolio managed by Newfound Research and were constructed solely for demonstration purposes within this commentary. You cannot invest in an index.
We can see a significant dispersion in potential terminal wealth. That dispersion, however, is not necessarily consistent with the notion that one formation period is inherently better than another. While we would argue, ex-ante, that there should be little performance difference between a 9-month and 10-month lookback – they both, after all, capture the notion of “intermediate-term trends” – the former returned just 43.1% over the period while the latter returned 146.1%.
These total return figures further hide the year-to-year disparity that exists. The 9-month model, for example, was not a consistent loser. Below we plot these results, highlighting both the best (blue) and worst (orange) performing specifications. We see that the yearly spread between these strategies can be hundreds-to-thousands of basis points; consider that in 2010, the strategy formed using a 10-month lookback returned 12.2% while the strategy formed using a 9-month lookback returned -9.31%.
Same thesis. Same strategy. Slightly different specification. Dramatically different outcomes. That single year is likely the difference between hired and fired for most advisors and asset managers.
Source: CSI Analytics. Calculations by Newfound Research. Returns are backtested and hypothetical. Returns assume the reinvestment of all distributions. Returns are gross of all fees except for underlying ETF expense ratios. None of the strategies shown reflect any portfolio managed by Newfound Research and were constructed solely for demonstration purposes within this commentary. You cannot invest in an index.
For those bemoaning their 2018 return, note that the 10-month specification would have netted a positive result! That specification turned defensive at the end of October.
Now, some may cry “foul” here. The evidence for trend and momentum is, after all, centuries in length and the efficacy of all these horizons is supported. Surely the noise we see over this ten-year period would average out over the long run, right?
The unfortunate reality is that these performance differences are not expected to mean-revert. The gambler’s fallacy would have us believe that bad luck in one year should be offset by good luck in another and vice versa. Unfortunately, this is not the case. While we would expect, at any given point in time, that each strategy has equal likelihood of experiencing good or bad luck going forward, that luck is expected to occur completely independently from what has happened in the past.
The implication is that performance differences due to model specification are not expected to mean-revert and are therefore expected to be random, but very permanent, return artifacts.1
The larger problem at hand is that none of us have a hundred years to invest. In reality, most investors have a few decades. And we act with the temperament of having just a few years. Therefore, bad luck can have very permanent and very scarring effects not only upon our psyche, but upon our realized wealth.
But consider what happens if we try to neutralize the role of model specification risk and luck by diversifying across the seven different models equally (rebalanced annually). We see returns closer in line with the median result, a boost to realized Sharpe ratio, and a reduction in the maximum realized drawdown.
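A minimal sketch of the blending step, assuming monthly simple returns for each model are already available: sleeves start each 12-month block at equal weight and drift in between. The helper `blend_rebalanced_annually` and the randomly generated returns are hypothetical stand-ins, not the backtest behind the figures here.

```python
import numpy as np

def blend_rebalanced_annually(monthly_returns):
    """Equal-weight the models, resetting to equal weight every 12 months.

    monthly_returns: array of shape (n_months, n_models) of simple returns.
    Returns the blended portfolio's monthly simple returns.
    """
    r = np.asarray(monthly_returns, dtype=float)
    n_months, n_models = r.shape
    weights = np.full(n_models, 1.0 / n_models)
    blended = np.empty(n_months)
    for m in range(n_months):
        if m % 12 == 0:
            # annual rebalance back to equal weight
            weights = np.full(n_models, 1.0 / n_models)
        blended[m] = weights @ r[m]
        # let each sleeve drift with its own return until the next rebalance
        weights = weights * (1.0 + r[m])
        weights /= weights.sum()
    return blended

# Made-up returns for seven models over ten years (illustration only).
rng = np.random.default_rng(0)
fake_returns = rng.normal(0.005, 0.03, size=(120, 7))
blend = blend_rebalanced_annually(fake_returns)

first_year_blend = np.prod(1.0 + blend[:12])
first_year_models = np.prod(1.0 + fake_returns[:12], axis=0)
print(first_year_blend, first_year_models.mean())
```

Within any one rebalance period the blend's growth is the simple average of the models' growth, so in each year it cannot be the worst model, nor the best.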
Source: CSI Analytics. Calculations by Newfound Research. Returns are backtested and hypothetical. Returns assume the reinvestment of all distributions. Returns are gross of all fees except for underlying ETF expense ratios. None of the strategies shown reflect any portfolio managed by Newfound Research and were constructed solely for demonstration purposes within this commentary. You cannot invest in an index.
These are impressive results given that all we employed was naïve diversification.
Conclusion
The odd thing about strategy diversification is that it guarantees we will be wrong. Each and every year, we will, by definition, allocate at least part of our capital to the worst performing strategy. The potential edge, however, is in being vaguely wrong rather than precisely wrong. The former is annoying. The latter can be catastrophic.
In this commentary we use the popular Dual Momentum GEM strategy as a case study to demonstrate how model specification choices can lead to performance differences that span hundreds, if not thousands, of basis points a year. Unfortunately, we should not expect these performance differences to mean revert. The realizations of good and bad luck are permanent, and potentially very significant, artifacts within our track records.
By simply diversifying across the different models, however, we can dramatically reduce specification risk and thereby reduce strategy fragility.
To be clear, no amount of diversification will protect you from the risk of the style. As we like to say, “risk cannot be destroyed, only transformed.” In that vein, trend following strategies will always incur some sort of whipsaw risk. The question is whether it is whipsaw related to the style as a whole or to the specific implementation.
For example, in the graphs above we can see that Dual Momentum GEM implemented with a 10-month formation period experienced whipsaw in 2011 when few of the other implementations did. This is more specification whipsaw than style whipsaw. On the other hand, we can see that almost all the specifications exhibited whipsaw in late 2015 and early 2016, an indication of style whipsaw, not specification whipsaw.
Specification risk we can attempt to control for; style risk is just something we have to bear.
At Newfound, evidence such as this informs our own trend-following mandates. We seek to diversify ourselves across the axes of what (“what are we investing in?”), how (“how are we making the decisions?”), and when (“when are we making those decisions?”) in an effort to reduce specification risk and provide the greatest style consistency possible.
Research suggests that simple heuristics are often far more robust than more complicated, theoretically optimal solutions.
Taken too far, we believe simplicity can actually introduce significant fragility into an investment process.
Using trend equity as an example, we demonstrate how using only a single signal to drive portfolio allocations can make a portfolio highly sensitive to the impact of randomness, clouding our ability to determine the difference between skill and luck.
We demonstrate that a slightly more complicated process that combines signals significantly reduces the portfolio’s sensitivity to randomness.
We believe that the optimal level of simplicity is found at the balance of diversification benefit and introduced estimation risk. When a more complicated process can introduce meaningful diversification gain into a strategy or portfolio with little estimation risk, it should be considered.
Introduction
In the world of finance, simple can be surprisingly robust. DeMiguel, Garlappi, and Uppal (2005)1, for example, demonstrate that a naïve, equal-weight portfolio frequently delivers higher Sharpe ratios, higher certainty-equivalent returns, and lower turnover out-of-sample than competitive “optimal” allocation policies. In one of our favorite papers, Haldane (2012)2 demonstrates that simplified heuristics often outperform more complicated algorithms in a variety of fields.
Yet taken to an extreme, we believe that simplicity can have the opposite effect, introducing extreme fragility into an investment strategy.
As an absurd example, consider a highly simplified portfolio that is 100% allocated to U.S. equities. Introducing bonds into the portfolio may not seem like a large mental leap but consider that this small change introduces an axis of decision making that brings with it a number of considerations. The proportion we allocate between stocks and bonds requires, at the very least, estimates of an investor’s objectives, risk tolerances, market outlook, and confidence levels in these considerations.3
Despite this added complexity, few investors would consider an all-equity portfolio to be more “robust” by almost any reasonable definition of robustness.
Yet this is precisely the type of behavior we see all too often in tactical portfolios – and particularly in trend equity strategies – where investors follow a single signal to make dramatic allocation decisions.
So Close and Yet So Far
To demonstrate the potential fragility of simplicity, we will examine several trend-following signals applied to broad U.S. equities:
Price minus the 10-month moving average
12-1 month total return
13-minus-34-week exponential moving average cross-over
Below we plot over time the distance each of these signals is from turning off. Whenever the line crosses over the 0% threshold, it means the signal has flipped direction, with negative values indicating a sell and positive values indicating a buy.
In orange we highlight those periods where the signal is within 1% of changing direction. We can see that for each signal there are numerous occasions where the signal was within this threshold but avoided flipping direction. Similarly, we can see a number of scenarios where the signal just breaks the 0% threshold only to revert back shortly thereafter. In the former case, the signal has often just managed to avoid whipsaw, while in the latter it has usually become unfortunately subject to it.
Source: Kenneth French Data Library. Calculations by Newfound Research.
Is the avoidance of whipsaw representative of the “skill” of the signals while the realization of whipsaw is just bad luck? Or might it be that the avoidance of whipsaw is often just as much luck as the realization of whipsaw is poor skill? How can we determine what is skill and what is luck when there are so many “close calls” and “just hits”?
What is potentially confusing for investors new to this space is that academic literature and practitioner evidence finds that these highly simplified approaches are surprisingly robust across a variety of investment vehicles, geographies, and time periods. What we must stress, however, is that evidence of general robustness is not evidence of specific robustness; i.e. there is little evidence suggesting that a single approach applied to a single instrument over a specific time horizon will be particularly robust.
What Randomness Tells Us About Fragility
To emphasize the potential fragility of utilizing a single in-or-out signal to drive our allocation decisions, we run a simple test:
Begin with daily market returns
Add a small amount of white noise (mean 0%; standard deviation 0.025%) to daily market returns
Calculate a long/flat trend equity strategy using 12-1 month momentum signals4
Calculate the rolling 12-month return of the strategy minus the alternate market history return.
Repeat 1,000 times to generate 1,000 slightly alternate histories.
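The steps above can be sketched as follows. This is an illustration under several assumptions: the market returns are simulated rather than taken from the Kenneth French Data Library, months and years are approximated as 21 and 252 trading days, the signal is evaluated daily, flat periods earn zero, returns are compared in log terms, and only 200 alternate histories are generated to keep the run short. The function name `relative_12m` is hypothetical.

```python
import numpy as np

def relative_12m(daily_returns, month=21, year=252):
    """Rolling 12-month (log) return of a long/flat 12-1 momentum strategy
    minus the market it is applied to."""
    log_r = np.log1p(np.asarray(daily_returns, dtype=float))
    # cum[k] = log growth over the first k days
    cum = np.concatenate([[0.0], np.cumsum(log_r)])
    # days for which a full 12-month history exists
    t = np.arange(year, log_r.size)
    # 12-1 momentum using only information available before day t
    momentum = cum[t - month] - cum[t - year]
    # long: strategy matches the market; flat: strategy gives up the market return
    excess = np.where(momentum > 0, 0.0, -log_r[t])
    c = np.concatenate([[0.0], np.cumsum(excess)])
    return c[year:] - c[:-year]

rng = np.random.default_rng(42)
base = rng.normal(0.0003, 0.01, size=252 * 6)   # stand-in for daily market returns

# Each alternate history adds white noise with mean 0% and standard deviation 0.025%.
histories = [base + rng.normal(0.0, 0.00025, size=base.size) for _ in range(200)]
rel = np.array([relative_12m(h) for h in histories])

avg = rel.mean(axis=0)
upper = rel.max(axis=0) - avg   # most favorable deviation from the average
lower = rel.min(axis=0) - avg   # least favorable deviation from the average
print(upper.max(), lower.min())
```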
The design of this test aims to deduce how fragile a strategy is via the introduction of randomness. By measuring 12-month rolling relative returns versus the modified benchmarks, we can compare the 1,000 slightly alternate histories to one another in an effort to determine the overall stability of the strategy itself.
Now bear with us, because while the next graph is a bit difficult to read, it succinctly captures the thrust of our entire thesis. At each point in time, we first calculate the average 12-month relative return of all 1,000 strategies. This average provides a baseline of expected relative strategy performance.
Next, we calculate the maximum and minimum 12-month relative performance and subtract the average. This spread – which is plotted in the graph below – aims to capture the potential return differential around the expected strategy performance due to randomness. Or, put another way, the spread captures the potential impact of luck in strategy results due only to slight changes in market returns.
Source: Kenneth French Data Library. Calculations by Newfound Research.
We can see that the spread frequently exceeds 5% and sometimes even exceeds 10%. Thus, a tiny bit of injected randomness has a massive effect upon our realized results. Using a single signal to drive our allocation appears particularly fragile and success or failure over the short run can largely be dictated by the direction the random winds blow.
A backtest based upon a single signal may look particularly good, but this evidence suggests we should dampen our confidence as the strategy may actually have just been the accidental beneficiary of good fortune. In this situation, it is nearly impossible to identify skill from luck when in a slightly alternate universe we may have had substantially different results. After all, good luck in the past can easily turn into misfortune in the future.
Now let us perform the same exercise again using the same random sequences we generated. But rather than using a single signal to drive our allocation we will blend the three trend-following approaches above to determine the proportional amount of equities the portfolio should hold.5 We plot the results below using the same scale in the y-axis as the prior plot.
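One way such a blend might look, as a sketch only: each of the three signals is reduced to an in-or-out vote and the equity allocation is the equal-weight average of the votes. The daily window lengths (210, 252/21, and 65/170 days) are rough approximations of the monthly and weekly horizons, and the function `blended_equity_weight` and the simulated prices are hypothetical.

```python
import numpy as np
import pandas as pd

def blended_equity_weight(prices):
    """Average of three binary trend signals computed on a daily price series."""
    # price minus (roughly) the 10-month simple moving average
    sma_signal = prices > prices.rolling(210).mean()
    # (roughly) 12-1 month return
    mom_signal = prices.shift(21) / prices.shift(252) - 1.0 > 0
    # (roughly) 13-week versus 34-week exponential moving average cross-over
    ema_signal = (prices.ewm(span=65, adjust=False).mean()
                  > prices.ewm(span=170, adjust=False).mean())
    votes = pd.concat([sma_signal, mom_signal, ema_signal], axis=1).astype(float)
    # drop the warm-up window, where not every signal is defined yet
    return votes.mean(axis=1).iloc[252:]

rng = np.random.default_rng(7)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.0003, 0.01, size=1000))))
weight = blended_equity_weight(prices)
print(weight.value_counts().sort_index())
```

Because the allocation moves in steps of one third, a single signal flipping changes the portfolio by a third rather than all at once.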
Source: Kenneth French Data Library. Calculations by Newfound Research.
We can see that our more complicated approach actually exhibits a significant reduction in the effects of randomness, with outlier events significantly decreased and far more symmetry in both positive and negative impacts.
Below we plot the actual spreads themselves. We can see that the spread from the combined signal approach is lower than the single signal approach on a fairly consistent basis. In the cases where the spread is larger, it is usually because the sensitivity is arising from either the 10-month SMA or 13-minus-34-week EWMA signals. Were spreads for single signal strategies based upon those approaches plotted, they would likely be larger during those time periods.
Source: Kenneth French Data Library. Calculations by Newfound Research.
Conclusion
So, where is the balance? How can we tell when simplicity creates robustness and when simplicity introduces fragility? As we discussed in our article A Case Against Overweighting International Equity, we believe the answer is diversification versus estimation risk.
In our case above, each trend signal is just a model: an estimate of what the underlying trend is. As with all models, it is imprecise and our confidence level in any individual signal at any point in time being correct may actually be fairly low. We can wrap this all together by simply saying that each signal is actually shrouded in a distribution of estimation risk. But by combining multiple trend signals, we exploit the benefits of diversification in an effort to reduce our overall estimation risk.
Thus, while we may consider a multi-model approach less transparent and more complicated, that added layer of complication serves to increase internal diversification and reduce estimation risk.
It should not go overlooked that the manner in which the signals were blended represents a model with its own estimation risk. Our choice to simply equally-weight the signals indicates a zero-confidence position in views about relative model accuracy and relative marginal diversification benefits among the models. Had we chosen a more complicated method of combining signals, it is entirely possible that the realized estimation risk could overwhelm the diversification gain we aimed to benefit from in the first place. Or, conversely, that very same added estimation risk could be entirely justified if we could continue to meaningfully improve diversification benefits.
If we return back to our original example of a 100% equity portfolio versus a blended stock-bond mix, the diversification versus estimation risk trade-off becomes obvious. Introducing bonds into our portfolio creates such a significant diversification gain that the estimation risk is often an insignificant consideration. The same might not be true, however, in a tactical equity portfolio.
Research and empirical evidence suggest that simplicity is surprisingly robust. But we should be skeptical of simplicity for the sake of simplicity when it foregoes low-hanging diversification opportunities, lest we make our portfolios and strategies unintentionally fragile.
Trade optimization is a more technical topic than we usually cover in our published research. Therefore, this note relies heavily on mathematical notation and assumes readers have a basic understanding of optimization. Accompanying the commentary is code written in Python, meant to provide concrete examples of how these ideas can be implemented. The Python code leverages the PuLP optimization library.
Readers not proficient in these areas may still benefit from reading the Introduction and evaluating the example outlined in Section 5.
Summary
In practice, portfolio managers must account for the real-world implementation costs – both explicit (e.g. commission) and implicit (e.g. bid/ask spread and impact) – associated with trading portfolios.
Managers often implement trade paring constraints that may limit the number of allowed securities, the number of executed trades, the size of a trade, or the size of holdings. These constraints can turn a well-formed convex optimization into a discrete problem.
In this note, we explore how to formulate trade optimization as a Mixed-Integer Linear Programming (“MILP”) problem and implement an example in Python.
0. Initialize Python Libraries
import pandas
import numpy
from pulp import *
import scipy.optimize
1. Introduction
In the context of portfolio construction, trade optimization is the process of managing the transactions necessary to move from one set of portfolio weights to another. These optimizations can play an important role both in the cases of rebalancing as well as in the case of a cash infusion or withdrawal. The reason for controlling these trades is to try to minimize the explicit (e.g. commission) and implicit (e.g. bid/ask spread and impact) costs associated with trading.
Two approaches are often taken to trade optimization:
Trading costs and constraints are explicitly considered within portfolio construction. For example, a portfolio optimization that seeks to maximize exposure to some alpha source may incorporate explicit measures of transaction costs or constrain the number of trades that are allowed to occur at any given rebalance.
Portfolio construction and trade optimization occur in a two-step process. For example, a portfolio optimization may take place that creates the “ideal” portfolio, ignoring consideration of trading constraints and costs. Trade optimization would then occur as a second step, seeking to identify the trades that would move the current portfolio “as close as possible” to the target portfolio while minimizing costs or respecting trade constraints.
These two approaches will not necessarily arrive at the same result. At Newfound, we prefer the latter approach, as we believe it creates more transparency in portfolio construction. Combining trade optimization within portfolio optimization can also lead to complicated constraints, leading to infeasible optimizations. Furthermore, the separation of portfolio optimization and trade optimization allows us to target the same model portfolio across all strategy implementations, but vary when and how different portfolios trade depending upon account size and costs.
For example, a highly tactical strategy implemented as a pooled vehicle with a large asset base and penny-per-share commissions can likely afford to execute a much higher number of trades than an investor trying to implement the same strategy with $250,000 and $7.99 ticket charges. While implicit and explicit trading costs will create a fixed drag upon strategy returns, failing to implement each trade as dictated by a hypothetical model will create tracking error.
Ultimately, the goal is to minimize the fixed costs while staying within an acceptable distance (e.g. turnover distance or tracking error) of our target portfolio. Often, this goal is expressed by a portfolio manager with a number of semi-ad-hoc constraints or optimization targets. For example:
Asset Paring. A constraint that specifies the minimum or maximum number of securities that can be held by the portfolio.
Trade Paring. A constraint that specifies the minimum or maximum number of trades that may be executed.
Level Paring. A constraint that establishes a minimum level threshold for securities (e.g. securities must be at least 1% of the portfolio) or trades (e.g. all trades must be larger than 0.5%).
Unfortunately, these constraints often turn the portfolio optimization problem from continuous to discrete, which makes the process of optimization more difficult.
2. The Discreteness Problem
Consider the following simplified scenario. Given our current, drifted portfolio weights w_{old} and a new set of target model weights w_{target}, we want to minimize the number of trades we need to execute to bring our portfolio within some acceptable turnover threshold level, \theta. We can define this as the optimization problem:
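Written out as a sketch, with t_i denoting the trade in asset i: the one-half factor in the turnover distance and the no-shorting bound on each trade are assumptions here, the latter mirroring the lower bounds used in the code in Section 2.2.

```latex
\begin{aligned}
\min_{t} \quad & \sum_{i} \mathbb{1}_{\{|t_i| > 0\}} \\
\text{subject to} \quad & \tfrac{1}{2} \sum_{i} \left| w_{target,i} - (w_{old,i} + t_i) \right| \le \theta \\
& \sum_{i} t_i = 0 \\
& t_i \ge -w_{old,i}
\end{aligned}
```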
Unfortunately, as we will see below, simply trying to throw this problem into an off-the-shelf convex optimizer, as is, will lead to some potentially odd results. And we have not even introduced any complex paring constraints!
2.1 Example Data
# setup some sample data
tickers = "amj bkln bwx cwb emlc hyg idv lqd \
pbp pcy pff rem shy tlt vnq vnqi vym".split()
w_target = pandas.Series([float(x) for x in "0.04095391 0.206519656 0 \
0.061190655 0.049414401 0.105442705 0.038080766 \
0.07004622 0.045115708 0.08508047 0.115974239 \
0.076953702 0 0.005797291 0.008955226 0.050530852 \
0.0399442".split()], index = tickers)
w_old = pandas.Series([float(x) for x in \
"0.058788745 0.25 0 0.098132817 \
0 0.134293993 0.06144967 0.102295438 \
0.074200473 0 0 0.118318536 0 0 \
0.04774768 0 0.054772649".split()], \
index = tickers)
n = len(tickers)
w_diff = w_target - w_old
2.2 Applying a Naive Convex Optimizer
The example below demonstrates the numerical issues associated with attempting to solve discrete problems with traditional convex optimizers. Using the portfolio and target weights established above, we run a naive optimization that seeks to minimize the number of trades necessary to bring our holdings within a 5% turnover threshold from the target weights.
# Try a naive optimization with SLSQP
theta = 0.05
theta_hat = theta + w_diff.abs().sum() / 2.

def _fmin(t):
    # objective: count the trades that are not (numerically) zero
    return numpy.sum(numpy.abs(t) > 1e-8)

def _distance_constraint(t):
    # inequality constraint: satisfied when the returned value is >= 0
    return theta_hat - numpy.sum(numpy.abs(t)) / 2.

def _sums_to_zero(t):
    # equality constraint: the trades must net to zero
    return numpy.sum(t)

t0 = w_diff.copy()
# a position cannot be sold below zero; use positional access on the ticker-indexed Series
bounds = [(-w_old.iloc[i], 1) for i in range(n)]
result = scipy.optimize.fmin_slsqp(_fmin, t0, bounds = bounds,
                                   eqcons = [_sums_to_zero],
                                   ieqcons = [_distance_constraint],
                                   disp = -1)
result = pandas.Series(result, index = tickers)
Note that the trades we received are simply w_{target} - w_{old}, which was our initial guess for the optimization. The optimizer didn’t optimize.
What’s going on? Well, many off-the-shelf optimizers – such as the Sequential Least Squares Programming (SLSQP) approach applied here – will attempt to solve this problem by first estimating the gradient of the problem to decide which direction to move in search of the optimal solution. To achieve this numerically, small perturbations are made to the input vector and their influence on the resulting output is calculated.
In this case, small changes are unlikely to create an influence in the problem we are trying to minimize. Whether the trade is 5% or 5.0001% will have no influence on the *number* of trades executed. So the first derivative will appear to be zero and the optimizer will exit.
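The effect is easy to reproduce in isolation. The snippet below is a simplified stand-in for what a gradient-based optimizer does internally, not SLSQP's actual routine: it perturbs each trade by a tiny amount and measures the change in the trade count. The names `count_trades` and `forward_difference_gradient` and the sample trades are hypothetical.

```python
import numpy as np

def count_trades(t, tol=1e-8):
    """Number of trades whose size exceeds a small tolerance."""
    return float(np.sum(np.abs(t) > tol))

def forward_difference_gradient(f, x, eps=1.5e-8):
    """Numerical gradient obtained by bumping one coordinate at a time."""
    x = np.asarray(x, dtype=float)
    f0 = f(x)
    grad = np.empty_like(x)
    for i in range(x.size):
        bumped = x.copy()
        bumped[i] += eps
        grad[i] = (f(bumped) - f0) / eps
    return grad

trades = np.array([0.05, -0.03, 0.02, -0.04])
grad = forward_difference_gradient(count_trades, trades)
print(grad)
```

Every entry of the estimated gradient is zero, so the optimizer has no direction to move in. A trade sitting exactly at zero would instead produce an enormous spike when bumped, which is no more informative.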
Fortunately, with a bit of elbow grease, we can turn this problem into a mixed integer linear programming (“MILP”) problem, a class of problems with its own set of efficient optimization tools (in this article, we will use the PuLP library for the Python programming language). MILPs are a category of optimization problems that take the standard form:
Here b is a vector and A and G are matrices. Don’t worry too much about the form.
The important takeaway is that we need: (1) to express our minimization problem as a linear function and (2) express our constraints as a set of linear inequalities.
But first, for us to take advantage of linear programming tools, we need to eliminate our absolute values and indicator functions and somehow transform them into linear constraints.
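For the absolute value, the usual device is to introduce an auxiliary variable \psi_i for each x_i and minimize over it. The form below is a sketch whose notation is inferred from the constraints in the PuLP example in this section, so the exact layout should be treated as an assumption. Minimizing \sum_i |x_i| becomes:

```latex
\begin{aligned}
\min_{x, \psi} \quad & \sum_{i} \psi_i \\
\text{subject to} \quad & \psi_i \ge x_i \\
& \psi_i \ge -x_i
\end{aligned}
```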
The combination of constraints makes it such that \psi_i \ge |x_i|. When x_i is positive, \psi_i is constrained by the first constraint and when x_i is negative, it is constrained by the latter. Since the optimization seeks to minimize the sum of each \psi_i, and we know \psi_i will be non-negative, the optimizer will reduce \psi_i to equal |x_i|, which is its minimum possible value.
Below is an example of this trick in action. Our goal is to minimize the absolute value of some variables x_i. We apply bounds on each x_i to allow the problem to converge on a solution.
lp_problem = LpProblem("Absolute Values", LpMinimize)
x_vars = []
psi_vars = []
bounds = [[1, 7], [-10, 0], [-9, -1], [-1, 5], [6, 9]]
print("Bounds for x: ")
print(pandas.DataFrame(bounds, columns = ["Left", "Right"]))
# one free variable x_i and one auxiliary variable psi_i per pair of bounds
for i in range(5):
    x_i = LpVariable("x_" + str(i), None, None)
    x_vars.append(x_i)
    psi_i = LpVariable("psi_" + str(i), None, None)
    psi_vars.append(psi_i)
lp_problem += lpSum(psi_vars), "Objective"
for i in range(5):
    # together these two constraints force psi_i >= |x_i|
    lp_problem += psi_vars[i] >= -x_vars[i]
    lp_problem += psi_vars[i] >= x_vars[i]
    # keep x_i within its bounds
    lp_problem += x_vars[i] >= bounds[i][0]
    lp_problem += x_vars[i] <= bounds[i][1]
lp_problem.solve()
print("\nx variables")
print(pandas.Series([x_i.value() for x_i in x_vars]))
print("\npsi Variables (|x|):")
print(pandas.Series([psi_i.value() for psi_i in psi_vars]))
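Indicator functions can be handled with a similar device. As a sketch, an indicator of x_i > 0 can be modelled with an integer variable y_i and a large constant A. The exact form below is an assumption, inferred from the discussion that follows and from the `x_vars[i] <= A * y_vars[i]` constraint in the sample code for this trick:

```latex
\begin{aligned}
\min_{x, y} \quad & \sum_{i} y_i \\
\text{subject to} \quad & x_i \le A \, y_i \\
& y_i \ge 0 \\
& y_i \le 1 \\
& y_i \in \mathbb{Z}
\end{aligned}
```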
Note that the last three constraints, when taken together, tell us that y_i \in \{0, 1\}. The new constant A should be large, bigger than any value x_i can take. Let’s assume A = max(x) + 1.
Let’s first consider what happens when x_i \le 0. In such a case, y_i can be set to zero without violating any constraints. When x_i is positive, however, for x_i \le A*y_i to be true, it must be the case that y_i = 1.
What prevents y_i from equalling 1 in the case where x_i \le 0 is the goal of minimizing the sum of y_i, which will force y_i to be 0 whenever possible.
Below is a sample problem demonstrating this trick, similar to the example described in the prior section.
lp_problem = LpProblem("Indicator Function", LpMinimize)
x_vars = []
y_vars = []
bounds = [[-4, 1], [-3, 5], [-6, 1], [1, 7], [-5, 9]]
A = 11
print("Bounds for x: ")
print(pandas.DataFrame(bounds, columns = ["Left", "Right"]))
# one free variable x_i and one binary indicator y_i per pair of bounds
for i in range(5):
    x_i = LpVariable("x_" + str(i), None, None)
    x_vars.append(x_i)
    y_i = LpVariable("ind_" + str(i), 0, 1, LpInteger)
    y_vars.append(y_i)
lp_problem += lpSum(y_vars), "Objective"
for i in range(5):
    # keep x_i within its bounds
    lp_problem += x_vars[i] >= bounds[i][0]
    lp_problem += x_vars[i] <= bounds[i][1]
    # y_i must switch on whenever x_i is positive
    lp_problem += x_vars[i] <= A * y_vars[i]
lp_problem.solve()
print("\nx variables")
print(pandas.Series([x_i.value() for x_i in x_vars]))
print("\ny Variables (Indicator):")
print(pandas.Series([y_i.value() for y_i in y_vars]))
Bounds for x:
Left Right
0 -4 1
1 -3 5
2 -6 1
3 1 7
4 -5 9
x variables
0 -4.0
1 -3.0
2 -6.0
3 1.0
4 -5.0
dtype: float64
y Variables (Indicator):
0 0.0
1 0.0
2 0.0
3 1.0
4 0.0
dtype: float64
3.3 Tying the Tricks Together
A problem arises when we try to tie these two tricks together, as both tricks rely upon the minimization function itself. The \psi_i are dragged to the absolute value of x_i because we minimize over them. Similarly, y_i is dragged to zero when the indicator should be off because we are minimizing over it.
What happens, however, if we want to solve a problem of the form:
While there are a large number of constraints present, in reality there are just a few key steps going on. First, our key variable in question is t_i. We then use our absolute value trick to create \psi_i = |t_i|. Next, we use the indicator function trick to create y_i, which tells us whether each position is traded or not. Ultimately, this is the variable we are trying to minimize.
Next, we have to deal with our turnover constraint. Again, we invoke the absolute value trick to create \phi_i, and rewrite our turnover constraint as a sum of \phi’s.
Et voila?
As it turns out, not quite.
Consider a simple two-asset portfolio. The current weights are [0.25, 0.75] and we want to get these weights within 0.05 of [0.5, 0.5] (using the L^1 norm – i.e. the sum of absolute values – as our definition of “distance”).
Let’s consider the solution [0.475, 0.525]. At this point, \phi = [0.025, 0.025] and \psi = [0.225, 0.225]. Is this solution “better” than [0.5, 0.5]? At [0.5, 0.5], \phi = [0.0, 0.0] and \psi = [0.25, 0.25]. From the optimizer’s viewpoint, these are equivalent solutions. Within this region, there are an infinite number of possible solutions.
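The arithmetic for the two candidate solutions can be checked directly. In this small sketch, \psi is taken as the size of each trade and \phi as the remaining distance to the target, following the definitions above; the helper name `psi_phi` is hypothetical.

```python
import numpy as np

w_old = np.array([0.25, 0.75])
w_target = np.array([0.50, 0.50])

def psi_phi(w_new):
    t = w_new - w_old                # trades needed to reach w_new
    psi = np.abs(t)                  # size of each trade
    phi = np.abs(w_target - w_new)   # remaining distance to the target
    return psi, phi

totals = []
for candidate in ([0.475, 0.525], [0.5, 0.5]):
    psi, phi = psi_phi(np.array(candidate))
    totals.append(psi.sum() + phi.sum())
    print(candidate, "psi:", psi, "phi:", phi, "psi + phi:", totals[-1])
```

Both candidates give the same combined total of \psi and \phi, which is why an objective that sums the two cannot tell them apart.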
Yet if we are willing to let some of our tricks “fail,” we can find a solution. If we want to get as close as possible to the target, we effectively want to minimize the sum of \phi’s. The infinite solutions problem arises when we simultaneously try to minimize the sum of \psi’s and \phi’s, which offset each other.
Do we actually need the values of \psi to be correct? As it turns out: no. All we really need is that \psi_i is positive when t_i is non-zero, which will then force y_i to be 1. By minimizing on y_i, \psi_i will still be forced to 0 when t_i = 0.
So if we simply remove \psi_i from the minimization, we will end up reducing the number of trades as far as possible and then reducing the distance to the target model as much as possible given that trade level.
As a side note, because the sum of \phi’s will at most equal 2 and the sum of y’s can equal the number of assets in the portfolio, the optimizer will get more minimization bang for its buck by focusing on reducing the number of trades first before reducing the distance to the target model. This priority can be adjusted by multiplying \phi_i by a sufficiently large scalar in our objective.
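Collecting the pieces, the program solved by the code that follows can be written out as below. This is a sketch read off from that code rather than a reproduction of any formula in the note; it assumes w_{diff} = w_{target} - w_{old}, and the symbols A and \theta are the big-M constant and the turnover limit used in the code.

```latex
\begin{aligned}
\min_{t,\,\psi,\,\phi,\,y}\quad & \sum_i \phi_i + \sum_i y_i \\
\text{s.t.}\quad
 & \psi_i \ge t_i, \qquad \psi_i \ge -t_i, \qquad \psi_i \le A\, y_i \\
 & \phi_i \ge (w_{target,i} - w_{old,i}) - t_i, \qquad
   \phi_i \ge -\big((w_{target,i} - w_{old,i}) - t_i\big) \\
 & \sum_i t_i = 0, \qquad \tfrac{1}{2}\sum_i \phi_i \le \theta \\
 & -w_{old,i} \le t_i \le 1 - w_{old,i}, \qquad y_i \in \{0, 1\}
\end{aligned}
```

Note that \psi_i appears only in constraints, not in the objective, which is exactly the “failed” trick described above: \psi_i need only be positive whenever t_i is non-zero.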
theta = 0.05
trading_model = LpProblem("Trade Minimization Problem", LpMinimize)
t_vars = []
psi_vars = []
phi_vars = []
y_vars = []
A = 2
for i in range(n):
    # trade in asset i, bounded so the resulting weight stays in [0, 1]
    t = LpVariable("t_" + str(i), -w_old[i], 1 - w_old[i])
    t_vars.append(t)
    psi = LpVariable("psi_" + str(i), None, None)
    psi_vars.append(psi)
    phi = LpVariable("phi_" + str(i), None, None)
    phi_vars.append(phi)
    y = LpVariable("y_" + str(i), 0, 1, LpInteger) # set y in {0, 1}
    y_vars.append(y)
# objective: distance to target (sum of phi) plus number of trades (sum of y)
trading_model += lpSum(phi_vars) + lpSum(y_vars), "Objective"
for i in range(n):
    # psi_i >= |t_i|, and psi_i > 0 forces the indicator y_i to 1
    trading_model += psi_vars[i] >= -t_vars[i]
    trading_model += psi_vars[i] >= t_vars[i]
    trading_model += psi_vars[i] <= A * y_vars[i]
for i in range(n):
    # phi_i >= |w_diff_i - t_i|, the distance left to the target after trading
    trading_model += phi_vars[i] >= -(w_diff[i] - t_vars[i])
    trading_model += phi_vars[i] >= (w_diff[i] - t_vars[i])
# Make sure our trades sum to zero
trading_model += (lpSum(t_vars) == 0)
# Limit the remaining turnover distance to theta
trading_model += (lpSum(phi_vars) / 2. <= theta)
trading_model.solve()
results = pandas.Series([t_i.value() for t_i in t_vars], index = tickers)
print("Number of trades: " + str(sum([y_i.value() for y_i in y_vars])))
print("Turnover distance: " + str((w_target - (w_old + results)).abs().sum() / 2.))
Number of trades: 12.0
Turnover distance: 0.032663284500000014
5. A Sector Rotation Example
As an example of applying trade paring, we construct a sample sector rotation strategy. The investment universe consists of nine sector ETFs (XLB, XLE, XLF, XLI, XLK, XLP, XLU, XLV and XLY). The sectors are ranked by their 12-1 month total returns and the portfolio holds the four top-ranking ETFs in equal weight. To reduce timing luck, we apply a four-week tranching process.
We construct three versions of the strategy.
Naive: A version which rebalances back to hypothetical model weights on a weekly basis.
Filtered: A version that rebalances back to hypothetical model weights when drifted portfolio weights exceed a 5% turnover distance from target weights.
Trade Pared: A version that applies trade paring to rebalance back to within a 1% turnover distance from target weights when drifted weights exceed a 5% turnover distance from target weights.
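The rebalance trigger shared by the Filtered and Trade Pared versions can be sketched as follows. Turnover distance is taken here as half the sum of absolute weight differences, matching the calculation in the code earlier in this note; the function names and the drifted weights are hypothetical.

```python
def turnover_distance(weights, target):
    """Half the sum of absolute weight differences (one-way turnover)."""
    return sum(abs(w - t) for w, t in zip(weights, target)) / 2.0

def needs_rebalance(weights, target, trigger=0.05):
    """True when drifted weights sit more than `trigger` away from target."""
    return turnover_distance(weights, target) > trigger

# Hypothetical drifted weights for a four-position, equal-weight target
drifted = [0.27, 0.24, 0.26, 0.23]
target = [0.25, 0.25, 0.25, 0.25]

print(turnover_distance(drifted, target))
print(needs_rebalance(drifted, target))
```

When the trigger fires, the Filtered version trades all the way back to target, while the Trade Pared version hands the drifted weights to the optimizer with a 1% distance limit.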
The equity curves and per-year trade counts are plotted for each version below. Note that the equity curves do not account for any implicit or explicit trading costs.
Data Source: CSI. Calculations by Newfound Research. Past performance does not guarantee future results. All returns are hypothetical index returns. You cannot invest directly in an index and unmanaged index returns do not reflect any fees, expenses, sales charges, or trading expenses. Index returns include the reinvestment of dividends. No index is meant to measure any strategy that is or ever has been managed by Newfound Research. The indices were constructed by Newfound in August 2018 for purposes of this analysis and are therefore entirely backtested and not investment strategies that are currently managed and offered by Newfound.
For the reporting period covering full years (2001 – 2017), the trade filtering process alone reduced the average number of annual trades by 40.6% (from 255.7 to 151.7). The added trade paring process reduced the number of trades another 50.9% (from 151.7 to 74.5), for a total reduction of 70.9%.
6. Possible Extensions & Limitations
There are a number of extensions that can be made to this model, including:
Accounting for trading costs. Instead of minimizing the number of trades, we could minimize the total cost of trading by multiplying each trade against an estimate of cost (including bid/ask spread, commission, and impact).
Forcing accuracy. There may be positions for which greater drift can be permitted and others where drift is less desired. This can be achieved by adding specific constraints to our \phi_i variables.
Unfortunately, there are also a number of limitations. The first set is due to the fact we are formulating our optimization as a linear program. This means that quadratic constraints or objectives, such as tracking error constraints, are forbidden. The second set is due to the complexity of the optimization problem. While the problem may be technically solvable, problems containing a large number of securities and constraints may be time infeasible.
6.1 Non-Linear Constraints
In the former case, we can choose to move to a mixed integer quadratic programming framework. Or, we can also employ multi-step heuristic methods to find feasible, though potentially non-optimal, solutions.
For example, consider the case where we wish our optimized portfolio to fall within a certain tracking error constraint of our target portfolio. Prior to optimization, the marginal contribution to tracking error can be calculated for each asset and the total current tracking error can be calculated. A constraint can then be added such that the current tracking error plus the estimated change from trading (the sum of each trade weighted by its asset’s marginal contribution) must be less than the tracking error target. Because this is only a first-order estimate, after the optimization is complete we must check whether our solution actually meets the tracking error constraint.
If it does not, we can use our solution as our new w_{old}, re-calculate our tracking error and marginal contribution figures, and re-optimize. This iterative approach approximates a gradient descent approach.
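The linearization behind this constraint can be written out explicitly. The notation below mirrors the variable names in the code that follows (z, te, mcte); \Sigma denotes the covariance matrix, and the first-order expansion is an approximation, which is why the iteration is needed.

```latex
\begin{aligned}
z &= w_{old} - w_{target}, \qquad
TE(z) = \sqrt{z^{\top} \Sigma\, z}, \qquad
MCTE = \frac{\partial\, TE}{\partial z} = \frac{\Sigma\, z}{TE(z)} \\
TE(z + t) &\approx TE(z) + MCTE^{\top} t \;\le\; TE_{target}
\quad\Longrightarrow\quad
MCTE^{\top} t \;\le\; TE_{target} - TE(z)
\end{aligned}
```

The final inequality is linear in the trades t, so it can be added to the MILP directly.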
In the example below, we introduce a covariance matrix and seek to target a solution whose tracking error is less than 0.25%.
theta = 0.05
target_te = 0.0025
w_old_prime = w_old.copy()
# calculate the difference from the target portfolio
# and use this difference to estimate tracking error
# and marginal contribution to tracking error ("mcte")
z = (w_old_prime - w_target)
te = numpy.sqrt(z.dot(covariance_matrix).dot(z))
mcte = (z.dot(covariance_matrix)) / te
while True:
    w_diff_prime = w_target - w_old_prime
    trading_model = LpProblem("Trade Minimization Problem", LpMinimize)
    t_vars = []
    psi_vars = []
    phi_vars = []
    y_vars = []
    A = 2
    for i in range(n):
        t = LpVariable("t_" + str(i), -w_old_prime[i], 1 - w_old_prime[i])
        t_vars.append(t)
        psi = LpVariable("psi_" + str(i), None, None)
        psi_vars.append(psi)
        phi = LpVariable("phi_" + str(i), None, None)
        phi_vars.append(phi)
        y = LpVariable("y_" + str(i), 0, 1, LpInteger) # set y in {0, 1}
        y_vars.append(y)
    # objective: distance to target (sum of phi) plus number of trades (sum of y)
    trading_model += lpSum(phi_vars) + lpSum(y_vars), "Objective"
    for i in range(n):
        trading_model += psi_vars[i] >= -t_vars[i]
        trading_model += psi_vars[i] >= t_vars[i]
        trading_model += psi_vars[i] <= A * y_vars[i]
    for i in range(n):
        trading_model += phi_vars[i] >= -(w_diff_prime[i] - t_vars[i])
        trading_model += phi_vars[i] >= (w_diff_prime[i] - t_vars[i])
    # Make sure our trades sum to zero
    trading_model += (lpSum(t_vars) == 0)
    # Set tracking error limit
    # delta(te) = mcte * delta(z)
    #           = mcte * ((w_old_prime + t - w_target) -
    #                     (w_old_prime - w_target))
    #           = mcte * t
    # te + delta(te) <= target_te
    # ==> delta(te) <= target_te - te
    trading_model += (lpSum([mcte.iloc[i] * t_vars[i] for i in range(n)]) \
                      <= (target_te - te))
    # Limit the remaining turnover distance to theta
    trading_model += (lpSum(phi_vars) / 2. <= theta)
    trading_model.solve()
    # update our w_old' with the current trades
    results = pandas.Series([t_i.value() for t_i in t_vars], index = tickers)
    w_old_prime = (w_old_prime + results)
    z = (w_old_prime - w_target)
    te = numpy.sqrt(z.dot(covariance_matrix).dot(z))
    mcte = (z.dot(covariance_matrix)) / te
    if te < target_te:
        break
print("Tracking error: " + str(te))
# since w_old' is an iterative update,
# the current trades only reflect the updates from
# the prior w_old'. Thus, we need to calculate
# the trades by hand
results = (w_old_prime - w_old)
n_trades = (results.abs() > 1e-8).astype(int).sum()
print("Number of trades: " + str(n_trades))
print("Turnover distance: " + str((w_target - (w_old + results)).abs().sum() / 2.))
Tracking error: 0.0016583319880074485
Number of trades: 13
Turnover distance: 0.01624453350000001
6.2 Time Constraints
For time feasibility, heuristic approaches can be employed in an effort to rapidly converge upon a “close enough” solution. For example, Rong and Liu (2011) discuss “build-up” and “pare-down” heuristics.
The basic algorithm of “pare-down” is:
1. Start with a trade list that includes every security.
2. Solve the optimization problem in its unconstrained format, allowing trades to occur only for securities in the trade list.
3. If the solution meets the necessary constraints (e.g. maximum number of trades, trade size thresholds, tracking error constraints, etc.), terminate the optimization.
4. Eliminate from the trade list a subset of securities based upon some measure of trade utility (e.g. violation of constraints, contribution to tracking error, etc.).
5. Go to step 2.
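A minimal sketch of the pare-down loop is shown below. It is not taken from Rong and Liu; the callables `solve`, `is_acceptable`, and `utility` are hypothetical placeholders for the optimizer, the constraint check, and the trade-utility measure, and the toy data simply stands in for a real optimization.

```python
def pare_down(universe, solve, is_acceptable, utility, drop_per_round=1):
    """Start from every security and prune the trade list until constraints hold."""
    trade_list = list(universe)                  # step 1: everything is tradable
    while trade_list:
        solution = solve(trade_list)             # step 2: optimize over the list
        if is_acceptable(solution):              # step 3: stop once constraints are met
            return trade_list, solution
        # step 4: drop the securities whose trades are least useful
        ranked = sorted(trade_list, key=lambda s: utility(s, solution))
        trade_list = ranked[drop_per_round:]
    return [], {}                                # no trade list satisfied the constraints

# Toy stand-ins (all hypothetical): the "optimizer" trades each listed
# security by its full weight gap to the target.
gaps = {"A": 0.10, "B": -0.01, "C": 0.02, "D": -0.11}
solve = lambda names: {s: gaps[s] for s in names}
is_acceptable = lambda sol: len(sol) <= 2        # allow at most two trades
utility = lambda s, sol: abs(sol[s])             # larger trades are more useful

kept, trades = pare_down(gaps, solve, is_acceptable, utility)
print(sorted(kept))
```

In this toy run the two smallest trades are pruned first, leaving only the two largest gaps to be traded.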
The basic algorithm of “build-up” is:
1. Start with an empty trade list.
2. Add a subset of securities to the trade list based upon some measure of trade utility.
3. Solve the optimization problem in its unconstrained format, allowing trades to occur only for securities in the trade list.
4. If the solution meets the necessary constraints (e.g. maximum number of trades, trade size thresholds, tracking error constraints, etc.), terminate the optimization.
5. Go to step 2.
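The build-up loop can be sketched in the same spirit. Again, this is an illustration under assumptions rather than the published algorithm: `solve_listed`, `close_enough`, and `gap_size` are hypothetical stand-ins, and the acceptance test here is a toy limit on the distance left untraded.

```python
def build_up(universe, solve_listed, close_enough, gap_size, add_per_round=1):
    """Grow the trade list from empty until the solution meets the constraints."""
    trade_list = []                                           # step 1: empty list
    remaining = sorted(universe, key=gap_size, reverse=True)  # most useful first
    while remaining:
        trade_list += remaining[:add_per_round]               # step 2: add securities
        remaining = remaining[add_per_round:]
        solution = solve_listed(trade_list)                   # step 3: optimize
        if close_enough(solution):                            # step 4: stop if acceptable
            return trade_list, solution
    return trade_list, None                                   # constraints never met

# Toy stand-ins (all hypothetical): weight gaps to target, a trivial "optimizer",
# and an acceptance test on the turnover distance left untraded.
weight_gaps = {"A": 0.10, "B": -0.01, "C": 0.02, "D": -0.11}
solve_listed = lambda names: {s: weight_gaps[s] for s in names}
untraded = lambda sol: sum(abs(g) for s, g in weight_gaps.items() if s not in sol) / 2.0
close_enough = lambda sol: untraded(sol) <= 0.02
gap_size = lambda s: abs(weight_gaps[s])

built, built_trades = build_up(weight_gaps, solve_listed, close_enough, gap_size)
print(built)
```

Build-up tends to be attractive when only a handful of trades is expected to be needed, since each intermediate problem stays small.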
These two heuristics can even be combined in an integrated fashion. For example, a binary search approach can be employed, where the initial trade list is filled with 50% of the tradable securities. Depending upon success or failure of the resulting optimization, a pare-down or build-up approach can be taken to either prune or expand the trade list.
7. Conclusion
In this research note we have explored the practice of trade optimization, which seeks to implement portfolio changes in as few trades as possible. While a rarely discussed detail of portfolio management, trade optimization has the potential to eliminate unnecessary trading costs – both explicit and implicit – that can be a drag on realized investor performance.
Constraints within the practice of trade optimization typically fall into one of three categories: asset paring, trade paring, and level paring. Asset paring restricts the number of securities the portfolio can hold, trade paring restricts the number of trades that can be made, and level paring restricts the size of positions and trades. Introducing these constraints often turns an optimization into a discrete problem, making it much more difficult for traditional convex optimizers to solve.
With this in mind, we introduced mixed-integer linear programming (“MILP”) and explored a few techniques that can be utilized to transform non-linear functions into a set of linear constraints. We then combined these transformations to develop a simple trade optimization framework that can be solved using MILP optimizers.
To offer numerical support in the discussion, we created a simple momentum-based sector rotation strategy. We found that naive turnover-filtering reduced the number of trades executed by roughly 40%, while adding explicit trade optimization brought the total reduction to roughly 70%.
Finally, we explored how our simplified framework could be further extended to account for both non-linear functional constraints (e.g. tracking error) and operational constraints (e.g. managing execution time).
The paring constraints introduced by trade optimization often lead to problems that are difficult to solve. However, when we consider that the cost of trading is a very real drag on the results realized by investors, we believe that the solutions are worth pursuing.
How Much Accuracy Is Enough?
By Nathan Faber
On March 4, 2019
Summary
The distinction between luck and skill in investing can be extremely difficult to measure. Seemingly good or bad strategies can be attributable to either luck or skill, and the truth has important implications for the future prospects of the strategy.
Source: Grinold and Kahn, Active Portfolio Management. (New York: McGraw-Hill, 1999).
Time is one of the surest ways to weed out lucky strategies, but the amount of time needed to make this decision with a high degree of confidence can be longer than we are willing to wait. Or, sometimes, even longer than the data we have.
For example, in order to be 95% confident that a strategy with a 7% historical return and a volatility of 15% has a true expected return that is greater than a 2% risk-free rate, we would need 27 years of data. While this is possible for equity and bond strategies, we would have a long time to wait in order to be confident in a Bitcoin strategy with these specifications.
Even after passing that test, however, that same strategy could easily return less than the risk-free rate over the next 5 years (the probability is 25%).
Regardless of the skill, would you continue to hold a strategy that underperformed for that long?
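The order of magnitude of these figures can be reproduced with a back-of-the-envelope calculation. The sketch below assumes independent, normally distributed annual returns and a one-sided test; it is not necessarily the calculation behind the numbers quoted above, and the function names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

def years_needed(excess_return, volatility, confidence=0.95):
    """Years of data before the mean excess return is significant (one-sided)."""
    z = NormalDist().inv_cdf(confidence)         # one-sided critical value
    return ceil((z * volatility / excess_return) ** 2)

def prob_below_hurdle(excess_return, volatility, years):
    """Chance the realized average return falls below the hurdle over `years`."""
    return NormalDist().cdf(-excess_return * sqrt(years) / volatility)

excess = 0.07 - 0.02                             # 7% return over a 2% risk-free rate
print(years_needed(excess, 0.15))
print(round(prob_below_hurdle(excess, 0.15, 5), 3))
```

Under the normal approximation the required history is about 25 years, and the 5-year shortfall probability lands a little above 20%. A stricter critical value, such as one drawn from a Student-t distribution, lengthens the required history, so these results are in the same neighborhood as the 27 years and 25% cited in the text.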
In this commentary, we will use a sample U.S. sector strategy that isolates luck and skill to explore the impacts of varying accuracy and how even increased accuracy may only be an idealized goal.
The (In)Accurate Investor
To investigate the historical impact of luck and skill in the arena of U.S. equity investing, we will consider a strategy that invests in the 30 industries from the Kenneth French Data Library.
Each month, the strategy independently evaluates each sector and either holds it or invests the capital at the risk-free rate. The term “evaluates” is used loosely here; the evaluation can be as simple as flipping a (potentially biased) coin.
The allocation allotted to each sector is 1/30th of the portfolio (3.33%). We are purposely not reallocating capital among the sectors chosen so that the sector calls based on the accuracy straightforwardly determine the performance.
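To get an idea for the bounds of how well – or poorly – this strategy would have performed over time, we can consider three investors:
The Perfect Investor, who makes the correct call on every sector in every month (100% accuracy).
The Anti-Perfect Investor, who makes the wrong call on every sector in every month (0% accuracy).
The Plain Investor, who simply holds every sector in every month (buy-and-hold).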
The Perfect and Anti-Perfect investors set the bounds for what performance is possible within this framework, and the Plain Investor denotes the performance of not making any decisions.
The growth of each boundary strategy over the entire time period is a little outrageous.
Source: Kenneth French Data Library. Calculations by Newfound Research. Past performance is not a guarantee of future results. All returns are hypothetical and backtested. Returns are gross of all fees. This does not reflect any investment strategy offered or managed by Newfound Research and was constructed exclusively for the purposes of this commentary. It is not possible to invest in an index.
A more informative illustration is the rolling annualized 5-year return for each strategy.
While the spread between the Perfect and Anti-Perfect investors ebbs and flows, its median value is 59,000 basis points (“bps”). Between the Perfect and Plain investors, there is still 29,000 bps of annualized outperformance to be had. A natural wish is to make calls that harvest some of this spread.
Accounting for Accuracy
Now we will look at a set of investors who are able to evaluate each sector with some known degree of accuracy.
For each accuracy level between 0% and 100% (i.e. our Anti-Perfect and Perfect investors, respectively), we simulate 1,000 trials and look at how the historical results have played out.
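One trial of such a simulation might look like the sketch below. It makes several assumptions that are not spelled out in the text: a “correct” call means holding a sector when its return beats the risk-free rate and holding cash otherwise, calls are independent across sectors and months, and the returns are synthetic Gaussian draws rather than the Kenneth French industry data.

```python
import random

def simulate_month(sector_returns, rf, accuracy, rng):
    """Portfolio return for one month when each call is right with prob. `accuracy`."""
    total = 0.0
    for r in sector_returns:
        correct_call_is_hold = r > rf              # assumed definition of a correct call
        is_correct = rng.random() < accuracy
        hold = correct_call_is_hold if is_correct else not correct_call_is_hold
        total += r if hold else rf                 # sector return, or cash at rf
    return total / len(sector_returns)             # fixed equal weight per sector

def simulate_growth(returns_by_month, rf, accuracy, seed=0):
    """Growth of $1 for a single simulated investor."""
    rng = random.Random(seed)
    growth = 1.0
    for sector_returns in returns_by_month:
        growth *= 1.0 + simulate_month(sector_returns, rf, accuracy, rng)
    return growth

# Synthetic stand-in data: 120 months of 30 hypothetical sector returns
data_rng = random.Random(42)
rf = 0.002
returns_by_month = [[data_rng.gauss(0.008, 0.05) for _ in range(30)]
                    for _ in range(120)]

plain = 1.0                                        # buy-and-hold all sectors
for month in returns_by_month:
    plain *= 1.0 + sum(month) / len(month)

for accuracy in (0.0, 0.5, 1.0):
    print(accuracy, round(simulate_growth(returns_by_month, rf, accuracy), 2))
print("plain", round(plain, 2))
```

Repeating `simulate_growth` with different seeds gives the distribution of outcomes for a given accuracy level; accuracies of 0.0 and 1.0 reproduce the Anti-Perfect and Perfect bounds regardless of seed.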
A natural starting point is the investor who merely flips a fair coin for each sector. Their accuracy is 50%.
The chart below shows the rolling 5-year performance range of the simulated trials for the 50% Accurate Investor.
In 59% of the rolling periods, the buy-and-hold Plain Investor beat even the best 50% Accurate Investor. The Plain Investor was only worse than the worst performing coin flip strategy in 6% of rolling periods.
Beating buy-and-hold is hard to do reliably if you rely only on luck.
In this case, having a neutral hit rate with the negative skew of the sector equity returns leads to negative information coefficients. Taking more bets over time and across sectors did not help offset this distributional disadvantage.
So, let’s improve the accuracy slightly to see if the rolling results improve. Even with negative skew (-0.42 median value for the 30 sectors), an improvement in the accuracy to 60% is enough to bring the theoretical information coefficient back into the positive realm.
The worst of these more skilled investors is now beating the Plain Investor in 41% of the rolling periods, and the best is losing to the buy-and-hold investor in 13% of the periods.
Going the other way, to a 40% Accurate Investor, we find that the best one was beaten by the Plain Investor 93% of the time, and the worst one never beat the buy-and-hold investor.
If we only require a modest increase in our accuracy to beat buy-and-hold strategies over shorter time horizons, why isn’t diligently focusing on increasing our accuracy an easy approach to success?
In order to increase our accuracy, we must first find a reliable way to do so: a task easier said than done due to the inherent nature of probability. Something having a 60% probability of being right does not preclude it from being wrong for a long time. The Law of Large Numbers can require larger numbers than our portfolios can stand.
Thus, even if we have found a way that will reliably lead to a 60% accuracy, we may not be able to establish confidence in that accuracy rate. This uncertainty in the accuracy can be unnerving. And it can cut both ways.
A strategy with a hit rate of less than 50% can masquerade as a more accurate strategy simply for lack of sufficient data to sniff out the true probability.
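A simple binomial calculation illustrates how easily this can happen. The sketch assumes each call is an independent draw with a fixed true accuracy, which is an idealization; the 45% accuracy and the sample sizes are hypothetical.

```python
from math import ceil, comb

def prob_hit_rate_at_least(true_accuracy, n_calls, observed_rate=0.5):
    """Probability that at least `observed_rate` of `n_calls` calls are correct."""
    k_min = ceil(observed_rate * n_calls)
    return sum(
        comb(n_calls, k) * true_accuracy ** k * (1 - true_accuracy) ** (n_calls - k)
        for k in range(k_min, n_calls + 1)
    )

# Chance that a 45%-accurate strategy looks at least 50% accurate
for n in (12, 60, 240):
    print(n, round(prob_hit_rate_at_least(0.45, n), 3))
```

The probability shrinks as the number of independent calls grows, but over a small sample a below-50% strategy has a substantial chance of appearing to have an edge.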
You may think you have an edge when you do not. And if you do not have an edge, repeatedly applying it will lead to worse and worse outcomes.2
Accuracy Schmaccuracy
Our preference is to rely on systematic bets, which generally fall under the umbrella of factor investing. Even slight improvements to the accuracy can lead to better results when applied over a sufficient breadth of investments. Some of these factors also alter the distribution of returns (i.e. the skew) so that accuracy improvements have a larger impact.
Consider two popular measures of trend, 12-1 Momentum and the 10-month simple moving average (SMA), used as the signals to determine the allocations in our 30 sector US equity strategy from the previous sections.
These strategies have volatilities in line with the Perfect and Anti-Perfect Investors and returns similar to the Plain Investor.
Using our measure of accuracy as correctly calling the direction of the sector returns over the subsequent month, it might come as a surprise that the accuracies for the 12-1 Momentum and 10-month SMA signals are only 42% and 41%, respectively.
Even with this low accuracy, the following chart shows that over the entire time period, the returns of these strategies more closely resemble those of the 55% Accurate Investor and have even looked like those of the 70% Accurate Investor over some time periods. What gives?
This is an example of how addressing the negative skew in the underlying asset returns can offset a sacrifice in accuracy. These trend following strategies may have overall accuracy of less than 50%, but they have been historically right when it counts.
Consistently removing large negative returns – at the expense of giving up some large positive returns – is enough to generate a return profile that looks much like a strategy that picks sectors with above average accuracy.
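The arithmetic behind this trade-off is straightforward: expected return per call depends on both the hit rate and the relative size of wins and losses. The numbers below are purely hypothetical and are not estimates from the strategies discussed here.

```python
def expected_return(hit_rate, avg_win, avg_loss):
    """Expected return per call given hit rate and average win/loss sizes."""
    return hit_rate * avg_win - (1.0 - hit_rate) * avg_loss

def breakeven_hit_rate(avg_win, avg_loss):
    """Hit rate at which the expected return per call is exactly zero."""
    return avg_loss / (avg_win + avg_loss)

# Hypothetical positively skewed profile: wins twice the size of losses
print(expected_return(0.42, avg_win=0.030, avg_loss=0.015))
print(breakeven_hit_rate(avg_win=0.030, avg_loss=0.015))

# Hypothetical negatively skewed profile: losses twice the size of wins
print(expected_return(0.60, avg_win=0.010, avg_loss=0.020))
```

With wins twice as large as losses, the break-even hit rate is only one in three, so a 42% hit rate is comfortably profitable, while the 60%-accurate strategy with the reverse payoff profile loses money on average.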
Whether investors can stick with a strategy that exhibits below 50% accuracy, however, is another question entirely.
Conclusion
While most investors expect the proof to be in the eating of the pudding, in this commentary we demonstrate how luck can have a meaningful impact in the determination of whether skill exists. While skill should eventually differentiate itself from luck, the horizon over which it will do so may be far, far longer than most investors suspect.
To explore this idea, we construct portfolios composed of all thirty industry groups. We then simulate the results of investors with known accuracy rates, comparing their outcomes to 100% Accurate, 100% Inaccurate, and Buy-and-Hold benchmarks.
We find that an investor exhibiting 50% accuracy would have fairly reliably underperformed a Buy-and-Hold Investor. This seems counter-intuitive until we acknowledge that equity returns have historically exhibited negative skew, with the left tail of their return distribution (“losses”) being longer and fatter than the right (“gains”). Combining a neutral hit rate with negative skew creates negative information coefficients.
To offset this negative skew, we require increased accuracy. Unfortunately, even in the case where an investor exhibits 60% accuracy, there are a significant number of 5-year periods where it might masquerade as a strategy with a much higher or lower hit-rate, inviting false conclusions.
This is all made somewhat more confusing when we consider that a strategy can have an accuracy rate below 50% and still be successful. Trend following strategies are a perfect example of this phenomenon. The positive skew that has been historically exhibited by these strategies means that frequent inaccurate trades of small magnitude are offset by infrequent, but very large, accurate trades.
Yet if we measure success by short-term accuracy rates, we will almost certainly dismiss this type of strategy as one with no skill.
When taken together, this evidence suggests that not only might it be difficult for investors to meaningfully determine the difference between skill and luck over seemingly meaningful time horizons (e.g. 5 years), but also that short-term perceptions of accuracy can be woefully misleading for long-term success. Highly accurate strategies can still lead to catastrophe if there is significant negative skew lurking in the shadows (e.g. an ETF like XIV), while inaccurate strategies can be successful with enough positive skew (e.g. trend following).