- Portfolio construction is a lot like cooking. There are two equally important elements: the ingredients and the recipe. The ingredients are the signals that are used to select investments. The recipe is the set of rules used to transform those signals into portfolio allocations.
- In factor investing, the signals (e.g., value, momentum, carry) often get all the attention and the importance of the recipe – how these signals are actually transformed into portfolio construction – often gets lost. Designing a recipe requires making decisions like how often to rebalance, how to weight holdings, and how to blend signals (when multiple signals are used).
- Even portfolios that use the same core factor can experience significant performance dispersion due to recipe differences.
- This dispersion has two main implications for factor investors. First, dispersion creates the opportunity for data mining. To combat this, diligence efforts must focus as much on the construction process as they do on the factors themselves. Second, dispersion makes short-term underperformance inevitable. Potential dispersion is so large due to recipe differences that it is entirely plausible that one momentum portfolio, as an example, could outperform value while another underperforms. Users of factor strategies should resist the urge to chase performance, especially over 3- to 5-year investment horizons.
We often compare portfolio construction to cooking. There are two equally important elements: the ingredients and the recipe. The ingredients are the signals that are used to select investments. The recipe is the set of rules used to transform those signals into portfolio allocations.
When it comes to factor investing, it seems that the signals oftentimes get all the attention. Investors struggle with whether incorporating factor-based strategies is worth it, argue over which are the premier factors, debate whether factors can be timed or not, and fret about premiums being driven towards zero as acceptance of factor-based strategies increases.
In all of this (admittedly an interesting debate), the importance of the recipe – i.e. how we move from a factor score to an actual strategy – is often lost.
Let’s use the momentum factor as an example. What decisions do we need to make to build a momentum portfolio?
First, we need to decide how to measure momentum. The simplest and most traditional approach would just be to use trailing total return. However, there are many other ways to measure momentum. We could use risk-adjusted momentum (i.e. looking at trailing Sharpe Ratios instead of trailing returns). We could use idiosyncratic momentum (i.e. measuring trailing returns after stripping out the stock’s exposure to the broad market). We could use a regression on past prices to estimate the stock’s trend. We could smooth prices using various moving averages. We could diversify by blending multiple signals together.
We also need to decide over how long a period we want to measure momentum. Do we want to use 12 months? 9 months? 6 months? Do we want to skip the most recent month to account for mean reversion, as is common in the academic literature, or include it?
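To make the measurement choices concrete, here is a minimal sketch of two of the measures mentioned above (trailing total return and a Sharpe-like risk-adjusted version), each with a configurable lookback and an optional skip of the most recent periods. The function names, the zero risk-free rate, and the price series are assumptions made purely for illustration.

```python
import statistics


def _window(prices, lookback, skip):
    """Prices spanning `lookback` periods, ending `skip` periods before the last one."""
    end = len(prices) - 1 - skip
    start = end - lookback
    if start < 0:
        raise ValueError("not enough price history for this lookback and skip")
    return prices[start:end + 1]


def total_return_momentum(prices, lookback, skip=0):
    """Trailing total return over the chosen window."""
    window = _window(prices, lookback, skip)
    return window[-1] / window[0] - 1.0


def risk_adjusted_momentum(prices, lookback, skip=0):
    """Mean periodic return divided by its standard deviation.

    Assumes a zero risk-free rate and does not annualize.
    """
    window = _window(prices, lookback, skip)
    returns = [b / a - 1.0 for a, b in zip(window, window[1:])]
    return statistics.mean(returns) / statistics.stdev(returns)


# Hypothetical month-end prices: 13 observations give 12 monthly returns.
prices = [100, 102, 101, 105, 108, 107, 111, 115, 114, 118, 121, 125, 120]

mom_full = total_return_momentum(prices, lookback=12)          # all 12 months
mom_skip = total_return_momentum(prices, lookback=11, skip=1)  # drop the last month
mom_risk_adj = risk_adjusted_momentum(prices, lookback=12)
```

In this made-up series the final month is a loss, so skipping it raises the measured momentum from roughly 20% to roughly 25%. The same stock can rank quite differently depending on that one choice.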
Do we want to rebalance weekly, monthly, quarterly, or at some other interval?
How concentrated do we want the portfolio to be? Do we choose the 30 stocks with the highest momentum? 50 stocks? 100 stocks?
How will the stocks be weighted? We could market-cap weight the holdings. We could equal-weight the holdings. We could allocate in proportion to each stock’s momentum by giving the highest allocation to the stocks with the most momentum.
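The concentration and weighting choices above can be sketched in a few lines. This is only an illustration: the tickers, scores, and market caps are invented, and the score-proportional scheme assumes every selected stock has a positive score.

```python
def top_n(scores, n):
    """Return the n tickers with the highest momentum score."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def equal_weights(tickers):
    """Every selected stock gets the same weight."""
    return {t: 1.0 / len(tickers) for t in tickers}


def cap_weights(tickers, market_caps):
    """Weights proportional to market capitalization."""
    total = sum(market_caps[t] for t in tickers)
    return {t: market_caps[t] / total for t in tickers}


def score_weights(tickers, scores):
    """Weights proportional to momentum score (assumes positive scores)."""
    total = sum(scores[t] for t in tickers)
    return {t: scores[t] / total for t in tickers}


# Hypothetical inputs.
scores = {"AAA": 0.30, "BBB": 0.10, "CCC": 0.20, "DDD": -0.05}
caps = {"AAA": 50.0, "BBB": 300.0, "CCC": 150.0, "DDD": 80.0}

selected = top_n(scores, 3)
w_equal = equal_weights(selected)
w_cap = cap_weights(selected, caps)
w_score = score_weights(selected, scores)
```

With these inputs the same three stocks are held under every scheme, but the largest position changes: BBB dominates under cap weighting (60%), while AAA dominates under score weighting (50%).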
If we wanted to get really fancy we could run some type of optimization like equal-risk contribution, mean-variance optimization, or minimum volatility.
If we go this route, then we really open Pandora’s box. We now need to decide how to estimate our parameters (means, volatility, and correlations). Do we use historical data? If so, how much? What measures do we use? How do we account for estimation risk?
Do we care about controlling for other risk factors, like sector risk? If not, we may let the portfolio be sector unconstrained and therefore let the sector exposures just fall out of the stock selection. If yes, we need to decide how to control it. Do we force the portfolio to be sector neutral (i.e. have the same sector weights as the market)? Are we comfortable with a middle ground, where sector weights are allowed to drift from market-cap sector weights, but only by some pre-specified amount? If we want to go this route, how loose or tight do we set those thresholds?
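As one possible reading of "sector neutral," the sketch below rescales the weights within each sector so that sector totals match a benchmark. The function name and inputs are hypothetical, and the approach assumes the portfolio holds at least one stock in every benchmark sector (otherwise the result will not sum to 100%).

```python
def sector_neutralize(weights, sectors, benchmark_sector_weights):
    """Rescale stock weights so each sector total equals the benchmark's."""
    sector_totals = {}
    for ticker, w in weights.items():
        sector = sectors[ticker]
        sector_totals[sector] = sector_totals.get(sector, 0.0) + w
    return {
        ticker: w * benchmark_sector_weights[sectors[ticker]] / sector_totals[sectors[ticker]]
        for ticker, w in weights.items()
    }


# Hypothetical unconstrained portfolio that is 80% technology.
weights = {"AAA": 0.5, "BBB": 0.3, "CCC": 0.2}
sectors = {"AAA": "Tech", "BBB": "Tech", "CCC": "Energy"}
benchmark = {"Tech": 0.6, "Energy": 0.4}

neutral = sector_neutralize(weights, sectors, benchmark)
```

The relative ranking inside each sector is preserved; only the sector totals move. A middle-ground design would apply the same rescaling only when a sector drifts outside a pre-specified band.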
We could go on and on and on. The degree of potential customization of the recipe is largely limitless. In the ramblings above, we mentioned at least six dimensions of customization for a momentum strategy:
- Momentum measure
- Lookback period
- Rebalance frequency
- Portfolio concentration
- Weighting scheme
- Sector constraints
If these were the only six dimensions and there were only ten choices per dimension, there would be a whopping 1,000,000 potential variations of the momentum strategy!
Illustrating the Impact of the Portfolio Construction Recipe
To illustrate the impact that these choices can have on returns, we conducted a simple experiment. Our investment universe consists of the 200 largest holdings in the SPDR S&P 500 ETF (ticker: SPY). Using this universe, we create 1,080 different momentum portfolios by varying the construction rules (5 momentum measures x 4 lookback periods x 3 rebalance frequencies x 3 levels of portfolio concentration x 3 weighting schemes x 2 methods of dealing with sector risk).
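The count of 1,080 is simply the product of the number of choices along each dimension. A short sketch enumerating the grid is below; the option labels follow the choices listed in the notes at the end of this post, while the dictionary keys and string labels themselves are illustrative.

```python
from itertools import product

# One entry per construction dimension; labels are illustrative.
grid = {
    "measure": [
        "total_return",
        "total_return_skip_month",
        "risk_adjusted",
        "ols_trend",
        "kalman_filter",
    ],
    "lookback_months": [3, 6, 9, 12],
    "rebalance": ["weekly", "monthly", "quarterly"],
    "top_fraction": [0.10, 0.25, 0.50],
    "weighting": ["equal", "rank", "score"],
    "sector_constrained": [True, False],
}

# Every combination of choices is one candidate "recipe".
configs = [dict(zip(grid, combo)) for combo in product(*grid.values())]

print(len(configs))  # 5 * 4 * 3 * 3 * 3 * 2 = 1080
print(10 ** 6)       # six dimensions with ten choices each
```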
As a quick note, since we are only interested in exploring dispersion across strategies, all performance statistics presented are normalized by subtracting the median for that metric across all 1,080 strategies. For example, an annualized return of +5.0% means that a strategy outperformed the median by 5.0% per year. Similarly, a Sharpe Ratio of -0.10 simply says that the strategy’s Sharpe Ratio was 0.10 lower than the median Sharpe Ratio across all 1,080 strategies.
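The normalization is a one-line operation. A small sketch with made-up annualized returns:

```python
import statistics


def normalize_by_median(values):
    """Express each value relative to the cross-sectional median."""
    median = statistics.median(values)
    return [v - median for v in values]


# Hypothetical annualized returns for five strategies.
annualized_returns = [0.08, 0.12, 0.10, 0.15, 0.05]
relative = normalize_by_median(annualized_returns)
```

Here the median is 10%, so the 15% strategy is reported as +5% and the 5% strategy as -5%.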
The dispersion of full period performance statistics is quite large. On a raw return basis, the worst performing strategy underperformed the median by 12.0% annualized while the best performing strategy outperformed the median by 18.0% annualized. A difference of about 30% per year between the best and worst strategy is certainly nothing to sneeze at. Even if we ignore the tails of the return distribution, the dispersion remains very evident. The 95% confidence interval is -5.5% to +11.5%. The standard deviation of annualized returns across the strategies tested is 5.3%.
The results are very similar if we change focus to risk-adjusted returns using the Sharpe Ratio. The worst strategy on a risk-adjusted basis has a Sharpe Ratio 0.57 below the median, while the best strategy’s Sharpe Ratio is 0.27 above the median. The 95% confidence interval is -0.18 to +0.14 and the standard deviation of Sharpe Ratio relative to the median is 0.10.
While the dispersion of full period results is interesting, what we really want to focus on is the dispersion in returns over shorter time horizons because these are the differences that fuel many behavioral biases. To do so, we calculate rolling one-year returns for each of the strategies. We then rank these returns across all strategies at each point in time and then plot the 95% confidence interval. Again, all returns are normalized by subtracting the median return at each point in time.
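A rough sketch of this calculation follows. It assumes monthly return series of equal length for every strategy and interprets the 95% interval as the 2.5th to 97.5th percentile of the cross-section at each date; the helper names and the random data are hypothetical.

```python
import random


def rolling_returns(monthly_returns, window=12):
    """Compound return over each trailing `window`-month period."""
    out = []
    for end in range(window, len(monthly_returns) + 1):
        growth = 1.0
        for r in monthly_returns[end - window:end]:
            growth *= 1.0 + r
        out.append(growth - 1.0)
    return out


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 1]) of pre-sorted data."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def dispersion_band(strategy_returns, window=12, lower=0.025, upper=0.975):
    """Per date, the lower and upper percentiles of median-relative rolling returns."""
    rolled = [rolling_returns(r, window) for r in strategy_returns]
    band = []
    for cross_section in zip(*rolled):
        xs = sorted(cross_section)
        median = percentile(xs, 0.5)
        band.append((percentile(xs, lower) - median, percentile(xs, upper) - median))
    return band


# Hypothetical data: 50 strategies, 36 months of random returns.
random.seed(0)
strategies = [[random.gauss(0.008, 0.04) for _ in range(36)] for _ in range(50)]
band = dispersion_band(strategies)
widths = [hi - lo for lo, hi in band]
```

The `widths` series is the quantity that gets compared against the best-minus-worst spread across factor portfolios.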
To put this dispersion into context, we use Fama-French data to calculate rolling one-year returns for four long-only equity portfolios: a market-cap weighted portfolio, a value portfolio (top 30% of universe by book-to-price), a momentum portfolio (top 30% by 12-month trailing return skipping the most recent month), and a small-cap portfolio (bottom 30% of the universe by market-cap). We then calculate the difference between the best and worst performing of these strategies. We compare this quantity to the width of the 95% confidence interval over our 1,080 momentum strategies.
We find that more than 90% of the time, the variation across the different momentum strategies is greater than the variation across the factors. In other words, whether momentum beats value or momentum beats size can be very, very dependent on how each of the individual factor portfolios is constructed.
Let that sink in for a minute. In the short-term, it is entirely possible that one approach to capturing momentum outperforms other factors like value and size while another implementation of momentum underperforms.
In past speaking engagements, we’ve argued that successful factor investing can be boiled down to the following three Warren Buffett quotes:
- “Risk comes from not knowing what you are doing.” → Takeaway: Know what you own. Understand not only the factors being employed, but also how the portfolio is constructed.
- “You don’t need to be a rocket scientist. Investing is not a game where the guy with the 160 IQ beats the guy with 130 IQ.” → Takeaway: Know why you own it. Data should never trump insight, but theories must be supported by the data.
- “No matter how great the talents or efforts, some things just take time. You can’t produce a baby in one month by getting nine women pregnant.” → Takeaway: Commit to owning it. Factor premiums vary over time. All factors go through periods of prolonged underperformance. The biggest benefits will accrue to those investors with the discipline to stay committed through these difficult periods. Put differently, weak hands that “fold” will pass the premium to strong hands that “hold.”
We think the types of results illustrated in this commentary have important implications for all three of these aspects of factor investing.
Know What You Own
Once you’ve decided what type of exposure you are interested in adding to a portfolio, the first step of due diligence is understanding the investment process. In our minds, one of the biggest advantages of index-based ETFs is that there is no guessing when it comes to understanding the investment process. We don’t have to rely on manager interviews and presentations. We don’t have to worry that a manager will change his or her investment process when the going gets tough. Understanding the investment process is as simple as digging into the index methodology.
Take the following four momentum ETFs as examples:
- iShares Edge MSCI USA Momentum ETF (ticker: MTUM, tracks the MSCI USA Momentum Index)
- SPDR Russell 1000 Momentum Focus ETF (ticker: ONEO, tracks the Russell 1000 Momentum Focused Factor Index)
- Fidelity Momentum Factor ETF (ticker: FDMO, tracks the Fidelity U.S. Momentum Factor Index)
- JPMorgan U.S. Momentum Factor ETF (ticker: JMOM, tracks the JPMorgan US Momentum Factor Index)
Summary of Index Methodology
|Attribute||MTUM||ONEO||FDMO||JMOM|
|Universe||MSCI USA Index (85% free float of U.S. market, currently has 631 constituents).||Russell 1000||Largest 1000 stocks based on market-capitalization||Russell 1000|
|Measure||Blend of trailing 12-month and 6-month risk-adjusted price return in excess of risk-free rate and skipping the most recent month. Currently has 125 securities (approximately 20% of the stocks in the universe).||12-Month Total Return (skipping most recent month). The process also considers value (Composite of cash flow yield, earnings yield, and sales-to-price.), quality (Composite of profitability and leverage. Profitability is a blend of return on assets, change in asset turnover, and accruals. Leverage is the ratio of operating cash flow to total debt), and size (natural logarithm of market-capitalization).||Blended score with 35% weight on 12-month total return (skipping most recent month), 35% on volatility-adjusted 12-month total return (skipping most recent month, monthly returns used to calculate volatility), 15% on 12-month earnings surprise (EPS estimate from 12-months ago to actual EPS), and 15% on 12-month average short interest. Composite scores are size-adjusted to minimize any size bias.||12-Month Total Return Divided by Volatility (volatility measured using 1 year of daily returns)|
|Rebalance Frequency||Semi-Annual with ability for more frequent rebalances depending on volatility of the index.||Semi-Annual||Quarterly||Quarterly|
|Concentration||Dependent on the number of securities in the parent index and the market-cap distribution of securities.||No explicit concentration, although there is a minimum position size. Currently holds positions in approximately 88% of the companies in the universe.||Top decile (10%) within sectors with more than 100 securities. Top quintile (20%) within sectors with 25 to 100 securities. Top tercile (33%) for sectors with less than 25 securities. Currently holds positions in approximately 13% of the stocks in the universe.||No defined concentration. Currently holds positions in approximately 27% of the stocks in the universe.|
|Weighting Methodology||Multiply momentum score by market-capitalization.||“Tilt-tilt” methodology multiplies the factor scores for quality, value, size, and momentum.||Market-cap within each sector plus an equal overweight adjustment that is applied equally to all constituents within that sector.||Weights to companies in industries that are under the required allocation are increased in an iterative fashion starting with the stock with the highest momentum. Similarly, weights to companies in industries that are over the required allocation are decreased in an iterative fashion starting with the stock with the worst momentum. There is then a set of rules that (i) ensures that the total amount invested equals 100% of the portfolio and (ii) allows for rebalancing from the stocks with the worst momentum to the stocks with the best momentum.|
|Sector Constraints||N/A||Lower and upper bounds applied by industries, but are not relative to the market-cap of each industry.||Sector neutral relative to investment universe||Industry neutral relative to Russell 1000|
|Other Constraints||5% position limit.||Maximum position limit tied to capacity/liquidity.||N/A||Individual position limits are a function of market-cap and liquidity with a hard limit of 2%.|
|Other Notes||Apply buffer rules to manage turnover.||N/A||N/A||The index actively considers liquidity in deciding how large of trades are required at each rebalance.|
Source: FTSE, MSCI, Fidelity. Data as of May 18, 2018.
We immediately see the diversity in investment processes across the different strategies. None of the strategies measure momentum the exact same way, let alone have much overlap in the many dimensions of the construction rules like sector limits, weighting methodology, etc.
These differences are abundantly clear when we examine the top holdings, sector holdings, and holdings overlap.
Top Ten Holdings by ETF
|MTUM||ONEO||FDMO||JMOM|
|Microsoft (5.1%)||Micron (1.0%)||Apple (4.4%)||Apple (2.1%)|
|Amazon (5.0%)||Lear (0.9%)||Microsoft (3.5%)||Amazon (2.0%)|
|JPMorgan (4.9%)||Best Buy (0.8%)||Amazon (3.0%)||Microsoft (2.0%)|
|Intel (4.6%)||Aptiv (0.8%)||Facebook (2.4%)||Facebook (2.0%)|
|Boeing (4.5%)||Baxter (0.8%)||JPMorgan (2.1%)||Visa (1.8%)|
|Bank of America (4.5%)||XPO Logistics (0.7%)||Berkshire Hathaway (2.0%)||Google (1.8%)|
|Cisco (4.1%)||Corning (0.6%)||Johnson & Johnson (2.0%)||UnitedHealth (1.7%)|
|Mastercard (3.6%)||Michael Kors (0.6%)||UnitedHealth (1.7%)||Home Depot (1.6%)|
|Abbvie (3.3%)||Valero (0.5%)||Visa (1.7%)||Johnson & Johnson (1.6%)|
|Visa (3.2%)||Spirit AeroSystems (0.5%)||Mastercard (1.5%)||Boeing (1.5%)|
Source: JPMorgan, Fidelity, iShares, and SPDR as of May 18, 2018.
Know Why You Own It
Knowing why you own it is the second step in the diligence process. Once we understand how the strategy works by digging into the index methodology, our work must turn to understanding the why behind the design. The biggest due diligence implication of our study of different iterations of momentum strategies is that due diligence does not begin and end with finding the right factors. Performing due diligence on the portfolio construction process that translates the factor signals into actual allocations is just as crucial.
Just because a strategy or index touts itself as harvesting one of the handful of factors that are supported by both sound economic theory and broad out-of-sample performance (e.g. value, momentum, trend, carry) does not mean that it is immune from data mining.
In fact, it’s even entirely possible that a factor that has exhibited no long-term efficacy can be made to appear statistically significant just by over-optimizing the portfolio construction rules using the same signals.
Some techniques/rules of thumb that we adhere to when performing this type of diligence are:
- Holding all else equal, simpler is better. Simpler processes have fewer degrees of freedom and therefore are less susceptible to being data mined.
- Ask “Why?” In the first item above, the “holding all else equal” is just as important as the “simpler is better.” Perhaps the simplest momentum strategy would just constantly trade into the single stock with the highest momentum. Such a strategy, while simple, happens to be foolish. Complexity has its place, if used with purpose. Key elements of the recipe should be held to the same standard that we hold the factors themselves to. Namely, we should look for construction rules that are backed by both sound theory and empirical evidence, preferably across asset classes, geographies, and time frames. Take rebalance frequency as an example. In a world with no transaction costs, it may make sense to rebalance continuously so as to maximize the desired factor exposure at each point in time. Of course, this type of construction, especially for a higher turnover factor like momentum, may incur such high transaction costs that the excess return is materially reduced or even completely eliminated.
- Ask to see sensitivity analysis across parameters. With index-based strategies, it should be relatively simple for the index or ETF provider to provide insight into how varying a particular parameter would have impacted risk/return in the historical data. As an example, consider a strategy that constrains turnover to X% at each rebalance date. We think it’s entirely reasonable to ask how overall turnover and performance would be impacted by varying that parameter above and below X%. It would be worth noting if X% happened to be an isolated optimal value based on historical performance. Such a result for one parameter would not necessarily be overly concerning, but if we saw the same result across a number of parameters then our data mining alarm bells should be ringing.
- Embrace process diversification. One elegant way to address the idiosyncratic risk across strategies when we don’t have a strong, evidence-based view as to whether one construction is better or worse than another is simply to diversify across parameters. In our simulated momentum strategies, we find that an equal weighted blend of all 1,080 strategies has a better raw and risk-adjusted return than the median of the individual strategies. We employ this type of approach in our tactical strategies where we prefer a monthly rebalance, but are concerned with timing luck (i.e. tracking error that can arise between monthly rebalanced strategies that rebalance at different times during the month). To address this risk, we manage a number of sub-portfolios that are each rebalanced at a different time during the month.
- Beware the magic number. “Magic numbers” are parameters that are unlikely to be chosen by design and rather suggest that the process has been optimized over the historical data set. For example, say there was a value ETF that used three different valuation metrics: price-to-earnings, price-to-book, and price-to-sales. The underlying index creates a composite value score for each stock according to the following formula: Value Score = 0.243 * Normalized P/E + 0.619 * Normalized P/B + 0.138 * Normalized P/S. This is an immediate red flag. It’s unlikely that such a specific formula was derived by anything other than analysis of the historical data and that over-optimization, whether intentional or accidental, has infiltrated the strategy design.
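The process diversification idea in the list above, where several sub-portfolios are each rebalanced at a different time and then combined, can be sketched as a simple average of weight sets. The holdings and the four-tranche setup below are hypothetical and are not a description of any actual strategy.

```python
def blend_subportfolios(subportfolios):
    """Equal-weight blend of several {ticker: weight} dictionaries."""
    n = len(subportfolios)
    blended = {}
    for weights in subportfolios:
        for ticker, w in weights.items():
            blended[ticker] = blended.get(ticker, 0.0) + w / n
    return blended


# Four hypothetical sub-portfolios, each rebalanced in a different week of the month,
# so each reflects the momentum ranking as of a different date.
tranches = [
    {"AAA": 0.5, "BBB": 0.5},
    {"AAA": 0.5, "CCC": 0.5},
    {"BBB": 0.5, "CCC": 0.5},
    {"AAA": 0.5, "DDD": 0.5},
]

blended = blend_subportfolios(tranches)
```

No single rebalance date determines the combined holdings, which is the point: the blend holds AAA at 37.5% because three of the four tranches selected it, not because of when any one of them happened to trade.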
On a related note, there is evidence, in our minds, that data mining is occurring. For example, “Live from Newport Beach. It’s Smart Beta!”, an August 2017 piece from Feifei Li and John West at Research Affiliates, is a must read for those looking to get into the factor investing game. In the article, the authors examine 125 U.S. equity smart beta indices that are tracked by ETFs. They conclude: “We find that prior to launch the indices tend to have superior performance relative to a market-capitalization-weighted benchmark, with outperformance peaking about six months ahead of the launch date. The outperformance seems to be extremely strong over the three-year period ahead of the launch. After the indices officially launch, however, their performance relative to the S&P 500 Index appears to hover around the base line, exhibiting virtually none of the outperformance demonstrated before they were live.”
The punchline is summed up nicely in the following chart:
There are a few ways you could interpret this data.
You could conclude that factor investing is fundamentally broken. It may have worked historically in a backtest but has not actually played out so well for ETF investors in the real world, whether because the factors themselves were data mined, increased usage has driven premiums to zero, or some other reason. We don’t believe this is the case. We’d be more worried about this if the factor products being launched largely reflected new factors that had yet to be proven through significant out-of-sample study. Yet, most launches have tended to stick to the factors that check all the right boxes like momentum, value, and low volatility.
Rather, we hypothesize that two things are going on here. First, index providers and ETF manufacturers are not immune to chasing performance. At the end of the day, asset management as a business is about sales. Selling an asset class or strategy that has outperformed recently is infinitely easier than selling one that has underperformed. When the strategies inevitably mean revert, as long-term performance tends to, you get results like those seen in the Research Affiliates piece. Looking at live performance between 60 months and 120 months may be a clearer indicator of this fact.
Second, there may be some data mining going on at the recipe level. Let’s return to our 1,080 momentum strategies for a second. Assume that these represent different potential indices that may be tracked by new ETFs. Each week, we’ll assume that one new momentum ETF is launched. This new ETF will track an index that was randomly chosen from the top 5% of the 1,080-index universe based on trailing five-year Sharpe Ratio. In other words, performance chasing has infected the product development process. After performing this simulation, we plot the average 1-year Sharpe Ratio in the five years before and after product launch.
Unsurprisingly, our simulation shows a pattern similar to the actual data from Research Affiliates. Performance is strong relative to the entire universe of momentum strategies pre-launch (by definition since the “launched” strategies are picked from the group of the best historical performers), but this outperformance erodes in the years after launch.
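A stripped-down version of this kind of launch simulation is sketched below. It is not the simulation behind the results above: the universe here is pure noise (every strategy has the same true expected return), the same window length is used before and after launch, and all names and parameters are assumptions chosen for illustration.

```python
import random
import statistics


def sharpe(returns):
    """Mean over standard deviation of periodic returns (zero risk-free rate, not annualized)."""
    return statistics.mean(returns) / statistics.stdev(returns)


def simulate_launches(returns_by_strategy, window, top_fraction=0.05, seed=0):
    """At each eligible date, 'launch' a strategy drawn at random from the top
    fraction by trailing Sharpe. Returns (pre_launch, post_launch) Sharpe pairs."""
    rng = random.Random(seed)
    n_periods = len(returns_by_strategy[0])
    n_top = max(1, int(len(returns_by_strategy) * top_fraction))
    results = []
    for t in range(window, n_periods - window + 1):
        trailing = [(sharpe(r[t - window:t]), i) for i, r in enumerate(returns_by_strategy)]
        trailing.sort(reverse=True)
        pre, chosen = rng.choice(trailing[:n_top])
        post = sharpe(returns_by_strategy[chosen][t:t + window])
        results.append((pre, post))
    return results


# Hypothetical universe: 100 strategies, 120 months, identical true Sharpe Ratios.
data_rng = random.Random(42)
universe = [[data_rng.gauss(0.005, 0.04) for _ in range(120)] for _ in range(100)]

launches = simulate_launches(universe, window=36)
avg_pre = statistics.mean([pre for pre, _ in launches])
avg_post = statistics.mean([post for _, post in launches])
```

Because the strategies in this toy universe differ only by luck, any gap between `avg_pre` and `avg_post` is produced entirely by the selection rule.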
Commit to Owning It (No Pain, No Premium)
Factor investing is not a painless process. Underperformance for any individual factor strategy, even over a five-year period, should not be alarming in isolation. In fact, the ability to weather prolonged periods of underperformance has historically been a necessary condition for capturing the long-term benefits of factor investing. Going back to the 1930s, all four factors presented below (size, value, momentum, and low beta) show significant positive annualized excess returns, but experience drawdowns that are comparable in duration and magnitude to those seen in the equity market as a whole.
No Pain, No Premium → The Experience of a Factor Investor
|Annualized Excess Return||6.8%||1.8%||3.5%||7.3%||7.9%|
|Longest Drawdown||12.4 Years||34.7 Years||11.0 Years||9.3 Years||6.8 Years|
Data Source: AQR. Calculations by Newfound Research. Data covers the period from 1935 to 2018. Returns are hypothetical and do not reflect any strategy managed by Newfound Research. Returns include the reinvestment of dividends. Past performance does not guarantee future results.
The immense tracking error that can be experienced even across strategies utilizing the same general factor creates even more opportunities for doubt to creep in and commitment to erode.
On multiple occasions, we’ve participated on panels related to factor investing and been asked something along the lines of “Is factor investing worth it?” Our answer is a resounding yes, with one caveat. There needs to be long-term commitment. If a factor or strategy will be sold a year from now if it has underperformed, that investor’s experience is much less likely to be positive.
In our momentum strategy simulations, we see that performance chasing does not work. There is almost no relationship between trailing 1-year Sharpe Ratios and forward 1-year Sharpe Ratios. When we expand the performance measurement interval to three and five years, we see clear evidence of mean reversion. High past Sharpe Ratios tend to predict low future Sharpe Ratios and vice versa. Buyer beware that strong recent track records alone should provide little comfort as to future expectations.
|Trailing 1-Year Sharpe vs. Forward 1-Year Sharpe||0.07||0.07||0.01|
|Trailing 3-Year Sharpe vs. Forward 3-Year Sharpe||-0.48||-0.49||0.23|
|Trailing 5-Year Sharpe vs. Forward 5-Year Sharpe||-0.61||-0.78||0.37|
Data Source: CSI and SPDR. Calculations by Newfound Research. Data analysis starts in May 1995 as that is when all portfolios could calibrate given available price data. Returns are hypothetical and backtested and do not reflect any strategy managed by Newfound Research. Returns include the reinvestment of dividends. The strategies all were constructed with explicit hindsight bias as the universe consists of the 200 largest stocks in the SPDR S&P 500 ETF (ticker: SPY) as of May 2018. Past performance does not guarantee future results.
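For readers who want to run a similar check on their own data, the sketch below pairs each trailing Sharpe Ratio with the Sharpe Ratio over the following window of the same length and computes a Pearson correlation. It is an assumption-laden illustration (zero risk-free rate, no annualization, made-up returns) and is not necessarily the exact statistic reported in the table above.

```python
import math
import statistics


def sharpe(returns):
    """Mean over standard deviation of periodic returns (zero risk-free rate)."""
    return statistics.mean(returns) / statistics.stdev(returns)


def trailing_forward_pairs(returns, window):
    """(trailing Sharpe, forward Sharpe) at every date with a full window on both sides."""
    return [
        (sharpe(returns[t - window:t]), sharpe(returns[t:t + window]))
        for t in range(window, len(returns) - window + 1)
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical monthly returns for one strategy.
returns = [0.02, -0.01, 0.03, 0.01, -0.02, 0.04, 0.00, -0.03, 0.02, 0.01]
pairs = trailing_forward_pairs(returns, window=3)
trailing = [p[0] for p in pairs]
forward = [p[1] for p in pairs]
corr = pearson(trailing, forward)
```

In practice the pairs would be pooled across many strategies and dates. Overlapping windows make the observations far from independent, so the resulting correlation is better read as descriptive than as a formal test.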
Portfolio construction is a lot like cooking. There are two equally important elements: the ingredients and the recipe. The ingredients are the signals that are used to select investments. The recipe is the set of rules used to transform those signals into portfolio allocations.
In factor investing, the signals (e.g., value, momentum, carry) often get all the attention and the importance of the recipe – how these signals are actually transformed into portfolio construction – often gets lost. Designing a recipe requires making decisions like how often to rebalance, how to weight holdings, and how to blend signals (when multiple signals are used).
Even portfolios that use the same core factor can experience significant performance dispersion due to recipe differences.
This dispersion has two main implications for factor investors. First, dispersion creates the opportunity for data mining. To combat this, diligence efforts must focus as much on the construction process as they do on the factors themselves.
Second, dispersion makes short-term underperformance inevitable. Potential dispersion is so large due to recipe differences that it is entirely plausible that one momentum portfolio, as an example, could outperform value while another underperforms. Users of factor strategies should resist the urge to chase performance, especially over 3- to 5-year investment horizons.
 Because we use the current top 200 holdings, we are knowingly introducing hindsight bias into the portfolio process. Overall, we would expect a portfolio consisting of SPY’s current top holdings to have strong past performance, otherwise many of the companies may not be in the top 200 by market-cap today. We are fine with this because our goal is to illustrate how much return dispersion can be created through the selection of the portfolio construction rules. However, due to the hindsight bias, the results should not be used to draw conclusions as to how any of these strategies in particular, or the momentum factor in general, would have performed historically or how they may perform going forward, especially in comparison to the broad market.
 Momentum measures are trailing total return, trailing total return skipping the most recent month, risk-adjusted trailing total return, OLS estimate of the trend, and a Kalman filter. The lookbacks are 3 months, 6 months, 9 months, and 12 months. The rebalance frequencies are weekly, monthly, and quarterly. In the most concentrated portfolio we pick the top 10% of stocks by momentum. In the least concentrated portfolio we pick the top 50% of stocks by momentum. The middle level of concentration picks the top 25% of stocks by momentum. The weighting schemes are equal-weighted, rank-weighted, and score-weighted. We consider both sector constrained (sector weights equal to market-cap sector weights) and sector unconstrained strategies.
 We randomly select from the top 5% instead of simply choosing the best strategies to try to ensure that there is some diversity among the strategies. Choosing the top strategy would probably lead to a lot of overlap. We think this is a more realistic representation of reality since it does seem that product manufacturers and index providers seek to differentiate their strategies from others in the market.