Summary
- New research from Axioma suggests that tilting less – through lower target tracking error – can actually create a more academically pure factor implementation in long-only portfolios.
- This research highlights an important question: how should long-only investors think about factor exposure in their portfolios? Is measuring against an academically-constructed long/short portfolio really appropriate?
- We return to the question of style versus specification, plotting year-to-date excess returns for long-only factor ETFs. While the general style serves as an anchor, we find significant specification-driven performance dispersion.
- We believe that the “right answer” to this dispersion problem largely depends upon the investor.
When quants speak about factor and style returns, we often do so with some sweeping generalizations. Typically, we’re talking about some long/short specification, but precisely how that portfolio is formed can vary.
For example, one firm might look at deciles while another looks at quartiles. One shop might equal-weight the holdings while another value-weights them. Some might include mid- and small-caps, while others may work on a more realistic liquidity-screened universe.
More often than not, the precision does not matter a great deal (with the exception of liquidity-screening) because the general conclusion is the same.
But for investors who are actually realizing these returns, the precision matters quite a bit. This is particularly true for long-only investors, who have adopted smart-beta ETFs to tap into the factor research.
As we have discussed in the past, any active portfolio can be decomposed into its benchmark plus a dollar-neutral long/short portfolio that encapsulates the active bets. The active bets, then, can actually approach the true long/short implementation.
To a point, at least. The “shorts” will ultimately be constrained by the amount the portfolio can under-weight a given security.
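The decomposition described above can be sketched in a few lines. This is a minimal illustration, not a production tool: the four-stock universe, the weights, and the function name `decompose_active` are all hypothetical, chosen only to show that the active bets are dollar-neutral and that each "short" is capped at the stock's benchmark weight.

```python
import numpy as np

def decompose_active(portfolio_w, benchmark_w):
    """Split a fully invested portfolio into its benchmark plus active bets.

    Returns the active weights (a dollar-neutral long/short portfolio) and
    active share, defined as half the sum of absolute active weights.
    """
    portfolio_w = np.asarray(portfolio_w, dtype=float)
    benchmark_w = np.asarray(benchmark_w, dtype=float)
    active = portfolio_w - benchmark_w
    active_share = 0.5 * np.abs(active).sum()
    return active, active_share

# Hypothetical weights, for illustration only.
benchmark = np.array([0.40, 0.30, 0.20, 0.10])
portfolio = np.array([0.55, 0.45, 0.00, 0.00])  # long-only, fully invested

active, active_share = decompose_active(portfolio, benchmark)

# A long-only portfolio can never hold less than zero of a stock, so the
# largest "short" available in each name is its benchmark weight.
largest_possible_short = -benchmark

print("active bets:", active.round(2))
print("active share:", round(active_share, 2))
print("short limits:", largest_possible_short)
```

In this toy case the two smallest benchmark holdings are already at their short limit, so any further increase in active share has to come from the long side.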
For long-only portfolios, increasing active share often means having to lean more heavily into the highest quintile or decile holdings. This is not a problem in an idealized world where factor scores have a monotonically increasing relationship with excess returns. In this perfect world, increasing our allocation to high-ranking stocks creates just as much excess return as shorting low-ranking stocks does.
Unfortunately, we do not live in a perfect world and for some factors the premium found in long/short portfolios is mostly found on the short side.1 For example, consider the Profitability Factor. The annualized spread between the top- and bottom-quintile portfolios is 410 basis points. The difference between the top quintile portfolio and the market, though, is just 154 basis points. Nothing to scoff at, but when appropriately discounted for data-mining risk, transaction costs, and management costs, there is not necessarily a whole lot left over.
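To make the arithmetic explicit, we can split the top-minus-bottom spread into a piece earned relative to the market on the long side and a residual piece attributable to the short side. The sketch below uses only the two figures quoted above; the variable names are ours, and the split is a simple accounting identity rather than a formal attribution.

```python
# Figures quoted in the text, in basis points per year.
top_minus_bottom_bps = 410   # top quintile minus bottom quintile
top_minus_market_bps = 154   # top quintile minus the market

# (top - bottom) = (top - market) + (market - bottom)
market_minus_bottom_bps = top_minus_bottom_bps - top_minus_market_bps

long_side_share = top_minus_market_bps / top_minus_bottom_bps
short_side_share = market_minus_bottom_bps / top_minus_bottom_bps

print(f"short-side contribution: {market_minus_bottom_bps} bps")
print(f"long side: {long_side_share:.1%}, short side: {short_side_share:.1%}")
```

On these numbers, roughly three-eighths of the spread is available to an investor who can only overweight the top quintile relative to the market.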
Which leads to some interesting results for portfolio construction, at least according to a recent study by Axioma.2 For factors where the majority of the premium arises from the short side, tilting less might mean achieving more.
For example, Axioma found that a portfolio optimized to maximize exposure to the profitability factor while targeting a tracking error to the market of just 10 basis points had excess returns that were meaningfully more correlated with the long/short factor than the excess returns of a long-only portfolio that simply bought the top quintile. In fact, the excess returns of the top quintile portfolio had zero correlation to the long/short factor returns. Let’s repeat that: the active returns of the top quintile portfolio had zero correlation to the returns of the profitability factor. Makes us sort of wonder what we’re actually buying…
Source: Kenneth French Data Library; Calculations by Newfound Research.
Cumulative Active Returns of Long-Only Portfolios
So, what does it actually mean for long-only investors when we plot long/short equity factor returns? When we see that the Betting-Against-Beta (“BAB”) factor is up 3% on the year, what does that imply for our low-volatility factor ETF? Momentum (“UMD”) was down nearly 10% earlier this year; were long-only momentum ETFs really under-performing by that much?
And what does this all mean for the results in those fancy factor decomposition reports the nice consultants from the big asset management firms have been running for me over the last couple of years?
Source: AQR. Calculations by Newfound Research.
We find ourselves back to a theme we’ve circled many times over the last few years: style versus specification. Choices such as how characteristics are measured, portfolio concentration, the existence or absence of position- and industry/sector-level constraints, weighting methodology, and rebalance frequency (and even date!) can have a profound impact on realized results. The little details compound to matter quite a bit.
To highlight this disparity, below we have plotted the excess return of an equally-weighted portfolio of long-only style ETFs versus the S&P 500 as well as a standard deviation cone for individual style ETF performance.
While most of the ETFs are ultimately anchored to their style, we can see that short-term performance can meaningfully deviate.
Source: CSI Analytics. Calculations by Newfound Research. Results are hypothetical. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes, with the exception of underlying ETF expense ratios. Past performance is not an indicator of future results. Year-to-Date returns are computed by assuming an equal-weight allocation to representative long-only ETFs for each style. Returns are net of underlying ETF expense ratios. Returns are calculated in excess of the SPDR S&P 500 ETF (“SPY”). The ETFs used for each style are (in alphabetical order): Value: FVAL, IWD, JVAL, OVLU, QVAL, RPV, VLU, VLUE; Size: IJR, IWM, OSIZ; Momentum: FDMO, JMOM, MMTM, MTUM, OMOM, QMOM, SPMO; Low Volatility: FDLO, JMIN, LGLV, OVOL, SPLV, SPMV, USLB, USMV; Quality: FQAL, JQUA, OQAL, QUAL, SPHQ; Yield: DVY, FDVV, JDIV, OYLD, SYLD, VYM; Growth: CACG, IWF, QGRO, RPG, SCHG, SPGP, SPYG; Trend: BEMO, FVC, LFEQ, PTLC. Newfound may hold positions in any of the above securities.
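For readers who want to build a similar chart for a single style, the sketch below shows one way the equal-weight line and the dispersion cone might be computed. It is our own approximation under stated assumptions: the equal-weight portfolio is rebalanced every period, the cone is the cross-sectional standard deviation of individual ETF cumulative excess returns centered on the equal-weight line, and the function name `style_excess_cone` and the sample returns are hypothetical.

```python
import pandas as pd

def style_excess_cone(etf_returns, benchmark_returns, n_std=1.0):
    """Cumulative excess return of an equal-weight ETF basket plus a cone.

    etf_returns: DataFrame of periodic total returns, one column per ETF.
    benchmark_returns: Series of benchmark returns on the same index.
    """
    cum_bench = (1.0 + benchmark_returns).cumprod() - 1.0

    # Equal-weight basket, rebalanced each period (an assumption).
    basket = etf_returns.mean(axis=1)
    center = (1.0 + basket).cumprod() - 1.0 - cum_bench

    # Cross-sectional dispersion of each ETF's own cumulative excess return.
    cum_etf = (1.0 + etf_returns).cumprod() - 1.0
    dispersion = cum_etf.sub(cum_bench, axis=0).std(axis=1)

    return pd.DataFrame({
        "lower": center - n_std * dispersion,
        "center": center,
        "upper": center + n_std * dispersion,
    })

# Hypothetical returns for three ETFs and a benchmark over three periods.
etfs = pd.DataFrame({
    "A": [0.01, 0.02, -0.01],
    "B": [0.00, 0.01, 0.01],
    "C": [0.02, -0.01, 0.00],
})
bench = pd.Series([0.01, 0.01, 0.00])

cone = style_excess_cone(etfs, bench)
print(cone)
```

A wide cone around a flat center line is the signature of specification risk: the style as a whole tracks the benchmark while individual implementations scatter.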
Conclusion
In our opinion, the research and data outlined in this commentary suggest a few potential courses of action for investors.
- For certain styles, we might consider embracing smaller tilts for purer factor exposure.
- To avoid specification risk, we might embrace the potential benefits of multi-manager diversification.
- Or, if there is a particular approach we prefer, simply acknowledge that it may not behave anything like the academic long/short definition – or even other long-only implementations – in the short-term.
Academically, we might be able to argue for one approach over another. Practically, the appropriate solution is whatever is most suitable for the investor and the approach that they will be able to stick with.
If a client measures their active returns with respect to academic factors, then understanding how portfolio construction choices deviate from the factor definitions will be critical.
An advisor trying to access a style but not wanting to risk choosing the wrong ETF might consider asking themselves, “why choose?” Buying a basket of a few ETFs will do wonders to reduce specification risk.
On the other hand, if an investor is simply trying to maximize their compound annualized return and nothing else, then a concentrated approach may very well be warranted.
Whatever the approach taken, it is important to remember that results between two strategies that claim to implement the same style can and will deviate significantly, especially in the short run.
Using PMI to Trade Cyclicals vs Defensives
By Corey Hoffstein
On August 19, 2019
In Risk & Style Premia, Weekly Commentary
Summary
I love coming across old research because it allows for truly out-of-sample testing.
Earlier this week, I stumbled across a research note from 2009 and a follow-up note from 2012, both exploring the use of macro-based signals for constructing dollar-neutral long/short sector trades. Specifically, the pieces focused on using manufacturing Purchasing Manager Indices (PMIs) as a predictor for Cyclical versus Defensive sectors.1
The strategy outlined is simple: when the prior month change in manufacturing PMI is positive, the strategy is long Cyclicals and short Defensives; when the change is negative, the strategy is long Defensives and short Cyclicals. The intuition behind this signal is that PMIs provide a guide to hard economic activity.
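The rule above is simple enough to sketch directly. The code below is a minimal illustration under our own assumptions: the position held in a given month is set by the sign of the previous month's PMI change, the long/short return is approximated as the Cyclicals return minus the Defensives return, and an unchanged PMI leaves the strategy flat (the original notes do not say how ties are handled). The function name `pmi_rotation_returns` and all sample numbers are hypothetical.

```python
import numpy as np
import pandas as pd

def pmi_rotation_returns(pmi, cyclicals, defensives):
    """Monthly returns of a PMI-driven Cyclicals versus Defensives trade."""
    change = pmi.diff()
    # +1 = long Cyclicals / short Defensives, -1 = the reverse.
    # shift(1) means this month's position uses last month's PMI change.
    position = np.sign(change).shift(1)
    spread = cyclicals - defensives
    return (position * spread).dropna()

# Hypothetical monthly data, for illustration only.
months = pd.period_range("2019-01", periods=5, freq="M")
pmi = pd.Series([50.0, 52.0, 51.0, 53.0, 53.5], index=months)
cyc = pd.Series([0.01, 0.02, -0.01, 0.03, 0.01], index=months)
dfn = pd.Series([0.00, 0.01, 0.01, 0.01, 0.02], index=months)

strategy = pmi_rotation_returns(pmi, cyc, dfn)
print(strategy)
```

The first two months drop out because the rule needs one month to compute a change and another to act on it.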
The sample period for the initial test is from 1998 to 2009, a period over which the strategy performed quite well on a global basis and even better when using the more forward-looking ratio of new orders to inventory.
Red flags start to go up, however, when we read the second note from 2012. “It appears that the new orders-to-inventory ratio has lost its ability to forecast the output index.” “In addition, the optimal lookback period … has shifted from one to two months.”
At this point, we can believe one of a few things:
I won’t even bother addressing the whole “one-month versus two-month” comment. Long-time readers know where we come down on ensembles versus parameter specification…
Fortunately, we do not have to pass qualitative judgement: we can let the numbers speak for themselves.
While the initial notes focused on global implementation, we can rebuild the strategy using U.S. equity sectors with U.S. manufacturing PMI as the driving signal. This serves as an out-of-sample test across assets and provides approximately 7 more years of out-of-sample time over which to evaluate returns.
Below we plot the results of this strategy for both 1-month and 2-month lookback periods, highlighting the in-sample and out-of-sample periods for each specification based upon the date the original research notes were published. We use the State Street SPDR Sector Select ETFs as our implementation vehicles, with the exception of the iShares Dow Jones US Telecom ETF.
Source: CSI Data; Quandl. Calculations by Newfound Research. Results are hypothetical. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.
The first thing we notice is that the original 1-month implementation – which appeared to work on a global scale – does not seem particularly robust when implemented with U.S. sectors. Post-publication results do not fare much better.
The 2-month specification, however, does appear to work reasonably well both in- and out-of-sample.
But is there something inherently magical about that two-month specification? We are hard-pressed to find a narrative explanation.
If we plot lookback specifications from 3- to 12-months, we see that the 2-month specification proves to be a significant outlier. Given the high correlation between all the other specifications, it is more likely that the 2-month lookback was the beneficiary of luck rather than capturing a particular edge.
Source: CSI Data; Quandl. Calculations by Newfound Research. Results are hypothetical. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.
Perhaps we’re not giving this idea enough breathing room. After all, were we to evaluate most value strategies in the most recent decades, we’d likely declare them insignificant as well.
With manufacturing PMI data extending back to 1948, we can use sector index data from the Kenneth French website to reconstruct this strategy.
Unfortunately, the Kenneth French definitions do not match GICS perfectly, so we have to change the definition of Cyclicals and Defensives slightly. Using the Kenneth French data, we will define Cyclicals to be Manufacturing, Non-Durables, Technology, and Shops. Defensives are defined to be Durables, Telecom, Health Care, and Utilities.
We use the same strategy as before, going long Cyclicals and short Defensives when changes in PMI are positive, and short Cyclicals and long Defensives when changes to PMI are negative. We again vary the lookback period from 1- to 12-months.
Source: Kenneth French Data Library; Quandl. Calculations by Newfound Research. Results are hypothetical. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.
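A lookback sweep of this kind might be sketched as follows. We assume a k-month lookback means the change in PMI over the prior k months, lagged one month before trading, and we use randomly generated placeholder series rather than actual PMI or sector data, so the printed numbers carry no information about the real strategy. The function names `lookback_sweep` and `annualized_return` are ours.

```python
import numpy as np
import pandas as pd

def lookback_sweep(pmi, cyclicals, defensives, lookbacks=range(1, 13)):
    """Strategy returns for each lookback, one column per specification."""
    spread = cyclicals - defensives
    columns = {}
    for k in lookbacks:
        # Sign of the k-month PMI change, applied to the following month.
        position = np.sign(pmi.diff(k)).shift(1)
        columns[k] = position * spread
    # Keep only months in which every specification has a position.
    return pd.DataFrame(columns).dropna()

def annualized_return(monthly_returns):
    """Geometric annualized return from monthly returns."""
    growth = (1.0 + monthly_returns).prod()
    years = len(monthly_returns) / 12.0
    return growth ** (1.0 / years) - 1.0

# Placeholder data only: random series standing in for PMI and sector returns.
rng = np.random.default_rng(0)
n_months = 240
pmi = pd.Series(50.0 + rng.normal(0.0, 1.0, n_months).cumsum())
cyc = pd.Series(rng.normal(0.008, 0.05, n_months))
dfn = pd.Series(rng.normal(0.008, 0.04, n_months))

results = lookback_sweep(pmi, cyc, dfn)
print(annualized_return(results).round(4))  # one figure per lookback
print(results.corr().round(2))              # agreement between specifications
```

Comparing the annualized figures alongside the correlation matrix is a quick way to see whether one specification is an outlier relative to its neighbors.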
The results are less than convincing. Not only do we see significant dispersion across implementations, but there is also no consistency in those implementations that do well versus those that do not.
Perhaps worse, the best performing variation only returned a paltry 1.40% annualized gross of any implementation costs. Once we start accounting for transaction costs, slippage, and management fees, this figure deflates towards zero rather quickly.
Source: Kenneth French Data Library; Quandl. Calculations by Newfound Research. Results are hypothetical. Results assume the reinvestment of all distributions. Results are gross of all fees, including, but not limited to, manager fees, transaction costs, and taxes. Past performance is not an indicator of future results.
Conclusion
There is no shortage of quantitative research in the market and the research can be particularly compelling when it seems to fit a pre-existing narrative.
Cyclicals versus Defensives are a perfect example. Their very names imply the regimes during which they are supposed to add value, but actually translating this notion into a robust strategy proves to be less than easy.
I would make the philosophical argument that it quite simply cannot be easy. Consider the two pieces of information we need to believe for this strategy to work:
If we have very high confidence in both statements, it effectively implies an arbitrage.
Therefore, if we have very high confidence in the truth of the first statement, then for markets to be reasonably efficient, we must have little confidence in the second statement.
Similarly, if we have high confidence in the truth of the second statement, then for markets to be reasonably efficient, we must have little confidence in the first statement.
Thus, a more reasonable expectation might be that Cyclicals tend to outperform Defensives during an expansion, and Defensives tend to outperform Cyclicals in a contraction, but there may be meaningful exceptions depending upon the particular cycle.
Furthermore, we may believe we have an edge in forecasting expansions and contractions (perhaps not with just PMI, though), but there will be many false positives and false negatives along the way.
Taken together, we might believe we can construct such a strategy, but errors in both assumptions will lead to periods of frustration. However, we should recognize that for such an “open secret” strategy to work in the long run, there have to be troughs of sorrow deep enough to avoid permanent crowding.
In this case, we believe there is little evidence to suggest that level changes in PMI provide particular insight into Cyclicals versus Defensives, but that does not mean there are no macro signals that might.