- Most advisors have a fund checklist or screen: a list of selection criteria they employ to help determine whether a fund is worthy of further evaluation.
- The vast majority of checklists we see employ a performance screen based on a 3- or 5-year period.
- We believe that employing such a performance screen not only misleads selection efforts, but also can be harmful to portfolio performance.
Most financial advisors we interact with have some sort of checklist they apply when screening for mutual funds. Our experience has been that 99% of checklists share a common core along the lines of: “We screen for funds based on 3- (or 5-) year track records, looking for significant outperformance over that time period.”
At face value, this doesn’t seem totally unreasonable. After all, why would you want to buy a fund with a track record of failure? Theoretically, track records are objective measures of a manager’s skill, devoid of any marketing spin or excuses. The proof of the pudding is in the eating, as they say.
The problem is that a screen based upon prior returns leaves us susceptible to hindsight bias and narrative fallacies. Many performance evaluators operate under the unrealistic assumption that somehow the returns that actually happened were supposed to happen and that managers who did well were able to better identify this reality beforehand.
Performance screens treat asset management like an archery contest, where we are trying to identify those managers with the greatest accuracy and consistency. Given how bad humans are at accurately discounting the future, however, performance screens are more akin to holding an archery contest and drawing the bull’s-eyes after all the shots have been fired.
Sure, someone won. But did that person actually know what they were shooting at? Or did the bull’s-eye just happen to end up where they shot?
At Newfound, we believe that common performance screens are not only fundamentally flawed but downright dangerous for investment results. Here are six reasons why.
1. “Past performance is not indicative of future results.”
The single most popular phrase across the entire mutual fund industry … totally ignored by just about every mutual fund screen we’ve ever come across. Worse, as an industry, we don’t just ignore it: we assume the opposite. We screen funds assuming that past performance is indicative of future returns.
The danger with this approach is that it assumes that track records are entirely driven by skill and that randomness and luck play no role whatsoever. In this worldview, positive performance comes from the portfolio manager’s ability to successfully re-align their portfolio as markets change.
We do not believe this worldview reflects the actual investment landscape. Today, process consistency is prized. We expect managers to stick to their process regardless of market outlook. As a result, performance, especially over shorter time horizons, is influenced less by whether the manager has successfully aligned the portfolio with the market than by whether the market has aligned itself with the manager’s process. To use our analogy from before: the archers consistently fire at the same spot, and one of them will get lucky enough to have the bull’s-eye drawn where they shot.
Recent track record, then, is less an indication of manager skill than it is an indication of how well the market was aligned with the manager’s process. Unfortunately, there is no evidence that alignment in the past will imply alignment in the future. In fact, evidence suggests just the opposite – but we’ll get to that in a bit.
The Takeaway: Performance screens interpret luck (both good and bad) as skill (both positive and negative), giving us a false sense of confidence in a manager’s abilities.
2. “Alpha” is likely just unidentified risk
Whether explicitly stated or not, most of the performance screening criteria we see are really just saying, “we’re looking for funds with alpha.” The problem with alpha is that it is highly dependent on the benchmark utilized to measure it. A value fund measured against the S&P 500 may very well have alpha; a value fund evaluated with a multi-factor model that includes the value premium likely will not.
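To make the benchmark dependence concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the monthly return parameters, the hypothetical fund that simply holds the market plus half a unit of a value factor, and the helper names. It is not a model of any real fund. The sketch estimates alpha twice, once against the market alone and once against the market plus the value factor.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    # Sample covariance; cov(xs, xs) is the sample variance.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def alpha_one_factor(fund, mkt):
    # Intercept of an ordinary least squares fit of fund returns on the market.
    beta = cov(mkt, fund) / cov(mkt, mkt)
    return mean(fund) - beta * mean(mkt)

def alpha_two_factor(fund, mkt, val):
    # Intercept of an OLS fit on market and value, solving the 2x2
    # normal equations for the two slopes directly.
    s_mm, s_vv, s_mv = cov(mkt, mkt), cov(val, val), cov(mkt, val)
    s_my, s_vy = cov(mkt, fund), cov(val, fund)
    det = s_mm * s_vv - s_mv ** 2
    b_m = (s_vv * s_my - s_mv * s_vy) / det
    b_v = (s_mm * s_vy - s_mv * s_my) / det
    return mean(fund) - b_m * mean(mkt) - b_v * mean(val)

# Hypothetical monthly excess returns (assumed parameters, not real data).
rng = random.Random(0)
n_months = 1200
mkt = [rng.gauss(0.006, 0.04) for _ in range(n_months)]
val = [rng.gauss(0.004, 0.02) for _ in range(n_months)]
# A fund with no skill at all: market exposure plus a value tilt plus noise.
fund = [m + 0.5 * v + rng.gauss(0.0, 0.005) for m, v in zip(mkt, val)]

alpha_market_only = alpha_one_factor(fund, mkt)
alpha_with_value = alpha_two_factor(fund, mkt, val)
print(f"monthly alpha vs. market only:     {alpha_market_only:.4%}")
print(f"monthly alpha vs. market + value:  {alpha_with_value:.4%}")
```

By construction the simulated fund has no skill, yet it shows positive alpha against the market-only benchmark because the value tilt is unaccounted for. Once the value factor is included, the measured alpha shrinks toward zero.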
Furthermore, as researchers continue to discover new and unique risk factors that are rewarded by excess return, it is becoming abundantly clear that what was once thought of as alpha is often either an unidentified risk factor or some combination or variation of known risk factors. In the past two decades, the models used to decompose returns and measure alpha have moved from single-factor, to 3-factor, to 5-plus factor – with measurable alpha shrinking all the while.
In other words, if you’re seeking alpha, evidence suggests that you’re really just tapping into risk that is not accounted for in your market models.
The Takeaway: Screening for alpha only draws us towards funds for which we have no idea how excess returns are generated and therefore introduces unknown sources of risk into our portfolios.
3. 3-year track records are statistically meaningless
One of the reasons investors wait for 3- or 5-year track records is that they need enough data to make a statistically significant evaluation of a track record. The problem is that there is little statistical significance in track records this short.
In fact, evidence suggests that many risk factors commonly employed to tap into excess return potential have such low information ratios that luck can dominate skill even in the long run. For example, the historical information ratio for the size premium is 0.18 – a level so low we would have to wait 120 years to be 95% confident of generating positive excess return by employing the tilt.
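One common back-of-the-envelope way to arrive at a figure like this is sketched below. It assumes annual excess returns are independent and normally distributed, so that the t-statistic of the average excess return after T years is roughly the information ratio times the square root of T. We are assuming this is the kind of calculation behind the number above; the function names are ours.

```python
from statistics import NormalDist

def years_for_confidence(information_ratio, confidence=0.95, two_sided=True):
    # Solve IR * sqrt(T) = z for T, where z is the normal critical value.
    tail = (1 - confidence) / 2 if two_sided else (1 - confidence)
    z = NormalDist().inv_cdf(1 - tail)
    return (z / information_ratio) ** 2

def prob_underperform(information_ratio, years):
    # Probability that cumulative excess return is negative after `years`,
    # under the same i.i.d. normal assumption.
    return NormalDist().cdf(-information_ratio * years ** 0.5)

size_ir = 0.18  # information ratio for the size premium quoted in the text

years_two_sided = years_for_confidence(size_ir)
years_one_sided = years_for_confidence(size_ir, two_sided=False)
p_lag_3y = prob_underperform(size_ir, 3)

print(f"years needed (two-sided 95%): {years_two_sided:.0f}")
print(f"years needed (one-sided 95%): {years_one_sided:.0f}")
print(f"chance of lagging over 3 years: {p_lag_3y:.0%}")
```

With the two-sided threshold the answer comes out just under 120 years, in line with the figure above; a one-sided threshold gives a smaller number that is still many decades. Under the same assumptions, a strategy with this information ratio lags its benchmark over a 3-year window more than a third of the time.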
The problem compounds when we consider that managers attempting to tap into the same risk premium can end up with dramatically different results. Consider, for example, value investing. Common ways of measuring and filtering for value stocks include price-to-book, price-to-sales, and price-to-earnings ratios. All three of these approaches have historically succeeded in harvesting the value premium. Over shorter time periods, however, there can be striking variation in the performance of different iterations of the same strategy. For example, the 1-year performance gap between the best- and worst-performing of the three value approaches above has exceeded 2,000 basis points (20 percentage points).
Trailing-return analysis becomes even less informative once we recognize that multi-year underperformance is not only statistically probable, but potentially outright guaranteed. If we accept that stretches of underperformance are a near-certain statistical artifact rather than proof of incompetence, why do we treat stretches of outperformance as skill?
The Takeaway: Trailing performance screens are based on statistically fragile foundations. Over 3- and 5-year time periods, randomness can easily overshadow skill for even the most historically robust investment strategies.
4. Performance screening systematizes performance chasing on the wrong time horizon
Filtering on 3-year performance would not be an issue if the process simply resulted in random behavior. Over a long enough horizon, this random behavior would diversify itself away and we’d end up with a case of no harm, no foul.
The problem is that screening on 3-year performance taps into a significant risk factor – and we’re on the wrong side of it.
Despite the commonly accepted wisdom to not chase performance, there is significant evidence that performance chasing works over short-term horizons of one year or less. However, performance over longer time horizons – like 3-year cycles – is more often driven by mean-reversionary forces (like valuations). By filtering on prior 3-year performance, we are effectively running a momentum strategy on a mean-reversionary time frame. This means buying strategies at tops and selling at bottoms.
This mistiming can cause a significant erosion of portfolio performance. For example, constantly rotating into the top quartile of U.S. Open End Moderate Allocation category funds, ranked on 3-year performance, creates a 0.5% annualized drag on performance. Buying the lowest quartile, however, actually improves return by 0.5% per year.
Of course, performance chasing can lead to underperformance orders of magnitude greater than 0.5%. Consider that the CGM Focus Fund (CGMFX) returned more than 18% on an annualized basis from 2000 to 2010 – yet the average investor in the fund actually lost 11% on an annualized basis due to the amount of money that piled into the fund after a stellar 2007 only to suffer a catastrophic 2008.
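The gap between a fund's return and its investors' return is the gap between a time-weighted return and a money-weighted (dollar-weighted) return. The sketch below illustrates the mechanics with a two-year example. The yearly returns and contribution amounts are hypothetical numbers chosen for illustration, not the fund's actual record or flows, and the function names are ours.

```python
def time_weighted_return(period_returns):
    # Annualized geometric return of the fund itself, ignoring cash flows.
    growth = 1.0
    for r in period_returns:
        growth *= 1.0 + r
    return growth ** (1.0 / len(period_returns)) - 1.0

def money_weighted_return(cash_flows, lo=-0.99, hi=10.0, tol=1e-10):
    # Internal rate of return found by bisection. cash_flows[t] is the
    # investor's flow at the end of year t (negative = money paid in).
    def npv(rate):
        return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))

    f_lo = npv(lo)
    if f_lo * npv(hi) > 0:
        raise ValueError("no sign change in the search bracket")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        f_mid = npv(mid)
        if f_lo * f_mid <= 0:
            hi = mid  # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
    return (lo + hi) / 2.0

# Hypothetical: a great year followed by a terrible one.
fund_returns = [0.80, -0.48]
# Hypothetical: a small initial stake, then a large purchase after the great year.
contributions = [100.0, 1000.0]

balance = 0.0
for r, c in zip(fund_returns, contributions):
    balance = (balance + c) * (1.0 + r)

cash_flows = [-c for c in contributions] + [balance]
twr = time_weighted_return(fund_returns)
mwr = money_weighted_return(cash_flows)
print(f"fund (time-weighted) return:      {twr:.1%} per year")
print(f"investor (money-weighted) return: {mwr:.1%} per year")
```

In this illustration the fund itself loses only a few percent per year, while the investor, who put most of the money in after the strong year, loses far more. The fund's returns are identical in both calculations; only the timing of the dollars differs.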
The Takeaway: Screening on performance can embed a systematic drag on portfolio performance by employing a momentum-based investment methodology on a mean-reversionary time-horizon.
5. Summary statistics ignore the importance of behavior
There is mounting evidence that behavior may be one of the most important factors in investing, causing the massive divergence between investor returns and investment returns. Screening for portfolios based upon prior performance largely ignores the behavioral aspect of investing. As we’ve said in the past, however, investors do not experience summary statistics. Investors experience the day-to-day, month-to-month, and quarter-to-quarter volatility. Misbehavior and bad decisions can turn this short-term volatility into permanent portfolio impairment.
Long-term summary statistics are just long-term averages, and “average” is almost certainly not “annual.” By focusing on these metrics, we ignore whether we’ll have the fortitude to hold a strategy long enough to potentially reap the long-term rewards.
The Takeaway: Long-run performance statistics are only relevant if an investor has the fortitude to hold for the long run. Long-run averages can be the siren’s song that misleads investors into crashing on the shoals of short-term volatility.
6. Performance screening marginalizes the value-add ability of wholesalers
One of our wholesalers tells a great story about his time at a prior firm representing a deep value fund.
I introduced the fund to one particular advisor who seemed to take strongly to the fund’s overarching philosophy. Over the next several quarters, I continued to re-introduce the fund to the advisor, repeating the core philosophies and process employed by the management team. Meanwhile, the fund went on a hot streak of massive outperformance. Finally, in a meeting nearly 12 months after the initial introduction, the advisor told me that while he really liked the people, process, and philosophy, he was concerned he had missed the boat and that if he bought today, he would be buying in at the top.
To which I replied, ‘well, if that’s the case, I’ve got a briefcase full of 1-star funds I can tell you about.’ Needless to say, the advisor quickly backpedaled.
Instead of forcing asset managers to be thoughtful about their fund launches, performance screening encourages a scattershot approach to product development and a rifled approach to sales, with a focus only on those funds that were launched and did well (often with the other funds being quietly shut down). So instead of an industry-wide sales process based upon thoughtful forward outlooks and solutions, we end up encouraging portfolio managers to swing for the fences, fund companies to launch as many products as possible, and wholesalers to focus on performance first.
The Takeaway: An industry-wide performance-based screening process encourages a performance-driven sales cycle that focuses on yesterday’s winners – the very same funds most likely to underperform tomorrow.
When it comes to fund evaluation, we believe in the five P’s: people, philosophy, process, performance, and “phees.” We believe that philosophy and process – qualities that are not easy to screen on – are likely the most impactful for long-term performance transparency and sustainability.
Given that we spent the majority of this article denouncing performance-based screening, the inclusion of performance in this list may come as a surprise. We do believe performance should be evaluated: just not as an absolute selection mechanism. Rather, we should evaluate historical performance to determine if the manager performed as we would have expected them to given the environment. If the prior environment was bad for their process, did they perform poorly? If the prior environment was good, did they do well? Answering no to either of these questions may be cause for concern.
To return to our archer analogy, we want to test for consistency of the process. If the bull’s-eye is painted where the archer is not aiming (a “bad” environment for the strategy), but the arrow lands in the target anyway, it is a red flag. Similarly, if the bull’s-eye is painted exactly where the archer has been shooting, but suddenly the archer misses, it is a red flag. Both of these situations raise questions about the transparency and consistency of the investment process.
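As a rough sketch of what such a consistency check could look like, the snippet below flags periods in which a fund's excess return ran opposite to the environment for its process. It assumes the "environment" can be summarized by a single number per period, such as the return of the style factor the process targets. That proxy, the function name, and the sample figures are all hypothetical.

```python
def consistency_flags(periods):
    # periods: list of (label, environment_return, fund_excess_return).
    # A positive environment_return means conditions favored the process.
    flags = []
    for label, environment, excess in periods:
        if environment < 0 and excess > 0:
            flags.append((label, "outperformed in an unfavorable environment"))
        elif environment > 0 and excess < 0:
            flags.append((label, "underperformed in a favorable environment"))
    return flags

# Hypothetical yearly figures for illustration only.
history = [
    ("year 1", -0.05, -0.03),  # bad environment, poor result: as expected
    ("year 2", 0.04, 0.02),    # good environment, good result: as expected
    ("year 3", -0.06, 0.05),   # bad environment, strong result: red flag
    ("year 4", 0.03, -0.04),   # good environment, poor result: red flag
]

flags = consistency_flags(history)
for label, reason in flags:
    print(f"{label}: {reason}")
```

A flag here is a prompt for further questions about the process, not a verdict. In practice the judgment of what counts as a good or bad environment is qualitative and would not reduce to a single sign.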
At this point, most investors have admitted that there is no holy grail for stock picking that will work all the time. By the same logic, there is no magic formula for picking investment managers. If there were a process that consistently picked the best managers, everybody would use it, causing massive inflows into those funds, driving up the valuation of the assets the managers invest in, and driving down forward expected returns – ultimately hurting investor performance.