- As a firm, Newfound Research focuses on tactical allocation strategies.
- However, we also spend time researching other mandates – such as systematic value – in an effort to introduce lateral thinking to our process.
- Three years ago, we built a systematic value portfolio that seeks to create a “style pure” investment result, attempting to diversify process specification and timing luck risk.
- This commentary introduces our process and the philosophy behind it.
- We conclude by comparing three years of returns against popular systematic value ETF implementations and find that our performance results closely track the median monthly performance, indicating that the ensemble approach may accrue the potential diversification benefits of a multi-manager implementation.
Earlier this week, Tom Psarofagis, ETF Analyst at Bloomberg, highlighted the holdings difference between the new iShares Focused Value ETF (FOVL) versus other concentrated value funds such as the Alpha Architect Quantitative Value ETF (QVAL) and the Deep Value ETF (DVP). Embedded in his analysis was the following graph, which highlights the dramatic sector weight differences between the three products. While QVAL and DVP have over 40% of their portfolio weight in Consumer Discretionary stocks, FOVL has over 40% in Financials.
This analysis raises a very reasonable question: if these ETFs all claim to pursue the same objective, how can they have such different portfolios? We might even take a full step back and ask ourselves, “what is value?”1
Following on Tom’s analysis, Justin Carbonneau at Validea further highlighted this conundrum with his recent post Differences in Value. In the post, Justin evaluates nine different value portfolios that seek to track the methods of famous investors such as Benjamin Graham, Ken Fisher, Jim O’Shaughnessy, and Joel Greenblatt.
Justin walks through the portfolios using a six-question framework for evaluating a value strategy, proposed earlier by his co-founder Jack Forehand. The six questions are as follows:
- What is the universe?
- Which metrics are used?
- How many stocks are held?
- How often is the portfolio rebalanced?
- How does the portfolio handle industry concentration?
- How are position sizes determined?
We believe that these questions capture a sufficient cross-section of the portfolio construction choices that lead to different portfolio results. Ultimately, Tom and Justin’s analysis captures the notion of what we at Newfound call “specification versus style.” While all the strategies aim to capture the style of value investing, the specific means by which they go about pursuing that objective will create tracking error around the median result.
Below we plot the growth of $1 in 10 different U.S. large-cap value ETFs since their common inception date. We can see that while there are significant relative deviations, the portfolios track a common beta, driven both by broader equity market exposure and the general value style. We would argue that the unpredictable performance dispersion justifies the exploration of a multi-manager approach. This is especially true when we consider that it is impossible to know which, if any, method actually captures the “true” value premium.
Source: CSI. Calculations by Newfound Research. Returns are gross of all fees, transaction costs, and taxes with the exception of underlying ETF expense ratios. Returns assume the reinvestment of all distributions.
At Newfound, we believe that investors are best served by first focusing on the management of risk rather than the pursuit of excess returns. Further, we believe the best opportunities for creating differentiated returns are at the asset allocation level.
Nevertheless, we’ve spent quite a bit of time writing about value in the past. In fact, three years ago we even asked ourselves the question, “if we were tasked to build a systematic value portfolio, how would we do it?”
So, we thought about it. Then we did it.
While this is a meaningful departure from our usual tactical asset allocation mandates, we believe that projects such as these can inspire creativity and insights through lateral thinking. By building and managing a value portfolio, we may solve problems that can directly influence the way we manage other mandates.
Our objective in this project was to create a deep value portfolio, but to do so in a “style pure” manner that reduces the specification risk potentially associated with employing a single measure or methodology. In line with our past research, we aim to do this by diversifying across the three axes of what, how, and when.
In the remainder of this commentary, we will explore how we thought about building and managing the portfolio using the framework of questions outlined by Validea.
What is the Universe?
While we believe that there is a larger opportunity to harvest style premia in small- and micro-cap securities, we elected to use the S&P 500 as our starting point. We did this for two primary reasons.
First, implementation in the small- and micro-cap universes requires far more effort in trade execution. Trading can have a significant negative impact on less liquid securities, and illiquidity can even serve as a limit to arbitrage, potentially causing phantom value signals. As the objective of this research project was to create a style pure portfolio, we wanted to limit the potential negative impacts of execution and reduce the operational burden of tracking this portfolio over time.
Secondly, as a firm that focuses on tactical asset allocation, we expect that our data for individual securities is dirtier than that of a firm that focuses specifically on building equity portfolios. We expected that data for well-followed large-cap securities was likely to be cleaner than for small- and micro-cap stocks. Therefore, while focusing on the large-cap universe likely reduces the potential premium earned by a systematic value approach, it also potentially limits risk from dirty data.
Which Metrics are Used?
There is no shortage of systematic value measures. A quick survey of popular value indices and ETFs demonstrates the breadth of measures used in practice, including:
- Book value to price
- Forward earnings to price
- Dividend to price
- Operating cash flow to price
- EBITDA to enterprise value
- Sales to price
Historically, value investing has worked regardless of the metric used. In the short run, however, specific choices can lead to significant performance deviations.
Using top-quintile performance data from the Kenneth French data library, we plot rolling 3-year annualized returns for portfolios formed on book-to-market, cash-flow-to-price, dividend yield, and earnings yield versus the cross-sectional average 3-year return. We can see that each metric spends time in-and-out of favor relative to the other metrics.
Source: Kenneth French Data Library. Calculations by Newfound Research. Returns are hypothetical and backtested. Returns are gross of all fees, transaction costs, and taxes. Returns assume the reinvestment of all distributions. You cannot invest in an index.
Without a particular view as to which metric will necessarily perform best going forward, we elected to use a composite of metrics. Much like asset diversification, ensemble approaches can help us hedge our uncertainty related to specification risk.
We categorize metrics into four broad categories: book, earnings, sales, and cash flow. Based upon our research, we viewed each of these categories as providing a different, albeit biased, perspective as to whether a security was cheap or expensive.
For each category, we identify a number of metrics that may apply. We then construct a composite score for each security by z-scoring each individual metric, averaging the z-scores at the category level, and then averaging the category z-scores.
While there is room for improvement here (e.g. removing price-to-book based upon evidence that it is no longer a meaningful value signal or exploring a sector-specific weighting of signals), we believe that the broad cross-section of signals helps meaningfully reduce single-signal bias without introducing additional biases by selectively weighting certain signals.
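The composite scoring described above can be illustrated with a minimal sketch. It assumes each metric is expressed as a yield (so that higher means cheaper), uses a population standard deviation, and ignores missing data and outlier handling. The function names and the sample numbers are hypothetical and are not drawn from our production process.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Cross-sectional z-scores for one metric (population standard deviation)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def composite_scores(metrics_by_category):
    """Average z-scores within each category, then average across categories.

    metrics_by_category maps a category name to a list of metrics; each metric
    is a list of yields (e.g. earnings / price) with one entry per security.
    """
    category_scores = []
    for metrics in metrics_by_category.values():
        zs = [z_scores(m) for m in metrics]
        # Average across the metrics in this category, security by security.
        category_scores.append([mean(col) for col in zip(*zs)])
    # Average across categories, security by security.
    return [mean(col) for col in zip(*category_scores)]

# Hypothetical yields for three securities (higher = cheaper).
example = {
    "book": [[0.9, 0.5, 0.1]],
    "earnings": [[0.08, 0.06, 0.01], [0.07, 0.05, 0.03]],
    "sales": [[1.2, 0.8, 0.4]],
    "cash_flow": [[0.10, 0.06, 0.02]],
}
scores = composite_scores(example)
```

Because the earnings category holds two metrics and the others hold one, averaging at the category level first keeps any one category from dominating simply because it has more metrics.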
“Average of Portfolios” versus “Portfolio of Average Signals”
Historically, we have argued that ensembles should be implemented as averages of portfolios and not averages of signals. This argument arises from the math of Jensen’s Inequality, which says that the expectation of a nonlinear function applied to a random variable is not necessarily equal to the function applied to the expectation of that variable.
As a simple example, consider two signals that are used to make all-in or all-out calls on the market. The signals range from 0 to 100 and if the score is above 50, we invest. If each of these signals is used to build a portfolio and we then average the portfolio weights, the resulting allocations can be all-out, all-in, or fifty-fifty exposed. However, if we first average the signals together and then apply our rule, we can only be all-out or all-in.
We would argue for the former approach rather than the latter as we believe it represents diversification of signals whereas the latter simply represents the construction of a single, new signal.
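The two-signal example can be checked with a few lines of code. This is only a sketch of the toy rule described above (invest when the score is above 50); the function names are ours for illustration.

```python
def rule(signal):
    """All-in (1.0) if the signal is above 50, otherwise all-out (0.0)."""
    return 1.0 if signal > 50 else 0.0

def average_of_portfolios(signals):
    """Apply the rule to each signal, then average the resulting weights."""
    return sum(rule(s) for s in signals) / len(signals)

def portfolio_of_average_signal(signals):
    """Average the signals first, then apply the rule once."""
    return rule(sum(signals) / len(signals))

signals = [80, 40]  # one signal says invest, the other says do not
a = average_of_portfolios(signals)        # 0.5: half invested
b = portfolio_of_average_signal(signals)  # average is 60, so fully invested
```

With one signal at 80 and the other at 40, averaging the portfolios leaves us half invested, while averaging the signals first produces a single score of 60 and a fully invested position.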
This logic would lead us to believe that the appropriate way to construct our value portfolio would be to build portfolios based upon each metric and then average them together (or average them within each category and then average the categories). So why do we not take this approach?
To be clear, we do not have any empirical evidence to support our decision. Our hypothesis, however, is that the individual bias introduced by each signal may overwhelm the process if the “average of portfolios” approach is used rather than the “portfolio of average signals” approach. The former method may simply lead to a portfolio of securities that are identified as cheap only by one signal, whereas we are specifically trying to employ a composite approach to benefit from multiple definitional perspectives of value.
Cross-Market versus Cross-Industry
The academic implementation of the value-factor ranks stocks based upon their book-to-price metric and market-capitalization-weights the cheapest decile (or quintile) of securities. Doing so, however, can lead to significant sector tilts, as certain sectors may have structurally lower valuation metrics.
One solution to this problem is to measure valuation relative to other like securities. For example, instead of calculating cross-market scores, cross-industry-group scores could be employed. In Predicting Stock Returns Using Industry-Relative Firm Characteristics, Asness, Porter, and Stevens find that evaluating a firm’s characteristics versus industry average characteristics provides “more precise estimates” than the traditional cross-market approach.
In Are Factor Investors Getting Paid to Take on Industry Risk?, Bryan and McCullough find that while the traditionally built value portfolio expresses meaningful sector tilts, they are not additive to performance and can be neutralized by investors to reduce risk.
This is further complicated by the fact that the sector tilts created by an unconstrained value implementation can vary significantly with the valuation metric utilized. Price-to-book, for example, tends to dramatically overweight Financials, whereas EV-to-EBITDA overweights Consumer Discretionary.
This may all point to an argument supporting an industry-neutral implementation (or, at the very least, potentially support an ensemble approach to help neutralize structural industry biases that arise with a given measure).
Further, one strong argument towards cross-industry scoring is that it can serve as a mechanism for regularization of the already fuzzy heuristic measures we are employing.
The downside to a purely industry-neutral implementation, however, is that it can be highly susceptible to industry bubbles. During the late 1990s, for example, an industry-neutral value portfolio would still hold a significant allocation to technology companies. In most environments, this may not matter. In some environments, however, an unconstrained industry approach may be a significant boon to risk management.
With strong arguments supporting both approaches, we calculate both cross-market and a cross-industry score for each security and combine them. Note that this approach can still leave us with significant industry tilts. We’ll address this in a later section.
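As a rough illustration of the difference between the two scores, the sketch below z-scores a single metric across the whole universe and again within each industry group, then blends the two. The equal-weight blend is an assumption made for the example, as are the function names and the sample data.

```python
from collections import defaultdict
from statistics import mean, pstdev

def z_scores(values):
    """Z-scores with a guard for groups that have no dispersion."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]

def blended_scores(raw, industries):
    """Average a cross-market z-score with a within-industry z-score."""
    cross_market = z_scores(raw)
    groups = defaultdict(list)
    for i, industry in enumerate(industries):
        groups[industry].append(i)
    cross_industry = [0.0] * len(raw)
    for idx in groups.values():
        for i, z in zip(idx, z_scores([raw[i] for i in idx])):
            cross_industry[i] = z
    # Assumed 50/50 blend of the two perspectives.
    return [(m + c) / 2 for m, c in zip(cross_market, cross_industry)]

# Hypothetical earnings yields: banks look structurally cheaper than software.
raw = [0.10, 0.08, 0.03, 0.01]
industries = ["banks", "banks", "software", "software"]
blended = blended_scores(raw, industries)
```

In this toy universe the cheaper software stock ends up ranked above the more expensive bank, even though the bank has the higher raw yield, because the within-industry score offsets part of the structural difference between the two groups.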
How Many Stocks are Held?
Once valuation scores are calculated, we screen our universe down to the cheapest quintile (approximately 100 securities) with a goal of eventually creating a deep value portfolio from approximately 1/10th of the securities in our universe.
Securities falling in the cheapest quintile are quality scored. Our goal in using quality scores is to focus the portfolio on mis-pricings; i.e. identify deep-value stocks whose valuations are not justified by their poor quality. We believe that such an approach is supported by the empirical evidence suggesting that the value premium emerges from the re-valuation of mis-priced securities.
Like our valuation model, our quality score model is a composite of categories. The categories are informed by the elements of a Gordon Growth Model of security pricing.
We combine value and quality scores and re-rank, buying a number of stocks equal to 10% of our investable universe (approximately 50). However, we do not simply buy the top ranked stocks, as our selection simultaneously seeks to manage industry concentration risk (more on this point below).
We should note that we also calculate a momentum measure, which is not incorporated in the overall quality score. Rather, if a stock has been selected for purchase but is exhibiting extreme negative relative momentum, we will defer its purchase to a later date.
How Often is the Portfolio Rebalanced?
Frequent readers will know that rebalance timing luck is an obsession of ours here at Newfound. So much so that we believe when a portfolio rebalances represents a unique axis of diversification.
To distill the potential problem, consider two deep value portfolios that only rebalance on an annual basis. If one portfolio rebalances in January and one rebalances in July, the holdings may be dramatically different even if the portfolios are constructed with the same process. This represents timing risk due to the opportunities present at the time of rebalancing.
This can lead to non-trivial performance dispersion. For example, in Fundamental Indexation: Rebalancing Assumptions and Performance, Blitz, van der Grient, and van Vliet demonstrate that a fundamental index rebalanced every March outperformed a capitalization-weighted benchmark in 2009 by over 10%, while the same methodology rebalanced in September underperformed.
In building our systematic value portfolio, we wanted to specifically address this risk. To do so, we implement a method we call “tranching.” Each month, we take 1/60th of our capital and purchase a deep value portfolio. We then hold that tranche for 5 years, at which point it is liquidated. At any given point we hold 60 tranches.
The purpose of this approach is two-fold.
First, we are able to significantly reduce the timing luck associated with a deep value signal sampled at a given time. In many ways, our approach is akin to how many investors think about managing market cycle risk in private equity funds. Without any particular view as to where we sit in the market cycle, investors often try to deploy their capital to multiple private equity funds over time, implicitly diversifying market cycle risk. Here we adopt the same approach, but do so to diversify value signal timing risk.
Secondly, we align our holding period with the decay speed of our value signal, giving our holdings ample time to not only be re-valued by the market, but also to benefit from the underlying growth potential of the companies, which we purchased at a hopefully unreasonably cheap price.
A potential downside of this approach is that securities which are deeply undervalued only for a brief period will only ever represent a small contribution to returns. On the other hand, securities that are undervalued for a longer period of time will have position sizes built up within multiple tranches. This allows us the potential benefit of increasing our exposure over time if a position continues to cheapen. The net effect is that we are patient allocators, with both signal strength and signal duration serving as meaningful contributors to position size.
Blindly holding each tranche for 5 years, however, may lead to a dilution of value characteristics within the portfolio. For example, what happens if valuations of securities in older tranches normalize faster than expected? To account for this, each month we evaluate all the securities in our portfolio and sell anything with a valuation score above the median score for the universe.
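The mechanics of tranching can be sketched as a simple monthly bookkeeping step. This is an illustration under simplifying assumptions rather than our actual implementation: each new tranche is equal-weighted, weights are tracked as fractions of initial capital with no price drift, and the reinvestment of sale proceeds is not modeled. `Tranche`, `step`, and `position_sizes` are hypothetical names.

```python
from dataclasses import dataclass, field

HOLD_MONTHS = 60  # 5-year holding period, one new tranche per month

@dataclass
class Tranche:
    start_month: int
    holdings: dict = field(default_factory=dict)  # ticker -> weight

def step(tranches, month, picks, expensive):
    """Advance the tranche book by one month.

    picks: tickers selected this month (output of the selection process).
    expensive: tickers whose valuation score is above the universe median.
    """
    # Liquidate tranches that have reached the end of their holding period.
    tranches = [t for t in tranches if month - t.start_month < HOLD_MONTHS]
    # Sell anything that no longer screens as cheap.
    for t in tranches:
        t.holdings = {k: w for k, w in t.holdings.items() if k not in expensive}
    # Deploy 1/60th of capital into a new tranche.
    budget = 1.0 / HOLD_MONTHS
    new_holdings = {k: budget / len(picks) for k in picks} if picks else {}
    return tranches + [Tranche(month, new_holdings)]

def position_sizes(tranches):
    """Total weight per ticker, summed across all live tranches."""
    totals = {}
    for t in tranches:
        for k, w in t.holdings.items():
            totals[k] = totals.get(k, 0.0) + w
    return totals

book = []
book = step(book, 0, ["AAA", "BBB"], set())
book = step(book, 1, ["AAA", "CCC"], {"BBB"})
sizes = position_sizes(book)
```

In this toy run, AAA is selected in both months and so accumulates twice the weight of CCC, while BBB is sold from the first tranche once it is flagged as expensive.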
Below we plot the average value percentile score by tranche. We can see that older tranches have, on average over time, had lower average valuation scores than newer tranches. This implies that securities purchased are indeed re-valuing upward relative to peers (though we should acknowledge that this may not necessarily be due to price appreciation but could simply be due to deteriorating fundamentals).
We can also plot the average portfolio allocation to each tranche over time. We can see that the portfolio tends to heavily tilt towards more recently created tranches, with securities in older tranches being removed over time as they exceed the median universe valuation score.
Using this “portfolio-of-portfolios” approach to manage timing luck, the number of holdings at any given time can exceed our target of 10% of universe securities. In fact, it has historically hovered around 140, which is far higher than the approximately 50 we would expect. However, as we do not equally weight securities and tranches tend to get smaller over time, the concentration coefficient of the portfolio hovers closer to 55, indicating a more concentrated portfolio from an allocation perspective. The top 10 holdings currently account for over 25% of the portfolio weight, and the top 50 holdings account for just over 70% of the portfolio weight.
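For readers unfamiliar with the term, the sketch below assumes the common definition of the concentration coefficient as the inverse of the sum of squared portfolio weights, which can be read as the number of equal-weight positions that would produce the same level of concentration. The weights used here are made up for illustration.

```python
def concentration_coefficient(weights):
    """Inverse Herfindahl index: 1 / sum of squared (normalized) weights."""
    total = sum(weights)
    normalized = [w / total for w in weights]
    return 1.0 / sum(w * w for w in normalized)

# 140 equally weighted holdings behave like (approximately) 140 positions.
equal = concentration_coefficient([1.0] * 140)

# Tilting the same 140 holdings towards a few names lowers the coefficient.
tilted = concentration_coefficient([4.0] * 10 + [1.0] * 130)
```

An equal-weight portfolio scores the same as its number of holdings, so a coefficient well below the holding count signals that a handful of positions carry a disproportionate share of the weight.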
How Does the Portfolio Handle Industry Concentration?
As discussed above, we believe that industry bets represent a largely non-compensated source of risk within a value portfolio during most market environments. “Most” being the operative word in the prior sentence, as meaningfully avoiding technology during the dot-com era or financials during the 2008 crisis would have created meaningful relative outperformance.
Therefore, our goal was to retain the optionality for industry divergence while remaining mostly industry neutral during most periods. Note that we specifically seek to be industry neutral, not sector neutral. While industry neutrality implies sector neutrality, the reverse is not true as some sectors may have very diverse industries within them.
We begin by calculating the target relative industry weights based upon current market capitalizations. Assuming we will pick N securities for our portfolio, we run the following algorithm:
- Identify the industry that has both (1) selectable securities remaining and (2) is currently furthest away from its target weight.
- Select the highest ranked security in this industry.
- Increase the industry weight by 1/N.
- Remove the security from the list of eligible securities.
- Repeat until N securities have been identified.
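The steps above can be sketched as a short greedy loop. This sketch interprets “furthest away from its target weight” as the industry that is currently most underweight relative to its target, which is an assumption; the function name and inputs are hypothetical.

```python
def select_securities(ranked, industry_of, target_weights, n):
    """Greedy selection that approximates industry-neutral targets.

    ranked: eligible tickers ordered from best to worst combined score.
    industry_of: ticker -> industry.
    target_weights: industry -> target weight (summing to 1).
    """
    remaining = {}
    for ticker in ranked:
        remaining.setdefault(industry_of[ticker], []).append(ticker)
    current = {industry: 0.0 for industry in target_weights}
    selected = []
    while len(selected) < n and any(remaining.values()):
        # Industry with selectable names left that is furthest below target.
        industry = max(
            (i for i in remaining if remaining[i]),
            key=lambda i: target_weights.get(i, 0.0) - current.get(i, 0.0),
        )
        selected.append(remaining[industry].pop(0))  # highest-ranked name left
        current[industry] = current.get(industry, 0.0) + 1.0 / n
    return selected

ranked = ["A", "B", "C", "D", "E", "F"]
industry_of = {"A": "tech", "B": "tech", "C": "tech",
               "D": "fin", "E": "fin", "F": "energy"}
targets = {"tech": 0.5, "fin": 0.3, "energy": 0.2}
picks = select_securities(ranked, industry_of, targets, 4)
```

In this example the lowest-ranked security, F, is selected ahead of the higher-ranked C and E because energy would otherwise be left unrepresented.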
With this algorithm, there are two particular things to note.
First, this process will only approximate an industry neutral implementation assuming N is sufficiently large and we actually implement with an equal-weight portfolio. In our case, N is approximately 50 and, as we will discuss in the next section, we do not equally-weight our portfolio. Thus, even when all industries are fairly represented, we should still expect some deviation.
Second, the algorithm will always target an industry neutral implementation assuming there are sufficient eligible securities to build one. The flexibility to avoid an industry enters into play when an industry does not have a sufficient number of constituents falling in the top quintile of value scores.
Source: Bloomberg. Calculations by Newfound Research.
How are Position Sizes Determined?
Once we have selected our securities, the final step of the process is to construct our portfolio. Each technique for portfolio construction will inherently introduce its own assumptions, biases, estimation risks, and implicit resulting portfolio characteristics. It should come as no surprise that we embrace an ensemble approach.
Specifically, we create six different portfolios using different construction techniques and average their results together. Each technique makes various assumptions about what we know as well as our model of the relationship between risk and return. The techniques are:
- Equal Weight: Assumes we know nothing about an individual security’s expected return or covariance with any other security. Optimal if returns and covariances are identical.
- Inverse Volatility: Assumes we have a view on security volatilities but not correlations. Optimal if excess returns are proportional to volatility (i.e. equal Sharpe ratios) and correlations are homogenous.
- Minimum Variance Portfolio: Assumes we have a view on volatility and correlation but no view on returns. Assumes markets are not risk-efficient and is optimal if expected returns are similar.
- Maximum Sharpe Portfolio: Assumes a view on volatility, correlation, and returns. Specifically, we assume returns are proportional to downside deviation.
- Quality-Score Weighted: Optimal if Sharpe ratios are linearly proportional to the return and covariances are homogenous.
- Value-Score Weighted: Optimal if Sharpe ratios are linearly proportional to the return and covariances are homogenous.
In equally-weighting the portfolios constructed by these six techniques, we aim to diminish strategy-specific risks.
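A minimal sketch of the averaging step follows. It covers only the techniques that need no optimizer (equal weight, inverse volatility, and the two score-weighted portfolios); the minimum variance and maximum Sharpe portfolios require a covariance estimate and a solver and are omitted here. Scores are assumed to be positive (e.g. percentile ranks), and all inputs are hypothetical.

```python
def normalize(raw):
    """Scale a list of positive numbers so that they sum to 1."""
    total = sum(raw)
    return [x / total for x in raw]

def equal_weight(n):
    return [1.0 / n] * n

def inverse_volatility(vols):
    return normalize([1.0 / v for v in vols])

def score_weighted(scores):
    # Assumes scores are already positive, e.g. percentile ranks.
    return normalize(scores)

def ensemble(portfolios):
    """Equal-weight average of several weight vectors, security by security."""
    return [sum(col) / len(portfolios) for col in zip(*portfolios)]

vols = [0.20, 0.30, 0.40]     # annualized volatilities
quality = [0.9, 0.6, 0.3]     # quality percentile ranks
value = [0.5, 0.8, 0.7]       # value percentile ranks
weights = ensemble([
    equal_weight(3),
    inverse_volatility(vols),
    score_weighted(quality),
    score_weighted(value),
])
```

Because each input portfolio is fully invested, the averaged portfolio is fully invested as well, and no single technique can move any position by more than its share of the average.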
We set out on this project to develop a “style pure” value strategy that would diversify across process and timing specification risk. As we approach three years of live performance, we can now attempt to quantify the success of our project.
Success, in this case, is not relative outperformance. We did not set out to “fix” value; rather, we set out to avoid the potential downside risks that might come from selecting an individual measure or methodology. Of course, avoiding this risk also means knowingly foregoing any potential benefit from that selection.
This means we explicitly do not want to see outlier performance, whether positive or negative, relative to a distribution of other value funds. The purpose of an ensemble approach is that we should specifically reduce the impact of specification-driven outlier events. In many ways, we can think of it as a virtual fund-of-funds.
Using a sample of U.S. value ETFs, we plot the minimum, maximum, and median monthly return as well as the monthly return for our Systematic Value strategy.
Source: CSI. Calculations by Newfound Research. Returns represent live strategy results. Returns for the Newfound Systematic Value strategy are gross of all management fees and taxes, but net of execution fees. Returns for ETFs included in study are gross of any management fees, but net of underlying ETF expense ratios. Returns assume the reinvestment of all distributions. ETFs included in the study are (in alphabetical order): DVP, FTA, IWD, PWV, QVAL, RPV, SCHV, SPVU, VLUE, and VTV.
We can see that our portfolio closely tracks the median result2, indicating that the process appears to provide access to the style of systematic value without necessarily inviting the specification risks that might go along with picking just one process. Thus, the potential benefits that typically accrue to a multi-manager implementation may be achievable with an ensemble approach.
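For readers who want to run a similar comparison on their own data, the sketch below computes the monthly minimum, median, and maximum across a set of funds, along with the average absolute gap between a strategy and the median. The tickers and returns are placeholders, not the data behind the chart above.

```python
from statistics import median

def monthly_envelope(fund_returns):
    """fund_returns: ticker -> list of monthly returns on the same dates.

    Returns one (min, median, max) tuple per month.
    """
    months = zip(*fund_returns.values())
    return [(min(m), median(m), max(m)) for m in months]

def mean_abs_gap_to_median(strategy, envelope):
    """Average absolute difference between the strategy and the median fund."""
    gaps = [abs(s - med) for s, (_, med, _) in zip(strategy, envelope)]
    return sum(gaps) / len(gaps)

# Placeholder monthly returns for three hypothetical funds.
funds = {
    "ETF_A": [0.01, -0.02],
    "ETF_B": [0.03, 0.00],
    "ETF_C": [0.02, -0.01],
}
envelope = monthly_envelope(funds)
gap = mean_abs_gap_to_median([0.02, -0.01], envelope)
```

A small average gap relative to the width of the minimum-to-maximum band is what we would expect from a strategy that behaves like the median of the peer group.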
Ultimately, if you feel confident about a specific measure of value, this type of strategy will not be for you. There are a number of ETFs that track indices constructed with much more concentrated approaches that can align with your philosophy.
But if you do not know which flavor of value will be favored over the short-term and want to hedge against that risk, diversifying specification and timing risk can make sense.
Earning the value premium requires bearing some risk. And as the last decade has shown, this can lead to long periods of underperformance relative to the broader market. It is the uncompensated risks that can compound this underperformance, and these are the risks that our Systematic Value portfolio seeks to mitigate.