
Machine Learning, Subset Resampling, and Portfolio Optimization

This post is available as a PDF download here

Summary

  • Portfolio optimization research can be challenging due to the plethora of factors that can influence results, making it hard to generalize results outside of the specific cases tested.
  • That being said, building a robust portfolio optimization engine requires a diligent focus on estimation risk. Estimation risk is the risk that the inputs to the portfolio optimization process (i.e. expected returns, volatilities, correlations) are imprecisely estimated by sampling from the historical data, leading to suboptimal allocations.
  • We summarize the results from two recent papers we’ve reviewed on the topic of managing estimation risk. The first paper relies on techniques from machine learning while the second paper uses a form of simulation called subset resampling.
  • Both papers report that their methodologies outperform various heuristic and optimization-based benchmarks.
  • We perform our own tests by building minimum variance portfolios using the 49 Fama/French industry portfolios.  We find that while both outperform equal-weighting on a risk-adjusted basis, the results are not statistically significant at the 5% level.

 

This week, we are going to review a couple of recent papers we’ve come across on the topic of reducing estimation risk in portfolio optimization.

Before we get started, we want to point out that while there are many fascinating papers on portfolio optimization, it is also one of the most frustrating areas to study in our opinion.  Why?  Because ultimately portfolio optimization is a very, very complex topic.  The results will be impacted in significant ways by a number of factors like:

  • What is the investment universe studied?
  • Over what time period?
  • How are the parameters estimated?
  • What are the lookback periods used to estimate parameters?
  • And so on…

Say that you find a paper that argues for the superiority of equal-weighted portfolios over mean-variance optimization by testing on a universe of large-cap U.S. equities. Does this mean that equal-weighting is superior to mean-variance optimization in general?  We tend to believe not.  Rather, we should take the study at face value: equal-weighting was superior to the particular style of mean-variance in this specific test.

In addition, the result in and of itself says nothing about why the outperformance occurred.  It could be that equal-weighting is a superior portfolio construction technique.

But maybe the equal-weighted stock portfolio just happens by chance to be close to the true Sharpe optimal portfolio.  If I have a number of asset classes that have reasonably similar returns, risks, and correlations, it is very likely that equal-weighting does a decent job of getting close to the Sharpe optimal solution.  On the other hand, consider an investment universe that consists of 9 equity sectors and U.S. Treasuries.  In this case, equal-weighting is much less likely to be close to optimal and we would find it more probable that optimization approaches could outperform.

Maybe equal-weighting exposes the stock portfolio to risk-premia like the value and size factors that improve performance.  I suspect that to some extent the outperformance of minimum variance portfolios in a number of studies is at least partially explained by the exposures that these portfolios have to the defensive or low beta factor (the tendency of low risk exposures to outperform high risk exposures on a risk-adjusted basis).

Maybe the mean estimates in the mean-variance optimization are just terrible and the results are less an indictment of MVO than of the particular mean estimation technique used.  To some extent, the difficulty of estimating means is a major part of the argument for equal-weighting or other heuristic or shrinkage-based approaches.  At the same time, we see a number of studies that estimate expected returns using sample means with long (i.e. 5 or 10 year) lookbacks.  These long-term horizons are exactly the period over which returns tend to mean revert, and so the evidence would suggest these are precisely the types of mean estimates you wouldn’t want to use.  To properly test mean-variance, we should at least use mean estimates that have a chance of succeeding.

All this is a long-winded way of saying that it can be difficult to use the results from research papers to build a robust, general purpose portfolio optimizer.  The results may have limited value outside of the very specific circumstances explored in that particular paper.

That being said, this does not give us an excuse to stop trying.  With that preamble out of the way, we’ll return to our regularly scheduled programming.

 

Estimation Risk in Portfolio Optimization

Estimation risk is the risk that the inputs to the portfolio optimization process (i.e. expected returns, volatilities, correlations) are imprecisely estimated by sampling from the historical data, leading to suboptimal allocations.

One popular approach to dealing with estimation risk is to simply ignore parameters that are hard to estimate.  For example, the naïve 1/N portfolio, which allocates an equal amount of capital to each investment in the universe, completely foregoes using any information about the distribution of returns.  DeMiguel, Garlappi and Uppal (2007)[1] tested fourteen variations of sample-based mean-variance optimization on seven different datasets and concluded that “…none is consistently better than the 1/N rule in terms of Sharpe Ratio, certainty-equivalent return, or turnover, which indicates that, out of sample, the gain from optimal diversification is more than offset by estimator error.”

Another popular approach is to employ “shrinkage estimators” for key inputs.  For example, Ledoit and Wolf (2004)[2] propose shrinking the sample correlation matrix towards (a fancy way of saying “averaging it with”) the constant correlation matrix.  The constant correlation matrix is simply the correlation matrix where every off-diagonal element is equal to the average pairwise correlation across all assets (the diagonal elements remain equal to one).

Generally speaking, shrinkage involves blending an “unstructured estimator” like the sample correlation matrix with a “structured estimator” like the constant correlation matrix that tries to represent the data with few free parameters. Shrinkage tends to limit extreme observations, thereby reducing the unwanted impact that such observations can have on the optimization result.
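To make the idea concrete, here is a minimal sketch of shrinking a sample covariance matrix towards a constant-correlation target.  It is not Ledoit and Wolf’s estimator itself (their paper also derives the optimal shrinkage intensity); the intensity below is simply a user-chosen blending weight.

```python
import numpy as np

def shrink_to_constant_correlation(returns, intensity=0.5):
    """Blend the sample covariance matrix with a constant-correlation target.

    `intensity` is a user-chosen shrinkage weight in [0, 1]; Ledoit and Wolf
    derive an optimal value, which we do not attempt here.
    """
    sample_cov = np.cov(returns, rowvar=False)
    vols = np.sqrt(np.diag(sample_cov))
    sample_corr = sample_cov / np.outer(vols, vols)

    # Average pairwise correlation (off-diagonal elements only)
    n = sample_corr.shape[0]
    avg_corr = (sample_corr.sum() - n) / (n * (n - 1))

    # Constant-correlation target: ones on the diagonal, avg_corr elsewhere
    target_corr = np.full((n, n), avg_corr)
    np.fill_diagonal(target_corr, 1.0)
    target_cov = target_corr * np.outer(vols, vols)

    return intensity * target_cov + (1 - intensity) * sample_cov
```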

Interestingly, the common practice of imposing a short-sale constraint when performing mean-variance optimization or minimum variance optimization is equivalent to shrinking the expected return estimates[3] and the covariance estimates[4], respectively.

Both papers that we’ll discuss here propose alternate ways of performing shrinkage.

Applying Machine Learning to Reduce Estimation Risk

The first paper, Reducing Estimation Risk in Mean-Variance Portfolios with Machine Learning by Daniel Kinn (2018)[5], explores using a standard machine learning approach to reduce estimation risk in portfolio optimization.

Kinn’s approach recognizes that estimation error can be decomposed into two sources: bias and variance.  Both bias and variance result in suboptimal results, but in very different ways.  Bias results from the model doing a poor job of capturing the pertinent features of the data.  Variance, on the other hand, results from the model being sensitive to the data used to train the model.

To get a better intuitive sense of bias vs. variance, consider two weather forecasters, Mr. Bias and Ms. Variance.  Both Mr. Bias and Ms. Variance work in a town where the average temperature is 50 degrees.  Mr. Bias is very stubborn and set in his ways.  He forecasts that the temperature will be 75 degrees each and every day.  Ms. Variance, however, is known for having forecasts that jump up and down.  Half of the time she forecasts a temperature of 75 degrees and half of the time she forecasts a temperature of 25 degrees.

Both forecasters have roughly the same amount of forecast error, but the nature of their errors is very different.  Mr. Bias is consistent but has way too rosy of a picture of the town’s weather.  Ms. Variance, on the other hand, actually has the right idea when it comes to long-term weather trends, but her volatile forecasts still leave much to be desired.
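If we further simplify and assume the true temperature is exactly 50 degrees every day, the standard decomposition of mean squared error makes the comparison precise: MSE = Bias^2 + Variance.  Mr. Bias has 25^2 + 0 = 625, while Ms. Variance has 0^2 + 25^2 = 625, so the two forecasters have identical total error, just split differently between the two components.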

The following graphic from EliteDataScience.com gives another take on explaining the difference between the two concepts.

Source: https://elitedatascience.com/bias-variance-tradeoff

 

When it comes to portfolio construction, some popular techniques can be neatly classified into one of these two categories.  The 1/N portfolio, for example, has no variance (weights will be the same every period), but may have quite a bit of bias if it is far from the true optimal portfolio.  Sample-based mean-variance optimization, on the other hand, should have no bias (assuming the underlying distributions of asset class returns do not change over time), but can be highly sensitive to parameter measurements and therefore exhibit high variance.  At the end of the day, we are interested in minimizing total estimation error, which will generally involve a trade-off between bias and variance.

Source: https://elitedatascience.com/bias-variance-tradeoff

 

Finding where this optimal trade-off lies is exactly what Kinn sets out to accomplish with the machine learning algorithm described in this paper.  The general outline of the algorithm is pretty straightforward (a rough code sketch follows the list):

  1. Identify the historical data to be used in calculating the sample moments (expected returns, volatilities, and correlations).
  2. Add a penalty function to the function that we are going to optimize. The paper discusses a number of different penalty functions including Ridge, Lasso, Elastic Net, and Principal Component Regression.  These penalty functions will effectively shrink the estimated parameters, with the exact nature of the shrinkage depending on the penalty function being used.  By doing so, we introduce some bias, but hopefully with the benefit of an even larger reduction in variance, and as a result a reduction in overall estimation error.
  3. Use K-fold cross-validation to fit the parameter(s) of the penalty function. Cross-validation is a machine learning technique where the training data is divided into various sets of in-sample and out-of-sample data.  The parameter(s) chosen will be those that produce the lowest estimation error in the out-of-sample data.
  4. Using the optimized parameters from #3, fit the model on the entire training set. The result will be the optimized portfolio weights for the next holding period.
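To illustrate the flavor of the approach, here is a stylized sketch, not Kinn’s exact formulation: we shrink the sample covariance matrix towards a scaled identity with a ridge-style penalty, pick the penalty by K-fold cross-validation on out-of-sample portfolio variance, and then refit on the full sample.  For simplicity we use a minimum variance objective; Kinn instead casts the problem as a penalized regression and also covers Lasso, Elastic Net, and principal component variants.  The function and parameter names below are our own.

```python
import numpy as np

def min_var_weights(cov):
    """Long/short minimum-variance weights (sum to one)."""
    inv = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return inv / inv.sum()

def ridge_min_var(returns, lam):
    """Minimum variance on a ridge-shrunk covariance: sample cov + lam * I."""
    cov = np.cov(returns, rowvar=False)
    return min_var_weights(cov + lam * np.eye(cov.shape[0]))

def cross_validated_weights(returns, lambdas=(1e-6, 1e-5, 1e-4, 1e-3, 1e-2), k=5):
    """Pick the ridge penalty by K-fold CV on out-of-sample portfolio variance,
    then refit on the full sample (steps 3 and 4 of the outline above).

    Assumes `returns` is a T x N array with more observations than assets in
    each training fold so the covariance matrix is invertible.
    """
    t = returns.shape[0]
    folds = np.array_split(np.arange(t), k)
    scores = []
    for lam in lambdas:
        oos_var = []
        for fold in folds:
            train = np.setdiff1d(np.arange(t), fold)
            w = ridge_min_var(returns[train], lam)
            oos_var.append(np.var(returns[fold] @ w))
        scores.append(np.mean(oos_var))
    best_lam = lambdas[int(np.argmin(scores))]
    return ridge_min_var(returns, best_lam)
```

A mean-variance version could follow the same pattern by cross-validating the penalty on out-of-sample utility rather than out-of-sample variance.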

Kinn tests three versions of the algorithm (one using a Ridge penalty function, one using a Lasso penalty function, and one using principal component regression) on the following real-world data sets.

  • 20 randomly selected stocks from the S&P 500 (covers January 1990 to November 2017)
  • 50 randomly selected stocks from the S&P 500 (covers January 1990 to November 2017)
  • 30 industry portfolios using stocks listed on the NYSE, AMEX, and NASDAQ (covers January 1990 to January 2018)
  • 49 industry portfolios using stocks listed on the NYSE, AMEX, and NASDAQ (covers January 1990 to January 2018)
  • 200 largest cryptocurrencies by market value as of the end of 2017 (if there were ever a sign that this is a 2018 paper on portfolio optimization, it has to be that one of the datasets relates to crypto)
  • 1200 cryptocurrencies observed from September 2013 to December 2017

As benchmarks, Kinn uses traditional sample-based mean-variance, sample-based mean-variance with no short selling, minimum variance, and 1/N.

The results are pretty impressive with the machine learning algorithms delivering statistically significant risk-adjusted outperformance.

Here are a few thoughts/comments we had when implementing the paper ourselves:

  1. The specific algorithm, as outlined in the paper, is a bit inflexible in the sense that it only works for mean-variance optimization where the means and covariances are estimated from the sample. In other words, we couldn’t use the algorithm to compute a minimum variance portfolio or a mean-variance portfolio where we want to substitute in our own return estimates.  That being said, we think there are some relatively straightforward tweaks that can make the process applicable in these scenarios.
  2. In our tests, the parameter optimization for the penalty functions was a bit unstable. For example, when using the principal component regression, we might identify two principal components as being worth keeping in one month and then ten principal components being worth keeping in the next month.  This can in turn lead to instability in the allocations.  While this is a concern, it could be dealt with by smoothing the parameters over a number of months (although this introduces more questions like how exactly to smooth and over what period).
  3. The results tend to be biased towards having significantly fewer holdings than the 1/N benchmark. For example, see the right-hand chart in the exhibit below.  While this is by design, we do tend to get wary of results showing such concentrated portfolios to be optimal, especially when, in the real world, we know that asset class distributions are far from well-behaved.

 

Applying Subset Resampling to Reduce Estimation Error

The second paper, Portfolio Selection via Subset Resampling by Shen and Wang (2017)[6], uses a technique called subset resampling.  This approach works as follows:

  1. Select a random subset of the securities in the universe (e.g. if there are 30 commodity contracts, you could pick ten of them).
  2. Perform the portfolio optimization on the subset selected in #1.
  3. Repeat steps #1 and #2 many times.
  4. Average the resulting allocations together to get the final portfolio.

The table below shows an example of how this would work for three asset classes and three simulations with two asset classes selected in each subset.

One way we can try to get intuition around subset resampling is by thinking about the extremes.  If we resampled using subsets of size 1, then we would end up with the 1/N portfolio.  If we resampled using subsets that were the same size as the universe, we would just have the standard portfolio optimized over the entire universe.  With subset sizes greater than 1 and less than the size of the whole universe, we end up with some type of blend between 1/N and the traditionally optimized portfolio.

The key parameter we need to select is the size of the subsets.  The authors suggest a subset size equal to n^0.8, where n is the number of securities in the universe.  For the S&P 500, this would correspond to a subset size of 144.
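To make the mechanics concrete, here is a minimal sketch of subset resampling applied to a minimum variance objective (the paper applies the idea to other objectives as well).  The subset size follows the n^0.8 rule of thumb above; the number of resamples and the use of a simple unconstrained minimum variance solver are illustrative choices of ours.

```python
import numpy as np

def min_var_weights(cov):
    """Long/short minimum-variance weights (sum to one)."""
    inv = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return inv / inv.sum()

def subset_resampled_weights(returns, n_samples=500, seed=0):
    """Average minimum-variance weights computed on many random subsets.

    Subset size follows the n**0.8 rule of thumb suggested by the authors.
    `n_samples` (the number of resamples) is an illustrative choice.
    """
    rng = np.random.default_rng(seed)
    n_assets = returns.shape[1]
    subset_size = max(2, int(round(n_assets ** 0.8)))

    weights = np.zeros(n_assets)
    for _ in range(n_samples):
        subset = rng.choice(n_assets, size=subset_size, replace=False)
        cov = np.cov(returns[:, subset], rowvar=False)
        weights[subset] += min_var_weights(cov)

    # Each subset's weights sum to one, so normalizing is equivalent to averaging
    return weights / weights.sum()
```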

The authors test subset resampling on the following real-world data sets.

  • FF100: 100 Fama and French portfolios spanning July 1963 to December 2004
  • ETF139: 139 ETFs spanning January 2008 to October 2012
  • EQ181:  Individual equities from the Russell Top 200 Index (excluding those stocks with missing data) spanning January 2008 to October 2012
  • SP434:  Individual equities from the S&P 500 Index (excluding those stocks with missing data) spanning September 2001 to August 2013.

As benchmarks, the authors use 1/N (EW); value-weighted (VW); minimum-variance (MV); resampled efficiency (RES) from Michaud (1989)[7]; the two-fund portfolio (TZT) from Tu and Zhou (2011)[8], which blends 1/N and classic mean-variance; the three-fund portfolio (KZT) from Kan and Zhou (2007)[9] which blends the risk-free asset, classic mean-variance, and minimum variance; the four fund portfolio (TZF) from Tu and Zhou (2011) which blends KZT and 1/N; mean-variance using the shrinkage estimator from Ledoit and Wolf (2004) (SKC); and on-line passive aggressive mean reversion (PAMR) from Li (2012)[10].

Similar to the machine learning algorithm, subset resampling does very well in terms of risk-adjusted performance.  On three of the four data sets, the Sharpe Ratio of subset resampling is better than that of 1/N by a statistically significant margin.  Additionally, subset resampling has the lowest maximum drawdown in three of the four data sets.  From a practical standpoint, it is also positive to see that the turnover for subset resampling is significantly lower than many of the competing strategies.

 

As we did with the first paper, here are some thoughts that came to mind in reading and re-implementing the subset resampling paper:

  1. As presented, the subset resampling algorithm will be sensitive to the number and types of asset classes in an undesirable way. What do we mean by this?  Suppose we had three uncorrelated asset classes with identical means and standard deviations.  We use subset resampling with subsets of size two to compute a mean-variance portfolio.  The result will be approximately 1/3 of the portfolio in each asset class, which happens to match the true mean-variance optimal portfolio.  Now we add a fourth asset class that also has the same mean and standard deviation but is perfectly correlated to the third asset class.  With this setup, the third and fourth asset classes are one and the same.  As a result, the true mean-variance optimal portfolio will have 1/3 in the first and second asset classes and 1/6 in each of the third and fourth asset classes (in reality the solution will be optimal as long as the allocations to the third and fourth asset classes sum to 1/3).  However, subset resampling will produce a portfolio that is 25% in each of the four asset classes, an incorrect result.  Note that this is a problem with many heuristic solutions, including the 1/N portfolio (a numerical sketch of this example follows the list).
  2. There are ways that we could deal with the above issue by not sampling uniformly, but this will introduce some more complexity into the approach.
  3. In a mean-variance setting, subset resampling will dilute the value of our mean estimates. Now, this should be expected when using any shrinkage-like approach, but it is something to at least be aware of. Dilution will be more severe the smaller the size of the subsets.
  4. In terms of computational burden, it can be very helpful to use some “smart” resampling that is able to get a representative sampling with fewer iterations than a naïve approach. Otherwise, subset resampling can take quite a while to run due to the sheer number of optimizations that must be calculated.
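The numerical sketch below illustrates the first point.  We use a minimum variance objective (with identical means, mean-variance weights point in the same direction) and set the correlation between the third and fourth assets to 0.99 rather than exactly 1 so that the covariance matrix remains invertible.

```python
import numpy as np
from itertools import combinations

def min_var_weights(cov):
    """Long/short minimum-variance weights (sum to one)."""
    inv = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return inv / inv.sum()

# Four assets with unit volatility; assets 3 and 4 are (nearly) perfectly correlated
corr = np.eye(4)
corr[2, 3] = corr[3, 2] = 0.99

print(min_var_weights(corr))      # ~ [0.333, 0.333, 0.167, 0.167]

# Subset resampling over every subset of size two
weights = np.zeros(4)
for subset in combinations(range(4), 2):
    idx = np.array(subset)
    weights[idx] += min_var_weights(corr[np.ix_(idx, idx)])
print(weights / weights.sum())    # ~ [0.25, 0.25, 0.25, 0.25]
```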

Performing Our Own Tests

In this section, we perform our own tests using what we learned from the two papers.  Initially, we performed the test using mean-variance as our optimization of choice with the 12-month return as the mean estimate.  We found, however, that the impact of the mean estimate swamped that of the optimizations.  As a result, we repeated the tests, this time building minimum variance portfolios.  This will isolate the estimation error relating to the covariance matrix, which we think is more relevant anyway, since few practitioners use sample-based estimates of expected returns. Note that we used the principal component regression version of the machine learning algorithm.

Our dataset was the 49 industry portfolios provided in the Fama and French data library. We tested the following optimization approaches:

  • EW: 1/N equally-weighted portfolio
  • NRP: naïve risk parity where positions are weighted inversely to their volatility (correlations are ignored)
  • MV: minimum variance using the sample covariance matrix
  • ZERO: minimum variance using the sample covariance matrix shrunk towards a target in which all correlations are assumed to be zero
  • CONSTANT: minimum variance using the sample covariance matrix shrunk towards a target in which all pairwise correlations are set equal to the average sample pairwise correlation across all assets in the universe
  • PCA: minimum variance using the sample covariance matrix shrunk towards a target that only keeps the top 10% of eigenvectors by variance explained (rough sketches of these three targets follow the list)
  • SSR: subset resampling
  • ML: machine learning with principal component regression
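For reference, here is a rough sketch of how the three shrinkage targets above might be constructed from a matrix of historical returns.  The shrinkage intensity, the treatment of the PCA target’s diagonal, and the unconstrained minimum variance solver are illustrative choices of ours and do not reproduce our exact test settings.

```python
import numpy as np

def shrinkage_targets(returns, n_factors=None):
    """Build the ZERO, CONSTANT, and PCA shrinkage targets from a T x N return matrix."""
    cov = np.cov(returns, rowvar=False)
    vols = np.sqrt(np.diag(cov))
    corr = cov / np.outer(vols, vols)
    n = corr.shape[0]

    # ZERO: keep the variances, assume all correlations are zero
    zero_target = np.diag(np.diag(cov))

    # CONSTANT: all off-diagonal correlations equal to the average pairwise correlation
    avg_corr = (corr.sum() - n) / (n * (n - 1))
    const_corr = np.full((n, n), avg_corr)
    np.fill_diagonal(const_corr, 1.0)
    constant_target = const_corr * np.outer(vols, vols)

    # PCA: keep only the top ~10% of eigenvectors by variance explained
    if n_factors is None:
        n_factors = max(1, n // 10)
    eigval, eigvec = np.linalg.eigh(cov)
    top = np.argsort(eigval)[::-1][:n_factors]
    pca_target = eigvec[:, top] @ np.diag(eigval[top]) @ eigvec[:, top].T
    # Restore the sample variances on the diagonal (our choice, to keep the target invertible)
    np.fill_diagonal(pca_target, np.diag(cov))

    return zero_target, constant_target, pca_target

def min_var_weights(cov):
    """Long/short minimum-variance weights (sum to one)."""
    inv = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return inv / inv.sum()

# Example: shrink halfway toward the constant-correlation target (intensity is illustrative)
# zero_t, constant_t, pca_t = shrinkage_targets(returns)
# w = min_var_weights(0.5 * np.cov(returns, rowvar=False) + 0.5 * constant_t)
```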

The results are presented below:

Results are hypothetical and backtested and do not reflect any fees or expenses. Returns include the reinvestment of dividends. Results cover the period from 1936 to 2018. Past performance does not guarantee future results.

 

All of the minimum variance strategies deliver lower risk than EW and NRP and outperform them on a risk-adjusted basis, although none of the Sharpe Ratio differences are significant at the 5% level. Of the strategies, ZERO (shrinking with a covariance matrix that assumes zero correlation) and SSR (subset resampling) delivered the highest Sharpe Ratios.

 

Conclusion

Portfolio optimization research can be challenging due to the plethora of factors that can influence results, making it hard to generalize results outside of the specific cases tested.  It can be difficult to ascertain whether the conclusions are truly attributable to the optimization processes being tested or some other factors.

That being said, building a robust portfolio optimization engine requires a diligent focus on estimation risk.  Estimation risk is the risk that the inputs to the portfolio optimization process (i.e. expected returns, volatilities, correlations) are imprecisely estimated by sampling from the historical data, leading to suboptimal allocations.

We summarize the results from two recent papers we’ve reviewed on the topic of managing estimation risk.  The first paper relies on techniques from machine learning to find the optimal shrinkage parameters that minimize estimation error by acknowledging the trade-off between bias and variance.  The second paper uses a form of simulation called subset resampling.  In this approach, we repeatedly select a random subset of the universe, optimize over that subset, and then blend the subset results to get the final result.

Both papers report that their methodologies outperform various heuristic and optimization-based benchmarks.  We feel that both the machine learning and subset resampling approaches have merit after making some minor tweaks to deal with real world complexities.

We perform our own tests by building minimum variance portfolios using the 49 Fama/French industry portfolios.  We find that while both outperform equal-weighting on a risk-adjusted basis, the results are not statistically significant at the 5% level.  While this highlights that research results may not translate out of sample, this certainly does not disqualify either method as potentially being useful as tools to manage estimation risk.

 

 

[1] Paper can be found here: http://faculty.london.edu/avmiguel/DeMiguel-Garlappi-Uppal-RFS.pdf.

[2] Paper can be found here: http://www.ledoit.net/honey.pdf

[3] DeMiguel, Garlappi and Uppal (2007)

[4] Jagannathan and Ma (2003), “Risk reduction in large portfolios: Why imposing the wrong constraints helps.”

[5] Paper can be found here: https://arxiv.org/pdf/1804.01764.pdf.

[6] Paper can be found here: https://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14443

[7] Paper can be found here: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2387669

[8] Paper can be found here: https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=2104&context=lkcsb_research

[9] Paper can be found here: https://www.cambridge.org/core/journals/journal-of-financial-and-quantitative-analysis/article/optimal-portfolio-choice-with-parameter-uncertainty/A0E9F31F3B3E0873109AD8B2C8563393

[10] Paper can be found here: http://research.larc.smu.edu.sg/mlg/papers/PAMR_ML_final.pdf

 

A Closer Look At Growth and Value Indices

In a commentary a few weeks ago entitled Growth Is Not “Not Value,” we discussed a problem in the index construction industry in which growth and value are often treated as polar opposites. This treatment can lead to unexpected portfolio holdings in growth and value portfolios. Specifically, we may end up tilting more toward shrinking, expensive companies in both growth and value indices.

2D Quadrants - What we're really getting

The picture of what we want for each index looks more like this:

2D Quadrants - What we want

The overlap is not a bad thing; it simply acknowledges that a company can be cheap and growing, arguably a very good set of characteristics.

A common way of combining growth and value scores into a single metric is to divide growth ranks by value ranks. As we showed in the previous commentary, many index providers do something similar to this.

Essentially this means that low growth gets lumped in with high value and vice versa.

But how much does this affect the index allocations? Maybe there just are not many companies that get included or excluded based on this process.

Let’s play index provider for a moment.

Using data from Morningstar and Yahoo! Finance at the end of 2015, we can construct growth and value scores for each company in the S&P 500 and see where they fall in the growth/value planes shown above.

To calculate the scores, we will use an approach similar to the one in the last commentary, where the composite growth score is the average of the normalized scores for EPS growth, sales growth, and ROA, and the composite value score is the average of the normalized scores for P/B, P/S, and P/E ratios.
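Here is a rough sketch of the two selection schemes.  The column names, the use of z-scores as the normalization, and the sign flip on the valuation ratios are our own illustrative assumptions rather than the exact methodology behind the charts below.

```python
import pandas as pd

def zscore(s: pd.Series) -> pd.Series:
    return (s - s.mean()) / s.std()

def classify(df: pd.DataFrame, top_fraction: float = 1 / 3):
    """Compare an independent ("2D") growth/value sort with a ratio-based ("1D") sort.

    `df` is assumed to have columns: eps_growth, sales_growth, roa, pb, ps, pe.
    The column names and the use of z-scores are illustrative assumptions.
    """
    growth = (zscore(df["eps_growth"]) + zscore(df["sales_growth"]) + zscore(df["roa"])) / 3
    # Lower valuation multiples mean cheaper, so flip the sign for the value score
    value = -(zscore(df["pb"]) + zscore(df["ps"]) + zscore(df["pe"])) / 3

    n_top = int(len(df) * top_fraction)

    # Independent 2D sort: top third by each score, overlap allowed
    growth_2d = set(growth.nlargest(n_top).index)
    value_2d = set(value.nlargest(n_top).index)

    # Combined 1D sort: rank on growth rank divided by value rank
    ratio = growth.rank() / value.rank()
    growth_1d = set(ratio.nlargest(n_top).index)
    value_1d = set(ratio.nsmallest(n_top).index)

    return {"growth_2d": growth_2d, "value_2d": value_2d,
            "growth_1d": growth_1d, "value_1d": value_1d}
```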

The chart below shows the classification when we take an independent approach to selecting growth and value companies based on those in the top third of the ranks.

2D Sort Growth and Value

In each class, 87% of the companies were identified as only being growth or value while 13% of companies were included in both growth and value.

The next chart shows the classifications when we use the ratio of growth to value ranks as a composite score and again select the top third.

1D Sort Growth and Value

Relative to what we saw previously, growth and value now extend further into the non-value (expensive) and non-growth (shrinking) realms of the graph, respectively.

There is also no overlap between the two categories, but we are now missing 16% of the companies that we had identified as good growth or value candidates before. On the flip side, 16% of the companies we now include were not identified as growth or value previously in our independent sort.

If we trust our independent growth and value ranking methodologies, the combined growth and value metric leaves out over a third of the companies that were classified as both growth and value. These companies did not appear in either index under the combined scoring scheme.

With the level of diversification in some of these indices, a few companies may not make or break the performance, but leaving out the top ones defeats the purpose of our initial ranking system. As with the NCAA March Madness tournament (won by Corey with a second place finish by Justin), having a high seed may not guarantee superior performance, but it is often a good predictor (since 1979, the champion has only been lower than a 3 seed 5 times).

Based on this analysis, we can borrow the final warning to buyers from the previous commentary:

“when you’re buying value and growth products tracking any of these indices, you’re probably not getting what you expect – or likely want.”

… and say that the words “probably” and “likely” are definitely an understatement for those seeking the best growth and value companies based on this ranking.

The Luck of the Rebalance Timing

One of the biggest hurdles in executing tactical models is deciding when to rebalance.  When a signal changes?  Weekly?  Monthly?  The choices can have a dramatic effect upon strategy results: the more timely the rebalance to the signal, the more of the movement that tends to be captured — but the more whipsaw and trading costs that are generally incurred.

While we believe our model of dynamic, volatility-adjusted momentum is a more efficient method of capturing momentum opportunities, rebalance and timing discussions are still relevant in overall portfolio composition.

I wanted to dig into this issue and show how the decision of when to rebalance can make an incredible difference in long-term performance.

To examine the effects, I chose to play with one of the more famous tactical risk management models: Mebane Faber’s 10-month simple moving average timing model, popularized in his 2006 paper “A Quantitative Approach to Tactical Asset Allocation”.  In the paper, Faber utilizes a simple methodology for determining whether an asset is eligible for inclusion in the portfolio based on whether it is above or below its 10-month moving average.

One of the issues is that in using the 10-month moving average, Faber’s model implicitly trades on the first day of each month.  But what happens if we rebalance the 2nd day, or the 3rd?  The 15th?  Did choosing the 1st day end up materially changing the results?

In the interest of simplicity, I decided to model months as 21-day periods, and compared 21 different strategies using 1-day offsets, running the model on the S&P 500 ETF “SPY”.  Each strategy rebalanced every 21 days; the 21st strategy rebalanced 20 days after (or, 1 day before, depending on your perspective) the 1st strategy.  Signals occurred after close and trading occurred at the next opens.  No trading costs or slippage effects were estimated.
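Here is a rough sketch of the experiment.  We approximate the 10-month moving average with a 210-day moving average of daily closes, approximate trading at the next open with a one-day lag on close-to-close returns, and leave the data loading to the reader; the function and parameter names are our own.

```python
import numpy as np
import pandas as pd

def offset_strategy_returns(prices: pd.Series, offset: int,
                            window: int = 210, period: int = 21) -> pd.Series:
    """Daily returns of a strategy that checks the 210-day SMA signal every 21
    trading days, starting `offset` days into the sample, and trades the next day.

    `prices` is assumed to be a daily close series (e.g. for SPY).
    """
    sma = prices.rolling(window).mean()
    signal = (prices > sma).astype(float)

    # Only update the position on this offset's rebalance dates...
    rebalance = pd.Series(np.nan, index=prices.index)
    rebalance.iloc[offset::period] = signal.iloc[offset::period].values
    # ...hold it in between, and lag one day so we trade after the signal
    position = rebalance.ffill().shift(1).fillna(0.0)

    return prices.pct_change().fillna(0.0) * position

# The 21-tranche overlay: equal weight across all offsets
# (a simple daily equal-weight of tranche returns; the post rebalances
#  the tranches back to equal weight annually)
# prices = ...  # daily SPY closes (not shown)
# tranches = pd.concat([offset_strategy_returns(prices, k) for k in range(21)], axis=1)
# overlay = tranches.mean(axis=1)
```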

The results are interesting to say the least.  Strategies for days 19 and 20 highlight the difference a single day can make: a single day changed the max drawdown from 19.03% to 30.22%, and the annualized return dropped from 11.33% to 10.51%.  The full performance results for each strategy can be seen below:

Tactical timing performance by rebalance offset

While overall volatility levels remain fairly consistent, there is a 25,425bp spread between the total return for the best and worst returning strategies (717.14% and 462.89% respectively).

Obviously, when you choose to rebalance can have a huge impact on the whipsaw you incur.

So how can we fix this?  Well, one of the ways is to put 1/21st of our portfolio in each of these strategies — rebalancing 1/21st of our portfolio every day — and rebalancing back to equal-weight at the beginning of every year.  The results?

  • A total return of 625% (an annualized return of 10.98%)
  • Annualized volatility of 13.37%
  • A max drawdown of -19.03%

Now this analysis doesn’t take into account trading costs — but since we are rebalancing only 1/21st of our portfolio every day, the total turnover ends up nearly identical to the turnover from the original strategy.  It’s certainly a bit more work — but it also helps limit the impact of choosing the wrong date to rebalance.

By being smart about when we choose to rebalance, and how we choose to rebalance, we can remove the “luck of the timing” — be it good or bad — from our strategy and capture the pure quantitative effects.

As January goes, so goes the year?

At Newfound, we are strong proponents of rules-based investing. However, rules-based investing in and of itself is not a panacea. The best rules will be defensible both in theory and in practice and be robust to dynamic market environments.

The following chart shows for each month the percentage of times that the sign of that month’s S&P 500 return matched the sign of the return for the period starting in the beginning of that month and ending one year later.

For example, the January figure means that starting in 1950, 69.8% of the time the sign of the return from January 1st to February 1st of that year matched the sign of the return from January 1st of that year to January 1st of the next year.

Month        Percent
January      69.8%
February     63.5%
March        73.0%
April        58.7%
May          65.1%
June         61.9%
July         54.0%
August       55.6%
September    52.4%
October      65.1%
November     65.1%
December     76.2%

What can we learn from this data? March and December returns seem to have done a better job than January of predicting the return for the following one year period. However, we need to dig deeper to see if these statistics are meaningful both in theory and in practice.

From a theoretical perspective, if we make some simplifying assumptions about the distribution of S&P 500 returns then we can explicitly compute the values in the above table. For the following discussion, we assume:

  • Returns are normally distributed
  • Monthly returns are i.i.d. (the distribution of each monthly return is identical and the return in one month does not affect the returns of subsequent months)
  • The annual S&P 500 return has a mean of 7% and volatility of 15%

If January’s return is very slightly positive, the probability of a positive annual return is 67.2%. If January’s return is 2.0%, the probability of a positive annual return increases to 72.1%. If January’s return is 5.0%, the probability of a positive annual return increases further to 78.7%.
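These figures can be reproduced directly from the assumptions above.  Conditional on January’s return, the remaining eleven months have a mean of 7% × 11/12 and a volatility of 15% × √(11/12), and the year finishes positive when those eleven months more than offset (or preserve) whatever January delivered.  The sketch below treats monthly returns as additive, which is the simplification implicit in the assumptions.

```python
from scipy.stats import norm

ANNUAL_MEAN, ANNUAL_VOL = 0.07, 0.15

def prob_positive_year(january_return: float) -> float:
    """P(annual return > 0 | January's return), treating monthly returns as
    additive, i.i.d., and normally distributed (the assumptions listed above)."""
    rest_mean = ANNUAL_MEAN * 11 / 12
    rest_vol = ANNUAL_VOL * (11 / 12) ** 0.5
    # The year is positive if the remaining eleven months exceed -january_return
    return 1 - norm.cdf(-january_return, loc=rest_mean, scale=rest_vol)

for r in (0.0, 0.02, 0.05):
    print(f"January {r:+.2%} -> P(positive year) = {prob_positive_year(r):.1%}")
# January +0.00% -> P(positive year) = 67.2%
# January +2.00% -> P(positive year) = 72.1%
# January +5.00% -> P(positive year) = 78.7%
```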

The chart below shows the probability of a positive annual return given various January returns.


This illustrates that the historical data backing the heuristic that as goes January, so goes the year is an expected statistical artifact and provides no basis for generating value as an investment strategy. Strong market performance in January does not cause strong market performance in the following eleven months. Instead, strong market performance in January simply makes it more likely that the full twelve month return is positive in the same way that the team winning a football game at the end of the third quarter has a better chance of winning the game. Strong January returns give the full year return a head start, providing no forward looking information that can be used to trade profitably.

We can go a step further to evaluate the practical value of the heuristic by examining the performance of a related trading strategy. Consider the following strategies:

  • Strategy A: Hold a 100% long position in the S&P 500
  • Strategy B: Go long the S&P 500 in January of every year. If the January return is positive, go long the S&P 500 for the remainder of the year; otherwise, go short.

Strategy B, based on the January heuristic, underperformed both on an absolute return basis and a risk-adjusted return basis.

Metric        Strategy A    Strategy B
Return        7.1%          5.2%
Volatility    15.4%         16.5%
Return/Vol    0.46          0.32
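A minimal sketch of the comparison, assuming we have a pandas Series of monthly S&P 500 returns with a DatetimeIndex; the data source, the dropping of partial years, and the approximation of shorting as earning the negative of the remaining-year return are simplifications of ours.

```python
import pandas as pd

def january_strategy_annual_returns(monthly: pd.Series) -> pd.DataFrame:
    """Annual returns for Strategy A (buy and hold) and Strategy B (long in
    January, then long or short for the rest of the year based on January's sign).

    `monthly` is assumed to be a sorted Series of monthly S&P 500 returns with a
    DatetimeIndex where each year begins in January; partial years are dropped.
    """
    rows = []
    for year, rets in monthly.groupby(monthly.index.year):
        if len(rets) != 12:
            continue
        jan, rest = rets.iloc[0], rets.iloc[1:]
        rest_total = (1 + rest).prod() - 1
        strat_a = (1 + jan) * (1 + rest_total) - 1
        # Shorting is approximated as earning the negative of the Feb-Dec return
        sign = 1 if jan > 0 else -1
        strat_b = (1 + jan) * (1 + sign * rest_total) - 1
        rows.append({"year": year, "strategy_a": strat_a, "strategy_b": strat_b})
    return pd.DataFrame(rows).set_index("year")
```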

When evaluating a potential trading heuristic, it is always useful to ask these questions:

1) Is there economic/financial rationale for why the heuristic holds?

2) Can the supporting data be explained statistically or is it truly an outperformance opportunity?

3) How would a trading strategy based on the heuristic have performed historically? If it has performed well, what are future market scenarios that could pose risks to its continued success and what are the magnitudes of these risks?

For another take on the January effect using conditional probabilities, see our weekly commentary.
