• Backtests should be, and frequently are, evaluated with scrutiny and skepticism
  • While hypothetical results – backtested or live – are reported as a single, precise number, investors will have different performance based on how and when their trades are executed.
  • Most hypothetical results ignore this execution factor.
  • To more accurately report hypothetical performance, Newfound’s investment team has implemented a more realistic execution model.

Backtests. Can’t live with ‘em, can’t live without ‘em.

On the one hand, it can be helpful to understand what the performance of an investment strategy may have been like in a different market regime.

On the other hand, live performance often fails to live up to the splendor of backtests that just go “up and to the right.”

At Newfound, we believe that backtests, when done appropriately (see our whitepaper Backtesting with Integrity), can be a useful tool for analysis. But backtesting is as much art as science. A backtest can be implemented flawlessly from a technical perspective, but still be meaningless.

Consider: can you use 2015 logic to evaluate a 1975 world?

Simply put: backtesting is complicated. And so is its cousin, the “hypothetical live model.” New allocations can be generated in real-time, but if no one trades them, does anyone actually know what the performance would have been?

While we’ve found that more and more advisors have become wise to the sorts of questions they should ask about hypothetical performance, we almost never receive any questions about execution assumptions.

And we think that is an important concept. Because days like Monday, August 24th remind us that implementation can make a massive difference between real performance achieved and the results of a hypothetical index.

For this commentary, we’re going to use a simple tactical model. The model will invest in the S&P 500 when it is above its 200-day moving average and invest in 0%-return cash when it is below. The model will be evaluated weekly based on Friday’s closing price.
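To make the rule concrete, here is a minimal sketch of the signal logic in plain Python. The function name `weekly_signals`, the `is_friday` flags, and the choice to include the evaluation day's own close in the moving average are all assumptions made for illustration; they are not taken from Newfound's actual implementation.

```python
from statistics import mean


def weekly_signals(closes, is_friday, window=200):
    """Evaluate the trend rule on each Friday close.

    closes:    list of daily closing prices, oldest first
    is_friday: list of booleans, True where that day is the weekly evaluation day
    window:    moving-average length in trading days (200 in the commentary)

    Returns {day_index: True/False}; True means hold the S&P 500,
    False means hold 0%-return cash.
    """
    signals = {}
    for i, (close, friday) in enumerate(zip(closes, is_friday)):
        if not friday or i + 1 < window:
            continue  # not an evaluation day, or not enough history yet
        # Assumption: the average includes the evaluation day's own close.
        sma = mean(closes[i + 1 - window : i + 1])
        signals[i] = close > sma
    return signals
```

Note that this only decides what the model wants to hold as of Friday's close. It says nothing yet about the price at which the trade actually happens.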

So here’s the $1 million question: when our model “trades,” at what price does it do so?

Here’s what we think the absolutely wrong answer is: Friday’s closing price. Yet all too often we see this done in naively implemented backtests. But to achieve this in real life, you’d need a time machine. You’d need to wait until after the close, run your model, get in your time machine, rewind the clock a few hours, and then try to nail the closing price.

At Newfound, we’ve historically taken a different approach. We’ve assumed that any hypothetical models that update using Friday’s closing prices will trade at Monday morning’s opening price. While this isn’t necessarily true (e.g. we often trade after 10AM ET), it is at least possible.
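The difference between the two assumptions is easiest to see in code. The sketch below is a hypothetical, stripped-down backtest loop: `run_backtest`, the `fill` argument, and the bar dictionaries are illustrative names, transaction costs are ignored, and cash earns 0%. A signal computed at the close of bar `i` is filled either at that same close (the "time machine" assumption) or at the open of the next bar.

```python
def run_backtest(bars, signals, fill="next_open"):
    """Grow $1 by following signals under a chosen fill assumption.

    bars:    list of {"open": float, "close": float}, oldest first
    signals: {bar_index: True/False}, computed at the close of that bar
             (True = hold the index, False = hold 0%-return cash)
    fill:    "same_close" trades at the signal bar's close (not achievable);
             "next_open" trades at the following bar's open
    """
    cash, shares = 1.0, 0.0
    for i in sorted(signals):
        if fill == "same_close":
            price = bars[i]["close"]
        elif fill == "next_open":
            if i + 1 >= len(bars):
                break  # no later bar exists to trade on
            price = bars[i + 1]["open"]
        else:
            raise ValueError(f"unknown fill rule: {fill}")

        if signals[i] and shares == 0.0:
            shares, cash = cash / price, 0.0   # buy with all cash
        elif not signals[i] and shares > 0.0:
            cash, shares = shares * price, 0.0  # sell everything
    # Mark any open position at the final close.
    return cash + shares * bars[-1]["close"]


# Made-up prices: a buy signal on Friday, then a gap up on Monday's open.
bars = [
    {"open": 100.0, "close": 100.0},  # Friday
    {"open": 102.0, "close": 103.0},  # Monday
    {"open": 103.0, "close": 110.0},  # Tuesday
]
signals = {0: True}
print(run_backtest(bars, signals, fill="same_close"))
print(run_backtest(bars, signals, fill="next_open"))
```

In this toy example the same-close backtest captures the weekend gap that a real investor could not have captured, which is exactly the kind of overstatement at issue.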

So how would this choice affect our simple tactical S&P 500 model performance?


Annualized, trading at Monday’s open rather than Friday’s close reduces returns by about 0.33%. That adds up after a few decades, and a backtest of this model that uses Friday’s closing price will end up overstating hypothetical returns.

But we’re just talking about 0.33%. No big deal, right? Is this analysis even worth doing? Let’s cut this another way: how would our performance look if we always executed at the best price on Monday versus the worst? Said another way: if we’re doing this in real time, how great could things be, or how bad could they get, based solely on execution?

For the best price, if we’re buying (selling) we assume we buy (sell) at the low (high). For the worst price, if we’re buying (selling) we assume we buy (sell) at the high (low).
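The bounding rule above fits in a few lines. This is only a sketch: `bounding_fill` and its string arguments are hypothetical names, and the bar is assumed to carry the day's high and low.

```python
def bounding_fill(bar, side, case):
    """Return the best- or worst-case execution price for one day.

    bar:  {"high": float, "low": float} for the trading day
    side: "buy" or "sell"
    case: "best" or "worst"
    """
    if side not in ("buy", "sell") or case not in ("best", "worst"):
        raise ValueError("side must be buy/sell and case must be best/worst")
    # A buyer's best price is the low; a seller's best price is the high.
    # The worst case simply flips that choice.
    wants_low = (side == "buy") == (case == "best")
    return bar["low"] if wants_low else bar["high"]


monday = {"high": 104.0, "low": 98.0}  # made-up prices
print(bounding_fill(monday, "buy", "best"))    # 98.0
print(bounding_fill(monday, "buy", "worst"))   # 104.0
print(bounding_fill(monday, "sell", "best"))   # 104.0
print(bounding_fill(monday, "sell", "worst"))  # 98.0
```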


So here’s what we know:

  • Executing at Friday’s close is impossible
  • The price we execute at on Monday really, really matters. In this example, the gap between always getting the best price and always getting the worst price is 5.66%, annualized.

Yet consistently getting the exact best or the exact worst price is unlikely, especially when trading more than one security. And any trader will tell you that you definitely don’t want to trade at the open – bid/ask spreads are too wide and your cost of execution will be too high.

So we’ve implemented a method based on time-weighted average price (TWAP). Now, without intraday tick-by-tick and volume data, this method is still imperfect. But we think it is once again a step closer to realistic.

The method is fairly simple.

On up days, we assume price moves from the open to the low, then from the low to the high, then from the high to the close. We assume that the time spent on each move is proportional to the distance of the move. We then take the average price of each move and weight it by the duration of the move itself. The result is a TWAP estimate.

For down days, we use a similar process, but assume price moves from the open to the high, then from the high to the low, then from the low to the close.

This method gives us a single price that we believe gives a more realistic expectation of execution price.
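As a rough illustration, the estimate described above can be sketched from a single day's open, high, low, and close. Several details here are assumptions rather than anything stated in the commentary: the function name `estimate_twap`, treating each move as linear (so its average price is the midpoint of its endpoints), treating a day where the close equals the open as an up day, and returning the open when the bar has no range at all.

```python
def estimate_twap(open_, high, low, close):
    """Estimate a time-weighted average price from one daily OHLC bar."""
    # Assumed intraday path: up days go open -> low -> high -> close,
    # down days go open -> high -> low -> close.
    if close >= open_:  # assumption: an unchanged day counts as an up day
        path = [open_, low, high, close]
    else:
        path = [open_, high, low, close]

    legs = list(zip(path, path[1:]))
    # Time spent on each leg is taken as proportional to the distance moved.
    total_distance = sum(abs(end - start) for start, end in legs)
    if total_distance == 0:
        return open_  # flat bar: every price is the same

    # Average price of a linear leg is its midpoint; weight by leg duration.
    weighted = sum((start + end) / 2 * abs(end - start) for start, end in legs)
    return weighted / total_distance


# Made-up up day: open 100, high 104, low 98, close 102.
# Legs: 100->98 (distance 2, average 99), 98->104 (distance 6, average 101),
# 104->102 (distance 2, average 103).
# Estimate: (99*2 + 101*6 + 103*2) / 10 = 101.0
print(estimate_twap(100.0, 104.0, 98.0, 102.0))  # 101.0
```

Because the long low-to-high leg dominates the weighting, the estimate sits near the middle of the day's range rather than at either extreme.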


We can see that changing from Monday’s open price to the TWAP price did not ultimately make that large of a difference for this strategy. But we believe that every step we can take toward making a hypothetical index more realistically achievable by an investor is worthwhile.

From 2012-2019, Justin Sibears served as Managing Director and Portfolio Manager at Newfound Research. At Newfound, Justin was responsible for portfolio management, investment research, strategy development, and communication of the firm's views to clients. Justin holds a Master of Science in Computational Finance and a Master of Business Administration from Carnegie Mellon University as well as a BBA in Mathematics and Finance from the University of Notre Dame.