Investors use a variety of performance and risk metrics to evaluate strategies. These numbers summarize what happened to a strategy historically and are useful for quickly comparing different strategies. To use them effectively, it helps to look at the nuances of the most frequently cited ones and at cases where the information they provide could be misleading.
Many of the common metrics can be classified in ways that are similar to quantities we use to describe the world around us: temperature, speed, weight, voltage, etc. These classifications add context to what is being described based on how it is calculated and the information it contains.
In finance, one typical summary statistic is the annualized return of a strategy. To calculate this, all we need are the starting and ending points; what happened in between is irrelevant. Much like average speed simply uses the total time and distance traveled, annualized return smooths over any intermediate details. This is somewhat similar to a state variable in physics, such as temperature, entropy, or internal energy, whose change depends only on the initial and final states.
If it were as simple as that, the two strategies shown below would be equivalent, but even a novice investor would likely choose to have owned strategy A.
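To make the endpoint-only nature of annualized return concrete, here is a minimal sketch. The two price paths are hypothetical (they are not strategies A and B from the chart), and the assumption is one observation per year.

```python
def annualized_return(start_value, end_value, years):
    """Compound annual growth rate computed from the endpoints alone."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Hypothetical paths: same start and end, very different journeys.
smooth = [100.0, 105.0, 110.0, 116.0, 121.0]
bumpy = [100.0, 130.0, 80.0, 95.0, 121.0]

years = len(smooth) - 1  # assumption: yearly observations, so 4 years

r_smooth = annualized_return(smooth[0], smooth[-1], years)
r_bumpy = annualized_return(bumpy[0], bumpy[-1], years)

# The intermediate values never enter the calculation,
# so both paths report the same annualized return.
print(f"smooth: {r_smooth:.2%}, bumpy: {r_bumpy:.2%}")
```

Because only the first and last values are used, no amount of turbulence in the middle of the path changes the result.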
Therefore, we look at metrics like annualized volatility, which incorporates the individual realized returns over a time period. We could call volatility a path-dependent metric, much like mechanical work is in thermodynamics. It is a quantity that is likely to change if your “route” changes. However, annualized volatility depends only on which returns were realized, not on the order in which they came. The same is true of the Sharpe and Sortino ratios. To illustrate this concept, the following simulated paths both have the same realized volatility.
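The order-independence of volatility can be checked directly. The sketch below uses simulated monthly returns (the mean, standard deviation, seed, and monthly frequency are all illustrative assumptions) and compares the original sequence with a shuffled and a reversed copy.

```python
import math
import random
import statistics


def annualized_volatility(returns, periods_per_year=12):
    """Sample standard deviation of periodic returns, scaled to one year."""
    return statistics.stdev(returns) * math.sqrt(periods_per_year)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
returns = [rng.gauss(0.005, 0.04) for _ in range(120)]  # 10 years, monthly

shuffled = returns[:]
rng.shuffle(shuffled)            # same returns, random order
reversed_returns = returns[::-1]  # same returns, opposite order

vol_original = annualized_volatility(returns)
vol_shuffled = annualized_volatility(shuffled)
vol_reversed = annualized_volatility(reversed_returns)

# All three agree: the statistic sees the set of returns, not their sequence.
print(vol_original, vol_shuffled, vol_reversed)
```

The equity curves built from these three sequences can look very different even though this statistic cannot tell them apart.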
To differentiate between these two strategies using summary statistics, we must capture the sequence of the returns. Maximum drawdown does this by measuring the worst loss from peak to trough over the time period. Still, maximum drawdown lacks information about the length of the drawdown, which can have a substantial impact on investors’ perception of a strategy. In fact, Strategies B and C shown previously have the same maximum drawdown of 25%.
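As a rough sketch of the calculation, maximum drawdown can be found by tracking the running peak and recording the largest fractional fall from it. The two price series below are hypothetical (not Strategies B and C); they are chosen so that one recovers immediately and the other stays underwater for most of the period.

```python
def drawdowns(prices):
    """Fractional retracement from the running peak at each observation."""
    peak = prices[0]
    result = []
    for price in prices:
        peak = max(peak, price)
        result.append((peak - price) / peak)
    return result


def max_drawdown(prices):
    """Worst peak-to-trough loss over the whole series."""
    return max(drawdowns(prices))


# Hypothetical series: both bottom out at 75 after a peak of 100.
quick = [100, 75, 100, 100, 100, 100, 110]  # one bad period, instant recovery
slow = [100, 90, 80, 75, 80, 90, 110]       # underwater for five periods

print(max_drawdown(quick), max_drawdown(slow))  # both 0.25
```

The two series report the same 25% maximum drawdown even though the second spends far longer below its peak.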
Enter the Ulcer Index. It factors in not only the severity of the drawdowns but also their duration. It is calculated using the following formula:
$$\mathrm{UI} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} R_i^2}$$
where N is the total number of data points and each R_i is the fractional retracement from the highest price seen thus far. Whereas the maximum drawdown is only the largest R_i, and so can only increase through time, the Ulcer Index encapsulates every drawdown into one summary statistic that adapts to new data as it is realized. Using the Ulcer Index, we can finally distinguish between strategies that have the same annualized return, annualized volatility, and maximum drawdown: Strategies B and C have Ulcer Indices of 11.2% and 12.8%, respectively.
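A minimal sketch of the calculation, assuming the standard definition of the Ulcer Index as the root mean square of the retracements from the running peak. The two price series are hypothetical and share a 25% maximum drawdown; they are not Strategies B and C.

```python
import math


def ulcer_index(prices):
    """Root mean square of fractional retracements from the running peak."""
    peak = prices[0]
    sum_of_squares = 0.0
    for price in prices:
        peak = max(peak, price)
        retracement = (peak - price) / peak
        sum_of_squares += retracement ** 2
    return math.sqrt(sum_of_squares / len(prices))


# Hypothetical series with the same 25% maximum drawdown.
quick = [100, 75, 100, 100, 100, 100, 110]  # sharp dip, instant recovery
slow = [100, 90, 80, 75, 80, 90, 110]       # long, drawn-out drawdown

ui_quick = ulcer_index(quick)
ui_slow = ulcer_index(slow)

# The longer drawdown contributes more squared retracements,
# so its Ulcer Index is higher.
print(f"quick: {ui_quick:.1%}, slow: {ui_slow:.1%}")
```

Squaring the retracements penalizes deep drawdowns more than shallow ones, and summing over every observation penalizes drawdowns that last longer.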
As a case study, the following chart shows the return of a 60/40 portfolio of SPY and AGG rebalanced at the beginning of each year from 01/2004 to 12/2013. Along with the true realized path, I have included the path with the returns reversed and 5 paths with random permutations of the true returns.
The metrics for each path are shown in the table below:
Only the Ulcer Index can fully differentiate among these paths. Even in cases where the maximum drawdown is similar (e.g. the true path and Random 1), the Ulcer Index shows a sharp contrast between the strategies.
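The same experiment can be sketched with simulated data. The returns below are randomly generated stand-ins, not the actual SPY/AGG portfolio returns, and the monthly frequency, seed, and distribution parameters are assumptions made purely for illustration.

```python
import math
import random
import statistics


def path_metrics(returns, periods_per_year=12):
    """Annualized return, volatility, max drawdown and Ulcer Index of a path."""
    value, peak = 1.0, 1.0
    max_dd, sum_of_squares = 0.0, 0.0
    for r in returns:
        value *= 1.0 + r
        peak = max(peak, value)
        retracement = (peak - value) / peak
        max_dd = max(max_dd, retracement)
        sum_of_squares += retracement ** 2
    years = len(returns) / periods_per_year
    n_points = len(returns) + 1  # count the starting value (retracement 0)
    return {
        "ann_return": value ** (1.0 / years) - 1.0,
        "ann_vol": statistics.stdev(returns) * math.sqrt(periods_per_year),
        "max_drawdown": max_dd,
        "ulcer_index": math.sqrt(sum_of_squares / n_points),
    }


rng = random.Random(42)
# Stand-in for the realized monthly returns (10 years of simulated data).
true_returns = [rng.gauss(0.005, 0.03) for _ in range(120)]

paths = {"True": true_returns, "Reversed": true_returns[::-1]}
for i in range(1, 6):
    permuted = true_returns[:]
    rng.shuffle(permuted)
    paths[f"Random {i}"] = permuted

results = {name: path_metrics(rets) for name, rets in paths.items()}
for name, metrics in results.items():
    row = "  ".join(f"{key}={val:7.2%}" for key, val in metrics.items())
    print(f"{name:10s} {row}")
```

Every path shares the same annualized return and volatility by construction, so any differences in the output come from the two drawdown-based columns.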
For a more concrete way of picturing the Ulcer Index, imagine driving a car along a road with a 55 mph speed limit and stoplights spaced every half mile. Traffic is moderately heavy and the lights are poorly timed. As you accelerate, the light down the road turns yellow and then red. Easing off the accelerator increases the time it takes to reach that light, perhaps to the point where you won’t have to stop at all. That reduces both the time spent waiting for the light to change and the acceleration needed to get back up to the speed limit.
You continue down the road anticipating the lights so that you do not brake unnecessarily or burn needless gas racing toward red lights. This reduces not only the variation in your speed (the volatility) but also the amount you have to slow down (the severity) and the time spent waiting at red lights (the duration). The smoother trip is likely to lead to less stress, not to mention less wear and tear on the car, which can cause further headaches.
Ultimately, evaluating a strategy involves more than simple performance metrics, since the methodology driving the strategy is key. But when comparing historical performance, it is helpful to have a toolbox equipped with implements able to measure performance in terms of both profitability and risk, in ways that are amenable to our inherent, risk-averse inclinations.
 One exception is if you owned another strategy that had the correct characteristics relative to strategy B (negative correlation, positive return, and similar volatility) so that the overall return was even smoother than strategy A. Even so, these trends would not have any guarantee of continuing in the future.
 In simulations this is easy to do by reversing the order of the returns.
 Perhaps another interesting metric would be an exponentially weighted Ulcer Index that places more weight on more recent observations.