When Simplicity Met Fragility

By Corey Hoffstein

On October 29, 2018

In Craftsmanship, Portfolio Construction, Risk Management, Weekly Commentary

Summary

Research suggests that simple heuristics are often far more robust than more complicated, theoretically optimal solutions.
Taken too far, we believe simplicity can actually introduce significant fragility into an investment process.
Using trend equity as an example, we demonstrate how using only a single signal to drive portfolio allocations can make a portfolio highly sensitive to the impact of randomness, clouding our ability to determine the difference between skill and luck.
We demonstrate that a slightly more complicated process that combines signals significantly reduces the portfolio’s sensitivity to randomness.
We believe that the optimal level of simplicity is found at the balance of diversification benefit and introduced estimation risk. When a more complicated process can introduce meaningful diversification gain into a strategy or portfolio with little estimation risk, it should be considered.

Introduction

In the world of finance, simple can be surprisingly robust. DeMiguel, Garlappi, and Uppal (2005)¹, for example, demonstrate that a naïve, equal-weight portfolio frequently delivers higher Sharpe ratios, higher certainty-equivalent returns, and lower turnover out-of-sample than competitive “optimal” allocation policies. In one of our favorite papers, Haldane (2012)² demonstrates that simplified heuristics often outperform more complicated algorithms in a variety of fields.

Yet taken to an extreme, we believe that simplicity can have the opposite effect, introducing extreme fragility into an investment strategy.

As an absurd example, consider a highly simplified portfolio that is 100% allocated to U.S. equities. Introducing bonds into the portfolio may not seem like a large mental leap, but consider that this small change introduces an axis of decision making that brings with it a number of considerations. The proportion we allocate between stocks and bonds requires, at the very least, estimates of an investor’s objectives, risk tolerances, market outlook, and confidence levels in these considerations.³

Despite this added complexity, few investors would consider an all-equity portfolio to be more “robust” by almost any reasonable definition of robustness.

Yet this is precisely the type of behavior we see all too often in tactical portfolios – and particularly in trend equity strategies – where investors follow a single signal to make dramatic allocation decisions.

So Close and Yet So Far

To demonstrate the potential fragility of simplicity, we will examine several trend-following signals applied to broad U.S. equities:

Price minus the 10-month moving average
12-1 month total return
13-minus-34-week exponential moving average cross-over
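As a rough illustration, the three signals above could be computed along the following lines. This is a minimal sketch rather than the calculation behind the charts: the function names, the expression of each signal as a percentage distance from zero, the use of month-end prices for the first two signals and weekly prices for the third, and the synthetic price series are all assumptions.

```python
def sma_signal(monthly_prices, window=10):
    """Latest price relative to its simple moving average (assumed form)."""
    average = sum(monthly_prices[-window:]) / float(window)
    return monthly_prices[-1] / average - 1.0


def momentum_12_1_signal(monthly_prices):
    """Return from 12 months ago to 1 month ago, skipping the latest month."""
    return monthly_prices[-2] / monthly_prices[-13] - 1.0


def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1.0)
    average = values[0]
    for value in values[1:]:
        average = alpha * value + (1.0 - alpha) * average
    return average


def ema_cross_signal(weekly_prices, fast=13, slow=34):
    """Fast EMA relative to slow EMA; positive means the fast average is above."""
    return ema(weekly_prices, fast) / ema(weekly_prices, slow) - 1.0


# Synthetic, steadily rising prices (hypothetical data for illustration only)
monthly = [100.0 * 1.01 ** k for k in range(24)]
weekly = [100.0 * 1.002 ** k for k in range(104)]

signals = {
    "10-month SMA": sma_signal(monthly),
    "12-1 momentum": momentum_12_1_signal(monthly),
    "13/34-week EMA": ema_cross_signal(weekly),
}
for name, value in signals.items():
    print(name, "buy" if value > 0 else "sell", round(value, 4))
```

In each case a positive value reads as a buy and a negative value as a sell, so the magnitude can be interpreted as how far the signal is from flipping.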

Below we plot over time the distance each of these signals is from turning off. Whenever the line crosses over the 0% threshold, it means the signal has flipped direction, with negative values indicating a sell and positive values indicating a buy.

In orange we highlight those periods where the signal is within 1% of changing direction. We can see that for each signal there are numerous occasions where the signal was within this threshold but avoided flipping direction. Similarly, we can see a number of scenarios where the signal just breaks the 0% threshold only to revert back shortly thereafter. In the former case, the signal has often just managed to avoid whipsaw, while in the latter it has usually become unfortunately subject to it.

Source: Kenneth French Data Library. Calculations by Newfound Research.

Is the avoidance of whipsaw representative of the “skill” of the signals while the realization of whipsaw is just bad luck? Or might it be that the avoidance of whipsaw is often just as much luck as the realization of whipsaw is poor skill? How can we determine what is skill and what is luck when there are so many “close calls” and “just hits”?

What is potentially confusing for investors new to this space is that academic literature and practitioner evidence find that these highly simplified approaches are surprisingly robust across a variety of investment vehicles, geographies, and time periods. What we must stress, however, is that evidence of general robustness is not evidence of specific robustness; i.e. there is little evidence suggesting that a single approach applied to a single instrument over a specific time horizon will be particularly robust.

What Randomness Tells Us About Fragility

To emphasize the potential fragility of utilizing a single in-or-out signal to drive our allocation decisions, we run a simple test:

Begin with daily market returns
Add a small amount of white noise (mean 0%; standard deviation 0.025%) to daily market returns
Calculate a long/flat trend equity strategy using 12-1 month momentum signals⁴
Calculate the rolling 12-month return of the strategy minus the alternate market history return.
Repeat 1,000 times to generate 1,000 slightly alternate histories.
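The steps above can be sketched in a few lines of plain Python. This is only an approximation of the test under stated assumptions: the base returns are synthetic rather than taken from the Kenneth French Data Library, a month is taken to be 21 trading days, the strategy earns 0% when flat, and only five alternate histories are generated to keep the example fast. All function names are hypothetical.

```python
import random

DAYS_PER_MONTH = 21            # assumption: 21 trading days per month
LOOKBACK = 12 * DAYS_PER_MONTH  # 12 months of daily returns


def compound(returns):
    """Compound a sequence of simple returns into a total return."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    return total - 1.0


def add_noise(returns, rng, stdev=0.00025):
    """Add white noise (mean 0%, standard deviation 0.025%) to daily returns."""
    return [r + rng.gauss(0.0, stdev) for r in returns]


def long_flat_returns(returns):
    """Hold the market on day t only if 12-1 month momentum is positive."""
    out = []
    for t in range(LOOKBACK, len(returns)):
        # return from 12 months ago to 1 month ago, known before day t
        signal = compound(returns[t - LOOKBACK:t - DAYS_PER_MONTH])
        out.append(returns[t] if signal > 0 else 0.0)
    return out


def rolling_relative_returns(returns):
    """Rolling 12-month strategy return minus the (noisy) market return."""
    strategy = long_flat_returns(returns)
    market = returns[LOOKBACK:]
    return [compound(strategy[t - LOOKBACK:t]) - compound(market[t - LOOKBACK:t])
            for t in range(LOOKBACK, len(strategy) + 1)]


rng = random.Random(0)
# hypothetical base history: three years of synthetic daily returns
base = [rng.gauss(0.0003, 0.01) for _ in range(3 * LOOKBACK)]
histories = [rolling_relative_returns(add_noise(base, rng)) for _ in range(5)]
print(len(histories), "alternate histories,",
      len(histories[0]), "rolling observations each")
```

Because each alternate history is benchmarked against its own perturbed market series, differences across histories reflect how the signal responds to the noise rather than the noise itself.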

The design of this test aims to deduce how fragile a strategy is via the introduction of randomness. By measuring 12-month rolling relative returns versus the modified benchmarks, we can compare the 1,000 slightly alternate histories to one another in an effort to determine the overall stability of the strategy itself.

Now bear with us, because while the next graph is a bit difficult to read, it succinctly captures the thrust of our entire thesis. At each point in time, we first calculate the average 12-month relative return of all 1,000 strategies. This average provides a baseline of expected relative strategy performance.

Next, we calculate the maximum and minimum 12-month relative performance and subtract the average. This spread – which is plotted in the graph below – aims to capture the potential return differential around the expected strategy performance due to randomness. Or, put another way, the spread captures the potential impact of luck in strategy results due only to slight changes in market returns.
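For concreteness, the spread calculation can be written as a short function. This is a sketch only: the name `luck_spread` and the tiny three-history input are hypothetical, and the real exercise would pass in 1,000 aligned series of rolling relative returns.

```python
def luck_spread(relative_returns):
    """Max-minus-mean and min-minus-mean across histories at each date.

    relative_returns holds one list per alternate history, aligned in time.
    """
    upper, lower = [], []
    for values in zip(*relative_returns):
        mean = sum(values) / float(len(values))
        upper.append(max(values) - mean)
        lower.append(min(values) - mean)
    return upper, lower


# Three hypothetical histories observed on two dates
example = [[0.02, -0.01],
           [0.05, 0.00],
           [-0.01, 0.04]]
upper, lower = luck_spread(example)
print([round(u, 4) for u in upper], [round(l, 4) for l in lower])
```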

Source: Kenneth French Data Library. Calculations by Newfound Research.

We can see that the spread frequently exceeds 5% and sometimes even exceeds 10%. Thus, a tiny bit of injected randomness has a massive effect upon our realized results. Using a single signal to drive our allocation appears particularly fragile and success or failure over the short run can largely be dictated by the direction the random winds blow.

A backtest based upon a single signal may look particularly good, but this evidence suggests we should dampen our confidence as the strategy may actually have just been the accidental beneficiary of good fortune. In this situation, it is nearly impossible to identify skill from luck when in a slightly alternate universe we may have had substantially different results. After all, good luck in the past can easily turn into misfortune in the future.

Now let us perform the same exercise again using the same random sequences we generated. But rather than using a single signal to drive our allocation we will blend the three trend-following approaches above to determine the proportional amount of equities the portfolio should hold.⁵ We plot the results below using the same scale in the y-axis as the prior plot.

Source: Kenneth French Data Library. Calculations by Newfound Research.

We can see that our more complicated approach actually exhibits a significant reduction in the effects of randomness, with outlier events significantly decreased and far more symmetry in both positive and negative impacts.
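One simple way to read the blending step is sketched below. It assumes an equal-weight blend in which each signal votes either fully in or fully out of equities; the function name and the sample signal values are hypothetical.

```python
def blended_allocation(signals):
    """Equity allocation as the equal-weighted share of positive signals."""
    votes = [1.0 if s > 0 else 0.0 for s in signals]
    return sum(votes) / len(votes)


# Hypothetical signal readings: two on, one narrowly off
print(blended_allocation([0.03, -0.001, 0.002]))
# A single-signal strategy driven by the second reading would hold 0% equities
print(blended_allocation([-0.001]))
```

Under this assumption a flip in any one signal moves the allocation by one third rather than by 100%, which is one intuition for why the blended approach is less sensitive to small perturbations.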

Below we plot the actual spreads themselves. We can see that the spread from the combined signal approach is lower than the single signal approach on a fairly consistent basis. In the cases where the spread is larger, it is usually because the sensitivity is arising from either the 10-month SMA or 13-minus-34-week EWMA signals. Were spreads for single signal strategies based upon those approaches plotted, they would likely be larger during those time periods.

Source: Kenneth French Data Library. Calculations by Newfound Research.

Conclusion

So, where is the balance? How can we tell when simplicity creates robustness and when simplicity introduces fragility? As we discussed in our article A Case Against Overweighting International Equity, we believe the answer is diversification versus estimation risk.

In our case above, each trend signal is just a model: an estimate of what the underlying trend is. As with all models, it is imprecise and our confidence level in any individual signal at any point in time being correct may actually be fairly low. We can wrap this all together by simply saying that each signal is actually shrouded in a distribution of estimation risk. But by combining multiple trend signals, we exploit the benefits of diversification in an effort to reduce our overall estimation risk.

Thus, while we may consider a multi-model approach less transparent and more complicated, that added layer of complication serves to increase internal diversification and reduce estimation risk.

It should not go overlooked that the manner in which the signals were blended represents a model with its own estimation risk. Our choice to simply equally-weight the signals indicates a zero-confidence position in views about relative model accuracy and relative marginal diversification benefits among the models. Had we chosen a more complicated method of combining signals, it is entirely possible that the realized estimation risk could overwhelm the diversification gain we aimed to benefit from in the first place. Or, conversely, that very same added estimation risk could be entirely justified if we could continue to meaningfully improve diversification benefits.

If we return to our original example of a 100% equity portfolio versus a blended stock-bond mix, the diversification versus estimation risk trade-off becomes obvious. Introducing bonds into our portfolio creates such a significant diversification gain that the estimation risk is often an insignificant consideration. The same might not be true, however, in a tactical equity portfolio.

Research and empirical evidence suggest that simplicity is surprisingly robust. But we should be skeptical of simplicity for the sake of simplicity when it forgoes low-hanging diversification opportunities, lest we make our portfolios and strategies unintentionally fragile.

Trade Optimization

By Corey Hoffstein

On August 27, 2018

In Craftsmanship, Portfolio Construction, Weekly Commentary

Trade optimization is a more technical topic than we usually cover in our published research. Therefore, this note relies heavily on mathematical notation and assumes readers have a basic understanding of optimization. Accompanying the commentary is code written in Python, meant to provide concrete examples of how these ideas can be implemented. The Python code leverages the PuLP optimization library.

Readers not proficient in these areas may still benefit from reading the Introduction and evaluating the example outlined in Section 5.

Summary

In practice, portfolio managers must account for the real-world implementation costs – both explicit (e.g. commission) and implicit (e.g. bid/ask spread and impact) – associated with trading portfolios.
Managers often implement trade paring constraints that may limit the number of allowed securities, the number of executed trades, the size of a trade, or the size of holdings. These constraints can turn a well-formed convex optimization into a discrete problem.
In this note, we explore how to formulate trade optimization as a Mixed-Integer Linear Programming (“MILP”) problem and implement an example in Python.

0. Initialize Python Libraries

import pandas
import numpy

from pulp import *

import scipy.optimize

1. Introduction

In the context of portfolio construction, trade optimization is the process of managing the transactions necessary to move from one set of portfolio weights to another. These optimizations can play an important role both in the cases of rebalancing as well as in the case of a cash infusion or withdrawal. The reason for controlling these trades is to try to minimize the explicit (e.g. commission) and implicit (e.g. bid/ask spread and impact) costs associated with trading.

Two approaches are often taken to trade optimization:

Trading costs and constraints are explicitly considered within portfolio construction. For example, a portfolio optimization that seeks to maximize exposure to some alpha source may incorporate explicit measures of transaction costs or constrain the number of trades that are allowed to occur at any given rebalance.
Portfolio construction and trade optimization occur in a two-step process. For example, a portfolio optimization may take place that creates the “ideal” portfolio, ignoring consideration of trading constraints and costs. Trade optimization would then occur as a second step, seeking to identify the trades that would move the current portfolio “as close as possible” to the target portfolio while minimizing costs or respecting trade constraints.

These two approaches will not necessarily arrive at the same result. At Newfound, we prefer the latter approach, as we believe it creates more transparency in portfolio construction. Combining trade optimization within portfolio optimization can also lead to complicated constraints, leading to infeasible optimizations. Furthermore, the separation of portfolio optimization and trade optimization allows us to target the same model portfolio across all strategy implementations, but vary when and how different portfolios trade depending upon account size and costs.

For example, a highly tactical strategy implemented as a pooled vehicle with a large asset base and penny-per-share commissions can likely afford to execute a much higher number of trades than an investor trying to implement the same strategy with $250,000 and $7.99 ticket charges. While implicit and explicit trading costs will create a fixed drag upon strategy returns, failing to implement each trade as dictated by a hypothetical model will create tracking error.

Ultimately, the goal is to minimize the fixed costs while staying within an acceptable distance (e.g. turnover distance or tracking error) of our target portfolio. Often, this goal is expressed by a portfolio manager with a number of semi-ad-hoc constraints or optimization targets. For example:

Asset Paring. A constraint that specifies the minimum or maximum number of securities that can be held by the portfolio.
Trade Paring. A constraint that specifies the minimum or maximum number of trades that may be executed.
Level Paring. A constraint that establishes a minimum level threshold for securities (e.g. securities must be at least 1% of the portfolio) or trades (e.g. all trades must be larger than 0.5%).
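To make the three paring constraints concrete, the sketch below checks a proposed set of trades against each of them. The function name, argument names, and sample numbers are hypothetical, and only the "maximum" form of the asset and trade paring constraints is shown.

```python
def check_paring(weights, trades, max_assets=None, max_trades=None,
                 min_weight=0.0, min_trade=0.0, tol=1e-9):
    """Report which paring constraints a proposed set of trades satisfies."""
    new_weights = [w + t for w, t in zip(weights, trades)]
    held = [w for w in new_weights if w > tol]           # positions after trading
    traded = [abs(t) for t in trades if abs(t) > tol]    # non-zero trades
    return {
        "asset_paring": max_assets is None or len(held) <= max_assets,
        "trade_paring": max_trades is None or len(traded) <= max_trades,
        "level_paring": (all(w >= min_weight - tol for w in held)
                         and all(t >= min_trade - tol for t in traded)),
    }


# Hypothetical four-asset portfolio: the last trade opens a 0.3% position
weights = [0.50, 0.30, 0.20, 0.00]
trades = [-0.10, 0.00, 0.097, 0.003]
print(check_paring(weights, trades, max_assets=3, max_trades=3,
                   min_weight=0.01, min_trade=0.005))
```

Each check depends on whether a position or trade is zero or non-zero, which is exactly the kind of on/off condition that makes the optimization discrete.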

Unfortunately, these constraints often turn the portfolio optimization problem from continuous to discrete, which makes the process of optimization more difficult.

2. The Discreteness Problem

Consider the following simplified scenario. Given our current, drifted portfolio weights $w_{old}$ and a new set of target model weights $w_{target}$, we want to minimize the number of trades we need to execute to bring our portfolio within some acceptable turnover threshold level, $\theta$. We can define this as the optimization problem:

\begin{aligned} & \text{minimize} & & \sum\limits_{i} 1_{|t_i| > 0} \\ & \text{subject to} & & \sum\limits_{i} |w_{target, i} - (w_{old, i} + t_i)| \le 2 * \theta \\ & & & \sum\limits_{i} t_i = 0 \\ & \text{and} & & t_i \ge -w_{old,i} \end{aligned}

Unfortunately, as we will see below, simply trying to throw this problem into an off-the-shelf convex optimizer, as is, will lead to some potentially odd results. And we have not even introduced any complex paring constraints!
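Before turning to a solver, it can help to see the objective and constraints evaluated directly for a candidate set of trades. The sketch below does this in plain Python; the function names, the tolerance, and the three-asset example are hypothetical.

```python
def count_trades(trades, tol=1e-8):
    """Objective: the number of non-zero trades."""
    return sum(1 for t in trades if abs(t) > tol)


def turnover_distance(w_target, w_old, trades):
    """Half the sum of absolute differences between target and post-trade weights."""
    return sum(abs(wt - (wo + t))
               for wt, wo, t in zip(w_target, w_old, trades)) / 2.0


def is_feasible(w_target, w_old, trades, theta, tol=1e-8):
    """Check the three constraints of the trade minimization problem."""
    within_threshold = turnover_distance(w_target, w_old, trades) <= theta + tol
    fully_funded = abs(sum(trades)) <= tol
    no_shorting = all(t >= -wo - tol for t, wo in zip(trades, w_old))
    return within_threshold and fully_funded and no_shorting


# Hypothetical three-asset example with a 5% turnover threshold
w_old = [0.5, 0.3, 0.2]
w_target = [0.4, 0.4, 0.2]
for trades in ([0.0, 0.0, 0.0], [-0.05, 0.05, 0.0], [-0.10, 0.10, 0.0]):
    print(trades, count_trades(trades), is_feasible(w_target, w_old, trades, 0.05))
```

Requiring the sum of absolute differences to be at most $2 * \theta$ is the same as requiring the turnover distance, defined as half that sum, to be at most $\theta$.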

2.1 Example Data

# setup some sample data
tickers = "amj bkln bwx cwb emlc hyg idv lqd \
           pbp pcy pff rem shy tlt vnq vnqi vym".split()

w_target = pandas.Series([float(x) for x in "0.04095391 0.206519656 0 \
                      0.061190655 0.049414401 0.105442705 0.038080766 \
                      0.07004622 0.045115708 0.08508047 0.115974239 \
                      0.076953702 0 0.005797291 0.008955226 0.050530852 \
                      0.0399442".split()], index = tickers)

w_old = pandas.Series([float(x) for x in \
                   "0.058788745 0.25 0 0.098132817 \
                    0 0.134293993 0.06144967 0.102295438 \
                    0.074200473 0 0 0.118318536 0 0 \
                    0.04774768 0 0.054772649".split()], \
                      index = tickers)

n = len(tickers)

w_diff = w_target - w_old

2.2 Applying a Naive Convex Optimizer

The example below demonstrates the numerical issues associated with attempting to solve discrete problems with traditional convex optimizers. Using the portfolio and target weights established above, we run a naive optimization that seeks to minimize the number of trades necessary to bring our holdings within a 5% turnover threshold from the target weights.

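# Try a naive optimization with SLSQP

theta = 0.05

def _fmin(t):
    # objective: the number of non-zero trades
    return float(numpy.sum(numpy.abs(t) > 1e-8))

def _distance_constraint(t):
    # inequality constraint (must be >= 0): the turnover distance between
    # post-trade weights and target weights may not exceed theta
    return theta - numpy.sum(numpy.abs(w_diff.values - t)) / 2.

def _sums_to_zero(t):
    # equality constraint (must be == 0): trades are fully funded
    return numpy.sum(t)

# initial guess: trade all the way to the target weights
t0 = w_diff.values.copy()

# no position can be sold below a zero weight
bounds = [(-w_old.iloc[i], 1) for i in range(0, n)]

result = scipy.optimize.fmin_slsqp(_fmin, t0, bounds = bounds, \
                                   eqcons = [_sums_to_zero], \
                                   ieqcons = [_distance_constraint], \
                                   disp = -1)

result =  pandas.Series(result, index = tickers)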

Note that the trades we received are simply $w_{target} - w_{old}$ , which was our initial guess for the optimization. The optimizer didn’t optimize.

What’s going on? Well, many off-the-shelf optimizers – such as the Sequential Least Squares Programming (SLSQP) approach applied here – will attempt to solve this problem by first estimating the gradient of the problem to decide which direction to move in search of the optimal solution. To achieve this numerically, small perturbations are made to the input vector and their influence on the resulting output is calculated.

In this case, small changes are unlikely to create an influence in the problem we are trying to minimize. Whether the trade is 5% or 5.0001% will have no influence on the *number* of trades executed. So the first derivative will appear to be zero and the optimizer will exit.
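The effect is easy to reproduce without a solver. The sketch below estimates the gradient of the trade count by forward finite differences, in the spirit of what a gradient-based optimizer does when no analytic gradient is supplied. The step size, function names, and sample trades are assumptions.

```python
def count_trades(trades, tol=1e-8):
    """The objective: number of non-zero trades."""
    return sum(1 for t in trades if abs(t) > tol)


def finite_difference_gradient(f, x, step=1e-6):
    """Forward-difference estimate of the gradient of f at x."""
    base = f(x)
    gradient = []
    for i in range(len(x)):
        bumped = list(x)
        bumped[i] += step  # nudge one trade by a tiny amount
        gradient.append((f(bumped) - base) / step)
    return gradient


trades = [0.05, -0.03, -0.02]  # hypothetical trades
print(finite_difference_gradient(count_trades, trades))
```

Every entry is zero because nudging a non-zero trade leaves it non-zero, so the optimizer sees a perfectly flat objective and has no direction in which to move.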

Fortunately, with a bit of elbow grease, we can turn this problem into a mixed integer linear programming problem (“MILP”), which has its own set of efficient optimization tools (in this article, we will use the PuLP library for the Python programming language). MILPs are a category of optimization problems that take the standard form:

\begin{aligned} & \text{minimize} & & c^{T}x + h^{T}y \\ & \text{subject to} & & Ax + Gy \le b \\ & \text{and} & & x \in \mathbb{Z}^{n} \end{aligned}

Here b is a vector and A and G are matrices. Don’t worry too much about the form.

The important takeaway is that we need to (1) express our minimization problem as a linear function and (2) express our constraints as a set of linear inequalities.

But first, for us to take advantage of linear programming tools, we need to eliminate our absolute values and indicator functions and somehow transform them into linear constraints.

3. Linear Programming Transformation Techniques

3.1 Absolute Values

Consider an optimization of the form:

\begin{aligned} & \text{minimize} & & \sum\limits_{i} |x_i| \\ & \text{subject to} & & ... \end{aligned}

To get rid of the absolute value function, we can rewrite the function as a minimization of a new variable, $\psi$ .

\begin{aligned} & \text{minimize} & & \sum\limits_{i} \psi_i \\ & \text{subject to} & & \psi_i \ge x_i \\ & & & \psi_i \ge -x_i \\ & \text{and} & & ... \end{aligned}

The combination of constraints makes it such that $\psi_i \ge |x_i|$. When $x_i$ is positive, $\psi_i$ is constrained by the first constraint and when $x_i$ is negative, it is constrained by the latter. Since the optimization seeks to minimize the sum of each $\psi_i$, and we know $\psi_i$ will be non-negative, the optimizer will reduce $\psi_i$ to equal $|x_i|$, which is its minimum possible value.

Below is an example of this trick in action. Our goal is to minimize the absolute value of some variables $x_i$ . We apply bounds on each $x_i$ to allow the problem to converge on a solution.

lp_problem = LpProblem("Absolute Values", LpMinimize)

x_vars = []
psi_vars = []

# lower and upper bounds for each x_i
bounds = [[1, 7], [-10, 0], [-9, -1], [-1, 5], [6, 9]]

print("Bounds for x: ")
print(pandas.DataFrame(bounds, columns = ["Left", "Right"]))

for i in range(5):
    x_i = LpVariable("x_" + str(i), None, None)
    x_vars.append(x_i)

    psi_i = LpVariable("psi_" + str(i), None, None)
    psi_vars.append(psi_i)

# minimize the sum of psi, which stands in for the sum of |x|
lp_problem += lpSum(psi_vars), "Objective"

for i in range(5):
    # together these two constraints give psi_i >= |x_i|
    lp_problem += psi_vars[i] >= -x_vars[i]
    lp_problem += psi_vars[i] >= x_vars[i]

    # keep x_i within its bounds
    lp_problem += x_vars[i] >= bounds[i][0]
    lp_problem += x_vars[i] <= bounds[i][1]

lp_problem.solve()

print("\nx variables")
print(pandas.Series([x_i.value() for x_i in x_vars]))

print("\npsi Variables (|x|):")
print(pandas.Series([psi_i.value() for psi_i in psi_vars]))

Bounds for x: 
   Left  Right
0     1      7
1   -10      0
2    -9     -1
3    -1      5
4     6      9

x variables
0    1.0
1    0.0
2   -1.0
3    0.0
4    6.0
dtype: float64

psi Variables (|x|):
0    1.0
1    0.0
2    1.0
3    0.0
4    6.0
dtype: float64

3.2 Indicator Functions

Consider an optimization problem of the form:

\begin{aligned} & \text{minimize} & & \sum\limits_{i} 1_{x_i > 0} \\ & \text{subject to} & & ... \end{aligned}

We can re-write this problem by introducing a new variable, $y_i$ , and adding a set of linear constraints:

\begin{aligned} & \text{minimize} & & \sum\limits_{i} y_i \\ & \text{subject to} & & x_i \le A*y_i\\ & & & y_i \ge 0 \\& & & y_i \le 1 \\ & & & y_i \in \mathbb{Z} \\ & \text{and} & & ... \end{aligned}

Note that the last three constraints, when taken together, tell us that $y_i \in \{0, 1\}$. The constant $A$ should be large, bigger than any value that $x_i$ can take. Let’s assume $A = max(x) + 1$.

Let’s first consider what happens when $x_i \le 0$ . In such a case, $y_i$ can be set to zero without violating any constraints. When $x_i$ is positive, however, for $x_i \le A*y_i$ to be true, it must be the case that $y_i = 1$ .

What prevents $y_i$ from equalling 1 in the case where $x_i \le 0$ is the goal of minimizing the sum of $y_i$ , which will force $y_i$ to be 0 whenever possible.

Below is a sample problem demonstrating this trick, similar to the example described in the prior section.

lp_problem = LpProblem("Indicator Function", LpMinimize)

x_vars = []
y_vars = []

# lower and upper bounds for each x_i
bounds = [[-4, 1], [-3, 5], [-6, 1], [1, 7], [-5, 9]]

# large constant, bigger than any value x_i can take
A = 11

print("Bounds for x: ")
print(pandas.DataFrame(bounds, columns = ["Left", "Right"]))

for i in range(5):
    x_i = LpVariable("x_" + str(i), None, None)
    x_vars.append(x_i)

    # integer variable bounded by 0 and 1, i.e. y_i in {0, 1}
    y_i = LpVariable("ind_" + str(i), 0, 1, LpInteger)
    y_vars.append(y_i)

# minimize the number of indicators that are switched on
lp_problem += lpSum(y_vars), "Objective"

for i in range(5):
    # keep x_i within its bounds
    lp_problem += x_vars[i] >= bounds[i][0]
    lp_problem += x_vars[i] <= bounds[i][1]

    # y_i must be 1 whenever x_i is positive
    lp_problem += x_vars[i] <= A * y_vars[i]

lp_problem.solve()

print("\nx variables")
print(pandas.Series([x_i.value() for x_i in x_vars]))

print("\ny Variables (Indicator):")
print(pandas.Series([y_i.value() for y_i in y_vars]))

Bounds for x: 
   Left  Right
0    -4      1
1    -3      5
2    -6      1
3     1      7
4    -5      9

x variables
0   -4.0
1   -3.0
2   -6.0
3    1.0
4   -5.0
dtype: float64

y Variables (Indicator):
0    0.0
1    0.0
2    0.0
3    1.0
4    0.0
dtype: float64

3.3 Tying the Tricks Together

A problem arises when we try to tie these two tricks together, as both tricks rely upon the minimization function itself. The $\psi_i$ are dragged to the absolute value of $x_i$ because we minimize over them. Similarly, $y_i$ is dragged to zero when the indicator should be off because we are minimizing over it.

What happens, however, if we want to solve a problem of the form:

\begin{aligned} & \text{minimize} & & \sum\limits_{i} 1_{|x_i| > 0} \\ & \text{subject to} & & ... \end{aligned}

One way of trying to solve this problem is by using our tricks and then combining the objectives into a single sum.

\begin{aligned} & \text{minimize} & & \sum\limits_{i} (y_i + \psi_i) \\ & \text{subject to} & & \psi_i \ge x_i \\ & & & \psi_i \ge -x_i \\ & & & \psi_i \le A*y_i\\ & & & y_i \ge 0 \\ & & & y_i \le 1 \\ & & & y_i \in \mathbb{Z} \\ & \text{and} & & ... \end{aligned}

By minimizing over the sum of both variables, $\psi_i$ is forced towards $|x_i|$ and $y_i$ is forced to zero when $\psi_i = 0$ .

Below is an example demonstrating this solution, again similar to the examples discussed in prior sections.

lp_problem = LpProblem("Indicator of Absolute Values", LpMinimize)

x_vars = []
psi_vars = []
y_vars = []

# lower and upper bounds for each x_i
bounds = [[-7, 3], [7, 8], [5, 9], [1, 4], [-6, 2]]

# large constant, bigger than any value |x_i| can take
A = 11

print("Bounds for x: ")
print(pandas.DataFrame(bounds, columns = ["Left", "Right"]))

for i in range(5):
    x_i = LpVariable("x_" + str(i), None, None)
    x_vars.append(x_i)

    psi_i = LpVariable("psi_" + str(i), None, None)
    psi_vars.append(psi_i)

    # integer variable bounded by 0 and 1, i.e. y_i in {0, 1}
    y_i = LpVariable("ind_" + str(i), 0, 1, LpInteger)
    y_vars.append(y_i)

# minimize over both the indicators and the absolute values
lp_problem += lpSum(y_vars) + lpSum(psi_vars), "Objective"

for i in range(5):
    # keep x_i within its bounds
    lp_problem += x_vars[i] >= bounds[i][0]
    lp_problem += x_vars[i] <= bounds[i][1]

for i in range(5):
    # together these two constraints give psi_i >= |x_i|
    lp_problem += psi_vars[i] >= -x_vars[i]
    lp_problem += psi_vars[i] >= x_vars[i]

    # y_i must be 1 whenever |x_i| is positive
    lp_problem += psi_vars[i] <= A * y_vars[i]

lp_problem.solve()

print("\nx variables")
print(pandas.Series([x_i.value() for x_i in x_vars]))

print("\npsi Variables (|x|):")
print(pandas.Series([psi_i.value() for psi_i in psi_vars]))

print("\ny Variables (Indicator):")
print(pandas.Series([y_i.value() for y_i in y_vars]))

Bounds for x: 
   Left  Right
0    -7      3
1     7      8
2     5      9
3     1      4
4    -6      2

x variables
0    0.0
1    7.0
2    5.0
3    1.0
4    0.0
dtype: float64

psi Variables (|x|):
0    0.0
1    7.0
2    5.0
3    1.0
4    0.0
dtype: float64

y Variables (Indicator):
0    0.0
1    1.0
2    1.0
3    1.0
4    0.0
dtype: float64

4. Building a Trade Minimization Model

Returning to our original problem,

\begin{aligned} & \text{minimize} & & \sum\limits_{i} 1_{|t_i| > 0} \\ & \text{subject to} & & \sum\limits_{i} |w_{target, i} - (w_{old, i} + t_i)| \le 2 * \theta \\ & & & \sum\limits_{i} t_i = 0 \\ & \text{and} & & t_i \ge -w_{old,i} \end{aligned}

We can now use the tricks we have established above to re-write this problem as:

\begin{aligned} & \text{minimize} & & \sum\limits_{i} (\phi_i + \psi_i + y_i) \\ & \text{subject to} & & \psi_i \ge t_i \\ & & & \psi_i \ge -t_i \\ & & & \psi_i \le A*y_i \\ & & & \phi_i \ge (w_{target,i} - (w_{old,i} + t_i))\\ & & & \phi_i \ge -(w_{target,i} - (w_{old,i} + t_i)) \\ & & & \sum\limits_{i} \phi_i \le 2 * \theta \\ & & & \sum\limits_{i} t_i = 0 \\ & \text{and} & & t_i \ge -w_{old,i} \end{aligned}

While there are a large number of constraints present, in reality there are just a few key steps going on. First, our key variable in question is $t_i$ . We then use our absolute value trick to create $\psi_i = |t_i|$ . Next, we use the indicator function trick to create $y_i$ , which tells us whether each position is traded or not. Ultimately, this is the variable we are trying to minimize.

Next, we have to deal with our turnover constraint. Again, we invoke the absolute value trick to create $\phi_i$, and replace our turnover constraint with a sum of $\phi$’s.

Et voila?

As it turns out, not quite.

Consider a simple two-asset portfolio. The current weights are [0.25, 0.75] and we want to get these weights within 0.05 of [0.5, 0.5] (using the $L^1$ norm – i.e. the sum of absolute values – as our definition of “distance”).

Let’s consider the solution [0.475, 0.525]. At this point, $\phi = [0.025, 0.025]$ and $\psi = [0.225, 0.225]$ . Is this solution “better” than [0.5, 0.5]? At [0.5, 0.5], $\phi = [0.0, 0.0]$ and $\psi = [0.25, 0.25]$ . From the optimizer’s viewpoint, these are equivalent solutions. Within this region, there are an infinite number of possible solutions.
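The equivalence can be checked with a few lines of arithmetic. The sketch below computes the sum of $\phi$ and the sum of $\psi$ for the two portfolios discussed above, plus one hypothetical intermediate portfolio ([0.49, 0.51]) added for illustration; the function name is also an assumption.

```python
w_old = [0.25, 0.75]
w_target = [0.5, 0.5]


def objective_terms(w_new):
    """Sum of phi (distance to target) and sum of psi (size of trades)."""
    phi = sum(abs(wt - wn) for wt, wn in zip(w_target, w_new))
    psi = sum(abs(wn - wo) for wn, wo in zip(w_new, w_old))
    return phi, psi


for candidate in ([0.475, 0.525], [0.49, 0.51], [0.5, 0.5]):
    phi, psi = objective_terms(candidate)
    print(candidate, "phi:", round(phi, 4), "psi:", round(psi, 4),
          "phi + psi:", round(phi + psi, 4))
```

Every candidate trades both assets, so the sum of $y$ is 2 in each case, and any reduction in $\phi$ is offset one-for-one by an increase in $\psi$. The combined objective therefore cannot distinguish between them.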

Yet if we are willing to let some of our tricks “fail,” we can find a solution. If we want to get as close as possible, we effectively want to minimize the sum of $\phi$’s. The infinite solutions problem arises when we simultaneously try to minimize the sum of $\psi$’s and $\phi$’s, which offset each other.

Do we actually need the values of $\psi$ to be correct? As it turns out: no. All we really need is that $\psi_i$ is positive when $t_i$ is non-zero, which will then force $y_i$ to be 1. By minimizing on $y_i$ , $\psi_i$ will still be forced to 0 when $t_i = 0$ .

So if we simply remove $\psi_i$ from the minimization, we will end up reducing the number of trades as far as possible and then reducing the distance to the target model as much as possible given that trade level.

\begin{aligned} & \text{minimize} & & \sum\limits_{i} (\phi_i + y_i) \\ & \text{subject to} & & \psi_i \ge t_i \\ & & & \psi_i \ge -t_i \\ & & & \psi_i \le A*y_i \\ & & & \phi_i \ge (w_{target,i} - (w_{old,i} + t_i))\\ & & & \phi_i \ge -(w_{target,i} - (w_{old,i} + t_i)) \\ & & & \sum\limits_{i} \phi_i \le 2 * \theta \\ & & & \sum\limits_{i} t_i = 0 \\ & \text{and} & & t_i \ge -w_{old,i} \end{aligned}

As a side note, because the sum of $\phi$’s will at most equal 2 and the sum of $y$’s can equal the number of assets in the portfolio, the optimizer will get more minimization bang for its buck by focusing on reducing the number of trades first before reducing the distance to the target model. This priority can be adjusted by multiplying $\phi_i$ by a sufficiently large scalar in our objective.

theta = 0.05

trading_model = LpProblem("Trade Minimization Problem", LpMinimize)

t_vars = []
psi_vars = []
phi_vars = []
y_vars = []

# large constant, bigger than any value |t_i| can take
A = 2

for i in range(n):
    # trade in asset i, bounded so the post-trade weight stays within [0, 1]
    t = LpVariable("t_" + str(i), -w_old.iloc[i], 1 - w_old.iloc[i])
    t_vars.append(t)

    # psi_i is positive whenever t_i is non-zero
    psi = LpVariable("psi_" + str(i), None, None)
    psi_vars.append(psi)

    # phi_i is the absolute distance of asset i from its target weight
    phi = LpVariable("phi_" + str(i), None, None)
    phi_vars.append(phi)

    y = LpVariable("y_" + str(i), 0, 1, LpInteger) #set y in {0, 1}
    y_vars.append(y)


# add our objective: minimize the number of trades (y) and then the
# remaining distance to the target weights (phi)
trading_model += lpSum(phi_vars) + lpSum(y_vars), "Objective"

for i in range(n):
    trading_model += psi_vars[i] >= -t_vars[i]
    trading_model += psi_vars[i] >= t_vars[i]
    trading_model += psi_vars[i] <= A * y_vars[i]

for i in range(n):
    trading_model += phi_vars[i] >= -(w_diff.iloc[i] - t_vars[i])
    trading_model += phi_vars[i] >= (w_diff.iloc[i] - t_vars[i])

# Make sure our trades sum to zero
trading_model += (lpSum(t_vars) == 0)

# Keep the turnover distance to the target weights within theta
trading_model += (lpSum(phi_vars) / 2. <= theta)

trading_model.solve()

results = pandas.Series([t_i.value() for t_i in t_vars], index = tickers)

print("Number of trades: " + str(sum([y_i.value() for y_i in y_vars])))

print("Turnover distance: " + str((w_target - (w_old + results)).abs().sum() / 2.))

Number of trades: 12.0
Turnover distance: 0.032663284500000014

5. A Sector Rotation Example

As an example of applying trade paring, we construct a sample sector rotation strategy. The investment universe consists of nine sector ETFs (XLB, XLE, XLF, XLI, XLK, XLP, XLU, XLV and XLY). The sectors are ranked by their 12-1 month total returns and the portfolio holds the four top-ranking ETFs in equal weight. To reduce timing luck, we apply a four-week tranching process.
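The ranking step for a single rebalance date might look like the sketch below. The momentum values are made up for illustration, only a subset of the universe is shown, the function name is hypothetical, and the four-week tranching process is not modeled.

```python
def top_ranked_equal_weight(momentum, n_hold=4):
    """Equal-weight the n_hold tickers with the highest momentum."""
    ranked = sorted(momentum, key=momentum.get, reverse=True)
    held = set(ranked[:n_hold])
    return {ticker: (1.0 / n_hold if ticker in held else 0.0)
            for ticker in momentum}


# Hypothetical 12-1 month total returns for a subset of the sector ETFs
momentum = {"XLB": 0.04, "XLE": -0.02, "XLF": 0.11, "XLI": 0.07,
            "XLK": 0.15, "XLU": 0.01, "XLV": 0.09, "XLY": 0.12}
weights = top_ranked_equal_weight(momentum)
print({ticker: w for ticker, w in weights.items() if w > 0})
```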

We construct three versions of the strategy.

Naive: A version which rebalances back to hypothetical model weights on a weekly basis.
Filtered: A version that rebalances back to hypothetical model weights when drifted portfolio weights exceed a 5% turnover distance from target weights.
Trade Pared: A version that applies trade paring to rebalance back to within a 1% turnover distance from target weights when drifted weights exceed a 5% turnover distance from target weights.
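
The 5% trigger shared by the Filtered and Trade Pared versions can be sketched in a few lines. The function names and the drifted weights below are hypothetical; turnover distance is taken, as elsewhere in this note, to be half the sum of absolute weight differences.

```python
def turnover_distance(current, target):
    """Half the sum of absolute weight differences between two portfolios."""
    tickers = set(current) | set(target)
    return sum(abs(target.get(t, 0.0) - current.get(t, 0.0)) for t in tickers) / 2.0


def needs_rebalance(current, target, trigger=0.05):
    """True when drifted weights exceed the turnover-distance trigger."""
    return turnover_distance(current, target) > trigger


# Hypothetical target and drifted weights, for illustration only.
target = {"XLK": 0.25, "XLF": 0.25, "XLI": 0.25, "XLV": 0.25}
small_drift = {"XLK": 0.26, "XLF": 0.25, "XLI": 0.25, "XLV": 0.24}
large_drift = {"XLK": 0.32, "XLF": 0.24, "XLI": 0.22, "XLV": 0.22}

print(needs_rebalance(small_drift, target))
print(needs_rebalance(large_drift, target))
```

The two versions differ only in what happens once the trigger fires: Filtered trades all the way back to target, while Trade Pared stops once the distance is back within 1%.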

The equity curves and per-year trade counts are plotted for each version below. Note that the equity curves do not account for any implicit or explicit trading costs.

Data Source: CSI. Calculations by Newfound Research. Past performance does not guarantee future results. All returns are hypothetical index returns. You cannot invest directly in an index and unmanaged index returns do not reflect any fees, expenses, sales charges, or trading expenses. Index returns include the reinvestment of dividends. No index is meant to measure any strategy that is or ever has been managed by Newfound Research. The indices were constructed by Newfound in August 2018 for purposes of this analysis and are therefore entirely backtested and not investment strategies that are currently managed and offered by Newfound.

For the reporting period covering full years (2001 – 2017), the trade filtering process alone reduced the average number of annual trades by 40.6% (from 255.7 to 151.7). The added trade paring process reduced the number of trades another 50.9% (from 151.7 to 74.5), for a total reduction of 70.9%.

6. Possible Extensions & Limitations

There are a number of extensions that can be made to this model, including:

Accounting for trading costs. Instead of minimizing the number of trades, we could minimize the total cost of trading by multiplying each trade against an estimate of cost (including bid/ask spread, commission, and impact).
Forcing accuracy. There may be positions for which greater drift can be permitted and others where drift is less desired. This can be achieved by adding specific constraints to our $\phi_i$ variables.
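
As a rough sketch of how these two extensions might enter the model, assume hypothetical per-asset cost estimates $c_i$ (cost per unit of weight traded) and per-asset drift tolerances $\epsilon_i$; neither symbol appears in the formulation above. With $\psi_i \geq |t_i|$ measuring the size of each trade and $\phi_i$ measuring each position's absolute distance from target, one possible form is:

$$\min \; \sum_{i} \phi_i + \sum_{i} c_i \psi_i \quad \text{subject to} \quad \phi_i \leq \epsilon_i \;\; \text{for each position } i \text{ with a drift limit}$$

Both additions are linear, so the problem remains a mixed-integer linear program. A fixed per-ticket cost could instead be modeled by replacing $\sum_i c_i \psi_i$ with $\sum_i c_i y_i$.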

Unfortunately, there are also a number of limitations. The first set is due to the fact we are formulating our optimization as a linear program. This means that quadratic constraints or objectives, such as tracking error constraints, are forbidden. The second set is due to the complexity of the optimization problem. While the problem may be technically solvable, problems containing a large number of securities and constraints may be time infeasible.

6.1 Non-Linear Constraints

In the former case, we can choose to move to a mixed integer quadratic programming framework. Or, we can also employ multi-step heuristic methods to find feasible, though potentially non-optimal, solutions.

For example, consider the case where we wish our optimized portfolio to fall within a certain tracking error constraint of our target portfolio. Prior to optimization, the marginal contribution to tracking error can be calculated for each asset and the total current tracking error can be calculated. A constraint can then be added such that the current tracking error minus the sum of weighted marginal contributions must be less than the tracking error target. After the optimization is complete, we can determine whether our solution meets the tracking error constraint.

If it does not, we can use our solution as our new $w_{old}$ , re-calculate our tracking error and marginal contribution figures, and re-optimize. This iterative approach approximates a gradient descent approach.
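
To see why the re-check and iteration are needed, the following sketch compares the linearized tracking error with the actual tracking error after a trade. The three-asset covariance matrix, weight differences, and trades are made up for illustration, and the helper names are assumptions.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def tracking_error(z, cov):
    """sqrt(z' * cov * z), where z is the weight difference from target."""
    return math.sqrt(sum(zi * ci for zi, ci in zip(z, mat_vec(cov, z))))


def marginal_contribution(z, cov):
    """Gradient of tracking error with respect to z: (cov * z) / te."""
    te = tracking_error(z, cov)
    return [ci / te for ci in mat_vec(cov, z)]


# Made-up inputs for illustration only.
cov = [[0.040, 0.010, 0.000],
       [0.010, 0.020, 0.005],
       [0.000, 0.005, 0.010]]
z = [0.03, -0.01, -0.02]    # current weights minus target weights
t = [-0.02, 0.005, 0.015]   # candidate trades (sum to zero)

te_now = tracking_error(z, cov)
mcte = marginal_contribution(z, cov)

# Linear estimate used inside the optimization: te + mcte . t
te_linear = te_now + sum(m * ti for m, ti in zip(mcte, t))

# Actual tracking error once the trades are applied
te_actual = tracking_error([zi + ti for zi, ti in zip(z, t)], cov)

print(te_now, te_linear, te_actual)
```

Tracking error is a convex function of the weight difference, so the linear estimate never overstates the result. A solution that satisfies the linearized constraint can therefore still exceed the true target, which is why the solution is checked and re-optimized.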

In the example below, we introduce a covariance matrix and seek to target a solution whose tracking error is less than 0.25%.

covariance_matrix = [[ 3.62767735e-02,  2.18757921e-03,  2.88389154e-05,
         7.34489308e-03,  1.96701876e-03,  4.42465667e-03,
         1.12579361e-02,  1.65860525e-03,  5.64030644e-03,
         2.76645571e-03,  3.63015800e-04,  3.74241173e-03,
        -1.35199744e-04, -2.19000672e-03,  6.80914121e-03,
         8.41701096e-03,  1.07504229e-02],
       [ 2.18757921e-03,  5.40346050e-04,  5.52196510e-04,
         9.03853792e-04,  1.26047511e-03,  6.54178355e-04,
         1.72005989e-03,  3.60920296e-04,  4.32241813e-04,
         6.55664695e-04,  1.60990263e-04,  6.64729334e-04,
        -1.34505970e-05, -3.61651337e-04,  6.56663689e-04,
         1.55184724e-03,  1.06451898e-03],
       [ 2.88389154e-05,  5.52196510e-04,  4.73857357e-03,
         1.55701811e-03,  6.22138578e-03,  8.13498400e-04,
         3.36654245e-03,  1.54941008e-03,  6.19861236e-05,
         2.93028853e-03,  8.70115005e-04,  4.90113403e-04,
         1.22200026e-04,  2.34074752e-03,  1.39606650e-03,
         5.31970717e-03,  8.86435533e-04],
       [ 7.34489308e-03,  9.03853792e-04,  1.55701811e-03,
         4.70643696e-03,  2.36059044e-03,  1.45119740e-03,
         4.46141908e-03,  8.06488179e-04,  2.09341490e-03,
         1.54107719e-03,  6.99000273e-04,  1.31596059e-03,
        -2.52039718e-05, -5.18390335e-04,  2.41334278e-03,
         5.14806453e-03,  3.76769305e-03],
       [ 1.96701876e-03,  1.26047511e-03,  6.22138578e-03,
         2.36059044e-03,  1.26644146e-02,  2.00358907e-03,
         8.04023724e-03,  2.30076077e-03,  5.70077091e-04,
         5.65049374e-03,  9.76571021e-04,  1.85279450e-03,
         2.56652171e-05,  1.19266940e-03,  5.84713900e-04,
         9.29778319e-03,  2.84300900e-03],
       [ 4.42465667e-03,  6.54178355e-04,  8.13498400e-04,
         1.45119740e-03,  2.00358907e-03,  1.52522064e-03,
         2.91651452e-03,  8.70569737e-04,  1.09752760e-03,
         1.66762294e-03,  5.36854007e-04,  1.75343988e-03,
         1.29714019e-05,  9.11071171e-05,  1.68043070e-03,
         2.42628131e-03,  1.90713194e-03],
       [ 1.12579361e-02,  1.72005989e-03,  3.36654245e-03,
         4.46141908e-03,  8.04023724e-03,  2.91651452e-03,
         1.19931947e-02,  1.61222907e-03,  2.75699780e-03,
         4.16113427e-03,  6.25609018e-04,  2.91008175e-03,
        -1.92908806e-04, -1.57151126e-03,  3.25855486e-03,
         1.06990068e-02,  6.05007409e-03],
       [ 1.65860525e-03,  3.60920296e-04,  1.54941008e-03,
         8.06488179e-04,  2.30076077e-03,  8.70569737e-04,
         1.61222907e-03,  1.90797844e-03,  6.04486114e-04,
         2.47501106e-03,  8.57227194e-04,  2.42587888e-03,
         1.85623409e-04,  2.91479004e-03,  3.33754926e-03,
         2.61280946e-03,  1.16461350e-03],
       [ 5.64030644e-03,  4.32241813e-04,  6.19861236e-05,
         2.09341490e-03,  5.70077091e-04,  1.09752760e-03,
         2.75699780e-03,  6.04486114e-04,  2.53455649e-03,
         9.66091919e-04,  3.91053383e-04,  1.83120456e-03,
        -4.91230334e-05, -5.60316891e-04,  2.28627416e-03,
         2.40776877e-03,  3.15907037e-03],
       [ 2.76645571e-03,  6.55664695e-04,  2.93028853e-03,
         1.54107719e-03,  5.65049374e-03,  1.66762294e-03,
         4.16113427e-03,  2.47501106e-03,  9.66091919e-04,
         4.81734656e-03,  1.14396535e-03,  3.23711266e-03,
         1.69157413e-04,  3.03445975e-03,  3.09323955e-03,
         5.27456576e-03,  2.11317800e-03],
       [ 3.63015800e-04,  1.60990263e-04,  8.70115005e-04,
         6.99000273e-04,  9.76571021e-04,  5.36854007e-04,
         6.25609018e-04,  8.57227194e-04,  3.91053383e-04,
         1.14396535e-03,  1.39905835e-03,  2.01826986e-03,
         1.04811491e-04,  1.67653296e-03,  2.59598793e-03,
         1.01532651e-03,  2.60716967e-04],
       [ 3.74241173e-03,  6.64729334e-04,  4.90113403e-04,
         1.31596059e-03,  1.85279450e-03,  1.75343988e-03,
         2.91008175e-03,  2.42587888e-03,  1.83120456e-03,
         3.23711266e-03,  2.01826986e-03,  1.16861730e-02,
         2.24795908e-04,  3.46679680e-03,  8.38606091e-03,
         3.65575720e-03,  1.80220367e-03],
       [-1.35199744e-04, -1.34505970e-05,  1.22200026e-04,
        -2.52039718e-05,  2.56652171e-05,  1.29714019e-05,
        -1.92908806e-04,  1.85623409e-04, -4.91230334e-05,
         1.69157413e-04,  1.04811491e-04,  2.24795908e-04,
         5.49990619e-05,  5.01897963e-04,  3.74856789e-04,
        -8.63113243e-06, -1.51400879e-04],
       [-2.19000672e-03, -3.61651337e-04,  2.34074752e-03,
        -5.18390335e-04,  1.19266940e-03,  9.11071171e-05,
        -1.57151126e-03,  2.91479004e-03, -5.60316891e-04,
         3.03445975e-03,  1.67653296e-03,  3.46679680e-03,
         5.01897963e-04,  8.74709395e-03,  6.37760454e-03,
         1.74349274e-03, -1.26348683e-03],
       [ 6.80914121e-03,  6.56663689e-04,  1.39606650e-03,
         2.41334278e-03,  5.84713900e-04,  1.68043070e-03,
         3.25855486e-03,  3.33754926e-03,  2.28627416e-03,
         3.09323955e-03,  2.59598793e-03,  8.38606091e-03,
         3.74856789e-04,  6.37760454e-03,  1.55034038e-02,
         5.20888498e-03,  4.17926704e-03],
       [ 8.41701096e-03,  1.55184724e-03,  5.31970717e-03,
         5.14806453e-03,  9.29778319e-03,  2.42628131e-03,
         1.06990068e-02,  2.61280946e-03,  2.40776877e-03,
         5.27456576e-03,  1.01532651e-03,  3.65575720e-03,
        -8.63113243e-06,  1.74349274e-03,  5.20888498e-03,
         1.35424275e-02,  5.49882762e-03],
       [ 1.07504229e-02,  1.06451898e-03,  8.86435533e-04,
         3.76769305e-03,  2.84300900e-03,  1.90713194e-03,
         6.05007409e-03,  1.16461350e-03,  3.15907037e-03,
         2.11317800e-03,  2.60716967e-04,  1.80220367e-03,
        -1.51400879e-04, -1.26348683e-03,  4.17926704e-03,
         5.49882762e-03,  7.08734925e-03]]

covariance_matrix = pandas.DataFrame(covariance_matrix, \
                                     index = tickers, \
                                     columns = tickers)

theta = 0.05
target_te = 0.0025

w_old_prime = w_old.copy()

# calculate the difference from the target portfolio
# and use this difference to estimate tracking error 
# and marginal contribution to tracking error ("mcte")
z = (w_old_prime - w_target)
te = numpy.sqrt(z.dot(covariance_matrix).dot(z))
mcte = (z.dot(covariance_matrix)) / te

while True:
    w_diff_prime = w_target - w_old_prime

    trading_model = LpProblem("Trade Minimization Problem", LpMinimize)

    t_vars = []
    psi_vars = []
    phi_vars = []
    y_vars = []

    A = 2

    for i in range(n):
        t = LpVariable("t_" + str(i), -w_old_prime[i], 1 - w_old_prime[i]) 
        t_vars.append(t)

        psi = LpVariable("psi_" + str(i), None, None)
        psi_vars.append(psi)

        phi = LpVariable("phi_" + str(i), None, None)
        phi_vars.append(phi)

        y = LpVariable("y_" + str(i), 0, 1, LpInteger) #set y in {0, 1}
        y_vars.append(y)


    # add our objective: minimize the distance to target (phi)
    # plus the number of trades (y)
    trading_model += lpSum(phi_vars) + lpSum(y_vars), "Objective"

    for i in range(n):
        trading_model += psi_vars[i] >= -t_vars[i]
        trading_model += psi_vars[i] >= t_vars[i]
        trading_model += psi_vars[i] <= A * y_vars[i]

    for i in range(n):
        trading_model += phi_vars[i] >= -(w_diff_prime[i] - t_vars[i])
        trading_model += phi_vars[i] >= (w_diff_prime[i] - t_vars[i])

    # Make sure our trades sum to zero
    trading_model += (lpSum(t_vars) == 0)
    
    # Set tracking error limit
    #    delta(te) = mcte * delta(z) 
    #              = mcte * ((w_old_prime + t - w_target) - 
    #                        (w_old_prime - w_target)) 
    #              = mcte * t
    #    te + delta(te) <= target_te
    #    ==> delta(te) <= target_te - te
    trading_model += (lpSum([mcte.iloc[i] * t_vars[i] for i in range(n)]) \
                              <= (target_te - te))

    # Set our trade bounds
    trading_model += (lpSum(phi_vars) / 2. <= theta)

    trading_model.solve()
    
    # update our w_old' with the current trades
    results = pandas.Series([t_i.value() for t_i in t_vars], index = tickers)
    w_old_prime = (w_old_prime + results)
    
    z = (w_old_prime - w_target)
    te = numpy.sqrt(z.dot(covariance_matrix).dot(z))
    mcte = (z.dot(covariance_matrix)) / te
    
    if te < target_te:
        break
        
print("Tracking error: " + str(te))

# since w_old' is an iterative update,
# the current trades only reflect the updates from
# the prior w_old'.  Thus, we need to calculate
# the trades by hand
results = (w_old_prime - w_old)
n_trades = (results.abs() > 1e-8).astype(int).sum()

print("Number of trades: " + str(n_trades))

print("Turnover distance: " + str((w_target - (w_old + results)).abs().sum() / 2.))

Tracking error: 0.0016583319880074485
Number of trades: 13
Turnover distance: 0.01624453350000001

6.2 Time Constraints

For time feasibility, heuristic approaches can be employed in an effort to rapidly converge upon a “close enough” solution. For example, Rong and Liu (2011) discuss “build-up” and “pare-down” heuristics.

The basic algorithm of “pare-down” is:

1. Start with a trade list that includes every security.
2. Solve the optimization problem in its unconstrained format, allowing trades to occur only for securities in the trade list.
3. If the solution meets the necessary constraints (e.g. maximum number of trades, trade size thresholds, tracking error constraints, etc.), terminate the optimization.
4. Eliminate from the trade list a subset of securities based upon some measure of trade utility (e.g. violation of constraints, contribution to tracking error, etc.).
5. Go to step 2.

The basic algorithm of “build-up” is:

1. Start with an empty trade list.
2. Add a subset of securities to the trade list based upon some measure of trade utility.
3. Solve the optimization problem in its unconstrained format, allowing trades to occur only for securities in the trade list.
4. If the solution meets the necessary constraints (e.g. maximum number of trades, trade size thresholds, tracking error constraints, etc.), terminate the optimization.
5. Go to step 2.
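
Both heuristics can be sketched on a toy problem. This is an illustration under stated assumptions rather than the procedure of Rong and Liu (2011): `solve_restricted` is a hypothetical stand-in for the real optimizer, trade utility is taken to be absolute trade size, and the weight differences are made up.

```python
def solve_restricted(w_diff, trade_list):
    """Hypothetical stand-in for the real optimizer: trade each listed
    security toward its target, shifting all trades by a common amount
    so that they sum to zero."""
    if not trade_list:
        return {}
    shift = sum(w_diff[k] for k in trade_list) / len(trade_list)
    return {k: w_diff[k] - shift for k in trade_list}


def turnover_distance(w_diff, trades):
    """Half the summed absolute gap to target after applying the trades."""
    return sum(abs(w_diff[k] - trades.get(k, 0.0)) for k in w_diff) / 2.0


def pare_down(w_diff, max_trades):
    trade_list = list(w_diff)                            # step 1
    while True:
        trades = solve_restricted(w_diff, trade_list)    # step 2
        if len(trade_list) <= max_trades:                # step 3
            return trades
        # step 4: drop the least useful trade (here, the smallest one)
        trade_list.remove(min(trade_list, key=lambda k: abs(trades[k])))


def build_up(w_diff, max_distance):
    # Candidates ordered by assumed utility: largest gap to target first.
    candidates = sorted(w_diff, key=lambda k: abs(w_diff[k]), reverse=True)
    trade_list = []                                      # step 1
    while candidates:
        trade_list.append(candidates.pop(0))             # step 2
        trades = solve_restricted(w_diff, trade_list)    # step 3
        if turnover_distance(w_diff, trades) <= max_distance:  # step 4
            return trades
    return solve_restricted(w_diff, trade_list)          # nothing left to add


# Made-up target-minus-current weight differences (sum to zero).
w_diff = {"A": 0.04, "B": -0.03, "C": 0.005, "D": -0.015}

pared = pare_down(w_diff, max_trades=2)
built = build_up(w_diff, max_distance=0.02)
print(pared)
print(built)
```

In this sketch pare-down is driven by a cap on the number of trades, while build-up is driven by a turnover-distance tolerance. Either function could check any of the constraints named in the steps above.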

These two heuristics can even be combined in an integrated fashion. For example, a binary search approach can be employed, where the initial trade list is filled with 50% of the tradable securities. Depending upon success or failure of the resulting optimization, a pare-down or build-up approach can be taken to either prune or expand the trade list.

7. Conclusion

In this research note we have explored the practice of trade optimization, which seeks to implement portfolio changes in as few trades as possible. While a rarely discussed detail of portfolio management, trade optimization has the potential to eliminate unnecessary trading costs (both explicit and implicit) that can be a drag on realized investor performance.

Constraints within the practice of trade optimization typically fall into one of three categories: asset paring, trade paring, and level paring. Asset paring restricts the number of securities the portfolio can hold, trade paring restricts the number of trades that can be made, and level paring restricts the size of positions and trades. Introducing these constraints often turns an optimization into a discrete problem, making it much more difficult to solve with traditional convex optimization methods.

With this in mind, we introduced mixed-integer linear programming (“MILP”) and explored a few techniques that can be utilized to transform non-linear functions into a set of linear constraints. We then combined these transformations to develop a simple trade optimization framework that can be solved using MILP optimizers.

To offer numerical support in the discussion, we created a simple momentum-based sector rotation strategy. We found that naive turnover-filtering alone reduced the number of trades executed by roughly 40%, while adding explicit trade optimization brought the total reduction to roughly 70%.

Finally, we explored how our simplified framework could be further extended to account for both non-linear functional constraints (e.g. tracking error) and operational constraints (e.g. managing execution time).

The paring constraints introduced by trade optimization often lead to problems that are difficult to solve. However, when we consider that the cost of trading is a very real drag on the results realized by investors, we believe that the solutions are worth pursuing.

The State of Risk Management

By Justin Sibears

On August 20, 2018

In Portfolio Construction, Risk Management, Weekly Commentary

This post is available as a PDF download here.

Summary

We compare and contrast different approaches to risk managing equity exposure (including fixed income, risk parity, managed futures, tactical equity, and options-based strategies) over the last 20 years.
We find that all eight strategies studied successfully reduce risk, while six of the eight strategies improve risk-adjusted returns. The lone exceptions are two options-based strategies that involve being long volatility and therefore are on the wrong side of the volatility risk premium.
Over time, performance of the risk management strategies varies significantly both relative to the S&P 500 and compared to the other strategies. Generally, risk-managed strategies tend to behave like insurance, underperforming on the upside and outperforming on the downside.
Diversifying your diversifiers by blending a number of complementary risk-managed strategies together can be a powerful method of improving long-term outcomes. The diversified approach to risk management shows promise in terms of reducing sequence risk for those investors nearing or in retirement.

I was perusing Twitter the other day and came across this tweet from Jim O’Shaughnessy, legendary investor and author of What Works on Wall Street.

As always, Jim’s wisdom is invaluable. But what does this idea mean for Newfound as a firm? Our first focus is on managing risk. As a result, one of the questions that we MUST know the answer to is how to get more investors comfortable with sticking to a risk management plan through a full market cycle.

Unfortunately, performance chasing seems to us to be just as prevalent in risk management as it is in investing as a whole. The benefits of giving up some upside participation in exchange for downside protection seemed like a no-brainer in March of 2009. After 8+ years of strong equity market returns (although it hasn’t always been as smooth of a ride as the market commentators may make you think), the juice may not quite seem worth the squeeze.

While we certainly don’t profess to know the answer to our burning question from above, we do think the first step towards finding one is a thorough understanding of the risk management landscape. In that vein, this week we will update our State of Risk Management presentation from early 2016.

We examine eight strategies that roughly fit into four categories:

Diversification Strategies: strategic 60/40 stock/bond mix¹ and risk parity²
Options Strategies: equity collar³, protective put⁴, and put-write⁵
Equity Strategies: long-only defensive equity that blends a minimum volatility strategy⁶, a quality strategy⁷, and a dividend growth strategy⁸ in equal weights
Trend-Following Strategies: managed futures⁹ and tactical equity¹⁰

The Historical Record

We find that over the period studied (December 1997 to July 2018) six of the eight strategies outperform the S&P 500 on a risk-adjusted basis both when we define risk as volatility and when we define risk as maximum drawdown. The two exceptions are the equity collar strategy and the protective put strategy. Both of these strategies are net long options and therefore are forced to pay the volatility risk premium. This return drag more than offsets the reduction of losses on the downside.

Data Source: Bloomberg, CSI. Calculations by Newfound Research. Past performance does not guarantee future results. Volatility is a statistical measure of the amount of variation around the average returns for a security or strategy. All returns are hypothetical index returns. You cannot invest directly in an index and unmanaged index returns do not reflect any fees, expenses, sales charges, or trading expenses. Index returns include the reinvestment of dividends. No index is meant to measure any strategy that is or ever has been managed by Newfound Research. The Tactical Equity strategy was constructed by Newfound in August 2018 for purposes of this analysis and is therefore entirely backtested and not an investment strategy that is currently managed and offered by Newfound.

Data Source: Bloomberg, CSI. Calculations by Newfound Research. Past performance does not guarantee future results. Drawdown is a statistical measure of the losses experienced by a security or strategy relative to its historical maximum. The maximum drawdown is the largest drawdown over the security or strategy’s history. All returns are hypothetical index returns. You cannot invest directly in an index and unmanaged index returns do not reflect any fees, expenses, sales charges, or trading expenses. Index returns include the reinvestment of dividends. No index is meant to measure any strategy that is or ever has been managed by Newfound Research. The Tactical Equity strategy was constructed by Newfound in August 2018 for purposes of this analysis and is therefore entirely backtested and not an investment strategy that is currently managed and offered by Newfound.

Not Always a Smooth Ride

While it would be nice if this outperformance accrued steadily over time, reality is quite a bit messier. All eight strategies exhibit significant variation in their rolling one-year returns vs. the S&P 500. Interestingly, the two strategies with the widest ranges of historical one-year performance vs. the S&P 500 are also the two strategies that have delivered the most downside protection (as measured by maximum drawdown). Yet another reminder that there is no free lunch in investing. The more aggressively you wish to reduce downside capture, the more short-term tracking error you must endure.

Relative 1-Year Performance vs. S&P 500 (December 1997 to July 2018)

Data Source: Bloomberg, CSI. Calculations by Newfound Research. Past performance does not guarantee future results. All returns are hypothetical index returns. You cannot invest directly in an index and unmanaged index returns do not reflect any fees, expenses, sales charges, or trading expenses. Index returns include the reinvestment of dividends. No index is meant to measure any strategy that is or ever has been managed by Newfound Research. The Tactical Equity strategy was constructed by Newfound in August 2018 for purposes of this analysis and is therefore entirely backtested and not an investment strategy that is currently managed and offered by Newfound.

Thinking of Risk Management as (Uncertain) Portfolio Insurance

When we examine this performance dispersion across different market environments, we find a totally intuitive result: risk management strategies generally underperform the S&P 500 when stocks advance and outperform the S&P 500 when stocks decline. The hit rate for the risk management strategies relative to the S&P 500 is 81.2% in the four years that the S&P 500 was down (2000, 2001, 2002, and 2008) and 19.8% in the seventeen years that the S&P 500 was up.

In this way, risk management strategies are akin to insurance. A premium, in the form of upside capture ratios less than 100%, is paid in exchange for a (hopeful) reduction in downside capture.

With this perspective, it’s totally unsurprising that these strategies have underperformed since the market bottomed during the global financial crisis. Seven of the eight strategies (with the long-only defensive equity strategy being the lone exception) underperformed the S&P 500 on an absolute return basis and six of the eight strategies (with defensive equity and the 60/40 stock/bond blend being the exceptions) underperformed on a risk-adjusted basis.

Annual Out/Underperformance Relative to S&P 500 (December 1997 to July 2018)

Diversifying Your Diversifiers

The good news is that there is significant year-to-year variation in the performance across strategies, as evidenced by the periodic table of returns above, suggesting there are diversification benefits to be harvested by allocating to multiple risk management strategies. The average annual performance differential between the best performing strategy and the worst performing strategy is 20.0%. This spread was less than 10% in only 3 of the 21 years studied.

We see the power of diversifying your diversifiers when we test simple equal-weight blends of the risk management strategies. Both blends have higher Sharpe ratios than seven of the eight individual strategies and higher excess return to drawdown ratios than six of the eight individual strategies.

This is a very powerful result, indicating that naïve diversification is nearly as good as being able to pick the best individual strategies with perfect foresight.

Why Bother with Risk Management in the First Place?

As we’ve written about previously, we believe that, for most investors, investing “failure” means not meeting one’s financial objectives. In the portfolio management context, failure comes in two flavors. “Slow” failure results from taking too little risk, while “fast” failure results from taking too much risk.

In his book, Red-Blooded Risk, Aaron Brown summed up this idea nicely: “Taking less risk than is optimal is not safer; it just locks in a worse outcome. Taking more risk than is optimal also results in a worse outcome, and often leads to complete disaster.”

Risk management is not synonymous with risk reduction. It is about taking the right amount of risk, not too much or too little.

Having a pre-defined risk management plan in place before a crisis can help investors avoid panicked decisions that can turn a bad, but survivable event into catastrophe (e.g. the retiree that sells all of his equity exposure in early 2009 and then stays out of the market for the next five years).

It’s also important to remember that individuals are not institutions. They have a finite investment horizon. Those that are at or near retirement are exposed to sequence risk, the risk of experiencing a bad investment outcome at the wrong time.

We can explore sequence risk using Monte Carlo simulation. We start by assessing the S&P 500 with no risk management overlay and assume a 30-year retirement horizon. The simulation process works as follows:

1. Randomly choose a sequence of 30 annual returns from the set of actual annual returns over the period we studied (December 1997 to July 2018).
2. Adjust returns for inflation.
3. For the sequence of returns chosen, calculate the perfect withdrawal rate (PWR). Clare et al. (2016) define the PWR as “the withdrawal rate that effectively exhausts wealth at death (or at the end of a fixed period, known period) if one had perfect foresight of all returns over the period.”¹¹
4. Return to step 1, repeating 1,000 times in total.
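
The simulation loop can be sketched as below. This is a minimal illustration, not the code behind the results in this commentary. It assumes returns are drawn with replacement, that withdrawals are a constant real amount taken at the start of each year, and that terminal wealth is zero. The function names and the placeholder return series are hypothetical.

```python
import random


def perfect_withdrawal_rate(real_returns):
    """Constant start-of-year withdrawal, as a fraction of initial wealth,
    that leaves exactly zero wealth after the final year's return."""
    growth_to_end = 1.0   # growth from the start of year i to the horizon
    denominator = 0.0     # sum of those growth factors over all years i
    for r in reversed(real_returns):
        growth_to_end *= 1.0 + r
        denominator += growth_to_end
    # After the loop, growth_to_end is the growth over the full horizon.
    return growth_to_end / denominator


def simulate_pwrs(annual_real_returns, horizon=30, n_sims=1000, seed=0):
    """Resample annual real returns with replacement and compute a PWR per path."""
    rng = random.Random(seed)
    pwrs = []
    for _ in range(n_sims):
        path = [rng.choice(annual_real_returns) for _ in range(horizon)]
        pwrs.append(perfect_withdrawal_rate(path))
    return pwrs


# Placeholder inflation-adjusted returns for illustration only (not real data).
placeholder_returns = [0.20, 0.12, -0.10, 0.07, -0.30, 0.25, 0.03, 0.15]

pwrs = simulate_pwrs(placeholder_returns)
print(min(pwrs), sum(pwrs) / len(pwrs), max(pwrs))
```

With a flat 0% real return every year, the function returns 1/30, the rate that simply spends down the initial balance evenly over the 30-year horizon.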

We plot the distribution of PWRs for the S&P 500 below. While the average PWR is a respectable 5.7%, the range of outcomes is very wide (0.6% to 14.7%). The 95 percent confidence interval around the mean is 2.0% to 10.3%. This is sequence risk. Unfortunately, investors do not have the luxury of experiencing the average; they only see one draw. Get lucky and you may get to fund a better lifestyle than you could have imagined with little to no financial stress. Get unlucky and you may have trouble paying the bills and will be sweating every market move.

Calculations by Newfound Research. Past performance does not guarantee future results. All returns are hypothetical index returns. You cannot invest directly in an index and unmanaged index returns do not reflect any fees, expenses, sales charges, or trading expenses. Index returns include the reinvestment of dividends.

Next, we repeat the simulation, replacing the pure S&P 500 exposure with the equal-weight blend of risk management strategies excluding the equity collar and the protective put. We see quite a different result. The average PWR is similar (6.2% vs. 5.7%), but the range of outcomes is much smaller (95% confidence interval from 4.4% to 8.1%). At its very core, this is what implementing a risk management plan is all about: reducing the role of investment luck in financial planning. We give up some of the best outcomes (in the right tail of the S&P 500 distribution) in exchange for reducing the probability of the very worst outcomes (in the left tail).

Conclusion

There is no holy grail when it comes to risk management. While a number of approaches have historically delivered strong results, each comes with its own pros and cons.

In an uncertain world where we cannot predict exactly what the next crisis will look like, diversifying your diversifiers by combining a number of complementary risk-managed strategies may be a prudent course of action. We believe that this type of balanced approach has the potential to deliver compelling results over a full market cycle while managing the idiosyncratic risk of any one manager or strategy.

Diversification can also help to increase the odds of an investor sticking with their risk management plan as the short-term performance lows won’t be quite as low as they would be with a single strategy (conversely, the highs won’t be as high either).

That being said, having the discipline to stick with a risk management plan also requires being realistic. While it would be great to build a strategy with 100% upside and 0% downside, such an outcome is unrealistic. Risk-managed strategies tend to behave a lot like uncertain insurance for the portfolio. A premium, in the form of upside capture ratios less than 100%, is paid in exchange for a (hopeful) reduction in downside capture. This upside underperformance is a feature, not a bug. Trying too hard to correct it may lead to overfit strategies that fail to deliver adequate protection on the downside.

Measuring Process Diversification in Trend Following

By Corey Hoffstein

On July 30, 2018

In Craftsmanship, Portfolio Construction, Risk Management, Weekly Commentary

This post is available as a PDF download here.

Summary

We prefer to think about diversification in a three-dimensional framework: what, how, and when.
The “how” axis covers the process with which an investment decision is made.
There are a number of models that trend-followers might use to capture a trend. For example, trend-followers might employ a time-series momentum model, a price-minus-moving-average model, or a double moving average cross-over model.
Beyond multiple models, each model can have a variety of parameterizations. For example, a time-series momentum model can just as easily be applied with a 3-month formation period as an 18-month period.
In this commentary, we attempt to measure how much diversification opportunity is available by employing multiple models with multiple parameterizations in a simple long/flat trend-following process.

When investors talk about diversification, they typically mean across different investments. Do not just buy a single stock, for example; buy a basket of stocks in order to diversify away the idiosyncratic risk.

We call this “what” diversification (i.e. “what are you buying?”) and believe this is only one of three meaningful axes of diversification for investors. The other two are “how” (i.e. “how are you making your decision?”) and “when” (i.e. “when are you making your decision?”). In recent years, we have written a great deal about the “when” axis, and you can find a summary of that research in our commentary Quantifying Timing Luck.

In this commentary, we want to discuss the potential benefits of diversifying across the “how” axis in trend-following strategies.

But what, exactly, do we mean by this? Consider that there are a number of ways investors can implement trend-following signals. Some popular methods include:

Prior total returns (“time-series momentum”)
Price-minus-moving-average (e.g. price falls below the 200-day moving average)
Moving-average double cross-over (e.g. the 50-day moving average crosses the 200-day moving average)
Moving-average change-in-direction (e.g. the 200-day moving average slope turns positive or negative)
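
As a rough sketch, the four methods above might be written as long/flat signals on a daily series of total-return index levels. The function names, the use of simple (rather than exponential) moving averages, and the 1/0 long/flat convention are assumptions made for illustration.

```python
def sma(prices, window, offset=0):
    """Simple moving average of `window` prices ending `offset` days ago.
    Assumes the series holds at least window + offset observations."""
    end = len(prices) - offset
    return sum(prices[end - window:end]) / window


def time_series_momentum(prices, lookback):
    """Long (1) if the prior total return over `lookback` days is positive."""
    return 1 if prices[-1] > prices[-1 - lookback] else 0


def price_minus_ma(prices, window):
    """Long (1) if the latest price is above its moving average."""
    return 1 if prices[-1] > sma(prices, window) else 0


def ma_double_crossover(prices, fast, slow):
    """Long (1) if the fast moving average is above the slow one."""
    return 1 if sma(prices, fast) > sma(prices, slow) else 0


def ma_change_in_direction(prices, window):
    """Long (1) if the moving average rose from yesterday to today."""
    return 1 if sma(prices, window) > sma(prices, window, offset=1) else 0


# Synthetic, steadily rising and falling series for illustration only.
rising = [float(p) for p in range(1, 301)]
falling = rising[::-1]

for series in (rising, falling):
    print(time_series_momentum(series, 200),
          price_minus_ma(series, 200),
          ma_double_crossover(series, 50, 200),
          ma_change_in_direction(series, 200))
```

On a clean trend all four signals agree. They tend to differ around turning points, where each method's weighting of recent versus older returns matters.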

As it turns out, these varying methodologies are actually cousins of one another. Recent research has established that these models can, more or less, be thought of as different weighting schemes of underlying returns. For example, a time-series momentum model (with no skip month) derives its signal by averaging daily log returns over the lookback period equally.
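To make the comparison concrete, here is a minimal sketch of three of the signals above in long/flat form (1 = invested in equities, 0 = in the risk-free asset). The function names, the use of simple (equally weighted) moving averages, and the lookback lengths in the usage lines are our own illustrative assumptions, not the exact specifications used in the studies cited.

```python
import math

def log_returns(prices):
    """Daily log returns from a list of prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def tsmom_signal(prices, lookback):
    """Time-series momentum: equally weight the last `lookback` log returns.
    The sum has the same sign as the total return over the lookback."""
    return 1 if sum(log_returns(prices)[-lookback:]) > 0 else 0

def sma(prices, window):
    """Simple moving average of the last `window` prices."""
    return sum(prices[-window:]) / window

def price_minus_ma_signal(prices, window):
    """Price-minus-moving-average: long when price is above its average."""
    return 1 if prices[-1] > sma(prices, window) else 0

def double_ma_signal(prices, fast, slow):
    """Double cross-over: long when the fast average is above the slow one."""
    return 1 if sma(prices, fast) > sma(prices, slow) else 0

# Hypothetical price path that rises 1% per day.
rising = [100 * 1.01 ** i for i in range(250)]
signals = (
    tsmom_signal(rising, 200),
    price_minus_ma_signal(rising, 200),
    double_ma_signal(rising, 50, 200),
)
```

All three rules consume the same price history and differ only in how they weight it, which is the intuition behind the "cousins" description.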

With this common base, a number of papers over the last decade have found significant relationships between the varying methods. For example:

Paper	Evidence
Bruder, Dao, Richard, and Roncalli (2011)	Moving-average-double-crossover is just an alternative weighting scheme for time-series momentum.
Marshall, Nguyen and Visaltanachoti (2014)	Time-series momentum is related to moving-average-change-in-direction.
Levine and Pedersen (2015)	Time-series-momentum and moving-average cross-overs are highly related; both methods perform similarly on 58 liquid futures contracts.
Beekhuizen and Hallerbach (2015)	Mathematically linked moving averages with prior returns.
Zakamulin (2015)	Price-minus-moving-average, moving-average-double-cross-over, and moving-average-change-of-direction can all be interpreted as a computation of a weighted moving average of momentum rules.

As we have argued in past commentaries, we do not believe any single method is necessarily superior to another. In fact, it is trivial to evaluate these methods over different asset classes and time-horizons and find an example that proves that a given method provides the best result.

Without a crystal ball, however, and without any economic interpretation why one might be superior to another, the choice is arbitrary. Yet the choice will ultimately introduce randomness into our results: a factor we like to call “process risk.” A question we should ask ourselves is, “if we have no reason to believe one is better than another, why would we pick one at all?”

We like to think of it this way: ex-post, we will know whether the return over a given period is positive or negative. Ex-ante, all we have is a handful of trend-following signals that are forecasting that direction. If, historically, all of these trend signals have been effective, then there may be no reason to necessarily believe one over another.

Combining them, in many ways, is sort of like trying to triangulate on the truth. We have a number of models that all look at the problem from a slightly different perspective and, therefore, provide a slightly different interpretation. A (very) loose analogy might be using the collective information from a number of cell towers in an effort to pinpoint the geographic location of a cellphone.

We may believe that all of the trend models do a good job of identifying trends over the long run, but most will prove false from time-to-time in the short-run. By using them together, we can potentially increase our overall confidence when the models agree and decrease our confidence when they do not.

With all this in mind, we want to explore the simple question: “how much potential benefit does process diversification bring us?”

The Setup

To answer this question, we first generate a number of long/flat trend following strategies that invest in a broad U.S. equity index or the risk-free rate (both provided by the Kenneth French database and ranging from 1926 to 2018). There are 48 strategy variations in total, constructed through a combination of three different processes – time-series momentum, price-minus-moving-average, and moving-average double cross-over – and 16 different lookback periods (from the approximate equivalent of 3-to-18 months).

We then treat each of the 48 variations as its own unique asset.

To measure process diversification, we are going to use the concept of “independent bets.” The greater the number of independent bets within a portfolio, the greater the internal diversification. Below are a couple examples outlining the basic intuition for a two-asset portfolio:

If we have a portfolio holding two totally independent assets with similar volatility levels, a 50% allocation to each would maximize our diversification. Intuitively, we have equally allocated across two unique bets.
If we have a portfolio holding two totally independent assets with similar volatility levels, a 90% allocation to one asset and a 10% allocation to another would lead us to a highly concentrated bet.
If we have a portfolio holding two highly correlated assets, no matter the allocation split, we have a large, concentrated bet.
If we have a portfolio of two assets with disparate volatility levels, we will have a large concentrated bet unless the lower volatility asset comprises the vast majority of the portfolio.

To measure this concept mathematically, we are going to use the fact that the square of the “diversification ratio” of a portfolio is equal to the number of independent bets that portfolio is taking.¹
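A minimal sketch of this calculation follows, using the standard definition of the diversification ratio (the weighted average of asset volatilities divided by portfolio volatility). The function names and the example volatilities and correlations are illustrative assumptions chosen to mirror the two-asset intuition above.

```python
import math

def diversification_ratio(weights, vols, corr):
    """Weighted-average asset volatility divided by portfolio volatility."""
    n = len(weights)
    weighted_vol = sum(w * s for w, s in zip(weights, vols))
    variance = sum(
        weights[i] * weights[j] * vols[i] * vols[j] * corr[i][j]
        for i in range(n)
        for j in range(n)
    )
    return weighted_vol / math.sqrt(variance)

def independent_bets(weights, vols, corr):
    """Number of independent bets = squared diversification ratio."""
    return diversification_ratio(weights, vols, corr) ** 2

independent = [[1.0, 0.0], [0.0, 1.0]]
correlated = [[1.0, 0.95], [0.95, 1.0]]

# Two independent assets, similar volatility, 50/50: two full bets.
balanced = independent_bets([0.5, 0.5], [0.15, 0.15], independent)
# Same assets at 90/10: a concentrated bet, barely above one.
lopsided = independent_bets([0.9, 0.1], [0.15, 0.15], independent)
# Highly correlated assets: close to a single bet at any split.
redundant = independent_bets([0.5, 0.5], [0.15, 0.15], correlated)
# Disparate volatilities: the low-volatility asset needs most of the weight.
vol_balanced = independent_bets([0.2, 0.8], [0.20, 0.05], independent)
```

In the last case the weights are inversely proportional to volatility, which is what restores two full bets.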

Diversifying Parameterization Risk

Within process diversification, the first variable we can tweak is the formation period of our trend signal. For example, if we are using a time-series momentum model that simply looks at the sign of the total return over the prior period, the length of that period may have a significant influence in the identification of a trend. Intuition tells us that shorter formation periods might identify short-term trends as well as react to long-term trend changes more quickly but may be more sensitive to whipsaw risk.

To explore the diversification opportunities available to us simply by varying our formation parameterization, we build equal-weight portfolios comprised of two strategies at a time, where each strategy utilizes the same trend model but a different parameterization. We then measure the number of independent bets in that combination.

We run this test for each trend following process independently. As an example, we compare using a shorter lookback period with a longer lookback period in the context of time-series momentum in isolation. We will compare across models in the next section.

In the graphs below, L0 through L15 represent the lookback periods, with L0 being the shortest lookback period and L15 representing the longest lookback period.

As we might suspect, the largest increase in available bets arises from combining shorter formation periods with longer formation periods. This makes sense, as they represent the two horizons that share the smallest proportion of data and therefore have the least “information leakage.” Consider, for example, a time-series momentum signal that has a 4-month lookback and one with an 8-month lookback. At all times, 50% of the information used to derive the latter model is contained within the former model. While the technical details are subtler, we would generally expect that the more informational overlap, the less diversification is available.

We can see that by combining short- and long-term lookbacks, the total number of bets the portfolio is taking increases from 1.0 to approximately 1.2.

This may not seem like a significant lift, but we should remember Grinold and Kahn’s Fundamental Law of Active Management:

Information Ratio = Information Coefficient x SQRT(Independent Bets)

Assuming the information coefficient stays the same, an increase in the number of independent bets from 1.0 to 1.2 increases our information ratio by approximately 10%. Such is the power of diversification.
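The arithmetic behind the 10% figure, holding the information coefficient (IC) constant, can be written out as:

```latex
\frac{\mathrm{IR}_{\text{new}}}{\mathrm{IR}_{\text{old}}}
  = \frac{\mathrm{IC}\,\sqrt{1.2}}{\mathrm{IC}\,\sqrt{1.0}}
  = \sqrt{1.2} \approx 1.095
```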

Another interesting way to approach this data is by allowing an optimizer to attempt to maximize the diversification ratio. In other words, instead of only looking at naïve, equal-weight combinations of two processes at a time, we can build a portfolio from all available lookback variations.

Doing so may provide two interesting insights.

First, we can see how the optimizer might look to combine different variations to maximize diversification. Will it barbell long and short lookbacks, or is there benefit to including medium lookbacks? Will the different processes have different solutions? Second, by optimizing over the full history of data, we can find an upper limit threshold to the number of independent bets we might be able to capture if we had a crystal ball.

A few takeaways from the graphs above:

Almost all of the processes barbell short and long lookback horizons to maximize diversification.
The optimizer finds value, in most cases, in introducing medium-term lookback horizons as well. We can see for Time-Series MOM, the significant weights are placed on L0, L1, L6, L10, and L15. While not perfectly spaced or equally weighted, this still provides a strong cross-section of available information. Double MA Cross-Over, on the other hand, finds value in weighting L0, L8, and L15.
While the optimizer increases the number of independent bets in all cases versus a naïve, equal-weight approach, the pickup is not incredibly dramatic. At the end of the day, a crystal ball does not find a meaningfully better solution than our intuition may provide.

Diversifying Model Risk

Similar to the process taken in the above section, we will now attempt to quantify the benefits of cross-process diversification.

For each trend model, we will calculate the number of independent bets available by combining it with another trend model but hold the lookback period constant. As an example, we will combine the shortest lookback period of the Time-Series MOM model with the shortest lookback period of the MA Double Cross-Over.

We plot the results below of the number of independent bets available through a naïve, equal-weight combination.

We can see that model combinations can lift the number of independent bets by 0.05 to 0.1. This is not as significant as the theoretical lift from parameter diversification, but it is not totally insignificant.

Combining Model and Parameterization Diversification

We can once again employ our crystal ball in an attempt to find an upper limit to the diversification available to trend followers, as well as the process / parameterization combinations that will maximize this opportunity. Below, we plot the results.

We see a few interesting things of note:

The vast majority of models and parameterizations are ignored.
Time-Series MOM is heavily favored as a model, receiving nearly 60% of the portfolio weight.
We see a spread of weight across short, medium, and long-term lookbacks. Short-term is heavily favored, with Time-Series MOM L0 and Price-Minus MA L0 together approaching 45% of model weight.
All three models are, ultimately, incorporated, with approximately 10% being allocated to Double MA Cross-Over, 30% to Price-Minus MA, and 60% to Time-Series MOM.

It is worth pointing out that naively allocating equally across all 48 models creates 1.18 independent bets while the full-period crystal ball generated 1.29 bets.

Of course, having a crystal ball is unrealistic. Below, we look at a rolling window optimization that looks at the prior 5 years of weekly returns to create the most diversified portfolio. To avoid plotting a graph with 48 different components, we plot the results two ways: (1) clustered by process and (2) clustered by lookback period.

Using the rolling window, we see similar results as we saw with the crystal ball. First, Time-Series MOM is largely favored, often peaking well over 50% of the portfolio weights. Second, we see that a barbelling approach is frequently employed, balancing allocations to the shortest lookbacks (L0 and L1) with the longest lookbacks (L14 and L15). Mid-length lookbacks are not outright ignored, however, and L5 through L11 combined frequently make up 20% of the portfolio.

Finally, we can see that the rolling number of bets is highly variable over time, but optimization frequently creates a meaningful impact over an equal-weight approach.²

Conclusion

In this commentary, we have explored the idea of process diversification. In the context of a simple long/flat trend-following strategy, we find that combining strategies that employ different trend identification models and different formation periods can lead to an increase in the number of independent bets taken by the portfolio.

As it specifically pertains to trend-following, we see that diversification appears to be maximized by allocating across a number of lookback horizons, with an optimizer putting a particular emphasis on barbelling shorter and longer lookback periods.

We also see that incorporating multiple processes can increase available diversification as well. Interestingly, the optimizer did not equally diversify across models. This may be due to the fact that these models are less independent from one another than they might seem. For example, Zakamulin (2015) demonstrated that these models can all be decomposed into a different weighted average of the same general momentum rules.

Finding process diversification, then, might require moving to a process that may not have a common basis. For example, trend followers might consider channel methods or a change in basis (e.g. constant volume bars instead of constant time bars).

Machine Learning, Subset Resampling, and Portfolio Optimization

By Justin Sibears

On July 23, 2018

In Portfolio Construction, Uncategorized

This post is available as a PDF download here.

Summary

Portfolio optimization research can be challenging due to the plethora of factors that can influence results, making it hard to generalize results outside of the specific cases tested.
That being said, building a robust portfolio optimization engine requires a diligent focus on estimation risk. Estimation risk is the risk that the inputs to the portfolio optimization process (i.e. expected returns, volatilities, correlations) are imprecisely estimated by sampling from the historical data, leading to suboptimal allocations.
We summarize the results from two recent papers we’ve reviewed on the topic of managing estimation risk. The first paper relies on techniques from machine learning while the second paper uses a form of simulation called subset resampling.
Both papers report that their methodologies outperform various heuristic and optimization-based benchmarks.
We perform our own tests by building minimum variance portfolios using the 49 Fama/French industry portfolios. We find that while both outperform equal-weighting on a risk-adjusted basis, the results are not statistically significant at the 5% level.

This week, we are going to review a couple of recent papers we’ve come across on the topic of reducing estimation risk in portfolio optimization.

Before we get started, we want to point out that while there are many fascinating papers on portfolio optimization, it is also one of the most frustrating areas to study in our opinion. Why? Because ultimately portfolio optimization is a very, very complex topic. The results will be impacted in significant ways by a number of factors like:

What is the investment universe studied?
Over what time period?
How are the parameters estimated?
What are the lookback periods used to estimate parameters?
And so on…

Say that you find a paper that argues for the superiority of equal-weighted portfolios over mean-variance optimization by testing on a universe of large-cap U.S. equities. Does this mean that equal-weighting is superior to mean-variance optimization in general? We tend to believe not. Rather, we should take the study at face value: equal-weighting was superior to the particular style of mean-variance in this specific test.

In addition, the result in and of itself says nothing about why the outperformance occurred. It could be that equal-weighting is a superior portfolio construction technique.

But maybe the equal-weighted stock portfolio just happens by chance to be close to the true Sharpe optimal portfolio. If I have a number of asset classes that have reasonably similar returns, risks, and correlations, it is very likely that equal-weighting does a decent job of getting close to the Sharpe optimal solution. On the other hand, consider an investment universe that consists of 9 equity sectors and U.S. Treasuries. In this case, equal-weighting is much less likely to be close to optimal and we would find it more probable that optimization approaches could outperform.

Maybe equal-weighting exposes the stock portfolio to risk-premia like the value and size factors that improve performance. I suspect that to some extent the outperformance of minimum variance portfolios in a number of studies is at least partially explained by the exposures that these portfolios have to the defensive or low beta factor (the tendency of low risk exposures to outperform high risk exposures on a risk-adjusted basis).

Maybe the mean estimates in the mean-variance optimization are just terrible and the results are less an indictment on MVO than on the particular mean estimation technique used. To some extent, the difficulty of estimating means is a major part of the argument for equal-weighting or other heuristic or shrinkage-based approaches. At the same time, we see a number of studies that estimate expected returns using sample means with long (i.e. 5 or 10 year) lookbacks. These long-term horizons are exactly the period over which returns tend to mean revert and so the evidence would suggest these are precisely the types of mean estimates you wouldn’t want to use. To properly test mean-variance, we should at least use mean estimates that have a chance of succeeding.

All this is a long-winded way of saying that it can be difficult to use the results from research papers to build a robust, general purpose portfolio optimizer. The results may have limited value outside of the very specific circumstances explored in that particular paper.

That being said, this does not give us an excuse to stop trying. With that preamble out of the way, we’ll return to our regularly scheduled programming.

Estimation Risk in Portfolio Optimization

Estimation risk is the risk that the inputs to the portfolio optimization process (i.e. expected returns, volatilities, correlations) are imprecisely estimated by sampling from the historical data, leading to suboptimal allocations.

One popular approach to dealing with estimation risk is to simply ignore parameters that are hard to estimate. For example, the naïve 1/N portfolio, which allocates an equal amount of capital to each investment in the universe, completely forgoes using any information about the distribution of returns. DeMiguel, Garlappi and Uppal (2007)[1] tested fourteen variations of sample-based mean-variance optimization on seven different datasets and concluded that “…none is consistently better than the 1/N rule in terms of Sharpe Ratio, certainty-equivalent return, or turnover, which indicates that, out of sample, the gain from optimal diversification is more than offset by estimator error.”

Another popular approach is to employ “shrinkage estimators” for key inputs. For example, Ledoit and Wolf (2004)[2] propose shrinking the sample correlation matrix towards (a fancy way of saying “averaging it with”) the constant correlation matrix. The constant correlation matrix is simply the correlation matrix where each off-diagonal element is equal to the pairwise average correlation across all assets.

Generally speaking, shrinkage involves blending an “unstructured estimator” like the sample correlation matrix with a “structured estimator” like the constant correlation matrix that tries to represent the data with few free parameters. Shrinkage tends to limit extreme observations, thereby reducing the unwanted impact that such observations can have on the optimization result.
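The blending step can be sketched as follows. This is a simplified illustration: the target keeps ones on the diagonal and sets every other entry to the average pairwise correlation, and `intensity` is a hypothetical fixed blending weight supplied by hand, whereas Ledoit and Wolf derive an estimate of the optimal intensity from the data. The function names and the example matrix are our own.

```python
def constant_correlation_target(corr):
    """Structured target: ones on the diagonal, the average pairwise
    correlation everywhere else."""
    n = len(corr)
    off_diag = [corr[i][j] for i in range(n) for j in range(n) if i != j]
    avg = sum(off_diag) / len(off_diag)
    return [[1.0 if i == j else avg for j in range(n)] for i in range(n)]

def shrink(corr, target, intensity):
    """Blend the sample matrix with the target.
    intensity = 0 keeps the sample matrix; intensity = 1 uses the target."""
    return [
        [(1 - intensity) * c + intensity * t for c, t in zip(row_c, row_t)]
        for row_c, row_t in zip(corr, target)
    ]

# Hypothetical sample correlation matrix with one extreme pair (0.9).
sample = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
target = constant_correlation_target(sample)  # off-diagonals all 0.4
shrunk = shrink(sample, target, intensity=0.5)
```

With an intensity of 0.5, the extreme 0.9 correlation is pulled to 0.65 and the 0.1 correlation is pulled up to 0.25, which illustrates how shrinkage limits extreme observations.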

Interestingly, the common practice of imposing a short-sale constraint when performing mean-variance optimization or minimum variance optimization is equivalent to shrinking the expected return estimates[3] and the covariance estimates[4], respectively.

Both papers that we’ll discuss here are alternate ways of performing shrinkage.

Applying Machine Learning to Reduce Estimation Risk

The first paper, Reducing Estimation Risk in Mean-Variance Portfolios with Machine Learning by Daniel Kinn (2018)[5], explores using a standard machine learning approach to reduce estimation risk in portfolio optimization.

Kinn’s approach recognizes that estimation error can be decomposed into two sources: bias and variance. Both bias and variance result in suboptimal results, but in very different ways. Bias results from the model doing a poor job of capturing the pertinent features of the data. Variance, on the other hand, results from the model being sensitive to the data used to train the model.

To get a better intuitive sense of bias vs. variance, consider two weather forecasters, Mr. Bias and Ms. Variance. Both Mr. Bias and Ms. Variance work in a town where the average temperature is 50 degrees. Mr. Bias is very stubborn and set in his ways. He forecasts that the temperature will be 75 degrees each and every day. Ms. Variance, however, is known for having forecasts that jump up and down. Half of the time she forecasts a temperature of 75 degrees and half of the time she forecasts a temperature of 25 degrees.

Both forecasters have roughly the same amount of forecast error, but the nature of their errors is very different. Mr. Bias is consistent but has way too rosy of a picture of the town’s weather. Ms. Variance, on the other hand, actually has the right idea when it comes to long-term weather trends, but her volatile forecasts still leave much to be desired.
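The forecaster example can be worked through numerically. The sketch below makes one simplifying assumption not stated above: the actual temperature is exactly 50 degrees every day, so that mean squared error splits cleanly into squared bias plus variance.

```python
def bias_variance(forecasts, truth):
    """Return (bias, variance, mean squared error) of a set of forecasts
    against a constant true value."""
    n = len(forecasts)
    mean_forecast = sum(forecasts) / n
    bias = mean_forecast - truth
    variance = sum((f - mean_forecast) ** 2 for f in forecasts) / n
    mse = sum((f - truth) ** 2 for f in forecasts) / n
    return bias, variance, mse

# Mr. Bias forecasts 75 every day; Ms. Variance alternates 75 and 25.
mr_bias = bias_variance([75, 75, 75, 75], truth=50)      # (25.0, 0.0, 625.0)
ms_variance = bias_variance([75, 25, 75, 25], truth=50)  # (0.0, 625.0, 625.0)
```

Both forecasters end up with the same mean squared error of 625, but Mr. Bias gets there entirely through bias (25 squared) and Ms. Variance entirely through variance.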

The following graphic from EliteDataScience.com gives another take on explaining the difference between the two concepts.

Source: https://elitedatascience.com/bias-variance-tradeoff

When it comes to portfolio construction, some popular techniques can be neatly classified into one of these two categories. The 1/N portfolio, for example, has no variance (weights will be the same every period), but may have quite a bit of bias if it is far from the true optimal portfolio. Sample-based mean-variance optimization, on the other hand, should have no bias (assuming the underlying distributions of asset class returns do not change over time), but can be highly sensitive to parameter measurements and therefore exhibit high variance. At the end of the day, we are interested in minimum total estimation error, which will generally involve a trade-off between bias and variance.

Source: https://elitedatascience.com/bias-variance-tradeoff

Finding where this optimal trade-off lies is exactly what Kinn sets out to accomplish with the machine learning algorithm described in this paper. The general outline of the algorithm is pretty straightforward:

Identify the historical data to be used in calculating the sample moments (expected returns, volatilities, and correlations).
Add a penalty function to the function that we are going to optimize. The paper discusses a number of different penalty functions including Ridge, Lasso, Elastic Net, and Principal Component Regression. These penalty functions will effectively shrink the estimated parameters with the exact nature of the shrinkage dependent on the penalty function being used. By doing so we introduce some bias, but hopefully with the benefit of reducing variance even further and as a result reducing overall estimation error.
Use K-fold cross-validation to fit the parameter(s) of the penalty function. Cross-validation is a machine learning technique where the training data is divided into various sets of in-sample and out-of-sample data. The parameter(s) chosen will be those that produce the lowest estimation error in the out-of-sample data.
Using the optimized parameters from #3, fit the model on the entire training set. The result will be the optimized portfolio weights for the next holding period.
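The cross-validation and refit steps above can be illustrated with a deliberately simplified sketch. This is not Kinn's portfolio formulation: it is a toy one-feature ridge regression without an intercept, with interleaved folds and a hand-picked grid of penalties, all of which are our own assumptions. It only shows the mechanics of choosing a penalty by out-of-sample error and then refitting on the full training set.

```python
def ridge_slope(xs, ys, penalty):
    """Closed-form ridge estimate of beta in y = beta * x (no intercept).
    A larger penalty shrinks beta towards zero."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)

def k_fold_penalty(xs, ys, penalties, k):
    """Pick the penalty with the lowest total squared error on held-out folds."""
    n = len(xs)
    folds = [list(range(start, n, k)) for start in range(k)]  # interleaved folds
    best_penalty, best_error = None, float("inf")
    for penalty in penalties:
        error = 0.0
        for held_out in folds:
            held = set(held_out)
            train_x = [xs[i] for i in range(n) if i not in held]
            train_y = [ys[i] for i in range(n) if i not in held]
            beta = ridge_slope(train_x, train_y, penalty)
            error += sum((ys[i] - beta * xs[i]) ** 2 for i in held_out)
        if error < best_error:
            best_penalty, best_error = penalty, error
    return best_penalty

def fit_with_cv(xs, ys, penalties, k=5):
    """Choose the penalty by cross-validation, then refit on all the data."""
    penalty = k_fold_penalty(xs, ys, penalties, k)
    return penalty, ridge_slope(xs, ys, penalty)

# Hypothetical noise-free data: no shrinkage is needed, so CV picks 0.0.
xs = list(range(1, 11))
ys = [2 * x for x in xs]
chosen_penalty, beta = fit_with_cv(xs, ys, penalties=[0.0, 1.0, 10.0])
```

With noisy data, a positive penalty can win the cross-validation, which is the bias-for-variance trade described in the second step. With time-series data, interleaved folds would also leak information across time, so a real implementation would need a more careful split.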

Kinn tests three versions of the algorithm (one using a Ridge penalty function, one using a Lasso penalty function, and one using principal component regression) on the following real-world data sets.

20 randomly selected stocks from the S&P 500 (covers January 1990 to November 2017)
50 randomly selected stocks from the S&P 500 (covers January 1990 to November 2017)
30 industry portfolios using stocks listed on the NYSE, AMEX, and NASDAQ (covers January 1990 to November January 2018)
49 industry portfolios using stocks listed on the NYSE, AMEX, and NASDAQ (covers January 1990 to November January 2018)
200 largest cryptocurrencies by market value as of the end of 2017 (if there was ever a sign of a 2018 paper on portfolio optimization it has to be that one of the datasets relates to crypto)
1200 cryptocurrencies observed from September 2013 to December 2017

As benchmarks, Kinn uses traditional sample-based mean-variance, sample-based mean-variance with no short selling, minimum variance, and 1/N.

The results are pretty impressive with the machine learning algorithms delivering statistically significant risk-adjusted outperformance.

Here are a few thoughts/comments we had when implementing the paper ourselves:

The specific algorithm, as outlined in the paper, is a bit inflexible in the sense that it only works for mean-variance optimization where the means and covariances are estimated from the sample. In other words, we couldn’t use the algorithm to compute a minimum variance portfolio or a mean-variance portfolio where we want to substitute in our own return estimates. That being said, we think there are some relatively straightforward tweaks that can make the process applicable in these scenarios.
In our tests, the parameter optimization for the penalty functions was a bit unstable. For example, when using the principal component regression, we might identify two principal components as being worth keeping in one month and then ten principal components being worth keeping in the next month. This can in turn lead to instability in the allocations. While this is a concern, it could be dealt with by smoothing the parameters over a number of months (although this introduces more questions like how exactly to smooth and over what period).
The results tend to be biased towards having significantly fewer holdings than the 1/N benchmark. For example, see the righthand chart in the exhibit below. While this is by design, we do tend to get wary of results showing such concentrated portfolios to be optimal especially when in the real world we know that asset class distributions are far from well-behaved.

Applying Subset Resampling to Reduce Estimation Error

The second paper, Portfolio Selection via Subset Resampling by Shen and Wang (2017)[6], uses a technique called subset resampling. This approach works as follows:

Select a random subset of the securities in the universe (e.g. if there are 30 commodity contracts, you could pick ten of them).
Perform the portfolio optimization on the subset selected in #1.
Repeat steps #1 and #2 many times.
Average the resulting allocations together to get the final result.

The table below shows an example of how this would work for three asset classes and three simulations with two asset classes selected in each subset.

One way we can try to get intuition around subset resampling is by thinking about the extremes. If we resampled using subsets of size 1, then we would end up with the 1/N portfolio. If we resampled using subsets that were the same size as the universe, we would just have the standard portfolio optimized over the entire universe. With subset sizes greater than 1 and less than the size of the whole universe, we end up with some type of blend between 1/N and the traditionally optimized portfolio.

The only parameter we need to select is the size of the subset. The authors suggest a subset size equal to n^0.8 where n is the number of securities in the universe. For the S&P 500, this would correspond to a subset size of 144.
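A minimal sketch of the resampling loop follows. To keep it self-contained, the optimizer applied to each subset is a stand-in of our own choosing: minimum variance under the assumption that assets are uncorrelated, which reduces to inverse-variance weights. The function names, the seed, the number of draws, and the example variances are all illustrative assumptions rather than details from the paper.

```python
import random

def inverse_variance_weights(variances):
    """Stand-in optimizer: minimum variance weights if assets are uncorrelated."""
    inv = [1.0 / v for v in variances]
    total = sum(inv)
    return [x / total for x in inv]

def subset_resample(variances, subset_size, n_draws, seed=0):
    """Average the optimized weights over many random subsets.
    Assets left out of a draw receive a zero weight for that draw."""
    rng = random.Random(seed)
    n = len(variances)
    totals = [0.0] * n
    for _ in range(n_draws):
        subset = rng.sample(range(n), subset_size)
        weights = inverse_variance_weights([variances[i] for i in subset])
        for i, w in zip(subset, weights):
            totals[i] += w
    return [t / n_draws for t in totals]

def suggested_subset_size(n):
    """The n^0.8 rule of thumb, rounded to the nearest whole security."""
    return round(n ** 0.8)

variances = [0.01, 0.04, 0.09]  # hypothetical asset variances
near_equal = subset_resample(variances, subset_size=1, n_draws=20000)
full = subset_resample(variances, subset_size=3, n_draws=10)
blend = subset_resample(variances, subset_size=2, n_draws=20000)
```

Here `near_equal` approximates the 1/N portfolio, `full` reproduces the optimizer run on the whole universe, and `blend` sits between the two.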

The authors test subset resampling on the following real-world data sets.

FF100: 100 Fama and French portfolios spanning July 1963 to December 2004
ETF139: 139 ETFs spanning January 2008 to October 2012
EQ181: Individual equities from the Russell Top 200 Index (excluding those stocks with missing data) spanning January 2008 to October 2012
SP434: Individual equities from the S&P 500 Index (excluding those stocks with missing data) spanning September 2001 to August 2013.

As benchmarks, the authors use 1/N (EW); value-weighted (VW); minimum-variance (MV); resampled efficiency (RES) from Michaud (1989)[7]; the two-fund portfolio (TZT) from Tu and Zhou (2011)[8], which blends 1/N and classic mean-variance; the three-fund portfolio (KZT) from Kan and Zhou (2007)[9] which blends the risk-free asset, classic mean-variance, and minimum variance; the four fund portfolio (TZF) from Tu and Zhou (2011) which blends KZT and 1/N; mean-variance using the shrinkage estimator from Ledoit and Wolf (2004) (SKC); and on-line passive aggressive mean reversion (PAMR) from Li (2012)[10].

Similar to the machine learning algorithm, subset resampling does very well in terms of risk-adjusted performance. On three of the four data sets, the Sharpe Ratio of subset resampling is better than that of 1/N by a statistically significant margin. Additionally, subset resampling has the lowest maximum drawdown in three of the four data sets. From a practical standpoint, it is also positive to see that the turnover for subset resampling is significantly lower than many of the competing strategies.

As we did with the first paper, here are some thoughts that came to mind in reading and re-implementing the subset resampling paper:

As presented, the subset resampling algorithm will be sensitive to the number and types of asset classes in an undesirable way. What do we mean by this? Suppose we had three uncorrelated asset classes with identical means and standard deviations. We use subset resampling with subsets of size two to compute a mean-variance portfolio. The result will be approximately 1/3 of the portfolio in each asset class, which happens to match the true mean-variance optimal portfolio. Now we add a fourth asset class that also has the same mean and standard deviation but is perfectly correlated to the third asset class. With this setup, the third and fourth asset classes are one and the same. As a result, the true mean-variance optimal portfolio will have 1/3 in each of the first and second asset classes and 1/6 in each of the third and fourth asset classes (in reality the solution will be optimal as long as the allocations to the third and fourth asset classes sum to 1/3). However, subset resampling will produce a portfolio that is 25% in each of the four asset classes, an incorrect result. Note that this is a problem with many heuristic solutions, including the 1/N portfolio.
There are ways that we could deal with the above issue by not sampling uniformly, but this will introduce some more complexity into the approach.
In a mean-variance setting, the subset resampling will dilute the value of our mean estimates. Now, this should be expected when using any shrinkage-like approach, but it is something to at least be aware of. Dilution will be more severe the smaller the size of the subsets.
In terms of computational burden, it can be very helpful to use some “smart” resampling that is able to get a representative sampling with fewer iterations than a naïve approach. Otherwise, subset resampling can take quite a while to run due to the sheer number of optimizations that must be calculated.
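The duplicated-asset example from the first point above can be checked by enumerating every subset of size two rather than sampling. The sketch assumes each pair is split 50/50: that is the optimal split for two uncorrelated assets with identical means and standard deviations, and it is one of the many equally optimal splits for the perfectly correlated pair. The function name is our own.

```python
from itertools import combinations

def average_pair_weights(n_assets):
    """Average weights across all two-asset subsets, splitting each pair 50/50."""
    pairs = list(combinations(range(n_assets), 2))
    totals = [0.0] * n_assets
    for pair in pairs:
        for i in pair:
            totals[i] += 0.5
    return [t / len(pairs) for t in totals]

three_assets = average_pair_weights(3)  # 1/3 in each asset class
four_assets = average_pair_weights(4)   # 25% in each asset class
```

In the four-asset case the duplicated bet (the third and fourth asset classes together) ends up with 50% of the portfolio, versus 1/3 in the true mean-variance optimal portfolio.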

Performing Our Own Tests

In this section, we perform our own tests using what we learned from the two papers. Initially, we performed the test using mean-variance as our optimization of choice with 12-month return as the mean estimate. We found, however, that the impact of the mean estimate swamped that of the optimizations. As a result, we repeated the tests, this time building minimum variance portfolios. This will isolate the estimation error relating to the covariance matrix, which we think is more relevant anyways since few practitioners use sample-based estimates of expected returns. Note that we used the principal component regression version of the machine learning algorithm.

Our dataset was the 49 industry portfolios provided in the Fama and French data library. We tested the following optimization approaches:

EW: 1/N equally-weighted portfolio
NRP: naïve risk parity where positions are weighted inversely to their volatility, correlations are ignored
MV: minimum variance using the sample covariance matrix
ZERO: minimum variance using sample covariance matrix shrunk using a shrinkage target where all correlations are assumed to be zero
CONSTANT: minimum variance using sample covariance matrix shrunk using a shrinkage target where all correlations are equal to the average sample pairwise correlation across all assets in the universe
PCA: minimum variance using sample covariance matrix shrunk using a shrinkage target that only keeps the top 10% of eigenvectors by variance explained
SSR: subset resampling
ML: machine learning with principal component regression
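To make a few of these definitions concrete, here is a minimal sketch of EW, NRP, MV, ZERO, and CONSTANT. Several details are assumptions on our part rather than a description of the tests themselves: the minimum variance solution is unconstrained (weights may be negative), the shrinkage intensity is a fixed, hypothetical 0.5 rather than an estimated value, CONSTANT uses the average pairwise sample correlation, and the input data are simulated rather than the Fama and French industry portfolios. All function names are illustrative.

```python
import numpy as np


def equal_weight(cov):
    """EW: 1/N in every asset."""
    n = cov.shape[0]
    return np.full(n, 1.0 / n)


def naive_risk_parity(cov):
    """NRP: weights inversely proportional to volatility, ignoring correlations."""
    inv_vol = 1.0 / np.sqrt(np.diag(cov))
    return inv_vol / inv_vol.sum()


def min_variance(cov):
    """MV: fully invested, unconstrained minimum variance (assumption: shorts allowed)."""
    raw = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return raw / raw.sum()


def shrink_to_zero_corr(cov, intensity=0.5):
    """ZERO: blend the sample covariance with a target that has zero correlations."""
    target = np.diag(np.diag(cov))
    return intensity * target + (1.0 - intensity) * cov


def shrink_to_constant_corr(cov, intensity=0.5):
    """CONSTANT: blend with a target whose correlations all equal the average."""
    n = cov.shape[0]
    vol = np.sqrt(np.diag(cov))
    corr = cov / np.outer(vol, vol)
    avg_corr = (corr.sum() - n) / (n * (n - 1))  # mean of off-diagonal entries
    target_corr = np.full((n, n), avg_corr)
    np.fill_diagonal(target_corr, 1.0)
    target = target_corr * np.outer(vol, vol)
    return intensity * target + (1.0 - intensity) * cov


# Simulated monthly returns for five hypothetical assets (not the real dataset).
rng = np.random.default_rng(0)
vols = np.array([0.03, 0.04, 0.05, 0.06, 0.07])
returns = rng.normal(size=(240, 5)) * vols
cov = np.cov(returns, rowvar=False)

weights = {
    "EW": equal_weight(cov),
    "NRP": naive_risk_parity(cov),
    "MV": min_variance(cov),
    "ZERO": min_variance(shrink_to_zero_corr(cov)),
    "CONSTANT": min_variance(shrink_to_constant_corr(cov)),
}
for name, w in weights.items():
    in_sample_vol = float(np.sqrt(w @ cov @ w))
    print(name, np.round(w, 3), "in-sample vol:", round(in_sample_vol, 4))
```

Note that MV will always show the lowest in-sample volatility by construction. The point of shrinkage is that the comparison can look different out of sample, where estimation error in the covariance matrix matters.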

The results are presented below:

Results are hypothetical and backtested and do not reflect any fees or expenses. Returns include the reinvestment of dividends. Results cover the period from 1936 to 2018. Past performance does not guarantee future results.

All of the minimum variance strategies deliver lower risk than EW and NRP and outperform on a risk-adjusted basis, although none of the Sharpe Ratio differences are significant at the 5% level. Of the strategies, ZERO (shrinking with a covariance matrix that assumes zero correlation) and SSR (subset resampling) delivered the highest Sharpe Ratios.
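For readers who want to run a similar significance check on their own return series, one possible approach is a paired bootstrap of the Sharpe Ratio difference. This is only a sketch under stated assumptions and is not necessarily the test used for the results above: returns are assumed to already be in excess of the risk-free rate, periods are resampled independently (which ignores any autocorrelation), and the function names and defaults are hypothetical.

```python
import numpy as np


def sharpe(returns):
    """Per-period Sharpe Ratio of a series of excess returns."""
    return returns.mean() / returns.std(ddof=1)


def bootstrap_sharpe_diff_pvalue(ret_a, ret_b, n_boot=2000, seed=0):
    """Two-sided bootstrap p-value for a difference in Sharpe Ratios.

    The same resampled dates are used for both strategies so that the
    correlation between them is preserved (a paired bootstrap).
    """
    ret_a = np.asarray(ret_a, dtype=float)
    ret_b = np.asarray(ret_b, dtype=float)
    rng = np.random.default_rng(seed)
    n_periods = len(ret_a)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n_periods, size=n_periods)
        diffs[i] = sharpe(ret_a[idx]) - sharpe(ret_b[idx])
    # Share of resampled differences on each side of zero, doubled for two sides.
    p_value = 2.0 * min((diffs <= 0).mean(), (diffs >= 0).mean())
    return min(p_value, 1.0)


# Simulated example: two hypothetical strategies with 600 months of returns.
rng = np.random.default_rng(1)
base = rng.normal(0.005, 0.04, size=600)
similar = base + rng.normal(0.0, 0.01, size=600)

print("p-value:", bootstrap_sharpe_diff_pvalue(base, similar))
```

A p-value above 0.05 from a test like this would be read the same way as the results above: the observed Sharpe Ratio difference cannot be distinguished from noise at the 5% level.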

Conclusion

Portfolio optimization research can be challenging due to the plethora of factors that can influence results, making it hard to generalize results outside of the specific cases tested. It can be difficult to ascertain whether the conclusions are truly attributable to the optimization processes being tested or some other factors.

That being said, building a robust portfolio optimization engine requires a diligent focus on estimation risk. Estimation risk is the risk that the inputs to the portfolio optimization process (i.e. expected returns, volatilities, correlations) are imprecisely estimated by sampling from the historical data, leading to suboptimal allocations.

We summarize the results from two recent papers we’ve reviewed on the topic of managing estimation risk. The first paper relies on techniques from machine learning to find the optimal shrinkage parameters that minimize estimation error by acknowledging the trade-off between bias and variance. The second paper uses a form of simulation called subset resampling. In this approach, we repeatedly select a random subset of the universe, optimize over that subset, and then blend the subset results to get the final result.

Both papers report that their methodologies outperform various heuristic and optimization-based benchmarks. We feel that both the machine learning and subset resampling approaches have merit after making some minor tweaks to deal with real world complexities.

We perform our own tests by building minimum variance portfolios using the 49 Fama/French industry portfolios. We find that while both methods outperform equal-weighting on a risk-adjusted basis, the results are not statistically significant at the 5% level. While this highlights that research results may not translate out of sample, it certainly does not disqualify either method from being a potentially useful tool for managing estimation risk.

[1] Paper can be found here: http://faculty.london.edu/avmiguel/DeMiguel-Garlappi-Uppal-RFS.pdf.

[2] Paper can be found here: http://www.ledoit.net/honey.pdf

[3] DeMiguel, Garlappi and Uppal (2007)

[4] Jagannathan and Ma (2003), “Risk reduction in large portfolios: Why imposing the wrong constraints helps.”

[5] Paper can be found here: https://arxiv.org/pdf/1804.01764.pdf.

[6] Paper can be found here: https://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14443

[7] Paper can be found here: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2387669

[8] Paper can be found here: https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=2104&context=lkcsb_research

[9] Paper can be found here: https://www.cambridge.org/core/journals/journal-of-financial-and-quantitative-analysis/article/optimal-portfolio-choice-with-parameter-uncertainty/A0E9F31F3B3E0873109AD8B2C8563393

[10] Paper can be found here: http://research.larc.smu.edu.sg/mlg/papers/PAMR_ML_final.pdf
