In the last post of the series, Understanding Optimization: Intuition, we gave an overview of what optimization is and offered non-practitioners an intuition for how it works. We left off with an image of the Rastrigin function, a complex terrain of hills and valleys that would stump our naive optimization methods. In this post, we will explain the many factors that can lead an optimization algorithm to return a sub-optimal solution.

The Rastrigin function falls into a category of functions that are used to test optimization algorithms for speed of convergence, robustness, and precision (for similar functions, see an excellent taxonomy at Wikipedia). Two other functions, Rosenbrock's Valley and Schwefel's Function, plotted below, demonstrate other complex terrains that optimization algorithms may face with real-world data: shallow plateaus and jagged surfaces.

The simple marble algorithm described in our intuition post, even when extended to multiple dimensions, would likely get stuck on any of these examples unless we knew precisely where to begin our search (which is unlikely for most real-world problems).

To complicate matters further, the surfaces above only show the two-dimensional case (i.e., we want to find the parameters x and y that locate the lowest point on the surface, minimizing the function f(x,y)); these problems become much more difficult when the dimensionality of the problem increases! Below, we've plotted the functional surface for the Ackley function (another optimization test function) in three dimensions (x, y, and "time").

Not only do we have to find the x and y locations of the minimum (or maximum) point on the surface, but the surface keeps changing depending on our time parameter! To visualize adding yet another dimension to this problem would require creating a video for every possible value the new parameter could take on. In higher dimensions, we quickly lose the ability to visualize, which is too bad, because many of the problems we face are high dimensional. For example, when we perform a portfolio optimization, the allocation for each asset represents a new dimension; 30 assets in a portfolio means 30 dimensions we have to search over!
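To get a rough feel for how quickly the search space grows, here is a minimal sketch that counts how many function evaluations a brute-force grid search would need. Both the grid-search approach and the choice of 10 candidate values per dimension are our own illustrative assumptions, not a method described in this series.

```python
def grid_evaluations(points_per_axis: int, dimensions: int) -> int:
    """Count the evaluations an exhaustive grid search would need.

    Assumes (hypothetically) that every dimension is sampled at the
    same number of evenly spaced candidate values.
    """
    return points_per_axis ** dimensions


# Compare a 2D surface, the 3D "time" example, and a 30-asset portfolio.
for dimensions in (2, 3, 30):
    count = grid_evaluations(10, dimensions)
    print(f"{dimensions:>2} dimensions: {count:,} evaluations")
```

Even at this coarse resolution, the count grows from 100 evaluations in two dimensions to 10^30 in thirty, which is why exhaustive search is not a practical option for high-dimensional problems.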

Optimization is just a process. When the surfaces are complex, there is no guarantee that the process we employ will actually find the optimal parameters for our fitness function. Sometimes, the process will converge to a location on the surface that it thinks is optimal, but is actually just a *local* minimum, not the *global* minimum. If we then use these parameters as an input to another part of our portfolio process, we have introduced an inefficiency that will lead to a sub-optimal solution.
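The local-versus-global problem can be reproduced in a few lines. The sketch below uses plain gradient descent as a stand-in for a simple "roll downhill" process (an assumption on our part, not the exact marble algorithm from the intuition post) and runs it on the one-dimensional Rastrigin function, f(x) = 10 + x² − 10·cos(2πx), whose global minimum is at x = 0. The step size, iteration count, and starting points are hypothetical values chosen for illustration.

```python
import math


def rastrigin(x: float) -> float:
    """One-dimensional Rastrigin function; its global minimum is f(0) = 0."""
    return 10.0 + x * x - 10.0 * math.cos(2.0 * math.pi * x)


def rastrigin_slope(x: float) -> float:
    """Analytic derivative of the one-dimensional Rastrigin function."""
    return 2.0 * x + 20.0 * math.pi * math.sin(2.0 * math.pi * x)


def gradient_descent(start: float, step: float = 0.001, iterations: int = 2000) -> float:
    """Repeatedly step downhill from `start` and return the final position."""
    x = start
    for _ in range(iterations):
        x -= step * rastrigin_slope(x)
    return x


# Same process, two different starting points.
for start in (0.3, 3.3):
    x = gradient_descent(start)
    print(f"start={start:.1f} -> x={x:.3f}, f(x)={rastrigin(x):.3f}")
```

Starting at 0.3, the process rolls into the global minimum at x = 0. Starting at 3.3, it settles in the valley just below x = 3, where the function value is roughly 9 rather than 0. Both runs "converge", but only one of them finds the best answer.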

We can try to address this issue by limiting our optimization to a class of functions, known as *convex functions*, that are easy to optimize. However, even imposing this limitation does not guarantee that optimal parameters will be found for our function. Why? There are at least two other major sources of error: the specification of the problem and the data we use.

Let's consider an industry-based example: mean-variance optimization (MVO). Fortunately, MVO falls into the category of *convex functions*, so we don't have to worry too much about whether our optimization process will return a sub-optimal solution. But there are two other major sources of error that could result in sub-optimal allocations for our investor.

**Wrong Data**: Do we *know*, with certainty, the expected return, volatility, and correlation values for our assets? One of the assumptions of MVO is that we do. If we are using data sampling techniques to derive these values, we likely have a degree of uncertainty, and depending on how far the "true" statistics are from our estimated statistics, our results can differ dramatically from the "optimal" solution. Like all algorithms, optimization is subject to the "GIGO" rule: *garbage in, garbage out*.

**Wrong Objective Function**: In its simplest form, MVO maximizes risk-adjusted return (subject to constraints), defining risk as volatility in returns. As we evaluate our results, we should always ask ourselves if the function we are optimizing over is actually applicable. In other words, does performing an MVO actually satisfy the needs of investors? Does it take into account all of their complex goals and risk profiles? Perfect data and optimal parameters do not matter if we have mis-specified our problem.
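To make the wrong-data problem concrete, here is a minimal sketch of how sensitive mean-variance weights can be to their inputs. It makes several simplifying assumptions of our own: only two assets, no constraints, no risk-free asset, and weights taken proportional to the inverse covariance matrix times the expected returns, then rescaled to sum to one. The function name `mvo_weights` and all return, volatility, and correlation figures are hypothetical.

```python
def mvo_weights(mu, vol, rho):
    """Fully invested, unconstrained two-asset mean-variance weights.

    Weights are proportional to inverse(covariance) @ mu and rescaled
    to sum to one. This is a simplified illustration, not a production
    optimizer.
    """
    var1, var2 = vol[0] ** 2, vol[1] ** 2
    cov = rho * vol[0] * vol[1]
    det = var1 * var2 - cov ** 2
    # Multiply the 2x2 inverse covariance matrix by the expected returns.
    raw1 = (var2 * mu[0] - cov * mu[1]) / det
    raw2 = (var1 * mu[1] - cov * mu[0]) / det
    total = raw1 + raw2
    return raw1 / total, raw2 / total


vol = (0.20, 0.20)  # hypothetical annual volatilities
rho = 0.90          # hypothetical correlation between the two assets

# Identical expected returns versus a one-percentage-point bump to asset 1.
for mu in ((0.08, 0.08), (0.09, 0.08)):
    w1, w2 = mvo_weights(mu, vol, rho)
    print(f"expected returns {mu}: weights = ({w1:.3f}, {w2:.3f})")
```

With identical inputs the two assets split the portfolio evenly. Raising one expected return estimate by a single percentage point pushes that asset to roughly 106% of the portfolio and turns the other into a small short position. An estimation error of that size is easy to make with sampled data, which is what makes the GIGO rule so relevant here.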

If we have a mis-specified problem, bad data, complex functional terrain, or a poorly chosen optimization algorithm, we will end up with sub-optimal results. In fact, the results may well be worse than if we had simply used "common sense" numbers and heuristics for our parameters in whatever problem we are working on. For evidence, look no further than the fact that many institutions follow a 60/40 equity/bond framework instead of performing MVO.

As with most tools, the Law of the Instrument applies to optimization. Abraham Maslow described it in 1966: "I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail." As quantitative practitioners, it is important that we understand the limitations of our tools.

In our next post of the series, we will provide an overview of different optimization algorithms that are used in the industry, how they work, their strengths, and their weaknesses.