Optimization is one of the primary tools in the quantitative portfolio manager's tool-belt. In this series of blog posts, we hope to de-mystify optimization and provide an intuition as to how it works and the potential problems practitioners face in using it.
In asset management, optimization is most often referred to in relation to computing asset weights, though its uses extend much further. It is often used in parameter selection for calibrating return or volatility forecasting models. It is used to calibrate derivative pricing models (e.g. deriving implied volatility using the Black-Scholes formula from market prices). It is also the basis for most machine learning algorithms such as least-squares regression, classification (logistic regression or SVM), maximum-likelihood estimation, and k-means clustering. Some practitioners have even gone so far as to select their trading rules based using an on-going optimization process (most often through evolutionary algorithms, which we will address in a later blog post).
But what is optimization? Let's consider an example. You're going to a party and your significant other has asked you to bake a cake. She told you that it should cook for an hour, but you've forgotten at what temperature. Fortunately, the party is several hours away, you have a lot of excess batter, and you think the temperature should be somewhere between 200-450°. How are we going to figure out what temperature to cook at? Through an optimization.
You put the first cake in the oven at 325° (half way between 200° and 450°) and after an hour it comes out undercooked. So you put the next cake in the oven at 387.5° (half way between 325° and 400°) and after an hour, it comes out burnt. So you put the third cake in at 356.25° (half way between 325° and 387.5°) and this time the cake comes out just right; she must have told you 350°!
In this example, we performed a process (the optimization algorithm) to identify the oven temperature (our parameter or input) to maximize the edibility of our cake (the "fitness" of our solution) when we cook it in the oven (our objective function). Optimization, then, is the process whereby we identify the parameters (the temperature) for our objective function (the oven) that lead to our optimal output (a perfectly cooked cake). The example highlights how optimization algorithms work:
- Calculate the current "fitness": What is the edibility of the cake we've cooked?
- Calculate the direction to move: Use our halving method, based on whether the cake is over or under cooked.
- Determine when to stop: Is our cake perfectly cooked? (Is the result optimal?) If not, go back to step #1!
As another example, let's say we had the function , which we want to find the minimum of over the bounds . While the function may not have a real world interpretation, it is no different than our cake example:
- f(x), our function, is like our oven: it takes an input (a value for 'x' / temperature), performs some work, and provides an output (some value for f(x) / our cooked cake)
- x, our parameter, is like the temperature, defining how the oven will operate
- defines the output, similar to the internal calculation of the oven, which takes the temperature and figures out how much to heat up the coils, defining what the outcome of the cake
- is like our temperature boundaries
This function is plotted below:
Now, the computer does not have the benefit of looking at the graph the way we do to identify the minimum point; it only knows that . A simple way to think about running an optimization as it relates to this function would be to have the computer model itself as a marble rolling along the top of the function and letting gravity and friction compete until marble comes to rest in a stable location. Below we simulate this process.
The simulation again highlights how many optimization functions work:
- Calculate the current "fitness": The marble's current height on the surface
- Calculate the direction to move: Gravity constantly pulls the marble towards lower ground
- Determine when to stop: Friction eventually overpowers gravity and the marble stops moving
Unlike the cake example, our intuition of dropping a marble also highlights the potential flaws of this method:
- What happens if we drop the marble at , where the surface is completely flat? We've found a local minimum, but not a global minimum. (Actually, it isn't a minimum at all -- it's called an inflection point -- but our algorithm doesn't know that!)
- What happens if we drop the marble at ? The marble will roll to the right, out of bounds, and we won't have found any minimum! We've had what's called a failure to converge!
- The marble keeps oscillating around the solution before it converges; can't we converge faster?
- Does the marble keep skipping a bit? It sure does! That's algorithmic and numerical inaccuracy at work. We are, after all, only 'modeling' the physics.
These problems exemplify the issues that face optimization researchers: convergence speed, precision, and robustness. Often an increase in one area is at the expense of another. Consider, to increase robustness and precision, we decide to drop 10 marbles in random locations. Wherever the most marbles converge to, we'll call our winning location. So we've tried to increase precision and robustness -- but now we have to simulate 10 marbles instead of 1, meaning we've increased our computational workload by a factor of 10 and thereby decreased how quickly our model will converge to the correct answer.
We can see in the above simulation that while several marbles stopped at sub-optimal solutions, the majority converged to the correct solution, the global minimum.
Not all problems are so easily solved. Consider Rastrigen's function, plotted in three dimensions:
Dropping marbles randomly on this surface won't be any good! As our function terrain gets more complex, so must our methodologies. However, the same basic intuition always applies: (1) evaluate our current solution(s), (2) move to potentially better locations, (3) determine if we should stop.
In the next post of the series, we'll discuss some of the dangers of using optimization tools. Stay tuned!
For those interested, the code used to simulate the movement of the ball is below:
import numpy import math n = 1 f = lambda x: numpy.sin(x)*(x**2) x = -5. def sign(x): return x >= 0 eps = 1e-8 time_step = 0.001 last_x = float('inf') v = 0 m = 1 # kg g = 9.8 # m/s^2 f_g = m * g kinetic_coefficient_of_friction = 0.4 static_coefficient_of_friction = 0.9 while (last_x - x)**2 > 1e-16: last_x = x slope = (f(x+eps) - f(x-eps)) / (2 * eps) angle = math.atan(slope) # are we kinetic or static? apply the right friction coefficient_of_friction = kinetic_coefficient_of_friction if abs(v) < 1e-8: coefficient_of_friction = static_coefficient_of_friction if sign(angle) == sign(v) and abs(v) > 1e-8: # we are going up hill f_x = -coefficient_of_friction * f_g * numpy.cos(angle) - f_g * numpy.sin(angle) else: # we are going down hill f_x = coefficient_of_friction * f_g * numpy.cos(angle) - f_g * numpy.sin(angle) acceleration = f_x / m v = v + acceleration * time_step x = x + v * time_step print x