The team over at ReSolve recently posted about their unique March Madness Challenge. The crux of their idea is that the rules governing a more traditional bracket system are fundamentally flawed, since they inherently reduce the sample size upon which skill is measured.
For example, nearly everyone in the bracket will choose the #1 seed to beat the #16 seed in each region, eliminating the importance of those four games. Similarly, an early upset of a team picked to go deep will eliminate entire branches of games as well.
Instead, they offer an ingenious solution: assign each team a points-per-win (“PPW”) score that it accumulates for each win. Our objective is then to pick a portfolio of teams designed to maximize the total number of points earned over the entire bracket.
Building a Naïve Solution
At the start of the game, we receive the PPW assigned to each team. The method of assigning points-per-win is based upon the natural log of the inverse probability that the team will win the tournament.
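As a quick illustration of that rule, here is a minimal Python sketch. The function name `points_per_win` is hypothetical, and it assumes the input is already a probability of winning the tournament; exactly how betting odds were converted into probabilities is not specified here.

```python
import math

def points_per_win(p_champion):
    """PPW as the natural log of the inverse probability of winning it all."""
    if not 0.0 < p_champion <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return math.log(1.0 / p_champion)

# Illustrative inputs: a roughly 1-in-400 long shot and a 1-in-5 favorite.
long_shot = points_per_win(1 / 400)
favorite = points_per_win(1 / 5)
```

The long shot earns far more per win than the favorite, which is what lets a team with a small title chance but decent early-round prospects look attractive.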
Assuming we know nothing else, I would argue there are two naïve solutions:
- Equal-weight: Equal-weight is often a surprisingly difficult benchmark. In portfolio terms, we’re basically saying: “I have no idea about anything, so I am assuming each team has the same expectation for total points.”
- Inverse PPW: Weighting in proportion to inverse PPW is effectively saying, “I think each team will win a number of games in inverse proportion to its PPW, and so I want each team to have the same marginal contribution to portfolio expected point total.”
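Both naïve benchmarks are easy to write down. The sketch below is only illustrative: the team names and PPW values are made up, and the function names are my own.

```python
def equal_weights(ppw):
    """Give every team the same share of the portfolio."""
    n = len(ppw)
    return {team: 1.0 / n for team in ppw}

def inverse_ppw_weights(ppw):
    """Weight each team in proportion to 1 / PPW, normalized to sum to 1."""
    inverse = {team: 1.0 / value for team, value in ppw.items()}
    total = sum(inverse.values())
    return {team: inv / total for team, inv in inverse.items()}

# Hypothetical PPW values for two teams.
ppw = {"Favorite": 2.0, "Long shot": 4.0}
ew = equal_weights(ppw)
ipw = inverse_ppw_weights(ppw)
```

With these made-up numbers, inverse-PPW weighting puts two thirds of the portfolio in the favorite and one third in the long shot.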
An Informational Edge
If we’re going to go beyond a naïve solution, we better have a strong conviction that we have an edge.
In this case, I see two big glaring edges.
- PPW relies upon the odds of a team winning the entire bracket, but points are awarded for the number of wins. So there may be teams with a very high likelihood of winning 3 or 4 games, but no chance of winning the 5th. These teams will likely have a high PPW we can exploit.
- The realized total PPW will have a strict dependence structure. If we can model this co-dependence, we can likely build a portfolio that embraces diversification opportunities.
So if I can find a way to back out expected number of wins, I might be able to identify which teams are good value plays.
While not an intended edge, it is worth pointing out that upon receiving the PPW for each team, I identified two more potential edges. First is a time edge: allocations were not due until Wednesday at 11:59pm EST. The play-in games happened on Tuesday, so we would expect the odds of winning the entire tournament to change between when the PPWs were delivered and when the games started. I decided not to exploit this edge. Second is a precision edge. The PPWs appeared to be calculated using odds from betting markets. Theoretically, a highly liquid betting market should provide incredibly precise probabilities. However, these markets may not be highly liquid for a number of reasons, including a lack of interest in betting on low-probability teams. If I can identify a source that provides more accurate probabilities, I might be able to identify value.
Fortunately, the quants over at FiveThirtyEight provide a very detailed data breakdown of tournament stats (hat tip to Justin for pointing this out to me). Downloading the forecast data, I'm greeted with some very valuable information: the probability that each team wins a particular round.
This data is valuable because it gives me the ability to back out the probability that a team wins a certain number of games, which in turn allows me to compute the number of points I can expect to accumulate from each team.
Ignoring the “play in” seeds at the moment, here is the basic math. Given the probability of winning a given round for each team,
- The probability of winning 1 game is the probability of winning round 1 times the probability of losing the second game.
- The probability of winning 2 games is the probability of winning round 2 times the probability of losing the third game.
- The probability of winning 3 games is the probability of winning round 3 times the probability of losing the fourth game.
So to find the probability of winning 3 games, we need to know the probability of losing the 4th game. Identifying the (unconditional) probability of winning or losing any given game is fairly trivial, however:
- The probability of winning game 1 is the probability of winning round 1.
- The probability of winning round 2 is the probability of winning round 1 times the probability of winning game 2. Therefore, the probability of winning game 2 is the probability of winning round 2 divided by the probability of winning round 1.
Now with this data, I can identify the expected number of games won by each team and therefore the number of points each team is expected to collect.
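The arithmetic above can be sketched in a few lines of Python. The round probabilities below are invented for illustration (they are not FiveThirtyEight's numbers), and the function names are my own. Note that the probability of winning exactly k games collapses to P(win round k) minus P(win round k+1).

```python
def exact_win_distribution(round_probs):
    """round_probs[k] is P(team wins round k+1), i.e. wins at least k+1 games.

    Returns a list where entry j is P(team wins exactly j games).
    """
    dist = [1.0 - round_probs[0]]  # loses its first game
    for k, reach in enumerate(round_probs):
        if k + 1 < len(round_probs):
            # P(win next game, given the team got there) = P(round k+2) / P(round k+1)
            p_next_game = round_probs[k + 1] / reach if reach > 0 else 0.0
            dist.append(reach * (1.0 - p_next_game))
        else:
            dist.append(reach)  # won every round
    return dist

def expected_wins(round_probs):
    """Expected number of games won, from the exact-win distribution."""
    dist = exact_win_distribution(round_probs)
    return sum(j * p for j, p in enumerate(dist))

# Hypothetical team: 80% to win round 1, 40% to win round 2, and so on.
round_probs = [0.80, 0.40, 0.20, 0.10, 0.05, 0.02]
wins = expected_wins(round_probs)
expected_points = wins * 3.0  # 3.0 is a made-up PPW
```

As a sanity check, the expected number of wins is just the sum of the round probabilities.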
Now I could stop here and simply weight in proportion to expected total score. This is basically a constrained Sharpe optimal portfolio where we assume each team has the same variance and correlations are zero.
But that last assumption is definitely not true...
Start Up the Simulation Engine
The problem with stopping at the above step is that I know certain teams have a tremendous amount of dependence on one another, while others do not. For example, if two teams are in different regions, then they are unaffected by one another until they reach game 5 or 6. On the other hand, if two teams face each other in round 1, their fates are negatively correlated.
To handle this dependency, I built a bracket simulation engine. The engine takes into account the tournament lineup as well as each team’s unconditional probability of winning each game to simulate a random potential bracket outcome. I then took the number of games won by each team in that bracket and multiplied it by PPW to get the total score per team for that simulation.
I ran 10,000 simulations, coming up with 10,000 sample final potential scores for each team. More importantly, these simulations help provide an understanding of how scores will co-vary.
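To make the idea concrete, here is a minimal sketch of such an engine on a four-team toy bracket. It is not the original simulator: the teams, probabilities, and PPW values are invented, and the rule for deciding a matchup (normalizing the two teams' per-game win probabilities against each other) is an assumption of this sketch.

```python
import random

def simulate_bracket(teams, game_probs, rng):
    """Simulate one single-elimination bracket.

    teams: names in bracket order (length must be a power of two).
    game_probs: name -> list of per-game win probabilities, one per round.
    Returns name -> number of games won.
    """
    wins = {team: 0 for team in teams}
    alive = list(teams)
    rnd = 0
    while len(alive) > 1:
        survivors = []
        for i in range(0, len(alive), 2):
            a, b = alive[i], alive[i + 1]
            pa, pb = game_probs[a][rnd], game_probs[b][rnd]
            # Assumption: head-to-head chance is the normalized per-game probability.
            p_a_wins = pa / (pa + pb) if pa + pb > 0 else 0.5
            winner = a if rng.random() < p_a_wins else b
            wins[winner] += 1
            survivors.append(winner)
        alive = survivors
        rnd += 1
    return wins

def run_simulations(teams, game_probs, ppw, n_sims, seed=0):
    """Return one row of per-team scores (wins * PPW) per simulated bracket."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_sims):
        wins = simulate_bracket(teams, game_probs, rng)
        scores.append([wins[team] * ppw[team] for team in teams])
    return scores

def sample_covariance(xs, ys):
    """Unbiased sample covariance of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical bracket: A plays B and C plays D in round 1.
teams = ["A", "B", "C", "D"]
game_probs = {"A": [0.9, 0.6], "B": [0.1, 0.2], "C": [0.6, 0.4], "D": [0.4, 0.3]}
ppw = {"A": 1.0, "B": 5.0, "C": 2.0, "D": 3.0}

scores = run_simulations(teams, game_probs, ppw, n_sims=10_000)
scores_a = [row[0] for row in scores]
scores_b = [row[1] for row in scores]
cov_ab = sample_covariance(scores_a, scores_b)
```

Because A and B meet in round 1, at most one of them can score in any simulated bracket, so their sample covariance comes out negative. That is exactly the kind of dependence the independent-teams view misses.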
Controlling for Model & Data Risk: Resampling
Given these simulations, I could easily compute a vector of expected total scores and a variance-covariance matrix. The problem with this approach, as I see it, is that the entire bracket simulation exercise has a tremendous amount of path dependency, so upsets create a tremendous amount of tail risk.
Looking at all 10,000 samples at once would potentially smooth over these risks too much.
So I used a resampled efficient frontier approach, where I selected a random subset of 500 simulations at a time and built an efficient frontier. I did this 1,000 times and averaged the efficient frontiers together to get the resampled frontier.
This process also helps me add in some extra randomness to account for the fact that while FiveThirtyEight’s data may be more numerically precise, it doesn’t actually mean it is necessarily more accurate (tip: don’t confuse precision for accuracy!). This may be an entirely futile case of garbage in, garbage out if FiveThirtyEight’s predictions are bad. So adding a bit of randomness to the process can help me control for model and data risk.
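The resampling loop itself is simple, and a rough sketch of its mechanics follows. To keep it short, the per-subset efficient frontier solver is replaced by a deliberately crude stand-in (weights proportional to mean simulated score), so this shows the subset-and-average pattern rather than a real frontier. The toy scores and all function names are assumptions of the sketch.

```python
import random

def proportional_to_mean(sample):
    """Stand-in solver: weights proportional to each team's mean score."""
    n_teams = len(sample[0])
    means = [sum(row[i] for row in sample) / len(sample) for i in range(n_teams)]
    total = sum(means)
    return [m / total for m in means]

def resampled_weights(scores, solve, subset_size=500, n_resamples=1000, seed=0):
    """Average the solver's weights over many random subsets of simulations."""
    rng = random.Random(seed)
    n_teams = len(scores[0])
    averaged = [0.0] * n_teams
    for _ in range(n_resamples):
        subset = rng.sample(scores, subset_size)  # drawn without replacement
        weights = solve(subset)
        for i, w in enumerate(weights):
            averaged[i] += w / n_resamples
    return averaged

# Toy stand-in for simulated scores: 2,000 rows for three hypothetical teams.
toy_rng = random.Random(1)
toy_scores = [
    [3 * toy_rng.random(), 5 * toy_rng.random(), toy_rng.random()]
    for _ in range(2000)
]
weights = resampled_weights(toy_scores, proportional_to_mean, n_resamples=200)
```

In a fuller version, `solve` would return the weights at each point along an efficient frontier, and the averaging would be done point by point.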
Going for Gold
At this point, I'm left with a frontier of expected total points versus variance in total points.
Now I need to decide how to turn this into a winning allocation. Ultimately, I need to take on enough risk to accumulate more points than my opponents, but not so much risk that I blow up.
I ultimately decided to target an expected total point level (one sufficiently high above the expected total point level of a naïve solution) while simultaneously minimizing variance. The effort here is to find the point on the efficient frontier where I think I can safely beat the competition without taking on undue risk.
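The selection rule can be illustrated with a small sketch: among candidate portfolios, keep those whose expected total meets the target and pick the one with the lowest variance. Searching a short list of candidates is a stand-in for a proper optimizer, and the scores, candidates, and target below are toy values.

```python
import statistics

def portfolio_totals(weights, scores):
    """Total points per simulation for a given weight vector."""
    return [sum(w * s for w, s in zip(weights, row)) for row in scores]

def pick_min_variance(candidates, scores, target_mean):
    """Lowest-variance candidate whose mean total is at least target_mean."""
    best = None
    for weights in candidates:
        totals = portfolio_totals(weights, scores)
        mean = statistics.fmean(totals)
        variance = statistics.pvariance(totals)
        if mean >= target_mean and (best is None or variance < best[2]):
            best = (weights, mean, variance)
    return best  # None if no candidate reaches the target

# Toy simulations: two teams whose scores never arrive in the same bracket.
scores = [[4.0, 0.0], [0.0, 6.0], [4.0, 0.0], [0.0, 6.0]]
candidates = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
best = pick_min_variance(candidates, scores, target_mean=2.4)
```

Here the 50/50 mix clears the target with far less variance than holding the second team alone, which is the diversification benefit the frontier is meant to capture.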
Without further ado, here are the top allocations:
There are a few things about this allocation I think are worth noting:
- This appears to be a fairly concentrated portfolio in highly seeded, quality teams.
- In actuality, the portfolio is pretty well diversified across regions, as well as across teams within regions that cannot play each other until late in the tournament. The only two teams that could face each other before game 3 are VCU and Oklahoma.
- There are a few lower seeded teams (e.g. VCU and Texas) that garner an allocation because, while their probability of winning the tournament is low, the expected number of games they will win is high relative to that probability. For example, VCU’s odds of winning the tournament are 400:1, giving them a PPW of 5.99. Based on my calculations, however, the expected number of games they will win is 0.97, for an expected point total of about 5.81. I'd call this a value play. On the other hand, Kansas, who is very highly seeded, gets a very low weight. What happened here? Kansas is expected to win 3.49 games, while Villanova, the 2nd seed in the region, is expected to win only 2.62. But because Kansas is a much heavier favorite to win the whole tournament, its PPW is far lower, so Kansas has an expected point total of 5.61 while Villanova has an expected point total of 7.84. Villanova is therefore likely a better value for our allocation than Kansas is.
Where are the Risks?
This wouldn't be a complete exercise without an analysis of the risks.
- Data Risk: The entire exercise relies on data coming from FiveThirtyEight. If their predictions end up being wildly inaccurate, so will our allocations.
- Model Risk: In building my bracket simulator, I relied on the unconditional probability of each team winning a given game. In other words, I gave Villanova a single probability of winning game 2, when in reality that probability depends on who their opponent is. Our brackets would be more realistic if I used the conditional probabilities, but I did not feel like I had a good source for this data. Furthermore, I used a variance-covariance matrix to model the joint behavior between teams. This may have been a totally inappropriate model of dependence.
- Coding Risk: I whipped together the code for the simulator in short order. I wouldn't be horribly surprised if there were a bug lurking in there that throws off the results.
- Objective Risk: The selected optimization objective was to target an expected point level while minimizing variance. Ultimately the target point level may prove to have been too low, or we may not have taken on enough potential positive risk. For example, given that I had all the simulations, I could have optimized to minimize conditional value-at-risk.
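On that last point, a tail measure is easy to compute once the simulated totals exist. The sketch below is an assumption-laden illustration, not something I ran for the challenge: it defines the tail measure as the average of the worst `alpha` fraction of simulated point totals, so a higher value means a less painful bad scenario.

```python
def conditional_value_at_risk(totals, alpha=0.05):
    """Mean of the worst alpha fraction of simulated portfolio point totals."""
    ordered = sorted(totals)
    k = max(1, int(len(ordered) * alpha))  # always keep at least one outcome
    tail = ordered[:k]
    return sum(tail) / len(tail)

# Toy totals from 20 hypothetical simulations: 1, 2, ..., 20 points.
totals = list(range(1, 21))
tail_mean = conditional_value_at_risk(totals, alpha=0.25)
```

An optimizer built on this would look for weights that keep the tail average as high as possible rather than minimizing variance, which penalizes upside surprises just as much as downside ones.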
I really enjoyed the novelty of this exercise and the ability to leverage portfolio construction techniques in a March Madness bracket. I particularly liked how the resulting portfolio – which was entirely quantitatively designed – aligned with reasonable expectations about identifying value and embracing diversification opportunities.
That said, I think what is particularly illuminating about this exercise is that it highlights the difference between "statistically optimal over 10,000 samples" and "winning for a single realization." If March Madness were played 10,000 times, I'd probably have a pretty good portfolio. But with only one March Madness in 2016, an upset or two could make my portfolio woefully sub-optimal. To quote Justin from his commentary this week:
"The exact future we end up in may make diversification a total drag or a total benefit. Diversification is important precisely because we do not know what the future holds."