About a year ago, we wrote a post about the difficulty of estimating correlations from historical data. That post showed that even when we know the distribution of the returns of two uncorrelated assets, our estimate of their correlation from historical returns can vary widely. The study repeatedly sampled returns from the known distribution, estimated the correlation for each sample, and yielded a 95% confidence interval of -0.21 to 0.21. Although this interval is centered on the true value of 0, it shows a significant amount of uncertainty (individual estimates ranged from -0.54 to 0.59!). The post wraps up with three points about real-world returns that make estimating correlation even trickier:
- Correlation regime changes
- Non-linear return relationships (think correlations going to 1 in market crashes)
This illustrates not only that correlations are inherently difficult to estimate, but also a more general concept that we face every day: an estimate of a parameter is itself a random variable with its own distribution.
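The sampling experiment described at the start of this post can be sketched as a short Monte Carlo simulation, which makes the "estimate as a random variable" idea concrete. This is a minimal sketch, not the original study's code: it assumes independent standard normal returns (so the true correlation is exactly 0) and a hypothetical sample size of `n_obs = 60`, matching the n used later in this post. The original study's sample size is not stated here, so the interval this produces need not match -0.21 to 0.21; at n = 60 it comes out somewhat wider, since the spread of the estimates shrinks roughly like 1/sqrt(n_obs).

```python
import numpy as np


def simulate_sample_correlations(n_obs: int, n_trials: int, seed: int = 0) -> np.ndarray:
    """Sample correlations of independent normal return series (true correlation is 0).

    Assumption: returns are i.i.d. standard normal; scaling does not affect correlation.
    """
    rng = np.random.default_rng(seed)
    # Each row is one simulated history of n_obs returns for each asset.
    x = rng.standard_normal((n_trials, n_obs))
    y = rng.standard_normal((n_trials, n_obs))
    # Row-wise Pearson correlation: demean each history, then take the normalized cross-product.
    x -= x.mean(axis=1, keepdims=True)
    y -= y.mean(axis=1, keepdims=True)
    num = (x * y).sum(axis=1)
    den = np.sqrt((x * x).sum(axis=1) * (y * y).sum(axis=1))
    return num / den


if __name__ == "__main__":
    r = simulate_sample_correlations(n_obs=60, n_trials=10_000)
    lo, hi = np.percentile(r, [2.5, 97.5])  # empirical 95% interval of the estimates
    print(f"mean={r.mean():.3f}  95% interval=({lo:.2f}, {hi:.2f})  "
          f"range=({r.min():.2f}, {r.max():.2f})")
```

Changing `n_obs` shows how strongly the width of the interval depends on the length of the return history.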
Continuing with the correlation example, how can we quantify how far off our estimate might be when we do not have the luxury of knowing the underlying distributions and have only a single sample of returns?
The simplest approach is to stick with the assumption that the underlying distributions are normal and use the standard formula for the standard error of the correlation coefficient.
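Under that normality assumption, a commonly used approximation for the standard error of a sample correlation $r$ estimated from $n$ observations, together with the test statistic for the null hypothesis that the true correlation is 0, is:

$$
\mathrm{SE}(r) = \sqrt{\frac{1 - r^2}{n - 2}}, \qquad
t = \frac{r}{\mathrm{SE}(r)} = r\sqrt{\frac{n - 2}{1 - r^2}} \sim t_{n-2}
$$

For example, with $r = 0.21$ and $n = 60$, $\mathrm{SE}(r) = \sqrt{0.9559 / 58} \approx 0.128$.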
With an estimated correlation coefficient of 0.21 and n = 60 observations, the standard error is about 0.13, so the 95% confidence interval (roughly 0.21 ± 1.96 × 0.13, or about -0.04 to 0.46) includes 0. Using this method, estimated correlation coefficients in the range of -0.25 to 0.25 are not statistically significantly different from 0 with n = 60. Increasing the number of observations used in the calculation would lower this error, but using a longer historical window likely reduces the applicability of the normality assumption.
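The arithmetic above can be reproduced with a few lines of standard-library Python. This is a sketch under the same normality assumption: the interval is the simple symmetric r ± z·SE(r) approximation (not, for example, the Fisher z-transform interval, which is one of the alternative techniques for this problem), and the function names are hypothetical helpers written for illustration.

```python
import math
from statistics import NormalDist


def correlation_standard_error(r: float, n: int) -> float:
    """Approximate standard error of a sample Pearson correlation (bivariate normal data)."""
    return math.sqrt((1.0 - r * r) / (n - 2))


def wald_interval(r: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Symmetric interval r +/- z * SE(r) using a normal critical value."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    se = correlation_standard_error(r, n)
    return r - z * se, r + z * se


def significance_threshold(n: int, level: float = 0.95) -> float:
    """Smallest |r| whose statistic r / SE(r) reaches the normal critical value z."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Solving r * sqrt(n - 2) / sqrt(1 - r^2) = z for r gives r = z / sqrt(n - 2 + z^2).
    return z / math.sqrt(n - 2 + z * z)


if __name__ == "__main__":
    r_hat, n = 0.21, 60
    print(f"SE = {correlation_standard_error(r_hat, n):.3f}")
    lo, hi = wald_interval(r_hat, n)
    print(f"95% interval = ({lo:.2f}, {hi:.2f})")
    print(f"|r| below {significance_threshold(n):.3f} is not significant at n = {n}")
```

With r = 0.21 and n = 60 this recovers the figures quoted above: a standard error of roughly 0.13 and a significance threshold of roughly ±0.25. Using a t critical value with n - 2 degrees of freedom instead of z would raise the threshold only slightly.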
Other techniques exist for computing the standard error of a correlation estimate, but is a more refined standard error estimate necessary? If I already cannot be sure whether a correlation is 0.25 or -0.25, a technique like mean-variance optimization certainly won't yield robust results. The three bullet points listed above tell me that estimating correlations is an inherently losing battle: the joint distribution of returns can constantly evolve, rendering my forward-looking correlation estimate even more prone to error. Perhaps the most important thing to realize is simply that the estimate can be far away from the true value of the parameter. Knowing that the estimate carries error is the first step toward ensuring that investment strategies are robust to changes in these parameters.
Testing models and strategies requires testing not only changes in the inputs that stem from our imperfect estimates (as quantified by the standard error), but also changes in the actual parameters (i.e., the true values we are trying to estimate) caused by the evolution of markets. Assuming that the underlying relationships do not change, we can find the first layer of sensitivity by answering a question like "How does the strategy perform when the actual correlations are 2 standard errors away from the estimate?" The second layer of sensitivity comes from answering a question like "How does it perform when correlations go to 1 during a crisis?"
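Both layers of sensitivity can be illustrated on a toy strategy. The sketch below assumes a hypothetical two-asset minimum-variance portfolio (a stand-in chosen for illustration, not a strategy from this post) with made-up volatilities of 15% and 10%. The weights are chosen once using the estimated correlation of 0.21, and the portfolio's volatility is then evaluated under alternative true correlations: the estimate ± 2 standard errors (first layer) and a crisis correlation of 1 (second layer).

```python
import math


def min_variance_weights(vol_a: float, vol_b: float, rho: float) -> tuple[float, float]:
    """Unconstrained, fully invested two-asset minimum-variance weights."""
    cov = rho * vol_a * vol_b
    w_a = (vol_b ** 2 - cov) / (vol_a ** 2 + vol_b ** 2 - 2.0 * cov)
    return w_a, 1.0 - w_a


def portfolio_vol(w_a: float, w_b: float, vol_a: float, vol_b: float, rho: float) -> float:
    """Volatility of fixed two-asset weights under a given (true) correlation."""
    var = (w_a * vol_a) ** 2 + (w_b * vol_b) ** 2 + 2.0 * w_a * w_b * rho * vol_a * vol_b
    return math.sqrt(var)


if __name__ == "__main__":
    vol_a, vol_b = 0.15, 0.10   # hypothetical annualized volatilities
    rho_hat, n = 0.21, 60       # estimated correlation and sample size from the text
    se = math.sqrt((1.0 - rho_hat ** 2) / (n - 2))
    # The strategy's weights are set using the (noisy) estimate only.
    w_a, w_b = min_variance_weights(vol_a, vol_b, rho_hat)
    scenarios = {
        "estimate - 2 SE": rho_hat - 2.0 * se,   # layer 1: estimation error
        "estimate": rho_hat,
        "estimate + 2 SE": rho_hat + 2.0 * se,   # layer 1: estimation error
        "crisis (rho = 1)": 1.0,                 # layer 2: parameter regime change
    }
    for label, rho in scenarios.items():
        vol = portfolio_vol(w_a, w_b, vol_a, vol_b, rho)
        print(f"{label:>18}: rho={rho:+.2f}  portfolio vol={vol:.2%}")
```

Holding the weights fixed while varying the true correlation separates the decision (made with the estimate) from the outcome (driven by the actual parameter), which is the distinction the two questions above are drawing.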
These are the types of questions that underpin sound evaluations of strategies and provide insight into the probable set of risks. Reducing the number of estimated parameters and assumptions that feed into a model simplifies this process greatly.