Part of being a quant is having an intuitive understanding of the methodologies you use.  Our job is to use the mathematical process that lines up with the purpose at hand.

[Edit: In the spirit of openness, there is a discussion on Cross Validated about the replicability of these results as well as the mathematical rigor of this methodology.  This post was not meant as an introduction of a defined, new measure, but just a work in progress.  Please take it only as a pragmatic measure that I was toying around with and nothing more serious than that.]

Correlation is one of those metrics that has always bugged me.  We’ve blogged about how it can be a deceiving metric before, but never offered up more of a solution than simply, “make sure you assume a zero-mean.”  Correlation is incredibly important to us because it helps us determine how to manage model failure.  When our models are purely price based, a model will likely err at the same time on two highly correlated price series.  In other words, a lack of asset price movement diversification means a lack of model diversification.

Of course, getting a quantitative measure of “highly correlated” to match my intuitive understanding has never worked out so well.  Correlated, for me, tends to mean: “oscillates with the same periodicity in the same overall trend.”  A daily correlation measure, as defined mathematically, is actually a measure of how divergent the noise of two time series is (where noise is the divergence of a time series from its average trend).  These definitions certainly don’t line up.

Well, I’ve been playing around with a new model for similarity that I think helps fit our intuitive understanding a little better.  Consider the following graph, which shows how a dollar invested in the gold ETF GLD would have fared compared to a dollar invested in the S&P 500 ETF SPY.

[Figure: Growth of $1 in GLD & SPY]

My intuition of correlation is that these two assets exhibit negative correlation.  A traditional measure of correlation, even assuming a zero mean, is 27.03%.
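
For reference, the zero-mean correlation can be written out as below. This is a sketch of the definition, assuming $x_t$ and $y_t$ denote the two series being compared (for the figure quoted above, presumably the daily returns of GLD and SPY) over $T$ observations. The symbols are my own notation; the form matches what `rolling_corr_pairwise_zero_mean` in the code at the end of this post computes.

```latex
\rho_0(x, y) =
  \frac{\frac{1}{T}\sum_{t=1}^{T} x_t \, y_t}
       {\sqrt{\frac{1}{T}\sum_{t=1}^{T} x_t^2}\;
        \sqrt{\frac{1}{T}\sum_{t=1}^{T} y_t^2}}
```

The only difference from the usual Pearson correlation is that the sample means are not subtracted from $x_t$ and $y_t$.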

Thinking more about what my intuitive definition was, I started playing around with a different measure.  First, I run a 20-day simple moving average over the time series.  Then, I calculate the %-difference of price from the moving average, which gives me a value that oscillates around zero (but can remain very positive or very negative for long trends).  Consider the following two graphs, which show these values for GLD and SPY.

[Figures: GLD Divergence; SPY Divergence]
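
The divergence series described above can be sketched in a few lines of current pandas. This is only an illustration: the function name `divergence_from_sma` is hypothetical, and the prices are a synthetic random walk rather than actual GLD or SPY data.

```python
import numpy as np
import pandas as pd


def divergence_from_sma(prices, window=20):
    """Percent difference between each price and its trailing simple moving average."""
    sma = prices.rolling(window).mean()  # NaN for the first window - 1 rows
    return (prices / sma - 1.0).dropna()


# Synthetic random-walk prices, for illustration only (not GLD or SPY data).
rng = np.random.default_rng(0)
prices = pd.Series(100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, 250))))

divergence = divergence_from_sma(prices, window=20)
print(divergence.describe())
```

The result oscillates around zero and loses the first `window - 1` observations, since the moving average is undefined there.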

Seeing how the blue area-graphs line up in almost opposite fashion, this metric may meet our intuitive definition well.  By taking the zero-mean correlation of the %-divergence values, we get a ‘similarity’ measure of -9% for SPY & GLD.
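
A minimal sketch of the ‘similarity’ calculation over the full sample is shown here, assuming a current pandas release. The names `zero_mean_corr` and `similarity` are hypothetical, and the inputs are synthetic random walks, so the printed numbers will not reproduce the GLD and SPY figures quoted in this post.

```python
import numpy as np
import pandas as pd


def zero_mean_corr(x, y):
    """Correlation in which both series are assumed to have a mean of zero."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean(x * y) / np.sqrt(np.mean(x ** 2) * np.mean(y ** 2))


def similarity(prices_a, prices_b, window=20):
    """Zero-mean correlation of each price's %-divergence from its moving average."""
    div_a = prices_a / prices_a.rolling(window).mean() - 1.0
    div_b = prices_b / prices_b.rolling(window).mean() - 1.0
    # Keep only the dates on which both divergences are defined.
    both = pd.concat([div_a, div_b], axis=1).dropna()
    return zero_mean_corr(both.iloc[:, 0], both.iloc[:, 1])


# Two independent synthetic random walks, for illustration only.
rng = np.random.default_rng(1)
a = pd.Series(100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, 250))))
b = pd.Series(100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, 250))))

print(similarity(a, b, window=20))
print(similarity(a, b, window=40))
```

Because the divergence is a ratio of price to its own average, the measure does not depend on the price level of either asset: rescaling one series by a constant leaves the result unchanged.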

Obviously, this measure changes based on how long the moving average is.  The longer the moving average, the more we capture the “trend” element.  If I make the moving average a 40-day instead of a 20-day, I get a similarity measure of -54%.  The shorter the moving average, the more we capture day-to-day noise.  Technically, if we used a 1-day lagged moving average, we would actually be calculating correlation itself.
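
The 1-day claim can be checked with a small sketch. Note the assumption: here the moving average is lagged by one day (shifted so that today's price is compared with the average ending yesterday), which is a variant of the unlagged average used elsewhere in this post. The function name `lagged_divergence` is hypothetical and the prices are synthetic.

```python
import numpy as np
import pandas as pd


def lagged_divergence(prices, window):
    """Percent difference between price and the moving average ending one day earlier."""
    lagged_sma = prices.rolling(window).mean().shift(1)
    return (prices / lagged_sma - 1.0).dropna()


# Synthetic random-walk prices, for illustration only.
rng = np.random.default_rng(2)
prices = pd.Series(100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, 250))))

# With a 1-day window, the lagged "average" is just yesterday's price,
# so the divergence is the ordinary daily return.
one_day = lagged_divergence(prices, window=1)
daily_returns = (prices.diff() / prices.shift(1)).dropna()
print(np.allclose(one_day, daily_returns))
```

Without the lag, a 1-day moving average equals the price itself and the divergence would be zero everywhere, which is why the lag matters in this limiting case.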

The measure is by no means finalized.  It is just a work in progress that I thought I would share to see if anyone has played around with something similar.

[sourcecode language="python" wraplines="false" collapse="false"]
# Note: written for Python 2 and an early pandas release. The
# pandas.rolling_* functions, pandas.Panel, .ix, and pandas.io.data
# were all removed in later pandas versions.
import pandas
import numpy
import datetime
import collections
import pandas.io.data as web

def rolling_std_zero_mean(df, lookback):
    # Returns a rolling standard deviation where a mean
    # of zero is assumed for all random variables
    def _get_std(x):
        return numpy.sqrt(numpy.mean(numpy.square(x)))

    return pandas.rolling_apply(df, lookback, _get_std)

def rolling_corr_pairwise_zero_mean(df, lookback):
    # Returns a rolling correlation matrix where a mean of
    # zero is assumed for all random variables
    all_results = collections.defaultdict(dict)

    for i, k1 in enumerate(df.columns):
        for k2 in df.columns[i:]:

            std_k1 = rolling_std_zero_mean(df[k1], lookback)
            std_k2 = rolling_std_zero_mean(df[k2], lookback)

            # Element-wise product; its rolling mean is the zero-mean covariance
            joined = df[k1] * df[k2]

            rolling_avg = pandas.rolling_mean(joined, lookback)

            corr = rolling_avg / (std_k1 * std_k2)

            # The matrix is symmetric, so store both orderings
            all_results[k1][k2] = corr
            all_results[k2][k1] = corr

    return pandas.Panel.from_dict(all_results).swapaxes('items', 'major')

if __name__ == '__main__':
    start_date = datetime.datetime(2012,10,1)
    end_date = datetime.datetime(2013,5,15)

    spy = web.get_data_yahoo('spy', start_date, end_date)['Adj Close']
    gld = web.get_data_yahoo('gld', start_date, end_date)['Adj Close']

    # 20-day simple moving averages
    spy_ma = pandas.rolling_mean(spy, 20).dropna()
    gld_ma = pandas.rolling_mean(gld, 20).dropna()

    # %-divergence of price from its moving average
    dist_spy = (spy / spy_ma - 1.).dropna()
    dist_gld = (gld / gld_ma - 1.).dropna()

    series = pandas.concat([dist_spy, dist_gld], axis = 1)

    # Full-sample lookback; the last row holds the similarity measure
    print rolling_corr_pairwise_zero_mean(series, lookback = len(series)).ix[-1][0][1]
[/sourcecode]

Corey is co-founder and Chief Investment Officer of Newfound Research, a quantitative asset manager offering a suite of separately managed accounts and mutual funds. At Newfound, Corey is responsible for portfolio management, investment research, strategy development, and communication of the firm's views to clients. Prior to offering asset management services, Newfound licensed research from the quantitative investment models developed by Corey. At peak, this research helped steer the tactical allocation decisions for upwards of $10bn. Corey holds a Master of Science in Computational Finance from Carnegie Mellon University and a Bachelor of Science in Computer Science, cum laude, from Cornell University. You can connect with Corey on LinkedIn or Twitter.