Data is one of the key drivers in the finance industry. Whether it is asset prices, housing starts, the latest unemployment statistics, or interest rates, millions of decisions are made every day based on data. Thus, obtaining accurate data from reliable sources is one essential part of making informed decisions.
Over the past two decades, the Internet has made data much more accessible. If I want to see the price of long-term U.S. Treasuries, I can just type ‘TLT’ into Google and see the current price of the iShares Barclays 20+ Year Treasury Bond ETF at the top of the search results in a fraction of a second. Even if I need data on something like corn futures, options on Apple stock, or the consumer price index (CPI), I can find it with a bit of web searching, or directly if I know where to look (CME Group, Yahoo Finance, and the St. Louis Fed, respectively).
For investors looking to do their own data analysis (e.g. testing strategies, researching historical relationships, or building visualizations), however, finding the data can still be a huge hurdle. Even if a source is located, it may be prohibitively expensive or of questionable accuracy. Regardless of how easy the analysis will be, no data equals no results.
One great solution to this problem is Quandl. Quandl is a data aggregation site that is transparent about its sources and makes the data easy to access, with tools to pull it using R, Python, MATLAB, or Quandl's own API. It offers a wide array of basic financial data, such as stock, bond, and mutual fund prices, exchange rates, and futures, along with a wealth of macroeconomic and demographic data. It even has some more esoteric datasets, like Bitcoin prices.
To see how Quandl can be used, let’s look at a quick example. We will pull two datasets and determine the correlation between their values. The following bit of Python code pulls the Property Crime Rate in the U.S. and the Percent of Population with Improved Access to Sanitation Facilities in Vanuatu and calculates the correlation between them.
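import numpy
import Quandl
sanitation = Quandl.get("WORLDBANK/VUT_SH_STA_ACSN")['Value']
crime = Quandl.get("FBI/CRIME11")['Property Crime Rate']
# corrcoef pairs values by position, so both series must cover the same dates
correlation = numpy.corrcoef(crime, sanitation)[0,1]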
The correlation between the two datasets is -0.95. We can also do the same analysis in Excel using files downloaded directly from Quandl. Here is a scatterplot of the data, which shows that even if a causal relationship did exist between the two variables, a linear fit over the entire range would be a stretch. (An interesting site that has a variety of these spurious correlations can be found here.)
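A correlation like this only means something if each value in one series is paired with the value from the same date in the other. Below is a minimal sketch of that pairing, written in plain Python so the arithmetic is visible. It computes the same Pearson coefficient that numpy.corrcoef reports. The helper names and the yearly numbers are made up for illustration and are not the Quandl data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def aligned_correlation(a, b):
    """Correlate two {year: value} mappings over the years they share."""
    shared = sorted(set(a) & set(b))
    return pearson([a[y] for y in shared], [b[y] for y in shared])

# Hypothetical values: the first series starts a year earlier than the second.
crime = {2000: 5.0, 2001: 4.0, 2002: 3.0, 2003: 2.0}
sanitation = {2001: 10.0, 2002: 20.0, 2003: 30.0}

print(aligned_correlation(crime, sanitation))
```

Only the three shared years enter the calculation, so the extra observation for 2000 is ignored rather than shifting every pair by one position.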
There are two takeaways from this:
1) Sites like Quandl can be used to access data very easily and can integrate well with software to do very quick analyses.
2) We must always keep our heads on straight while doing data analysis, no matter how easy it is to get the data, lest we conclude that increasing access to improved sanitation in Vanuatu will lower the U.S. property crime rate.
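One common way a strong but meaningless correlation arises is that both series simply trend over time. This is not necessarily the explanation for the example above. The sketch below uses two made-up series, one drifting up and one drifting down, and compares the correlation of their levels with the correlation of their period-over-period changes. All names and numbers here are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def changes(values):
    """Period-over-period changes of a series."""
    return [b - a for a, b in zip(values, values[1:])]

# Made-up series, for illustration only (not Quandl data).
rising = [0, 2, 1, 3, 2, 4, 3, 5, 4]
falling = [20, 19, 18, 16, 14, 13, 12, 10, 8]

levels_corr = pearson(rising, falling)
changes_corr = pearson(changes(rising), changes(falling))
print(round(levels_corr, 2), round(changes_corr, 2))
```

For these numbers the levels are strongly negatively correlated (about -0.86), while the changes are uncorrelated. Checking changes as well as levels is one quick sanity test before reading anything into a high correlation.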
Having data at our fingertips widens the range of analysis accessible to investors and greatly simplifies one aspect of investing. Nevertheless, we must always be careful about how we use that data to make our decisions.
Note: Newfound is in no way affiliated with Quandl. We just enjoy quantitative investigations and support transparency and the proper use of historical data. We also understand the importance of accessing reliable data.