NASDAQ 100 Couples

Heaven, I’m in heaven, and my heart beats so that I can hardly speak, and I seem to find the happiness I seek, when we’re out together dancing cheek to cheek (Cheek To Cheek, Irving Berlin)

There are about 6.500 available packages in CRAN repository. If I were a superhuman, able to learn one package a day, I would spend almost 18 years of my life studying R. And how many packages would be uploaded to CRAN during this period? Who knows: R is infinite.

Today, my experiment deals with quantmod package, which allows you to play to be quant for a while. I download the daily quotes of NASDAQ 100 companies and measure distances between each pair of companies. Distance is based on the cross-correlation between two series so high-correlated series (not exceeding a maximum lag) are closer than low-correlated ones. You can read a good description of this distance here. Since NASDAQ 100 contains 107 companies, I calculate distances for 5.671 different couples. Next plot represent distances between each pair of companies. The darker is the color, the closer are the related companies:

Nasdaq100

Yes, I know is not a graph for someone with visual problems. Let me show you an example of what is behind one of these little tiles. Distance between Mattel Inc. and 21st Century Fox is very small (its related tile is dark coloured). Why? Because of this:

MattelFox
These two companies have been dancing cheek to cheek for more than seven years. It is also curious how some companies are far from any of their NASDAQ 100 colleagues. Some examples of these unpaired companies are Express Scripts Holding Company (ESRX), Expeditors International of Washington Inc. (EXPD) and Fastenal Company (FAST). I do not why but there must be an explanation, do not you think so?

Something tells me I will do some other experiment using quantmod package:

library("quantmod")
library("TSdist")
library("ade4")
library("ggplot2")
library("Hmisc")
library("zoo")
library("scales")
library("reshape2")
setwd("YOUR WORKING DIRECTORY HERE")
temp=tempfile()
download.file("http://www.nasdaq.com/quotes/nasdaq-100-stocks.aspx?render=download",temp)
data=read.csv(temp, header=TRUE)
for (i in 1:nrow(data)) getSymbols(as.character(data[i,1]))
results=t(apply(combn(sort(as.character(data[,1]), decreasing = TRUE), 2), 2,
      function(x)
      {
        ts1=drop(Cl(eval(parse(text=x[1]))))
        ts2=drop(Cl(eval(parse(text=x[2]))))
        c(symbol1=x[1], symbol2=x[2], tsDistances(ts1, ts2, distance="crosscorrelation"))
      }))
results=as.data.frame(results)
colnames(results)=c("Sym1", "Sym2", "TSdist")
results$TSdist=as.numeric(as.character(results$TSdist))
results=rbind(results, data.frame(Sym1=as.character(data[,1]), Sym2=as.character(data[,1]), TSdist=0))
results$TSdist2=as.numeric(cut2(results$TSdist, g=4))
opts=theme(axis.text.x = element_text(angle = 90, vjust=.5, hjust = 0),
           panel.background = element_blank(),
           axis.text = element_text(colour="gray25", size=8),
           legend.position = "none",
           panel.grid = element_blank())
ggplot(results,aes(x=Sym2,y=Sym1))+
  geom_tile(aes(fill = TSdist2), colour="gray80")+
  scale_size_continuous(range=c(1,10))+
  scale_x_discrete("", limits=sort(unique(as.character(results$Sym1))))+
  scale_y_discrete("", limits=sort(unique(as.character(results$Sym2)), decreasing = TRUE))+
  scale_fill_gradient(low = "steelblue", high = "white")+
  opts
MAT.close=Cl(MAT)
FOX.close=Cl(FOX)
cls=merge(MAT.close, FOX.close, all = FALSE)
df=data.frame(date = time(cls), coredata(cls))
names(df)[-1]=c("mat", "fox")
df1=melt(df, id.vars = "date", measure.vars = c("mat", "fox"))
opts2=theme(
  panel.background = element_rect(fill="gray98"),
  panel.border = element_rect(colour="black", fill=NA),
  axis.line = element_line(size = 0.5, colour = "black"),
  axis.ticks = element_line(colour="black"),
  panel.grid.major = element_line(colour="gray75", linetype = 2),
  panel.grid.minor = element_blank(),
  axis.text = element_text(colour="gray25", size=15),
  axis.title = element_text(size=18, colour="gray10"),
  legend.key = element_blank(),
  legend.position = "none",
  legend.background = element_blank(),
  plot.title = element_text(size = 40, colour="gray10"))
ggplot(df1, aes(x = date, y = value, color = variable))+
  geom_line(size = I(1.2))+
  scale_color_discrete(guide = "none")+
  scale_x_date(labels = date_format("%Y-%m-%d"))+
  labs(title="Nasdaq 100 Couples: Mattel And Fox", x="Date", y="Closing Price")+
  annotate("text", x = as.Date("2011-01-01", "%Y-%m-%d"), y = c(10, 30), label = c("21st Century Fox", "Mattel Inc."), size=7, colour="gray25")+
  opts2
Advertisements

8 thoughts on “NASDAQ 100 Couples

  1. Stock prices aren’t stationnary therefore correlation is a bad estimator. If you really want to measure how two stocks move together you should use cointegration. It’s available in R in the tseries package (adf.test).

    The R Trader

    1. Thanks a lot! I thought cross correlation it was a good indicator because examples I checked seem to be logical. I will do the same example using cointegration to compare results. Regards!

      1. Hi : R trader is correct about the spurious correlations ( see granger and newbold, 1976 for details ) but co-integration has its problems also. For a simpler
        analysis that alleviates some of the problem with your approach, decide on a timeframe and calculate the correlation of the log difference of the prices ( i.e: the returns ). Problem with this approach is that different timeframes will give different results.

        Mark

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s