Monthly Archives: April 2016

The Hype Bubble Map for Dog Breeds

In the whole history of the world there is but one thing that money can not buy… to wit the wag of a dog’s tail (Josh Billings)

In this post I combine several things:

  • Simple webscraping to read the list of companion dogs from Wikipedia. I love rvest package to do these things.
  • Google Trends queries to download the evolution of searchings of breeds during last 6 months. I use gtrendsR package to do this and works quite well.
  • A dinamic Highchart visualization using the awesome highcharter package
  • A static ggplot visualization.

The experiment is based on a simple idea: what people search on the Internet is what people do. Can be Google Trends an useful tool to know which breed will become fashionable in the future? To be honest, I don’t really know but I will make my own bet.

What I have done is to extract last 6 months of Google trends of this list of companion breeds. After some simple text mining, I divide the set of names into 5-elements subsets because Google API doesn’t allow searchings with more than 5 items. The result of the query to Google trends is a normalized time series, meaning the 0 – 100 values are relative, not absolute, measures. This is done by taking all of the interest data for your keywords and dividing it by the highest point of interest for that date range. To make all 5-items of results comparable I always include King Charles Spaniel breed in all searchings (as a kind of undercover agent I will use to compare searching levels). The resulting number is my “Level” Y-Axis of the plot. I limit searchings to code=”0-66″ which is restrict results to Animals and pets category. Thanks, Philippe, for your help in this point. I also restrict rearchings To the United States of America.

There are several ways to obtain an aggregated trend indicator of a time series. My choice here was doing a short moving average order=2 to the resulting interest over time obtained from Google. The I divide the weekly variations by the smoothed time series. The trend indicator is the mean of these values. To obtains a robust indicator, I remove outliers of the original time series. This is my X-axis.

This is how dog breeds are arranged with respect my Trend and Level indicators:

HypeBubbleGgplot

Inspired by Gartner’s Hype Cycle of Emerging Technologies I distinguish two sets of dog breeds:

  • Plateau of Productivity Breeds (succesful breeds with very high level indicator and possitive trend): Golden Retriever, Pomeranian, Chihuahua, Collie and Shih Tzu.
  • Innovation Trigger Breeds (promising dog breeds with very high trend indicator and low level): Mexican Hairless Dog, Keeshond, West Highland White Terrier and German Spitz.

I discovered recently a wonderful package called highcharter which allows you to create incredibly cool dynamic visualizations. I love it and I could not resist to use it to do the previous plot with the look and feel of The Economist. This is an screenshot (reproduce it to play with tits interactivity):

BubbleEconomist
And here comes my prediction. After analyzing the set Innovation Trigger Breeds, my bet is Keeshond will increase its popularity in the nearly future: don’t you think it is lovely?

640px-Little_Puppy_Keeshond
Photo by Terri BrownFlickr: IMG_4723, CC BY 2.0

Here you have the code:

library(gtrendsR)
library(rvest)
library(dplyr)
library(stringr)
library(forecast)
library(outliers)
library(highcharter)
library(ggplot2)
library(scales)

x="https://en.wikipedia.org/wiki/Companion_dog"
read_html(x) %>% 
  html_nodes("ul:nth-child(19)") %>% 
  html_text() %>% 
  strsplit(., "\n") %>% 
  unlist() -> breeds

breeds=iconv(breeds[breeds!= ""], "UTF-8")

usr <- "YOUR GOOGLE ACCOUNT"
psw <- "YOUR GOOGLE PASSWORD"
gconnect(usr, psw)

#Reference
ref="King Charles Spaniel"

#New set
breeds=setdiff(breeds, ref)

#Subsets. Do not worry about warning message
sub.breeds=split(breeds, 1:ceiling(length(breeds)/4))

results=list()
for (i in 1:length(sub.breeds))
{
  res <- gtrends(unlist(union(ref, sub.breeds[i])), 
          start_date = Sys.Date()-180,
          cat="0-66",
          geo="US")
  results[[i]]=res
}

trends=data.frame(name=character(0), level=numeric(0), trend=numeric(0))
for (i in 1:length(results))
{
  df=results[[i]]$trend
  lr=mean(results[[i]]$trend[,3]/results[[1]]$trend[,3])
  for (j in 3:ncol(df))
  {
    s=rm.outlier(df[,j], fill = TRUE)
    t=mean(diff(ma(s, order=2))/ma(s, order=2), na.rm = T)
    l=mean(results[[i]]$trend[,j]/lr)
    trends=rbind(data.frame(name=colnames(df)[j], level=l, trend=t), trends)
  }
}

trends %>% 
  group_by(name) %>% 
  summarize(level=mean(level), trend=mean(trend*100)) %>% 
  filter(level>0 & trend > -10 & level<500) %>% 
  na.omit() %>% 
  mutate(name=str_replace_all(name, ".US","")) %>% 
  mutate(name=str_replace_all(name ,"[[:punct:]]"," ")) %>% 
  rename(
    x = trend,
    y = level
  ) -> trends
trends$y=(trends$y/max(trends$y))*100

#Dinamic chart as The Economist
highchart() %>% 
  hc_title(text = "The Hype Bubble Map for Dog Breeds") %>%
  hc_subtitle(text = "According Last 6 Months of Google Searchings") %>% 
  hc_xAxis(title = list(text = "Trend"), labels = list(format = "{value}%")) %>% 
  hc_yAxis(title = list(text = "Level")) %>% 
  hc_add_theme(hc_theme_economist()) %>%
  hc_add_series(data = list.parse3(trends), type = "bubble", showInLegend=FALSE, maxSize=40) %>% 
  hc_tooltip(formatter = JS("function(){
                            return ('<b>Trend: </b>' + Highcharts.numberFormat(this.x, 2)+'%' + '<br><b>Level: </b>' + Highcharts.numberFormat(this.y, 2) + '<br><b>Breed: </b>' + this.point.name)
                            }"))

#Static chart
opts=theme(
  panel.background = element_rect(fill="gray98"),
  panel.border = element_rect(colour="black", fill=NA),
  axis.line = element_line(size = 0.5, colour = "black"),
  axis.ticks = element_line(colour="black"),
  panel.grid.major = element_line(colour="gray75", linetype = 2),
  panel.grid.minor = element_blank(),
  axis.text.y = element_text(colour="gray25", size=15),
  axis.text.x = element_text(colour="gray25", size=15),
  text = element_text(size=20),
  legend.key = element_blank(),
  legend.position = "none",
  legend.background = element_blank(),
  plot.title = element_text(size = 30))
ggplot(trends, aes(x=x/100, y=y, label=name), guide=FALSE)+
  geom_point(colour="white", fill="darkorchid2", shape=21, alpha=.3, size=9)+
  scale_size_continuous(range=c(2,40))+
  scale_x_continuous(limits=c(-.02,.02), labels = percent)+
  scale_y_continuous(limits=c(0,100))+
  labs(title="The Hype Bubble Map for Dog Breeds",
       x="Trend",
       y="Level")+
  geom_text(data=subset(trends, x> .2 & y > 50), size=4, colour="gray25")+
  geom_text(data=subset(trends, x > .7), size=4, colour="gray25")+opts
Advertisements

The Coaster Maker by Shiny

The word you invented is well formed and could be used in the Italian language (The Accademia della Crusca regarding to the word “Petaloso”, recently invented by an eight-year-old boy)

Are you tired of your old coasters? Do you like to make things by your own? Do you have a PC and a printer at home? If you answered yes to all these questions, just follow these simple instructions:

  • Install R and RStudio in your PC
  • Open RStudio and create a new Shiny Web App multiple file (ui.R/server.R)
  • Substitute sample code of each file by the code below
  • Press Run App
  • Press buttom Get your coaster! until you obtain a image you like
  • Print the image
  • Cut out the image
  • Place on the coaster your favorite drinking

These are some examples:

This is the code of ui.R

#
# This is the user-interface definition of a Shiny web application. You can
# run the application by clicking 'Run App' above.
#
# Find out more about building applications with Shiny here:
#
#    http://shiny.rstudio.com/
#
library(shiny)
shinyUI(fluidPage(
  titlePanel("The coaster maker"),
  sidebarLayout(
    sidebarPanel(
      #helpText(),

      # adding the new div tag to the sidebar
      tags$div(class="header", checked=NA,
               tags$p("This coasters are generated by hypocycloid curves.The curve is formed by the locus of a point,
                      attached to a circle, that rolls on the inside of another circle.
                      In the curve's equation the first part denotes the relative position between the two circles,
                      the second part denotes the rotation of the rolling circle.")),
      tags$div(class="header", checked=NA,
               HTML("

More info <a href=\"http://www.2dcurves.com/roulette/rouletteh.html#rhodon\">here</a>")
      ),
      actionButton('rerun','Get your coaster!')
    ),
    mainPanel(
      plotOutput("HarmPlot")
    )
  )
))

This is the code of server.R

# This is the server logic of a Shiny web application. You can run the
# application by clicking 'Run App' above.
#
# Find out more about building applications with Shiny here:
#
#    http://shiny.rstudio.com/
#
library(shiny)
library(ggplot2)
CreateDS = function ()
{
  t=seq(-31*pi, 31*pi, 0.002)
  a=sample(seq(from=1/31, to=29/31, by=2/31), 1)
  b=runif(1, min = 1, max = 3)
  data.frame(x=(1-a)*cos(a*t)+a*b*cos((1-a)*t), y=(1-a)*sin(a*t)-a*b*sin((1-a)*t))
}
shinyServer(function(input, output) {
  dat<-reactive({if (input$rerun) dat=CreateDS() else dat=CreateDS()})
    output$HarmPlot<-renderPlot({
    ggplot(dat())+
      geom_point(data=data.frame(x=0,y=0), aes(x,y), color=rgb(rbeta(1, .5, .5), rbeta(1, .5, .5), rbeta(1, .5, .5)) , shape=19, fill="yellow", size=220)+
      geom_polygon(aes(x, y), fill=rgb(rbeta(1, 2, 2), rbeta(1, 2, 2), rbeta(1, 2, 2))) +
      theme(legend.position="none",
            panel.background = element_rect(fill="white"),
            panel.grid=element_blank(),
            axis.ticks=element_blank(),
            axis.title=element_blank(),
            axis.text=element_blank())
  }, height = 500, width = 500)
})