Category Archives: D3

The Batman’s Ecosystem

If I weren’t crazy, I’d be insane! (Joker)

I present today a dynamical D3.js plot where I combine three things:

  • The Batman curve
  • A text mining analysis to obtain most common words from the Batman’s page at Wikipedia
  • A line plot using morris.js library of rCharts package where point labels are the words obtained in the previous step

This is my particular homage to one of the most amazing superheros ever, together with Daredevil. Since I am not allowed to post JavaScript nor embed objects in my WordPress.com blog, I can only show you an screenshot of the graph:

batman2
If you want to play with the graph, you can download it here. This is not the first time I write about Batman (see this), and maybe will not be the last one.

The code:

require(ggplot2)
require(dplyr)
require(rCharts)
library(rvest)
library(tm)
f1u <- function(x) {ifelse ((abs(x) >  3 & abs(x) <= 7), 3*sqrt(1-(x/7)^2), 0)}
f1d <- function(x) {ifelse ((abs(x) >= 4 & abs(x) <= 7), -3*sqrt(1-(x/7)^2), 0)}
f2u <- function(x) {ifelse ((abs(x) > 0.50 & abs(x) < 0.75),  3*abs(x)+0.75, 0)}
f2d <- function(x) {ifelse ((abs(x) > -4 & abs(x) < 4), abs(x/2)-(3*sqrt(33)-7)*x^2/112-3 + sqrt(1-(abs(abs(x)-2)-1)^2), 0)}
f3u <- function(x) {ifelse ((x > -0.5 & x < 0.5), 2.25, 0)}
f4u <- function(x) {ifelse ((abs(x) >  1 & abs(x) <= 3), 6 * sqrt(10)/7 + (1.5 - 0.5 * abs(x)) * sqrt(abs(abs(x)-1)/(abs(x)-1)) - 6 * sqrt(10) * sqrt(4-(abs(x)-1)^2)/14, 0)}
f5u <- function(x) {ifelse ((abs(x) >= 0.75 & abs(x) <= 1), 9-8*abs(x), 0)}
fu <- function (x) f1u(x)+f2u(x)+f3u(x)+f4u(x)+f5u(x)
fd <- function (x) f1d(x)+f2d(x)
batman <- function(r,x) {ifelse(r%%2==0, fu(x), fd(x))}
data.frame(x=seq(from=-7, to=7, by=0.125)) %>%
  mutate(y=batman(row_number(), x)) -> df
html("https://en.wikipedia.org/wiki/Batman") %>%
  html_nodes("#bodyContent")  %>%
  html_text() %>%
  VectorSource() %>%
  Corpus() %>%
  tm_map(tolower) %>%
  tm_map(removePunctuation) %>%
  tm_map(removeNumbers) %>%  
  tm_map(stripWhitespace) %>%
  tm_map(removeWords, c(stopwords(kind = "en"), "batman", "batmans")) %>%
  DocumentTermMatrix() %>%
  as.matrix() %>%
  colSums() %>%
  sort(decreasing=TRUE) %>%
  head(n=nrow(df)) %>%
  attr("names") -> df$word
m1=mPlot(x = "x",  y = "y",  data = df,  type = "Line")
m1$set(pointSize = 5,
       lineColors = c('black', 'black'),
       width = 900,
       height = 500,
       hoverCallback = "#! function(index, options, content)
                  { var row = options.data[index]
                  return '<b>' + row.word + '</b>'} !#",
       lineWidth = 2,
       grid=FALSE,
       axes=FALSE)
m1
m1$save('Batman.html', standalone = TRUE)

Visualising The Evolution Of Migration Flows With rCharts

Heaven we hope is just up the road (Atlas, Coldplay)

Following with the analysis of migration flows, I have done next two visualizations. These charts are called bump charts and are very suitable to represent rankings. This is what I have done:

  • Obtaining top 20 countries of the world according to % of migrants respect its population
  • To do this, I divide total number of migrants between 1960 and 2009 by the mean population in the same period.
  • I do the same to obtain top 20 countries of the world according to % of immigrants.
  • In both cases, I only consider countries with population greater than 2 million.
  • For these countries, I calculate % of migrants in each decade (60’s, 70’s, 80’s, 90’s and 00’s), dividing total number of migrants by mean population each decade
  • I do the same in the case of immigrants.
  • Instead of representing directly % of migrants and immigrants, I represent the ranking of countries according these indicators by decade

This is the bump chart of migrants:

migrants2And this is the one of immigrants:

inmigrants2Some comments:

  • There is a permanent exodus in Puerto Rico: all decades (except 70’s) is located in the top 1 of countries with most migrants respect its population
  • Ireland is also living a diaspora although in the 00’s decade has lost some positions
  • Albania, Georgia and Bosnia and Herzegovina are gaining positions. Is East Europe gradually becoming uncomfortable?
  • Jamaica is also moving up in this sad competition.
  • On the other hand, Hong Kong and Israel are persistently leaders as receivers
  • Saudi Arabia has presented an impressive growth receiving immigrants since 70’s
  • United States does not appear in the immigrants ranking
  • Singapore is gaining positions: in the 00’s decade is the third receiver country
  • Also in the 00s, Switzerland is the first European country in the ranking, holding the fifth position

I like using rCharts as well as using Enigma data sets, as I have done previously. If you want to play with these charts, you can download them here. If you want to know where to find both datasets, read this. Or do it yourself with the next code:

library(data.table)
library(rCharts)
library(dplyr)
setwd("YOUR WORKING DIRECTORY HERE")
populflows = read.csv(file="enigma-org.worldbank.migration-remittances.migrants.migration-flow-c57405e33412118c8757b1052e8a1490.csv", stringsAsFactors=FALSE)
population = fread("enigma-org.worldbank.hnp.data-eaa31d1a34fadb52da9d809cc3bec954.csv")
population %>% 
  filter(indicator_name=="Population, total") %>% 
  as.data.frame %>% 
  mutate(decade=(year-year%%10)) %>% 
  group_by(country_name, country_code, decade) %>% 
  summarise(population=mean(value)) %>% 
  plyr::rename(., c("country_name"="country")) -> population2
populflows %>% filter(!is.na(total_migrants)) %>% 
  group_by(migration_year, destination_country) %>% 
  summarise(inmigrants = sum(total_migrants))  %>% 
  plyr::rename(., c("destination_country"="country", "migration_year"="decade"))   -> inmigrants
populflows %>% filter(!is.na(total_migrants)) %>% 
  group_by(migration_year, country_of_origin) %>% 
  summarise(migrants = sum(total_migrants)) %>%  
  plyr::rename(., c("country_of_origin"="country", "migration_year"="decade"))   -> migrants
# Join of data sets
migrants %>% 
  merge(inmigrants, by = c("country", "decade")) %>%
  merge(population2, by = c("country", "decade")) %>%
  mutate(p_migrants=migrants/population, p_inmigrants=inmigrants/population) -> populflows2
# Global Indicators
populflows2 %>% 
  group_by(country) %>% 
  summarise(migrants=sum(migrants), inmigrants=sum(inmigrants), population=mean(population)) %>% 
  mutate(p_migrants=migrants/population, p_inmigrants=inmigrants/population)  %>% 
  filter(population > 2000000)  %>%
  mutate(rank_migrants = dense_rank(desc(p_migrants)), rank_inmigrants = dense_rank(desc(p_inmigrants))) -> global
# Migrants dataset
global %>% 
  filter(rank_migrants<=20) %>% 
  select(country) %>% 
  merge(populflows2, by = "country") %>% 
  arrange(decade, p_migrants) %>%
  mutate(decade2=as.numeric(as.POSIXct(paste0(as.character(decade), "-01-01"), origin="1900-01-01"))) %>%
  plyr::ddply("decade", transform, rank = dense_rank(p_migrants)) -> migrants_rank
# Migrants dataset
global %>% 
  filter(rank_inmigrants<=20) %>% 
  select(country) %>% 
  merge(populflows2, by = "country") %>% 
  arrange(decade, p_inmigrants) %>%
  mutate(decade2=as.numeric(as.POSIXct(paste0(as.character(decade), "-01-01"), origin="1900-01-01"))) %>%
  plyr::ddply("decade", transform, rank = dense_rank(p_inmigrants)) -> inmigrants_rank
# Function for plotting
plotBumpChart <- function(df){
  bump_chart = Rickshaw$new()
  mycolors = ggthemes::tableau_color_pal("tableau20")(20)
  bump_chart$layer(rank ~ decade2, group = 'country_code', data = df, type = 'line', interpolation = 'none', colors = mycolors)
  bump_chart$set(slider = TRUE, highlight = TRUE, legend=TRUE)
  bump_chart$yAxis(tickFormat = "#!  function(y) { if (y == 0) { return '' } else { return String((21-y)) } } !#")
  bump_chart$hoverDetail(yFormatter = "#! function(y){return (21-y)} !#")
  return(bump_chart)
}
plotBumpChart(migrants_rank)
plotBumpChart(inmigrants_rank)

A Visualization Of The 100 Greatest Love Songs ft. D3.js

What would you do? If my heart was torn in two (More Than Words, Extreme)

Playing with rCharts package I had the idea of representing the list of 100 best love songs as a connected set of points which forms a heart. Songs can be seen putting mouse cursor over each dot. This is an screenshot of the graph:

HeartScreenshotIf you want to play with the visualization, you can download it here. Or better than this, you can reproduce it with this simple code:

library(dplyr)
library(rCharts)
library(rvest)
setwd("YOUR WORKING DIRECTORY HERE")
heart <- function(r,x) {ifelse(abs(x)<2, ifelse(r%%2==0, sqrt(1-(abs(x)-1)^2), acos(1-abs(x))-pi), 0)} data.frame(x=seq(from=-3, to=3, length.out=100)) %>% 
  mutate(y=jitter(heart(row_number(), x), amount=.1)) -> df
love_songs <- html("http://www.cs.ubc.ca/~davet/music/list/Best13.html") love_songs %>%
  html_nodes("table") %>%
  .[[2]] %>%
  html_table(header=TRUE, fill = TRUE) %>%
  cbind(df) -> df
m1=mPlot(x = "x",  y = "y",  data = df,  type = "Line")
m1$set(pointSize = 5, 
       lineColors = c('red', 'red'),
       width = 850,
       height = 600,
       lineWidth = 2,
       hoverCallback = "#! function(index, options, content){
       var row = options.data[index]
       return '<b>' + row.ARTIST + '</b>' + '<br/>' + row.TITLE} !#",
       grid=FALSE,
       axes=FALSE)
m1$save('Top_100_Greatest_Love_Songs.html', standalone = TRUE)