Tag Archives: highcharter

Visualizing the Gender of US Senators With R and Highmaps

I wake up every morning in a house that was built by slaves (Michelle Obama)

Some days ago I was invited by the people of Highcharts to write a post in their blog. What I have done is a simple but revealing map of women senators of the United States of America. Briefly, this is what I’ve done to generate it:

  • read from the US senate website a XML file with senators info
  • clean and obtain gender of senators from their first names
  • summarize results by state
  • join data with a US geojson dataset to create the highmap

You can find details and R code here.

It is easy creating a highcharts using highcharter, an amazing library as genderizeR, the one I use to obtain gender names. I like them a lot.

Visualizing Stirling’s Approximation With Highcharts

I said, “Wait a minute, Chester, you know I’m a peaceful man”, He said, “That’s okay, boy, won’t you feed him when you can” (The Weight, The Band)

It is quite easy to calculate the probability of obtaining the same number of heads and tails when tossing a coin N times, and N is even. There are 2^{N} possible outcomes and only C_{N/2}^{N} are favorable so the exact probability is the quotient of these numbers (# of favorable divided by # of possible).

There is another way to approximate this number incredibly well: to use the Stirling’s formula, which is 1/\sqrt{\pi\cdot N/2}

This plot represents both calculations for N from 2 to 200:

Stirling

Although for small values of N, Stirling’s approximation tends to overestimate probability …

Stirling 2

… is extremely precise as N becomes bigger:

Stirling 3

James Stirling published this amazing formula in 1730. It simplifies the calculus to the extreme and also gives a quick way to obtain the answer to a very interesting question: How many tosses are needed to be sure that the probability of obtaining the same number of heads and tails is under any given threshold? Just solve the formula for N and you will obtain the answer. And, also, the formula is another example of the presence of pi in the most unexpected places, as happens here.

Just another thing: the more I use highcharter package the more I like it.

This is the code:

library(highcharter)
library(dplyr)
data.frame(N=seq(from=2, by=2, length.out = 100)) %>%
  mutate(Exact=choose(N,N/2)/2**N, Stirling=1/sqrt(pi*N/2))->data
hc <- highchart() %>% 
  hc_title(text = "Stirling's Approximation") %>% 
  hc_subtitle(text = "How likely is getting 50% heads and 50% tails tossing a coin N times?") %>% 
  hc_xAxis(title = list(text = "N: Number of tosses"), categories = data$N) %>% 
  hc_yAxis(title = list(text = "Probability"), labels = list(format = "{value}%", useHTML = TRUE)) %>% 
  hc_add_series(name = "Stirling", data = data$Stirling*100,  marker = list(enabled = FALSE), color="blue") %>% 
  hc_add_series(name = "Exact", data = data$Exact*100,  marker = list(enabled = FALSE), color="lightblue") %>% 
  hc_tooltip(formatter = JS("function(){return ('<b>Number of tosses: </b>'+this.x+'<br><b>Probability: </b>'+Highcharts.numberFormat(this.y, 2)+'%')}")) %>%
  hc_exporting(enabled = TRUE) %>%
  hc_chart(zoomType = "xy")
hc