Monthly Archives: June 2016

Amazing Things That Happen When You Toss a Coin 12 Times

If there is a God, he’s a great mathematician (Paul Dirac)

Imagine you toss a coin 12 times and you count how many heads and tails you are obtaining after each throwing (the coin is equilibrated so the probability of head or tail is the same). At some point, it can happen that number of heads and number of tails are the same. For example, if you obtain the sequence T-H-T-T-H-T-H-H-T-T-H-H, after the second throwing, number of heads is equal to number of tails (and both equal to one). It happens again after the 8th throwing and after last one. In this example, the last throwing where equallity occurs is the number 12. Obviously, equallity can only be observed in even throwings.

If you repeat the experiment 10.000 times you will find something like this if you draw the relative frequency of the last throwing where cumulated number of heads is equal to the one of tails:

Coin12Times
From my point of view there are three amazing things in this plot:

  1. It is symmetrical, so prob(n)=prob(12-n)
  2. The least likely throwing to obtain the last equality is the central one.
  3. As a corollary, the most likely is not obtaining any equality (number of heads never are the same than number of tails) or obtaining last equality in the last throwing: two extremely different scenarios with the same chances to be observed.

Behind the simplicity of tossing coins there is a beautiful universe of mathematical surprises.

library(dplyr)
library(ggplot2)
library(scales)
tosses=12
iter=10000
results=data.frame(nmax=numeric(0), count=numeric(0), iter=numeric(0))
tmp=data.frame(nmax=numeric(0))
for (j in 1:iter)
{
data.frame(x=sample(c(-1,1), size=tosses, replace=TRUE)) %>%
add_rownames(var = "n") %>%
mutate(cumsum = cumsum(x)) %>% filter(cumsum==0) %>%
summarize(nmax=max(as.numeric(n))) %>% rbind(tmp)->tmp
}
tmp %>%
group_by(nmax) %>%
summarize(count=n()) %>%
mutate(nmax=ifelse(is.finite(nmax), nmax, 0), iter=iter) %>%
rbind(results)->results
opts=theme(
panel.background = element_rect(fill="darkolivegreen1"),
panel.border = element_rect(colour="black", fill=NA),
axis.line = element_line(size = 0.5, colour = "black"),
axis.ticks = element_line(colour="black"),
panel.grid.major = element_line(colour="white", linetype = 1),
panel.grid.minor = element_blank(),
axis.text.y = element_text(colour="black"),
axis.text.x = element_text(colour="black"),
text = element_text(size=20),
legend.key = element_blank(),
plot.title = element_text(size = 30)
)
ggplot(results, aes(x=nmax, y=count/iter)) +
geom_line(size=2, color="green4")+
geom_point(size=8, fill="green4", colour="darkolivegreen1",pch=21)+
scale_x_continuous(breaks = seq(0, tosses, by=2))+
scale_y_continuous(labels=percent, limits=c(0, .25))+
labs(title="What happens when you toss a coin 12 times?",
x="Last throwing where cumulated #tails = #heads",
y="Probability (estimated)")+opts
Advertisements

Women in Orchestras

I believe in the truth of fairy-tales more than I believe in the truth in the newspaper (Lotte Reiniger)

In my opinion, this graph is a visual demonstration that we live in a male chauvinist world.

Orchestras2

In this experiment I download the members of ten top orchestras of the world with the amazing rvest package. After cleaning texts, I obtain the gender of names with genderizeR package as I did here. Since I only take into account names genderized with high probability, these numbers cannot be exact. Apart of this, the plot speaks by itself.

setwd("YOUR WORKING DIRECTORY HERE")
library(rvest)
library(dplyr)
library(genderizeR)
read_html("http://www.berliner-philharmoniker.de/en/orchestra/") %>%
html_nodes(".name") %>%
html_text(trim=TRUE) %>%
iconv("UTF-8") %>%
gsub("[\r,\n]"," ", .) %>%
gsub("\\s+", " ", .) %>%
paste(collapse=" ") %>%
findGivenNames() -> berliner
saveRDS(berliner, file="berliner.RDS")
read_html("https://www.concertgebouworkest.nl/en/musicians") %>%
html_nodes(".u-padding--b2") %>%
html_text(trim=TRUE) %>%
iconv("UTF-8") %>%
gsub("\\s+", " ", .) %>%
paste(collapse=" ") %>%
findGivenNames() -> rco
saveRDS(rco, file="rco.RDS")
read_html("http://www.philharmonia.spb.ru/en/about/orchestra/zkrasof/contents/") %>%
html_nodes(".td") %>%
html_text(trim=TRUE) %>%
iconv("UTF-8") %>%
gsub("[\r,\n]"," ", .) %>%
gsub("\\s+", " ", .) %>%
.[23] %>%
findGivenNames() -> spb
saveRDS(spb, file="spb.RDS")
read_html("http://ocne.mcu.es/conoce-a-la-ocne/orquesta-nacional-de-espana/componentes/") %>%
html_nodes(".col-main") %>%
html_text(trim=TRUE) %>%
iconv("UTF-8") %>%
gsub("[\r,\n]"," ", .) %>%
gsub("\\s+", " ", .) %>%
gsub("([[:lower:]])([[:upper:]][[:lower:]])", "\\1 \\2", .) %>%
findGivenNames() -> one
saveRDS(one, file="one.RDS")
read_html("http://www.gewandhausorchester.de/en/orchester/") %>%
html_nodes("#content") %>%
html_text(trim=TRUE) %>%
iconv("UTF-8") %>%
gsub("[\r,\n]"," ", .) %>%
gsub("\\s+", " ", .) %>%
findGivenNames() -> leipzig
saveRDS(leipzig, file="leipzig.RDS")
read_html("http://www.wienerphilharmoniker.at/orchestra/members") %>%
html_nodes(".ModSuiteMembersC") %>%
html_text(trim=TRUE) %>%
iconv("UTF-8") %>%
gsub("[\r,\n,\t,*]"," ", .) %>%
gsub("\\s+", " ", .) %>%
gsub("([[:lower:]])([[:upper:]][[:lower:]])", "\\1 \\2", .) %>%
paste(collapse=" ") %>%
.[-18] %>%
findGivenNames() -> wiener
saveRDS(wiener, file="wiener.RDS")
read_html("http://www.laphil.com/philpedia/orchestra-roster") %>%
html_nodes(".view-content") %>%
html_text(trim=TRUE) %>%
iconv("UTF-8") %>%
gsub("\\s+", " ", .) %>%
gsub("(?%
.[1] %>%
findGivenNames() -> laphil
saveRDS(laphil, file="laphil.RDS")
read_html("http://nyphil.org/about-us/meet/musicians-of-the-orchestra") %>%
html_nodes(".resp-tab-content-active") %>%
html_text(trim=TRUE) %>%
iconv("UTF-8") %>%
gsub("[\r,\n]"," ", .) %>%
gsub("\\s+", " ", .) %>%
gsub("(?%
findGivenNames() -> nyphil
saveRDS(nyphil, file="nyphil.RDS")
urls=c("http://lso.co.uk/orchestra/players/strings.html",
"http://lso.co.uk/orchestra/players/woodwind.html",
"http://lso.co.uk/orchestra/players/brass.html",
"http://lso.co.uk/orchestra/players/percussion-harps-and-keyboards.html")
sapply(urls, function(x)
{
read_html(x) %>%
html_nodes(".clearfix") %>%
html_text(trim=TRUE) %>%
iconv("UTF-8") %>%
gsub("[\r,\n,\t,*]"," ", .) %>%
gsub("\\s+", " ", .)
}) %>% paste(., collapse=" ") %>%
findGivenNames() -> lso
saveRDS(lso, file="lso.RDS")
read_html("http://www.osm.ca/en/discover-osm/orchestra/musicians-osm") %>%
html_nodes("#content-column") %>%
html_text(trim=TRUE) %>%
iconv("UTF-8") %>%
gsub("[\r,\n]"," ", .) %>%
gsub("\\s+", " ", .) %>%
findGivenNames() -> osm
saveRDS(osm, file="osm.RDS")
rbind(c("berliner", "Berliner Philharmoniker"),
c("rco", "Royal Concertgebouw Amsterdam"),
c("spb", "St. Petersburg Philharmonic Orchestra"),
c("one", "Orquesta Nacional de España"),
c("leipzig", "Gewandhaus Orchester Leipzig"),
c("wiener", "Wiener Philarmoniker"),
c("laphil", "The Los Angeles Philarmonic"),
c("nyphil", "New York Philarmonic"),
c("lso", "London Symphony Orchestra"),
c("osm", "Orchestre Symphonique de Montreal")) %>% as.data.frame()-> Orchestras
colnames(Orchestras)=c("Id", "Orchestra")
list.files(getwd(),pattern = ".RDS") %>%
lapply(function(x)
readRDS(x) %>% as.data.frame(stringsAsFactors = FALSE) %>% cbind(Id=gsub(".RDS", "", x))
) %>% do.call("rbind", .) -> all
all %>% mutate(probability=as.numeric(probability)) %>%
filter(probability > 0.9 & count > 15) %>%
filter(!name %in% c("viola", "tuba", "harp")) %>%
group_by(Id, gender) %>%
summarize(Total=n())->all
all %>% filter(gender=="female") %>% mutate(females=Total) %>% select(Id, females) -> females
all %>% group_by(Id) %>% summarise(Total=sum(Total)) -> total
inner_join(total, females, by = "Id") %>% mutate(po_females=females/Total) %>%
inner_join(Orchestras, by="Id")-> df
library(ggplot2)
library(scales)
opts=theme(legend.position="none",
plot.background = element_rect(fill="gray85"),
panel.background = element_rect(fill="gray85"),
panel.grid.major.y=element_blank(),
panel.grid.major.x=element_line(colour="white", size=2),
panel.grid.minor=element_blank(),
axis.title = element_blank(),
axis.line.y = element_line(size = 2, color="black"),
axis.text = element_text(colour="black", size=18),
axis.ticks=element_blank(),
plot.title = element_text(size = 35, face="bold", margin=margin(10,0,10,0), hjust=0))
ggplot(df, aes(reorder(Orchestra, po_females), po_females)) +
geom_bar(stat="identity", fill="darkviolet", width=.5)+
scale_y_continuous(labels = percent, expand = c(0, 0), limits=c(0,.52))+
geom_text(aes(label=sprintf("%1.0f%%", 100*po_females)), hjust=-0.05, size=6)+
ggtitle(expression(atop(bold("Women in Orchestras"), atop("% of women among members", "")))) +
coord_flip()+opts