Tag Archives: R

Bye Ripples, Hi Fronkonstin

Nobody knows how to say goodbye, It seems so easy until you try (Nobody Knows, The Lumineers)

I have been user of WordPress.com for (almost) three years and it has been a very good experience. But I want to take more control of my blog: install plugins, manage ads (someone interested out there to publicize in my blog?) and place interactive charts in my posts instead static screenshots. This is why I opened a new blog called Fronkonstin (I love the name, no other reason) where I will continue publishing. I will maintain both blogs during some time (although Ripples will not be updated anymore). If you follow Ripples, please start following Fronkonstin. I promise I will try to do my best to continue making interesting experiments in this new era.



Chaotic Galaxies

Tell me, which side of the earth does this nose come from? Ha! (ALF)

Reading about strange attractors I came across with this book, where I discovered a way to generate two dimensional chaotic maps. The generic equation is pretty simple:

x_{n+1}= a_{1}+a_{2}x_{n}+a_{3}x_{n}^{2}+a_{4}x_{n}y_{n}+a_{5}y_{n}+a_{6}y_{n}^{2}
y_{n+1}= a_{7}+a_{8}x_{n}+a_{9}x_{n}^{2}+a_{10}x_{n}y_{n}+a_{11}y_{n}+a_{12}y_{n}^{2}

I used it to generate these chaotic galaxies:

Changing the vector of parameters you can obtain other galaxies. Do you want to try?

#Generic function
attractor = function(x, y, z)
  c(z[1]+z[2]*x+z[3]*x^2+ z[4]*x*y+ z[5]*y+ z[6]*y^2, 
#Function to iterate the generic function over the initial point c(0,0)
galaxy= function(iter, z)
  for (i in 2:iter) df[i,]=attractor(df[i-1, 1], df[i-1, 2], z)
  df %>% rbind(data.frame(x=runif(iter/10, min(df$x), max(df$x)), 
                          y=runif(iter/10, min(df$y), max(df$y))))-> df
          panel.background = element_rect(fill="#00000c"),
          plot.background = element_rect(fill="#00000c"),
          plot.margin=unit(c(-0.1,-0.1,-0.1,-0.1), "cm"))
#First galaxy
z1=c(1.0, -0.1, -0.2,  1.0,  0.3,  0.6,  0.0,  0.2, -0.6, -0.4, -0.6,  0.6)
galaxy1=galaxy(iter=2400, z=z1) %>% ggplot(aes(x,y))+
  geom_point(shape= 8, size=jitter(12, factor=4), color="#ffff99", alpha=jitter(.05, factor=2))+
  geom_point(shape=16, size= jitter(4, factor=2), color="#ffff99", alpha=jitter(.05, factor=2))+
  geom_point(shape=46, size= 0, color="#ffff00")+opt
#Second galaxy
z2=c(-1.1, -1.0,  0.4, -1.2, -0.7,  0.0, -0.7,  0.9,  0.3,  1.1, -0.2,  0.4)
galaxy2=galaxy(iter=2400, z=z2) %>% ggplot(aes(x,y))+
  geom_point(shape= 8, size=jitter(12, factor=4), color="#ffff99", alpha=jitter(.05, factor=2))+
  geom_point(shape=16, size= jitter(4, factor=2), color="#ffff99", alpha=jitter(.05, factor=2))+
  geom_point(shape=46, size= 0, color="#ffff00")+opt
#Third galaxy
z3=c(-0.3,  0.7,  0.7,  0.6,  0.0, -1.1,  0.2, -0.6, -0.1, -0.1,  0.4, -0.7)
galaxy3=galaxy(iter=2400, z=z3) %>% ggplot(aes(x,y))+
  geom_point(shape= 8, size=jitter(12, factor=4), color="#ffff99", alpha=jitter(.05, factor=2))+
  geom_point(shape=16, size= jitter(4, factor=2), color="#ffff99", alpha=jitter(.05, factor=2))+
  geom_point(shape=46, size= 0, color="#ffff00")+opt
#Fourth galaxy
z4=c(-1.2, -0.6, -0.5,  0.1, -0.7,  0.2, -0.9,  0.9,  0.1, -0.3, -0.9,  0.3)
galaxy4=galaxy(iter=2400, z=z4) %>% ggplot(aes(x,y))+
  geom_point(shape= 8, size=jitter(12, factor=4), color="#ffff99", alpha=jitter(.05, factor=2))+
  geom_point(shape=16, size= jitter(4, factor=2), color="#ffff99", alpha=jitter(.05, factor=2))+
  geom_point(shape=46, size= 0, color="#ffff00")+opt

Gummy Worms

Just keep swimming (Dory in Finding Nemo)

Inspired by this post, I decided to create gummy worms like this:

Or these:

When I was young I used to eat them.

Do you want to try? This is the code:

t=seq(1, 6, by=.04)
f = function(a, b, c, d, e, f, t) exp(-a*t)*sin(t*b+c)+exp(-d*t)*sin(t*e+f)
v2=runif(6, 2, 3)
spheres3d(x=f(v1[1], v2[1], v3[1], v1[4], v2[4], v3[4], t),
          y=f(v1[2], v2[2], v3[2], v1[5], v2[5], v3[5], t),
          z=f(v1[3], v2[3], v3[3], v1[6], v2[6], v3[6], t),
          radius=.3, color=sample(brewer.pal(8, "Dark2"),1))

Visualizing the Gender of US Senators With R and Highmaps

I wake up every morning in a house that was built by slaves (Michelle Obama)

Some days ago I was invited by the people of Highcharts to write a post in their blog. What I have done is a simple but revealing map of women senators of the United States of America. Briefly, this is what I’ve done to generate it:

  • read from the US senate website a XML file with senators info
  • clean and obtain gender of senators from their first names
  • summarize results by state
  • join data with a US geojson dataset to create the highmap

You can find details and R code here.

It is easy creating a highcharts using highcharter, an amazing library as genderizeR, the one I use to obtain gender names. I like them a lot.

Visualizing Stirling’s Approximation With Highcharts

I said, “Wait a minute, Chester, you know I’m a peaceful man”, He said, “That’s okay, boy, won’t you feed him when you can” (The Weight, The Band)

It is quite easy to calculate the probability of obtaining the same number of heads and tails when tossing a coin N times, and N is even. There are 2^{N} possible outcomes and only C_{N/2}^{N} are favorable so the exact probability is the quotient of these numbers (# of favorable divided by # of possible).

There is another way to approximate this number incredibly well: to use the Stirling’s formula, which is 1/\sqrt{\pi\cdot N/2}

This plot represents both calculations for N from 2 to 200:


Although for small values of N, Stirling’s approximation tends to overestimate probability …

Stirling 2

… is extremely precise as N becomes bigger:

Stirling 3

James Stirling published this amazing formula in 1730. It simplifies the calculus to the extreme and also gives a quick way to obtain the answer to a very interesting question: How many tosses are needed to be sure that the probability of obtaining the same number of heads and tails is under any given threshold? Just solve the formula for N and you will obtain the answer. And, also, the formula is another example of the presence of pi in the most unexpected places, as happens here.

Just another thing: the more I use highcharter package the more I like it.

This is the code:

data.frame(N=seq(from=2, by=2, length.out = 100)) %>%
  mutate(Exact=choose(N,N/2)/2**N, Stirling=1/sqrt(pi*N/2))->data
hc <- highchart() %>% 
  hc_title(text = "Stirling's Approximation") %>% 
  hc_subtitle(text = "How likely is getting 50% heads and 50% tails tossing a coin N times?") %>% 
  hc_xAxis(title = list(text = "N: Number of tosses"), categories = data$N) %>% 
  hc_yAxis(title = list(text = "Probability"), labels = list(format = "{value}%", useHTML = TRUE)) %>% 
  hc_add_series(name = "Stirling", data = data$Stirling*100,  marker = list(enabled = FALSE), color="blue") %>% 
  hc_add_series(name = "Exact", data = data$Exact*100,  marker = list(enabled = FALSE), color="lightblue") %>% 
  hc_tooltip(formatter = JS("function(){return ('<b>Number of tosses: </b>'+this.x+'<br><b>Probability: </b>'+Highcharts.numberFormat(this.y, 2)+'%')}")) %>%
  hc_exporting(enabled = TRUE) %>%
  hc_chart(zoomType = "xy")

Amazing Things That Happen When You Toss a Coin 12 Times

If there is a God, he’s a great mathematician (Paul Dirac)

Imagine you toss a coin 12 times and you count how many heads and tails you are obtaining after each throwing (the coin is equilibrated so the probability of head or tail is the same). At some point, it can happen that number of heads and number of tails are the same. For example, if you obtain the sequence T-H-T-T-H-T-H-H-T-T-H-H, after the second throwing, number of heads is equal to number of tails (and both equal to one). It happens again after the 8th throwing and after last one. In this example, the last throwing where equallity occurs is the number 12. Obviously, equallity can only be observed in even throwings.

If you repeat the experiment 10.000 times you will find something like this if you draw the relative frequency of the last throwing where cumulated number of heads is equal to the one of tails:

From my point of view there are three amazing things in this plot:

  1. It is symmetrical, so prob(n)=prob(12-n)
  2. The least likely throwing to obtain the last equality is the central one.
  3. As a corollary, the most likely is not obtaining any equality (number of heads never are the same than number of tails) or obtaining last equality in the last throwing: two extremely different scenarios with the same chances to be observed.

Behind the simplicity of tossing coins there is a beautiful universe of mathematical surprises.

results=data.frame(nmax=numeric(0), count=numeric(0), iter=numeric(0))
for (j in 1:iter)
data.frame(x=sample(c(-1,1), size=tosses, replace=TRUE)) %>%
add_rownames(var = "n") %>%
mutate(cumsum = cumsum(x)) %>% filter(cumsum==0) %>%
summarize(nmax=max(as.numeric(n))) %>% rbind(tmp)->tmp
tmp %>%
group_by(nmax) %>%
summarize(count=n()) %>%
mutate(nmax=ifelse(is.finite(nmax), nmax, 0), iter=iter) %>%
panel.background = element_rect(fill="darkolivegreen1"),
panel.border = element_rect(colour="black", fill=NA),
axis.line = element_line(size = 0.5, colour = "black"),
axis.ticks = element_line(colour="black"),
panel.grid.major = element_line(colour="white", linetype = 1),
panel.grid.minor = element_blank(),
axis.text.y = element_text(colour="black"),
axis.text.x = element_text(colour="black"),
text = element_text(size=20),
legend.key = element_blank(),
plot.title = element_text(size = 30)
ggplot(results, aes(x=nmax, y=count/iter)) +
geom_line(size=2, color="green4")+
geom_point(size=8, fill="green4", colour="darkolivegreen1",pch=21)+
scale_x_continuous(breaks = seq(0, tosses, by=2))+
scale_y_continuous(labels=percent, limits=c(0, .25))+
labs(title="What happens when you toss a coin 12 times?",
x="Last throwing where cumulated #tails = #heads",
y="Probability (estimated)")+opts

Women in Orchestras

I believe in the truth of fairy-tales more than I believe in the truth in the newspaper (Lotte Reiniger)

In my opinion, this graph is a visual demonstration that we live in a male chauvinist world.


In this experiment I download the members of ten top orchestras of the world with the amazing rvest package. After cleaning texts, I obtain the gender of names with genderizeR package as I did here. Since I only take into account names genderized with high probability, these numbers cannot be exact. Apart of this, the plot speaks by itself.

read_html("http://www.berliner-philharmoniker.de/en/orchestra/") %>%
html_nodes(".name") %>%
html_text(trim=TRUE) %>%
iconv("UTF-8") %>%
gsub("[\r,\n]"," ", .) %>%
gsub("\\s+", " ", .) %>%
paste(collapse=" ") %>%
findGivenNames() -> berliner
saveRDS(berliner, file="berliner.RDS")
read_html("https://www.concertgebouworkest.nl/en/musicians") %>%
html_nodes(".u-padding--b2") %>%
html_text(trim=TRUE) %>%
iconv("UTF-8") %>%
gsub("\\s+", " ", .) %>%
paste(collapse=" ") %>%
findGivenNames() -> rco
saveRDS(rco, file="rco.RDS")
read_html("http://www.philharmonia.spb.ru/en/about/orchestra/zkrasof/contents/") %>%
html_nodes(".td") %>%
html_text(trim=TRUE) %>%
iconv("UTF-8") %>%
gsub("[\r,\n]"," ", .) %>%
gsub("\\s+", " ", .) %>%
.[23] %>%
findGivenNames() -> spb
saveRDS(spb, file="spb.RDS")
read_html("http://ocne.mcu.es/conoce-a-la-ocne/orquesta-nacional-de-espana/componentes/") %>%
html_nodes(".col-main") %>%
html_text(trim=TRUE) %>%
iconv("UTF-8") %>%
gsub("[\r,\n]"," ", .) %>%
gsub("\\s+", " ", .) %>%
gsub("([[:lower:]])([[:upper:]][[:lower:]])", "\\1 \\2", .) %>%
findGivenNames() -> one
saveRDS(one, file="one.RDS")
read_html("http://www.gewandhausorchester.de/en/orchester/") %>%
html_nodes("#content") %>%
html_text(trim=TRUE) %>%
iconv("UTF-8") %>%
gsub("[\r,\n]"," ", .) %>%
gsub("\\s+", " ", .) %>%
findGivenNames() -> leipzig
saveRDS(leipzig, file="leipzig.RDS")
read_html("http://www.wienerphilharmoniker.at/orchestra/members") %>%
html_nodes(".ModSuiteMembersC") %>%
html_text(trim=TRUE) %>%
iconv("UTF-8") %>%
gsub("[\r,\n,\t,*]"," ", .) %>%
gsub("\\s+", " ", .) %>%
gsub("([[:lower:]])([[:upper:]][[:lower:]])", "\\1 \\2", .) %>%
paste(collapse=" ") %>%
.[-18] %>%
findGivenNames() -> wiener
saveRDS(wiener, file="wiener.RDS")
read_html("http://www.laphil.com/philpedia/orchestra-roster") %>%
html_nodes(".view-content") %>%
html_text(trim=TRUE) %>%
iconv("UTF-8") %>%
gsub("\\s+", " ", .) %>%
.[1] %>%
findGivenNames() -> laphil
saveRDS(laphil, file="laphil.RDS")
read_html("http://nyphil.org/about-us/meet/musicians-of-the-orchestra") %>%
html_nodes(".resp-tab-content-active") %>%
html_text(trim=TRUE) %>%
iconv("UTF-8") %>%
gsub("[\r,\n]"," ", .) %>%
gsub("\\s+", " ", .) %>%
findGivenNames() -> nyphil
saveRDS(nyphil, file="nyphil.RDS")
sapply(urls, function(x)
read_html(x) %>%
html_nodes(".clearfix") %>%
html_text(trim=TRUE) %>%
iconv("UTF-8") %>%
gsub("[\r,\n,\t,*]"," ", .) %>%
gsub("\\s+", " ", .)
}) %>% paste(., collapse=" ") %>%
findGivenNames() -> lso
saveRDS(lso, file="lso.RDS")
read_html("http://www.osm.ca/en/discover-osm/orchestra/musicians-osm") %>%
html_nodes("#content-column") %>%
html_text(trim=TRUE) %>%
iconv("UTF-8") %>%
gsub("[\r,\n]"," ", .) %>%
gsub("\\s+", " ", .) %>%
findGivenNames() -> osm
saveRDS(osm, file="osm.RDS")
rbind(c("berliner", "Berliner Philharmoniker"),
c("rco", "Royal Concertgebouw Amsterdam"),
c("spb", "St. Petersburg Philharmonic Orchestra"),
c("one", "Orquesta Nacional de España"),
c("leipzig", "Gewandhaus Orchester Leipzig"),
c("wiener", "Wiener Philarmoniker"),
c("laphil", "The Los Angeles Philarmonic"),
c("nyphil", "New York Philarmonic"),
c("lso", "London Symphony Orchestra"),
c("osm", "Orchestre Symphonique de Montreal")) %>% as.data.frame()-> Orchestras
colnames(Orchestras)=c("Id", "Orchestra")
list.files(getwd(),pattern = ".RDS") %>%
readRDS(x) %>% as.data.frame(stringsAsFactors = FALSE) %>% cbind(Id=gsub(".RDS", "", x))
) %>% do.call("rbind", .) -> all
all %>% mutate(probability=as.numeric(probability)) %>%
filter(probability > 0.9 & count > 15) %>%
filter(!name %in% c("viola", "tuba", "harp")) %>%
group_by(Id, gender) %>%
all %>% filter(gender=="female") %>% mutate(females=Total) %>% select(Id, females) -> females
all %>% group_by(Id) %>% summarise(Total=sum(Total)) -> total
inner_join(total, females, by = "Id") %>% mutate(po_females=females/Total) %>%
inner_join(Orchestras, by="Id")-> df
plot.background = element_rect(fill="gray85"),
panel.background = element_rect(fill="gray85"),
panel.grid.major.x=element_line(colour="white", size=2),
axis.title = element_blank(),
axis.line.y = element_line(size = 2, color="black"),
axis.text = element_text(colour="black", size=18),
plot.title = element_text(size = 35, face="bold", margin=margin(10,0,10,0), hjust=0))
ggplot(df, aes(reorder(Orchestra, po_females), po_females)) +
geom_bar(stat="identity", fill="darkviolet", width=.5)+
scale_y_continuous(labels = percent, expand = c(0, 0), limits=c(0,.52))+
geom_text(aes(label=sprintf("%1.0f%%", 100*po_females)), hjust=-0.05, size=6)+
ggtitle(expression(atop(bold("Women in Orchestras"), atop("% of women among members", "")))) +

Playing With Julia (Set)

Viento, me pongo en movimiento y hago crecer las olas del mar que tienes dentro (Tercer Movimiento: Lo de Dentro, Extremoduro)

I really enjoy drawing complex numbers: it is a huge source of entertainment for me. In this experiment I play with the Julia Set, another beautiful fractal like this one. This is what I have done:

  • Choosing the function f(z)=exp(z3)-0.621
  • Generating a grid of complex numbers with both real and imaginary parts in [-2, 2]
  • Iterating f(z) over the grid a number of times so zn+1 = f(zn)
  • Drawing the resulting grid as I did here
  • Gathering all plots into a GIF with ImageMagick as I did in my previous post: each frame corresponds to a different number of iterations

This is the result:


I love how easy is doing difficult things in R. You can play with the code changing f(z) as well as color palettes. Be ready to get surprised:

f = function(z,c) exp(z^3)+c
# Grid of complex
z0 <- outer(seq(-2, 2, length.out = 1200),1i*seq(-2, 2, length.out = 1200),'+') %>% c()
opt <-  theme(legend.position="none",
              panel.background = element_rect(fill="white"),
              plot.margin=grid::unit(c(1,1,0,0), "mm"),
for (i in 1:35)
  # i iterations of f(z)
  for (k in 1:i) z <- f(z, c=-0.621)
                z=as.vector(exp(-Mod(z)))) %>% na.omit() 
  p=ggplot(df, aes(x=x, y=y, color=z)) + 
    geom_tile() + 
    scale_colour_gradientn(colours=brewer.pal(8, "Paired")) + opt
  ggsave(plot=p, file=paste0("plot", stringr::str_pad(i, 4, pad = "0"),".png"), width = 1.2, height = 1.2)
# Place the exact path where ImageMagick is installed
system('"C:\\Program Files\\ImageMagick-6.9.3-Q16\\convert.exe" -delay 20 -loop 0 *.png julia.gif')
# cleaning up


You don’t have to be beautiful to turn me on (Kiss, Prince)

I discovered recently how easy is to create GIFs with R using ImageMagick and I feel like a kid with a new toy. To begin this new era of my life as R programmer I have done this:

First of all, read this article: it explains very well how to start doing GIFs from scratch. The one I have done is inspired in this previous post where I take a set of complex numbers to transform and color it using HSV technique. In this case I use this next transformation: f(z)= -Im(z)+(Re(z)+0.5*Im(z))*1i

Modifying the range of Real and Imaginary parts of complex numbers I obtain the zooming  effect. The code is very simple. Play with it changing the transformation or the animation options. Send me your creations, I would love to see them:

id=1 # label tO name plots
for (i in seq(from=320, to=20, length.out = 38)){
z=outer(seq(from = -i, to = i, length.out = 300),1i*seq(from = -i, to = i, length.out = 500),'+') %>% c()
for (k in 1:100) z <- -Im(z)+(Re(z)+0.5*Im(z))*1i
v=(1+cos(2*pi*log(1+Mod(z))))/2) %>% mutate(col=hsv(h,s,v))
ggplot(df, aes(x, y)) +
labs(x=NULL, y=NULL)+
panel.background = element_rect(fill="white"),
plot.margin=grid::unit(c(1,1,0,0), "mm"),
ggsave(file=paste0("plot",stringr::str_pad(id, 4, pad = "0"),".png"), width = 1, height = 1)
system('"C:\\Program Files\\ImageMagick-6.9.3-Q16\\convert.exe" -delay 10 -loop 0 -duplicate 1,-2-1 *.png zooming.gif')
# cleaning up