The World We Live In #2: To Study Or To Work

I was getting ready for school and about to wear my uniform when I remembered that our principal had told us not to wear uniforms. So I decided to wear my favorite pink dress (Malala Yousafzai)

After reading the diary of a Pakistani schoolgirl and Malala’s history, there is no doubt of being in front of a brave girl. A girl that will fight against monsters who deprive children of their childhood. A girl who knows that one book, one pen, one child and one teacher can change this unfair world. A girl who knew she had won the Nobel Prize of Peace in her chemistry lesson and finished the school time before making her first statement. A girl for whom the prize is just the beginning: a girl that gives us hope. Long live Malala:
TWWLI2
To know where to obtain data for this plot, check out this post. This is the code:

require("sqldf")
require("plyr")
require("stringdist")
childlabour=read.csv("UNdata_Export_20141013_ChildLabour.csv", nrows=335, header=T, row.names=NULL)
education=read.csv("UNdata_Export_20141013_Education.csv", nrows=2994, header=T, row.names=NULL)
population =read.csv("UNdata_Export_20140930_Population.csv",  nrows=12846, header=T, row.names=NULL)
population=rename(population, replace = c("Country.or.Area" = "Country"))
education=rename(education, replace = c("Reference.Area" = "Country"))
education=rename(education, replace = c("Time.Period" = "Year"))
childlabour=rename(childlabour, replace = c("Country.or.Area" = "Country"))
population=sqldf("SELECT a.Country, a.Year, a.Value as Pop
FROM population a INNER JOIN (SELECT Country, MAX(Year) AS Year FROM population GROUP BY 1) b
ON (a.Country=b.Country AND a.Year=b.Year)
WHERE (a.Country NOT LIKE '%INCOME%')
AND (a.Country NOT LIKE '%WORLD%')
AND (a.Country NOT LIKE '%developing%')
AND (a.Country NOT LIKE '%OECD%')
AND (a.Country NOT LIKE '%countries%')
AND (a.Country NOT LIKE '%South Asia%')
AND (a.Country NOT LIKE '%Small states%')
AND (a.Country NOT LIKE '%Euro area%')
AND (a.Country NOT LIKE '%European Union%')
AND (a.Country NOT LIKE '%North America%')")
childlabour=sqldf("SELECT * FROM childlabour WHERE Subgroup='Total 5-14 yr'")
education=sqldf("SELECT a.* FROM education a INNER JOIN (SELECT Country, MAX(Year) AS Year FROM education GROUP BY 1) b
ON (a.Country=b.Country AND a.Year=b.Year)")
data=sqldf("SELECT a.Country, a.Pop, b.Value as ChildLabour, c.Observation_Value as Education
FROM
population a INNER JOIN childlabour b
ON (a.Country=b.Country) INNER JOIN education c
ON (a.Country=c.Country)")
require(ggplot2)
require(scales)
opts=theme(
panel.background = element_rect(fill="gray98"),
panel.border = element_rect(colour="black", fill=NA),
axis.line = element_line(size = 0.5, colour = "black"),
axis.ticks = element_line(colour="black"),
panel.grid.major = element_line(colour="gray75", linetype = 2),
panel.grid.minor = element_blank(),
axis.text.y = element_text(colour="gray25", size=15),
axis.text.x = element_text(colour="gray25", size=15),
text = element_text(size=20),
legend.key = element_blank(),
legend.position = "none",
legend.background = element_blank(),
plot.title = element_text(size = 45)
)
ggplot(data, aes(x=ChildLabour/100, y=Education/100, size=log(Pop), label=Country), guide=FALSE)+
geom_point(colour="white", fill="red", shape=21, alpha=.55)+
scale_size_continuous(range=c(2,40))+
scale_x_continuous(limits=c(0,.5), labels = percent)+
scale_y_continuous(limits=c(0,.12), labels = percent)+
labs(title="The World We Live In #2: To Study Or To Work",
x="% of Child Workers between 5-14 years old",
y="Public Expenditure on Education as % of GNI")+
geom_text(data=subset(data, ChildLabour/100>.3 | Education/100>.07| Education/10<.022), size=5.5, colour="gray25", hjust=0, vjust=0)+
geom_text(aes(.2, .0), colour="gray25", hjust=0, label="Countries of the world (Source: United Nations Statistics Division) Size of bubble depending on population", size=5)+
opts

Beautiful Curves: The Harmonograph

Each of us has their own mappa mundi (Gala, my indispensable friend)

The harmonograph is a mechanism which, by means of several pendulums, draws trajectories that can be analyzed not only from a mathematical point of view but also from an artistic one. In its double pendulum version, one pendulum moves a pencil and the other one moves a platform with a piece of paper on it. You can see an example here. The harmonograph is easy to use: you only have to put pendulums into motion and wait for them to stop. The result are amazing undulating drawings like this one:

grafico0x2First harmonographs were built in 1857 by Scottish mathematician Hugh Blackburn, based on the previous work of French mathematician Jean Antoine Lissajous. There is not an unique way to describe mathematically the motion of the pencil. I have implemented the one I found in this sensational blog, where motion in both x and y axis depending on time is defined by:

x(t)=e-d1tsin(tf1+p1)+e-d2tsin(tf2+p2)
y(t)=e-d3tsin(tf3+p3)+e-d4tsin(tf4+p4)

I initialize parameters randomly so every time you run the script, you obtain a different output. Here is a mosaic with some of mine:
Collage3

This is the code to simulate the harmonograph (no extra package is required). If you create some nice work of art, I will be very happy to admire it (you can find my email here):

f1=jitter(sample(c(2,3),1));f2=jitter(sample(c(2,3),1));f3=jitter(sample(c(2,3),1));f4=jitter(sample(c(2,3),1))
d1=runif(1,0,1e-02);d2=runif(1,0,1e-02);d3=runif(1,0,1e-02);d4=runif(1,0,1e-02)
p1=runif(1,0,pi);p2=runif(1,0,pi);p3=runif(1,0,pi);p4=runif(1,0,pi)
xt = function(t) exp(-d1*t)*sin(t*f1+p1)+exp(-d2*t)*sin(t*f2+p2)
yt = function(t) exp(-d3*t)*sin(t*f3+p3)+exp(-d4*t)*sin(t*f4+p4)
t=seq(1, 100, by=.001)
dat=data.frame(t=t, x=xt(t), y=yt(t))
with(dat, plot(x,y, type="l", xlim =c(-2,2), ylim =c(-2,2), xlab = "", ylab = "", xaxt='n', yaxt='n'))

The World We Live In #1: Obesity And Cells

Lesson learned, and the wheels keep turning (The Killers – The world we live in)

I discovered this site with a huge amount of data waiting to be analyzed. The first thing I’ve done is this simple graph, where you can see relationship between cellular subscribers and obese people. Bubbles are countries and its size depends on the population:
TWWLI1
Some quick conclusions:

  • The more cellular subscribers, the more obese people
  • Pacific islands such as Kiribati, Palau and Tonga are plenty of happy people
  • Singapore people are thinner than they should be
  • How do Saudi Arabian and Panamanian manage two cellulars?

This is the world we live in.

cellular  =read.csv("UNdata_Export_20140930_cellular.csv",   nrows=193,   header=T, row.names=NULL)
obese     =read.csv("UNdata_Export_20140930_obese.csv",      nrows=567,   header=T, row.names=NULL)
population=read.csv("UNdata_Export_20140930_population.csv", nrows=12846, header=T, row.names=NULL)
require("sqldf")
require("plyr")
population=rename(population, replace = c("Country.or.Area" = "Country"))
population=sqldf("SELECT a.Country, a.Year, a.Value as Population
FROM population a INNER JOIN (SELECT Country, MAX(Year) AS Year FROM population GROUP BY 1) b
      ON (a.Country=b.Country AND a.Year=b.Year)")
cellular=rename(cellular, replace = c("Country.or.Area" = "Country"))
cellular=rename(cellular, replace = c("Value" = "Cellular"))
obese=rename(obese, replace = c("Country.or.Area" = "Country"))
obese=rename(obese, replace = c("Year.s." = "Year"))
obese=sqldf("SELECT a.Country, a.Year, SUBSTR(TRIM(Value), 1, CHARINDEX(' [', TRIM(Value))) as Obeses
FROM obese a INNER JOIN (SELECT Country, MAX(Year) AS Year FROM obese WHERE GENDER='Both sexes' GROUP BY 1) b
ON (a.Country=b.Country AND a.Year=b.Year AND a.GENDER='Both sexes')")
obese$Obeses=as.numeric(obese$Obeses)
data=sqldf("SELECT a.Country, a.Cellular, c.Obeses, b.Population FROM cellular a inner join population b on a.Country = b.Country
      inner join obese c on (a.Country = c.Country) WHERE a.Country NOT IN ('World', 'South Asia')")
require(ggplot2)
require(scales)
opts=theme(
  panel.background = element_rect(fill="gray98"),
  panel.border = element_rect(colour="black", fill=NA),
  axis.line = element_line(size = 0.5, colour = "black"),
  axis.ticks = element_line(colour="black"),
  panel.grid.major = element_line(colour="gray75", linetype = 2),
  panel.grid.minor = element_blank(),
  axis.text.y = element_text(colour="gray25", size=15),
  axis.text.x = element_text(colour="gray25", size=15),
  text = element_text(size=20),
  legend.key = element_blank(),
  legend.position = "none",
  legend.background = element_blank(),
  plot.title = element_text(size = 45)
    )
ggplot(data, aes(x=Cellular/100, y=Obeses/100, size=Population, label=Country), guide=FALSE)+
  geom_point(colour="white", fill="red", shape=21, alpha=.65)+
  scale_size_continuous(range=c(3,35))+
  scale_x_continuous(limits=c(0,2.1), labels = percent)+
  scale_y_continuous(limits=c(0,.6), labels = percent)+
  labs(title="The World We Live In #1: Obesity And Cells",
       x="Cellular Subscribers (per 100 population)",
       y="Adults aged >= 20 years who are obese (%)")+
  geom_text(data=subset(data, Cellular/100 > 1.9 | Obeses/100 > .4 | (Cellular/100 > 1.4 & Obeses/100 < .15)), size=5, colour="gray25", hjust=0, vjust=0)+
  geom_text(aes(.9, .0), colour="blue", hjust=0, label="World's Countries (Source: United Nations Statistics Division. Size of bubble depending on population", size=4)+
  opts

Complex Domain Coloring

Why don’t you stop doodling and start writing serious posts in your blog? (Cecilia, my beautiful wife)

Choose a function, apply it to a set of complex numbers, paint  the result using the HSV technique and be ready to be impressed because images can be absolutely amazing. You only need ggplot2 package and your imagination. This is what happens when function is f(x)=(1+i)log(sin((x3-1)/x)):
ComplexDomainColoring

To learn more about complex domain coloring, you can go here. If you want to try your own functions, you can find the code below. I will try to write a serious post next time but meanwhile, long live doodles!

require(ggplot2)
f = function(x) (1+1i)*log(sin((x^3-1)/x))
z=as.vector(outer(seq(-5, 5, by =.01),1i*seq(-5, 5, by =.01),'+'))
z=data.frame(x=Re(z),
y=Im(z),
h=(Arg(f(z))<0)*1+Arg(f(z))/(2*pi),
s=(1+sin(2*pi*log(1+Mod(f(z)))))/2,
v=(1+cos(2*pi*log(1+Mod(f(z)))))/2)
z=z[is.finite(apply(z,1,sum)),]
opt=theme(legend.position="none",
panel.background = element_blank(),
panel.grid = element_blank(),
axis.ticks=element_blank(),
axis.title=element_blank(),
axis.text =element_blank())
ggplot(data=z, aes(x=x, y=y)) + geom_tile(fill=hsv(z$h,z$s,z$v))+ opt

PageRank For SQL Lovers

If you’re changing the world, you’re working on important things. You’re excited to get up in the morning (Larry Page, CEO and Co-Founder of Google)

This is my particular tribute to one of the most important, influential and life-changer R packages I have discovered in the last times: sqldf package.

Because of my job, transforming data through SQL queries is very natural for me. This, together with the power of R made this package indispensable for me since I knew of its existence.

Imagine you have a directed graph like this:PR1

Given a vertex V, these are the steps to calculate its PageRank, lets call it PR(V):

  • Initialize PR(V) to some value (I do it to 1 in my script)
  • Iterate this formula until converges: PR(V)=(1-d)+d*(PR(T1)/C(T1)+ ... +PR(Tn)/C(Tn)) where Ti are the vertex that point to V and C(Ti) is the number of edges going out of Ti

After doing this, result is:

PR2

Following you can find my code to do it with sqldf, which is quite simple from my point of view. I am pretty sure there must be some package which calculates PageRank but the main goal of this post is to show how easy is to calculate it with two simple queries, no more. The example is taken from here, where you can find a good explanation of how PageRank works:

require(sqldf)
require(igraph)
net=data.frame(origin=c("A","A","B","C","D"), end=c("C","B","C","A","C"))
par(family="serif", cex=1, ps=25, bg="white", col.lab="black", col.axis="black")
plot(graph.edgelist(as.matrix(net)), edge.arrow.size=1, vertex.color="gray90", edge.color="black")
#Initialization
netou=sqldf("SELECT origin, COUNT(*) outs FROM net GROUP BY 1")
netpr=sqldf("SELECT origin vertex, 1.0 pagerank FROM net UNION SELECT end, 1.0 FROM net")
for (i in 1:50)
{
netx1=sqldf("SELECT vertex, pagerank/outs factor FROM netou a INNER JOIN netpr b ON (a.origin = b.vertex)")
netpr=sqldf("SELECT a.vertex, 0.15+SUM(0.85*COALESCE(factor,0)) AS pagerank
FROM netpr a LEFT OUTER JOIN net b ON (a.vertex = b.end) LEFT OUTER JOIN netx1 c
ON (b.origin = c.vertex) GROUP BY 1")
}
g=graph.edgelist(as.matrix(net))
names=data.frame(vertex=V(g)$name)
V(g)$name=sqldf("SELECT a.vertex||' (PR='||ROUND(b.pagerank,2)||')' as name from names a inner join netpr b ON (a.vertex=b.vertex)")$name
plot(g, edge.arrow.size=1, vertex.color="gray90", edge.color="black")

Space Invaders

I burned through all of my extra lives in a matter of minutes, and my two least-favorite words appeared on the screen: GAME OVER (Ernest Cline, Ready Player One)

Inspired by the book I read this summer and by this previous post, I decided to draw these aliens:

Invader1 Invader3
Invader4Invader2

Do not miss to check this indispensable document to choose your favorite colors:

require("ggplot2")
require("reshape")
mars1=matrix(c(0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,1,0,0,0,0,1,0,0,0,
0,0,0,0,1,0,0,1,0,0,0,0,
0,0,0,1,1,1,1,1,1,0,0,0,
0,0,1,1,0,1,1,0,1,1,0,0,
0,1,1,1,1,1,1,1,1,1,1,0,
0,1,1,1,1,1,1,1,1,1,1,0,
0,1,1,1,1,1,1,1,1,1,1,0,
0,1,1,1,1,1,1,1,1,1,1,0,
0,1,0,1,0,0,0,0,1,0,1,0,
0,1,0,0,1,0,0,1,0,0,1,0,
0,0,0,0,0,0,0,0,0,0,0,0), nrow=12, byrow = TRUE)
mars2=matrix(c(0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,1,1,0,0,0,0,0,
0,0,0,0,1,1,1,1,0,0,0,0,
0,0,0,1,1,1,1,1,1,0,0,0,
0,0,1,1,0,1,1,0,1,1,0,0,
0,1,1,1,1,1,1,1,1,1,1,0,
0,1,1,1,1,1,1,1,1,1,1,0,
0,0,0,0,1,0,0,1,0,0,0,0,
0,0,0,1,0,1,1,0,1,0,0,0,
0,0,1,0,1,0,0,1,0,1,0,0,
0,1,0,1,0,0,0,0,1,0,1,0,
0,0,0,0,0,0,0,0,0,0,0,0), nrow=12, byrow = TRUE)
mars3=matrix(c(0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,1,0,0,1,0,0,0,0,
0,0,0,1,0,0,0,0,1,0,0,0,
0,0,0,1,1,1,1,1,1,0,0,0,
0,0,1,1,0,1,1,0,1,1,0,0,
0,1,1,1,1,1,1,1,1,1,1,0,
0,1,1,1,1,1,1,1,1,1,1,0,
0,1,1,1,1,1,1,1,1,1,1,0,
0,1,0,1,0,0,0,0,1,0,1,0,
0,1,0,0,1,1,1,1,0,0,1,0,
0,0,0,0,1,0,0,1,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0), nrow=12, byrow = TRUE)
mars4=matrix(c(0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,1,1,0,0,0,0,0,
0,0,0,0,1,1,1,1,0,0,0,0,
0,0,0,1,1,1,1,1,1,0,0,0,
0,0,1,1,0,1,1,0,1,1,0,0,
0,1,1,1,1,1,1,1,1,1,1,0,
0,1,1,1,1,1,1,1,1,1,1,0,
0,1,0,0,1,0,0,1,0,0,1,0,
0,0,0,1,0,0,0,0,1,0,0,0,
0,0,0,0,1,0,0,1,0,0,0,0,
0,0,0,1,0,0,0,0,1,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0), nrow=12, byrow = TRUE)
opt=theme(legend.position="none",
panel.background = element_blank(),
panel.grid = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank(),
axis.text = element_blank())
p1=ggplot(melt(mars1), aes(x=X2, y=X1))+geom_tile(aes(fill=jitter(value, amount=.1)), colour="gray65", lwd=.025)+
scale_fill_gradientn(colours = c("chartreuse", "navy"))+scale_y_reverse()+opt
p2=ggplot(melt(mars2), aes(x=X2, y=X1))+geom_tile(aes(fill=jitter(value, amount=.1)), colour="gray65", lwd=.025)+
scale_fill_gradientn(colours = c("olivedrab1", "magenta4"))+scale_y_reverse()+opt
p3=ggplot(melt(mars3), aes(x=X2, y=X1))+geom_tile(aes(fill=jitter(value, amount=.1)), colour="gray65", lwd=.025)+
scale_fill_gradientn(colours = c("violetred4", "yellow"))+scale_y_reverse()+opt
p4=ggplot(melt(mars4), aes(x=X2, y=X1))+geom_tile(aes(fill=jitter(value, amount=.1)), colour="gray65", lwd=.025)+
scale_fill_gradientn(colours = c("tomato4", "lawngreen"))+scale_y_reverse()+opt

Princess Jasmine’s Trick

I’m history! No, I’m mythology! Nah, I don’t care what I am; I’m free hee! (Genie, when he is released from the magical oil lamp by Aladdin)

Princess JasmineA long time ago, in a kingdom far away, lived a beautiful princess named Jasmine. There also lived a very rich and evil wizard named Jafar, who was in love with the princess. In order to married with Jasmine, Jafar bought  her father’s will with treasures, but the princess was harder to convince. One day Jafar told the princess: Request me whatever you want and if  I am able to bring it to you, you will become my wife. The princess, tired of the insistence of Jafar, answered: I only want a gold chain, but I want you to give it to me as follows: the first day I should have just one link of the chain. The second day I should have two links. The third day, three … and so on. When you give me all the links of the chain I will marry you. Jafar, intrigued, asked: But how many links should have the chain?  And Jasmine replied: I want you to give me the longest chain that allows you to pay me breaking only 30 links. Jafar began to laugh out loud as he walked away and said to the princess: Tomorrow I’ll bring you such chain!. But as he went to his palace, his happiness turned into anger: he realized that there was not enough gold in the world to build the chain that asked Jasmine.

This is my own version of one of my favorite anti-common-sense mathematical curiosities. To explain it, let me start with an example. Imagine a simple chain with 7 links. If you open the 3rd link, the you split the chain into 3 pieces: a single link (the one you opened), a piece of 2 links and another one of 4 links. You could pay to Jasmine during seven days combining these 3 pieces:

  • Day 1: Give her the single link
  • Day 2: Give her the 2-links piece and take the single link, leaving her with 2 links
  • Day 3: Give her the single link again, leaving her with 3 links
  • Day 4: Give her the 4-links piece and take all pieces she has, leaving her with 4 links
  • Day 5: Give her the single link again, leaving her with 5 links
  • Day 6: Give her the 2-links piece and take 2-links piece, leaving her with 6 links
  • Day 7: Give her the single link piece, leaving her with all links

Is easy to see that having a chain with 63 links, you could pay Jasmine breaking only 3 links (positions 5th, 14th and 31st). It easy to prove that the length of the biggest chain you can manage breaking only n links is (2n+1-1)*(n+1)+n

Next plot represents the minimum number of breaks to pay Jasmine daily for a given chain’s length. I call it the Jasmine’s Staircase:

Generated With R #rstats

Some curiosities around chains:

  • Jasmine asked Jafar a chain of 66.571.993.087 links
  • Supposing one link weights 4 grams, the chain of Jasmine would weight around 266 tons. It is supposed to be around 171 tons of gold in the world
  • If you spend 1 second to climb the first step of the staircase, you will spend 302 years to climb the step number 100

Jafar was right. Jasmine was clever:

library(sqldf)
library(ggplot2)
library(extrafont)
max.breaks=5
CalculateLength = function(n) {n+sum(sapply(0:n, function(x) 2^x*(n+1)))}
results=data.frame(breaks=1:max.breaks, length=sapply(1:max.breaks, CalculateLength))
links=data.frame(links=2:CalculateLength(max.breaks))
results=sqldf("SELECT links.links, min(results.breaks) as minbreaks FROM links, results WHERE links.links <= results.length GROUP BY 1")
opts=theme(
panel.background = element_rect(fill="mistyrose"),
panel.border = element_rect(colour="black", fill=NA),
axis.line = element_line(size = 0.5, colour = "black"),
axis.ticks = element_line(colour="black"),
panel.grid = element_line(colour="white", linetype = 2),
axis.text.y = element_text(colour="black"),
axis.text.x = element_text(colour="black"),
text = element_text(size=20, family="Humor Sans"),
plot.title = element_text(size = 40)
)
ggplot(results, aes(links,minbreaks))+
geom_area(fill="violet", alpha=.4)+
geom_step(color="violetred", lwd=1.5)+
labs(x="Chain's Length", y="Minimum Number of Breaks", title="Princess Jasmine's Staircase")+
scale_x_continuous(expand = c(0, 0), breaks = sapply(1:max.breaks, CalculateLength))+
opts

Looking For Life

 

Machines take me by surprise with great frequency (Alan Turing)

Imagine a 8×8 grid in which cells can be alive (black colored) or empty (light gray colored): Generated with R #RstatsAs with the One-dimensional Cellular Automata, the next state of a cell is a function of the states of the cell’s nearest neighbors (the only neighbors that we consider are the eight cells that form the immediate perimeter of a cell). To evolve this community of cells over time, rules are quite simple:

  • If a live cell has less than two neighbors, then it dies (loneliness)
  • If a live cell has more than three neighbors, then it dies (overcrowding)
  • If an dead cell has three live neighbors, then it comes to life (reproduction)
  • Otherwise, a cell stays as is (stasis)

These are the rules of the famous Conway’s Game Of Life, a cellular automaton invented in the late 1960s by John Conway trying to refine the description of a cellular automaton to the simplest one that could support universal computation. In 1970 Martin Gardner described Conway’s work in his Scientific American column. Gardner’s article inspired many people around the world to experiment with Conway’s, which eventually led to the final pieces of how the Game Of Life could support universal computation in what was surely a global collaborative effort.

In this experiment I will try to find interesting objects that can be found in the Game Of Life: static (remain the same over the time), periodic (change but repeating their initial configuration in some iterations) or moving objects (travel through the grid before disappearing). Why are interesting? Because these are the kind of objects required to perform computations.

The plan is simple: I will generate some random grids and will evolve them over time a significant number of times. After doing this, I will check those grids having still some alive cells inside. Will I find there what I am looking for?

I generated 81 grids in which live cells are randomly located using binomial random variables with probabilities equal to i/80 with i from 0 (all cells empty) to 80 (all cells alive). This is a quick way to try a set of populations with a wide range of live cells. I measure % of alive cells after each iteration. I will analyze those grids which still have live cells after all iterations. This is what happens after 150 iterations:

Generated with R #Rstats

I find some interesting objects. Since I keep them in my script, I can list them with ls(pattern= "Conway", all.names = TRUE). Two of them are specially interesting because are not static. Are those which produce non-constant lines in the previous plot.

First one is a periodic object which reproduces itself after every 3 iterations:

Generated with R #Rstats

Second one is a bit more complex. After 8 iterations appears rotated:

Generated with R #Rstats

What kind of calculations can be done with these objects? I don’t now yet but let’s give time to time. Do you want to look for Life?

library(ggplot2)
library(scales)
library(gridExtra)
SumNeighbors = function (m) #Summarizes number of alive neighbors for each cell
{
  rbind(m[-1,],0)+rbind(0,m[-nrow(m),])+cbind(m[,-1],0)+cbind(0,m[,-ncol(m)])+
    cbind(rbind(m[-1,],0)[,-1],0)+cbind(0, rbind(0,m[-nrow(m),])[,-nrow(m)])+
    cbind(0,rbind(m[-1,],0)[,-nrow(m)])+cbind(rbind(0,m[-nrow(m),])[,-1],0)
}
NextNeighborhood = function (m) #Calculates next state for each cell according to Conway's rules
{
  (1-(SumNeighbors(m)<2 | SumNeighbors(m)>3)*1)-(SumNeighbors(m)==2 & m==0)*1
}
splits=80 #Number of different populations to simulate. Each population s initialized randomly
          #according a binomial with probability i/splits with i from 0 to splits
iter=150
results=data.frame()
rm(list = ls(pattern = "conway")) #Remove previous solutions (don't worry!)
for (i in 0:splits)
{
  z=matrix(rbinom(size=1, n=8^2, prob=i/splits), nrow=8); z0=z
  results=rbind(results, c(i/splits, 0, sum(z)/(nrow(z)*ncol(z))))
  for(j in 1:iter)
  {z=NextNeighborhood(z); results=rbind(results, c(i/splits, j, sum(z)/(nrow(z)*ncol(z))))}
  #Save interesting solutions
  if (sum(z)/(nrow(z)*ncol(z))>0) assign(paste("Conway",format(i/splits, nsmall = 4), sep=""), z)
}
colnames(results)=c("start", "iter", "sparsity")
#Plot reults of simulation
opt1=theme(panel.background = element_rect(fill="gray85"),
          panel.grid.minor = element_blank(),
          panel.grid.major.x = element_blank(),
          panel.grid.major.y = element_line(color="white", size=.5, linetype=2),
          plot.title = element_text(size = 45, color="black"),
          axis.title = element_text(size = 24, color="black"),
          axis.text = element_text(size=20, color="black"),
          axis.ticks = element_blank(),
          axis.line = element_line(colour = "black", size=1))
ggplot(results, aes(iter, sparsity, group = start))+
  geom_path(size=.8, alpha = 0.5, colour="black")+
  scale_x_continuous("Iteration", expand = c(0, 0), limits=c(0, iter), breaks = seq(0,iter,10))+
  scale_y_continuous("Alive Cells", labels = percent, expand = c(0, 0), limits=c(0, 1), breaks = seq(0, 1,.1))+
  labs(title = "Conway's Game Of Life Simulation")+opt1
#List of all interesting solutions
ls(pattern= "Conway", all.names = TRUE)
#Example to plot the evolution of an interesting solution (in this case "Conway0.5500")
require(reshape)
opt=theme(legend.position="none",
            panel.background = element_blank(),
            panel.grid = element_blank(),
            axis.ticks=element_blank(),
            axis.title=element_blank(),
            axis.text =element_blank())
p1=ggplot(melt(Conway0.5500), aes(x=X1, y=X2))+geom_tile(aes(fill=value), colour="white", lwd=2)+
  scale_fill_gradientn(colours = c("gray85", "black"))+opt
p2=ggplot(melt(NextNeighborhood(Conway0.5500)), aes(x=X1, y=X2))+geom_tile(aes(fill=value), colour="white", lwd=2)+
  scale_fill_gradientn(colours = c("gray85", "black"))+opt
p3=ggplot(melt(NextNeighborhood(NextNeighborhood(Conway0.5500))), aes(x=X1, y=X2))+geom_tile(aes(fill=value), colour="white", lwd=2)+
  scale_fill_gradientn(colours = c("gray85", "black"))+opt
p4=ggplot(melt(NextNeighborhood(NextNeighborhood(NextNeighborhood(Conway0.5500)))), aes(x=X1, y=X2))+geom_tile(aes(fill=value), colour="white", lwd=2)+
  scale_fill_gradientn(colours = c("gray85", "black"))+opt
#Arrange four plots in a 2x2 grid
grid.arrange(p1, p2, p3, p4, ncol=2)

The Andrica’s Conjecture

Things should be as simple as possible, but not simpler (Albert Einstein)

Following with conjectures about primes, it is time for Andrica’s conjecture. The great mathematician Leonhard Euler (1707-1783) pointed: “Mathematicians have tried with no success to find some kind of order in the sequence of prime numbers and today we have reasons to believe that this is a mystery that human mind will never understand”.

In 1985, the Romanian mathematician Dorin Andrica published his conjecture, still unproved, which makes reference to gap between consecutive prime numbers. In concrete, his conjecture establishes that difference between square roots between two consecutive prime numbers is always less than 1. The highest difference encountered until now is 0.67087, located between p4=7 and p5=11.

Following you can find the plot of these differences for first 400 prime numbers:

andricaIt is very interesting how dots form hyperbolic patterns. Does not seem similar in some sense to the Ulam spiral? Primes: how challenging you are!

Two more comments:

  • It is better to find primes using matlab package than doing with schoolmath one. Reason is simple: for schoolmath package, 133 is prime!
  • Why did Andrica formulated his conjecture as √pn+1-√pn < 1 instead of √pn+1-√pn < 3/4? In terms of statistical error, the second formulation is more accurate. Maybe the charisma of number 1 is hard to avoid.

This is the code. I learned how to insert mathematical expressions inside a ggplot chart:

library(matlab)
library(ggplot2)
ubound=2800
primes=primes(ubound)
andrica=data.frame(X=seq(1:(length(primes)-1)), Y=diff(sqrt(primes)))
opt=theme(panel.background = element_rect(fill="gray92"),
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.major.y = element_line(color="white", size=1.5),
plot.title = element_text(size = 45),
axis.title = element_text(size = 28, color="gray35"),
axis.text = element_text(size=16),
axis.ticks = element_blank(),
axis.line = element_line(colour = "white"))
ggplot(andrica, aes(X, Y, colour=Y))+geom_point(size=5, alpha=.75)+
scale_colour_continuous(guide = FALSE)+
scale_x_continuous("n", limits=c(0, length(primes)-1), breaks = seq(0,length(primes)-1,50))+
scale_y_continuous(expression(A[n]==sqrt(p[n+1])-sqrt(p[n])), limits=c(0, .75), breaks = seq(0,.75,.05))+
labs(title = "The Andrica's Conjecture")+
opt

Summer Summary

The universe is full of magical things patiently waiting for our wits to grow sharper (Eden Phillpots)

I launched this blog 7 months ago and published 30 posts during this time. These are some of my figures until now:

My favourite post? I don’t really know, but I am very proud of this one and this one. I have received more positive critics than negative ones and the future sounds good: I have lots of experiments in my head to try with R.

Thanks a lot.

map