# Size Doesn’t Matter

An invisible red thread connects those destined to meet, regardless of time, place or circumstances. The thread may stretch or tangle, but never break (Ancient Chinese Legend)

Once a year I play Secret Santa with my friends (in Spain we call it Amigo Invisible). As you can read in Wikipedia:

To decide who gives to whom, we do the same thing every year: one of us puts small slips of paper in a bag, each with the name of one participant. Then each of us draws one slip and reads the name privately. If no one draws their own name, the distribution is valid. If not, we have to start over. Every year we have to repeat the process several times before obtaining a valid distribution. Why? Because we are victims of The Matching Problem.
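The redrawing ritual is easy to imitate on a computer. The following is a minimal sketch, not part of the experiment described in this post: the helper names `valid_draw` and `attempts_until_valid` and the group size of 8 friends are assumptions made for illustration. It counts how many draws are needed until nobody gets their own name.

```r
# Hypothetical helpers, for illustration only
set.seed(1)

# TRUE if a random draw among n people leaves nobody with their own name
valid_draw = function(n) all(sample(n) != seq_len(n))

# Number of draws needed until the first valid one
attempts_until_valid = function(n) {
  attempts = 1
  while (!valid_draw(n)) attempts = attempts + 1
  attempts
}

n_friends = 8  # assumed group size
attempts = replicate(10000, attempts_until_valid(n_friends))
mean_attempts = mean(attempts)
print(mean_attempts)        # average number of draws per year
print(mean(attempts == 1))  # share of years where the first draw is valid
```

In runs like this the average tends to land between 2 and 3 draws per year, which agrees with our experience of having to start over several times.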

Following the spirit of this talk, I have run 16 simulations of the matching problem (for 10, 20, 30, …, 160 items). Given n items, I generate 5,000 random vectors by sampling without replacement from the set of natural numbers from 1 to n. Comparing each random vector with the ordered one (1, 2, …, n), I obtain the number of matches (that is, the number of positions i where the ith element of the random vector is equal to i). This is the result of the experiment:

Although each plot represents a different number of items, all of them are extremely similar. All of them say that the probability of no match at all is around 36% (look at the first bar of each plot). More precisely, this probability tends to `1/e` (≈36.8%) as n increases, and it gets there very quickly.
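For readers who want the reason behind the `1/e` limit, here is a sketch of the classical inclusion-exclusion argument (it is standard probability, not something derived from the simulation). Let $A_i$ be the event that item $i$ lands in position $i$. Any $k$ given items all match with probability $(n-k)!/n!$, and there are $\binom{n}{k}$ ways to choose them, so

$$
P(\text{no match}) = 1 - P\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_{k=0}^{n} (-1)^k \binom{n}{k} \frac{(n-k)!}{n!} = \sum_{k=0}^{n} \frac{(-1)^k}{k!} \;\longrightarrow\; e^{-1} \approx 0.3679 \quad (n \to \infty).
$$

Because this is an alternating series, the distance to the limit is at most the first omitted term, $1/(n+1)!$, which is why the plots stop changing almost immediately.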

This result is shocking. It means that if some day the 7 billion people of the world agreed to play Secret Santa all together (how nice it would be!), the probability that at least one person draws his/her own name would be about 63% (1 − 1/e), close to 2/3. Absolutely amazing.
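The simulation is not needed to check this: the exact probability of no match among n items is given by the classical alternating sum of (-1)^k / k! for k from 0 to n. The snippet below is a small sketch that evaluates it for a few sizes; `p_no_match` is a hypothetical helper name and the sizes are picked arbitrarily.

```r
# Exact probability that a random permutation of n items has no match,
# using the alternating sum of (-1)^k / k! for k = 0, ..., n
# (p_no_match is a hypothetical helper name)
p_no_match = function(n) {
  k = 0:n
  sum((-1)^k / factorial(k))
}

sizes = c(2, 3, 5, 10, 20)  # arbitrary sizes for illustration
exact = sapply(sizes, p_no_match)
print(data.frame(n = sizes,
                 p_no_match = exact,
                 diff_from_1_over_e = exact - exp(-1)))
```

From n = 10 onwards the difference from 1/e is smaller than one in ten million, so going from a group of friends to 7 billion players changes nothing visible.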

This is the code (note: only the short simulation loop does the real work; everything else is plotting):

```
library(ggplot2)
library(scales)        # percent labels
library(RColorBrewer)  # bar colours
library(gridExtra)     # grid.arrange
library(extrafont)     # makes the "Humor Sans" font available, if installed

# Simulation: for each size n = 10, 20, ..., 160, draw 5000 random permutations
# of 1..n and count the positions where the ith element equals i
results=data.frame(size=numeric(0), x=numeric(0))
for (n in seq(10, by=10, length.out=16)) {
  matches=replicate(5000, sum(sample(n)==seq_len(n)))
  results=rbind(results, data.frame(size=n, x=matches))
}

# Common theme for all plots
# (recent ggplot2 releases prefer linewidth over size in element_line)
opts=theme(
  panel.background = element_rect(fill="gray98"),
  panel.border = element_rect(colour="black", fill=NA),
  axis.line = element_line(size = 0.5, colour = "black"),
  axis.ticks = element_line(colour="black"),
  panel.grid.major.y = element_line(colour="gray80"),
  panel.grid.major.x = element_blank(),
  panel.grid.minor = element_blank(),
  axis.text.y = element_text(colour="gray25", size=15),
  axis.text.x = element_text(colour="gray25", size=15),
  text = element_text(family="Humor Sans", size=15, colour="gray25"),
  legend.key = element_blank(),
  legend.position = "none",
  legend.background = element_blank(),
  plot.title = element_text(size = 18))

# One bar chart per size, kept in a list instead of variables g1, ..., g16
sizes=unique(results$size)
plots=list()
for (i in seq_along(sizes)) {
  data=subset(results, size==sizes[i])
  data$w=1/nrow(data)  # each run weighs 1/5000, so bar heights are relative frequencies
  plots[[i]]=ggplot(data, aes(x=as.factor(x), weight=w))+
    geom_bar(fill=sample(brewer.pal(9, "Set1"), 1), alpha=.85, colour="gray50")+
    scale_y_continuous(name="Probability", limits=c(0, .4), expand=c(0, 0), labels=percent)+
    scale_x_discrete(name="Number of matches", limits=as.character(0:8), expand=c(0, 0))+
    labs(title=paste("Matching", sizes[i], "items ..."))+
    opts
}
grid.arrange(grobs=plots, ncol=4)
```

## 4 thoughts on “Size Doesn’t Matter”

1. Fábio says:

Interesting post! I would just correct that in Spanish, Secret Santa is called “amigo secreto” (at least in Latin America).
Greetings from Chile!

1. Thanks! In Spain it is called “amigo invisible” 🙂

2. carsjam says:

Felices fiestas to you both.
My family does this in Canada, and we just call it the Christmas exchange.
For some reason, 4 years out of 5, I draw my sister-in-law’s name which displeases me a great deal since she either hates what I give her or sends me a hint consisting of one item which turns out to be impossible to find (such as the infamous Presto Salad Shooter saga from Christmas 2013).
There are around 10 family members in the draw.
I imagine the probability of someone drawing the same person 2 years in a row would be similar to your results, around 2/3 of the time, but what of the conditional probability situation I describe above?
Best,
cj