Monthly Archives: September 2015

A Segmentation Of The World According To Migration Flows ft. Leaflet

Up in the sky you just feel fine, there is no running out of time and you never cross a line (Up In The Sky, 77 Bombay Street)

In this post I analyze two datasets from Enigma:

  • Migration flows: Every 10 years, since 1960, the World Bank estimates migrations worldwide (267.960 rows)
  • World population: Values and percentages of populations for each nation examined beginning in year 1960, by the World Bank’s Health, Nutrition and Population project (4.168.185 rows)

Since the second dataset is very large, I load it into R using fread function of data.table package, which is extremely fast. To filter datasets, I also use dplyr and pipes of magrittr package (my life changed since I discovered it).

To build a comparable indicator across countries, I divide migration flows (from and to each country) by the mean population in each decade. I do this because migration flows are aggregated for each decade since 1960. For example, during the first decade of 21st century, Argentina reveived 1.537.850 inmigrants, which represents a 3,99% of the mean population of the country in this decade. In the same period, inmigration to Burundi only represented a 0,67% of its mean population.

What happened in the whole world in that decade? There were around 166 million people who moved to other countries. It represents a 2.58% of the mean population of the world. I use this figure to divide countries into four groups:

  • Isolated: countries with both % of inmigrants and % of migrants under 2.58%
  • Emitter: countries with % of inmigrants under 2.58% and % of migrants over 2.58%
  • Receiver: countries with % of inmigrants over 2.58% and % of migrants under 2.58%
  • Transit: countries with both % of inmigrants and % of migrants over 2.58%

To create the map I use leaflet package as I did in my previous post. Shapefile of the world can be downloaded here. This is how the world looks like according to this segmentation:

Migration Flows

Some conclusions:

  • There are just sixteen receiver countries: United Arab Emirates, Argentina, Australia, Bhutan, Botswana, Costa Rica, Djibouti, Spain, Gabon, The Gambia, Libya, Qatar, Rwanda, Saudi Arabia, United States and Venezuela
  • China and India (the two most populous countries in the world) are isolated
  • Transit countries are concentrated in the north hemisphere and most of them are located in cold latitudes
  • There are six emitter countries with more than 30% of emigrants between 2000 and 2009: Guyana, Tonga, Tuvalu, Jamaica, Bosnia and Herzegovina and Albania

This is the code you need to reproduce the map:

library(data.table)
library(dplyr) 
library(leaflet)
library(rgdal)
library(RColorBrewer)
setwd("YOU WORKING DIRECTORY HERE")
populflows = read.csv(file="enigma-org.worldbank.migration-remittances.migrants.migration-flow-c57405e33412118c8757b1052e8a1490.csv", stringsAsFactors=FALSE)
population = fread("enigma-org.worldbank.hnp.data-eaa31d1a34fadb52da9d809cc3bec954.csv")
# Population
population %>% 
  filter(indicator_name=="Population, total") %>% 
  as.data.frame %>% 
  mutate(decade=(year-year%%10)) %>% 
  group_by(country_name, country_code, decade) %>% 
  summarise(avg_pop=mean(value)) -> population2
# Inmigrants by country
populflows %>% filter(!is.na(total_migrants)) %>% 
  group_by(migration_year, destination_country) %>% 
  summarise(inmigrants = sum(total_migrants))  %>% 
  merge(population2, by.x = c("destination_country", "migration_year"), by.y = c("country_name", "decade"))  %>% 
  mutate(p_inmigrants=inmigrants/avg_pop) -> inmigrants
# Migrants by country
populflows %>% filter(!is.na(total_migrants)) %>% 
  group_by(migration_year, country_of_origin) %>% 
  summarise(migrants = sum(total_migrants)) %>%  
  merge(population2, by.x = c("country_of_origin", "migration_year"), by.y = c("country_name", "decade"))  %>%
  mutate(p_migrants=migrants/avg_pop) -> migrants
# Join of data sets
migrants %>% 
  merge(inmigrants, by.x = c("country_code", "migration_year"), by.y = c("country_code", "migration_year")) %>%
  filter(migration_year==2000) %>% 
  select(country_of_origin, country_code, avg_pop.x, migrants, p_migrants, inmigrants, p_inmigrants) %>% 
  plyr::rename(., c("country_of_origin"="Country", 
                    "country_code"="Country.code", 
                    "avg_pop.x"="Population.mean",
                    "migrants"="Total.migrants",
                    "p_migrants"="p.of.migrants",
                    "inmigrants"="Total.inmigrants",
                    "p_inmigrants"="p.of.inmigrants")) -> populflows2000
# Threshold to create groups
populflows2000 %>% 
  summarise(x=sum(Total.migrants), y=sum(Total.inmigrants), z=sum(Population.mean)) %>% 
  mutate(m=y/z) %>% 
  select(m)  %>% 
  as.numeric -> avg
# Segmentation
populflows2000$Group="Receiver"
populflows2000[populflows2000$p.of.migrants>avg & populflows2000$p.of.inmigrants>avg, "Group"]="Transit"
populflows2000[populflows2000$p.of.migrants<avg & populflows2000$p.of.inmigrants<avg, "Group"]="Isolated" 
populflows2000[populflows2000$p.of.migrants>avg & populflows2000$p.of.inmigrants<avg, "Group"]="Emitter" 
#Loading shapefile from http://data.okfn.org/data/datasets/geo-boundaries-world-110m 
countries=readOGR("json/countries.geojson", "OGRGeoJSON") 
# Join shapefile and enigma information 
joined=merge(countries, populflows2000, by.x="wb_a3", by.y="Country.code", all=FALSE, sort = FALSE) 
joined$Group=as.factor(joined$Group) 
# To define one color by segment 
factpal=colorFactor(brewer.pal(4, "Dark2"), joined$Group) 
leaflet(joined) %>%
  addPolygons(stroke = TRUE, color="white", weight=1, smoothFactor = 0.2, fillOpacity = .8, fillColor = ~factpal(Group)) %>%
  addTiles() %>%
  addLegend(pal = factpal, values=c("Emitter", "Isolated", "Receiver", "Transit"))

A Simple Interactive Map Of US Prisons With Leaflet

The love of one’s country is a splendid thing. But why should love stop at the border? (Pablo Casals, Spanish cellist)

Some time ago, I discovered Enigma, an amazing open platform that unifies billions of records from thousands of government sources to make the world of public data universally accessible and useful. This is the first experiment I have done using data from Enigma. This is what I did:

  1. Create a free account, search and download data. Save the csv file in your working directory. File contains information about all prison facilities in the United States (private and state run) as recorded by the Department of Corrections in each state. Facility types, names, addresses (or lat/long coordinates) ownership names and detailed. In sum, there is information about 1.248 prison facilities.
  2. Since most of the prisons of the file do not contain geographical coordinates, I obtain latitude and longitude using geocode function from ggmap package. This step takes some time. I also remove closed facilities. Finally, I obtain a data set with complete information of 953 prison facilities.
  3. After cleaning and filling out data, generating the map is very easy using leaflet package for R. I create a column named popup_info pasting name and address to be shown in the popup. Instead using default OpenStreetMap basemap I use a CartoDB one.

In my opinion, resulting map is very appealing with a minimal effort. Since I cannot embed the map here, this is a screenshot of it:

jailsThis plot could be a good example of visual correlation, because it depends on this. Here you have the code. To see the map in your browser, press Show in new window option, a little arrow on the upper left side of the RStudio viewer window:

library(dplyr)
library(ggmap)
library(leaflet)
setwd("YOUR WORKING DIRECTORY HERE")
prisons = read.csv(file="enigma-enigma.prisons.all-facilities-bd6a927c4024c16d8ba9e21d52292b0f.csv", stringsAsFactors=FALSE)
prisons %>% 
  mutate(address=paste(facility_address1, city, state, zip, "EEUU", sep=", ")) %>%
  select(address) %>% 
  lapply(function(x){geocode(x, output="latlon")})  %>% 
  as.data.frame %>% 
  cbind(prisons) -> prisons
prisons %>%  
  mutate(popup_info=paste(sep = "<br/>", paste0("<b>", facility_name, "</b>"), facility_address1, city, state, zip)) %>% 
  mutate(lon=ifelse(is.na(longitude), address.lon, longitude),
         lat=ifelse(is.na(latitude),  address.lat, latitude)) %>%
  filter(!is.na(lon) & !grepl("CLOSED", facility_name)) -> prisons
leaflet(prisons) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addCircleMarkers(lng = ~lon, 
                   lat = ~lat, 
                   radius = 3, 
                   color = "red",
                   stroke=FALSE,
                   fillOpacity = 0.5,
                   popup = ~popup_info)