Updated Circular Plots for Directional Bilateral Migration Data

I have had a few emails recently regarding plots from my new working paper on global migration flows, which has received some media coverage here, here and here. The plots were created using Zuguang Gu’s excellent circlize package and are modified version of those discussed in an earlier blog post. In particular, I have made four changes:

  1. I have added arrow heads to better indicate the direction of flows, following the example in Scientific American.
  2. I have reorganized the sectors on the outside of the circle so that in each the outflows are plotted first (largest to smallest) followed by the inflows (again, in size order). I prefer this new layout (previously the inflows were plotted first) as it allows the time sequencing of migration events (a migrant has to leave before they can arrive) to match up with the natural tendency for most to read from left to right.
  3. I have cut out the white spaces that detached the chords from the outer sector. To my eye, this alteration helps indicate the direction of the flow and gives a cleaner look.
  4. I have kept the smallest flows in the plot, but plotted their chords last, so that the focus is maintained on the largest flows. Previously smaller flows were dropped according to an arbitrary cut off, which meant that the sector pieces on the outside of the circle no longer represented the total of the inflows and outflows.

Combined, these four modifications have helped me when presenting the results at recent conferences, reducing the time I need to spend explaining the plots and avoiding some of the confusion that occasionally occurred with the direction of the migration flows.

If you would like to replicate one of these plot, you can do so using estimates of the minimum migrant transition flows for the 2010-15 period and the demo R script in my migest package;

# install.packages("migest")
# install.packages("circlize")
library("migest")
demo(cfplot_reg2, package = "migest", ask = FALSE)

which will give the following output:

Estimated Global Migration Flows 2010-15

The code in the demo script uses the chordDiagram function, based on a recent update to the circlize package (0.3.7). Most likely you will need to either update or install the package (uncomment the install.packages lines in the code above).

If you want to view the R script in detail to see which arguments I used, then take a look at the demo file on GitHub here. I provide some comments (in the script, below the function) to explain each of the argument values.

Save and view a PDF version of the plot (which looks much better than what comes up in my non-square RStudio plot pane) using:

dev.copy2pdf(file = "cfplot_reg2.pdf", height=10, width=10)
file.show("cfplot_reg2.pdf")

 

Shiny App for the Wittgenstein Centre Population Projections

A few weeks ago a new version of the the Wittgenstein Centre Data Explorer was launched. The data explorer is intended to disseminate the results of a recent global population projection exercise which uniquely incorporates level of education (as well as age and sex) and the scientific input of more than 500 population experts around the world. Included are the projected populations used in the 5th assessment report of the Intergovernmental Panel on Climate Change (IPCC).

Untitled

Over the past year or so I have been working (on and off) with the data lab team to create a shiny app, on which the data explorer is based. All the code and data is available on my github page. Below are notes to summarise some of the lessons I learnt:

1. Large data

We had a pretty large amount of data to display (31 indicators based on up to 7 scenarios x 26 time periods x 223 geographical areas x 21 age groups x 2 genders x 7 education categories)… so somewhere over 8 million rows for some indicators. Further complexity was added by the fact that some indicators were by definition not available for some dimensions of the data, for example, population median age is not available by age group. The size and complexity meant that data manipulations were a big issue. Using read.csv to load the data didn’t really cut the mustard, taking over 2 minutes when running on the server. The fantastic saves package and ultra.fast=TRUE argument in the loads function came to the rescue, alongside some pre-formatting to avoid as much joining and reshaping of the data on the server as possible. This cut load times to a couple of seconds at most, and allowed the app to work with the indicator variables on the fly as demanded by the user selections. Once the data was in, the more than awesome dplyr functions finished the data manipulations jobs in style. I am sure there is some smarter way to get everything running a little bit quicker than it does now, but I am pretty happy with the present speed, given the initial waiting times.

2. googleVis and gvisMerge

It’s a demographic data explorer, which means population pyramids have to pop-up somewhere. We needed pyramids that illustrate population sizes by education level, on top of the standard age and sex breakdown. Static versions of the education pyramids in the explorer have previously been used by my colleagues to illustrate past and future populations. For the graphic explorer I created some interactive versions, for side-by-side comparisons over time and between countries, and which also have some tool tip features. These took a little while to develop. I played with ggvis but couldn’t get my bar charts to go horizontal. I also took a look at some other functions for interactive pyramids but I couldn’t figure out a way to overlay the educational dimension. I found a solution by creating gender specific stacked bar charts from gvisBarChart in the googleVis package and then gvisMerge to bring them together in one plot. As with the data tables, they take a second or so render, so I added a withProgress bar to try and keep the user entertained.

I could not figure out a way in R to convert the HTML code outputted by the gvisMerge function to a familiar file format for users to download. Instead I used a system call to the wkhtmltopdf program to return PDF or PNG files. By default, wkhtmltopdf was a bit hit and miss, especially with converting the more complex plots in the maps to PNG files. I found setting --enable-javascript --javascript-delay 2000 helped in many cases.

3. The shiny user community

I asked questions using the shiny tag on stackoverflow and the shiny google group a number of times. A big thank you to everyone who helped me out. Browsing through other questions and answers was also super helpful. I found this question on organising large shiny code particularly useful. Making small changes during the reviewing process became a lot easier once I broke the code up across multiple .R files with sensible names.

4. Navbar Pages

When I started building the shiny app I was using a single layout with a sidebar and tabbed pages to display data and graphics (using tabsetPanel()), adding extra tabs as we developed new features (data selection, an assumption data base, population pyramids, plots of population size, maps, FAQ’s, etc, etc.). As these grew, the switch to the new Navbar layout helped clean up the appearance and provide a better user experience, where you can move between data, graphics and background information using the bar at the top of page.

5. Shading and link buttons

I added some shading and buttons to help navigate through the data selection and between different tabs. For the shading I used cssmatic.com to generate the colour of a fluidRow background. The code generated there was copy and pasted into a tags$style element for my defined row myRow1, as such;

library(shiny)
runApp(list(
  ui = shinyUI(fluidPage(
    br(),
    fluidRow(
      class = "myRow1", 
      br(),
      selectInput('variable', 'Variable', names(iris))
    ),
    tags$style(".myRow1{background: rgba(212,228,239,1); 
                background: -moz-linear-gradient(left, rgba(212,228,239,1) 0%, rgba(44,146,208,1) 100%);
                background: -webkit-gradient(left top, right top, color-stop(0%, rgba(212,228,239,1)), color-stop(100%, rgba(44,146,208,1)));
                background: -webkit-linear-gradient(left, rgba(212,228,239,1) 0%, rgba(44,146,208,1) 100%);
                background: -o-linear-gradient(left, rgba(212,228,239,1) 0%, rgba(44,146,208,1) 100%);
                background: -ms-linear-gradient(left, rgba(212,228,239,1) 0%, rgba(44,146,208,1) 100%);
                background: linear-gradient(to right, rgba(212,228,239,1) 0%, rgba(44,146,208,1) 100%);
                filter: progid:DXImageTransform.Microsoft.gradient( startColorstr='#d4e4ef', endColorstr='#2c92d0', GradientType=1 );
                border-radius: 10px 10px 10px 10px;
                -moz-border-radius: 10px 10px 10px 10px;
                -webkit-border-radius: 10px 10px 10px 10px;}")
  )),
  server = function(input, output) {
  }
))

I added some buttons to help novice users switch between tabs once they had selected or viewed their data. It was a little tougher to implement than the shading, and in the end I need a little help. I used bootsnipp.com to add some icons and define the style of the navigation buttons (using the tags$style element again).

That is about it for the moment. I might add a few more notes to this post as they occur to me… I would encourage anyone who is tempted to learn shiny to take the plunge. I did not know JavaScript or any other web languages before I started, and I still don’t… which is the great beauty of the shiny package. I started with the RStudio tutorials, which are fantastic. The R code did not get a whole lot more complex than what I learnt there, even though the shiny app is quite large in comparisons to most others I have seen.

Any comments or suggestions for improving website are welcome.

2014 World Cup Squads

I have been having a go in R at visualising player movements for the World Cup. I wanted to use similar plots to those used to visualise international migration flows in the recent Science paper that I co-authored. In the end I came up with two plots. The first, and more complex one, is based on a non-square matrix of leagues system of players clubs by their national team.

gjabelwc2014t3
You can zoom in and out if you click on the image.

Colours are based on the shirt of each team in the 2014 World Cup. Lines represent the connections between the country in which players play their club football (at the lines base) and their national teams (at the arrow head). Line thickness represent number of players. It’s a little cluttered, but shows nicely how many players in the English, Italian, Spanish and French leagues are involved in the world cup. It also highlights well some countries where almost all the players are at clubs abroad, for example most of the players in the African squads.

Whilst the first plot gave a lot of detail, I wanted to visualise the broader interactions, so I aggregated over leagues systems and national squads by regional confederations. This gives a square matrix:

> m
          squad
league     AFC CONCACAF CONMEBOL CAF UEFA
  AFC       49        2        1   3    1
  CONCACAF   0       13        0   0    0
  CONMEBOL   2        0       54  11    0
  CAF        0        0        0  36    0
  UEFA      41       99       37  86  296

The plot of which looks like:
gjabelwc2014r

This type of aggregation works really well to show how few European national players play elsewhere (only Zvjezdan Misimovic in all the European World Cup squads). It also provides a way to compare the share of non-European players plying their trade in the European leagues to those in more local leagues within their confederation.

I scraped the data from the provisional squads on Wikipedia, and then created the images with the circlize package. All the code to reproduce the plots + scraping the Wikipedia squad pages are on the my github.

Demo file for the fanplot package

I have added a demo file to the latest version of the fanplot package. It has lots of examples of different plotting styles to represent uncertainty in time series data. In the updated package I have added functionality to plot fan charts based on irregular time series objects from the zoo package, plus the use of alternative colour palettes from the RColorBrewer and colorspace packages. All plots are based on the th.mcmc object, the estimated posterior distributions of the volatility in daily returns from the Pound/Dollar exchange rate from 02/10/1981 to 28/6/1985. To run the demo file from your R console (ensure fanplot, zoo, tsbugs, RColorBrewer and colorspace packages are all installed beforehand);

# if you want plots in separate graphic devices 
# do not run this first line...
par(mfrow = c(10,2))
# run demo
demo(sv_fan, package = "fanplot", ask = FALSE)

The demo script should output this set of plots:
svplot
If you wish, click on the image above and take a closer look in your browser. In R, you can save the PDF version of all the plots on one graphics device (which looks much better than what comes up in my R graphics device):

dev.copy2pdf(file = "svplots.pdf", height = 50, width = 10)

You can also view the demo file for a closer look at the arguments used in each plot:

file.show(system.file("demo/sv_fan.R", package = "fanplot"))

Circular Migration Flow Plots in R

A article of mine was published in Science today. It introduces estimates for bilateral global migration flows between all countries. The underlying methodology is based on the conditional maximisation routine in my Demographic Research paper. However, I tweaked the demographic accounting which ensures the net migration in the estimated migration flow tables matches very closely to the net migration figures from the United Nations.

My co-author, Nikola Sander, developed some circular plots for the paper based on circos in perl. A couple of months back, after the paper was already in the submission process, I figured out how to replicate these plots in R using the circlize package. Zuguang Gu, the circlize package developer was very helpful, responding quickly (and with examples) to my emails.

To demonstrate, I have put two demo files in my migest R package. For the estimates of flows by regions, users can hopefully replicate the plots (so long as the circlize and plyr packages are installed) using:

library("migest")
demo(cfplot_reg, package = "migest", ask = FALSE)

It should result in the following plot:
cfplot_reg
The basic idea of the plot is to show simultaneously the relative size of estimated flows between regions. The origins and destinations of migrants are represented by the circle’s segments, where nearby regions are positioned close to each other. The size of the estimated flow is indicated by the width of the link at its bases and can be read using the tick marks (in millions) on the outside of the circle’s segments. The direction of the flow is encoded both by the origin colour and by the gap between link and circle segment at the destination.

You can save the PDF version of the plot (which looks much better than what comes up in my R graphics device) using:

dev.copy2pdf(file = "cfplot_reg.pdf", height=10, width=10)

If you want to view the R script:

file.show(system.file("demo/cfplot_reg.R", package = "migest"))

In Section 5 of our Vienna Institute of Demography Working Paper I provide a more detailed breakdown for the R code in the demo files.

A similar demo with slight alterations to the labelling is also available for a plot of the largest country to country flows:

demo(cfplot_nat, package = "migest", ask = FALSE)

cfplot_nat

If you are interested in the estimates, you can fully explore in the interactive website (made using d3.js) at http://global-migration.info/. There is also a link on the website to download all the data. Ramon Bauer has a nice blog post explaining the d3 version.

Publication Details:

Abel, G.J. and Sander, N. (2014). Quantifying Global International Migration Flows. Science. 343 (6178), 1520–1522.

Widely available data on the number of people living outside of their country of birth do not adequately capture contemporary intensities and patterns of global migration flows. We present data on bilateral flows between 196 countries from 1990 through 2010 that provide a comprehensive view of international migration flows. Our data suggest a stable intensity of global 5-year migration flows at ~0.6% of world population since 1995. In addition, the results aid the interpretation of trends and patterns of migration flows to and from individual countries by placing them in a regional or global context. We estimate the largest movements to occur between South and West Asia, from Latin to North America, and within Africa.