I have had a few emails recently regarding plots from my new working paper on global migration flows, which has received some media coverage here, here and here. The plots were created using Zuguang Gu’s excellent circlize package and are a modified version of those discussed in an earlier blog post. In particular, I have made four changes:
- I have added arrow heads to better indicate the direction of flows, following the example in Scientific American.
- I have reorganized the sectors on the outside of the circle so that in each the outflows are plotted first (largest to smallest) followed by the inflows (again, in size order). I prefer this new layout (previously the inflows were plotted first) as it allows the time sequencing of migration events (a migrant has to leave before they can arrive) to match up with the natural tendency for most to read from left to right.
- I have cut out the white spaces that detached the chords from the outer sector. To my eye, this alteration helps indicate the direction of the flow and gives a cleaner look.
- I have kept the smallest flows in the plot, but plotted their chords last, so that the focus is maintained on the largest flows. Previously smaller flows were dropped according to an arbitrary cut off, which meant that the sector pieces on the outside of the circle no longer represented the total of the inflows and outflows.
Combined, these four modifications have helped me when presenting the results at recent conferences, reducing the time I need to spend explaining the plots and avoiding some of the confusion that occasionally occurred with the direction of the migration flows.
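As a rough sketch of how arrow heads, chord ordering and keeping every flow can be requested from chordDiagram, the snippet below uses a toy data frame. The object names (`flows`, `sector_total`) and all numbers are made up for illustration, and the argument values are assumptions rather than a copy of the demo script, which remains the reference for what was actually used.

```r
# Toy origin-destination flows (made-up numbers, not the paper's estimates)
flows <- data.frame(
  orig = c("A", "A", "B", "B", "C", "C"),
  dest = c("B", "C", "A", "C", "A", "B"),
  flow = c(50, 2, 30, 10, 1, 20)
)

# No rows are dropped by a cut off, so each sector's width equals
# that region's total outflow plus total inflow
out_total <- tapply(flows$flow, flows$orig, sum)
in_total <- tapply(flows$flow, flows$dest, sum)
sector_total <- out_total + in_total[names(out_total)]

# Only draw the plot if circlize is installed
if (requireNamespace("circlize", quietly = TRUE)) {
  circlize::circos.clear()
  circlize::chordDiagram(
    x = flows,
    directional = 1,                            # flows run from column 1 to column 2
    direction.type = c("arrows", "diffHeight"), # arrow heads plus offset chord ends
    link.arr.type = "big.arrow",                # draw each chord as a single large arrow
    diffHeight = -0.04,                         # offset between chord ends (illustrative value)
    link.sort = TRUE,                           # order chords within each sector by size
    link.largest.ontop = TRUE                   # keep the largest chords on top
  )
}
```

Because the small flows (such as A to C and C to A here) stay in the data, the sector totals still add up to twice the sum of all flows.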
If you would like to replicate one of these plots, you can do so using estimates of the minimum migrant transition flows for the 2010-15 period and the demo R script in my migest package:
demo(cfplot_reg2, package = "migest", ask = FALSE)
which will give the following output:
The code in the demo script uses the chordDiagram function, based on a recent update to the circlize package (0.3.7). Most likely you will need to either update or install the package (uncomment the install.packages lines in the code above).
If you want to view the R script in detail to see which arguments I used, then take a look at the demo file on GitHub here. I provide some comments (in the script, below the function) to explain each of the argument values.
Save and view a PDF version of the plot (which looks much better than what comes up in my non-square RStudio plot pane) using:
dev.copy2pdf(file = "cfplot_reg2.pdf", height = 10, width = 10)
I have been having a go in R at visualising player movements for the World Cup. I wanted to use similar plots to those used to visualise international migration flows in the recent Science paper that I co-authored. In the end I came up with two plots. The first, and more complex one, is based on a non-square matrix of the league systems of players' clubs by their national teams.
Colours are based on the shirt of each team in the 2014 World Cup. Lines represent the connections between the country in which players play their club football (at the line's base) and their national teams (at the arrow head). Line thickness represents the number of players. It’s a little cluttered, but shows nicely how many players in the English, Italian, Spanish and French leagues are involved in the World Cup. It also highlights well some countries where almost all the players are at clubs abroad, for example most of the players in the African squads.
Whilst the first plot gave a lot of detail, I wanted to visualise the broader interactions, so I aggregated over league systems and national squads by regional confederations. This gives a square matrix:
league     AFC  CONCACAF  CONMEBOL  CAF  UEFA
AFC         49         2         1    3     1
CONCACAF     0        13         0    0     0
CONMEBOL     2         0        54   11     0
CAF          0         0         0   36     0
UEFA        41        99        37   86   296
The plot of which looks like:
This type of aggregation works really well to show how few European national players play elsewhere (only Zvjezdan Misimovic in all the European World Cup squads). It also provides a way to compare the share of non-European players plying their trade in the European leagues to those in more local leagues within their confederation.
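The comparison of shares can be sketched in a few lines of R. The values are copied from the matrix above. I am assuming that rows are the confederation of the league and columns the confederation of the national squad, and the object names (`m`, `share_home`, `share_uefa`) are only for illustration.

```r
conf <- c("AFC", "CONCACAF", "CONMEBOL", "CAF", "UEFA")

# Rows: confederation of the club's league; columns: confederation of the squad
# (assumed orientation, values copied from the table above)
m <- matrix(
  c(49,  2,  1,  3,   1,
     0, 13,  0,  0,   0,
     2,  0, 54, 11,   0,
     0,  0,  0, 36,   0,
    41, 99, 37, 86, 296),
  nrow = 5, byrow = TRUE,
  dimnames = list(league = conf, squad = conf)
)

# Number of players in the squads of each confederation
squad_total <- colSums(m)

# Share of each confederation's players at clubs within their own confederation
share_home <- diag(m) / squad_total

# Share of each confederation's players at clubs in UEFA leagues
share_uefa <- m["UEFA", ] / squad_total

round(cbind(share_home, share_uefa), 2)
```

For the UEFA column the two shares coincide by construction, since the home confederation is UEFA itself.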
I scraped the data from the provisional squads on Wikipedia, and then created the images with the circlize package. All the code to scrape the Wikipedia squad pages and reproduce the plots is on my GitHub.
A paper based on my Ph.D. has been published in the Journal of the Royal Statistical Society: Series A (Statistics in Society). It is essentially a boiled down version of my Ph.D. thesis without some of the earlier chapters. The idea was to come up with some comparable estimates of bilateral migration flows, which currently do not exist. I used some modern optimisation methods to harmonise existing migration flow data, and then the EM algorithm to derive some model-based imputations where there is no existing flow data. Below are the results I got for the EU15, 2002-2006 (use the tabs at the bottom to view different years).
If you want to download the data, go to the Google spreadsheet here.
Abel, G. J. (2010). Estimation of international migration flow tables in Europe. Journal of the Royal Statistical Society: Series A (Statistics in Society), Volume 173, Issue 4, Pages 797–825.
A methodology is developed to estimate comparable international migration flows between a set of countries. International migration flow data may be missing, reported by the sending country, reported by the receiving country or reported by both the sending and the receiving countries. For the last situation, reported counts rarely match owing to differences in definitions and data collection systems. We report counts harmonized by using correction factors estimated from a constrained optimization procedure. Factors are applied to scale data that are known to be of a reliable standard, creating an incomplete migration flow table of harmonized values. Cells for which no reliable reported flows exist are then estimated from a negative binomial regression model fitted by using an expectation–maximization (EM) type of algorithm. Covariate information for this model is drawn from international migration theory. Finally, measures of precision for all missing cell estimates are derived by using the supplemented EM algorithm. Recent data on international migration between countries in Europe are used to illustrate the methodology. The results represent a complete table of comparable flows which can be used by regional policy makers and social scientists to understand population behaviour and change better.
International migration flow data is a messy topic. No single pair of countries defines migration in the same way. Even if they did, they would most likely measure it differently. This causes some big headaches for anyone who wants to draw any inference about migration levels, directions, policy implications or the causes and consequences of people’s movements at a cross-national level. During my Ph.D. I worked on methods for estimating comparable international migration flows across multiple European countries.
I identified two fundamental data problems: inconsistency (countries with conflicting reports on the number of people moving between them) and incompleteness (countries not providing any data). I applied both mathematical and statistical methods to create a comparable set of international migration flow estimates. For more details see my Ph.D. dissertation (which is online, see the link below). It contains most of the R/S-Plus code to conduct the estimation in the Appendix. Note, there is also a published paper based on my Ph.D. (abstract and links here). I created a TeX template for the University of Southampton School of Social Sciences here.
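To make the two data problems concrete, here is a minimal sketch with toy numbers. The countries, counts and column names are all made up, and the code only classifies the reports; it does not implement the estimation methods in the dissertation.

```r
# Toy reported flows: each row is one origin-destination pair.
# `sent` is the count reported by the sending country and `received`
# the count reported by the receiving country (NA = no report).
reports <- data.frame(
  orig     = c("A", "B", "A", "C", "C", "D"),
  dest     = c("B", "A", "C", "A", "D", "C"),
  sent     = c(100, 80, 40, NA, NA, NA),
  received = c(150, 60, NA, 25, NA, NA)
)

# Number of available reports for each flow (0, 1 or 2)
reports$n_reports <- rowSums(!is.na(reports[, c("sent", "received")]))

# Inconsistency: both countries report the same flow but the counts differ
reports$ratio <- reports$received / reports$sent
reports$inconsistent <- reports$n_reports == 2 & reports$sent != reports$received

# Incompleteness: neither country provides any data for the flow
reports$incomplete <- reports$n_reports == 0

reports
```

In this toy table the flows between A and B are reported twice with conflicting counts, while the flows between C and D have no report at all and would need to be estimated from a model.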
Abel, G. J. (2009). International Migration Flow Table Estimation. University of Southampton, Division of Social Statistics, Doctoral Thesis.