Updated Circular Plots for Directional Bilateral Migration Data

I have had a few emails recently regarding plots from my new working paper on global migration flows, which has received some media coverage here, here and here. The plots were created using Zuguang Gu’s excellent circlize package and are modified versions of those discussed in an earlier blog post. In particular, I have made four changes:

  1. I have added arrow heads to better indicate the direction of flows, following the example in Scientific American.
  2. I have reorganized the sectors on the outside of the circle so that in each the outflows are plotted first (largest to smallest) followed by the inflows (again, in size order). I prefer this new layout (previously the inflows were plotted first) as it allows the time sequencing of migration events (a migrant has to leave before they can arrive) to match up with the natural tendency of most readers to scan from left to right.
  3. I have cut out the white spaces that detached the chords from the outer sector. To my eye, this alteration helps indicate the direction of the flow and gives a cleaner look.
  4. I have kept the smallest flows in the plot, but plotted their chords last, so that the focus is maintained on the largest flows. Previously smaller flows were dropped according to an arbitrary cut off, which meant that the sector pieces on the outside of the circle no longer represented the total of the inflows and outflows.

Combined, these four modifications have helped me when presenting the results at recent conferences, reducing the time I need to spend explaining the plots and avoiding some of the confusion that occasionally occurred with the direction of the migration flows.

If you would like to replicate one of these plots, you can do so using estimates of the minimum migrant transition flows for the 2010-15 period and the demo R script in my migest package:

# install.packages("migest")
# install.packages("circlize")
demo(cfplot_reg2, package = "migest", ask = FALSE)

which will give the following output:

Estimated Global Migration Flows 2010-15

The code in the demo script uses the chordDiagram function, based on a recent update to the circlize package (0.3.7). Most likely you will need to either update or install the package (uncomment the install.packages lines in the code above).

If you want to view the R script in detail to see which arguments I used, then take a look at the demo file on GitHub here. I provide some comments (in the script, below the function) to explain each of the argument values.

Save and view a PDF version of the plot (which looks much better than what comes up in my non-square RStudio plot pane) using:

dev.copy2pdf(file = "cfplot_reg2.pdf", height = 10, width = 10)

Global Bilateral International Migration Flows

A few months ago, Demographic Research published my paper on estimating global migration flow tables. In the paper I developed a method to estimate international migrant flows, for which there is limited comparable data, to match changes in migrant stock data, which are more widely available. The result was bilateral tables of estimated international migrant transitions between 191 countries for four decades, which I believe are the first of their kind. The estimates, in an Excel spreadsheet, are available as an additional file on the journal website. The abstract and citation details are at the bottom of this post.

My migest R package contains the ffs function for the flows-from-stock method used in the paper. To demonstrate, consider two hypothetical migrant stock tables I use in the paper, where rows represent place of birth and columns represent place of residence. The first stock table represents the distributions of migrant stocks at the start of the period. The second represents the distributions at the end of the period.

> # create P1 and P2 stock tables
> dn <- LETTERS[1:4]
> P1 <- matrix(c(1000, 100, 10, 0,
+                55, 555, 50, 5, 
+                80, 40, 800, 40, 
+                20, 25, 20, 200), 
+              nrow=4, ncol=4, byrow = TRUE,
+              dimnames = list(pob = dn, por = dn))
> P2 <- matrix(c(950, 100, 60, 0, 
+                80, 505, 75, 5, 
+                90, 30, 800, 40,
+                40, 45, 0, 180),
+              nrow=4, ncol=4, byrow = TRUE,
+              dimnames = list(pob = dn, por = dn))
> # display with row and col totals
> addmargins(P1)
pob      A   B   C   D  Sum
  A   1000 100  10   0 1110
  B     55 555  50   5  665
  C     80  40 800  40  960
  D     20  25  20 200  265
  Sum 1155 720 880 245 3000
> addmargins(P2)
pob      A   B   C   D  Sum
  A    950 100  60   0 1110
  B     80 505  75   5  665
  C     90  30 800  40  960
  D     40  45   0 180  265
  Sum 1160 680 935 225 3000

When estimating flows from stock data, a good demographer should worry about births and deaths over the period, as these can have substantial impacts on changes in populations over time. In this simple hypothetical example, I set births and deaths to zero (implied by the equal row totals, the sum of populations by their place of birth) in each stock table. I still need to create some vectors to pass this information to the ffs function.

> # no births and deaths
> b <- rep(0, 4)
> d <- rep(0, 4)

We can then pass the stock tables, births and deaths to the ffs function to estimate flows by birth place, contained in the mu element of the returned list.

> # run flow from stock estimation
> library("migest")
> y <- ffs(P1=P1, P2=P2, d=d, b=b)
1 46 
2 0 
> # display with row, col and table totals
> addmargins(y$mu)
, , pob = A

orig    A   B  C D  Sum
  A   950   0 50 0 1000
  B     0 100  0 0  100
  C     0   0 10 0   10
  D     0   0  0 0    0
  Sum 950 100 60 0 1110

, , pob = B

orig   A   B  C D Sum
  A   55   0  0 0  55
  B   25 505 25 0 555
  C    0   0 50 0  50
  D    0   0  0 5   5
  Sum 80 505 75 5 665

, , pob = C

orig   A  B   C  D Sum
  A   80  0   0  0  80
  B   10 30   0  0  40
  C    0  0 800  0 800
  D    0  0   0 40  40
  Sum 90 30 800 40 960

, , pob = D

orig   A  B C   D Sum
  A   20  0 0   0  20
  B    0 25 0   0  25
  C   10 10 0   0  20
  D   10 10 0 180 200
  Sum 40 45 0 180 265

, , pob = Sum

orig     A   B   C   D  Sum
  A   1105   0  50   0 1155
  B     35 660  25   0  720
  C     10  10 860   0  880
  D     10  10   0 225  245
  Sum 1160 680 935 225 3000

The fm function returns the flow matrix aggregated over the place of birth dimension in the mu array.

> # display aggregate flows
> f <- fm(y$mu)
> addmargins(f)
orig   A  B  C D Sum
  A    0  0 50 0  50
  B   35  0 25 0  60
  C   10 10  0 0  20
  D   10 10  0 0  20
  Sum 55 20 75 0 150

…and there you have it: an estimated flow matrix that matches the changes in the stock tables whilst controlling for births and deaths. In the paper I run the code on real migrant stock data provided by the World Bank, to estimate global migrant flow tables.
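As a quick sanity check, with no births and deaths the net of the estimated flows should equal the change in each region's population between the two stock tables. Below is a small sketch that reconstructs the aggregate flow matrix and stock totals from the output above (the object names are mine, not from migest):

```r
# Reconstruct the aggregate flow matrix and stock column totals shown
# above, then verify that net flows equal the change in population.
dn <- LETTERS[1:4]
f <- matrix(c( 0,  0, 50, 0,
              35,  0, 25, 0,
              10, 10,  0, 0,
              10, 10,  0, 0),
            nrow = 4, byrow = TRUE, dimnames = list(orig = dn, dest = dn))
p1_tot <- c(A = 1155, B = 720, C = 880, D = 245)  # colSums(P1)
p2_tot <- c(A = 1160, B = 680, C = 935, D = 225)  # colSums(P2)
net <- colSums(f) - rowSums(f)   # in-migration minus out-migration
all(net == p2_tot - p1_tot)      # TRUE
```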

The ffs function has several different methods to control for deaths in the estimation procedure. The estimation is based on a three-way iterative proportional fitting scheme to estimate parameters in a log-linear model, not too dissimilar to that used in a paper based on my Southampton M.Sc. dissertation.
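To give a flavour of the fitting scheme, below is a minimal two-dimensional IPF sketch in base R; the ffs estimation generalises the same idea to three dimensions. The function name and arguments here are illustrative, not part of migest:

```r
# Minimal two-dimensional iterative proportional fitting: alternately
# scale rows and columns of a seed matrix to match target margins.
ipf2 <- function(seed, row_tot, col_tot, tol = 1e-10, maxit = 1000) {
  m <- seed
  for (i in seq_len(maxit)) {
    m <- m * (row_tot / rowSums(m))              # scale rows to targets
    m <- sweep(m, 2, col_tot / colSums(m), "*")  # scale columns to targets
    if (max(abs(rowSums(m) - row_tot)) < tol) break
  }
  m
}
seed <- matrix(1, 3, 3)
fit <- ipf2(seed, row_tot = c(10, 20, 30), col_tot = c(15, 15, 30))
round(rowSums(fit), 6)  # 10 20 30
```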

Publication Details:

Abel, G. J. (2013). Estimating global migration flow tables using place of birth data. Demographic Research, 28, 505–546. doi:10.4054/DemRes.2013.28.18

International migration flow data often lack adequate measurements of volume, direction and completeness. These pitfalls limit empirical comparative studies of migration and cross national population projections to use net migration measures or inadequate data. This paper aims to address these issues at a global level, presenting estimates of bilateral flow tables between 191 countries. A methodology to estimate flow tables of migration transitions for the globe is illustrated in two parts. First, a methodology to derive flows from sequential stock tables is developed. Second, the methodology is applied to recently released World Bank migration stock tables between 1960 and 2000 (Özden et al. 2011) to estimate a set of four decadal global migration flow tables. The results of the applied methodology are discussed with reference to comparable estimates of global net migration flows of the United Nations and models for international migration flows. The proposed methodology adds to the limited existing literature on linking migration flows to stocks. The estimated flow tables represent a first-of-a-kind set of comparable global origin destination flow data.

The tsbugs package for R

I have updated, and will continue to update, this post as the tsbugs package expands.

My tsbugs package has gone up on CRAN. I decided not to write a vignette for the submission, as it would have involved running estimation in BUGS via R2WinBUGS or R2OpenBUGS, which can cause problems during package submission. Instead I thought I would post a quick guide below. Note, the package code is on my GitHub if you would like to contribute more models to the package…

The functions in the tsbugs package aim to automate the writing of time series models to run in WinBUGS or OpenBUGS. I created these functions a while back when I was doing some work on model averaging for time series models. I found it a lot easier to build R functions to write the BUGS models than the more error-inducing process of copying and pasting BUGS scripts, and then making slight alterations to create new models. It also allowed me to add arguments to specify different lag lengths, prior distributions, variance assumptions and data lengths. Below are examples for three types of time series models: autoregressive models with constant variance, stochastic volatility models and random variance shift models.

Autoregressive Models

The ar.bugs command builds a BUGS script for autoregressive (AR) models ready to use in R2OpenBUGS. For example, consider the LakeHuron data.

LH <- LakeHuron 
plot(LH, main="Level (ft)") 
plot(diff(LH), main="Differenced Level")

We can construct an AR(1) model for this data (after differencing the data to obtain a stationary mean) as such:

ar1 <- ar.bugs(y=diff(LH), ar.order=1)

The ar.bugs function allows for alternative specifications for prior distributions, forecasts and the inclusion of a mean term:

ar2 <- ar.bugs(y=diff(LH), ar.order=2, ar.prior="dunif(-1,1)", 
               var.prior="dgamma(0.001,0.001)", k = 10, 
               mean.centre = TRUE)

The tsbugs objects can be used with R2OpenBUGS to easily run models from R. This is made even easier using the inits and nodes functions (also in the tsbugs package). For example:

writeLines(ar2$bug, "ar2.txt") 
ar2.bug <- bugs(data = ar2$data,
                inits = list(inits(ar2)),
                param = c(nodes(ar2, "prior")$name, "y.new"),
                model = "ar2.txt", 
                n.iter = 11000, n.burnin = 1000, n.chains = 1)

Note: 1) the model is written to a .txt file (as required by R2OpenBUGS); 2) the data used are part of the tsbugs object — the ar.bugs command cleans the data and adds missing values at the end of the series for forecasted values; 3) the initial values offered by the inits function are very crude, and with more complicated data or models users might be better off specifying their own list of initial values. The parameter traces and posterior distributions can be plotted using the coda package:

param.mcmc <- as.mcmc(ar2.bug$sims.matrix[,nodes(ar2, "prior")$name])

The fanplot package can be used to plot the entire series of posterior predictive distributions. We may also plot (after deriving it using the diffinv function) the posterior predictive distributions of the lake level:

# derive future level
ynew.mcmc <- ar2.bug$sims.list$y.new
lhnew.mcmc <- apply(ynew.mcmc, 1, diffinv, xi = tail(LH,1))
lhnew.mcmc <- t(lhnew.mcmc[-1,])

# plot differenced series
k0 <- end(LH)[1]
plot(diff(LH), xlim=k0+c(-50,10), main="Differenced Level")

# add fan
fan(ynew.mcmc, start=k0+1, rcex=0.5)

# plot undifferenced
plot(LH, xlim=k0+c(-50,10), main="Level")
fan(lhnew.mcmc, start=k0+1, rcex=0.5)


Stochastic Volatility Models

The sv.bugs command builds a BUGS script for stochastic volatility (SV) models ready to use in R2OpenBUGS. For example, consider the svpdx data.

# plot
plot(svpdx$pdx, type = "l", 
     main = "Return of Pound-Dollar exchange rate data 
             from 2nd October 1981 to 28th June 1985", 
     cex.main = 0.8)

We can construct an AR(0)-SV model for this data, and also obtain posterior simulations, using the sv.bugs command:

y <- svpdx$pdx
sv0 <- sv.bugs(y, sim=TRUE)

This model closely matches those presented in Meyer and Yu (2002). There are further options in the tsbugs package to incorporate different priors that do not involve transformations, such as those for psi1 above. Using R2OpenBUGS we can fit the model:

# decent initial value for variance in first period
init <- inits(sv0, warn=FALSE)
init$psi0 <- log(var(y))
# write bug
writeLines(sv0$bug, "sv0.txt")
# might take a while to compile
sv0.bug <- bugs(data = sv0$data,
                inits = list(init),
                param = c(nodes(sv0, "prior")$name,"y.sim","h"),
                model = "sv0.txt", 
                n.iter = 11000, n.burnin = 1000, n.chains = 1)

The volatility estimates can be easily extracted:

h.mcmc <- sv0.bug$sims.list$h

This allows us to directly view the estimated volatility process, or the time-dependent standard deviation, using the fanplot package:

# plot
plot(NULL, xlim = c(1, 945)+c(0,40), ylim = c(-4,2), 
     main="Estimated Volatility from SV Model") 
# fan
fan(h.mcmc, type = "interval")

We can also plot the posterior simulations from the model:

# extract posterior simulations
y.mcmc <- sv0.bug$sims.list$y.sim
# plot
plot(NULL, type = "l", xlim = c(1, 945)+c(0,20), 
     ylim = range(y), 
     main = "Posterior Model Simulations and Data")
fan(y.mcmc)
lines(y)


Random Variance Shift Models

The rv.bugs command builds a BUGS script for random variance (RV) shift models, similar to those of McCulloch and Tsay (1993), ready to use in R2OpenBUGS. Consider the ew data.

r <- ts(ew[2:167]/ew[1:166]-1, start=1841)
y <- diff(r) 
plot(y, main="Difference in England and Wales 
              Population Growth Rate")

We can create a BUGS script to fit a RV model to this data, including posterior simulations, using the rv.bugs command:

rv0<-rv.bugs(y, sim=TRUE)

and then run the script in R2OpenBUGS (this can take a couple of hours):

# decent initial value for variance in first period
init <- inits(rv0, warn=FALSE)
# write bug
writeLines(rv0$bug, "rv0.txt")
# might take a while to compile
rv0.bug <- bugs(data = rv0$data,
                inits = list(init),
                param = c(nodes(rv0, "prior")$name,
                          "y.sim", "h", "delta", "beta"),
                model = "rv0.txt", 
                n.iter = 11000, n.burnin = 1000, n.chains = 1)

We can plot the posterior simulations from the model using the fanplot package:

# start year and posterior simulations
y0 <- tsp(y)[1]
y.mcmc <- rv0.bug$sims.list$y.sim

# plot
plot(NULL, xlim=tsp(y)[1:2]+c(-5,5), ylim = range(y), 
     main="Posterior Simulations")
fan(y.mcmc, start = y0, rlab=c(10,50,90), llab=TRUE)

We can do the same for the posterior distributions of the standard deviations:

# derive sigma
h.mcmc <- rv0.bug$sims.list$h
sigma.mcmc <- sqrt(exp(h.mcmc))
# plots
plot(NULL, xlim =tsp(y)[1:2]+c(-5,5), ylim = c(0,0.008), 
     main="Standard Deviation")
fan(sigma.mcmc, start = y0, rlab=c(5,50,95), llab = c(5,50,95))

The posterior distributions of the probability of a variance shift, and the multiplier effect of the shift in variance (delta[t] and beta[t] in the BUGS model), can also be plotted. Note, when there is no variance shift, the posterior of beta[t] is similar to the prior distribution.

#extract data
delta.mcmc <- rv0.bug$sims.list$delta
beta.mcmc <- rv0.bug$sims.list$beta

# plots
plot(NULL, xlim = tsp(y)[1:2]+c(-5,5), ylim = c(0,1),
     main="Probability of Variance Change Point")
fan(delta.mcmc, start=y0, ln = NULL, rlab = NULL)

plot(NULL, xlim = tsp(y)[1:2]+c(-5,5), ylim = c(-2,2), 
     main="Variance Multiplier")
fan(beta.mcmc, start=y0)


Does specification matter? Experiments with simple multiregional probabilistic population projections.

A paper that I am a co-author on, looking at uncertainty in population forecasting generated by different measures of migration, came out this week in Environment and Planning A. Basically, try to avoid using net migration measures. Not only do they tend to give some dodgy projections, we also found that they give you more uncertainty. Using in- and out-migration measures in a projection model gives a big reduction in uncertainty over a net measure. They are also a fairly good approximation to the uncertainty from a full multiregional projection model. Plots in the paper were done by my good self using the fanplot package.

Publication Details:

Raymer, J., Abel, G.J. and Rogers, A. (2012). Does Specification Matter? Experiments with Simple Multiregional Probabilistic Population Projections. Environment and Planning A, 44(11), 2664–2686.

Population projection models that introduce uncertainty are a growing subset of projection models in general. In this paper we focus on the importance of decisions made with regard to the model specifications adopted. We compare the forecasts and prediction intervals associated with four simple regional population projection models: an overall growth rate model, a component model with net migration, a component model with in-migration and out-migration rates, and a multiregional model with destination-specific out-migration rates. Vector autoregressive models are used to forecast future rates of growth, birth, death, net migration, in-migration and out-migration, and destination-specific out-migration for the North, Midlands, and South regions in England. They are also used to forecast different international migration measures. The base data represent a time series of annual data provided by the Office for National Statistics from 1976 to 2008. The results illustrate how both the forecasted subpopulation totals and the corresponding prediction intervals differ for the multiregional model in comparison to other simpler models, as well as for different assumptions about international migration. The paper ends with a discussion of our results and possible directions for future research.

Estimation of international migration flow tables in Europe

A paper based on my Ph.D. has been published in the Journal of the Royal Statistical Society: Series A (Statistics in Society). It is essentially a boiled down version of my Ph.D. thesis without some of the earlier chapters. The idea was to come up with some comparable estimates of bilateral migration flows, which currently do not exist. I used some modern optimisation methods to harmonise existing migration flow data, and then the EM algorithm to derive some model-based imputations where there is no existing flow data. Below are the results I got for the EU15, 2002-2006 (use the tabs at the bottom to view different years).

If you want to download the data, go to the Google spreadsheet here.
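To illustrate the flavour of the EM-type imputation, here is a toy sketch that repeatedly replaces a single missing cell with its fitted value under a simple independence log-linear model; the paper itself fits a negative binomial regression with migration-theory covariates, so treat this purely as a sketch with made-up numbers:

```r
# Toy EM-style imputation of one missing cell in a flow table under an
# independence (main-effects) model: impute, refit, repeat to convergence.
x <- matrix(c(10, 20, 30,
              15, NA, 25,
              20, 10, 40), nrow = 3, byrow = TRUE)
miss <- which(is.na(x))
x[miss] <- mean(x, na.rm = TRUE)                   # crude starting value
for (i in 1:200) {
  fit <- outer(rowSums(x), colSums(x)) / sum(x)    # fitted values under independence
  if (abs(fit[miss] - x[miss]) < 1e-10) break
  x[miss] <- fit[miss]                             # E-step: replace missing cell
}
round(x[miss], 2)  # converges to 12
```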

Publication Details:

Abel, G. J. (2010). Estimation of international migration flow tables in Europe. Journal of the Royal Statistical Society: Series A (Statistics in Society), 173(4), 797–825.

A methodology is developed to estimate comparable international migration flows between a set of countries. International migration flow data may be missing, reported by the sending country, reported by the receiving country or reported by both the sending and the receiving countries. For the last situation, reported counts rarely match owing to differences in definitions and data collection systems. We report counts harmonized by using correction factors estimated from a constrained optimization procedure. Factors are applied to scale data that are known to be of a reliable standard, creating an incomplete migration flow table of harmonized values. Cells for which no reliable reported flows exist are then estimated from a negative binomial regression model fitted by using an expectation–maximization (EM) type of algorithm. Covariate information for this model is drawn from international migration theory. Finally, measures of precision for all missing cell estimates are derived by using the supplemented EM algorithm. Recent data on international migration between countries in Europe are used to illustrate the methodology. The results represent a complete table of comparable flows which can be used by regional policy makers and social scientists to understand population behaviour and change better.