Does specification matter? Experiments with simple multiregional probabilistic population projections.

A paper I co-authored, looking at how different measures of migration affect the uncertainty in population forecasts, came out this week in Environment and Planning A. The short version: try to avoid net migration measures. Not only do they tend to give some dodgy projections, we also found that they produce more uncertainty. Using separate in-migration and out-migration measures in a projection model gives a big reduction in uncertainty compared with a net measure, and is also a fairly good approximation to the uncertainty from a full multiregional projection model. I produced the plots in the paper using the fanplot package.

Publication Details:

Raymer, J., Abel, G.J. and Rogers, A. (2012). Does Specification Matter? Experiments with Simple Multiregional Probabilistic Population Projections. Environment and Planning A 44 (11), 2664–2686.

Population projection models that introduce uncertainty are a growing subset of projection models in general. In this paper we focus on the importance of decisions made with regard to the model specifications adopted. We compare the forecasts and prediction intervals associated with four simple regional population projection models: an overall growth rate model, a component model with net migration, a component model with in-migration and out-migration rates, and a multiregional model with destination-specific out-migration rates. Vector autoregressive models are used to forecast future rates of growth, birth, death, net migration, in-migration and out-migration, and destination-specific out-migration for the North, Midlands, and South regions in England. They are also used to forecast different international migration measures. The base data represent a time series of annual data provided by the Office for National Statistics from 1976 to 2008. The results illustrate how both the forecasted subpopulation totals and the corresponding prediction intervals differ for the multiregional model in comparison to other simpler models, as well as for different assumptions about international migration. The paper ends with a discussion of our results and possible directions for future research.

Estimation of international migration flow tables in Europe

A paper based on my Ph.D. has been published in the Journal of the Royal Statistical Society: Series A (Statistics in Society). It is essentially a boiled-down version of my Ph.D. thesis without some of the earlier chapters. The idea was to come up with comparable estimates of bilateral migration flows, which currently do not exist. I used some modern optimisation methods to harmonise existing migration flow data, and then the EM algorithm to derive model-based imputations where there is no existing flow data. Below are the results I got for the EU15, 2002-2006 (use the tabs at the bottom to view different years).


If you want to download the data, go to the Google spreadsheet here.

Publication Details:

Abel, G.J. (2010). Estimation of international migration flow tables in Europe. Journal of the Royal Statistical Society: Series A (Statistics in Society) 173 (4), 797–825.

A methodology is developed to estimate comparable international migration flows between a set of countries. International migration flow data may be missing, reported by the sending country, reported by the receiving country or reported by both the sending and the receiving countries. For the last situation, reported counts rarely match owing to differences in definitions and data collection systems. We report counts harmonized by using correction factors estimated from a constrained optimization procedure. Factors are applied to scale data that are known to be of a reliable standard, creating an incomplete migration flow table of harmonized values. Cells for which no reliable reported flows exist are then estimated from a negative binomial regression model fitted by using an expectation–maximization (EM) type of algorithm. Covariate information for this model is drawn from international migration theory. Finally, measures of precision for all missing cell estimates are derived by using the supplemented EM algorithm. Recent data on international migration between countries in Europe are used to illustrate the methodology. The results represent a complete table of comparable flows which can be used by regional policy makers and social scientists to understand population behaviour and change better.

International Migration Flow Table Estimation

International migration flow data is a messy topic. No single pair of countries defines migration in the same way. Even if they did, they would most likely measure it differently. This causes some big headaches for anyone who wants to draw inferences about migration levels, directions, policy implications or the causes and consequences of people’s movements at a cross-national level. During my Ph.D. I worked on methods for estimating comparable international migration flows across multiple European countries.

I identified two fundamental data problems: inconsistency (countries with conflicting reports on the number of people moving between them) and incompleteness (countries not providing any data). I applied both mathematical and statistical methods to create a comparable set of international migration flow estimates. For more details see my Ph.D. dissertation (which is online, see the link below). It contains most of the R/S-Plus code to conduct the estimation in the Appendix. Note, there is also a published paper based on my Ph.D. (abstract and links here). I also created a TeX template for the University of Southampton School of Social Sciences, available here.

Publication Details:

Abel, G. J. (2009). International Migration Flow Table Estimation. University of Southampton, Division of Social Statistics, Doctoral Thesis.

POPFEST Conference Presentation Slides

I gave a presentation on the method developed during my M.Sc. at POPFEST 2007. This also included some new results for estimating flows in 7 non-census years (only two were estimated in the paper).

The longer series of estimates allows a far better comparison of migration flows over time, as well as allowing for some better visualisations of the results. The slides (my first set ever created in Beamer) are available on the conference website here.

Combining census and registration data to estimate detailed elderly migration flows in England and Wales

During my M.Sc. I worked on methods for combining internal migration data in England and Wales. Migration data is often represented in square tables of origin-destination flows. These are of particular interest for analysing migration patterns when they are disaggregated by age, sex and some other variable such as illness, ethnicity or economic status. In England and Wales the data within these detailed flow tables are typically missing in non-census years. However, row and column (origin and destination) totals are regularly provided from the NHS patient registers (see the first two columns of the hypothetical data situation below). I worked on a method to estimate the detailed missing flow data so that they sum to the provided totals in non-census years (see the third column of the hypothetical data situation below). This method is particularly useful for estimating migration flow tables disaggregated by detailed characteristics of migrants (such as illness, ethnicity or economic status) that are only provided by the ONS for census years.

Hypothetical Example of Data Set Situation (where migrant origins are labelled on the vertical axis and destinations on the horizontal axis).

Auxiliary Data (e.g. 2001 Census)     Primary Data (e.g. 2004 NHSCR Data)    Detailed Estimates for 2004 Based on Methodology

Without Limiting Long Term Illness                                           Without Limiting Long Term Illness
         N    M    S  Total                                                           N    M    S  Total
N       80   20   50    150                                                  N       88   56   40    183
M       50  100   50    200           Illness details unavailable            M       29  145   21    195
S       10   30  110    150                    N    M    S  Total            S        7   52   54    113
Total  140  150  210    500           N                       260            Total  124  252  115    491
With Limiting Long Term Illness       M                       320            With Limiting Long Term Illness
         N    M    S  Total           S                       170                     N    M    S  Total
N       30   10   20     60           Total  200  370  180    750            N       33   28   16     77
M       40   50   70    160                                                  M       23   73   29    125
S       30   10   40     80                                                  S       20   17   20     57
Total  100   70  130    300                                                  Total   76  118   65    259

The estimated values maintain some properties (various cross product ratios) of the Census data whilst updating marginal totals to more current data. For more details see my M.Sc. dissertation (which I have put online here); it contains the R/S-Plus code to conduct the estimation in the Appendix. I also presented the method and some further results at POPFEST 2007, see here for more details. There is also a published paper based on my M.Sc. that uses slightly modified R code.
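As a rough illustration of the idea, and not the R/S-Plus code from the dissertation, the updating can be sketched in Python as iterative proportional fitting (IPF): the Census table is repeatedly rescaled until its origin and destination totals, summed over illness status, match the registration totals. IPF is one standard way of getting estimates with these properties; that the dissertation code takes exactly this route is an assumption, and the function name `ipf` and the array layout are hypothetical. The numbers are those of the hypothetical example above.

```python
# Hypothetical sketch: iterative proportional fitting (IPF) on the example data.
# seed[k][i][j] holds the 2001 Census flows, with k = illness status
# (0 = without, 1 = with limiting long term illness), i = origin,
# j = destination, and regions ordered N, M, S.
seed = [
    [[80, 20, 50], [50, 100, 50], [10, 30, 110]],
    [[30, 10, 20], [40, 50, 70], [30, 10, 40]],
]
origin_totals = [260, 320, 170]       # 2004 NHSCR row totals
destination_totals = [200, 370, 180]  # 2004 NHSCR column totals


def ipf(seed, origin_totals, destination_totals, tol=1e-9, max_iter=1000):
    """Rescale seed so flows summed over illness status match both sets of totals."""
    est = [[[float(x) for x in row] for row in layer] for layer in seed]
    layers, regions = range(len(est)), range(len(origin_totals))
    for _ in range(max_iter):
        # Step 1: scale every cell with origin i to hit the origin total.
        for i in regions:
            current = sum(est[k][i][j] for k in layers for j in regions)
            for k in layers:
                for j in regions:
                    est[k][i][j] *= origin_totals[i] / current
        # Step 2: scale every cell with destination j to hit the destination total.
        for j in regions:
            current = sum(est[k][i][j] for k in layers for i in regions)
            for k in layers:
                for i in regions:
                    est[k][i][j] *= destination_totals[j] / current
        # Stop once the origin totals still hold after the destination step.
        gap = max(
            abs(sum(est[k][i][j] for k in layers for j in regions) - origin_totals[i])
            for i in regions
        )
        if gap < tol:
            break
    return est


estimates = ipf(seed, origin_totals, destination_totals)
for label, layer in zip(["Without illness", "With illness"], estimates):
    print(label)
    for row in layer:
        print("  ", [round(x) for x in row])
```

Because each cell is only ever multiplied by an origin factor and a destination factor, the cross product ratios of the Census table carry over to the estimates unchanged. If the sketch is a fair reading of the method, the rounded output should be close to the third column of the hypothetical example, although small rounding differences are possible.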

Publication Details:

Raymer, J., Abel, G.J. and Smith, P.W.F. (2007). Combining census and registration data to estimate detailed elderly migration flows in England and Wales. Journal of the Royal Statistical Society: Series A (Statistics in Society) 170 (4), 891–908.

A log-linear model is developed to estimate detailed elderly migration flows by combining data from the 2001 UK census and National Health Services patient register. After showing that the census and National Health Service migration flows can be reasonably combined, elderly migration flows between groupings of local authority districts by age, sex and health status for the 2000–2001 and 2003–2004 periods are estimated and then analysed to show how the patterns have changed. By combining registration data with census data, we can provide recent estimates of detailed elderly migration flows, which can be used for improvements in social planning or policy.