Forecasting Environmental Immigration to the UK

A couple of months ago, a paper I worked on with co-authors from the Centre for Population Change was published in Population and Environment. It summarised work we did as part of the UK Government Office for Science Foresight project on Migration and Global Environmental Change. Our aim was to build expert-based forecasts of environmental immigration to the UK. We conducted a Delphi survey of nearly 30 migration experts from academia, the civil service and non-governmental organisations to obtain estimates, with uncertainty, of the future levels of immigration to the UK in 2030 and 2060. We also asked them what proportion of current and future immigrants are, or will be, environmental migrants. The results were incorporated into a set of model-averaged Bayesian time series models through prior distributions on the mean and variance terms.

The plots in the journal article got somewhat butchered during the publication process. Below is the non-butchered version for future immigration to the UK, alongside the past immigration data from the Office for National Statistics.
[Figure imm2: expert-based forecast fan for future immigration to the UK, alongside past ONS data]
At first, I was a bit taken aback by this plot. A few experts thought there were going to be some very high levels of future immigration, which cause the rather striking large upper tail. However, at a second glance, the central percentiles show a gentle decrease, where there is only (approximately) a 30% chance of an increase in future migration from the 2010 level throughout the forecast period.

The expert-based forecast for total immigration was combined with the responses to questions on the proportion of environmental migrants, to obtain an estimate of both the current level of environmental migration (which is not currently measured) and future levels:
[Figure env4: estimated current and forecast future levels of environmental migration to the UK]

As is the way with these things, we came across some problems in our project. The first was with the definition of an environmental migrant, which is not completely nailed down in the migration literature. As a result, part of the uncertainty in the expert-based forecasts reflects not only the future level but also the measure itself. The second was with the elicitation of uncertainty. We used a Likert-type scale, which caused some difficulties even during the later round of the Delphi survey. If I were to do it over, I reckon this problem could be much better addressed by getting experts to visualise their forecast fans in an interactive website, perhaps by creating a shiny app with the fanplot package. Such an approach would result in smoother fans than those in the plots above, which were based on interpolations from expert answers at only two points of time in the future (2030 and 2060).
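The interpolation idea mentioned above can be sketched in base R. This is only an illustration, not the code used for the paper: the quantile values are invented placeholders, the object names are hypothetical, and linear interpolation with approx() is an assumption about how one might join the elicited points.

```r
# Hypothetical elicited quantiles (thousands of immigrants per year).
# These are made-up numbers for illustration, NOT values from the paper.
years <- c(2010, 2030, 2060)
pcts  <- c(5, 25, 50, 75, 95)

# rows = percentiles, columns = the three anchor years
elicited <- rbind(
  c(500, 300, 250),
  c(500, 420, 380),
  c(500, 480, 460),
  c(500, 540, 560),
  c(500, 700, 900)
)
rownames(elicited) <- paste0(pcts, "%")

# linearly interpolate each percentile path to annual values
all_years <- 2010:2060
fan_q <- t(apply(elicited, 1, function(q) approx(years, q, xout = all_years)$y))
colnames(fan_q) <- all_years

# a few columns of the interpolated percentile paths
fan_q[, c("2010", "2020", "2030", "2060")]
```

Because only two future time points are elicited, every percentile path has a single kink at 2030, which is why fans built this way look angular rather than smooth.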

Publication Details:

Abel, G.J., Bijak, J., Findlay, A.M., McCollum, D. and Wiśniowski, A. (2013). Forecasting environmental migration to the United Kingdom: An exploration using Bayesian models. Population and Environment. 35 (2), 183–203

Over the next 50 years, the potential impact of environmental change on human livelihoods could be considerable, with one possible consequence being increased levels of human mobility. This paper explores how uncertainty about the level of immigration to the United Kingdom as a consequence of environmental factors elsewhere may be forecast using a methodology involving Bayesian models. The conceptual understanding of forecasting is advanced in three ways. First, the analysis is believed to be the first time that the Bayesian modelling approach has been attempted in relation to environmental mobility. Second, the paper considers the expediency of this approach by comparing the responses to a Delphi survey with conventional expectations about environmental mobility in the research literature. Finally, the values and assumptions of the expert evidence provided in the Delphi survey are interrogated to illustrate the limited set of conditions under which forecasts of environmental mobility, as set out in this paper, are likely to hold.

Does specification matter? Experiments with simple multiregional probabilistic population projections.

A paper that I am a co-author on, looking at the uncertainty in population forecasts generated by different measures of migration, came out this week in Environment and Planning A. Basically, try to avoid using net migration measures. Not only do they tend to give some dodgy projections, we also found that they give you more uncertainty. Using in and out measures of migration in a projection model gives a big reduction in uncertainty over a net measure. They are also a fairly good approximation to the uncertainty from a full multiregional projection model. Plots in the paper were done by my good self using the fanplot package.

Publication Details:

Raymer, J., Abel, G.J. and Rogers, A. (2012). Does Specification Matter? Experiments with Simple Multiregional Probabilistic Population Projections. Environment and Planning A 44 (11), 2664–2686.

Population projection models that introduce uncertainty are a growing subset of projection models in general. In this paper we focus on the importance of decisions made with regard to the model specifications adopted. We compare the forecasts and prediction intervals associated with four simple regional population projection models: an overall growth rate model, a component model with net migration, a component model with in-migration and out-migration rates, and a multiregional model with destination-specific out-migration rates. Vector autoregressive models are used to forecast future rates of growth, birth, death, net migration, in-migration and out-migration, and destination-specific out-migration for the North, Midlands, and South regions in England. They are also used to forecast different international migration measures. The base data represent a time series of annual data provided by the Office for National Statistics from 1976 to 2008. The results illustrate how both the forecasted subpopulation totals and the corresponding prediction intervals differ for the multiregional model in comparison to other simpler models, as well as for different assumptions about international migration. The paper ends with a discussion of our results and possible directions for future research.

The fanplot package for R

I have updated, and will keep updating, this post as I expand the fanplot package.

My fanplot package has gone up on CRAN. Below is an online version of the package vignette…

Visualising Time Series Model Results

The fanplot package can also be used to display uncertainty in estimates from time series models. To illustrate, the package's th.mcmc data frame object contains posterior density distributions of the estimated volatility of daily returns y_t from the Pound/Dollar exchange rate from 02/10/1981 to 28/6/1985. These distributions are from an MCMC simulation of a stochastic volatility model given in Meyer and Yu (2002), where it is assumed that:

y_t | \theta_t = \exp\left(\frac{1}{2}\theta_t\right)u_t \qquad u_t \sim N(0, 1) \qquad t=1,\ldots,n.

The latent volatilities \theta_t, which are unknown states in a state-space model terminology, are assumed to follow a Markovian transition over time given by the state equations:

\theta_t | \theta_{t-1}, \mu, \phi, \tau^2 = \mu + \phi \theta_{t-1} + v_t \qquad v_t \sim N(0, \tau^2) \qquad t=1,\ldots,n

with \theta_0 \sim N(\mu, \tau^2).
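To make the two equations concrete, here is a minimal base R sketch that simulates one path from a model of this form, taking the lagged term in the state equation to be \theta_{t-1} (the log of the squared volatility at t-1). The parameter values are hypothetical and chosen only for illustration; they are not estimates from Meyer and Yu (2002) or from th.mcmc.

```r
set.seed(1)

n   <- 945     # same number of time points as the columns of th.mcmc
mu  <- -0.05   # hypothetical parameter values, for illustration only
phi <- 0.95
tau <- 0.2

theta <- numeric(n)
theta_prev <- rnorm(1, mean = mu, sd = tau)   # theta_0 ~ N(mu, tau^2)

for (t in seq_len(n)) {
  # state equation: theta_t = mu + phi * theta_{t-1} + v_t,  v_t ~ N(0, tau^2)
  theta[t] <- mu + phi * theta_prev + rnorm(1, mean = 0, sd = tau)
  theta_prev <- theta[t]
}

# observation equation: y_t = exp(theta_t / 2) * u_t,  u_t ~ N(0, 1)
y <- exp(theta / 2) * rnorm(n)

summary(theta)
```

In the real application only y is observed; the MCMC simulation works in the other direction, producing draws of each \theta_t given the observed returns.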

The th.mcmc object consists of 1000 rows corresponding to MCMC simulations and 945 columns corresponding to each t. A fan chart of the evolution of the distribution of \theta_t can be visualised using the fanplot package via:

library("fanplot")
# empty plot
plot(NULL, main="Percentiles", xlim = c(1, 965), ylim = c(-2.5, 1.5))

# add fan
fan(data = th.mcmc)

[Figure fanplot-sv1: percentile fan chart of \theta_t]
The fan function calculates the values of 100 equally spaced percentiles of each future distribution when the default data.type = "simulations" is set. This allows 50 fans to be plotted from the heat.colors colour palette, providing a fine level of shading. In addition, lines and labels are provided along each decile.
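The percentile calculation described above can be imitated in base R. This is a sketch of the idea rather than the internals of fan: the matrix sims is a random stand-in for th.mcmc (so the block runs without the package), and the set of percentiles p is an assumption.

```r
set.seed(42)

# stand-in for th.mcmc: 1000 simulations (rows) by 945 time points (columns)
sims <- matrix(rnorm(1000 * 945), nrow = 1000, ncol = 945)

# percentiles 1 to 99, computed separately for every column (time point)
p <- (1:99) / 100
pct <- apply(sims, 2, quantile, probs = p)

# one row per percentile, one column per time point
dim(pct)

# the deciles, where lines and labels would sit
pct[c("10%", "50%", "90%"), 1:3]
```

Each pair of neighbouring rows in pct bounds one band of the fan, so shading the bands with a sequence of colours gives the graded appearance of the chart.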

Prediction Intervals

When the argument type = "interval" is set, the probs argument corresponds to prediction intervals. Consequently, the fan chart by default comprises three different shades, running from the darkest shade for the 50% prediction interval to the lightest for the 95% prediction interval.

# empty plot
plot(NULL, main="Prediction Intervals",
     xlim = c(-20, 965), ylim = c(-2.5, 1.5))

# add fan
fan(data = th.mcmc, type = "interval", llab=TRUE, rcex=0.6)

[Figure fanplot-sv2: prediction interval fan chart of \theta_t]
Contour lines are overlaid on the upper and lower bounds of each prediction interval, as set using the ln argument. A further line is plotted along the median of \theta_t, which is controlled by the med.ln argument (set to TRUE by default when type="interval"). The default labels on the right hand side correspond to the upper and lower bounds of each plotted line. The left labels are added by setting llab = TRUE. Note that some extra room is created for the labels by setting the xlim = c(-20, 965) argument of the plotting area to a wider range than the original data (945 observations). The text size of the right labels is controlled using the rcex argument. The left labels, by default, take the same text size as the right labels, although they can be separately controlled using the lcex argument.
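The link between prediction intervals and percentiles can be written down directly. The sketch below assumes central (equal-tailed) intervals, which is the usual convention; the object names are hypothetical and the draws are random stand-ins for the simulations at a single time point.

```r
# a central p% prediction interval runs from the (50 - p/2)th
# to the (50 + p/2)th percentile (assumption: equal-tailed intervals)
interval <- c(50, 80, 95)
bounds <- data.frame(
  interval = interval,
  lower = 50 - interval / 2,
  upper = 50 + interval / 2
)
bounds

# applying the bounds to a vector of simulated values at one time point
set.seed(3)
draws <- rnorm(1000)   # stand-in for one column of th.mcmc
limits <- sapply(interval, function(p)
  quantile(draws, probs = c(50 - p / 2, 50 + p / 2) / 100, names = FALSE))
colnames(limits) <- paste0(interval, "%")
rownames(limits) <- c("lower", "upper")
limits
```

Wider intervals always contain narrower ones, which is why the shades nest from dark in the centre to light at the edges.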

Alternative Colours

Alternative colour schemes to the default heat.colors can be obtained by supplying a colorRampPalette to the fan.col argument. For example, a new palette running from blue to white, via grey, can be passed using:

# empty plot
plot(NULL, main="Alternative Colour Scheme",
     xlim = c(-20, 965), ylim = c(-2.5, 1.5))

# add fan
fan(data = th.mcmc, rlab=seq(20,80,15), llab=c(10,50,90),
    fan.col=colorRampPalette(c("royalblue", "grey", "white")))

[Figure fanplot-sv3: fan chart with the blue, grey and white colour scheme]
Alternative labels are specified using the rlab and llab arguments.
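For readers unfamiliar with colorRampPalette, it may help to see what it returns on its own. The short sketch below uses only base R (grDevices); the names pal and shades are arbitrary.

```r
# colorRampPalette() returns a function, not a vector of colours
pal <- colorRampPalette(c("royalblue", "grey", "white"))

# calling that function with n gives n colours interpolated along the ramp
shades <- pal(5)
shades

# the ramp starts at the first colour and ends at the last
is.function(pal)
length(shades)
```

This is why the palette function itself is supplied to fan.col: the number of shades needed depends on how many percentiles or intervals are being plotted.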

Spaghetti Plots

Spaghetti plots can be used to represent uncertainty shown by a range of possible future trajectories or past estimates. For example, using the th.mcmc object, 20 random sets of \theta_t can be plotted by setting the argument style = "spaghetti":

# empty plot
plot(NULL, main="Spaghetti Plot", xlim = c(-20, 965), ylim = range(th.mcmc))

# transparent fan with visible lines
fan(th.mcmc, ln=c(5, 50, 95), llab=TRUE, alpha=0, ln.col="orange")

# spaghetti lines
fan(th.mcmc, style="spaghetti", n.spag=20)

[Figure fanplot-sv4: spaghetti plot of \theta_t over a transparent fan]
The spaghetti lines are superimposed on a fan chart in order to illustrate some underlying probabilities. The initial fan chart is completely transparent, by setting the transparency argument alpha to 0. In order for the percentile lines to be visible, a non-transparent colour is used for the ln.col argument.
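The idea behind a spaghetti plot is simply to show a handful of whole simulated trajectories rather than summaries of all of them. A rough base R sketch follows; it assumes the lines are chosen by sampling rows at random, which may differ from what fan does internally, and uses a random stand-in matrix in place of th.mcmc.

```r
set.seed(7)

# stand-in for th.mcmc: 1000 simulations (rows) by 945 time points (columns)
sims <- matrix(rnorm(1000 * 945), nrow = 1000, ncol = 945)

# choose 20 simulated trajectories at random, without replacement
rows <- sample(nrow(sims), 20)
spag <- sims[rows, ]

# each row of spag is one complete trajectory over the 945 time points
dim(spag)
```

The selected trajectories could then be drawn with, for example, matplot(t(spag), type = "l"), since matplot draws one line per column.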

Forecast Fans

The fanplot package can also be used to illustrate probabilistic forecasts. For example, using the auto.arima function in the forecast package a model for the time series for net migration to the United Kingdom (contained in the ips data frame of the fanplot package) can be fitted.

# create time series
net <- ts(ips$net, start=1975)
# fit model
library("forecast")
m <- auto.arima(net)
m
Series: net
ARIMA(1,1,2) with drift         

Coefficients:
          ar1      ma1      ma2   drift
      -0.2301  -0.0851  -0.6734  6.7625
s.e.   0.3715   0.3620   0.1924  1.4154

sigma^2 estimated as 1231:  log likelihood=-179.3
AIC=368.6   AICc=370.54   BIC=376.66

We may then simulate 1000 future paths, each five years long, from the selected model using the simulate.Arima function, and plot the results.

# matrix to hold 1000 simulated paths, each 5 years long
mm <- matrix(NA, nrow=1000, ncol=5)
for(i in 1:1000)
  mm[i,] <- simulate(m, nsim=5)

# empty plot
plot(net, main="UK Net Migration", xlim=c(1975,2020), ylim=c(-100,300))

# add fan
fan(mm, start=2013)

[Figure fanplot-ips1: forecast fan for UK net migration]
Users might want to connect the fan with the past data. This can be achieved by providing the last value to the anchor argument.

# empty plot
plot(net, main="UK Net Migration",
     xlim=c(1975,2020), ylim=c(-100,300))

# add fan
fan(mm, start=2013, anchor=net[time(net)==2012],
    type="interval", probs=seq(5, 95, 5), ln=c(50, 80))

[Figure fanplot-ips2: forecast fan for UK net migration, anchored to the last observation]
More shades for the fan are added to the plot (over the default 3 used for interval fans) by supplying a sequence to the probs argument. Alternative contour lines (from the default median, 50th, 80th and 95th percentiles for interval fans) are added using the ln argument.

A comparison of official population projections with Bayesian time series forecasts for England and Wales

A paper based on some work I did with colleagues in the ESRC Centre for Population Change was published in Population Trends. We fitted a range of time series models (including some volatility models) to population change data for England and Wales, calculated the posterior model probabilities and then projected from the model-averaged posterior predictive distributions. We found our volatility models were heavily supported. Our median matches the Office for National Statistics mid scenario very closely. It’s a tad surprising that projections based on forecasts of a single growth rate per year give a similar forecast to the ONS cohort component projections, which are based on hundreds of future age-sex specific fertility, mortality and net migration rates. The ONS do not provide any form of probabilistic uncertainty; instead they give expert-based high and low scenarios, which roughly matched our 50% prediction interval in 2033. I ran all the models in BUGS and did the fan chart plots in R.
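The model-averaging step can be illustrated with a toy example in base R. Everything here is hypothetical: the model names, the log marginal likelihoods and the predictive draws are invented, equal prior model probabilities are assumed, and this is not the BUGS code used for the paper.

```r
set.seed(2010)

# hypothetical log marginal likelihoods for three candidate models
log_ml <- c(AR = -250.3, SV1 = -244.1, SV2 = -243.6)

# posterior model probabilities, assuming equal prior model probabilities
w <- exp(log_ml - max(log_ml))
w <- w / sum(w)
round(w, 3)

# hypothetical predictive draws of a future growth rate from each model
ndraw <- 5000
draws <- cbind(
  AR  = rnorm(ndraw, mean = 0.5, sd = 0.10),
  SV1 = rnorm(ndraw, mean = 0.6, sd = 0.20),
  SV2 = rnorm(ndraw, mean = 0.6, sd = 0.25)
)

# model-averaged predictive: for each draw, pick a model with probability w
pick <- sample(ncol(draws), ndraw, replace = TRUE, prob = w)
bma  <- draws[cbind(seq_len(ndraw), pick)]

# median and 50% prediction interval of the averaged forecast
quantile(bma, probs = c(0.25, 0.5, 0.75))
```

Models with higher posterior probability contribute more draws to the mixture, so a heavily supported model dominates the averaged forecast.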

Publication Details:

Abel, G.J., Bijak, J. and Raymer J. (2010). A comparison of official population projections with Bayesian time series forecasts for England and Wales. Population Trends, 141, 95–114.

We compare official population projections with Bayesian time series forecasts for England and Wales. The Bayesian approach allows the integration of uncertainty in the data, models and model parameters in a coherent and consistent manner. Bayesian methodology for time-series forecasting is introduced, including autoregressive (AR) and stochastic volatility (SV) models. These models are then fitted to a historical time series of data from 1841 to 2007 and used to predict future population totals to 2033. These results are compared to the most recent projections produced by the Office for National Statistics. Sensitivity analyses are then performed to test the effect of changes in the prior uncertainty for a single parameter. Finally, in-sample forecasts are compared with actual population and previous official projections. The article ends with some conclusions and recommendations for future work.