## Global Migration Flow Table Estimates

A few months ago, Demographic Research published my paper on estimating global migration flow tables. In the paper I developed a method to estimate international migrant flows, for which there is limited comparable data, to match changes in migrant stock data, which are more widely available. The result was a set of bilateral tables of estimated international migrant transitions between 191 countries over four decades, which I believe are the first of their kind. The estimates are available in an Excel spreadsheet as an additional file on the journal's website.

My migest R package contains the ffs function for the flows-from-stock method used in the paper.

The estimates in long format are also in a Google spreadsheet here:

Publication Details:

Abel, G. J. (2013). Estimating global migration flow tables using place of birth data. Demographic Research, 28 (March), 505–546. doi:10.4054/DemRes.2013.28.18

International migration flow data often lack adequate measurements of volume, direction and completeness. These pitfalls limit empirical comparative studies of migration and cross-national population projections to net migration measures or inadequate data. This paper aims to address these issues at a global level, presenting estimates of bilateral flow tables between 191 countries. A methodology to estimate flow tables of migration transitions for the globe is illustrated in two parts. First, a methodology to derive flows from sequential stock tables is developed. Second, the methodology is applied to recently released World Bank migration stock tables between 1960 and 2000 (Özden et al. 2011) to estimate a set of four decadal global migration flow tables. The results of the applied methodology are discussed with reference to comparable estimates of global net migration flows of the United Nations and models for international migration flows. The proposed methodology adds to the limited existing literature on linking migration flows to stocks. The estimated flow tables represent a first-of-a-kind set of comparable global origin-destination flow data.
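The intuition behind linking flows to stocks can be illustrated with a toy example. This is my own simplification, not the estimation method in the paper (which works with full place-of-birth stock arrays and handles births and deaths): for a single birthplace cohort, countries whose stock fell must have sent out migrants, countries whose stock grew must have received them, and a minimal set of flows allocates the losses to the gains proportionally. Country names and numbers below are made up.

```r
# Toy sketch: minimal flows consistent with stock changes for one
# birthplace cohort (hypothetical data, not the paper's ffs method).
stock_t0 <- c(A = 100, B = 50, C = 20)
stock_t1 <- c(A = 80, B = 60, C = 30)

d <- stock_t1 - stock_t0          # net change in each country of residence
loss <- pmax(-d, 0)               # out-migration needed to match declines
gain <- pmax(d, 0)                # in-migration needed to match increases

# allocate each country's loss across gaining countries proportionally
flows <- outer(loss, gain / sum(gain))
dimnames(flows) <- list(orig = names(d), dest = names(d))
flows
```

Here country A's loss of 20 is split between B and C in proportion to their gains. The ffs function generalises this idea to full origin-destination tables by birthplace, with demographic accounting for births and deaths.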

## Bank of England Fan Charts in R

I managed to catch David Spiegelhalter’s Tails You Win on BBC iPlayer last week. I missed it the first time round, only for my parents to tell me, on my last visit home, about a statistician jumping out of a plane on TV. It was a great watch. Towards the end I spotted some fan charts used by the Bank of England to illustrate uncertainty in their forecasts, similar to this one:

Bank of England February 2013 CPI Fan Chart.
Obtained from http://www.bankofengland.co.uk/publications/Pages/inflationreport/irfanch.aspx

They discussed how even in the tails of their GDP predictive distribution they missed the financial crisis by a long shot. This got me googling, trying to find out how they made the plots, something that (also) completely passed me by when I put together my fanplot package for R. As far as I could tell they did them in Excel, although (appropriately) I am not completely certain. There are also MATLAB files that can create fan charts. Anyhow, I thought I would have a go at replicating a Bank of England fan chart in R…

### The Split-Normal (Two-Piece Normal) Distribution

The Bank of England produces fan charts of forecasts for CPI and GDP in their quarterly Inflation Reports. They also provide data for each report, in the form of a mean, an uncertainty parameter and a skewness indicator. These three parameters are from a split-normal distribution (the BOE call it a two-piece normal distribution) with density

$$f(x) = \begin{cases} \sqrt{\dfrac{2}{\pi}} \dfrac{1}{\sigma_1 + \sigma_2} \exp\left( -\dfrac{(x - \mu)^2}{2\sigma_1^2} \right) & \text{for } x \le \mu, \\ \sqrt{\dfrac{2}{\pi}} \dfrac{1}{\sigma_1 + \sigma_2} \exp\left( -\dfrac{(x - \mu)^2}{2\sigma_2^2} \right) & \text{for } x > \mu, \end{cases}$$

where the overall uncertainty parameter $\sigma$ and skewness parameter $\gamma$ are functions of the two scale parameters $\sigma_1$ and $\sigma_2$, so if you know $\sigma$ and $\gamma$ you can derive $\sigma_1$ and $\sigma_2$. As no split-normal distribution existed in R, I added routines for the density, distribution and quantile functions, plus a random generator, to a new version (2.1) of the fanplot package. I used the formulae in Julio (2007) to code each of the functions, and checked the results against those from the fan chart MATLAB code.
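The quantile function can also be written from first principles as a cross-check. The sketch below is my own derivation from the piecewise CDF implied by the density above, parameterised directly by $\mu$, $\sigma_1$ and $\sigma_2$; it is not the fanplot source code, and `qsplit` is a hypothetical name.

```r
# Split-normal quantile from the piecewise CDF: below the mode the CDF is
# 2*s1/(s1+s2) * pnorm((x-mu)/s1); above the mode it continues with scale s2.
qsplit <- function(p, mu, s1, s2) {
  w <- s1 / (s1 + s2)   # probability mass below the mode
  ifelse(p <= w,
         mu + s1 * qnorm(p * (s1 + s2) / (2 * s1)),
         mu + s2 * qnorm((p * (s1 + s2) + s2 - s1) / (2 * s2)))
}

# with s1 = s2 the split normal collapses to the usual normal
qsplit(c(0.025, 0.5, 0.975), 0, 1, 1)
```

Setting $\sigma_1 = \sigma_2$ gives the ordinary normal quantiles, which is a handy sanity check against qnorm.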

### Fan Chart Plots for CPI

Once I had the qsplitnorm function working properly, producing the fan chart plot in R was pretty straightforward. First, I downloaded the past CPI data (percentage change over 12 months) from the ONS:

> cpi <- read.csv("C:/Users/…/cpi.csv")
> cpi
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1  1997 2.1 1.9 1.7 1.6 1.6 1.7 2.0 2.0 1.8 1.9 1.9 1.7
2  1998 1.5 1.6 1.7 1.8 2.0 1.7 1.4 1.3 1.4 1.4 1.4 1.6
3  1999 1.6 1.4 1.7 1.5 1.3 1.3 1.3 1.2 1.2 1.1 1.2 1.1
4  2000 0.8 0.9 0.6 0.6 0.5 0.8 0.9 0.6 1.0 1.0 1.1 0.8
5  2001 0.9 0.8 0.9 1.2 1.7 1.7 1.4 1.8 1.3 1.2 0.8 1.1
6  2002 1.6 1.5 1.5 1.4 0.8 0.6 1.1 1.0 1.0 1.4 1.5 1.7
7  2003 1.3 1.6 1.5 1.4 1.3 1.1 1.3 1.4 1.4 1.4 1.3 1.3
8  2004 1.4 1.3 1.1 1.1 1.5 1.6 1.4 1.3 1.1 1.2 1.5 1.7
9  2005 1.6 1.7 1.9 1.9 1.9 2.0 2.3 2.4 2.5 2.3 2.1 1.9
10 2006 1.9 2.0 1.8 2.0 2.2 2.5 2.4 2.5 2.4 2.4 2.7 3.0
11 2007 2.7 2.8 3.1 2.8 2.5 2.4 1.9 1.8 1.8 2.1 2.1 2.1
12 2008 2.2 2.5 2.5 3.0 3.3 3.8 4.4 4.7 5.2 4.5 4.1 3.1
13 2009 3.0 3.2 2.9 2.3 2.2 1.8 1.8 1.6 1.1 1.5 1.9 2.9
14 2010 3.5 3.0 3.4 3.7 3.4 3.2 3.1 3.1 3.1 3.2 3.3 3.7
15 2011 4.0 4.4 4.0 4.5 4.5 4.2 4.4 4.5 5.2 5.0 4.8 4.2
16 2012 3.6 3.4 3.5 3.0 2.8 2.4 2.6 2.5 2.2 2.7 2.7 2.7
>
> # extract quarterly data in Feb, Apr, Aug and Nov
> mn <- month.abb[seq(2, 11, 3)]
> cpi <- cpi[, mn]
> # convert to ts
> cpi <- ts(c(t(cpi)), start = 1997, frequency = 4)
> cpi
Qtr1 Qtr2 Qtr3 Qtr4
1997  1.9  1.6  2.0  1.9
1998  1.6  2.0  1.3  1.4
1999  1.4  1.3  1.2  1.2
2000  0.9  0.5  0.6  1.1
2001  0.8  1.7  1.8  0.8
2002  1.5  0.8  1.0  1.5
2003  1.6  1.3  1.4  1.3
2004  1.3  1.5  1.3  1.5
2005  1.7  1.9  2.4  2.1
2006  2.0  2.2  2.5  2.7
2007  2.8  2.5  1.8  2.1
2008  2.5  3.3  4.7  4.1
2009  3.2  2.2  1.6  1.9
2010  3.0  3.4  3.1  3.3
2011  4.4  4.5  4.5  4.8
2012  3.4  2.8  2.5  2.7


Then I read in the parameters for the forecast distribution from the Bank of England:

> cpiboe <- read.csv("C:/Users/…/cpiboe.csv")
> cpiboe
year quarter mode median mean uncertainty skewness
1  2013      Q1 2.73   2.73 2.73        0.61        0
2  2013      Q2 2.92   2.92 2.92        0.88        0
3  2013      Q3 3.22   3.22 3.22        1.11        0
4  2013      Q4 3.13   3.13 3.13        1.27        0
5  2014      Q1 2.95   2.95 2.95        1.34        0
6  2014      Q2 2.82   2.82 2.82        1.37        0
7  2014      Q3 2.53   2.53 2.53        1.39        0
8  2014      Q4 2.41   2.41 2.41        1.42        0
9  2015      Q1 2.32   2.32 2.32        1.48        0
10 2015      Q2 2.23   2.23 2.23        1.49        0
11 2015      Q3 2.13   2.13 2.13        1.50        0
12 2015      Q4 2.01   2.01 2.01        1.52        0
13 2016      Q1 1.96   1.96 1.96        1.52        0


Using the qsplitnorm function I then derived the CPI values for a sequence of probabilities between 0 and 1:

> library("fanplot")
> x <- 1:99/100
> k <- nrow(cpiboe)
> xx <- matrix(NA, nrow = 99, ncol = k)
> for (i in 1:k) {
xx[, i] <- qsplitnorm(x, mean = cpiboe$mean[i],
                      sd = (cpiboe$uncertainty[i])^2,
                      skew = cpiboe$skewness[i])
}
> head(xx)
         [,1]     [,2]      [,3]        [,4]         [,5]        [,6]       [,7]       [,8]      [,9]     [,10]     [,11]     [,12]     [,13]
[1,] 1.864366 1.118476 0.3537068 -0.62216649 -1.227190243 -1.54632232 -1.9647367 -2.2808479 -2.775632 -2.934725 -3.104283 -3.364794 -3.414794
[2,] 1.965800 1.329577 0.6895760 -0.18249162 -0.737711544 -1.03468133 -1.4380483 -1.7311793 -2.178532 -2.329528 -2.490935 -2.734981 -2.784981
[3,] 2.030157 1.463513 0.9026742  0.09646799 -0.427153003 -0.71006152 -1.1038813 -1.3824322 -1.799690 -1.945550 -2.101786 -2.335386 -2.385386
[4,] 2.078570 1.564269 1.0629797  0.30631844 -0.193531910 -0.46586269 -0.8525006 -1.1200834 -1.514703 -1.656698 -1.809044 -2.034785 -2.084785
[5,] 2.117950 1.646225 1.1933758  0.47701559 -0.003499173 -0.26722577 -0.6480217 -0.9066829 -1.282887 -1.421740 -1.570921 -1.790270 -1.840270
[6,] 2.151469 1.715983 1.3043635  0.62230567  0.158248534 -0.09815456 -0.4739781 -0.7250455 -1.085576 -1.221753 -1.368241 -1.582149 -1.632149

These values are ready to be passed to the pn function to calculate the percentiles (in a pn.ts type object) for plotting.

> # save time of last boe data point
> boe0 <- tsp(cpi)[2]
> # create pn.ts type object ready for fan function
> xx.pn <- pn(xx, start = boe0, frequency = 4, anchor = tail(cpi, 1))
> # view head of object for 5th, 10th, 50th, 90th and 95th percentiles
> head(xx.pn, p = c(5, 10, 50))
            5%       10%  50%      90%      95%
[1,] 2.7000000 2.7000000 2.70 2.700000 2.700000
[2,] 2.1481169 2.2695140 2.73 3.190486 3.311883
[3,] 1.7090075 1.9616546 2.92 3.878345 4.130992
[4,] 1.2932647 1.6952358 3.22 4.744764 5.146735
[5,] 0.6077767 1.1339833 3.13 5.126017 5.652223
[6,] 0.1420738 0.7278861 2.95 5.172114 5.757926
> # plot the data
> plot(cpi, type = "l", xlim = c(1997, 2016), ylim = c(-2, 7), las = 1)
> # add fan
> fan(xx.pn)

Note, setting the anchor to the last data point joins the fan to the past data line. This plot is based on the default arguments (colours, percentiles, lines and labels) in the pn and fan functions.
The Bank of England fan charts are somewhat different. They use a coarser set of colours and do not show the tails of the distribution. These can be derived by specifying which percentiles to estimate in the pn function (by default, as above, percentiles 1 to 99 are calculated for each time period):

> # percentiles in steps of 5, without tails
> xx.pn <- pn(xx, p = seq(15, 85, 5), start = boe0, frequency = 4,
anchor = tail(cpi, 1))
> head(xx.pn)
          15%      20%      25%      30%      35%      40%      45%  50%      55%      60%      65%      70%      75%      80%      85%
[1,] 2.700000 2.700000 2.700000 2.700000 2.700000 2.700000 2.700000 2.70 2.700000 2.700000 2.700000 2.700000 2.700000 2.700000 2.700000
[2,] 2.355276 2.424691 2.484817 2.539120 2.589621 2.637650 2.684180 2.73 2.775820 2.822350 2.870379 2.920880 2.975183 3.035309 3.104724
[3,] 2.140140 2.284604 2.409734 2.522748 2.627848 2.727804 2.824641 2.92 3.015359 3.112196 3.212152 3.317252 3.430266 3.555396 3.699860
[4,] 1.979213 2.209060 2.408148 2.587957 2.755176 2.914209 3.068281 3.22 3.371719 3.525791 3.684824 3.852043 4.031852 4.230940 4.460787
[5,] 1.505728 1.806614 2.067232 2.302614 2.521514 2.729700 2.931390 3.13 3.328610 3.530300 3.738486 3.957386 4.192768 4.453386 4.754272
[6,] 1.141740 1.476708 1.766848 2.028892 2.272588 2.504356 2.728892 2.95 3.171108 3.395644 3.627412 3.871108 4.133152 4.423292 4.758260

The xx.pn object can then be plotted, having set up a matching colour scheme.
I also add shading for the forecast area, an alternative axis, and lines for the target rate and the two-year-ahead period to match the Bank of England plot:

> # create colour palette
> pal <- colorRampPalette(c("tomato", "white"))
> fancol <- pal(ncol(xx.pn)/2)
> # plot data
> plot(cpi, type = "l", lwd = 3, col = "tomato", xlim = c(2008, 2016),
ylim = c(-2, 7), las = 1, xaxt = "n", yaxt = "n")
> # add shading for forecast area
> lim <- par("usr")
> rect(boe0, lim[3] - 1, 2017, lim[4] + 1, border = "gray90", col = "gray90")
> # add fan
> fan(xx.pn, fan.col = fancol, txt = NA, ln.col = NA)
> # add axis
> axis(2, at = -2:7, las = 2, tcl = 0.5, labels = FALSE)
> axis(4, at = -2:7, las = 2, tcl = 0.5)
> axis(1, at = 2008:2016, tcl = 0.5)
> axis(1, at = seq(2008, 2016, 0.25), labels = FALSE, tcl = 0.2)
> # add target line
> abline(h = 2)
> # add two year line
> abline(v = boe0 + 2, lty = 2)

## The tsbugs package for R

My tsbugs package has gone up on CRAN. The functions in the tsbugs package aim to automate the writing of time series models to run in WinBUGS (Lunn et al., 2000) or OpenBUGS (Lunn et al., 2009). I created these functions a while back when I was doing some work on model averaging for time series models. I found it a lot easier to build R functions to write the BUGS models than the more error-inducing process of copying and pasting BUGS scripts, and then making slight alterations to create new models. It also allowed me to add arguments to specify different lag lengths, prior distributions, variance assumptions and data lengths. I decided not to write a vignette for CRAN, as it would have involved doing some estimation in BUGS via R2WinBUGS or R2OpenBUGS and running into some problems when submitting the package. Instead, I thought I would post a quick guide here.

### Autoregressive Models

The ar.bugs command builds a BUGS script for autoregressive (AR) models ready to use in R2OpenBUGS (Sturtz et al., 2005).
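The idea of programmatically writing BUGS scripts can be sketched with plain paste0. This is a generic illustration of templating model code from R, not the tsbugs source; `write_ar_bug` is a hypothetical helper.

```r
# Generic sketch: build an AR(p) BUGS likelihood as a string. Arguments:
# n = series length, p = AR order. Priors here are arbitrary choices.
write_ar_bug <- function(n, p) {
  lags   <- paste0("phi", 1:p, "*y[t-", 1:p, "]", collapse = " + ")
  priors <- paste0("phi", 1:p, " ~ dnorm(0,1)", collapse = "\n")
  paste0(
    "model{\n",
    "for(t in ", p + 1, ":", n, "){\n",
    "  y[t] ~ dnorm(", lags, ", isigma2)\n",
    "}\n",
    priors, "\n",
    "isigma2 ~ dgamma(0.000001,0.000001)\n",
    "}"
  )
}
cat(write_ar_bug(n = 97, p = 2))
```

Changing `p` regenerates the whole script consistently, which is exactly the kind of slight alteration that is error-prone to do by hand.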
For example, consider the LakeHuron data.

> LH <- LakeHuron
> par(mfrow = c(2, 1))
> plot(LH, ylab = "Level in feet")
> plot(diff(LH), ylab = "Differenced Level")

We can construct an AR(1) model for this data (after differencing the data to obtain a stationary mean) as such:

> library("tsbugs")
> ar1 <- ar.bugs(y = diff(LH), ar.order = 1)
> print(ar1)
model{
#likelihood
for(t in 2:97){
  y[t] ~ dnorm(y.mean[t], isigma2)
}
for(t in 2:97){
  y.mean[t] <- phi1*y[t-1]
}
#priors
phi1 ~ dnorm(0,1)
isigma2 ~ dgamma(0.000001,0.000001)
sigma <- pow(isigma2,-0.5)
}

The ar.bugs function allows for alternative specifications for prior distributions, forecasts and the inclusion of a mean term:

> ar2 <- ar.bugs(y = diff(LH), ar.order = 2, ar.prior = "dunif(-1,1)",
var.prior = "dgamma(0.001,0.001)", k = 10, mean.centre = TRUE)
> print(ar2)
model{
#likelihood
for(t in 3:107){
  y[t] ~ dnorm(y.mean[t], isigma2)
}
for(t in 3:107){
  y.mean[t] <- phi0 + phi1*(y[t-1]-phi0) + phi2*(y[t-2]-phi0)
}
#priors
phi0 ~ dunif(-1,1)
phi1 ~ dunif(-1,1)
phi2 ~ dunif(-1,1)
sigma2 ~ dgamma(0.001,0.001)
isigma2 <- pow(sigma2,-1)
#forecast
for(t in 98:107){
  y.new[t] <- y[t]
}
}

The tsbugs objects can be used with R2OpenBUGS to easily run models from R. This is made even easier using the inits and nodes functions (also in the tsbugs package). For example:

> writeLines(ar2$bug, "ar2.txt")
> library("R2OpenBUGS")
> ar2.bug <- bugs(data = ar2$data, inits = list(inits(ar2)), param = c(nodes(ar2, "prior")$name, "y.new"),
model = "ar2.txt",
n.iter = 11000, n.burnin = 1000, n.chains = 1)


Note: 1) the model is written to a .txt file (as required by R2OpenBUGS); 2) the data used is part of the tsbugs object — the ar.bugs command cleans the data and adds missing values at the end of the series for forecasted values; 3) the initial values offered by the inits function are very crude, and with more complicated data or models, users might be better off specifying their own list of initial values.
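A hand-written set of initial values for the AR(2) model above might look like the following. This is only a sketch: R2OpenBUGS expects one list per chain, with names matching the stochastic nodes in the printed BUGS script, and the values here are arbitrary starting points of my choosing.

```r
# one named list per chain; names match nodes in the ar2 BUGS script
my.inits <- list(
  list(phi0 = 0, phi1 = 0.5, phi2 = 0,
       sigma2 = var(diff(LakeHuron)))   # start variance at the sample variance
)
```

This list could then be passed to bugs() via the inits argument in place of list(inits(ar2)).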
The parameter traces and posterior distributions can be plotted using the coda package:

> library("coda")
> param.mcmc <- as.mcmc(ar2.bug$sims.matrix[,nodes(ar2, "prior")$name])
> plot(param.mcmc[,1:4])


The fanplot package can be used to plot the entire series of posterior predictive distributions. We may also plot the posterior predictive distributions of the lake level itself (after deriving them using the diffinv function):

> # derive future level
> ynew.mcmc <- ar2.bug$sims.list$y.new
> lhnew.mcmc <- apply(ynew.mcmc, 1, diffinv, xi = tail(LH,1))
> lhnew.mcmc <- t(lhnew.mcmc[-1,])
> # derive percentiles
> library("fanplot")
> k0 <- end(LH)[1]
> ynew.pn <- pn(ynew.mcmc, start=k0+1)
> lhnew.pn <- pn(lhnew.mcmc, start=k0+1)
> # plot
> par(mfrow=c(2,1))
> plot(diff(LH), type="n", xlim=k0+c(-50,10), ylab="Differenced Level")
> fan(ynew.pn)
> lines(diff(LH))
> plot(LH, type="n", xlim=k0+c(-50,10), ylab="Level")
> fan(lhnew.pn, cex=0.8)
> lines(LH)


### Stochastic Volatility (SV) Models

The sv.bugs command builds a BUGS script for SV models ready to use in R2OpenBUGS. For example, consider the svpdx data.

> # plot
> plot(svpdx$pdx, type = "l", xaxt = "n", xlab = "Time", ylab = "Return")
> # x-axis
> svpdx$rdate <- format(svpdx$date, format = "%b %Y")
> mth <- unique(svpdx$rdate)
> qtr <- mth[seq(1, length(mth), 3)]
> axis(1, at = match(qtr, svpdx$rdate), labels = qtr, cex.axis = 0.55)
> axis(1, at = match(mth, svpdx$rdate), labels = FALSE, tcl = -0.2)


We can construct an AR(0)-SV model for this data, and also obtain posterior simulations, using the sv.bugs command:

> y <- svpdx$pdx
> sv0 <- sv.bugs(y, sim = TRUE)
> print(sv0)
model{
#likelihood
for(t in 1:945){
  y[t] ~ dnorm(y.mean[t], isigma2[t])
  isigma2[t] <- exp(-h[t])
  h[t] ~ dnorm(h.mean[t], itau2)
}
for(t in 1:945){
  y.mean[t] <- 0
}
for(t in 1:1){
  h.mean[t] <- psi0
}
for(t in 2:945){
  h.mean[t] <- psi0 + psi1*(h[t-1]-psi0)
}
#priors
psi0 ~ dnorm(0,0.001)
psi1.star ~ dunif(0,1)
psi1 <- 2*psi1.star-1
itau2 ~ dgamma(0.01,0.01)
tau <- pow(itau2,-0.5)
#simulation
for(t in 1:945){
  y.mean.c[t] <- cut(y.mean[t])
  isigma2.c[t] <- cut(isigma2[t])
  y.sim[t] ~ dnorm(y.mean.c[t],isigma2.c[t])
}
}

This model closely matches those presented in Meyer and Yu (2002). There are further options in the tsbugs package to incorporate different priors that do not involve transformations, such as those for psi1 above. Using R2OpenBUGS we can fit the model:

> # decent initial value for variance in first period
> init <- inits(sv0, warn = FALSE)
> init$psi0 <- log(var(y))
> # write bug
> writeLines(sv0$bug, "sv0.txt")
> # might take a while to compile
> sv0.bug <- bugs(data = sv0$data,
inits = list(init),
param = c(nodes(sv0, "prior")$name, "y.sim", "h"),
model = "sv0.txt",
n.iter = 11000, n.burnin = 1000, n.chains = 1)

The volatility and time-dependent standard deviation estimates can be easily derived:

> h.mcmc <- sv0.bug$sims.list$h
> sigma.mcmc <- sqrt(exp(h.mcmc))

This allows us to directly view the estimated volatility process or the time-dependent standard deviation using the fanplot package:

> # derive percentile objects
> h.pn <- pn(h.mcmc)
> sigma.pn <- pn(sigma.mcmc)
> # plot
> par(mfrow = c(2, 1), mar = c(2, 4, 2, 1))
> plot(NULL, type = "l", xlim = c(1, 945), xlab = "Time", xaxt = "n",
ylim = range(h.pn[, 5:95]), ylab = "Volatility")
> # add axis
> axis(1, at = match(qtr, svpdx$rdate), labels = qtr, cex.axis = 0.55)
> axis(1, at = match(mth, svpdx$rdate), labels = FALSE, tcl = -0.2)
> # fan
> fan(h.pn, ln = c(1, 10, 50, 90, 99))
> fan.txt(h.pn, pn.l = c(1, 10, 50, 90, 99))
> plot(NULL, type = "l", xlim = c(1, 945), xlab = "Time", xaxt = "n",
ylim = range(sigma.pn[, 1:95]), ylab = "Standard Deviation")
> axis(1, at = match(qtr, svpdx$rdate), labels = qtr, cex.axis = 0.55)
> axis(1, at = match(mth, svpdx$rdate), labels = FALSE, tcl = -0.2)
> fan(sigma.pn, ln = c(1, 10, 50, 90, 99))
> fan.txt(sigma.pn, pn.l = c(1, 10, 50, 90, 99))

We can also plot the posterior simulations from the model:

> # derive percentiles
> y.mcmc <- sv0.bug$sims.list$y.sim
> y.pn <- pn(y.mcmc)
> # plot
> plot(NULL, type = "l", xlim = c(1, 945), xlab = "Time", xaxt = "n",
ylim = range(y.pn[, 5:95]), ylab = "Return")
> axis(1, at = match(qtr, svpdx$rdate), labels = qtr, cex.axis = 0.55)
> axis(1, at = match(mth, svpdx$rdate), labels = FALSE, tcl = -0.2)
> fan(y.pn)
> lines(y)

### Random Variance (RV) Shift Models

The rv.bugs command builds a BUGS script for random variance shift models, similar to that of McCulloch and Tsay (1993), ready to use in R2OpenBUGS. Consider the ew data.

> r <- ts(ew[2:167]/ew[1:166]-1, start = 1841)
> y <- diff(r)
> plot(y, ylab = "Difference in Population Growth Rate")

We can create a BUGS script to fit a RV model to this data, including posterior simulations, using the rv.bugs command:

> rv0 <- rv.bugs(y, sim = TRUE)
> print(rv0)
model{
#likelihood
for(t in 1:165){
  y[t] ~ dnorm(y.mean[t], isigma2[t])
  isigma2[t] <- exp(-h[t])
  h[t] <- 2*lsig[t]
}
for(t in 1:165){
  y.mean[t] <- 0
}
lsig[1] <- -0.5*log(isig02)
for(t in 2:165){
  lsig[t] <- lsig[t-1]+(delta[t]*beta[t])
  delta[t] ~ dbern(epsilon)
  beta[t] ~ dnorm(0,ilambda2)
}
#priors
isig02 ~ dgamma(0.000001,0.000001)
sig0 <- pow(isig02,-0.5)
epsilon ~ dbeta(1, 100)
ilambda2 ~ dgamma(0.01,0.01)
lambda <- pow(ilambda2,-0.5)
#simulation
for(t in 1:165){
  y.mean.c[t] <- cut(y.mean[t])
  isigma2.c[t] <- cut(isigma2[t])
  y.sim[t] ~ dnorm(y.mean.c[t],isigma2.c[t])
}
}

and then run the script in R2OpenBUGS:

> # decent initial value for variance in first period
> init <- inits(rv0, warn = FALSE)
> init$isig02 <- sd(y)^-2
> # write bug
> writeLines(rv0$bug, "rv0.txt")
> # might take a while to compile
> rv0.bug <- bugs(data = rv0$data,
inits = list(init),
param = c(nodes(rv0, "prior")$name, "y.sim", "h", "delta", "beta"),
model = "rv0.txt",
n.iter = 11000, n.burnin = 1000, n.chains = 1)

We can plot the posterior simulations from the model using the fanplot package:

> # derive percentiles
> y0 <- tsp(y)[1]
> y.mcmc <- rv0.bug$sims.list$y.sim
> y.pn <- pn(y.mcmc, start = y0)
> # plot
> plot(NULL, type = "n", xlim = tsp(y.pn)[1:2], ylim = range(y.pn[, 5:95]),
ylab = "Difference in Population Growth Rate", xlab = "Time")
> fan(y.pn, txt = NA)
> fan.txt(y.pn, pn.r = seq(10, 90, 40))
> fan.txt(y.pn, pn.l = seq(10, 90, 10))
> lines(y)

Alongside the posterior distributions of the volatility and standard deviations:

> # derive percentiles
> h.mcmc <- rv0.bug$sims.list$h
> h.pn <- pn(h.mcmc, start = y0)
> sigma.pn <- pn(sims = sqrt(exp(h.mcmc)), start = y0)
> # plots
> par(mfrow = c(2, 1))
> plot(NULL, type = "n", xlim = tsp(h.pn)[1:2], ylim = range(h.pn[, 5:95]),
ylab = "Volatility", xlab = "Time")
> fan(h.pn, ln = c(1, 10, 50, 90, 99))
> fan.txt(h.pn, pn.l = c(1, 10, 50, 90, 99))
> plot(NULL, type = "n", xlim = tsp(sigma.pn)[1:2], ylim = range(sigma.pn[, 1:95]),
ylab = "Standard Deviation", xlab = "Time")
> fan(sigma.pn, ln = c(1, 10, 50, 90, 99))
> fan.txt(sigma.pn, pn.l = c(1, 50, 99))

The posterior distributions of the multiplier effect of the shift in variance (beta[t] in the BUGS model) can also be plotted. Note, when there is no variance shift, the posterior of beta[t] is similar to the prior distribution.

> # derive percentiles
> beta.mcmc <- rv0.bug$sims.list$beta
> beta.pn <- pn(beta.mcmc, start = y0)
> # plots
> plot(NULL, type = "n", xlim = tsp(beta.pn)[1:2], ylim = range(beta.pn),
ylab = "Variance Multiplier", xlab = "Time")
> fan(beta.pn)
> fan.txt(beta.pn, pn.l = seq(10, 90, 10))

### Further Work

I hope to develop the package in the future with a few more functions to produce some MA and GARCH models. I am also thinking about ways to incorporate non-normal time series data.

### References

Lunn, D. J., A. Thomas, N. Best, and D. Spiegelhalter (2000). WinBUGS – A Bayesian modelling framework: Concepts, structure, and extensibility. Statistics and Computing 10 (4), 325–337.

Lunn, D., et al. (2009). The BUGS project: Evolution, critique and future directions. Statistics in Medicine 28 (25), 3049–3067.

Meyer, R. and J. Yu (2002). BUGS for a Bayesian analysis of stochastic volatility models. Econometrics Journal 3 (2), 198–215.

McCulloch, R. E. and R. S. Tsay (1993). Bayesian Inference and Prediction for Mean and Variance Shifts in Autoregressive Time Series. Journal of the American Statistical Association 88 (423), 968–978.

Sturtz, S., U. Ligges, and A. Gelman (2005). R2WinBUGS: A package for running WinBUGS from R. Journal of Statistical Software 12 (3).

## Does specification matter? Experiments with simple multiregional probabilistic population projections

A paper that I am a co-author on, looking at uncertainty in population forecasting generated by different measures of migration, came out this week in Environment and Planning A. Basically, try and avoid using net migration measures. Not only do they tend to give some dodgy forecasts, they also give you more uncertainty. Using in and out measures of migration in a projection model gives a big improvement over a net measure. They are also a fairly good approximation to a full multiregional projection model. Plots in the paper were done by my good self using the fanplot package.

James Raymer, Guy J. Abel and Andrei Rogers
Environment and Planning A 44(11), 2664–2686

Abstract

Population projection models that introduce uncertainty are a growing subset of projection models in general. In this paper we focus on the importance of decisions made with regard to the model specifications adopted.
We compare the forecasts and prediction intervals associated with four simple regional population projection models: an overall growth rate model, a component model with net migration, a component model with in-migration and out-migration rates, and a multiregional model with destination-specific out-migration rates. Vector autoregressive models are used to forecast future rates of growth, birth, death, net migration, in-migration and out-migration, and destination-specific out-migration for the North, Midlands, and South regions in England. They are also used to forecast different international migration measures. The base data represent a time series of annual data provided by the Office for National Statistics from 1976 to 2008. The results illustrate how both the forecasted subpopulation totals and the corresponding prediction intervals differ for the multiregional model in comparison to other simpler models, as well as for different assumptions about international migration. The paper ends with a discussion of our results and possible directions for future research.

## The fanplot package for R

My fanplot package has gone up on CRAN. Here is an online version of the vignette.

### Introduction

The fanplot package contains a collection of R (R Development Core Team, 2012) functions to effectively display plots of sequential distributions, such as probabilistic forecasts. The plotting of distributions is based around two functions. The first, pn, calculates the percentiles for a set of sequential distributions over a specified time period. The second, fan, plots the calculated percentiles of the sequential distributions. The resulting plot is a set of coloured polygons, with shadings corresponding to the percentile values. This document illustrates these two core functions using MCMC simulation results from fitted stochastic volatility models.
These MCMC simulations can be recreated from the data and BUGS model also contained in the fanplot package, via the R2OpenBUGS package of Sturtz et al. (2005). Plots are first shown for a data frame type object with no time series attributes, followed by plots based on a ts object.

### Volatility Plots

To illustrate the basics of the fanplot package, consider the svpdx data contained in the tsbugs (Abel, 2013) package. This contains information on the log return of the Pound-Dollar exchange rate from 2nd October 1981 to 28th June 1985. For a view of the first part of the data, use the head function after loading the tsbugs package.

> library("tsbugs")
> head(svpdx)
        date        pdx
1 1981-10-02 -0.3555316
2 1981-10-05  1.4254090
3 1981-10-06 -0.4439399
4 1981-10-07  1.0256500
5 1981-10-08  1.6775790
6 1981-10-09  0.3690041

This data is a data frame object, and is difficult to express as a standard ts object due to the irregular nature of the observations. It is best plotted initially without an x-axis, which can then be added later using the axis function:

> # plot
> plot(svpdx$pdx, type = "l", xaxt = "n", xlab = "Time", ylab = "Return")
> #x-axis
> svpdx$rdate <- format(svpdx$date, format = "%b %Y")
> mth <- unique(svpdx$rdate)
> qtr <- mth[seq(1, length(mth), 3)]
> axis(1, at = match(qtr, svpdx$rdate), labels = qtr, cex.axis = 0.55)
> axis(1, at = match(mth, svpdx$rdate), labels = FALSE, tcl = -0.2)

If the xaxt argument was not used in the plot function and the labels not added later, the x-axis would be based on the row number of each observation in the svpdx data, i.e. an index sequence from 1 to 945. To produce the x-axis with date information in the above code, a new column is added to svpdx for the month-year combination of each observation. Objects mth and qtr are then created to mark each month and quarter in the data series respectively. Major axis ticks are then plotted on the unseen 1 to 945 index for the beginning of every quarter, with a corresponding label, whilst minor axis ticks are also plotted for the beginning of every month.

Meyer and Yu (2002) used the above Pound-Dollar exchange rate data to fit various stochastic volatility models in WinBUGS (Lunn et al., 2000). One such stochastic model they fitted to the data is contained in my1.txt in the model directory of the fanplot package. The model of Meyer and Yu (2002) can be refitted in BUGS via R, using the R2OpenBUGS package (Sturtz et al., 2005).

> library("R2OpenBUGS")
> # write model file:
> my1.bug <- dget(system.file("model", "my1.txt", package = "fanplot"))
> write.model(my1.bug, "my1.txt")
> # take a look:
> file.show("my1.txt")
> # run openbugs
> my1 <- bugs(data = list(n = length(svpdx$pdx), y = svpdx$pdx),
inits = list(list(phistar = 0.975, mu = 0, itau2 = 50)),
param = c("mu", "phi", "tau", "theta"),
model = "my1.txt",
n.iter = 11000, n.burnin = 1000, n.chains = 1)

Here, the same initial parameter values as Meyer and Yu (2002) are set. One chain of the MCMC simulation is run for 11000 iterations, with the first 1000 used for burn-in. The resulting bugs object contains the MCMC simulation results for the parameters in the stochastic volatility model, including the time-dependent volatility parameters ($\theta_t$). This set of sequential posterior distributions is of interest when studying the variation in the data over time.
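The basic mechanics behind plotting a set of sequential distributions can be imitated in base R. The sketch below is a simplified illustration with simulated data, not the fanplot code: percentiles of each column of an MCMC-style matrix are computed with apply and quantile, then the regions between symmetric percentile pairs are shaded with polygon.

```r
set.seed(1)
# fake sequential distributions: 1000 draws at each of 50 time points
sims <- sapply(1:50, function(t) rnorm(1000, mean = sin(t / 8), sd = 0.3))

p  <- seq(0.05, 0.95, 0.05)                 # percentiles to plot
qs <- apply(sims, 2, quantile, probs = p)   # percentiles (rows) x time (cols)

plot(NULL, xlim = c(1, 50), ylim = range(qs), xlab = "Time", ylab = "y")
shades <- heat.colors(floor(length(p) / 2))
for (i in seq_along(shades)) {
  # shade between the i-th lowest and i-th highest percentile
  polygon(c(1:50, 50:1), c(qs[i, ], rev(qs[nrow(qs) + 1 - i, ])),
          col = shades[i], border = NA)
}
lines(1:50, qs[10, ])                       # 50% row: the median
```

The fanplot functions wrap this pattern up with sensible defaults for percentile choice, colour scales, contour lines and labels.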
The fanplot package can effectively display the entire posterior distribution of $\theta_t$. The MCMC simulations of the volatility parameters can be obtained from the my1 object:

> th.mcmc <- my1$sims.list$theta

A plot of the entire posterior distribution of $\theta_t$ first requires a calculation of the percentiles over all $t$ using the pn function:

> library("fanplot")
> th.pn <- pn(sims = th.mcmc)

This produces a pn type object, where rows represent the time index and columns the percentiles calculated.

> head(th.pn[, c(1:3, 97:99)])
          1%        2%      3%     97%        98%       99%
[1,] -1.05400 -0.999006 -0.9811 -0.2069 -0.1615000 -0.091476
[2,] -1.00907 -0.947200 -0.9127 -0.2230 -0.1829000 -0.116800
[3,] -1.01600 -0.945000 -0.9030 -0.1869 -0.1776000 -0.151200
[4,] -1.02800 -0.932900 -0.8911 -0.1714 -0.1466000 -0.082590
[5,] -1.00600 -0.917500 -0.8740 -0.1367 -0.0973464 -0.047270
[6,] -1.05101 -0.935700 -0.8998 -0.1297 -0.0543700  0.029130

Every percentile between the 1st and 99th is calculated by default; however, more or fewer percentiles can be calculated via the p argument in the pn function. The number of percentiles calculated has a direct impact on the plotting of the sequential distributions, as we shall see later. Additional arguments to control the indexing of the rows, of use when the time series are from regular intervals, will also be discussed later.

In order to plot the th.pn percentile object, the plot area must first be set up. This can be done using the type = "n" argument in plot. Both the xlim and ylim arguments need to be set appropriately. In the code below, the xlim argument is set between 1 and 945, the length of th.pn. The ylim argument is set to the range of th.pn to enable all calculated percentiles to be included in the plot area. Once the plot area is set up, the fan function can be used to add the th.pn object.
Each percentile in the plot is represented by a different shade of the default colour scale in the fan.col argument of fan. In addition, contour lines are drawn on every decile, with labels for these deciles added to the right-hand side.

> # empty plot
> plot(NULL, type = "n", xlim = c(1, 945), ylim = range(th.pn), ylab = "Theta")
> # add fan
> fan(th.pn)

The fan.txt function can add more labels to a set of sequential distributions plotted using fan. When used in addition to the fan function, labels for closely spaced deciles can be controlled to allow a more spacious display of labels in comparison to those shown in the above plot. It can also be used to add text labels to percentiles that are not a multiple of 10, such as the 1st or 99th. The code below demonstrates these features. First, an empty plot area is created with the x-axis changed to dates, using the same method as in the plot of the original data. Second, the percentiles of the sequential distributions are plotted for $\theta_t$ with no text labels (setting txt = NA) and contour lines for the 1st and 99th percentiles alongside some selected deciles. The ln argument in the fan function is set to include contour lines at percentiles where future text labels are to be added.

> # empty plot with x-axis added later
> plot(NULL, type = "l", xlim = c(1, 945), xlab = "Time", xaxt = "n",
ylim = range(th.pn), ylab = "Theta")
> axis(1, at = match(qtr, svpdx$rdate), labels = qtr, cex.axis = 0.55)
> axis(1, at = match(mth, svpdx$rdate), labels = FALSE, tcl = -0.2)
> # add fan
> fan(th.pn, txt = NA, ln = c(1, 10, 30, 50, 70, 90, 99))
> # add text labels for percentiles
> fan.txt(th.pn, pn.r = c(1, 10, 50, 90, 99))
> fan.txt(th.pn, pn.l = c(1, 30, 70, 99))

The colours of the percentiles in the sequential distributions can easily be altered from the default heat.colors scheme. A new set of graded colours can be passed to the fan function using the fan.col argument. The number of colours should be half the number of percentiles (columns) in the pn object. New graded colour schemes can be constructed in a number of ways. For example, using colorRampPalette, a new shading from blue to white, via grey, can be created:

> pal <- colorRampPalette(c("royalblue", "grey", "white"))

Using this palette, 50 colours (approximately half the number of percentiles calculated in pn) can be defined:

> fancol <- pal(50)

For a change, this new colour scheme is used to plot the posterior distribution of the standard deviation over time, $\sigma_t$, which is derived as:

> sigma.pn <- pn(sims = sqrt(exp(th.mcmc)))

The new colour scheme can then be passed to the fan function for sigma.pn, with contour lines on selected percentiles:

> # empty plot with x-axis added later
> plot(NULL, type = "l", xlim = c(1, 945), xlab = "Time", xaxt = "n", ylim = range(sigma.pn), ylab = "Standard Deviation")
> axis(1, at = match(qtr, svpdx$rdate), labels = qtr, cex.axis = 0.55)
> axis(1, at = match(mth, svpdx$rdate), labels = FALSE, tcl = -0.2)
> # add fan
> fan(sigma.pn, fan.col = fancol, ln = c(1, 10, 50, 90, 99))

### Model Fits

To illustrate plots in the fanplot package that use time series objects (ts) based on regularly spaced data, a stochastic volatility model is fitted to the change in the population growth rate of England and Wales, similar to that in Abel et al. (2010). Population data from 1841 to 2007 from the Human Mortality Database (2012) are included in the tsbugs package (as the object ew). The growth rate, on which the stochastic volatility model is based, can be derived following Rogers (1995) as:

> r <- ts(ew[2:167]/ew[1:166] - 1, start = 1841)

Mean stationarity can be obtained by differencing the series:

> y <- diff(r)

Using this differenced growth rate time series, we can build a BUGS stochastic volatility model using the tsbugs package:

> pop.bug <- sv.bugs(y, k = 25, sim = TRUE, sv.mean.prior2 = "dgamma(0.000001,0.000001)", sv.ar.prior2 = "dunif(-0.999, 0.999)")

The sv.bugs function specifies 25 future values to be forecast and simulations of the model to be taken. Alternative prior distributions for the parameters of the volatility process are also specified. This BUGS model can be run using R2OpenBUGS:

> library("R2OpenBUGS")
> # write model file:
> writeLines(pop.bug$bug, "pop.txt")
> # take a look:
> file.show("pop.txt")
> # run openbugs
> pop <- bugs(data = pop.bug$data, inits = list(list(psi0.star = exp(12), psi1 = 0.5, itau2 = 0.5)), param = c("psi0", "psi1", "tau", "y.new", "y.sim"), model = "pop.txt", n.iter = 11000, n.burnin = 1000, n.chains = 1)

As was the case with the exchange rate data, one chain of the MCMC simulation is run for 11000 iterations, with the first 1000 used for burn-in. The resulting bugs object contains the MCMC simulation results for the parameters of the stochastic volatility model, including the time dependent model fits ($E(y_t)$), forecasts of the future population growth rate ($\hat{r}_{t+h|t}$) and population ($\hat{p}_{t+h|t}$). The fanplot package can efficiently display these sequential distributions in a number of ways. Consider the MCMC simulations of the model fit, which can be extracted from pop using:

> y.mcmc <- pop$sims.list$y.sim

A plot of the entire posterior distribution of the model fits requires the calculation of percentiles over all $t$ using the pn function:

> y0 <- tsp(y)[1]
> y.pn <- pn(sims = y.mcmc, start = y0)

Here, the corresponding start date is also given, so that the pn object takes on the relevant time series properties when plotted. This saves the effort required for the non-regularly spaced exchange rate data in setting up date labels on the x-axis. The time series properties are stored in the tsp attribute of the new y.pn object.

> str(y.pn)
 pn [1:165, 1:99] -0.00357 -0.00367 -0.00405 -0.00467 -0.00588 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:99] "1%" "2%" "3%" "4%" ...
 - attr(*, "tsp")= num [1:3] 1843 2007 1

These attributes can be used directly to set up the x-axis limits of the plot area, alongside y-axis limits based on the differenced growth rate, which was the basis of the stochastic volatility model. The fan function can then be used to add the sequential posterior distributions, with contour lines at the 1st, 10th, 90th and 99th percentiles using the ln argument. Note, the fan function will automatically draw and label only the contour lines given in the ln argument. If the user wishes to suppress the labels and add text labels later, they can do so by setting txt = NA, as demonstrated earlier. The original data can also be plotted on top of the posterior distributions using the lines function:

> # empty plot
> plot(NULL, type = "l", xlim = range(time(y.pn)), xlab = "Time", ylim = range(y), ylab = "Expected Model Fit")
> # add fan
> fan(y.pn, ln = c(1, 10, 90, 99))
> # add data
> lines(diff(r), lwd = 2)

A coarser set of colours can be plotted by creating a pn object with fewer percentiles. This is done by setting the p argument of the pn function to only the percentiles at which colour changes are desired:

> y.pn2 <- pn(sims = y.mcmc, p = c(1, 20, 40, 60, 80, 99), start = y0)

Note that elements of p will ultimately be adjusted to be symmetric around 50. For example, if a user set p = c(1, 40, 80), a pn object identical to the one above would be returned. This allows the user, if desired, to define percentiles only above or only below 50. The new y.pn2 object can be plotted using the default arguments for contour lines and text labels. Lines are never drawn for percentiles not calculated in the pn object. As a result, in the plot of y.pn2 there are no contour lines at the 10th, 30th, 50th, 70th and 90th deciles.
> # empty plot
> plot(NULL, type = "l", xlim = range(time(y.pn)), xlab = "Time", ylim = range(diff(r)), ylab = "Expected Model Fit")
> # add fan
> fan(y.pn2)
> # add data
> lines(diff(r), lwd = 2)

### Forecast Fans

To illustrate the plotting of forecast fans, we use the MCMC predictive distributions in the pop object. These are used to derive posterior predictive distributions of the population growth rate and population total using the diffinv function:

> ynew.mcmc <- pop$sims.list$y.new
> rnew.mcmc <- apply(ynew.mcmc, 1, diffinv, xi = tail(r,1))
> rnew.mcmc <- t(rnew.mcmc[-1,])
>
> pnew.mcmc <- apply(1+rnew.mcmc, 1, cumprod) * tail(ew,1)
> pnew.mcmc <- t(pnew.mcmc)
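
The logic of this back-transformation can be checked on a toy example: diffinv undoes the differencing, anchored at the last observed growth rate, and compounding 1 + r from the last observed population total recovers the counts. The values below are made up for illustration only, not model output:

```r
# Toy check of the back-transformation used above (hypothetical values)
y.new <- c(0.001, -0.002, 0.0005)         # made-up forecast differences
r.last <- 0.004                           # made-up last observed growth rate
r.new <- diffinv(y.new, xi = r.last)[-1]  # undo differencing, drop the anchor
r.new
# 0.0050 0.0030 0.0035
p.last <- 54e+06                          # made-up last observed population
p.new <- p.last * cumprod(1 + r.new)      # compound growth into totals
```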


Percentiles for rnew.mcmc can be derived as

> r0 <- tsp(r)[2]
> rnew.pn <- pn(sims = rnew.mcmc, start = r0 + 1)
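
As k = 25 future values were requested in sv.bugs, the resulting percentile object should hold one row per forecast step, with its tsp attribute running from r0 + 1. A quick sketch of this check, assuming the objects above are in the workspace:

```r
# Sanity check on the forecast percentile object (assumes rnew.pn as above)
nrow(rnew.pn)  # should be 25, one row per forecast step
tsp(rnew.pn)   # should run from r0 + 1 at annual frequency
```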


Note that, as rnew.mcmc is a simulation of forecasts, the start argument is set to the year after the last observation of the population growth rate. Forecast fans are plotted after estimating the percentile objects, much in the same way as for the volatility and model fits in the previous sections. However, some extra care must be taken in setting up xlim to allow space for the fan on the right hand side of the plotting area. For the population growth rate this is demonstrated alongside a small section of the underlying simulation data (the first 30 MCMC samples of the predictive distribution), which represent a small part of the data used to calculate the percentiles.

> par(mfrow = c(1 ,2))
> #sample of underlying simulation data
> plot(r, ylim = range(r), xlim = c(1940, 2040), lwd = 2, ylab = "Population Growth Rate")
> for (i in 1:30) lines(ts(rnew.mcmc[i, ], r0 + 1), col = "grey")
> #plot r
> plot(r, ylim = range(r), xlim = c(1940, 2040), lwd = 2, ylab = "Population Growth Rate")
> fan(rnew.pn)


The anchor argument in pn can be used to bridge the gap between the predictive distribution fan and the final data point. The associated start point for creating a time series type object for plotting must also be adjusted to account for the anchoring. For pnew.mcmc, the two alternative sets of percentile calculations, with or without anchoring, can be derived as:

> p0 <- tsp(ew)[2]
> pnew.pn <- pn(sims = pnew.mcmc/1e+06, start = p0 + 1)
> pnew.pn2 <- pn(sims = pnew.mcmc/1e+06, p = c(1, 10, 40, 50), anchor = tail(ew,1)/1e+06, start = p0)


For pnew.pn2 only a few percentiles are calculated, which provides a coarser set of shades in the plot of the predictive distribution. In addition, the anchor is set to the last observed population count, hence the start point is now the last observation, p0, not p0 + 1. In both calculations the simulations are divided by one million for easier interpretation.
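
As noted earlier, pn completes any supplied percentiles to be symmetric around 50. For the p used in pnew.pn2, this behaviour can be sketched as follows (a mirror of the described behaviour, not the actual internals of pn):

```r
# Sketch of the symmetric completion of percentiles around 50,
# mirroring the behaviour described for the pn function
p <- c(1, 10, 40, 50)
p.sym <- sort(unique(c(p, 100 - p)))
p.sym
# 1 10 40 50 60 90 99
```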

Both pn objects can be plotted side by side using the fan function. Contour line colours can be altered directly using the ln.col argument, as in the display of pnew.pn2 on the right hand side below, in comparison to the default plot on the left hand side.

> par(mfrow = c(1 ,2))
> #plot ew
> plot(ew/1e+06, ylim = c(40, 80), xlim = c(1940, 2040), lwd = 2, ylab = "Population (m)")
> fan(pnew.pn)
> #plot ew
> plot(ew/1e+06, ylim = c(40, 80), xlim = c(1940, 2040), lwd = 2, ylab = "Population (m)")
> fan(pnew.pn2, ln.col = "black")


### References

Abel, G. J. (2013). tsbugs: Create time series BUGS models. Retrieved 26 February 2013, from http://cran.r-project.org/web/packages/tsbugs.
Abel, G. J., J. Bijak, and J. Raymer (2010). A comparison of official population projections with Bayesian time series forecasts for England and Wales. Population Trends, 95–114.
Human Mortality Database (2012). Available at http://www.mortality.org. University of California, Berkeley (USA) and Max Planck Institute for Demographic Research (Germany).
Lunn, D. J., A. Thomas, N. Best, and D. Spiegelhalter (2000, October). WinBUGS – A Bayesian modelling framework: Concepts, structure, and extensibility. Statistics and Computing 10 (4), 325–337.
Meyer, R. and J. Yu (2000). BUGS for a Bayesian analysis of stochastic volatility models. Econometrics Journal 3 (2).
R Development Core Team (2012). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
Rogers, A. (1995). Multiregional Demography: Principles, Methods and Extensions (1 ed.). John Wiley & Sons.
Sturtz, S., U. Ligges, and A. Gelman (2005). R2WinBUGS: a package for running WinBUGS from R. Journal of Statistical Software 12 (3).
