Circular Migration Flow Plots in R

An article of mine was published in Science today. It introduces estimates of bilateral global migration flows between all countries. The underlying methodology is based on the conditional maximisation routine in my Demographic Research paper. However, I tweaked the demographic accounting so that the net migration in the estimated flow tables closely matches the net migration figures from the United Nations.

My co-author, Nikola Sander, developed some circular plots for the paper based on circos in perl. A couple of months back, after the paper was already in the submission process, I figured out how to replicate these plots in R using the circlize package. Zuguang Gu, the circlize package developer, was very helpful, responding quickly (and with examples) to my emails.

To demonstrate, I have put two demo files in my migest R package. For the estimates of flows between regions, users can hopefully replicate the plots (so long as the circlize and plyr packages are installed) using:

library("migest")
demo(cfplot_reg, package = "migest", ask = FALSE)

It should result in the following plot:
[Figure: cfplot_reg, circular plot of migration flows between world regions]
The basic idea of the plot is to show simultaneously the relative size of estimated flows between regions. The origins and destinations of migrants are represented by the circle’s segments, where nearby regions are positioned close to each other. The size of the estimated flow is indicated by the width of the link at its bases and can be read using the tick marks (in millions) on the outside of the circle’s segments. The direction of the flow is encoded both by the origin colour and by the gap between link and circle segment at the destination.

You can save the PDF version of the plot (which looks much better than what comes up in my R graphics device) using:

dev.copy2pdf(file = "cfplot_reg.pdf", height=10, width=10)

In Section 5 of our Vienna Institute of Demography Working Paper I describe, step by step, the R code in the demo files. A similar demo, with slight alterations to the labelling, is also available for a plot of the largest country-to-country flows:

demo(cfplot_nat, package = "migest", ask = FALSE)

[Figure: cfplot_nat, circular plot of the largest country-to-country migration flows]

If you are interested in the estimates, you can explore them fully on the interactive website (made using d3.js) at http://global-migration.info/. There is also a link on the website to download all the data. Ramon Bauer has a nice blog post explaining the d3 version.

Publication Details:

Abel, G.J. and Sander, N. (2014). Quantifying Global International Migration Flows. Science. 343 (6178), 1520–1522.

Widely available data on the number of people living outside of their country of birth do not adequately capture contemporary intensities and patterns of global migration flows. We present data on bilateral flows between 196 countries from 1990 through 2010 that provide a comprehensive view of international migration flows. Our data suggest a stable intensity of global 5-year migration flows at ~0.6% of world population since 1995. In addition, the results aid the interpretation of trends and patterns of migration flows to and from individual countries by placing them in a regional or global context. We estimate the largest movements to occur between South and West Asia, from Latin to North America, and within Africa.

Posted in International Migration Estimation, migest, R, Research

Forecasting Environmental Immigration to the UK

A couple of months ago, a paper I worked on with co-authors from the Centre of Population Change was published in Population and Environment. It summarised work we did as part of the UK Government Office for Science Foresight project on Migration and Global Environmental Change. Our aim was to build expert based forecasts of environmental immigrants to the UK. We conducted a Delphi survey of nearly 30 migration experts from academia, the civil service and non-governmental organisations to obtain estimates on the future levels of immigration to the UK in 2030 and 2060 with uncertainty. We also asked them what proportion of current and future immigration are/will be environmental migrants. The results were incorporated into a set of model averaged Bayesian time series models through prior distributions on the mean and variance terms.

The plots in the journal article got somewhat butchered during the publication process. Below is the non-butchered version of the forecast of future immigration to the UK, alongside past immigration data from the Office for National Statistics.
[Figure: imm2, fan chart of past and forecast total immigration to the UK]
At first, I was a bit taken aback by this plot. A few experts thought there were going to be some very high levels of future immigration, which causes the rather striking large upper tail. However, at second glance, the central percentiles show a gentle decrease, where there is only (approximately) a 30% chance of an increase in future migration from the 2010 level throughout the forecast period.

The expert-based forecast for total immigration was combined with the responses to questions on the proportion of environmental migrants, to obtain an estimate of both the current level of environmental migration (which is not currently measured) and future levels:
[Figure: env4, fan chart of estimated current and future environmental immigration to the UK]
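The combination step amounts to multiplying simulated totals by a simulated proportion. A toy sketch with hypothetical distributions (chosen only for illustration, not the paper's fitted Bayesian model):

```r
# Toy sketch: combine simulated total immigration with an uncertain
# environmental proportion (hypothetical distributions, not the
# paper's model averaged time series models).
set.seed(1)
total <- rlnorm(10000, meanlog = log(500), sdlog = 0.3)  # total immigrants (thousands)
prop  <- rbeta(10000, 2, 48)                             # environmental share, ~4% on average
env   <- total * prop                                    # environmental immigrants
quantile(env, c(0.1, 0.5, 0.9))                          # percentiles for a fan
```

The uncertainty in the product reflects both sources, which is why the environmental fans are wider, relatively, than the total immigration fans.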

As is the way with these things, we came across some problems in our project. The first was with the definition of an environmental migrant, which is not completely nailed down in the migration literature. As a result, part of the uncertainty in the expert-based forecasts reflects not only the future level but also the measure itself. The second was with the elicitation of uncertainty. We used a Likert-type scale, which caused some difficulties even during the later rounds of the Delphi survey. If I were to do it over, I reckon this problem could be much better addressed by getting experts to visualise their forecast fans in an interactive website, perhaps by creating a shiny app with the fanplot package. Such an approach would result in smoother fans than those in the plots above, which were based on interpolations from expert answers at only two points of time in the future (2030 and 2060).

Publication Details:

Abel, G.J., Bijak, J., Findlay, A.M., McCollum, D. and Wiśniowski, A. (2013). Forecasting environmental migration to the United Kingdom: An exploration using Bayesian models. Population and Environment. 35 (2), 183–203

Over the next 50 years, the potential impact of environmental change on human livelihoods could be considerable, with one possible consequence being increased levels of human mobility. This paper explores how uncertainty about the level of immigration to the United Kingdom as a consequence of environmental factors elsewhere may be forecast using a methodology involving Bayesian models. The conceptual understanding of forecasting is advanced in three ways. First, the analysis is believed to be the first time that the Bayesian modelling approach has been attempted in relation to environmental mobility. Second, the paper considers the expediency of this approach by comparing the responses to a Delphi survey with conventional expectations about environmental mobility in the research literature. Finally, the values and assumptions of the expert evidence provided in the Delphi survey are interrogated to illustrate the limited set of conditions under which forecasts of environmental mobility, as set out in this paper, are likely to hold.

Posted in BUGS, International Migration Estimation, Population Forecasting, Research

Global Bilateral International Migration Flows

A few months ago, Demographic Research published my paper on estimating global migration flow tables. In the paper I developed a method to estimate international migrant flows, for which there is limited comparable data, to match changes in migrant stock data, which are more widely available. The result was bilateral tables of estimated international migrant transitions between 191 countries for four decades, which I believe are a first of their kind. The estimates are available in an Excel spreadsheet as an additional file on the journal website. The abstract and citation details are at the bottom of this post.

My migest R package contains the ffs function for the flows-from-stock method used in the paper. To demonstrate, consider two hypothetical migrant stock tables I use in the paper, where rows represent place of birth and columns represent place of residence. The first stock table represents the distributions of migrant stocks at the start of the period. The second represents the distributions at the end of the period.

> # create P1 and P2 stock tables
> dn <- LETTERS[1:4]
> P1 <- matrix(c(1000, 100, 10, 0,
+                55, 555, 50, 5, 
+                80, 40, 800, 40, 
+                20, 25, 20, 200), 
+              nrow=4, ncol=4, byrow = TRUE,
+              dimnames = list(pob = dn, por = dn))
> P2 <- matrix(c(950, 100, 60, 0, 
+                80, 505, 75, 5, 
+                90, 30, 800, 40,
+                40, 45, 0, 180),
+              nrow=4, ncol=4, byrow = TRUE,
+              dimnames = list(pob = dn, por = dn))
> # display with row and col totals
> addmargins(P1)
     por
pob      A   B   C   D  Sum
  A   1000 100  10   0 1110
  B     55 555  50   5  665
  C     80  40 800  40  960
  D     20  25  20 200  265
  Sum 1155 720 880 245 3000
> addmargins(P2)
     por
pob      A   B   C   D  Sum
  A    950 100  60   0 1110
  B     80 505  75   5  665
  C     90  30 800  40  960
  D     40  45   0 180  265
  Sum 1160 680 935 225 3000

When estimating flows from stock data, a good demographer should worry about births and deaths over the period, as these can have substantial impacts on changes in populations over time. In the hypothetical example above, I set births and deaths to zero (implied by the equal row totals, the sums of populations by place of birth) in each stock table. In any case, I need to create some vectors to pass this information to the ffs function.
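As a quick check, the zero-birth, zero-death assumption can be verified directly from the stock tables: the place-of-birth totals (row sums) are identical in both (the tables are repeated here so the snippet stands alone):

```r
# Recreate the two stock tables and confirm the place-of-birth
# totals (row sums) are unchanged, implying zero births and deaths.
dn <- LETTERS[1:4]
P1 <- matrix(c(1000, 100, 10, 0, 55, 555, 50, 5,
               80, 40, 800, 40, 20, 25, 20, 200),
             nrow = 4, byrow = TRUE, dimnames = list(pob = dn, por = dn))
P2 <- matrix(c(950, 100, 60, 0, 80, 505, 75, 5,
               90, 30, 800, 40, 40, 45, 0, 180),
             nrow = 4, byrow = TRUE, dimnames = list(pob = dn, por = dn))
all(rowSums(P1) == rowSums(P2))  # TRUE
```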

> # no births and deaths
> b <- rep(0, 4)
> d <- rep(0, 4)

We can then pass the stock tables, births and deaths to the ffs function to estimate flows by birth place, contained in the mu element of the returned list.

> # run flow from stock estimation
> library("migest")
> y <- ffs(P1=P1, P2=P2, d=d, b=b)
1 46 
2 0 
> # display with row, col and table totals
> addmargins(y$mu)
, , pob = A

     dest
orig    A   B  C D  Sum
  A   950   0 50 0 1000
  B     0 100  0 0  100
  C     0   0 10 0   10
  D     0   0  0 0    0
  Sum 950 100 60 0 1110

, , pob = B

     dest
orig   A   B  C D Sum
  A   55   0  0 0  55
  B   25 505 25 0 555
  C    0   0 50 0  50
  D    0   0  0 5   5
  Sum 80 505 75 5 665

, , pob = C

     dest
orig   A  B   C  D Sum
  A   80  0   0  0  80
  B   10 30   0  0  40
  C    0  0 800  0 800
  D    0  0   0 40  40
  Sum 90 30 800 40 960

, , pob = D

     dest
orig   A  B C   D Sum
  A   20  0 0   0  20
  B    0 25 0   0  25
  C   10 10 0   0  20
  D   10 10 0 180 200
  Sum 40 45 0 180 265

, , pob = Sum

     dest
orig     A   B   C   D  Sum
  A   1105   0  50   0 1155
  B     35 660  25   0  720
  C     10  10 860   0  880
  D     10  10   0 225  245
  Sum 1160 680 935 225 3000

The fm function returns the flow matrix aggregated over the place of birth dimension in the mu array.

> # display aggregate flows
> f <- fm(y$mu)
> addmargins(f)
     dest
orig   A  B  C D Sum
  A    0  0 50 0  50
  B   35  0 25 0  60
  C   10 10  0 0  20
  D   10 10  0 0  20
  Sum 55 20 75 0 150

…and there you have it: an estimated flow matrix that matches the changes in the stock tables whilst controlling for births and deaths. In the paper I run the code on real migrant stock data provided by the World Bank to estimate global migrant flow tables.

The ffs function has some different methods to control for deaths in the estimation procedure. The estimation is based on a three-way iterative proportional fitting scheme to estimate parameters in a log-linear model, not too dissimilar to that used in a paper based on my Southampton M.Sc. dissertation.
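For intuition, the iterative proportional fitting idea can be sketched in two dimensions with base R. This is a generic IPF routine for row and column margins only, not the three-way scheme that ffs actually implements:

```r
# Minimal two-way IPF sketch: rescale a seed matrix until its
# margins match target row and column totals.
ipf2 <- function(seed, rtot, ctot, tol = 1e-10, maxit = 1000) {
  m <- seed
  for (it in 1:maxit) {
    m <- m * rtot / rowSums(m)                 # scale rows to row totals
    m <- sweep(m, 2, ctot / colSums(m), "*")   # scale columns to column totals
    if (max(abs(rowSums(m) - rtot)) < tol) break
  }
  m
}
m <- ipf2(matrix(1, 2, 2), rtot = c(30, 70), ctot = c(40, 60))
rowSums(m)  # 30 70
colSums(m)  # 40 60
```

The fitted cells preserve the interaction structure of the seed whilst matching the margins, which is the log-linear model interpretation of IPF.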

Publication Details:

Abel, G. J. (2013). Estimating global migration flow tables using place of birth data. Demographic Research, 28, 505–546. doi:10.4054/DemRes.2013.28.18

International migration flow data often lack adequate measurements of volume, direction and completeness. These pitfalls limit empirical comparative studies of migration and cross national population projections to use net migration measures or inadequate data. This paper aims to address these issues at a global level, presenting estimates of bilateral flow tables between 191 countries. A methodology to estimate flow tables of migration transitions for the globe is illustrated in two parts. First, a methodology to derive flows from sequential stock tables is developed. Second, the methodology is applied to recently released World Bank migration stock tables between 1960 and 2000 (Özden et al. 2011) to estimate a set of four decadal global migration flow tables. The results of the applied methodology are discussed with reference to comparable estimates of global net migration flows of the United Nations and models for international migration flows. The proposed methodology adds to the limited existing literature on linking migration flows to stocks. The estimated flow tables represent a first-of-a-kind set of comparable global origin destination flow data.

Posted in International Migration Estimation, migest, R, Research

Bank of England Fan Charts in R

I have updated, and will continue to update, this post as I expand the fanplot package.

I managed to catch David Spiegelhalter’s Tails You Win on BBC iPlayer last week. I missed it the first time round, only for my parents, on my last visit home, to tell me about a statistician jumping out of a plane on TV. It was a great watch. Towards the end I spotted some fan charts used by the Bank of England to illustrate uncertainty in their forecasts, similar to this one:

Bank of England February 2013 CPI Fan Chart.
Obtained from Chart 5.3 in February 2013 Inflation Report

They discussed how, even in the tails of their GDP predictive distribution, they missed the financial crisis by a long shot. This got me googling, trying to find out how they made the plots, something that (also) completely passed me by when I put together my fanplot package for R. As far as I could tell they did them in Excel, although (appropriately) I am not completely certain. There are also MATLAB files that can create fan charts. Anyhow, I thought I would have a go at replicating a Bank of England fan chart in R…

Split-Normal (Two-Piece Normal) Distribution.

The Bank of England produce fan charts of forecasts for CPI and GDP in their quarterly Inflation Reports. They also provide data, in the form of the mean, uncertainty and skewness parameters of the split-normal distribution that underlies their fan charts (the Bank of England predominantly refer to the equivalent, re-parametrised, two-piece normal distribution). The probability density of the split-normal distribution is given by Julio (2007) as

f(x; \mu, \sigma_1, \sigma_2) = \left\{   \begin{array}{ll}    \frac{\sqrt 2}{\sqrt\pi (\sigma_1+\sigma_2)} e^{-\frac{1}{2\sigma_1^2}(x-\mu)^2} & \mbox{for } -\infty < x \leq \mu \\   \frac{\sqrt 2}{\sqrt\pi (\sigma_1+\sigma_2)} e^{-\frac{1}{2\sigma_2^2}(x-\mu)^2} & \mbox{for } \mu < x < \infty \\  \end{array},  \right.

where \mu represents the mean parameter, and the two standard deviations \sigma_1 and \sigma_2 can be derived given the overall uncertainty parameter \sigma and skewness parameter \gamma as:

\sigma^2=\sigma^2_1(1+\gamma)=\sigma^2_2(1-\gamma).

As no split-normal distribution existed in R, I added routines for a density, distribution and quantile function, plus a random generator, to a new version (2.1) of the fanplot package. I used the formula in Julio (2007) to code each of the three functions, and checked the results against those from the fan chart MATLAB code.
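As a rough check on the formulas above, the density can be coded directly in base R. This is a sketch built from the Julio (2007) formula, not the fanplot implementation (dsplitnorm):

```r
# Split-normal density coded directly from the formula above
# (a sketch; fanplot::dsplitnorm is the maintained implementation).
dsplitnorm_sketch <- function(x, mean = 0, sd = 1, skew = 0) {
  s1 <- sqrt(sd^2 / (1 + skew))               # sigma_1 from sigma and gamma
  s2 <- sqrt(sd^2 / (1 - skew))               # sigma_2 from sigma and gamma
  k  <- sqrt(2) / (sqrt(pi) * (s1 + s2))      # common normalising constant
  ifelse(x <= mean,
         k * exp(-(x - mean)^2 / (2 * s1^2)),
         k * exp(-(x - mean)^2 / (2 * s2^2)))
}
dsplitnorm_sketch(0)  # with skew = 0 this collapses to dnorm(0)
integrate(dsplitnorm_sketch, -Inf, Inf, mean = 1, sd = 2, skew = 0.3)$value  # ~ 1
```

Setting skew to zero recovers the ordinary normal density, and the density integrates to one for any valid skew, which matches the behaviour expected from the re-parametrisation.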

Fan Chart Plots for CPI.

Once I had the qsplitnorm function working properly, producing the fan chart plot in R was pretty straightforward. I added two data objects to the fanplot package to help readers reproduce my plots below. The first, cpi, is a time series object with past values of the CPI index. The second, boe, is a data frame with historical details of the split-normal parameters for CPI inflation forecasts made by the Bank of England between Q1 2004 and Q4 2013.

> library("fanplot")
> head(boe)
  time0    time mean uncertainty skew
1  2004 2004.00 1.34      0.2249    0
2  2004 2004.25 1.60      0.3149    0
3  2004 2004.50 1.60      0.3824    0
4  2004 2004.75 1.71      0.4274    0
5  2004 2005.00 1.77      0.4499    0
6  2004 2005.25 1.68      0.4761    0

The first column, time0, refers to the base year of the forecast; the second, time, indexes future projections; and the remaining three columns provide values for the corresponding projected mean (\mu), uncertainty (\sigma) and skew (\gamma) parameters.

Users can replicate past Bank of England fan charts for a particular period after creating a matrix object that contains values of the split-normal quantile function for a set of user-defined probabilities. For example, in the code below, the subset of the Bank of England future parameters of CPI published in Q1 2013 is first selected. Then a vector of probabilities, corresponding to the percentiles we would ultimately like to plot different shaded fans for, is created. Finally, in a for loop, the qsplitnorm function calculates the values for which the time-specific (i) split-normal distribution will be less than or equal to the probabilities in p.

# select relevant data
y0 <- 2013
boe0 <- subset(boe, time0==y0)
k <- nrow(boe0)

# guess work to set percentiles the BOE are plotting
p <- seq(0.05, 0.95, 0.05)
p <- c(0.01, p, 0.99)
 
# quantiles of split-normal distribution for each probability
# (row) at each future time point (column)
cpival <- matrix(NA, nrow = length(p), ncol = k)
for (i in 1:k) 
   cpival[, i] <- qsplitnorm(p, mean = boe0$mean[i], 
                                sd = boe0$uncertainty[i],
                                skew = boe0$skew[i])

The new object cpival contains the values evaluated from the qsplitnorm function in 21 rows and 13 columns, where rows represent the probabilities used in the calculation (p) and columns represent successive time periods.

The object cpival can then be used to add a fan chart to the active R graphics device. In the code below, the area of the plot is set up when plotting the past CPI data, contained in the time series object cpi. The xlim arguments are set to ensure space on the right-hand side of the plotting area for the fan. Following the Bank of England style for plotting fan charts, the background for future values is set to a gray colour, the y-axis is plotted on the right-hand side, a horizontal line is added for the CPI target and a vertical line for the two-year-ahead point.

# past data
plot(cpi, type = "l", col = "tomato", lwd = 2, 
     xlim = c(y0 - 5, y0 + 3), ylim = c(-2, 7), 
     xaxt = "n", yaxt = "n", ylab="")
 
# background
rect(y0 - 0.25, par("usr")[3] - 1, y0 + 3, par("usr")[4] + 1, 
     border = "gray90", col = "gray90")
 
# add fan
fan(data = cpival, data.type = "values", probs = p, 
    start = y0, frequency = 4, 
    anchor = cpi[time(cpi) == y0 - 0.25], 
    fan.col = colorRampPalette(c("tomato", "gray90")),  
    ln = NULL, rlab = NULL)
 
# boe aesthetics
axis(2, at = -2:7, las = 2, tcl = 0.5, labels = FALSE)
axis(4, at = -2:7, las = 2, tcl = 0.5)
axis(1, at = 2008:2016, tcl = 0.5)
axis(1, at = seq(2008, 2016, 0.25), labels = FALSE, tcl = 0.2)
abline(h = 2)  #boe cpi target
abline(v = y0 + 1.75, lty = 2)  #2 year line

[Figure: boe1, replication of the Bank of England CPI fan chart]
The fan chart itself is output by the fan function, where arguments are set to ensure a close resemblance of the R plot to that produced by the Bank of England. The first three arguments in the fan function called in the above code provide the cpival data to be plotted, indicate that the data are a set of calculated values (as opposed to simulations) and provide the probabilities that correspond to each row of the cpival object. The next two arguments define the start time and frequency of the data. These operate in a similar fashion to those used when defining time series in R with the ts function. The anchor argument is set to the value of CPI before the start of the fan chart. This allows a join between the value of the Q1 2013 observation and the fan chart. The fan.col argument is set to a colour palette of shades between tomato and gray90. The final two arguments are set to NULL to suppress the plotting of contour lines at the boundary of each shaded fan and their labels, as per the Bank of England style.

Default Fan Chart Plot.

By default, the fan function treats objects passed to the data argument as simulations from sequential distributions, rather than user-created values corresponding to the probabilities provided in the probs argument (as above). The alternative plot below, based on simulated data and the default style settings in the fan function, produces a fan chart with a greater array of coloured fans, with labels and contour lines alongside selected percentiles of the future distribution. To illustrate, we can simulate 10,000 values from the future split-normal distributions with parameters from Q1 2013 in the boe0 data frame using the rsplitnorm function:

#simulate future values
cpisim <- matrix(NA, nrow = 10000, ncol = k)
for (i in 1:k) 
   cpisim[, i] <- rsplitnorm(n=10000, mean = boe0$mean[i],
                                      sd = boe0$uncertainty[i], 
                                      skew = boe0$skew[i])

The fan chart based on the simulations in cpisim can then be added to the plot:

# truncate cpi series
cpi0 <- ts(cpi[time(cpi)<2013], start=start(cpi), 
           frequency=frequency(cpi) )

# past data
plot(cpi0, type = "l", lwd = 2, 
     xlim = c(y0 - 5, y0 + 3.25), ylim = c(-2, 7))

# add fan
fan(data = cpisim, start = y0, frequency = 4) 

[Figure: boe2, fan chart from simulations with default settings]
The fan function calculates the values of 100 equally spaced percentiles of each future distribution when the default data.type = "simulations" is set. This allows 50 fans to be plotted from the heat.colors colour palette, providing a finer level of shading in the representation of future distributions. In addition, lines and labels are provided along each decile. The fan chart does not connect to the last observation, as anchor = NULL by default.
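The summary underlying the default simulation mode can be sketched as follows; the exact percentile grid used internally is an assumption here, chosen only for illustration:

```r
# Sketch of the default summary: for each future time point (column),
# take a grid of equally spaced percentiles of the simulated values
# (illustrative only; not the internal fanplot code).
sims <- matrix(rnorm(10000 * 4, mean = 2), nrow = 10000, ncol = 4)
p100 <- seq(0.005, 0.995, length.out = 100)      # 100 percentile levels
pctl <- apply(sims, 2, quantile, probs = p100)   # 100 x 4 matrix of percentiles
dim(pctl)
```

Successive pairs of percentile rows then bound the 50 shaded fans, one shade per pair, fading away from the median.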

Posted in fanplot, R, Research

The tsbugs package for R

I have updated, and will continue to update, this post as I expand the tsbugs package.

My tsbugs package has gone up on CRAN. I decided not to write a vignette for the submission, as it would have involved doing some estimation in BUGS via R2WinBUGS or R2OpenBUGS and running into some problems when submitting the package. Instead I thought I would post a quick guide here…

The functions in the tsbugs package aim to automate the writing of time series models to run in WinBUGS or OpenBUGS. I created these functions a while back when I was doing some work on model averaging for time series models. I found it a lot easier to build R functions to write the BUGS models than to go through the more error-inducing process of copying and pasting BUGS scripts, and then making slight alterations to create new models. It also allowed me to add arguments to specify different lag lengths, prior distributions, variance assumptions and data lengths. Below are examples for three types of time series models: autoregressive models with constant variance, stochastic volatility models and random variance shift models.
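The approach, independent of tsbugs itself, amounts to programmatic string building. A toy sketch (write_ar_bug is a hypothetical helper, far simpler than ar.bugs):

```r
# Toy sketch of writing a BUGS AR(p) model as a string from R
# (hypothetical helper; ar.bugs in tsbugs is far more complete).
write_ar_bug <- function(p) {
  terms  <- paste0("phi", 1:p, " * y[t-", 1:p, "]", collapse = " + ")
  priors <- paste0("  phi", 1:p, " ~ dnorm(0, 1)", collapse = "\n")
  paste0("model{\n",
         "  for(t in ", p + 1, ":n){\n",
         "    y[t] ~ dnorm(", terms, ", tau)\n  }\n",
         priors, "\n  tau ~ dgamma(0.001, 0.001)\n}")
}
cat(write_ar_bug(2))
```

Changing the lag length, priors or variance assumption is then just a change of function arguments rather than hand editing a script.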

Autoregressive Models

The ar.bugs command builds a BUGS script for autoregressive (AR) models ready to use in R2OpenBUGS. For example, consider the LakeHuron data.

LH <- LakeHuron 
par(mfrow=c(2,1)) 
plot(LH, main="Level (ft)") 
plot(diff(LH), main="Differenced Level")

[Figure: tsbugs1-cvdata, Lake Huron level and differenced level]
We can construct an AR(1) model for this data (after differencing the data to obtain a stationary mean) as such:

library("tsbugs")
ar1 <- ar.bugs(y=diff(LH), ar.order=1)
print(ar1)

The ar.bugs function allows for alternative specifications for prior distributions, forecasts and the inclusion of a mean term:

ar2 <- ar.bugs(y=diff(LH), ar.order=2, ar.prior="dunif(-1,1)", 
               var.prior="dgamma(0.001,0.001)", k = 10, 
               mean.centre = TRUE)
print(ar2)

The tsbugs objects can be used with R2OpenBUGS to easily run models from R. This is made even easier using the inits and nodes functions (also in the tsbugs package). For example:

writeLines(ar2$bug, "ar2.txt") 
library("R2OpenBUGS")
ar2.bug <- bugs(data = ar2$data,
                inits = list(inits(ar2)),
                param = c(nodes(ar2, "prior")$name, "y.new"),
                model = "ar2.txt", 
                n.iter = 11000, n.burnin = 1000, n.chains = 1)

Note: 1) the model is written to a .txt file (as required by R2OpenBUGS); 2) the data used are part of the tsbugs object (the ar.bugs command cleans the data and adds missing values at the end of the series for forecasted values); 3) the initial values offered by the inits function are very crude, and with more complicated data or models users might be better off specifying their own list of initial values. The parameter traces and posterior distributions can be plotted using the coda package:

library("coda")
param.mcmc <- as.mcmc(ar2.bug$sims.matrix[,nodes(ar2, "prior")$name])
plot(param.mcmc[,1:4])

[Figure: tsbugs1-cvparam, parameter traces and posterior distributions]
The fanplot package can be used to plot the entire series of posterior predictive distributions. We may also plot the posterior predictive distributions of the lake level (after deriving them with the diffinv function):

# derive future level
ynew.mcmc <- ar2.bug$sims.list$y.new
lhnew.mcmc <- apply(ynew.mcmc, 1, diffinv, xi = tail(LH,1))
lhnew.mcmc <- t(lhnew.mcmc[-1,])

# plot differenced
k0 <- end(LH)[1]  # define k0 before its first use
par(mfrow=c(2,1))
plot(diff(LH), xlim=k0+c(-50,10), main="Differenced Level")

# add fan
library("fanplot")
fan(ynew.mcmc, start=k0+1, rcex=0.5)

# plot undifferenced
plot(LH, xlim=k0+c(-50,10), main="Level")
fan(lhnew.mcmc, start=k0+1, rcex=0.5)

[Figure: tsbugs1-cvforc, posterior predictive fan charts for the differenced and undifferenced series]

Stochastic Volatility Models

The sv.bugs command builds a BUGS script for stochastic volatility (SV) models, ready to use in R2OpenBUGS. For example, consider the svpdx data.

# plot
plot(svpdx$pdx, type = "l", 
     main = "Return of Pound-Dollar exchange rate data 
             from 2nd October 1981 to 28th June 1985", 
     cex.main = 0.8)

[Figure: tsbugs1-svdata, returns of the Pound-Dollar exchange rate]
We can construct an AR(0)-SV model for this data, and also obtain posterior simulations, using the sv.bugs command:

y <- svpdx$pdx
sv0 <- sv.bugs(y, sim=TRUE)
print(sv0)

This model closely matches the one presented in Meyer and Yu (2002). There are further options in the tsbugs package to incorporate different priors that do not involve transformations, such as those for psi1 above. Using R2OpenBUGS we can fit the model:

# decent initial value for variance in first period
init <- inits(sv0, warn=FALSE)
init$psi0 <- log(var(y))
# write bug
writeLines(sv0$bug, "sv0.txt")
# might take a while to compile
sv0.bug <- bugs(data = sv0$data,
                inits = list(init),
                param = c(nodes(sv0, "prior")$name,"y.sim","h"),
                model = "sv0.txt", 
                n.iter = 11000, n.burnin = 1000, n.chains = 1)

The volatility estimates can be easily extracted:

h.mcmc <- sv0.bug$sims.list$h

This allows us to view the estimated volatility process, or the time-dependent standard deviation, directly using the fanplot package:

# plot
plot(NULL, xlim = c(1, 945)+c(0,40), ylim = c(-4,2), 
     main="Estimated Volatility from SV Model") 
# fan
fan(h.mcmc, type = "interval")

[Figure: tsbugs1-svh, estimated volatility from the SV model]
We can also plot the posterior simulations from the model:

# derive percentiles
y.mcmc <- sv0.bug$sims.list$y.sim
# plot
plot(NULL, type = "l", xlim = c(1, 945)+c(0,20), 
     ylim = range(y), 
     main = "Posterior Model Simulations and Data")
fan(y.mcmc)
lines(y)

[Figure: tsbugs1-svysim, posterior model simulations and data]

Random Variance Shift Models

The rv.bugs command builds a BUGS script for random variance (RV) shift models, similar to that of McCulloch and Tsay (1993), ready to use in R2OpenBUGS. Consider the ew data.

r <- ts(ew[2:167]/ew[1:166]-1, start=1841)
y <- diff(r) 
plot(y, main="Difference in England and Wales 
              Population Growth Rate")

[Figure: tsbugs1-rvdata, difference in England and Wales population growth rate]
We can create a BUGS script to fit a RV model to this data, including posterior simulations, using the rv.bugs command:

rv0 <- rv.bugs(y, sim=TRUE)
print(rv0)

and then run the script in R2OpenBUGS (this can take a couple of hours):

# decent initial value for variance in first period
init <- inits(rv0, warn=FALSE)
init$isig02 <- sd(y)^-2
# write bug
writeLines(rv0$bug,"rv0.txt")
# might take a while to compile
rv0.bug <- bugs(data = rv0$data,
                inits = list(init),
                param = c(nodes(rv0, "prior")$name,"y.sim",
                          "h","delta","beta"),
                model = "rv0.txt", 
                n.iter = 11000, n.burnin = 1000, n.chains = 1)

We can plot the posterior simulations from the model using the fanplot package:

# derive percentiles
y0 <- tsp(y)[1]
y.mcmc <- rv0.bug$sims.list$y.sim

# plot
plot(NULL, xlim=tsp(y)[1:2]+c(-5,5), ylim = range(y), 
     main="Posterior Simulations")
fan(y.mcmc, start = y0, rlab=c(10,50,90), llab=TRUE)
lines(y)

[Figure: tsbugs1-rvysim, posterior simulations and data]
Alongside the posterior distributions of the standard deviations,

# derive sigma
h.mcmc <- rv0.bug$sims.list$h
sigma.mcmc <- sqrt(exp(h.mcmc))
# plots
plot(NULL, xlim =tsp(y)[1:2]+c(-5,5), ylim = c(0,0.008), 
     main="Standard Deviation")
fan(sigma.mcmc, start = y0, rlab=c(5,50,95), llab = c(5,50,95))

[Figure: tsbugs1-rvhsigma, fan of the time-dependent standard deviation]
The posterior distributions of the probability of a variance shift, and of the multiplier effect of the shift in variance (delta[t] and beta[t] in the BUGS model), can also be plotted. Note that when there is no variance shift, the posterior of beta[t] is similar to the prior distribution.

#extract data
delta.mcmc <- rv0.bug$sims.list$delta
beta.mcmc <- rv0.bug$sims.list$beta

# plots
par(mfrow=c(2,1))
plot(NULL, xlim = tsp(y)[1:2]+c(-5,5), ylim = c(0,1),
     main="Probability of Variance Change Point")
fan(delta.mcmc, start=y0, ln = NULL, rlab = NULL)

plot(NULL, xlim = tsp(y)[1:2]+c(-5,5), ylim = c(-2,2), 
     main="Variance Multiplier")
fan(beta.mcmc, start=y0)

[Figure: tsbugs1-rvbeta, variance change point probability and variance multiplier]

Posted in BUGS, R, Research, tsbugs