Combining census and registration data to estimate detailed elderly migration flows in England and Wales

During my M.Sc. I worked on methods for combining internal migration data in England and Wales. Migration data are often represented in square tables of origin-destination flows. These tables are of particular interest for analysing migration patterns when they are disaggregated by age, sex and some other variable such as illness, ethnicity or economic status. In England and Wales the data within these detailed flow tables are typically missing in non-census years. However, row and column (origin and destination) totals are regularly provided from the NHS patient registers (see the first two columns of the hypothetical data situation below). I worked on a method to estimate the missing detailed flows so that they sum to the provided totals in non-census years (see the third column of the hypothetical data situation below). This method is particularly useful for estimating migration flow tables disaggregated by detailed characteristics of migrants (such as illness, ethnicity or economic status) that are only provided by the ONS for census years.

Hypothetical Example of Data Set Situation (where migrant origins are labelled on the vertical axis and destinations on the horizontal axis).

Auxiliary Data (e.g. 2001 Census)       Primary Data (e.g. 2004 NHSCR Data)     Detailed Estimates for 2004 Based on Methodology

Without Limiting Long Term Illness      Illness details unavailable             Without Limiting Long Term Illness
         N    M    S  Total                      N    M    S  Total                      N    M    S  Total
N       80   20   50    150             N                       260             N       88   56   40    183
M       50  100   50    200             M                       320             M       29  145   21    195
S       10   30  110    150             S                       170             S        7   52   54    113
Total  140  150  210    500             Total  200  370  180    750             Total  124  252  115    491

With Limiting Long Term Illness                                                 With Limiting Long Term Illness
         N    M    S  Total                                                              N    M    S  Total
N       30   10   20     60                                                     N       33   28   16     77
M       40   50   70    160                                                     M       23   73   29    125
S       30   10   40     80                                                     S       20   17   20     57
Total  100   70  130    300                                                     Total   76  118   65    259

The estimated values maintain some properties (various cross-product ratios) of the Census data whilst updating marginal totals to more current data. For more details see my M.Sc. dissertation (which I have put online here). I also presented the method and some further results at POPFEST 2007, see here for more details. This contains the R/S-Plus code to conduct the estimation in the Appendix. There is also a published paper based on my M.Sc. that uses slightly modified R code.
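To give a feel for what "keeping the cross-product ratios whilst updating the margins" means, here is a minimal sketch using iterative proportional fitting on the hypothetical example above. This is not the code from the dissertation or the paper (which fit a log-linear model); it is just one standard way to rescale a table to new row and column totals. The function name fit_margins() and the object names are my own choices for illustration.

```r
# Hypothetical census (auxiliary) flows from the example above:
# origin x destination x illness status, filled column by column.
regions <- c("N", "M", "S")
census <- array(
  c(80, 50, 10,  20, 100, 30,  50, 50, 110,   # without limiting long term illness
    30, 40, 30,  10,  50, 10,  20, 70,  40),  # with limiting long term illness
  dim = c(3, 3, 2),
  dimnames = list(origin = regions, destination = regions,
                  illness = c("without", "with"))
)

# 2004 totals from the primary data (illness details unavailable)
origin_total <- c(N = 260, M = 320, S = 170)
dest_total   <- c(N = 200, M = 370, S = 180)

# fit_margins() is a hypothetical helper name, not a function from the dissertation.
# It alternately rescales the table to match origin totals and destination totals.
fit_margins <- function(seed, origin_total, dest_total,
                        tol = 1e-8, max_iter = 1000) {
  est <- seed
  for (i in seq_len(max_iter)) {
    # match origin totals (summed over destination and illness)
    est <- sweep(est, 1, origin_total / apply(est, 1, sum), "*")
    # match destination totals (summed over origin and illness)
    est <- sweep(est, 2, dest_total / apply(est, 2, sum), "*")
    # destination totals now match exactly, so check the origin totals
    if (max(abs(apply(est, 1, sum) - origin_total)) < tol) break
  }
  est
}

estimates <- fit_margins(census, origin_total, dest_total)
round(estimates)

# One cross-product ratio, before and after: the two values should agree
cpr <- function(a) (a[1, 1, 1] * a[2, 2, 1]) / (a[1, 2, 1] * a[2, 1, 1])
c(census = cpr(census), estimates = cpr(estimates))
```

The rounded output can be compared against the third column of the hypothetical example. Because every cell is only ever multiplied by an origin factor and a destination factor, the illness split within each origin-destination pair stays as it was in the census.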

Publication Details:

Raymer J., Abel G.J. and Smith P.W.F. (2007). Combining census and registration data to estimate detailed elderly migration flows in England and Wales. Journal of the Royal Statistical Society Series A (Statistics in Society) 170 (4) 891–908.

A log-linear model is developed to estimate detailed elderly migration flows by combining data from the 2001 UK census and National Health Service patient register. After showing that the census and National Health Service migration flows can be reasonably combined, elderly migration flows between groupings of local authority districts by age, sex and health status for the 2000–2001 and 2003–2004 periods are estimated and then analysed to show how the patterns have changed. By combining registration data with census data, we can provide recent estimates of detailed elderly migration flows, which can be used for improvements in social planning or policy.

Chatfield’s Plots in S-Plus

I have recently finished reading the sixth edition of The Analysis of Time Series: An Introduction by Chatfield in our Statistics reading group. Whilst enjoying most of the book I got a little confused when looking at Appendix D: Some MINITAB and S-PLUS Commands. In the S-Plus section the author gives the code below to replicate his Figure 1.2.

RecfS<-cts(recife, start=dates("01/01/1953"),units="months",frequency="12")
ts.plot(RecfS,xaxt="n",ylab="Temperature (deg C)", xlab="Year", type="l")
axis(side=1, at = c(fst,mid1,mid2,mid3,mid4,mid5,mid6,mid7,mid8,lst),
labels=c("Jan 53", "Jan 54", "Jan 55", "Jan 56", "Jan 57", "Jan 58",
"Jan 59", "Jan 60", "Jan 61", "Jan 62"),

I was not too sure what was going on with the code, which gave a tonne of error messages, some of which might well have been down to typos by the publisher (_ instead of <-). In addition, the author stresses the effort required to construct nice plots in S-Plus. This got me thinking that 1) his code was excessive (not just because it does not work) and 2) he could have saved a lot of effort by using R. The R code below proves my point:

> # open a PNG device for the figure
> png("...:/1.2.png", width=650, height=460, units="px")
> # read the series, skipping the first three lines
> recife <- scan("", sep="", skip=3)
> n <- length(recife)
> # one date per monthly observation, starting January 1953
> times <- seq(as.Date("1953/1/1"), by="month", length.out=n)
> plot(recife, xaxt="n", type="l", ylab="Temperature (deg C)", xlab="Year")
> # label every January
> axis(1, at = seq(1,n,12), labels = format(times, "%b %Y")[seq(1,n,12)])
> # close the device so the file is written
> dev.off()

Much tidier! Here is the plot.
If you only wanted labels every other January, as on p. 2 of the book (but not in the S-Plus code), then you can use:

> axis(1, at = seq(1,n,24), 
       labels = format(times, "%b %y")[seq(1,n,24)])
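
Since the only things that change between the two axis() calls are the spacing and the date format, they could be wrapped up in a small helper. The sketch below is one way to do that; month_ticks() is a name I have made up (it is not in base R or the book), and the series is simulated so that the example runs without the Recife data file.

```r
# month_ticks() is a hypothetical helper, not part of base R or the book's code.
# It returns tick positions and labels for every `every`-th observation.
month_ticks <- function(times, every = 12, fmt = "%b %Y") {
  at <- seq(1, length(times), by = every)
  list(at = at, labels = format(times[at], fmt))
}

# Simulated stand-in for a monthly temperature series, for illustration only
set.seed(1)
n <- 120
temps <- 26 + 2 * sin(2 * pi * seq_len(n) / 12) + rnorm(n, sd = 0.3)
times <- seq(as.Date("1953-01-01"), by = "month", length.out = n)

# Labels every other January, with two-digit years
ticks <- month_ticks(times, every = 24, fmt = "%b %y")
plot(temps, xaxt = "n", type = "l", ylab = "Temperature (deg C)", xlab = "Year")
axis(1, at = ticks$at, labels = ticks$labels)
```

Calling month_ticks(times) with the defaults gives a label every January with four-digit years, as in the first plot.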