Combining census and registration data to estimate detailed elderly migration flows in England and Wales

During my M.Sc. I worked on methods for combining internal migration data in England and Wales. Migration data is often represented in square tables of origin-destination flows. These tables are of particular interest for analysing migration patterns when they are disaggregated by age, sex and some other variable such as illness, ethnicity or economic status. In England and Wales the data within these detailed flow tables are typically missing in non-census years. However, row and column (origin and destination) totals are regularly provided by the NHS patient registers (see the auxiliary and primary data in the hypothetical example below). I worked on a method to estimate the missing detailed flows so that they sum to the provided totals in non-census years (see the detailed estimates in the hypothetical example below). This method is particularly useful for estimating migration flow tables disaggregated by detailed characteristics of migrants (such as illness, ethnicity or economic status) that the ONS only provides for census years.

Hypothetical example of the data situation (migrant origins are labelled down the side and destinations across the top).

Auxiliary data (e.g. 2001 Census)

Without limiting long-term illness
          N     M     S   Total
N        80    20    50     150
M        50   100    50     200
S        10    30   110     150
Total   140   150   210     500

With limiting long-term illness
          N     M     S   Total
N        30    10    20      60
M        40    50    70     160
S        30    10    40      80
Total   100    70   130     300

Primary data (e.g. 2004 NHSCR data): illness details unavailable, only origin and destination totals are known
          N     M     S   Total
N         -     -     -     260
M         -     -     -     320
S         -     -     -     170
Total   200   370   180     750

Detailed estimates for 2004 based on the methodology

Without limiting long-term illness
          N     M     S   Total
N        88    56    40     183
M        29   145    21     195
S         7    52    54     113
Total   124   252   115     491

With limiting long-term illness
          N     M     S   Total
N        33    28    16      77
M        23    73    29     125
S        20    17    20      57
Total    76   118    65     259

(The estimates are rounded, so a few totals differ slightly from the sum of their cells.)

The estimated values preserve certain properties of the census data (various cross-product ratios) whilst updating the marginal totals to more recent data. For more details see my M.Sc. dissertation (which I have put online here). I also presented the method and some further results at POPFSET 2007; see here for more details. This contains the R/S-Plus code to conduct the estimation in the Appendix. There is also a published paper based on my M.Sc. that uses a slightly modified version of the R code.
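To make the idea concrete, here is a minimal R sketch that applies iterative proportional fitting (IPF) to the hypothetical example above. It is my own illustration, not the code from the dissertation or the paper. It assumes the only new information is the origin and destination totals from the registration data, so every census cell is scaled by one origin factor and one destination factor; this leaves all cross-product ratios of the census table unchanged. The function name `ipf_od` and its arguments are hypothetical.

```r
# Sketch: IPF of a census origin x destination x illness table to new
# origin (row) and destination (column) totals. Only these two margins are
# constrained, so each cell ends up as aux[i, j, k] * a[i] * b[j] and the
# cross-product ratios of the census table are preserved.
ipf_od <- function(aux, row_tot, col_tot, tol = 1e-8, max_iter = 1000) {
  est <- aux
  for (iter in seq_len(max_iter)) {
    # Scale origins so that each origin total (over destinations and illness) matches row_tot
    est <- sweep(est, 1, row_tot / apply(est, 1, sum), "*")
    # Scale destinations so that each destination total matches col_tot
    est <- sweep(est, 2, col_tot / apply(est, 2, sum), "*")
    # Columns now match exactly; stop once the rows also match
    if (max(abs(apply(est, 1, sum) - row_tot)) < tol) break
  }
  est
}

# Census (auxiliary) flows from the hypothetical example, filled column by column
regions <- c("N", "M", "S")
aux <- array(c(80, 50, 10, 20, 100, 30, 50, 50, 110,   # without LLTI
               30, 40, 30, 10, 50, 10, 20, 70, 40),    # with LLTI
             dim = c(3, 3, 2),
             dimnames = list(orig = regions, dest = regions,
                             llti = c("without", "with")))

# NHSCR (primary) origin and destination totals
est <- ipf_od(aux, row_tot = c(260, 320, 170), col_tot = c(200, 370, 180))
round(est)
```

The rounded result should be close to the detailed estimates in the example table, with small differences due to rounding.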

Publication Details:

Raymer J., Abel G.J. and Smith P.W.F. (2007). Combining census and registration data to estimate detailed elderly migration flows in England and Wales. Journal of the Royal Statistical Society Series A (Statistics in Society) 170 (4) 891–908.

A log-linear model is developed to estimate detailed elderly migration flows by combining data from the 2001 UK census and National Health Service patient register. After showing that the census and National Health Service migration flows can be reasonably combined, elderly migration flows between groupings of local authority districts by age, sex and health status for the 2000–2001 and 2003–2004 periods are estimated and then analysed to show how the patterns have changed. By combining registration data with census data, we can provide recent estimates of detailed elderly migration flows, which can be used for improvements in social planning or policy.
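The model in the paper is not reproduced here, but as a hedged sketch the hypothetical example from the dissertation section can be written as a Poisson log-linear model with the census counts as an offset: log(m) = log(aux) + origin effect + destination effect. Because only the origin and destination totals are known, the sketch uses a made-up pseudo-response `y` that has the right margins (the names `y`, `row_tot` and `col_tot` are hypothetical). The likelihood equations of this model only involve those margins, so the fitted values do not depend on how the pseudo-response is spread across cells. `quasipoisson` is used to avoid warnings about non-integer counts; its point estimates are the same as those from `poisson`.

```r
# Sketch: offset log-linear model, log m[i,j,k] = log aux[i,j,k] + lambda_O[i] + lambda_D[j]
regions <- c("N", "M", "S")
d <- expand.grid(orig = factor(regions, levels = regions),
                 dest = factor(regions, levels = regions),
                 llti = c("without", "with"))
# Census counts in the same order as expand.grid (origin varies fastest)
d$aux <- c(80, 50, 10, 20, 100, 30, 50, 50, 110,
           30, 40, 30, 10, 50, 10, 20, 70, 40)

row_tot <- c(260, 320, 170)   # registration origin totals
col_tot <- c(200, 370, 180)   # registration destination totals

# Hypothetical pseudo-response with the required margins:
# r[i] * c[j] / total, split evenly over the two illness groups
d$y <- rep(as.vector(outer(row_tot, col_tot) / sum(row_tot)) / 2, times = 2)

fit <- glm(y ~ orig + dest + offset(log(aux)), family = quasipoisson, data = d)
est <- array(fitted(fit), dim = c(3, 3, 2),
             dimnames = list(orig = regions, dest = regions,
                             llti = c("without", "with")))
round(est)
```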


Chatfield’s Plots in S-Plus

I have recently finished reading the sixth edition of The Analysis of Time Series: An Introduction by Chatfield in our Statistics reading group. Whilst I enjoyed most of the book, I got a little confused when looking at Appendix D: Some MINITAB and S-PLUS Commands. In the S-Plus section the author gives the code below to replicate his Figure 1.2.

postscript(file="1.2.ps")
recife<-scan("Recife")
RecfS<-cts(recife, start=dates("01/01/1953"),units="months",frequency="12")
fst_as.numeric(date("01/01/1953"),format="dd/mm/yyyy")
mid1_as.numeric(dates("01/01/1954"),format="dd/mm/yyyy")
mid2_as.numeric(dates("01/01/1955"),format="dd/mm/yyyy")
mid3_as.numeric(dates("01/01/1956"),format="dd/mm/yyyy")
mid4_as.numeric(dates("01/01/1957"),format="dd/mm/yyyy")
mid5_as.numeric(dates("01/01/1958"),format="dd/mm/yyyy")
mid6_as.numeric(dates("01/01/1959"),format="dd/mm/yyyy")
mid7_as.numeric(dates("01/01/1960"),format="dd/mm/yyyy")
mid8_as.numeric(dates("01/01/1961"),format="dd/mm/yyyy")
lst_as.numeric(dates("01/01/1962"),format="dd/mm/yyyy")
ts.plot(RecfS,xaxt="n",ylab="Temperature (deg C)", xlab="Year", type="l")
axis(side=1, at = c(fst,mid1,mid2,mid3,mid4,mid5,mid6,mid7,mid8,lst),
labels=c("Jan 53", "Jan 54", "Jan 55", "Jan 56", "Jan 57", "Jan 58",
"Jan 59", "Jan 60", "Jan 61", "Jan 62"),
ticks=T)
dev.off()

I was not too sure what was going on with the code, which gave a tonne of error messages. Some of these might well have been typos by the publisher (_ instead of <-?), although _ was also accepted as an assignment operator in older versions of S. In addition, the author stresses the effort required to construct nice plots in S-Plus. This got me thinking that 1) his code was excessive (not just because it does not work) and 2) he could have saved a lot of effort by using R. The R code below proves my point:

> png("...:/1.2.png", width=650, height=460, units="px")
> recife <- scan("http://people.bath.ac.uk/mascc/Recife.TS", sep="", skip=3)
> n <- length(recife)
> times <- seq(as.Date("1953/1/1"), by="month", length.out=n)
> plot(recife, xaxt="n", type= "l",ylab="Temperature (deg C)", xlab="Year")
> axis(1, at = seq(1,n,12), labels = format(times, "%b %Y")[seq(1,n,12)])
> dev.off()

Much tidier! Here is the plot:
[Plot: 1.2.png, the R version of Chatfield's Figure 1.2]
If you only want labels every other January, as on p.2 of the book (but not in the S-Plus code), you can use:

> axis(1, at = seq(1,n,24), 
       labels = format(times, "%b %y")[seq(1,n,24)])
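R can make this even shorter if the series is stored as a monthly `ts` object: the time values are then measured in years, so whole numbers mark each January and no `Date` sequence is needed. The sketch below is an alternative, not the code used for the plot above; the helper names `as_monthly_ts` and `plot_monthly` are hypothetical, and it assumes the data are monthly starting in January 1953.

```r
# Store the series as a monthly ts object starting in January of start_year
as_monthly_ts <- function(x, start_year = 1953) {
  ts(x, start = c(start_year, 1), frequency = 12)
}

# Plot with a "Jan YYYY" label every `every` years (every = 2 gives every other January)
plot_monthly <- function(x, start_year = 1953, every = 1) {
  x_ts <- as_monthly_ts(x, start_year)
  plot(x_ts, xaxt = "n", ylab = "Temperature (deg C)", xlab = "Year")
  # Time is in years, so Januaries sit at whole numbers; integer arithmetic
  # finds the last January covered by the data
  last_year <- start_year + (length(x) - 1) %/% 12
  jan <- seq(start_year, last_year, by = every)
  axis(1, at = jan, labels = paste("Jan", jan))
  invisible(x_ts)
}

# Usage (needs internet access to the data file used above):
# recife <- scan("http://people.bath.ac.uk/mascc/Recife.TS", sep = "", skip = 3)
# png("1.2.png", width = 650, height = 460, units = "px")
# plot_monthly(recife, every = 2)
# dev.off()
```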

LaTeX Template for Ph.D. Thesis, School of Social Sciences, University of Southampton

This is a LaTeX template based on the code used for my Thesis.tex document, submitted to the School of Social Sciences, University of Southampton (April 2009). I had to develop this myself as I (and others in my Ph.D. cohort) had no contact with previous LaTeX users from the school.

The raw .tex file is linked from this post (click on the Kingsley Royal LaTeX icon). I have cut out most of the text to leave the bones of the LaTeX commands. The entire thesis is also available online; the link is in this post.

I tried to keep within the formatting guidelines laid out in the Completion of Research Degree Candidature book of the University of Southampton and did not run into any problems in my viva.

Feel free to use this as a template. Let me know if you add any developments and I will update the template accordingly.

Football Predictions: Season 2012-13

Wow. Completely forgot about this; found the old spreadsheet sifting through my Google Drive this morning. Looks like Dave won by the narrowest of margins, fending off James, Rich and Amos, who were inseparable… and Bernie had a shocker, crikey mate! Seems like the centre of predictive power still lies in the Murry Building. Most definitely going to do a big World Cup comp. Spread the word. Love. Guy.

Football Predictions: Season 2010-11

Season over. Results are in the table below. Congrats to Amos, the clear winner, who predicted the top four of the Premiership in the correct order. Amazing. Commiserations to Bernie.

At the top of the table are the weights given to each prediction category.

Lowest score wins, where:
* Scores for league places are calculated by multiplying the weight by the difference between the final and predicted finishing places.
* Scores for cup competitions are calculated by multiplying the weight by the number of wins the chosen team was away from winning the cup.
* Scores for Golden Boots are calculated by multiplying the weight by the number of goals the chosen player is away from the top scorer's tally.
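These rules are weighted sums, so they are easy to script. The R sketch below is hypothetical: the real category weights are in the results table and are not reproduced here, so the weights and predictions in the example are made up. Each weight `w` can be a single number or one weight per prediction.

```r
# League places: weight x |final place - predicted place|
score_league <- function(w, predicted, final) sum(w * abs(final - predicted))

# Cups: weight x number of wins the chosen team was away from winning the cup
score_cup <- function(w, wins_away) sum(w * wins_away)

# Golden Boot: weight x goals the chosen player finished short of the top scorer
score_boot <- function(w, goals_away) sum(w * goals_away)

# Made-up example: league weight 2, cup weight 3, Golden Boot weight 1
total <- score_league(2, predicted = c(1, 2, 3, 4), final = c(2, 1, 3, 4)) +
  score_cup(3, wins_away = c(0, 2)) +
  score_boot(1, goals_away = 5)
total  # 4 + 6 + 5 = 15; lowest total wins
```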

Football Predictions: 2010 World Cup

The World Cup is over, and I have updated the prediction table (see the final version below). Well done to Tom, who correctly predicted the final. Hard luck to Dave C., who would have won if Tom had 1) been patriotic (and actually supported England) and 2) not made his Golden Boot prediction a week into the tournament. However, given that these predictions are based on a FIFA tournament, it's logical that no action is taken against those who do not play by the rules. Congratulations Tom.

Scoring system:

Lowest points total wins. Points calculated from each category as such:

Winner: 10 x No. of wins away from the predicted team becoming champions.
Runner-up: 7 x No. of wins/losses away from the predicted team becoming runner-up.
Semi-final: 3 x No. of wins/losses away from the predicted teams losing in the semi-finals.
Quarter-final: 1 x No. of wins/losses away from the predicted teams losing in the quarter-finals.
Own team: 7 x No. of wins/losses away from your own team's predicted finish.
Golden Boot: 5 x No. of goals your chosen player is away from the top scorer's tally.
Fair Play: -5 if you correctly predict the Fair Play team.

Ties are broken by own-team predictions, then by alphabetical order.
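The points system above can be written as a single function. This is a hypothetical R sketch: the argument names are mine, summing over each predicted semi- and quarter-finalist is an assumption, and the tie-break assumes a lower own-team score ranks higher.

```r
# Each "away" argument counts wins/losses (or goals) between prediction and outcome
wc_score <- function(winner_away, runner_up_away, semi_away, quarter_away,
                     own_team_away, boot_goals_away, fair_play_correct) {
  10 * winner_away +
    7 * runner_up_away +
    3 * sum(semi_away) +       # one value per predicted semi-finalist (assumed)
    1 * sum(quarter_away) +    # one value per predicted quarter-finalist (assumed)
    7 * own_team_away +
    5 * boot_goals_away -
    5 * fair_play_correct      # TRUE counts as 1, giving the -5 bonus
}

# Order entries: lowest total, then lowest own-team score, then name
rank_entries <- function(entries) {
  entries[order(entries$total, entries$own_team_away, entries$name), ]
}

# Made-up example entry
wc_score(winner_away = 1, runner_up_away = 0, semi_away = c(1, 0),
         quarter_away = c(0, 1, 2, 0), own_team_away = 2,
         boot_goals_away = 3, fair_play_correct = TRUE)  # 10 + 3 + 3 + 14 + 15 - 5 = 40
```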