Post deleted. See the migest pkgdown site for easier ways to create chord diagrams for directional origin-destination data.
Shiny App for the Wittgenstein Centre Population Projections
A few weeks ago a new version of the Wittgenstein Centre Data Explorer was launched. The data explorer is intended to disseminate the results of a recent global population projection exercise, which uniquely incorporates level of education (as well as age and sex) and the scientific input of more than 500 population experts around the world. Included are the projected populations used in the Fifth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC).
Over the past year or so I have been working (on and off) with the data lab team to create the shiny app on which the data explorer is based. All the code and data are available on my GitHub page. Below are notes summarising some of the lessons I learnt:
1. Large data
We had a pretty large amount of data to display (31 indicators based on up to 7 scenarios x 26 time periods x 223 geographical areas x 21 age groups x 2 genders x 7 education categories)… so somewhere over 8 million rows for some indicators. Further complexity was added by the fact that some indicators were by definition not available for some dimensions of the data; for example, population median age is not available by age group. The size and complexity meant that data manipulations were a big issue. Using read.csv to load the data didn't really cut the mustard, taking over 2 minutes when running on the server. The fantastic saves package, and the ultra.fast = TRUE argument in its loads function, came to the rescue, alongside some pre-formatting to avoid as much joining and reshaping of the data on the server as possible. This cut load times to a couple of seconds at most, and allowed the app to work with the indicator variables on the fly, as demanded by the user selections. Once the data was in, the more than awesome dplyr functions finished the data manipulation jobs in style. I am sure there is some smarter way to get everything running a little quicker than it does now, but I am pretty happy with the present speed, given the initial waiting times.
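To give a flavour, here is a minimal sketch of the saves/loads pattern. The data frame and its column names are made up for illustration, and the exact argument handling in the saves package may differ between versions:

library(saves)

# Dummy stand-in for the explorer's large indicator data frame
wic_data <- data.frame(scenario = rep(1:2, each = 4),
                       year = rep(c(2010, 2015), 4),
                       area = "World",
                       value = runif(8))

# One-off pre-formatting step: dump each column to its own binary file,
# skipping consistency checks for speed
saves(wic_data, ultra.fast = TRUE)

# On the server: pull back only the columns needed for the current user
# selection, rather than re-reading the full multi-million row table
pop <- loads("wic_data", variables = c("scenario", "year", "value"),
             to.data.frame = TRUE, ultra.fast = TRUE)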
2. googleVis and gvisMerge
It’s a demographic data explorer, which means population pyramids have to pop up somewhere. We needed pyramids that illustrate population sizes by education level, on top of the standard age and sex breakdown. Static versions of the education pyramids in the explorer have previously been used by my colleagues to illustrate past and future populations. For the graphic explorer I created some interactive versions, for side-by-side comparisons over time and between countries, which also have some tooltip features. These took a little while to develop. I played with ggvis but couldn’t get my bar charts to go horizontal. I also took a look at some other functions for interactive pyramids, but I couldn’t figure out a way to overlay the educational dimension. I found a solution by creating gender-specific stacked bar charts with gvisBarChart in the googleVis package and then using gvisMerge to bring them together in one plot. As with the data tables, they take a second or so to render, so I added a withProgress bar to try and keep the user entertained.
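For illustration, here is a minimal sketch of the gvisBarChart plus gvisMerge pattern with made-up data (a real education pyramid would also reverse the axis of one sex so the bars run back to back):

library(googleVis)

# Made-up education-specific counts by age group, one data frame per sex
men <- data.frame(age = c("0-4", "5-9", "10-14"),
                  primary = c(10, 12, 11), secondary = c(5, 6, 7))
women <- data.frame(age = c("0-4", "5-9", "10-14"),
                    primary = c(9, 11, 12), secondary = c(6, 7, 7))

# Stacked horizontal bars for each sex...
opts <- list(isStacked = TRUE)
p_men <- gvisBarChart(men, xvar = "age",
                      yvar = c("primary", "secondary"), options = opts)
p_women <- gvisBarChart(women, xvar = "age",
                        yvar = c("primary", "secondary"), options = opts)

# ...merged side by side into a single pyramid-style plot
pyramid <- gvisMerge(p_men, p_women, horizontal = TRUE)
plot(pyramid)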
I could not figure out a way in R to convert the HTML code output by the gvisMerge function into a familiar file format for users to download. Instead I used a system call to the wkhtmltopdf program to return PDF or PNG files. By default, wkhtmltopdf was a bit hit and miss, especially when converting the more complex map plots to PNG files. I found that setting --enable-javascript --javascript-delay 2000 helped in many cases.
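One way to script such a conversion is sketched below, reusing the pyramid object from the previous example. It assumes the wkhtmltopdf suite is installed and on the PATH, and the output file name is illustrative:

# Write the googleVis HTML page to a temporary file, then shell out to
# convert it; the JavaScript flags give the Google Charts code time to
# render before the snapshot is taken
chart_html <- tempfile(fileext = ".html")
print(pyramid, file = chart_html)
system(paste("wkhtmltopdf --enable-javascript --javascript-delay 2000",
             chart_html, "pyramid.pdf"))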
3. The shiny user community
I asked questions using the shiny tag on Stack Overflow and on the shiny Google Group a number of times. A big thank you to everyone who helped me out. Browsing through other questions and answers was also super helpful. I found this question on organising large shiny code particularly useful. Making small changes during the reviewing process became a lot easier once I broke the code up across multiple .R files with sensible names, as sketched below.
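For instance (file names purely illustrative, not the explorer's actual structure):

library(shiny)

# server.R: source each feature's code from its own sensibly named file,
# so edits during review only touch one small script at a time
source("helpers/load_data.R", local = TRUE)
source("helpers/pyramids.R", local = TRUE)
source("helpers/maps.R", local = TRUE)

shinyServer(function(input, output, session) {
  # reactive expressions and outputs built with the sourced helpers
})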
4. Navbar Pages
When I started building the shiny app I was using a single layout with a sidebar and tabbed pages to display data and graphics (using tabsetPanel()), adding extra tabs as we developed new features (data selection, an assumption database, population pyramids, plots of population size, maps, FAQs, etc.). As these grew, the switch to the new navbar layout (via navbarPage()) helped clean up the appearance and provide a better user experience, where you can move between data, graphics and background information using the bar at the top of the page.
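A stripped-down version of that layout looks something like this (page and tab names are illustrative):

library(shiny)

ui <- navbarPage(
  "Data Explorer",
  tabPanel("Data",
           sidebarLayout(
             sidebarPanel(selectInput("ind", "Indicator",
                                      c("Population", "Median Age"))),
             mainPanel(tableOutput("tab"))
           )),
  tabPanel("Graphics", plotOutput("pyramid")),
  tabPanel("About", p("Background information and FAQs"))
)

shinyApp(ui, server = function(input, output) {})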
5. Shading and link buttons
I added some shading and buttons to help navigate through the data selection and between different tabs. For the shading I used cssmatic.com to generate the colour gradient for a fluidRow background. The generated code was copied and pasted into a tags$style element for my defined row class myRow1, as such:
library(shiny)
runApp(list(
  ui = shinyUI(fluidPage(
    br(),
    fluidRow(
      class = "myRow1",
      br(),
      selectInput('variable', 'Variable', names(iris))
    ),
    tags$style(".myRow1 {
      background: rgba(212,228,239,1);
      background: -moz-linear-gradient(left, rgba(212,228,239,1) 0%, rgba(44,146,208,1) 100%);
      background: -webkit-gradient(left top, right top, color-stop(0%, rgba(212,228,239,1)), color-stop(100%, rgba(44,146,208,1)));
      background: -webkit-linear-gradient(left, rgba(212,228,239,1) 0%, rgba(44,146,208,1) 100%);
      background: -o-linear-gradient(left, rgba(212,228,239,1) 0%, rgba(44,146,208,1) 100%);
      background: -ms-linear-gradient(left, rgba(212,228,239,1) 0%, rgba(44,146,208,1) 100%);
      background: linear-gradient(to right, rgba(212,228,239,1) 0%, rgba(44,146,208,1) 100%);
      filter: progid:DXImageTransform.Microsoft.gradient( startColorstr='#d4e4ef', endColorstr='#2c92d0', GradientType=1 );
      border-radius: 10px 10px 10px 10px;
      -moz-border-radius: 10px 10px 10px 10px;
      -webkit-border-radius: 10px 10px 10px 10px;
    }")
  )),
  server = function(input, output) {}
))
I added some buttons to help novice users switch between tabs once they had selected or viewed their data. This was a little tougher to implement than the shading, and in the end I needed a little help. I used bootsnipp.com to add some icons and define the style of the navigation buttons (using the tags$style element again).
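For the tab switching itself, one server-side approach (a sketch, not necessarily how the explorer implements it) is to give the navbarPage an id and call updateNavbarPage() from an actionButton:

library(shiny)

ui <- navbarPage("Explorer", id = "nav",
  tabPanel("Data",
           actionButton("to_graphics", "View graphics")),
  tabPanel("Graphics", plotOutput("plot"))
)

server <- function(input, output, session) {
  # jump to the Graphics tab when the button is clicked
  observeEvent(input$to_graphics, {
    updateNavbarPage(session, "nav", selected = "Graphics")
  })
}

shinyApp(ui, server)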
That is about it for the moment. I might add a few more notes to this post as they occur to me… I would encourage anyone who is tempted to learn shiny to take the plunge. I did not know JavaScript or any other web languages before I started, and I still don’t… which is the great beauty of the shiny package. I started with the RStudio tutorials, which are fantastic. The R code did not get a whole lot more complex than what I learnt there, even though the shiny app is quite large in comparison to most others I have seen.
Any comments or suggestions for improving the website are welcome.
Estimating global migration flow tables using place of birth data.
A few months ago, Demographic Research published my paper on estimating global migration flow tables. In the paper I developed a method to estimate international migrant flows, for which there is limited comparable data, that match changes in migrant stock data, which are more widely available. The result was bilateral tables of estimated international migrant transitions between 191 countries for four decades, which I believe are a first of their kind. The estimates are available as an Excel spreadsheet in an additional file on the journal website. The abstract and citation details are below.
The paper uses the ffs_demo() function in the migest package.
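A hedged sketch of the estimation call is below, following the dummy-data pattern in the migest documentation; the argument names may differ between package versions, and the numbers are invented:

library(migest)

# Dummy place-of-birth (rows) by place-of-residence (columns) stock
# tables at the start and end of a period, for four regions
dn <- LETTERS[1:4]
s1 <- matrix(c(1000, 100, 10, 0,
               55, 555, 50, 5,
               80, 40, 800, 40,
               20, 25, 20, 200),
             nrow = 4, byrow = TRUE, dimnames = list(pob = dn, por = dn))
s2 <- matrix(c(950, 100, 60, 0,
               80, 505, 75, 5,
               90, 30, 800, 40,
               40, 45, 0, 180),
             nrow = 4, byrow = TRUE, dimnames = list(pob = dn, por = dn))

# births and deaths over the period, by place of birth
b <- c(80, 20, 40, 60)
d <- c(70, 30, 50, 10)
names(b) <- names(d) <- dn

# Estimate the origin-destination flow table consistent with the change
# in stocks after accounting for the demographic components
ffs_demo(stock_start = s1, stock_end = s2, births = b, deaths = d)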
Publication Details:
Abel, G. J. (2013). Estimating global migration flow tables using place of birth data. Demographic Research, 28, 505–546. doi:10.4054/DemRes.2013.28.18
International migration flow data often lack adequate measurements of volume, direction and completeness. These pitfalls limit empirical comparative studies of migration and cross national population projections to use net migration measures or inadequate data. This paper aims to address these issues at a global level, presenting estimates of bilateral flow tables between 191 countries. A methodology to estimate flow tables of migration transitions for the globe is illustrated in two parts. First, a methodology to derive flows from sequential stock tables is developed. Second, the methodology is applied to recently released World Bank migration stock tables between 1960 and 2000 (Özden et al. 2011) to estimate a set of four decadal global migration flow tables. The results of the applied methodology are discussed with reference to comparable estimates of global net migration flows of the United Nations and models for international migration flows. The proposed methodology adds to the limited existing literature on linking migration flows to stocks. The estimated flow tables represent a first-of-a-kind set of comparable global origin destination flow data.
Does specification matter? Experiments with simple multiregional probabilistic population projections.
A paper that I co-authored, looking at the uncertainty in population forecasts generated by different measures of migration, came out this week in Environment and Planning A. Basically, try to avoid using net migration measures. Not only do they tend to give some dodgy projections, we also found that they give you more uncertainty. Using in- and out-migration measures in a projection model gives a big reduction in uncertainty over a net measure. They are also a fairly good approximation to the uncertainty from a full multiregional projection model. Plots in the paper were done by my good self using the fanplot package.
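For anyone curious about fan charts, here is a minimal sketch using the demo MCMC simulations shipped with fanplot (not the paper's data):

library(fanplot)

# th.mcmc: matrix of posterior simulations (rows) over time (columns);
# fan() shades prediction intervals on top of an existing plot
plot(NULL, xlim = c(1, ncol(th.mcmc)), ylim = range(th.mcmc),
     xlab = "Time", ylab = "Theta")
fan(data = th.mcmc)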
Publication Details:
Raymer, J., Abel, G.J. and Rogers, A. (2012). Does Specification Matter? Experiments with Simple Multiregional Probabilistic Population Projections. Environment and Planning A, 44(11), 2664–2686.
Population projection models that introduce uncertainty are a growing subset of projection models in general. In this paper we focus on the importance of decisions made with regard to the model specifications adopted. We compare the forecasts and prediction intervals associated with four simple regional population projection models: an overall growth rate model, a component model with net migration, a component model with in-migration and out-migration rates, and a multiregional model with destination-specific out-migration rates. Vector autoregressive models are used to forecast future rates of growth, birth, death, net migration, in-migration and out-migration, and destination-specific out-migration for the North, Midlands, and South regions in England. They are also used to forecast different international migration measures. The base data represent a time series of annual data provided by the Office for National Statistics from 1976 to 2008. The results illustrate how both the forecasted subpopulation totals and the corresponding prediction intervals differ for the multiregional model in comparison to other simpler models, as well as for different assumptions about international migration. The paper ends with a discussion of our results and possible directions for future research.
A comparison of official population projections with Bayesian time series forecasts for England and Wales
A paper based on some work I did with colleagues in the ESRC Centre for Population Change was published in Population Trends. We fitted a range of time series models (including some volatility models) to population change data for England and Wales, calculated the posterior model probabilities and then projected from the model-averaged posterior predictive distributions. We found our volatility models were heavily supported. Our median matches very closely the Office for National Statistics mid scenario. It’s a tad surprising that projections based on forecasts of a single annual growth rate give a similar forecast to the ONS cohort component projections, which are based on hundreds of future age-sex specific fertility, mortality and net migration rates. The ONS do not provide any form of probabilistic uncertainty; instead they give expert-based high and low scenarios, which roughly calibrate to our 50% prediction interval in 2033. I ran all the models in BUGS and did the fan chart plots in R.
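To give an idea of the workflow, here is a hedged sketch of fitting a simple AR(1) growth-rate model from R via R2OpenBUGS. It is illustrative only, not the paper's exact specification: the population series is simulated, and OpenBUGS is assumed to be installed:

library(R2OpenBUGS)

# AR(1) model for annual growth rates r[t], written to a temp file
model_file <- file.path(tempdir(), "ar1.txt")
writeLines("
model {
  for (t in 2:T) {
    r[t] ~ dnorm(m[t], tau)
    m[t] <- mu + phi * (r[t-1] - mu)
  }
  mu ~ dnorm(0, 0.001)
  phi ~ dunif(-1, 1)
  tau ~ dgamma(0.001, 0.001)
}", model_file)

# Simulated stand-in for an annual population series
set.seed(1)
pop <- 50e6 * cumprod(1 + rnorm(50, mean = 0.003, sd = 0.002))
r <- diff(log(pop))  # annual growth rates

fit <- bugs(data = list(r = r, T = length(r)),
            inits = function() list(mu = 0, phi = 0, tau = 1),
            parameters.to.save = c("mu", "phi", "tau"),
            model.file = model_file, n.chains = 3, n.iter = 10000)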
Publication Details:
Abel, G.J., Bijak, J. and Raymer J. (2010). A comparison of official population projections with Bayesian time series forecasts for England and Wales. Population Trends, 141, 95–114.
We compare official population projections with Bayesian time series forecasts for England and Wales. The Bayesian approach allows the integration of uncertainty in the data, models and model parameters in a coherent and consistent manner. Bayesian methodology for time-series forecasting is introduced, including autoregressive (AR) and stochastic volatility (SV) models. These models are then fitted to a historical time series of data from 1841 to 2007 and used to predict future population totals to 2033. These results are compared to the most recent projections produced by the Office for National Statistics. Sensitivity analyses are then performed to test the effect of changes in the prior uncertainty for a single parameter. Finally, in-sample forecasts are compared with actual population and previous official projections. The article ends with some conclusions and recommendations for future work.