## Bank of England Fan Charts in R

I have/will update this post as I expanded the fanplot package.

I managed to catch David Spiegelhalter’s Tails You Win on BBC iplayer last week. I missed it the first time round, only for my parents on my last visit home to tell me about a Statistician jumping out of a plane on TV. It was a great watch. Towards the end I spotted some fan charts used by the Bank of England to illustrate uncertainty in their forecasts, similar to this one:

They discussed how even in the tails of their GDP predictive distribution they missed the financial crisis by a long shot. This got me googling, trying and find you how they made the plots, something that (also) completely passed me by when I put together my fanplot package for R. As far as I could tell they did them in Excel, although (appropriately) I am not completely certain. There are also MATLAB files that can create fan charts. Anyhow, I thought I would have a go at replicating a Bank of England fan chart in R….

Split Normal (Two-Piece) Normal Distribution.

The Bank of England produce fan charts of forecasts for CPI and GDP in their quarterly Inflation Reports. They also provide data, in the form of mode, uncertainty and a skewness parameters of a split-normal distribution that underlie their fan charts (The Bank of England predominately refer to the equivalent, re-parametrised, two-piece normal distribution). The probability density of the split-normal distribution is given by Julio (2007) as

$f(x; \mu, \sigma_1, \sigma_2) = \left\{ \begin{array}{ll} \frac{\sqrt 2}{\sqrt\pi (\sigma_1+\sigma_2)} e^{-\frac{1}{2\sigma_1^2}(x-\mu)^2} & \mbox{for } -\infty < x \leq \mu \\ \frac{\sqrt 2}{\sqrt\pi (\sigma_1+\sigma_2)} e^{-\frac{1}{2\sigma_2^2}(x-\mu)^2} & \mbox{for } \mu < x < \infty \\ \end{array}, \right.$

where $\mu$ represents the mode parameter, and the two standard deviations $\sigma_1$ and $\sigma_2$ can be derived given the overall uncertainty parameter, $\sigma$ and skewness parameters, $\gamma$, as;

$\sigma^2=\sigma^2_1(1+\gamma)=\sigma^2_2(1-\gamma).$

As no split normal distribution existed in R, I added routines for a density, distribution and quantile function, plus a random generator, to a new version (2.1) of the fanplot package. I used the formula in Julio (2007) to code each of the three functions, and checked the results against those from the fan chart MATLAB code.

Fan Chart Plots for CPI.

Once I had the qsplitnorm function working properly, producing the fan chart plot in R was pretty straight-forward. I added two data objects to the fanplot package to help readers reproduce my plots below. The first, cpi, is a time series object with past values of CPI index. The second, boe, is a data frame with historical details on the split normal parameters for CPI inflation between Q1 2004 to Q4 2013 forecasts by the Bank of England.

> library("fanplot")
time0    time mode uncertainty skew
1  2004 2004.00 1.34      0.2249    0
2  2004 2004.25 1.60      0.3149    0
3  2004 2004.50 1.60      0.3824    0
4  2004 2004.75 1.71      0.4274    0
5  2004 2005.00 1.77      0.4499    0
6  2004 2005.25 1.68      0.4761    0

The first column time0 refers to the base year of forecast, the second, time indexes future projections, whilst the remaining three columns provide values for the corresponding projected mode ($\mu$), uncertainty ($\sigma$) and skew ($\gamma$) parameters:

Users can replicate past Bank of England fan charts for a particular period after creating a matrix object that contains values on the split-normal quantile function for a set of user defined probabilities. For example, in the code below, a subset of the Bank of England future parameters of CPI published in Q1 2013 are first selected. Then a vector of probabilities related to the percentiles, that we ultimately would like to plot different shaded fans for, are created. Finally, in a for loop the qsplitnorm function, calculates the values for which the time-specific (i) split-normal distribution will be less than or equal to the probabilities of p.

# select relevant data
y0 <- 2013
boe0 <- subset(boe, time0==y0)
k <- nrow(boe0)

# guess work to set percentiles the BOE are plotting
p <- seq(0.05, 0.95, 0.05)
p <- c(0.01, p, 0.99)

# quantiles of split-normal distribution for each probability
# (row) at each future time point (column)
cpival <- matrix(NA, nrow = length(p), ncol = k)
for (i in 1:k)
cpival[, i] <- qsplitnorm(p, mode = boe0$mode[i], sd = boe0$uncertainty[i],
skew = boe0$skew[i]) The new object cpival contains the values evaluated from the qsplitnorm function in 6 rows and 13 columns, where rows represent the probabilities used in the calculation p and columns represent successive time periods. The object cpival can then used to add a fan chart to the active R graphic device. In the code below, the area of the plot is set up when plotting the past CPI data, contained in the time series object cpi. The xlim arguments are set to ensure space on the right hand side of the plotting area for the fan. Following the Bank of England style for plotting fan charts, the background for future values is set to a gray colour, y-axis are plotted on the right hand side, a horizontal line are added for the CPI target and a vertical line for the two-year ahead point. # past data plot(cpi, type = "l", col = "tomato", lwd = 2, xlim = c(y0 - 5, y0 + 3), ylim = c(-2, 7), xaxt = "n", yaxt = "n", ylab="") # background rect(y0 - 0.25, par("usr")[3] - 1, y0 + 3, par("usr")[4] + 1, border = "gray90", col = "gray90") # add fan fan(data = cpival, data.type = "values", probs = p, start = y0, frequency = 4, anchor = cpi[time(cpi) == y0 - 0.25], fan.col = colorRampPalette(c("tomato", "gray90")), ln = NULL, rlab = NULL) # boe aesthetics axis(2, at = -2:7, las = 2, tcl = 0.5, labels = FALSE) axis(4, at = -2:7, las = 2, tcl = 0.5) axis(1, at = 2008:2016, tcl = 0.5) axis(1, at = seq(2008, 2016, 0.25), labels = FALSE, tcl = 0.2) abline(h = 2) #boe cpi target abline(v = y0 + 1.75, lty = 2) #2 year line The fan chart itself is outputted from the fan function, where arguments are set to ensure a close resemblance of the R plot to that produced by the Bank of England. The first three arguments in the fan function called in the above code, provide the cpival data to plotted, indicate that the data are a set of calculated values (as opposed to simulations) and provide the probabilities that correspond to each row of cpival object. The next two arguments define the start time and frequency of the data. These operate in a similar fashion to those used when defining time series in R with the ts function. The anchor argument is set to the value of CPI before the start of the fan chart. This allows a join between the value of the Q1 2013 observation and the fan chart. The fan.col argument is set to a colour palette for shades between tomato and gray90. The final two arguments are set to NULL to suppress the plotting of contour lines at the boundary of each shaded fan and their labels, as per the Bank of England style. Default Fan Chart Plot. By default, the fan function treats objects passed to the data argument as simulations from sequential distributions, rather than user-created values corresponding probabilities provided in the probs argument (as above). An alternative plot below, based on simulated data and default style settings in the fan function produces a fan chart with a greater array of coloured fans with labels and contour lines alongside selected percentiles of the future distribution. To illustrate we can simulate 10,000 values from the future split-normal distribution parameters from Q1 2013 in the boe0 data frame using the rsplitnorm function #simulate future values cpisim <- matrix(NA, nrow = 10000, ncol = k) for (i in 1:k) cpisim[, i] <- rsplitnorm(n=10000, mode = boe0$mode[i],
sd = boe0$uncertainty[i], skew = boe0$skew[i])

The fan chart based on the simulations in cpisim can then be added to the plot;

# truncate cpi series
cpi0 <- ts(cpi[time(cpi)<2013], start=start(cpi),
frequency=frequency(cpi) )

# past data
plot(cpi0, type = "l", lwd = 2,
xlim = c(y0 - 5, y0 + 3.25), ylim = c(-2, 7))

fan(data = cpisim, start = y0, frequency = 4)

The fan function calculates the values of 100 equally spaced percentiles of each future distribution when the default data.type = "simulations" is set. This allows 50 fans to be plotted from the heat.colours colour palate, providing a finer level of shading in the representation of future distributions. In addition, lines and labels are provided along each decile. The fan chart does not connect to the last observation as anchor = NULL by default.

## Does specification matter? Experiments with simple multiregional probabilistic population projections.

A paper that I am a co-author on, looking at uncertainty in population forecasting generated by different measures of migration, came out this week in Environment and Planning A. Basically, try and avoid using net migration measures. Not only do they tend to give some dodgy projections, we also found out that they give you more uncertainty. Using in and out measures of migration in a projection model give a big reduction in uncertainty over a net measure. They also are a fairly good approximation to the uncertainty from a full multiregional projection model. Plots in the paper were done by my good-self using the fanplot package.

Publication Details:

Raymer J., Abel, G.J. and Rogers, A. (2012). Does Speci cation Matter? Experiments with Simple Multiregional Probabilistic Population Projections. Environment and Planning A 44 (11), 2664–2686.

Population projection models that introduce uncertainty are a growing subset of projection models in general. In this paper we focus on the importance of decisions made with regard to the model specifications adopted. We compare the forecasts and prediction intervals associated with four simple regional population projection models: an overall growth rate model, a component model with net migration, a component model with in-migration and out-migration rates, and a multiregional model with destination-specific out-migration rates. Vector autoregressive models are used to forecast future rates of growth, birth, death, net migration, in-migration and out-migration, and destination-specific out-migration for the North, Midlands, and South regions in England. They are also used to forecast different international migration measures. The base data represent a time series of annual data provided by the Office for National Statistics from 1976 to 2008. The results illustrate how both the forecasted subpopulation totals and the corresponding prediction intervals differ for the multiregional model in comparison to other simpler models, as well as for different assumptions about international migration. The paper ends with a discussion of our results and possible directions for future research.