I have/will update this post as I expanded the fanplot package.

I managed to catch David Spiegelhalter’s Tails You Win on BBC iplayer last week. I missed it the first time round, only for my parents on my last visit home to tell me about a Statistician jumping out of a plane on TV. It was a great watch. Towards the end I spotted some fan charts used by the Bank of England to illustrate uncertainty in their forecasts, similar to this one:

Split Normal (Two-Piece) Normal Distribution.

The Bank of England produce fan charts of forecasts for CPI and GDP in their quarterly Inflation Reports. They also provide data, in the form of mean, uncertainty and a skewness parameters of a split-normal distribution that underlie their fan charts (The Bank of England predominately refer to the equivalent, re-parametrised, two-piece normal distribution). The probability density of the split-normal distribution is given by Julio (2007) as

where represents the mean parameter, and the two standard deviations and can be derived given the overall uncertainty parameter, and skewness parameters, , as;

As no split normal distribution existed in R, I added routines for a density, distribution and quantile function, plus a random generator, to a new version (2.1) of the fanplot package. I used the formula in Julio (2007) to code each of the three functions, and checked the results against those from the fan chart MATLAB code.

Fan Chart Plots for CPI.

Once I had the `qsplitnorm`

function working properly, producing the fan chart plot in R was pretty straight-forward. I added two data objects to the fanplot package to help readers reproduce my plots below. The first, ` cpi`

, is a time series object with past values of CPI index. The second, `boe`

, is a data frame with historical details on the split normal parameters for CPI inflation between Q1 2004 to Q4 2013 forecasts by the Bank of England.

> library("fanplot") > head(boe) time0 time mean uncertainty skew 1 2004 2004.00 1.34 0.2249 0 2 2004 2004.25 1.60 0.3149 0 3 2004 2004.50 1.60 0.3824 0 4 2004 2004.75 1.71 0.4274 0 5 2004 2005.00 1.77 0.4499 0 6 2004 2005.25 1.68 0.4761 0

The first column `time0`

refers to the base year of forecast, the second, `time`

indexes future projections, whilst the remaining three columns provide values for the corresponding projected mean (), uncertainty () and skew () parameters:

Users can replicate past Bank of England fan charts for a particular period after creating a matrix object that contains values on the split-normal quantile function for a set of user defined probabilities. For example, in the code below, a subset of the Bank of England future parameters of CPI published in Q1 2013 are first selected. Then a vector of probabilities related to the percentiles, that we ultimately would like to plot different shaded fans for, are created. Finally, in a `for`

loop the `qsplitnorm`

function, calculates the values for which the time-specific (`i`

) split-normal distribution will be less than or equal to the probabilities of `p`

.

# select relevant data y0 <- 2013 boe0 <- subset(boe, time0==y0) k <- nrow(boe0) # guess work to set percentiles the BOE are plotting p <- seq(0.05, 0.95, 0.05) p <- c(0.01, p, 0.99) # quantiles of split-normal distribution for each probability # (row) at each future time point (column) cpival <- matrix(NA, nrow = length(p), ncol = k) for (i in 1:k) cpival[, i] <- qsplitnorm(p, mean = boe0$mean[i], sd = boe0$uncertainty[i], skew = boe0$skew[i])

The new object `cpival`

contains the values evaluated from the `qsplitnorm `

function in 6 rows and 13 columns, where rows represent the probabilities used in the calculation `p`

and columns represent successive time periods.

The object `cpival`

can then used to add a fan chart to the active R graphic device. In the code below, the area of the plot is set up when plotting the past CPI data, contained in the time series object `cpi`

. The `xlim`

arguments are set to ensure space on the right hand side of the plotting area for the fan. Following the Bank of England style for plotting fan charts, the background for future values is set to a gray colour, y-axis are plotted on the right hand side, a horizontal line are added for the CPI target and a vertical line for the two-year ahead point.

# past data plot(cpi, type = "l", col = "tomato", lwd = 2, xlim = c(y0 - 5, y0 + 3), ylim = c(-2, 7), xaxt = "n", yaxt = "n", ylab="") # background rect(y0 - 0.25, par("usr")[3] - 1, y0 + 3, par("usr")[4] + 1, border = "gray90", col = "gray90") # add fan fan(data = cpival, data.type = "values", probs = p, start = y0, frequency = 4, anchor = cpi[time(cpi) == y0 - 0.25], fan.col = colorRampPalette(c("tomato", "gray90")), ln = NULL, rlab = NULL) # boe aesthetics axis(2, at = -2:7, las = 2, tcl = 0.5, labels = FALSE) axis(4, at = -2:7, las = 2, tcl = 0.5) axis(1, at = 2008:2016, tcl = 0.5) axis(1, at = seq(2008, 2016, 0.25), labels = FALSE, tcl = 0.2) abline(h = 2) #boe cpi target abline(v = y0 + 1.75, lty = 2) #2 year line

The fan chart itself is outputted from the `fan`

function, where arguments are set to ensure a close resemblance of the R plot to that produced by the Bank of England. The first three arguments in the `fan`

function called in the above code, provide the `cpival`

data to plotted, indicate that the data are a set of calculated values (as opposed to simulations) and provide the probabilities that correspond to each row of `cpival`

object. The next two arguments define the start time and frequency of the data. These operate in a similar fashion to those used when defining time series in R with the `ts`

function. The `anchor`

argument is set to the value of CPI before the start of the fan chart. This allows a join between the value of the Q1 2013 observation and the fan chart. The `fan.col`

argument is set to a colour palette for shades between `tomato`

and `gray90`

. The final two arguments are set to `NULL`

to suppress the plotting of contour lines at the boundary of each shaded fan and their labels, as per the Bank of England style.

Default Fan Chart Plot.

By default, the `fan`

function treats objects passed to the `data`

argument as simulations from sequential distributions, rather than user-created values corresponding probabilities provided in the `probs`

argument (as above). An alternative plot below, based on simulated data and default style settings in the `fan`

function produces a fan chart with a greater array of coloured fans with labels and contour lines alongside selected percentiles of the future distribution. To illustrate we can simulate 10,000 values from the future split-normal distribution parameters from Q1 2013 in the `boe0`

data frame using the `rsplitnorm `

function

#simulate future values cpisim <- matrix(NA, nrow = 10000, ncol = k) for (i in 1:k) cpisim[, i] <- rsplitnorm(n=10000, mean = boe0$mean[i], sd = boe0$uncertainty[i], skew = boe0$skew[i])

The fan chart based on the simulations in `cpisim`

can then be added to the plot;

# truncate cpi series cpi0 <- ts(cpi[time(cpi)<2013], start=start(cpi), frequency=frequency(cpi) ) # past data plot(cpi0, type = "l", lwd = 2, xlim = c(y0 - 5, y0 + 3.25), ylim = c(-2, 7)) # add fan fan(data = cpisim, start = y0, frequency = 4)

The `fan`

function calculates the values of 100 equally spaced percentiles of each future distribution when the default `data.type = "simulations"`

is set. This allows 50 fans to be plotted from the `heat.colours`

colour palate, providing a finer level of shading in the representation of future distributions. In addition, lines and labels are provided along each decile. The fan chart does not connect to the last observation as `anchor = NULL`

by default.

Great stuff, thanks a lot for sharing!

I was able to reproduce the graphs without too many problems. Comments:

1. the plus sign in front of xx[,t], presumably added automatically by the R or html editor, should be removed, as far as I could tell.

2. to match exactly the incomplete data for 2013 should be dropped, e.g. add this line: cpi <- na.omit(cpi).

3. Here's the data I used to replicate:

cpi

Qtr1 Qtr2 Qtr3 Qtr4

1997 1.9 1.6 2.0 1.9

1998 1.6 2.0 1.3 1.4

1999 1.4 1.3 1.2 1.2

2000 0.9 0.5 0.6 1.1

2001 0.8 1.7 1.8 0.8

2002 1.5 0.8 1.0 1.5

2003 1.6 1.3 1.4 1.3

2004 1.3 1.5 1.3 1.5

2005 1.7 1.9 2.4 2.1

2006 2.0 2.2 2.5 2.7

2007 2.8 2.5 1.8 2.1

2008 2.5 3.3 4.7 4.1

2009 3.2 2.2 1.6 1.9

2010 3.0 3.4 3.1 3.3

2011 4.4 4.5 4.5 4.8

2012 3.4 2.8 2.5 2.7

cpiboe

Year Quarter Mode Median Mean Uncertainty Skewness

1 2013 Q1 2.73 2.73 2.73 0.61 0

2 2013 Q2 2.92 2.92 2.92 0.88 0

3 2013 Q3 3.22 3.22 3.22 1.11 0

4 2013 Q4 3.13 3.13 3.13 1.27 0

5 2014 Q1 2.95 2.95 2.95 1.34 0

6 2014 Q2 2.82 2.82 2.82 1.37 0

7 2014 Q3 2.53 2.53 2.53 1.39 0

8 2014 Q4 2.41 2.41 2.41 1.42 0

9 2015 Q1 2.32 2.32 2.32 1.48 0

10 2015 Q2 2.23 2.23 2.23 1.49 0

11 2015 Q3 2.13 2.13 2.13 1.50 0

12 2015 Q4 2.01 2.01 2.01 1.52 0

13 2016 Q1 1.96 1.96 1.96 1.52 0

Thanks Patrick. You are correct (on all your points). I have updated the post to show all the data, hopefully making it easier to reproduce.

Thanks Guy, fantastic tool you have made available here. There are so many applications!

It would be an interesting exercise to take a look at old B of E fan charts and overlay the actual data on top of their prediction fans, and see whether the frequency is consistent with the split normal assumption used.

Thanks for sharing this post. I have been trying to replicate the graph, but as soon as i run the follwing command statement along with the qsplitnorm i get the an error:

> for (i in 1:k) {

+ xx[, i] 1 & skew < 1) stop("skew must be between -1 and 1") :

argument is of length zero

. I am not sure what am i missing. Would you be able to let me know if there is something very obvious that i am missing.

Hi Atmajit. Not to sure what it could be from the code you give. Perhaps it could be in the format of the Bank of England data or the column names in the cpiboe data.frame?

I just managed to replicate the example by copying the Bank of England data in the post below the lines…

> cpiboe <- read.csv("C:/Users/…/cpiboe.csv")

> cpiboe

into excel. Then I used the “Text to Columns” button in excel to get the data into separate columns (see point 3 here… http://stackoverflow.com/questions/14096692/getting-r-studio-lmer-output-to-word-excel/16033373#16033373). Then I saved the file as cpiboe.csv and adjusted the path name to read the csv file into R. The code should then run fine?

Not sure why time0 has nine replications?

Its just a variable in the boe object to indicate which quarter the foretasted mean, uncertainty and skewness are from. Its not used in the fan function at all, just to obtain the subset of the boe forecast data that (in the plot above) refer to the desired launch period (2013 Q1 above).

Just a little clarification. Two piece distributions (including the split-normal) were known long before Julio (2007) and John (1982). The two-piece normal seems to have been introduced by Fechner (1897). A nice compilation of the re-inventions of this sort of distributions is presented in:

http://projecteuclid.org/euclid.ss/1399645739

Also $\mu$ is not the mean, but a location parameter in the split-normal (two-piece normal).