Wednesday 11 December 2019

Statistical rethinking: grid approximation for estimating a posterior

My colleague James Cotton recommended that a good way to expand my stats knowledge is to read the book Statistcal Rethinking by Richard McElreath, which so far I'm enjoying a lot!

A nice aspect of the book is that he uses many examples in R. 

Grid approximation for estimating a posterior distribution
In chapter 2, he shows how to estimate a posterior distribution using a 'grid approximation' approach. I've wrapped up his lines of code into a little function that lets you try out different number of grid points.

In this case, we are interested in calculating the posterior distribution for p, the probability of success for a binomial variable.

We can specifiy different prior distributions for p.

Let's assume that we have carried out 9 Bernoulli trials and observed 6 successes so far. 

Let's try a uniform prior:

plot_posterior_using_uniform_prior <- function(num_points_in_grid)
{
    p_grid <- seq(from = 0, to=1, length.out=num_points_in_grid)
    prior <- rep(1,num_points_in_grid)
    likelihood <- dbinom(6, size=9, prob=p_grid)
    ustd.posterior <- likelihood * prior
    posterior <- ustd.posterior/sum(ustd.posterior)
    plot(p_grid, posterior, type="b", xlab="probability of success", ylab="posterior probability for p")
}

Do a plot with a grid with 30 values for p:
plot_posterior_using_uniform_prior(num_points_in_grid=30) 


Let's try 10 values for p:
plot_posterior_using_uniform_prior(num_points_in_grid=10)
This doesn't give us such a good estimate of the posterior distribution for p:




What if we had a different prior, that was a step function, assigning 0 probability to all values of p less than 0.5:


plot_posterior_using_step_prior <- function(num_points_in_grid)
{
     p_grid <- seq(from = 0, to=1, length.out=num_points_in_grid) 
     prior <- ifelse ( p_grid < 0.5, 0, 1)
     likelihood <- dbinom(6, size=9, prob=p_grid) 
     ustd.posterior <- likelihood * prior 
     posterior <- ustd.posterior/sum(ustd.posterior) 
     plot(p_grid, posterior, type="b", xlab="probability of success", ylab="posterior probability for p")
}

plot_posterior_using_step_prior(num_points_in_grid=30)
 
 
Quadratic approximation for estimating a posterior distribution 
Chapter 2 of Statistical Rethinking also explains that the grid approximation approach doesn't scale well and is slow, and that a faster, more scaleable approach is quadratic approximation. McElreath shows how it can be used to estimate the posterior distribution for the same parameter p, as above: (see his book for how to install the 'rethinking' R package):

library(rethinking)
globe.qa <- quap(alist(W ~ dbinom(W+L,p), p ~ dunif(0,1)), data=list(W=6,L=3)
+ )
precis(globe.qa)
  mean   sd 5.5% 94.5%
p 0.67 0.16 0.42  0.92
 
This tells us that, using a uniform prior and assuming the posterior for p is Gaussian (the key assumption of the quadratic approximation approach), its maximum (peak) is at 0.67 and its standard deviation is 0.16.
 
This is similar to the peak of the posterior distribution for p that we estimated using the grid approximation approach above (when using a large number of grid points such as 30).