## Wednesday, 10 April 2013

### Making maximum likelihood estimates of parameters using R

Making a maximum likelihood estimate of a binomial probability
To make a maximum likelihood estimate of a binomial probability you can use the mle2() function in the 'bbmle' R package.

For example, if you roll a particular (six-sided) die 10 times, and observe seven '5's, then you can estimate the probability of getting a '5' when you roll that die. Here our data is simply that we observe seven '5's:
> mydata <- c(7)
We just have one observation in our data set here.

Next we need to construct a negative log-likelihood function, as the mle2() R function (which we will use to calculate the maximum likelihood estimate, see below) requires a negative log-likelihood function as input, rather than a likelihood function.

Our negative log likelihood function will be minus the log of the probability of observing seven '5's out of 10 throws, according to a binomial distribution B(10, p) with a particular value of the binomial probability, p='prob':
> myfunc <- function(size,prob) {  -sum(dbinom(mydata,size,prob,log=TRUE))  }

Then we can call the 'mle2' function in the 'bbmle' package to get the maximum likelihood estimate of the probability of observing a '5' with our die, given that we observed seven '5's on 10 throws of the die:
> library('bbmle')
> mle2(myfunc, start=list(prob=0.5), data=list(size=10))
Coefficients:
prob
0.7000032
Log-likelihood: -1.32

This tells us that the maximum likelihood estimate of the binomial probability is 0.7. That is, this is our estimate of the probability of observing a '5' with our die. This makes sense, as we did observe '5's on 7 out of 10 throws.

The log likelihood is given as -1.32. This is the log of the probability of observing seven '5's on 10 throws of a die that has p=0.7:
> log(dbinom(7, size=10, prob=0.7))
 -1.321151

We can also make a plot of the likelihood function L(p) plotted against the possible values of the binomial probability p (Note: the 'curve' function requires the function in terms of x, so we use x here in R instead of p):
> curve(dbinom(7, 10, x))

We see that the peak is at about 0.7.

Note I have plotted the binomial probability mass function (p.m.f.) at each possible value of p, this is the likeihood function. Note that in 'myfunc' above, we used the negative log likeilhood function (because mle2() requires a negative log likelihood function rather than a likelihood function), which is -log of the binomial probability mass function:
> curve(-log(dbinom(7,10,x)))

The negative log likelihood function has its minimum where the likelihood function has its maximum, that is, at p = 0.7.

Making a maximum likelihood estimate of a geometric parameter
A geometric distribution is indexed by one parameter, p.  Say we observe three values, 3, 4, 8, that come from a geometric distribution with unknown value of the parameter p. These are the numbers of tries up to and including the first success.

The dgeom function in R assumes the data is in the form of the number of failures before the first success. Thus, the values given to dgeom should be 2, 3, and 7.

We can make a maximum likelihood estimate of the value of p by typing:
> mydata <- c(2, 3, 7)
> myfunc <- function(prob) {  -sum(dgeom(mydata,prob,log=TRUE))  }
> library('bbmle')
> mle2(myfunc, start=list(prob=0.5))
Coefficients:
prob
0.2000023
Log-likelihood: -7.51

This tells us that the maximum likelihood estimate of p is about 0.2.

We can also make a plot of the likelihood function L(p) plotted against the possible values of the geometric parameter p:
> xvalues <- seq(0.01,0.99,by=0.01)
> myfunc2 <- function(prob) {  prod(dgeom(mydata,prob))  }
> yvalues <- sapply(xvalues, myfunc2)
> plot(xvalues,yvalues,type="l")

The peak of the plot is at 0.2. This makes sense as the average of 3, 4, and 8 is 5, and 1/5 is 0.2.

Note I have plotted the products of the geometric probability mass function (p.m.f.) for the data at each possible value of p, which is the likelihood function. In 'myfunc' above, we used the negative log likelihood function, which is just the negative of the sum of the logs of the geometric probability mass functions.

Making a maximum likelihood estimate of a Poisson parameter
A Poisson distribution is indexed by one parameter, lambda. Say we have observed counts of some event with the following frequencies:
Count         Frequency
0                 58
1                 25
2                 13
3                 2
4                 2
5                 1
6                 1
7                 0
8                 1
>=9             0
Suppose the counts are observations from a Poisson distribution with mean lambda. We can make a maximum likelihood estimate of the value of lambda by typing:
> mydata <- c(rep(0,58),rep(1,25),rep(2,13),rep(3,2),rep(4,2),rep(5,1),rep(6,1),rep(8,1))
> myfunc <- function(lambda) {  -sum(dpois(mydata,lambda,log=TRUE))  }
> library('bbmle')
> mle2(myfunc, start=list(lambda=0.5))
lambda
0.8155337
Log-likelihood: -142.05

That is, the maximum likelihood estimate of lambda is 0.8155337, that is, about 0.816. This makes sense, as the sample mean is about 0.816:
> mean(mydata)
 0.815534

Making a maximum likelihood estimate of an exponential parameter
An exponential distribution has one parameter, theta. Say we observe the waiting times between successive occurrences of some event (eg. a town flooding) to be 5 years, 8 years, and 7 years. Suppose the time between occurrences of the event are observations from an exponential distribution with parameter theta. We can make a maximum likelihood estimate of theta by typing:

> mydata <- c(5,8,7)
> myfunc <- function(x) {  -sum(dexp(mydata,x,log=TRUE))  }
> library('bbmle')
> mle2(myfunc, start=list(x=0.5))
Coefficients:
x
0.15
Log-likelihood: -8.69

That is,  a maximum likelihood estimate of the value of theta is 0.15.

Making a maximum likelihood estimate of a Rayleigh parameter
The probability density function (p.d.f.) of a Rayleigh distribution is:

f(x; theta) =  (x/theta^2) * exp(-x^2/(2*theta^2)), where theta > 0

That is, the distribution has one parameter, theta. Say we observe a random sample of six observations of X: 22.2, 2.8, 4.0, 13.9, 11.7 and 8.3. We can make a maximum likelihood estimate of theta as follows:

> mypdf <- function(x, theta)
{
ifelse(theta > 0,  (x/(theta^2)) * exp(-(x*x)/(2*theta*theta)), NA)
}
> mydata <- c(22.2, 2.8, 4.0, 13.9, 11.7, 8.3)
> myfunc <- function(theta)
{
mylen <- length(mydata)
myprod <- mypdf(mydata, theta)
for (i in 2:mylen) { myprod <- myprod * mypdf(mydata[i], theta) }
mylogL <- -log(myprod)
return(mylogL)
}

Here the function mypdf() is for calculating the p.d.f., and the function myfunc() is for calculating the negative of the log likelihood. Now calculate the maximum likelihood estimate of theta:

> library('bbmle')
> mle2(myfunc, start=list(theta=8.5))
8.735013
Log-likelihood: -19.28

This tells us that the maximum likelihood estimate of theta is about 8.735.

Making a maximum likelihood estimate of a parameter in a model
Say we observe four different types of objects, 187 of object 1, 35 of object 2, 37 of object 3, and 31 of object 4.

And say we have some theory (model) that says that the four types of objects should occur with probabilities (9/16) + p, (3/16) - p, (3/16) - p, and (1/16) + p, respectively, where the value of parameter p is unknown.

Therefore, the likelihood function is:
L(p) = (((9/16) + p)^187) * (((3/16) - p)^35) * (((3/16) - p)^37) * (((1/16) + p)^31)
= (((9/16) + p)^187) * (((3/16) - p)^72) * (((1/16) + p)^31)

We can plot this function in R by typing (note that we use x instead of p in R, as the curve() function expects a function in terms of x):
> curve(  (((9/16)+x)^187) * (((3/16)-x)^72) * (((1/16)+x)^31)  )

We see that the function has a vertical asymptote at p=1. However, suppose we have some prior information that the value of p that we are interested in is just between 0 and 0.1. We can plot the function for this range:
> curve(  (((9/16)+x)^187) * (((3/16)-x)^72) * (((1/16)+x)^31)  , from = 0, to = 0.1)

We see that there is a local maximum at about 0.058 or 0.059.

To estimate the maximum likelihood value of p, we need to define the negative log likelihood function for just the range 0 to 0.1:
> myfunc <- function(p)
{
ifelse((0 <= p & p <= 0.1),-log((((9/16)+p)^187) * (((3/16)-p)^72) * (((1/16)+p)^31)),NA)
}
> mle2(myfunc, start=list(p=0.05))
Coefficients:
p
0.05838241
Log-likelihood: -302.01

This tells us that the maximum likelihood estimate of p is 0.05838241, that is, about 0.0584.

Note that if we try to estimate the maximum likelihood estimate of p by using the function:
-log((((9/16)+p)^187) * (((3/16)-p)^72) * (((1/16)+p)^31))
without restricting the range of p to 0-0.1, the mle2() function will give us an error message, because of the vertical asymptote at p=1.

Furthermore, if we give mle2() a starting value that is too far from 0.05838241, it will give an error message, eg.:
> mle2(myfunc, start=list(p=0))
Error in optim(par = 0, fn = function (p)  :
non-finite finite-difference value