## Monday, 27 May 2013

### Probability distributions in R

Probability mass functions (p.m.f.s)
To calculate P(X = x) for a discrete distribution, you can use the probability mass function (p.m.f.).
In R, these usually have function names beginning with 'd'.

For example, to calculate the probability of obtaining 7 'Yes' responses and 3 'No' responses from a sample of 10 people questioned, if the probability of a 'Yes' is 1/3, we use the p.m.f. for a Binomial distribution:
> dbinom(7, size=10, prob=1/3)
[1] 0.01625768
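
As a quick sanity check, the same value can be computed directly from the Binomial formula P(X = k) = choose(n, k) * p^k * (1 - p)^(n - k). The sketch below uses base R's choose(); the variable names n, k, p, by_hand and built_in are purely illustrative.

```r
# Binomial p.m.f. by hand: choose(n, k) * p^k * (1 - p)^(n - k)
n <- 10    # number of people questioned
k <- 7     # number of 'Yes' responses
p <- 1/3   # probability of a 'Yes'

by_hand  <- choose(n, k) * p^k * (1 - p)^(n - k)
built_in <- dbinom(k, size = n, prob = p)

# The two values should agree up to floating-point error
print(isTRUE(all.equal(by_hand, built_in)))
```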

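Probability density functions (p.d.f.s)
For a continuous distribution, the probability density function (p.d.f.) gives the density f(x) at a value x. A density is not itself a probability: P(X = x) is zero for a continuous variable, and the probability of a range of values is the area under the density curve over that range. As for p.m.f.s, in R the density functions usually have names beginning with 'd' (for example dnorm()), while the areas under them are given by the cumulative distribution functions, whose names begin with 'p' (for example pnorm()).

For example, for a Normal distribution with mean 100 and standard deviation 15 (variance 225), to calculate the probability of observing a score of 110 or higher, we need the area under the density above 110, which we get from pnorm():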
> 1 - pnorm(110, mean=100, sd=15)
[1] 0.2524925
We get the same answer by typing:
> pnorm(110, mean=100, sd=15, lower.tail=FALSE)
[1] 0.2524925
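
To see how the density and the tail probability relate, the sketch below evaluates dnorm() at 110 and then integrates the density numerically from 110 upwards with base R's integrate(). The variable names dens, tail_prob and area are illustrative, and the comparison tolerance is an assumption chosen to allow for numerical integration error.

```r
# Density at x = 110: the height of the curve, not a probability
dens <- dnorm(110, mean = 100, sd = 15)

# Upper-tail probability P(X >= 110) from the c.d.f.
tail_prob <- pnorm(110, mean = 100, sd = 15, lower.tail = FALSE)

# The same probability as the area under the density from 110 to infinity
area <- integrate(dnorm, lower = 110, upper = Inf, mean = 100, sd = 15)$value

# The area should match the c.d.f. result up to integration error
print(isTRUE(all.equal(tail_prob, area, tolerance = 1e-4)))
```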

Cumulative distribution functions (c.d.f.s)
To calculate P(X <= x), we can use cumulative distribution functions.
In R, these usually have function names beginning with 'p'.

For example, for an exponential distribution with rate parameter 0.25, to calculate P(X <= 2), we type:
> pexp(2, rate = 0.25)
[1] 0.3934693

We can use the cumulative distribution function to find the probability that X will lie in a given range. For example, to calculate P(5 <= X <= 10), we type (again using an exponential distribution with rate parameter 0.25):
> pexp(10, rate = 0.25) - pexp(5, rate = 0.25)
[1] 0.2044198
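
The exponential c.d.f. has the closed form F(x) = 1 - exp(-rate * x), so the interval probability can be cross-checked by hand. In the sketch below, exp_cdf() is a hypothetical helper written only for this comparison; it is not part of R.

```r
rate <- 0.25

# Hypothetical helper: closed-form exponential c.d.f., F(x) = 1 - exp(-rate * x)
exp_cdf <- function(x, rate) 1 - exp(-rate * x)

# P(5 <= X <= 10) as a difference of two c.d.f. values
interval_builtin <- pexp(10, rate = rate) - pexp(5, rate = rate)
interval_by_hand <- exp_cdf(10, rate) - exp_cdf(5, rate)

# Both routes should give the same probability
print(isTRUE(all.equal(interval_builtin, interval_by_hand)))
```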

Similarly, if the probability of getting one answer right by chance in a multiple choice exam is 1/5, the probability of getting ten or more answers right (out of twenty questions) by chance is (using a Binomial distribution):
> 1 - pbinom(9, size=20, prob=1/5)
[1] 0.002594827
We also get the same answer if we type:
> pbinom(9, size=20, prob=1/5, lower.tail=FALSE)
[1] 0.002594827
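
For a discrete distribution, an upper-tail probability is just a sum of p.m.f. values, so P(X >= 10) can also be checked by adding up dbinom() over 10 to 20. This is a minimal sketch; the variable names by_sum and by_cdf are illustrative.

```r
# P(X >= 10) as a sum of the p.m.f. over k = 10, 11, ..., 20
by_sum <- sum(dbinom(10:20, size = 20, prob = 1/5))

# The same probability from the c.d.f., using the upper tail above 9
by_cdf <- pbinom(9, size = 20, prob = 1/5, lower.tail = FALSE)

# The two values should agree up to floating-point error
print(isTRUE(all.equal(by_sum, by_cdf)))
```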

Another example is using a Geometric distribution to find the probability that you need to roll a die at most 3 times to obtain a six:
> pgeom(2, prob=1/6)
[1] 0.4212963
[Note: the pgeom() function in R takes as its argument the number of failures before the first success, so needing at most 3 rolls corresponds to at most 2 failures.]
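
Because the failures-before-success convention is easy to get wrong by one, it may help to confirm the result in two other ways: by summing dgeom() over 0, 1 and 2 failures, and via the complement, since needing more than 3 rolls means the first 3 rolls all fail. The variable names below are illustrative.

```r
p <- 1/6   # probability of rolling a six

# At most 3 rolls means at most 2 failures before the first six
by_cdf <- pgeom(2, prob = p)

# Sum of P(exactly 0, 1 or 2 failures before the first six)
by_sum <- sum(dgeom(0:2, prob = p))

# Complement: 1 - P(the first 3 rolls are all failures)
by_complement <- 1 - (1 - p)^3

# All three routes should give the same probability
print(isTRUE(all.equal(by_cdf, by_sum)) && isTRUE(all.equal(by_cdf, by_complement)))
```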