Wednesday, 13 March 2013

Two-sample tests in R

Here are some nice two-sample hypothesis tests that we can do in R:

Testing the difference between two proportions:
To test the difference between two proportions (Bernoulli probabilities) you can use the following function, which does a Z-test:
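> testDifferenceBetweenProportions <- function(x1, n1, x2, n2)
   {
         # z.test() comes from the TeachingDemos package; stop early if it is not installed
         if (!requireNamespace("TeachingDemos", quietly = TRUE)) stop("Please install the 'TeachingDemos' package")
         d <- (x1/n1) - (x2/n2)                        # observed difference in proportions
         estp <- (x1+x2)/(n1+n2)                       # pooled proportion under the null hypothesis
         estvar <- estp*(1-estp)*( (1/n1) + (1/n2))    # variance of the difference under the null
         TeachingDemos::z.test(d, sd=sqrt(estvar))     # z-test of d against a mean of 0
   }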
For example, to test the difference between 125/198 and 173/323:
> testDifferenceBetweenProportions(125,198,173,323)
z = 2.1431, n = 1.000, Std. Dev. = 0.045, Std. Dev. of the sample mean = 0.045, p-value = 0.0321
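This tells us that the p-value is 0.0321. Since this is below 0.05, there is moderate evidence against the null hypothesis that the underlying proportions are equal, and we would reject it at the 5% significance level. (The output reports n = 1 because we pass z.test() a single value, the difference d, together with its standard error.)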
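As a cross-check that does not need the TeachingDemos package, the same pooled Z-test can be sketched by hand in base R and compared with prop.test(). This is only an illustration, not a replacement for the function above; the variable names (p_pool, se, p_val) are my own, and correct = FALSE is needed because prop.test() applies a continuity correction by default.

```r
# Counts and sample sizes from the example above
x <- c(125, 173)
n <- c(198, 323)

# Pooled two-proportion z-test by hand
p_hat  <- x / n                                   # sample proportions
p_pool <- sum(x) / sum(n)                         # pooled proportion under the null
se     <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))
z      <- (p_hat[1] - p_hat[2]) / se
p_val  <- 2 * pnorm(-abs(z))                      # two-sided p-value

# Same test via base R, with the continuity correction switched off
res <- prop.test(x, n, correct = FALSE)

# The chi-squared statistic should equal z^2, and the p-values should agree
all.equal(unname(res$statistic), z^2)
all.equal(res$p.value, p_val)
```

Here z and p_val should reproduce the z = 2.1431 and p-value = 0.0321 reported above.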

Testing the difference between the means of two samples (two-sample t-test):
If we have independent samples from two different populations, and can safely assume that the variation in each population is adequately modelled by a normal distribution and that the population variances are equal, then we can use a two-sample t-test to test the hypothesis that the means of the two populations are equal.

We can do a two-sample t-test in R by typing:
> t.test(x1, x2, var.equal=TRUE)
where x1 and x2 are your two variables, and 'var.equal=TRUE' tells R that we are assuming that the variances of the two populations are equal, and so the pooled variance is used to estimate the variance.
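To make this concrete, here is a small sketch on simulated data (the samples, seed and sizes are made up purely for illustration). It runs the pooled t-test and then recomputes the t statistic by hand from the pooled variance, to show what 'var.equal=TRUE' is doing.

```r
# Simulated samples, for illustration only
set.seed(1)
x1 <- rnorm(20, mean = 10, sd = 2)
x2 <- rnorm(25, mean = 11, sd = 2)

# Pooled-variance two-sample t-test
res <- t.test(x1, x2, var.equal = TRUE)

# The same statistic computed by hand
n1  <- length(x1)
n2  <- length(x2)
sp2 <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)  # pooled variance
t_manual <- (mean(x1) - mean(x2)) / sqrt(sp2 * (1 / n1 + 1 / n2))

# Should be TRUE: t.test() and the manual calculation agree
all.equal(unname(res$statistic), t_manual)

# Degrees of freedom are n1 + n2 - 2
res$parameter
```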

Note that, as a rule of thumb, you can probably be fairly safe in assuming that the population variances are equal if the ratio of the larger sample variance to the smaller one is less than 3. When the variances really are equal, the t-test that assumes equal variances is somewhat more powerful than the default t-test in R (i.e. t.test() without 'var.equal=TRUE'), which is Welch's t-test and doesn't assume equal variances.
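The rule of thumb can be checked in a couple of lines. The sketch below uses simulated data (made up for illustration) and the threshold of 3 from the note above; it then runs both versions of the test so that their degrees of freedom can be compared.

```r
# Simulated samples, for illustration only
set.seed(42)
a <- rnorm(30, mean = 5.0, sd = 1.0)
b <- rnorm(30, mean = 5.5, sd = 1.5)

# Ratio of the larger sample variance to the smaller one (always >= 1)
var_ratio <- max(var(a), var(b)) / min(var(a), var(b))

# Rule of thumb: treat the variances as equal if the ratio is below 3
use_pooled <- var_ratio < 3

# Run both versions of the test
pooled <- t.test(a, b, var.equal = TRUE)  # pooled-variance t-test
welch  <- t.test(a, b)                    # default: Welch's t-test

# Welch's test never has more degrees of freedom than the pooled test
c(ratio     = var_ratio,
  df_pooled = unname(pooled$parameter),
  df_welch  = unname(welch$parameter))
```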

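Note: if you only have summary statistics (mean, standard deviation, sample size) for each of your two samples, you can use the t.test2() function contributed to Stack Exchange. It is a user-written function, not part of base R, so you need to copy its definition into your session first.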
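For the equal-variance case, the calculation from summary statistics is short enough to sketch directly. The helper below, tTestFromSummary(), is a hypothetical name of my own and is not the Stack Exchange t.test2() function; it only covers the pooled-variance, two-sided test. The simulated data are there just to check it against t.test().

```r
# Hypothetical helper: pooled two-sample t-test from summary statistics
# m = sample mean, s = sample standard deviation, n = sample size
tTestFromSummary <- function(m1, s1, n1, m2, s2, n2) {
  df  <- n1 + n2 - 2
  sp2 <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df  # pooled variance
  se  <- sqrt(sp2 * (1 / n1 + 1 / n2))             # standard error of the difference
  t   <- (m1 - m2) / se
  p   <- 2 * pt(-abs(t), df)                       # two-sided p-value
  c(t = t, df = df, p.value = p)
}

# Check against t.test() on simulated data (for illustration only)
set.seed(7)
y1 <- rnorm(15, mean = 3, sd = 1)
y2 <- rnorm(18, mean = 4, sd = 1)

from_summary <- tTestFromSummary(mean(y1), sd(y1), length(y1),
                                 mean(y2), sd(y2), length(y2))
from_raw <- t.test(y1, y2, var.equal = TRUE)

# Both should be TRUE
all.equal(unname(from_summary["t"]), unname(from_raw$statistic))
all.equal(unname(from_summary["p.value"]), from_raw$p.value)
```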
