Wednesday 24 April 2013

Chi-squared goodness-of-fit tests using R

By default, the chisq.test() function in R assumes that the degrees of freedom of your chi-squared goodness-of-fit test should be equal to (the number of categories in the data) - 1.

For example, say we have the following observed and expected values in 12 different categories:

> obs <- c(57,203,383,525,532,408,273,139,49,27,10,6)
> exp <- c(54.10,209.75,406.61,525.47,509.31,394.92,255.19,141.34,68.50,29.51,11.44,5.86)

We can carry out a chi-squared goodness-of-fit test using R:
> chisq.test(obs,p=exp,rescale.p=TRUE)
X-squared = 10.419, df = 11, p-value = 0.4931

This assumes the degrees of freedom is 11, and gives a p-value of 0.4931.

However, is our degrees of freedom 11? In some cases, we may have used a model to calculate the expected values (eg. a Poisson model, or some other probability model), and we may have estimated one or more parameters of the model from the data. We need to take the number of parameters estimated from the data into account when calculating the degrees of freedom. For a chi-squared goodness-of-fit test, the degrees of freedom should be equal to (the number of categories in the data) - (the number of parameters estimated from the data) - 1.

So, for example, if we estimated one parameter from the data in order to calculate the expected values in our example above, the degrees of freedom would be 12 - 1 - 1 = 10. But the chisq.test function assumes our degrees of freedom is 11! In this case you can use the following function to carry out your chi-squared goodness-of-fit test:

> chiSquaredTestOfGoodnessOfFit <- function(exp, obs, estimatedParameters)
  {
      chiSquared1 <- chisq.test(obs, p=exp, rescale.p=TRUE)
      chiSquared2 <- chiSquared1$statistic
      chiSquared <- chiSquared2[[1]]
      degreesOfFreedom <- length(obs) - estimatedParameters - 1
      pvalue <- 1 - pchisq(chiSquared, df = degreesOfFreedom)
      print(paste("Chi-squared=",chiSquared,", p-value=",pvalue,", df=",degreesOfFreedom))
   }

For example, for our example data, this gives:

> chiSquaredTestOfGoodnessOfFit(exp, obs, 1)
[1] "Chi-squared= 10.4189993163364 , p-value= 0.404533217496095 , df= 10"

This tells us that the degrees of freedom is 10 (as desired), and the p-value is 0.405.

No comments: