Friday 31 July 2015

Calculating a confidence interval for the variance of a Normal distribution using R

Here are two ways to calculate a confidence interval for the variance of a Normal distribution using R. As an example, I've created some fake data, by simulating 100 data points from a standard Normal distribution:
> mydata <- rnorm(100)
As it's a standard Normal distribution, the population variance is 1.
> var(mydata)
[1] 1.091236
The sample variance is s^2 = 1.091236.

1. Using the R 'lavaan' package:
I found out about this from stackexchange.com. This uses the lavaan R package.
> install.packages("lavaan")
> library(lavaan)
> mydataframe <- as.data.frame(mydata)
> names(mydataframe) <- "x"
> mymodel <- 'x ~~ x'
> fit <- sem(mymodel, data=mydataframe, likelihood = "wishart")
> parameterEstimates(fit)
  lhs op rhs   est    se     z pvalue ci.lower ci.upper
1   x ~~   x 1.091 0.155 7.036      0    0.787    1.395

This gives us the confidence interval (0.787, 1.395) for the population variance. I assume this is a 95% confidence interval.

Note added some days later: I asked on the lavaan googlegroup, and they said that lavaan is not using the chi-squared distribution to calculate a confidence interval, it is using a Z-statistic. They said that as a result it does not calculate a very reliable confidence interval, and that this is discussed further here .

2. Using the chi-squared distribution
We can use the chi-squared distribution to provide a confidence interval for the variance of a Normal distribution. I'm not sure if this is what the 'lavaan' package does already, I don't think so however, as my results don't match those of the lavaan package (see below). As we have 100 data points (n=100), we use a chi-squared distribution with n-1=99 degrees of freedom. As we want a 95% confidence interval (alpha=0.05), we want the 0.025 (=0.05/2) and 0.975 (=1 - (0.05/2)) quantiles of the chi-squared distribution with 99 degrees of freedom:
> qchisq(0.025, df=99)
[1] 73.36108
> qchisq(.975, df=99) 
[1] 128.422
Then our confidence interval is given by ( (n-1)s^2 / 128.422, (n-1)s^2 / 73.36108), where n is our sample size of 100, and s^2 is our sample variance 1.091236:
> (99*1.091236)/128.422
[1] 0.8412294
> (99*1.091236)/73.36108
[1] 1.472611
That is, using the chi-squared distribution I get a 95% confidence interval of (0.841, 1.473) for the population variance.

Let's get a 99% confidence interval:
>  (99*1.091236)/qchisq(.995, df=99)
[1] 0.7772852
>  (99*1.091236)/qchisq(.005, df=99)
[1] 1.6243
That is, a 99% confidence interval is (0.777, 1.624).

No comments: