**An approximate large-sample confidence interval for the population mean**

If we have a large random sample of size

*n*from a population, we can calculate an

__approximate__confidence interval for the population mean using the following R function:

> largeSampleConfidenceInterval <- function(sampleMean, sampleSd, n, confidenceLevel)

{

alpha <- 1 - (confidenceLevel/100)

error <- qnorm(1-(alpha/2))*(sampleSd)/sqrt(n)

upperLimit <- sampleMean + error

lowerLimit <- sampleMean - error

print(paste(confidenceLevel,"% confidence interval: (",lowerLimit,",",upperLimit,")"))

}

For example, say we want a 95% confidence interval for the mean, when our sample mean is 2.45, our sample standard deviation is 2.03, and our sample size is 621:

> largeSampleConfidenceInterval(2.45, 2.03, 621, 95)

[1] "95 % confidence interval: ( 2.29033918977288 , 2.60966081022712 )"

This tells that an approximate 95% confidence interval for the mean is (2.29, 2.61).

Note that this approach of calculating an approximate confidence intervals is only valid when the sample size is large.

**An approximate large-sample confidence interval for a proportion**

If we have observed

*x*successes in

*n*independent data points (Bernoulli trials), for example, if we observe 20% of 110 schoolchildren to be red-haired, we can calculate an

__approximate__confidence interval for the true proportion of successes using this R function:

> largeSampleConfidenceIntervalForProportion <- function(observedP, trials, confidenceLevel)

{

alpha <- 1 - (confidenceLevel/100)

error <- qnorm(1-(alpha/2))*sqrt(observedP*(1-observedP)/trials)

upperLimit <- observedP + error

lowerLimit <- observedP - error

print(paste(confidenceLevel,"% confidence interval: (",lowerLimit,",",upperLimit,")"))

}

For example, if we observe 20% of 110 schoolchildren to be red-haired children, we can calculate a 90% confidence interval for the true proportion of children who are red-haired by typing:

> largeSampleConfidenceIntervalForProportion(0.20, 110, 90)

[1] "90 % confidence interval: ( 0.137267744076674 , 0.262732255923326 )"

This gives an approximate 90% confidence interval of (0.14, 0.26).

Again, this approach is only valid for large samples.

Note that you can also use the R prop.test() function to calculate a confidence interval for a proportion, but it will give a slightly different confidence interval, as it uses a slightly different method:

> prop.test(x=0.20*110,n=110,conf.level=0.90)

90 percent confidence interval:

0.1408916 0.2745358

In fact, there are many ways to calculate confidence intervals for proportions, some of these are implemented in the R package PropCIs.

**An approximate large-sample confidence interval for the difference between two proportions**

If we observed a proportion

*p1*in a sample of size

*n1*, and a proportion

*p2*in a sample of size

*n2*, we can calculate an

__approximate__confidence interval for the difference between the two proportions using this R function:

> largeSampleConfidenceIntervalForDifferenceInProportions <- function(p1, n1, p2, n2, confidenceLevel)

{

alpha <- 1 - (confidenceLevel/100)

error <- qnorm(1-(alpha/2))*sqrt((p1*(1-p1)/n1) + (p2*(1-p2)/n2))

upperLimit <- (p1 - p2) + error

lowerLimit <- (p1 - p2) - error

print(paste(confidenceLevel,"% confidence interval: (",lowerLimit,",",upperLimit,")"))

}

For example, if we observe that 20% of a sample of 110 schoolchildren in Dublin are red-haired, and that 25% of a sample of 104 schoolchildren in Cork are red-haired, we can calculate a 95% confidence interval for the difference in proportions using:

> largeSampleConfidenceIntervalForDifferenceInProportions(0.20, 110, 0.25, 104, 95)

[1] "95 % confidence interval: ( -0.161862788606968 , 0.0618627886069677 )"

This gives us an approximate 95% confidence interval of (-0.16, 0.06).

Again, this approach is only valid for large samples.

We can actually do the same thing in R using prop.test():

> prop.test(x=c(0.20*110,0.25*104),n=c(110,104))

95 percent confidence interval:

-0.17121594 0.07121594

It doesn't give the same answer as our largeSampleConfidenceIntervalForDifferenceInProportions() function, because it is applying Yates' continuity correction. It will give the same answer if we turn off Yates' continuity correction (it is usually advisable to use Yates' correction, but just to show you):

> prop.test(x=c(0.20*110,0.25*104),n=c(110,104),correct=FALSE)

95 percent confidence interval:

-0.16186279 0.06186279

**An exact confidence interval for a Normal mean**

If we

__know that our data comes from a Normally distributed variable__, we can calculate an

__exact__confidence interval for the population mean using this R function:

> exactConfidenceIntervalForNormalMean <- function(sampleMean, sampleSd, n, confidenceLevel)

{

alpha <- 1 - (confidenceLevel/100)

error <- qt(1-(alpha/2),df=n-1)*(sampleSd)/sqrt(n)

upperLimit <- sampleMean + error

lowerLimit <- sampleMean - error

print(paste(confidenceLevel,"% confidence interval: (",lowerLimit,",",upperLimit,")"))

}

For example, if we observed a mean of 294.81 and a standard deviation of 1.77 based on a sample of 10 data points, we can calculate an exact 90% confidence interval for the population mean by typing:

> exactConfidenceIntervalForNormalMean(294.81, 1.77, 10, 90)

[1] "90 % confidence interval: ( 293.783964262636 , 295.836035737364 )"

This gives us an exact 90% confidence interval of (293.78, 295.84).

**An exact confidence interval for the difference between two Normal means**

If we have collected two samples for a variable that we know to be

__Normally distributed__, and we know that the samples have the

__same population variance__, then we can calculate an

__exact__confidence interval for the difference between the two population means by using this R function:

> exactConfidenceIntervalForDifferenceInNormalMeans <- function(sampleMean1, sampleSd1, n1, sampleMean2, sampleSd2, n2, confidenceLevel)

{

alpha <- 1 - (confidenceLevel/100)

pooledVariance <- ((n1-1)*(sampleSd1^2) + (n2-1)*(sampleSd2^2))/(n1 + n2 - 2)

pooledSd <- sqrt(pooledVariance)

qt <- qt(1-(alpha/2),df=n1+n2-2)

error <- qt(1-(alpha/2),df=n1+n2-2)*pooledSd*sqrt((1/n1)+(1/n2))

upperLimit <- (sampleMean1 - sampleMean2) + error

lowerLimit <- (sampleMean1 - sampleMean2) - error

print(paste(confidenceLevel,"% confidence interval:(",lowerLimit,",",upperLimit,")"))

}

For example, say our first sample had a sample mean of 1.692 and sample standard deviation of 0.518 and sample size of 27. And say our second sample had a sample mean of 2.307 and standard deviation of 0.665 and sample size of 23. Then we can calculate an exact 95% confidence interval for the difference between the means by typing:

> exactConfidenceIntervalForDifferenceInNormalMeans(1.692, 0.518, 27, 2.307, 0.665, 23, 95)

[1] "95 % confidence interval: ( -0.951573462687863 , -0.278426537312137 )"

That is, we get an exact 95% confidence interval of (-0.952, -0.278).

**An exact confidence interval for a proportion**

If the number of successes or failures is small, then we need to use exact methods to calculate a confidence interval for a proportion. An exact confidence interval for a proportion can be calculated based on the binomial distribution, using the exactci() function in the PropCIs R package. For example, if the observed proportion is 0/1000, we can calculate an exact 95% confidence interval using:

> library("PropCIs")

> exactci(0, 1000, 0.95)

95 percent confidence interval:

0.000000000 0.003682084

This tells us that a 95% confidence interval for the proportion is (0.0000, 0.0037), rounded to four decimal places.

**A confidence interval for the variance of a Normal distribution**

See my blog post on this topic.

**An exact confidence interval for a Poisson mean (lambda parameter)**

For example, if you observed 187 events over 365 days, the observed mean is 187/365 = approx. 0.512. You can calculate an exact 90% confidence interval for this Poisson mean by typing:

> library("exactci")

> poisson.exact(187, 365, conf.level=0.9)

Exact two-sided Poisson test (central method)

90 percent confidence interval:

0.4523004 0.5783757

That is, our confidence interval is (0.452, 0.578).

**An approximate large sample confidence interval for a Poisson mean (lambda parameter)**

> largeSampleConfidenceIntervalForPoissonMean <- function(observedMean, sampleSize, confidenceLevel)

{

alpha <- 1 - (confidenceLevel/100)

error <- qnorm(1-(alpha/2))*sqrt(observedMean/sampleSize)

upperLimit <- observedMean + error

lowerLimit <- observedMean - error

print(paste(confidenceLevel,"% confidence interval: (",lowerLimit,",",upperLimit,")"))

}

For example, if you observed 187 events over 365 days, the observed mean is 187/365 = approx. 0.512. You can calculaet an approximate 90% confidence interval by pasting the function above into R and typing:

> largeSampleConfidenceIntervalForPoissonMean(187/365, 365, 90)

[1] "90 % confidence interval: ( 0.450704013552185 , 0.57395352069439 )"

That is, our approximate 90% confidence interval is (0.451, 0.574).

## No comments:

Post a Comment