**Making a Normal probability plot**

To investigate whether a Normal distribution is appropriate for modelling your data, you can make a Normal probability plot. One way to do this is with the following function, NormalProbPlot():

> NormalProbPlot <- function(x)

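{

oo <- order(x) # indices that sort the data into increasing order

n <- length(x) # sample size

quantiles <- seq_len(n)/(n+1) # plotting positions i/(n+1), i = 1, ..., n

normvals <- qnorm(quantiles) # matching standard Normal quantiles

plot(x[oo], normvals, xlab="Data", ylab="yi", pch=20)

}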

For example, we can use it as follows:

> x <- c(4, 0, -12, -18, 4, 12, -6, -16)

> NormalProbPlot(x)

For the model to be a good fit for the data, the points on a Normal probability plot should lie close to a straight line (for data with mean zero the line should in theory pass through the origin, but in practice it can pass nearby). In this case, the points don't really lie on a straight line, so the plot gives little support for modelling these data with a Normal distribution. However, the sample size is small, so the evidence against a Normal distribution isn't very strong either.
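To see exactly what is being plotted for the eight data values above, the two coordinates can be computed directly. This is a minimal sketch that repeats the calculation inside NormalProbPlot() without drawing anything; the name `pts` is only for illustration.

```r
# The eight example values used above
x <- c(4, 0, -12, -18, 4, 12, -6, -16)
n <- length(x)

# Plotting positions i/(n+1), i = 1, ..., n, as in NormalProbPlot()
quantiles <- seq_len(n) / (n + 1)

# Sorted data paired with the matching standard Normal quantiles
pts <- data.frame(data = sort(x), yi = qnorm(quantiles))
print(pts)
```

Each row is one point on the plot: the smallest data value is paired with the 1/9 quantile of a standard Normal distribution, the next with the 2/9 quantile, and so on. The yi values are symmetric about zero.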

Let's try using a random sample of 5000 drawn from a Normal distribution:

> x <- rnorm(5000)

> NormalProbPlot(x)

We see a nice straight line.

Now let's try using a random sample of 5000 drawn from a continuous uniform distribution:

> x <- runif(5000)

> NormalProbPlot(x)

We see that the plot differs from a straight line. The pattern is characteristic of a distribution with tails that are too 'light' compared to a Normal distribution.

Let's try sampling 5000 points from an exponential distribution:

> x <- rexp(5000)

> NormalProbPlot(x)

This is clearly not a straight line. In fact, this curve is typical of what you see when you make a Normal probability plot for a very right-skewed data sample, like one originating from an exponential distribution.
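Judging straightness by eye can be backed up with a rough numerical summary. One option, which is an addition here rather than part of NormalProbPlot(), is the correlation between the sorted data and the Normal quantiles: values very close to 1 correspond to a nearly straight plot. The helper name `probplot_cor` and the seed are hypothetical choices for illustration.

```r
# Correlation between the sorted data and the Normal quantiles that
# NormalProbPlot() would plot (hypothetical helper, not from the text above)
probplot_cor <- function(x) {
  n <- length(x)
  normvals <- qnorm(seq_len(n) / (n + 1))
  cor(sort(x), normvals)
}

set.seed(1)  # arbitrary seed so the sketch is reproducible
r_norm <- probplot_cor(rnorm(5000))
r_unif <- probplot_cor(runif(5000))
r_exp  <- probplot_cor(rexp(5000))

print(c(normal = r_norm, uniform = r_unif, exponential = r_exp))
```

The Normal sample should give the value closest to 1, with the uniform and exponential samples lower. This is only a summary and is no substitute for looking at the plot itself.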

Note that another way of making a Normal probability plot in R is to use the qqnorm() and qqline() functions:

> qqnorm(x)

> qqline(x)

Note that this plot shows the quantiles of the sample data on the y-axis and the quantiles of a theoretical Normal distribution on the x-axis, so the axes are swapped relative to the plot above, although the data are exactly the same.
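The two approaches also do not use quite the same theoretical quantiles. NormalProbPlot() uses the positions i/(n+1), whereas qqnorm() takes its positions from ppoints(). The sketch below compares the two without plotting; the sample, the seed and the variable names are arbitrary choices for illustration.

```r
set.seed(1)           # arbitrary seed
x <- rexp(50)         # any numeric sample would do
n <- length(x)

# qqnorm() returns its coordinates: $x theoretical quantiles, $y the data
qq <- qqnorm(x, plot.it = FALSE)

# Theoretical quantiles based on the positions i/(n+1)
manual <- qnorm(seq_len(n) / (n + 1))

# Largest difference between the two sets of theoretical quantiles
print(max(abs(sort(qq$x) - manual)))
```

The differences are largest in the extreme tails and usually make little visible difference to the plot.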

In fact, people often make their plot with the data on the y-axis; you can also do it using this function:

> NormalProbPlot2 <- function(x)

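{

oo <- order(x) # indices that sort the data into increasing order

n <- length(x) # sample size

quantiles <- seq_len(n)/(n+1) # plotting positions i/(n+1), i = 1, ..., n

normvals <- qnorm(quantiles) # matching standard Normal quantiles

sortedx <- x[oo]

plot(normvals, sortedx, xlab="yi", ylab="Data", pch=20)

}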

> NormalProbPlot2(x)

**A half-Normal plot**

Another type of Normal plot is a 'half-Normal plot', which consists of the negative half of the Normal probability plot folded over onto the positive half, so that it is in effect a plot of the absolute values of the data:

> HalfNormalProbPlot <- function(x)

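{

x <- c(abs(x),-abs(x)) # make the sample symmetric about zero

oo <- order(x) # indices that sort the doubled sample

n <- length(x) # size of the doubled sample

quantiles <- seq_len(n)/(n+1) # plotting positions i/(n+1), i = 1, ..., n

normvals <- qnorm(quantiles) # matching standard Normal quantiles

sortedx <- x[oo]

plot(normvals[normvals>0], sortedx[normvals>0], xlab="yi", ylab="Data", pch=20) # positive half only

}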

> HalfNormalProbPlot(x)
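Because the doubled sample c(abs(x), -abs(x)) is symmetric about zero, its positive half is just the sorted absolute values, and the matching quantile positions are (n+i)/(2n+1) for i = 1, ..., n, where n is the original sample size. Below is a minimal sketch of this shortcut, checked against the construction used in HalfNormalProbPlot(); the helper name `half_normal_points` is hypothetical, and it returns the points rather than plotting them.

```r
# Coordinates of a half-Normal plot computed directly from abs(x)
# (hypothetical helper; returns the points instead of plotting them)
half_normal_points <- function(x) {
  n <- length(x)
  data.frame(yi = qnorm((n + seq_len(n)) / (2 * n + 1)),
             data = sort(abs(x)))
}

x <- c(4, 0, -12, -18, 4, 12, -6, -16)
direct <- half_normal_points(x)

# Same points via the doubled sample, as in HalfNormalProbPlot()
xx <- c(abs(x), -abs(x))
m <- length(xx)
normvals <- qnorm(seq_len(m) / (m + 1))
keep <- normvals > 0
doubled <- data.frame(yi = normvals[keep], data = sort(xx)[keep])

# Compare the two sets of coordinates
print(all.equal(direct, doubled))
```

The points could then be drawn with plot(direct$yi, direct$data, xlab="yi", ylab="Data", pch=20).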

**Making an exponential probability plot**

Similarly, to investigate whether an exponential distribution is appropriate for modelling your data, you can make an exponential probability plot. This can be done using the ExpProbPlot() function:

> ExpProbPlot <- function(x)

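{

oo <- order(x) # indices that sort the data into increasing order

n <- length(x) # sample size

quantiles <- seq_len(n)/(n+1) # plotting positions i/(n+1), i = 1, ..., n

expvals <- qexp(quantiles) # matching quantiles of an exponential distribution with rate 1

plot(x[oo], expvals, xlab="Data", ylab="yi", pch=20)

}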

For example, we can use it as follows:

> x <- c(841, 158, 146, 45, 34, 122, 151, 281, 435, 737, 585, 888, 264, 1902,

696, 295, 563, 722, 77, 711, 47, 403, 195, 760, 320, 461, 41, 1337, 336, 1355,

455, 37, 668, 41, 557, 100, 305, 377, 568, 140, 781, 204, 437, 31, 385, 130, 10,

210, 600, 84, 833, 329, 247, 1618, 639, 938, 736, 39, 366, 93, 83, 221)

> ExpProbPlot(x)

For the model to be a good fit for the data, the points on an exponential probability plot should lie close to a line through the origin. In this case the data do lie approximately along a straight line through the origin, so it seems that an exponential distribution is a plausible model for the data.
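The slope of that line carries information too. If the data come from an exponential distribution with a given rate, each quantile of the data is the corresponding quantile of the rate-1 exponential distribution divided by that rate, so with the data on the x-axis and the qexp() values on the y-axis the points should scatter about a line through the origin whose slope is roughly the rate. Below is a rough sketch on simulated data; the rate 0.002, the seed and the variable names are assumptions chosen for illustration, not values from the example above.

```r
set.seed(1)                      # arbitrary seed
x <- rexp(2000, rate = 0.002)    # simulated data with a known rate
n <- length(x)

sortedx <- sort(x)
expvals <- qexp(seq_len(n) / (n + 1))   # rate-1 quantiles, as in ExpProbPlot()

# Least-squares line through the origin: expvals is roughly slope * sortedx
fit <- lm(expvals ~ 0 + sortedx)
slope <- unname(coef(fit))

print(c(slope = slope, one_over_mean = 1 / mean(x)))
```

Both numbers are rough estimates of the rate. The usual estimate is 1/mean(x); the slope of the plot is mainly a visual check that the two agree.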
