Thursday, 20 December 2012

Using the R ggplot2 library to compare two variables

I was recently discussing with a colleague how to use the R ggplot2 library to make plots that compare two variables (both measured on the same set of individuals) when one of the variables has error bars and the other does not. For example, the first variable could be height for a set of individuals and the second variable weight for those same individuals, and we may only have error bars for the weights.

For example, say you have a file 'data.txt' that contains height and weight values for three individuals (g1, g2, g3):
Individual variable value
g1 HEIGHT 22.5
g1 WEIGHT 25.0
g2 HEIGHT 50.5
g2 WEIGHT 55.5
g3 HEIGHT 1.5
g3 WEIGHT 15
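
If you want to follow along without typing the file by hand, it can be created from within R. This is only a minimal sketch: it assumes the file is whitespace-separated with a header line, as shown above, and it writes to a temporary directory (the `data_file` name is my own choice, not something the later examples require).

```r
# Write the example table to a file and read it back in.
# Assumption: whitespace-separated columns with a header line.
lines <- c("Individual variable value",
           "g1 HEIGHT 22.5",
           "g1 WEIGHT 25.0",
           "g2 HEIGHT 50.5",
           "g2 WEIGHT 55.5",
           "g3 HEIGHT 1.5",
           "g3 WEIGHT 15")

# Use "data.txt" instead to write into the current working directory.
data_file <- file.path(tempdir(), "data.txt")
writeLines(lines, data_file)

MyData <- read.table(data_file, header = TRUE)
str(MyData)  # shows the three columns: Individual, variable, value
```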

We may also have error bars for the weights, that is, we may have estimated the standard errors for the weights to be 1, 8 and 15 for individuals g1, g2, and g3 respectively.

One way to plot this data is to draw bar charts for height and weight side-by-side using ggplot2, showing error bars for weight:
> MyData <- read.table("data.txt",header=TRUE)
> library("ggplot2")
> ggplot(data=MyData, aes(x=Individual, y=value, fill=variable)) + geom_bar(stat="identity", position=position_dodge())
(Note: I learnt how to do this from a nice ggplot2 tutorial.)


We can add error bars to show the standard errors by typing:
> weight <- MyData$value[MyData$variable=="WEIGHT"]
> se <- c(1, 8, 15)
> upper <- (weight + se)
> lower <- (weight - se)
> # The error-bar limits need one entry per row of MyData:
> # NA for the HEIGHT rows (no error bar) and the limit for the WEIGHT rows
> upper2 <- rep(NA_real_, nrow(MyData))
> lower2 <- rep(NA_real_, nrow(MyData))
> upper2[MyData$variable=="WEIGHT"] <- upper
> lower2[MyData$variable=="WEIGHT"] <- lower
> ggplot(data=MyData, aes(x=Individual, y=value, fill=variable)) + geom_bar(stat="identity", position=position_dodge()) + geom_errorbar(aes(ymax=upper2,ymin=lower2), position=position_dodge(0.9),width=0.25)
(I learnt how to do this from this nice webpage.)
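
An alternative that I find easier to read is to store the standard errors in the data frame itself, with NA for the rows that have no error bar, and let ggplot2 work out the limits. The sketch below assumes the same values as in 'data.txt'; the `se` column and the `se_weight` vector are names I made up for illustration, and the plot is only built if ggplot2 is installed.

```r
# Sketch: keep the standard errors next to the values they belong to.
MyData <- data.frame(
  Individual = rep(c("g1", "g2", "g3"), each = 2),
  variable   = rep(c("HEIGHT", "WEIGHT"), times = 3),
  value      = c(22.5, 25.0, 50.5, 55.5, 1.5, 15)
)

# Standard errors are known for the weights only; HEIGHT rows stay NA.
se_weight <- c(g1 = 1, g2 = 8, g3 = 15)
is_weight <- MyData$variable == "WEIGHT"
MyData$se <- NA_real_
MyData$se[is_weight] <- unname(se_weight[as.character(MyData$Individual[is_weight])])

# Build the plot only if ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(MyData, aes(x = Individual, y = value, fill = variable)) +
    geom_bar(stat = "identity", position = position_dodge(0.9)) +
    # na.rm = TRUE silences the warning about the NA (HEIGHT) rows
    geom_errorbar(aes(ymin = value - se, ymax = value + se),
                  position = position_dodge(0.9), width = 0.25, na.rm = TRUE)
}
```

Typing `p` at the prompt (or calling `print(p)` in a script) draws the plot. Because the standard errors are looked up by individual name, the result does not depend on the order of the rows in the file.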


Another way to do this is to make a back-to-back bar chart using ggplot2:
> MyData <- read.table("data.txt",header=TRUE)
> library("ggplot2")
> ggplot(MyData,
   aes(Individual)) + geom_bar(subset = .(variable == "WEIGHT"), aes(y = value, fill = variable),     
   stat = "identity") + geom_bar(subset = .(variable == "HEIGHT"), aes(y = -value, fill = variable),   
   stat = "identity") + xlab("") + scale_y_continuous("HEIGHT - WEIGHT")
> se <- c(1, 8, 15)
> weight <- MyData$value[MyData$variable=="WEIGHT"]
> upper <- (weight + se)
> lower <- (weight - se)
> upper2 <- rep(upper, each=2)
> lower2 <- rep(lower, each=2)
> limits <- aes(ymax=upper2,ymin=lower2)
> last_plot() + geom_errorbar(limits, position="identity", width=0.25)
(Note: I did this using R version 2.11.1 and ggplot2 0.8.8.)
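
The `subset = .(...)` layer argument worked with the ggplot2 version noted above, but as far as I know it is not accepted by later ggplot2 releases. A rough equivalent, assuming the same values as in 'data.txt', is to split the data by hand and give each layer its own data frame. The `weights` and `heights` data frames and the `se` column are names I introduced for this sketch, and the plot is only built if ggplot2 is installed.

```r
# Sketch: back-to-back bars without the `subset` layer argument.
MyData <- data.frame(
  Individual = rep(c("g1", "g2", "g3"), each = 2),
  variable   = rep(c("HEIGHT", "WEIGHT"), times = 3),
  value      = c(22.5, 25.0, 50.5, 55.5, 1.5, 15)
)

# One data frame per variable; heights are negated so they point downwards.
weights <- MyData[MyData$variable == "WEIGHT", ]
heights <- MyData[MyData$variable == "HEIGHT", ]
heights$value <- -heights$value
weights$se <- c(1, 8, 15)  # standard errors for g1, g2, g3

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mapping = aes(x = Individual, y = value, fill = variable)) +
    geom_bar(data = weights, stat = "identity") +
    geom_bar(data = heights, stat = "identity") +
    # Error bars are drawn for the weights only
    geom_errorbar(data = weights,
                  aes(ymin = value - se, ymax = value + se), width = 0.25) +
    xlab("") + scale_y_continuous("HEIGHT - WEIGHT")
}
```

Typing `p` at the prompt draws the plot. Unlike the version above, the error bars here are attached only to the weight rows, so there is no need to repeat each limit twice.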

A third type of plot that you might want to make is a scatterplot, with error-bars for weight:
> se <- c(1, 8, 15)
> weight <- MyData$value[MyData$variable=="WEIGHT"]
> height <- MyData$value[MyData$variable=="HEIGHT"]
> upper <- (weight + se)
> lower <- (weight - se)
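> # Put the values in a data frame with one row per individual, so that every aesthetic has the same length as the data
> MyData2 <- data.frame(height=height, weight=weight, upper=upper, lower=lower)
> ggplot(MyData2, aes(x=height, y=weight)) + geom_errorbar(aes(ymax=upper,ymin=lower), width=0.25, position="identity") + geom_line(position="identity") + geom_point(position="identity")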
(Again, I got some nice tips from this webpage.)

Thanks to my colleague Anna Protasio for introducing me to ggplot2!
