Monday, 26 January 2015

Using ggplot2 to plot boxplots in R

I love ggplot2! Here is a nice boxplot I made today, showing labels for the outliers:


> library(ggplot2)
> var1 <- c(1.06,1.06,1.19,1.28,1.11,1.16,1.04,1.21,1.27,1.41,1.09,1.10,1.04,1.41,1.07,1.16,1.09,1.11)
> var2 <- c(1.14,1.14,1.11,1.13,1.12,1.17,1.16,1.13,1.08,1.21,1.57,1.09)
> var3 <- c(1.13,1.05,1.03,1.04,1.10,1.04,1.14,1.15,1.00,1.08,1.07,1.07,1.08,1.03,1.09,1.07,1.33,1.07,1.08,1.09,1.03,1.05)
> var4 <- c(1.04,1.08,1.12,1.07,1.07,1.09,1.04)
> var5 <- c(1.03,1.04,1.02,1.04,1.04,1.04,1.04,1.04,1.05,1.05,1.06,1.05,1.08,1.10,1.07,1.00,1.18,1.05,1.03,1.11,1.53,1.05,1.08,1.08,1.04,1.06,1.05,1.05,1.04,1.03,1.07,1.41,1.04)
> myvalues <- c(var1,var2,var3,var4,var5)
> mynames <- c( rep('var1',length(var1)), rep('var2',length(var2)), rep('var3', length(var3)), rep('var4', length(var4)), rep('var5', length(var5)) )  

We only want to label the outliers, so every other point gets a blank label ('\n'):
> mylabels <- c('\n','\n','\n','\n','\n','\n','\n','\n','\n','A','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n',
'\n','\n','\n','\n','\n','B','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','C','\n','\n',
'\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n',
'D','\n','\n','\n','E','\n','\n','\n','\n','\n','\n','\n','\n','\n','\n','F','\n') 
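Typing that label vector by hand is error-prone. By default, geom_boxplot draws a point as an outlier when it lies more than 1.5 times the interquartile range (IQR) below the lower hinge or above the upper hinge, with quartiles computed as by R's default quantile() (type 7). Here is a minimal sketch of that rule, written in Python rather than R, that works out which points would get a letter. The function name outlier_labels is my own, and it uses '' rather than '\n' for the unlabelled points.

```python
from statistics import quantiles
from string import ascii_uppercase


def outlier_labels(groups, coef=1.5):
    """Return one label per value, in the order the groups' values are
    concatenated: a letter (A, B, C, ...) for each outlier, '' otherwise.

    groups: list of (name, values) pairs, processed in the order given.
    A value is an outlier if it is below Q1 - coef*IQR or above Q3 + coef*IQR.
    statistics.quantiles(method="inclusive") matches R's default type-7 quartiles.
    """
    letters = iter(ascii_uppercase)  # assumes at most 26 outliers in total
    labels = []
    for _name, values in groups:
        q1, _median, q3 = quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        lower, upper = q1 - coef * iqr, q3 + coef * iqr
        for v in values:
            labels.append(next(letters) if v < lower or v > upper else "")
    return labels


var2 = [1.14, 1.14, 1.11, 1.13, 1.12, 1.17, 1.16, 1.13, 1.08, 1.21, 1.57, 1.09]
var4 = [1.04, 1.08, 1.12, 1.07, 1.07, 1.09, 1.04]
labels = outlier_labels([("var2", var2), ("var4", var4)])
print(labels)  # only var2's 1.57 is flagged: 'A' at index 10, '' everywhere else
```

For var2 the quartiles come out at 1.1175 and 1.1625, so the upper fence is 1.23: 1.57 is flagged but 1.21 is not.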

Make the plot:
> mydata <- data.frame(myvalues, mynames)
> myplot <- ggplot(data = mydata, aes(factor(mynames), myvalues))
> myplot + geom_boxplot(outlier.size = 2, fill="red") + ylab("My values") + xlab("My variable") + geom_text(label=mylabels,size=3,hjust=1.5,vjust=1.3)
# outlier.size=2 makes the outlier dots bigger; hjust and vjust shift each label's position relative to its point

To change the order of the boxes along the x-axis, make a factor whose levels are listed in the order you want:
> myxorder <- factor(mydata$mynames, levels=c("var5","var3","var1","var2","var4"))
> myplot <- ggplot(data = mydata, aes(myxorder, myvalues))
> myplot + geom_boxplot(outlier.size = 2, fill="red") + ylab("My values") + xlab("My variable") + geom_text(label=mylabels,size=3,hjust=1.5,vjust=1.3)


 

Monday, 19 January 2015

Using the R ggplot2 package to make a multiple line plot

Here's how I made a multiple line plot using the lovely ggplot2 package:
[note to self: I need to log in to the farm with 'ssh -Y' (trusted X11 forwarding) to be able to see plots]

> library(ggplot2) # load library
# enter my data, and make a data frame
> var1 <- c(4.5,2.3,2.4,2.1,2.2)
> var2 <- c(33,22,13,23,14)
> var3 <- c(234,234,23,23,1)
> myvalues <- c(var1,var2,var3)
> myx <- rep(c(50,40,30,20,10),3) # the x-axis values, repeated once for each of the three variables
> myvarname <- rep(c("my var1", "my var2", "my var3"),each=5)
> mydata <- data.frame(myx, myvalues, myvarname)
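ggplot2 wants the data in "long" format: one row per (x, value, variable) triple, so that geom_line can draw a separate line for each variable via the colour aesthetic. The two rep() calls do different things here: rep(x, 3) repeats the whole vector three times, while rep(x, each=5) repeats each element five times in a row. A small Python sketch of the same row construction (rep_times and rep_each are hypothetical helper names standing in for the two forms of rep()):

```python
def rep_times(xs, times):
    """Like R's rep(xs, times): repeat the whole sequence `times` times."""
    return list(xs) * times


def rep_each(xs, each):
    """Like R's rep(xs, each=k): repeat every element k times in a row."""
    return [x for x in xs for _ in range(each)]


var1 = [4.5, 2.3, 2.4, 2.1, 2.2]
var2 = [33, 22, 13, 23, 14]
var3 = [234, 234, 23, 23, 1]

myvalues = var1 + var2 + var3                                # c(var1, var2, var3)
myx = rep_times([50, 40, 30, 20, 10], 3)                     # 50,40,30,20,10,50,40,...
myvarname = rep_each(["my var1", "my var2", "my var3"], 5)   # 5 x "my var1", then ...

# one row per point, like data.frame(myx, myvalues, myvarname)
mydata = list(zip(myx, myvalues, myvarname))
for row in mydata[:6]:
    print(row)
```

Row 6, (50, 33, 'my var2'), is where the second line starts: the x values restart at 50 exactly when the variable name changes, which is why the two vectors must use the different forms of rep().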

# plot the data:

> myplot <- ggplot(data = mydata, aes(x=myx, y=myvalues)) + geom_line(aes(colour=myvarname),size=2) # size=2 makes a thicker line (in ggplot2 3.4.0 and later, the argument is called linewidth)
> myplot + ylab("Average number") + xlab("Length threshold (kb)")

Adding a vertical line:
Here's how to add a dashed vertical line at x=40 to the plot:
> myplot <- ggplot(data = mydata, aes(x=myx, y=myvalues)) + geom_line(aes(colour=myvarname),size=2) + geom_vline(xintercept=40,linetype=2) # linetype=2 gives a dashed line
> myplot + ylab("Average number") + xlab("Length threshold (kb)")

Friday, 16 January 2015

Overriding an installed version of a Python module

I needed to use a local, edited copy of a Python module, AvrilFastaUtils.py, rather than the version installed system-wide on our compute cluster. To do this, I had to make my Python script import the local copy instead of the installed one.

Here's how to do it within a Python script that uses the AvrilFastaUtils.py module:

import os
import sys

# put the directory containing the local AvrilFastaUtils.py at the front of the
# module search path (sys.path), so it is found before the installed copy
sys.path.insert(0, "/nfs/users/nfs_a/alc/Documents/git/helminth_scripts_python/lib")
# double-check that you've typed the directory path correctly:
assert os.path.isdir(sys.path[0])
# now import the module; Python picks up the local version:
import AvrilFastaUtils
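One catch: if the module has already been imported earlier in the same process, Python reuses the cached copy in sys.modules, so changing sys.path afterwards has no effect. Below is a self-contained sketch that shows the local copy winning and handles that case. The helper name import_local_copy and the throwaway module fasta_utils_demo are made up for illustration; printing the module's __file__ is an easy way to confirm which copy was loaded.

```python
import importlib
import os
import sys
import tempfile


def import_local_copy(module_name, local_dir):
    """Import module_name from local_dir in preference to any installed copy.

    Hypothetical helper: prepends local_dir to sys.path, drops any cached copy
    of the module from sys.modules, then imports it afresh.
    """
    assert os.path.isdir(local_dir), f"not a directory: {local_dir}"
    sys.path.insert(0, local_dir)
    sys.modules.pop(module_name, None)  # forget a previously imported copy
    importlib.invalidate_caches()       # make sure newly written files are seen
    return importlib.import_module(module_name)


# demo: write a tiny stand-in module into a temporary directory and import it
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "fasta_utils_demo.py"), "w") as fh:
        fh.write('WHICH = "local edited copy"\n')
    mod = import_local_copy("fasta_utils_demo", tmp)
    print(mod.WHICH)     # local edited copy
    print(mod.__file__)  # a path inside the temporary directory
```

Clearing the entry from sys.modules only matters when the module was imported earlier in the same run; in the simple case above, where sys.path is changed before the first import, it is not needed.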

Thanks to Noel O'Boyle for helping with this!