Friday, 3 November 2017

A stacked barplot in R

I wanted to make a stacked barplot in R. My input data looked like this:

plate barcode reads
1 1 3232
1 2 32232
1 4 23232
2 1 23322
2 2 2323
2 3 4343
2 4 23432

I wanted to make a barplot showing the 96 barcodes side by side and, for each barcode, a stack showing the number of reads for plates 1, 2 and 3.

Getting the data into R (painful!)
The problem was getting my data into R. The input data did not have values for every plate-barcode combination, but I wanted to assume a value of 0 for combinations that were not in the input file. In the end I had to write some code to squeeze the data into R:

# input data has columns plate, barcode, number of input reads
MyData <- read.table("reads_in_inputs",header=TRUE)

plate <- MyData$plate
barcode <- MyData$barcode
reads <- MyData$reads

# put the input data in a matrix, for use in the barplot() command.
# The matrix will have three rows (plate 1,2,3) and 96 columns (barcode 1..96):
mymatrix <- matrix(, nrow=3, ncol=96)

for (platenum in 1:3) {
   for (barcodenum in 1:96) {
      # find the index (if any) in vector 'reads' for plate 'platenum' and barcode 'barcodenum'.
      value <- intersect(which(plate==platenum),which(barcode==barcodenum))
      if (length(value) > 0) {
         # get the number of reads for plate 'platenum' and barcode 'barcodenum' from vector reads:
         mymatrix[platenum,barcodenum] <- reads[value] / 1e+3 # in thousands of reads
      } else {
         # this plate-barcode combination is not in the input file, so assume 0 reads:
         mymatrix[platenum,barcodenum] <- 0
      }
   }
}
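
A shorter way to get the same kind of matrix, as a minimal sketch rather than what I actually ran: start from a matrix of zeros and fill in only the plate-barcode combinations that are present, using R's matrix indexing with a two-column (row, column) matrix. The names n_plates and n_barcodes are made up for this example, and I use just 4 barcodes and the sample rows shown above so that it runs on its own.

```r
# Sample rows from above (note there is no row for plate 1, barcode 3).
MyData <- read.table(text = "plate barcode reads
1 1 3232
1 2 32232
1 4 23232
2 1 23322
2 2 2323
2 3 4343
2 4 23432", header = TRUE)

n_plates <- 3     # assumed number of plates
n_barcodes <- 4   # 96 in the real data; 4 here to keep the example small

# Start with zeros everywhere, so missing combinations stay at 0.
mymatrix <- matrix(0, nrow = n_plates, ncol = n_barcodes)

# Index the matrix with a two-column (row, column) matrix to fill in
# the combinations that are present, in thousands of reads.
mymatrix[cbind(MyData$plate, MyData$barcode)] <- MyData$reads / 1e3

print(mymatrix)
```

This assumes each plate-barcode combination appears at most once in the input, and that plate and barcode numbers start at 1.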

Plotting the data in R (ok!)
Plotting the data was not so hard. I followed an existing example to make a stacked barplot:
# label the columns with the barcode numbers and the rows with the plate numbers
colnames(mymatrix) <- 1:96
rownames(mymatrix) <- 1:3

barplot(mymatrix, col=colors()[c(23,89,12)], border="white", space=0.04, font.axis=2, xlab="barcode", ylab="thousands of input reads", legend=rownames(mymatrix)) 

A little bit of the plot:

Some other little tricks I learnt:
To put some space around the plot I can type before the 'barplot' command:
par(mar=c(8, 4.7, 2.3, 0)) # margins are given in the order: below, LHS, top, RHS

In the barplot command itself:
border="white": use white for the border of the bars
space=0.04: leaves space before each bar
cex.names=0.5: makes the x-axis labels smaller
las=3: makes the axis labels vertical, so the x-axis labels are perpendicular to the axis
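
Putting these options together, here is a small sketch that runs on its own. The matrix toy is made-up data (3 plates by 4 barcodes, in thousands of reads), and the colour names are just ones I picked for the example, not the ones from my real plot.

```r
# Made-up data: rows are plates 1..3, columns are barcodes 1..4.
toy <- matrix(c( 3.2, 23.3, 0,
                32.2,  2.3, 0,
                 0.0,  4.3, 0,
                23.2, 23.4, 0),
              nrow = 3, ncol = 4,
              dimnames = list(as.character(1:3), as.character(1:4)))

# Set the margins (below, LHS, top, RHS) and remember the old settings.
old_par <- par(mar = c(8, 4.7, 2.3, 0))

# Stacked barplot: one bar per barcode, one coloured segment per plate.
# barplot() returns the x positions of the bar midpoints.
mids <- barplot(toy,
                col = c("steelblue", "orange", "grey50"),
                border = "white",   # white border around the bars
                space = 0.04,       # space before each bar
                cex.names = 0.5,    # smaller x-axis labels
                las = 3,            # vertical axis labels
                xlab = "barcode",
                ylab = "thousands of input reads",
                legend.text = rownames(toy))

# Restore the previous margin settings.
par(old_par)
```

The value returned by barplot() is handy if you later want to add text or points above particular bars.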
