Tuesday, 15 January 2019

Using RColorBrewer to choose colours for R plots

I wanted to make a plot in R where the colour of a line depends on the amount of data (amount of sequence reads) supporting it. To do this, I decided to try the RColorBrewer library. This is an R version of ColorBrewer, which can be used to choose beautiful colour schemes for plots.

First I loaded in the library into R (it had already been installed):
> library(RColorBrewer)

Then I wanted to find 4 colours between red and yellow: (following an example from here)
> pal(4)
[1] "#FF0000" "#FF5500" "#FFAA00" "#FFFF00"
I decided to use a darker yellow instead of "#FFFF00": #ccc943

Now assign a colour to each of my lines (the start point of which I had stored in a vector 'insertion_start') according to the frequency of data (amount of sequence reads) supporting it (stored in vector 'reads'):
> mycolors <- character(length(insertion_start))
> for (i in 1:length(mycolors))
     myreads <- num_reads[i]
     if (myreads <= 10) { mycolor <- "#CCC943"}
     else if (myreads <= 100) { mycolor <- "#FFAA00"}
     else if (myreads <= 500) { mycolor <- "#FF5500"}
     else                     { mycolor <- "#FF0000"}
     mycolors[i] <- mycolor

Now try plotting with the lines supported by the most reads at the bottom:
> oo <- order(num_reads,decreasing=TRUE)
> plot(insertion_start[oo],yval,ylab="Allele index",xlab="Spanning deletions",col="white")
> segments(insertion_start[oo],yval,insertion_end[oo],yval,col=mycolors[oo],lwd=2)

Et voila, a beautiful plot!
Thank you RColorBrewer : )

Thursday, 20 December 2018

CRAM to BAM conversion

CRAM is a compressed version of the BAM/SAM formats.
To convert a CRAM file to BAM, you can use the SCRAMBLE conversion tool.
For example:
% /software/sciops/pkg/staden_io_bin/1.14.9/bin/scramble -I cram -O bam 27541_1#9.cram > 27541_1#9.bam
where /software/sciops/pkg/staden_io_bin/1.14.9/bin/scramble is where SCRAMBLE is installed,
27541_1#9.cram is the input CRAM file,
27541_1#9.bam is the output BAM file.

Monday, 26 November 2018

Dashed line in pysvg

I recently wrote a blog post on using pysvg to make a picture of chromsomes.

Now I wanted to figure out how to make a dashed line using pysvg.

The library seems to be missing a dedicated function for this, but I was able to do it with:

# define the dashed line style
    linestyle_dict = {'stroke-width':2, 'stroke':'black','stroke-dasharray': "5, 5"}
    linestyle = pysvg.builders.StyleBuilder(linestyle_dict)
    myline = shape_builder.createLine(0,0,300,300)

If you want to change the lengths of the dashes you can change for example 'stroke-dasharray': "5, 5" to 'stroke-dasharray': "10, 10".

BWA aligner

The BWA aligner for short read alignment is by Heng Li and Richard Durbin.

BWA mem
On the BWA website, it explains the the 'bwa mem' algorithm is the best one for '70bp or longer Illumina, 454, Ion Torrent and Sanger reads, assembly contigs and BAC sequences.'

I've previously used it something like this:
bwa mem genome.fasta reads_fastq.gz mates_fastq.gz | samtools view -b -S -F 4 - > output.bam
   run 'bwa mem' to align reads and their mates to a genome, and pipe the output through samtools to just take the reads that align (using -F4) and discarding reads that don't align (to save disk space), and put the output in output.bam

Other options
Other options available in BWA include:
bwa sampe: generated paired ended alignments (This takes an option -a <max_insert_size>)

Friday, 23 November 2018

Adjusting text style in pysvg

I wrote a previous blogpost on making a plot using pysvg.

Now I want to do something a bit more tricky, to adjust the font styles.
I found a tutorial online, but the commands in it don't seem to work with the current pysvg that's installed on the Sanger farm.
However, I found another file on github that gave me some hints...

Adjusting font style and alignment with pysvg
You need something like this in the Python:

# define the text style
    style = pysvg.builders.StyleBuilder()
    style.style_dict["alignment-baseline"] = "middle" # there doesn't seem to be a setAlignmentBaseline function

Then you can add text using something like this:
# add some text to our svg picture for this chromosome:
        t=pysvg.text.Text(mychr, x = mychrlen*xscale + 30, y = chrcnt*yscale + 30)

Wednesday, 21 November 2018

Jupyter notebooks using Python

I wanted to figure out how to run Python in a Jupyter notebook, which is like the newer version of iPython notebooks. I found quite a nice tutorial on getting started on Python Jupyter notebooks here.

The first step is to install Jupyter.
(Note to self: I found it is installed on the pcs5 at Sanger, so I can start it up by 'ssh -Y pcs5', and then..)

The second step is to start up a Jupyter notebook session:
(Note to self: I did this in /nfs/users/nfs_a/alc/Documents/git/Python)
% jupyter notebook
(Note to self: jupyter is installed with python2 software on the Sanger farm, so seems to assume Python2.)
This opens up a web browser showing the files in the directory where I started jupyter.
You can then start a new notebook by going to the 'New' menu on the right, and choosing 'python2'.

You can give your notebook a title by typing in the 'Untitled' box at the top, for example, you could rename it 'plotting_chrs', and then it would be saved as file plotting_chrs.ipynb
Then you can start typing in the prompt line.
To get a line to be executed, you can press Shift+Return.

Here I made a jupiter notebook to make a SVG plot of chromosomes (see my previous blogpost), and display the SVG inline:

(Note to self: the whole of the svg image from pysvg doesn't seem to be displayed inline, I'm not sure why this is - I tried another svg and it worked fine.)

To shut down the notebook, you can click on 'Close and Halt' in the File menu. To start again you can type
% jupyter notebook plotting_chrs.ipynb 
to open this notebook again.

Plotting chromosomes using Python pysvg

To draw an svg image of chromosomes, here is a little example I made using pysvg (below).

I found some nice examples for using pysvg here.
Here is another old example (seems to use an old version of pysvg, that doesn't work now) by Noel O'Blog.

It can be run using python2 (not python3):
% python /nfs/users/nfs_a/alc/Documents/git/Python/plot_emu_chrs.py emu_chrs.svg
% convert emu_chrs.svg emu_chrs.png

The output looks like (as png):

You can edit it manually in the SVG editor Inkscape (Note to self: do 'ssh -Y pcs5' and then 'inkscape emu_chrs.svg').

The code is:

import sys
import os
import pysvg.structure
import pysvg.builders
import pysvg.text

# Note: this is to make a svg plot with the E. multilocularis chromosomes,
# plotted one under the other, scaled according to their lengths, and
# with a different colour for each chromosome, and a label for each
# chromosome.
# Note 2: this script must be run with python2 (not python3).


def define_input_data():
    # define the input data, as a list of E. multilocularis chromosomes, and
    # a dictionary with lengths of E. multilocularis chromosomes, and a dictionary
    # with the colours to plot E. multilocularis chromosomes:

    chrlist = ["chr1", "chr2", "chr3"]
    chrlengths = { "chr1": 2000000, "chr2": 5000000, "chr3": 4500000}
    chrcols = { "chr1": "rgb(255,0,153)", "chr2": "blue", "chr3": "purple"}
    # Note: there is a list of RGB values for colours here: http://access-excel.tips/vba-excel-rgb-property-and-get-rgb-color/

    return(chrlist, chrlengths, chrcols)


# make the plot of the input data:

def plot_input_data(chrlist, chrlengths, chrcols, output_svg_file):

    # initialise our svg picture  
    svg_document = pysvg.structure.Svg()
    shape_builder = pysvg.builders.ShapeBuilder()

    # plot each chromosome, one under the other:
    chrcnt = 0
    yscale = 100
    xscale = 0.00004
    for mychr in chrlist:
        # find its length:

        assert(mychr in chrlist)
        mychrlen = chrlengths[mychr]
        # find the colour to plot it:
        assert(mychr in chrcols)
        mychrcol = chrcols[mychr]
        # add a rectangle to our svg picture for this chromosome:
        # createRect takes x, y, width, height
        myrectwidth = "%dpx" % (mychrlen*xscale + 20)
        svg_document.addElement(shape_builder.createRect(0, chrcnt*yscale, myrectwidth, "50px", strokewidth = 1, stroke = "black", fill = mychrcol))
        # add some text to our svg picture for this chromosome:
        svg_document.addElement(pysvg.text.Text(mychr, x = mychrlen*xscale + 30, y = chrcnt*yscale + 30))
        # add one to the chromosome count:
        chrcnt += 1

    # save the svg picture to an output file



def main():
    # check the command-line arguments:
    if len(sys.argv) != 2:
        print("Usage: %s output_svg_file" % sys.argv[0])
    output_svg_file = sys.argv[1]

    # define the input data: (note: you could read this in from an input file)
    (chrlist, chrlengths, chrcols) = define_input_data()

    # make the plot of the input data:
    plot_input_data(chrlist, chrlengths, chrcols, output_svg_file)


if __name__=="__main__":