Monday, 26 November 2018

Dashed line in pysvg

I recently wrote a blog post on using pysvg to make a picture of chromsomes.

Now I wanted to figure out how to make a dashed line using pysvg.

The library seems to be missing a dedicated function for this, but I was able to do it with:

# define the dashed line style
    linestyle_dict = {'stroke-width':2, 'stroke':'black','stroke-dasharray': "5, 5"}
    linestyle =
    myline = shape_builder.createLine(0,0,300,300)

If you want to change the lengths of the dashes you can change for example 'stroke-dasharray': "5, 5" to 'stroke-dasharray': "10, 10".

BWA aligner

The BWA aligner for short read alignment is by Heng Li and Richard Durbin.

BWA mem
On the BWA website, it explains the the 'bwa mem' algorithm is the best one for '70bp or longer Illumina, 454, Ion Torrent and Sanger reads, assembly contigs and BAC sequences.'

I've previously used it something like this:
bwa mem genome.fasta reads_fastq.gz mates_fastq.gz | samtools view -b -S -F 4 - > output.bam
   run 'bwa mem' to align reads and their mates to a genome, and pipe the output through samtools to just take the reads that align (using -F4) and discarding reads that don't align (to save disk space), and put the output in output.bam

Other options
Other options available in BWA include:
bwa sampe: generated paired ended alignments (This takes an option -a <max_insert_size>)

Friday, 23 November 2018

Adjusting text style in pysvg

I wrote a previous blogpost on making a plot using pysvg.

Now I want to do something a bit more tricky, to adjust the font styles.
I found a tutorial online, but the commands in it don't seem to work with the current pysvg that's installed on the Sanger farm.
However, I found another file on github that gave me some hints...

Adjusting font style and alignment with pysvg
You need something like this in the Python:

# define the text style
    style =
    style.style_dict["alignment-baseline"] = "middle" # there doesn't seem to be a setAlignmentBaseline function

Then you can add text using something like this:
# add some text to our svg picture for this chromosome:
        t=pysvg.text.Text(mychr, x = mychrlen*xscale + 30, y = chrcnt*yscale + 30)

Wednesday, 21 November 2018

Jupyter notebooks using Python

I wanted to figure out how to run Python in a Jupyter notebook, which is like the newer version of iPython notebooks. I found quite a nice tutorial on getting started on Python Jupyter notebooks here.

The first step is to install Jupyter.
(Note to self: I found it is installed on the pcs5 at Sanger, so I can start it up by 'ssh -Y pcs5', and then..)

The second step is to start up a Jupyter notebook session:
(Note to self: I did this in /nfs/users/nfs_a/alc/Documents/git/Python)
% jupyter notebook
(Note to self: jupyter is installed with python2 software on the Sanger farm, so seems to assume Python2.)
This opens up a web browser showing the files in the directory where I started jupyter.
You can then start a new notebook by going to the 'New' menu on the right, and choosing 'python2'.

You can give your notebook a title by typing in the 'Untitled' box at the top, for example, you could rename it 'plotting_chrs', and then it would be saved as file plotting_chrs.ipynb
Then you can start typing in the prompt line.
To get a line to be executed, you can press Shift+Return.

Here I made a jupiter notebook to make a SVG plot of chromosomes (see my previous blogpost), and display the SVG inline:

(Note to self: the whole of the svg image from pysvg doesn't seem to be displayed inline, I'm not sure why this is - I tried another svg and it worked fine.)

To shut down the notebook, you can click on 'Close and Halt' in the File menu. To start again you can type
% jupyter notebook plotting_chrs.ipynb 
to open this notebook again.

Later note (May 2019):
- I wanted to install jupyter notebook on a laptop running Windows. To do this, I first installed the latest Python (v 3.7.3) using a Window installer for Python from the Python website. Then I used 'cmd' in Windows to open a command prompt and within the command prompt I typed:
> py -m pip install --upgrade pip
> py -m pip install jupyter
This installed Jupyter notebook [Note: to install another module, such as 'pandas', you can type 'py -m pip install pandas].
To then run a certain Jupyter notebook such as my.notebook, you can type:
> jupyter notebook C:\\mypath\Docs\my.notebook
This should open the notebook my.notebook in a web browser.
You can get a certain command to run in the Jupyter notebook by pressing shift+return.

Plotting chromosomes using Python pysvg

To draw an svg image of chromosomes, here is a little example I made using pysvg (below).

I found some nice examples for using pysvg here.
Here is another old example (seems to use an old version of pysvg, that doesn't work now) by Noel O'Blog.

It can be run using python2 (not python3):
% python /nfs/users/nfs_a/alc/Documents/git/Python/ emu_chrs.svg
% convert emu_chrs.svg emu_chrs.png

The output looks like (as png):

You can edit it manually in the SVG editor Inkscape (Note to self: do 'ssh -Y pcs5' and then 'inkscape emu_chrs.svg').

The code is:

import sys
import os
import pysvg.structure
import pysvg.text

# Note: this is to make a svg plot with the E. multilocularis chromosomes,
# plotted one under the other, scaled according to their lengths, and
# with a different colour for each chromosome, and a label for each
# chromosome.
# Note 2: this script must be run with python2 (not python3).


def define_input_data():
    # define the input data, as a list of E. multilocularis chromosomes, and
    # a dictionary with lengths of E. multilocularis chromosomes, and a dictionary
    # with the colours to plot E. multilocularis chromosomes:

    chrlist = ["chr1", "chr2", "chr3"]
    chrlengths = { "chr1": 2000000, "chr2": 5000000, "chr3": 4500000}
    chrcols = { "chr1": "rgb(255,0,153)", "chr2": "blue", "chr3": "purple"}
    # Note: there is a list of RGB values for colours here:

    return(chrlist, chrlengths, chrcols)


# make the plot of the input data:

def plot_input_data(chrlist, chrlengths, chrcols, output_svg_file):

    # initialise our svg picture  
    svg_document = pysvg.structure.Svg()
    shape_builder =

    # plot each chromosome, one under the other:
    chrcnt = 0
    yscale = 100
    xscale = 0.00004
    for mychr in chrlist:
        # find its length:

        assert(mychr in chrlist)
        mychrlen = chrlengths[mychr]
        # find the colour to plot it:
        assert(mychr in chrcols)
        mychrcol = chrcols[mychr]
        # add a rectangle to our svg picture for this chromosome:
        # createRect takes x, y, width, height
        myrectwidth = "%dpx" % (mychrlen*xscale + 20)
        svg_document.addElement(shape_builder.createRect(0, chrcnt*yscale, myrectwidth, "50px", strokewidth = 1, stroke = "black", fill = mychrcol))
        # add some text to our svg picture for this chromosome:
        svg_document.addElement(pysvg.text.Text(mychr, x = mychrlen*xscale + 30, y = chrcnt*yscale + 30))
        # add one to the chromosome count:
        chrcnt += 1

    # save the svg picture to an output file



def main():
    # check the command-line arguments:
    if len(sys.argv) != 2:
        print("Usage: %s output_svg_file" % sys.argv[0])
    output_svg_file = sys.argv[1]

    # define the input data: (note: you could read this in from an input file)
    (chrlist, chrlengths, chrcols) = define_input_data()

    # make the plot of the input data:
    plot_input_data(chrlist, chrlengths, chrcols, output_svg_file)


if __name__=="__main__":


Wednesday, 14 November 2018

I-TASSER for protein structure and function prediction

I've recently heard about the I-TASSER program for protein structure and function prediction.

Notes to self:
- there is a script on the Sanger farm for running it
-  I-TASSER and its scripts currently live here:
- Users need the library location:
- Sample command: -pkgdir /software/pathogen/external/apps/usr/local/itasser-5.0 -libdir /lustre/scratch118/infgen/pathogen/pathpipe/itasser -java_home /usr -seqname example -datadir example  -outdir itasser_out
- If users run it with the -light option that will limit the simulation time to 5-6 hours which means they can get a smallish protein (<100aa) through in approx 8 hours.

Thanks to 
Victoria Offord
Arporn Wangwiwatsin

Tuesday, 13 November 2018

HHpred for finding remote homologues

I recently came across HHpred (described in Soding et al 2005), a program for looking for remote homologues by using HMMs. A nice use case of this is in the paper by Valentim et al 2013, where they used HHpred to predict that the Schistosoma mansoni protein that is involved in response to the drug oxamniquine has structural similarity to sulfotransferases.