Monday, 26 November 2018
Dashed line in pysvg
Now I wanted to figure out how to make a dashed line using pysvg.
The library seems to be missing a dedicated function for this, but I was able to do it with:
# define the dashed line style
linestyle_dict = {'stroke-width':2, 'stroke':'black','stroke-dasharray': "5, 5"}
linestyle = pysvg.builders.StyleBuilder(linestyle_dict)
myline = shape_builder.createLine(0,0,300,300)
myline.set_style(linestyle.getStyle())
svg_document.addElement(myline)
If you want to change the lengths of the dashes you can change for example 'stroke-dasharray': "5, 5" to 'stroke-dasharray': "10, 10".
BWA aligner
BWA mem
On the BWA website, it explains the the 'bwa mem' algorithm is the best one for '70bp or longer Illumina, 454, Ion Torrent and Sanger reads, assembly contigs and BAC sequences.'
I've previously used it something like this:
bwa mem genome.fasta reads_fastq.gz mates_fastq.gz | samtools view -b -S -F 4 - > output.bam
run 'bwa mem' to align reads and their mates to a genome, and pipe the output through samtools to just take the reads that align (using -F4) and discarding reads that don't align (to save disk space), and put the output in output.bam
Other options
Other options available in BWA include:
bwa sampe: generated paired ended alignments (This takes an option -a <max_insert_size>)
Friday, 23 November 2018
Adjusting text style in pysvg
Now I want to do something a bit more tricky, to adjust the font styles.
I found a tutorial online, but the commands in it don't seem to work with the current pysvg that's installed on the Sanger farm.
However, I found another file on github that gave me some hints...
Adjusting font style and alignment with pysvg
You need something like this in the Python:
# define the text style
style = pysvg.builders.StyleBuilder()
style.setFontFamily(fontfamily="Arial")
style.setFontSize('1em')
style.setFilling("blue")
style.setTextAnchor("start")
style.style_dict["alignment-baseline"] = "middle" # there doesn't seem to be a setAlignmentBaseline function
Then you can add text using something like this:
# add some text to our svg picture for this chromosome:
t=pysvg.text.Text(mychr, x = mychrlen*xscale + 30, y = chrcnt*yscale + 30)
t.set_style(style.getStyle())
svg_document.addElement(t)
Wednesday, 21 November 2018
Jupyter notebooks using Python
The first step is to install Jupyter.
(Note to self: I found it is installed on the pcs5 at Sanger, so I can start it up by 'ssh -Y pcs5', and then..)
The second step is to start up a Jupyter notebook session:
(Note to self: I did this in /nfs/users/nfs_a/alc/Documents/git/Python)
% jupyter notebook
(Note to self: jupyter is installed with python2 software on the Sanger farm, so seems to assume Python2.)
This opens up a web browser showing the files in the directory where I started jupyter.
You can then start a new notebook by going to the 'New' menu on the right, and choosing 'python2'.
You can give your notebook a title by typing in the 'Untitled' box at the top, for example, you could rename it 'plotting_chrs', and then it would be saved as file plotting_chrs.ipynb
Then you can start typing in the prompt line.
To get a line to be executed, you can press Shift+Return.
Here I made a jupiter notebook to make a SVG plot of chromosomes (see my previous blogpost), and display the SVG inline:
(Note to self: the whole of the svg image from pysvg doesn't seem to be displayed inline, I'm not sure why this is - I tried another svg and it worked fine.)
To shut down the notebook, you can click on 'Close and Halt' in the File menu. To start again you can type
% jupyter notebook plotting_chrs.ipynb
to open this notebook again.
Later note (May 2019):
- I wanted to install jupyter notebook on a laptop running Windows. To do this, I first installed the latest Python (v 3.7.3) using a Window installer for Python from the Python website. Then I used 'cmd' in Windows to open a command prompt and within the command prompt I typed:
> py -m pip install --upgrade pip
> py -m pip install jupyter
This installed Jupyter notebook [Note: to install another module, such as 'pandas', you can type 'py -m pip install pandas].
To then run a certain Jupyter notebook such as my.notebook, you can type:
> jupyter notebook C:\\mypath\Docs\my.notebook
This should open the notebook my.notebook in a web browser.
You can get a certain command to run in the Jupyter notebook by pressing shift+return.
- I wanted to install jupyter notebook on a Mac laptop. I already had Anaconda installed, so typed:
Plotting chromosomes using Python pysvg
I found some nice examples for using pysvg here.
Here is another old example (seems to use an old version of pysvg, that doesn't work now) by Noel O'Blog.
It can be run using python2 (not python3):
% python /nfs/users/nfs_a/alc/Documents/git/Python/plot_emu_chrs.py emu_chrs.svg
% convert emu_chrs.svg emu_chrs.png
The output looks like (as png):
You can edit it manually in the SVG editor Inkscape (Note to self: do 'ssh -Y pcs5' and then 'inkscape emu_chrs.svg').
The code is:
import sys
import os
import pysvg.structure
import pysvg.builders
import pysvg.text
# Note: this is to make a svg plot with the E. multilocularis chromosomes,
# plotted one under the other, scaled according to their lengths, and
# with a different colour for each chromosome, and a label for each
# chromosome.
# Note 2: this script must be run with python2 (not python3).
#====================================================================#
def define_input_data():
# define the input data, as a list of E. multilocularis chromosomes, and
# a dictionary with lengths of E. multilocularis chromosomes, and a dictionary
# with the colours to plot E. multilocularis chromosomes:
chrlist = ["chr1", "chr2", "chr3"]
chrlengths = { "chr1": 2000000, "chr2": 5000000, "chr3": 4500000}
chrcols = { "chr1": "rgb(255,0,153)", "chr2": "blue", "chr3": "purple"}
# Note: there is a list of RGB values for colours here: http://access-excel.tips/vba-excel-rgb-property-and-get-rgb-color/
return(chrlist, chrlengths, chrcols)
#====================================================================#
# make the plot of the input data:
def plot_input_data(chrlist, chrlengths, chrcols, output_svg_file):
# initialise our svg picture
svg_document = pysvg.structure.Svg()
shape_builder = pysvg.builders.ShapeBuilder()
# plot each chromosome, one under the other:
chrcnt = 0
yscale = 100
xscale = 0.00004
for mychr in chrlist:
# find its length:
assert(mychr in chrlist)
mychrlen = chrlengths[mychr]
# find the colour to plot it:
assert(mychr in chrcols)
mychrcol = chrcols[mychr]
# add a rectangle to our svg picture for this chromosome:
# createRect takes x, y, width, height
myrectwidth = "%dpx" % (mychrlen*xscale + 20)
svg_document.addElement(shape_builder.createRect(0, chrcnt*yscale, myrectwidth, "50px", strokewidth = 1, stroke = "black", fill = mychrcol))
# add some text to our svg picture for this chromosome:
svg_document.addElement(pysvg.text.Text(mychr, x = mychrlen*xscale + 30, y = chrcnt*yscale + 30))
# add one to the chromosome count:
chrcnt += 1
# save the svg picture to an output file
svg_document.save(output_svg_file)
return()
#====================================================================#
def main():
# check the command-line arguments:
if len(sys.argv) != 2:
print("Usage: %s output_svg_file" % sys.argv[0])
sys.exit(1)
output_svg_file = sys.argv[1]
# define the input data: (note: you could read this in from an input file)
(chrlist, chrlengths, chrcols) = define_input_data()
# make the plot of the input data:
plot_input_data(chrlist, chrlengths, chrcols, output_svg_file)
#====================================================================#
if __name__=="__main__":
main()
#====================================================================#
Wednesday, 14 November 2018
I-TASSER for protein structure and function prediction
Notes to self:
- there is a script runI-TASSER.pl on the Sanger farm for running it
- I-TASSER and its scripts currently live here:
/software/pathogen/external/apps/usr/local/itasser-5.0
- Users need the library location:
/lustre/scratch118/infgen/pathogen/pathpipe/itasser
- Sample command:
runI-TASSER.pl -pkgdir /software/pathogen/external/apps/usr/local/itasser-5.0 -libdir /lustre/scratch118/infgen/pathogen/pathpipe/itasser -java_home /usr -seqname example -datadir example -outdir itasser_out
- If users run it with the -light option that will limit the simulation time to 5-6 hours which means they can get a smallish protein (<100aa) through in approx 8 hours.
Thanks to
Victoria Offord
Arporn Wangwiwatsin