Tuesday 17 December 2013

Ensembl API workshop

I attended a great Ensembl API workshop last week at the University of Cambridge, and learnt a lot about the Ensembl API.

The course was divided into sections covering the different parts of the Ensembl API (core API, variation, comparative genomics, functional genomics, etc.). The instructors set us lots of nice exercises, and I've included my answers to the exercises below.

Ensembl Compara (Comparative Genomics)
This part of the course was taught by Matthieu Muffato and Stephen Fitzgerald, whose course notes are here:
Matthieu Muffato: course notes
Stephen Fitzgerald: course notes

1) Print the sequence of the [Compara] Member corresponding to SwissProt protein O93279: exercise1a_compara.pl
2) Find and print the sequence of all the peptide Members corresponding to the human protein-coding gene(s) FRAS1: exercise2a_compara.pl
3) Get the multiple alignment corresponding to the family (a 'family' can contain both UniProt and Ensembl members) with the stable id ENSFM00250000006121: exercise3_compara.pl
4) Get the families that the human gene ENSG00000139618 belongs to, and print out their members (note: a 'family' can contain both UniProt and Ensembl members): exercise4_compara.pl
5) Print the protein tree with the stable id ENSGT00390000003602 (note: a 'tree' can only contain Ensembl members, not UniProt members): exercise5_compara.pl
6) Print all the members of the tree containing the human ncRNA gene ENSG00000238344: exercise6_compara.pl
7) Get all the homologues for the human gene ENSG00000229314: exercise7_compara.pl
8) Count the number of one-to-one orthologues between human and mouse: exercise8_compara.pl

Making a plot of a tree:
The script in exercise 5 above extracts the tree in several formats, the last of which is called 'display_label_composite' NHX format by Compara:
If you put this into a file (eg. tree.nj), then you can make a picture of the tree using Li Heng's NJTREE software (distributed as TreeBeST, hence the 'treebest' command below), which you can download from sourceforge, by typing, for example:
% ~alc/Documents/bin/treebest/treebest export -f 8 tree.nj > tree.eps
Here '-f 8' sets the font size in the image to 8. Here's the picture:

It doesn't show the duplication and speciation nodes in different colours, but that's ok.
[Note to self: it's possible to make a PNG that has the duplication and speciation nodes in different colours by using Li Heng's Perl script. You need to copy the tree.nj file to /nfs/users/nfs_a/alc/Documents/bin/njtree_plot, then type:
% perl nhxplot.pl tree.nj > tree.png
This gives:

You can see the duplication nodes in red and speciation nodes in blue. Very nice! ]

Ensembl Compara Perl API documentation
The documentation for the Ensembl Compara Perl API is at http://www.ensembl.org/info/docs/api/index.html
To see the documentation for an old version of the API, replace the 'www' in the address by 'e' followed by the release number. For example, for Ensembl 75 use 'e75': http://e75.ensembl.org/info/docs/api/index.html
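
As a small illustration of the address pattern described above, here is a sketch of a shell function that builds the documentation URL for a given release. The function name ensembl_api_docs_url is made up for this example, and it assumes the 'eNN' pattern holds for the release you ask for; it only builds the address and does not check that the archive site exists.

```shell
# Hypothetical helper: print the Ensembl API documentation URL.
# With no argument it prints the current ('www') address; with a release
# number it prints the archive address, assuming the 'eNN' naming pattern.
ensembl_api_docs_url() {
    release=$1
    if [ -z "$release" ]; then
        echo "http://www.ensembl.org/info/docs/api/index.html"
    else
        echo "http://e${release}.ensembl.org/info/docs/api/index.html"
    fi
}
```

For example, 'ensembl_api_docs_url 75' prints the Ensembl 75 address given above.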

Tuesday 10 December 2013

Creating a password-protected zip file using 7-zip

If you want to share data with collaborators, you might want to put a password-protected zip file on the web for them to download.

If you have data in a Linux directory mydir, you can create a tar file of that directory using:
% tar cvf mydir.tar mydir

If you have 7-zip installed, you can then create a password-protected zip version of mydir.tar by typing:
% 7za a -tzip -pMYPASSWORD -mem=AES256 mydir_secure.zip mydir.tar
where the password is set to 'MYPASSWORD'.

If your collaborator has 7-zip installed on a Linux machine, the zip file can be unzipped using:
% 7za e mydir_secure.zip
This will prompt for the password, and then extract mydir.tar, which can be unpacked with 'tar xvf mydir.tar'.
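
The create and extract steps above could be wrapped up in a pair of shell functions, roughly as sketched below. The function names secure_zip and secure_unzip, and the extra output-directory argument, are my own inventions for this example. Note that, as with the commands above, the password is passed on the command line, so it may end up in your shell history.

```shell
# Hypothetical helper: tar a directory, then wrap the tar file in an
# AES-256 encrypted zip. Arguments: directory name, password.
secure_zip() {
    dir=$1
    password=$2
    # Step 1: bundle the directory into a single tar file.
    tar cf "$dir.tar" "$dir" || return 1
    # Step 2: encrypt it, but only if 7za is actually on the PATH.
    if ! command -v 7za >/dev/null 2>&1; then
        echo "7za not found; $dir.tar was created but not encrypted" >&2
        return 2
    fi
    7za a -tzip "-p$password" -mem=AES256 "${dir}_secure.zip" "$dir.tar" >/dev/null
}

# Hypothetical helper: decrypt the zip into an output directory, then
# unpack any tar files found inside it.
# Arguments: zip file, password, output directory.
secure_unzip() {
    zipfile=$1
    password=$2
    outdir=$3
    mkdir -p "$outdir" || return 1
    # -o sets the output directory (no space after -o);
    # -y answers 'yes' to any overwrite questions.
    7za e "-p$password" "-o$outdir" -y "$zipfile" >/dev/null || return 1
    for t in "$outdir"/*.tar; do
        tar xf "$t" -C "$outdir" || return 1
    done
}
```

With these defined, 'secure_zip mydir MYPASSWORD' would create mydir_secure.zip, and 'secure_unzip mydir_secure.zip MYPASSWORD out' would recreate the directory as out/mydir.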

I haven't tested it, but I'm guessing that your collaborator could also unzip the file using 7-zip on a Windows machine.

Thanks to this handy webpage for information: http://xmodulo.com/2013/09/how-to-create-encrypted-zip-file-on-linux.html