Thursday 28 February 2013

Using Artemis to view annotations

Starting Artemis
I am using Artemis on a server called pcs4 at Sanger. To run Artemis, I need to log into pcs4:
% ssh -Y pcs4

Then start Artemis:
% art
This brings up the Artemis menu.

Opening a fasta file in Artemis
To open a file in Artemis, go to the 'File' menu, choose 'Open', and select the fasta or embl file for your sequence, for example the file for an assembly that you want to look at.

Alternatively, you can start Artemis by giving it the name of your fasta or embl file on the command line, e.g.
% art PTRK.v1.fa
where 'PTRK.v1.fa' is a fasta file for the assembly that I want to look at.

This brings up a window with one scaffold from my assembly displayed, and a list of all the other scaffolds below it.
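To see in advance which scaffolds Artemis will list for a multi-fasta assembly, you can read the header lines directly in the shell. This is only a minimal sketch and is not part of Artemis: the function name 'list_fasta_ids' is hypothetical, and it assumes that the scaffold name is the first word after '>' on each header line.

```shell
# List the sequence names in a fasta file, one per line.
# Assumption: the name is the first whitespace-delimited word
# after '>' on each header line.
list_fasta_ids() {
    grep '^>' "$1" | sed 's/^>//' | awk '{ print $1 }'
}

# Example use (counts the scaffolds in the assembly):
#   list_fasta_ids PTRK.v1.fa | wc -l
```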

Opening an embl file in Artemis
If you have gene predictions stored in an embl file, you can bring it up by typing:
% art PTRK.contig.00392.62747.embl
where PTRK.contig.00392.62747.embl is the name of the embl file.

This brings up a nice Artemis window.

Loading annotations from a gff file in Artemis
You can also load an embl file and a gff file at once, e.g.
% art PTRK.contig.00392.62747.embl + PTRK.contig.00392.62747.features.gff
where PTRK.contig.00392.62747.features.gff has some additional features that you want to display.
Note that the gff file must contain features for only the same scaffold/chromosome as the embl file (it can't contain multiple scaffolds).
If you load a gff file into Artemis, it might not show the introns between the separate exons of a gene, so it can be hard to see which exons belong to which gene. As a result, it's a better idea to convert your gff files to embl files (e.g. using my script) and load those into Artemis.
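Since the gff file must refer to a single scaffold/chromosome, it can be worth checking this before loading it. The following is a rough sketch, not something Artemis provides: the function names 'gff_seqids' and 'gff_is_single_scaffold' are hypothetical, and it assumes a tab-delimited gff file with the sequence name in the first column.

```shell
# Print the distinct sequence names (column 1) used by the feature
# lines of a gff file. Comment lines starting with '#' and lines with
# fewer than 8 tab-separated columns are skipped.
gff_seqids() {
    awk -F'\t' '!/^#/ && NF >= 8 { print $1 }' "$1" | sort -u
}

# Succeed only if the gff file refers to exactly one scaffold/chromosome.
gff_is_single_scaffold() {
    n=$(gff_seqids "$1" | wc -l | tr -d ' ')
    [ "$n" -eq 1 ]
}

# Example use:
#   gff_is_single_scaffold PTRK.contig.00392.62747.features.gff \
#       || echo "gff file contains more than one scaffold"
```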

Viewing mapped RNA-seq reads in Artemis
I have a bam file of RNA-seq reads, mapped to my genome using TopHat. I can load the bam file into Artemis by going to the Artemis 'File' menu, selecting 'Read BAM/VCF', and choosing the bam file. Note that the bam file needs to be sorted and indexed using Samtools (i.e. you need to have created a .bai file for the bam file, as described here). The bam file can contain more scaffolds than are present in the embl file that you opened (i.e. it can contain the scaffold in the embl file, plus additional scaffolds).
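The sorting and indexing step can be sketched roughly as follows. This assumes the Samtools 1.x command syntax (older 0.1.x releases used 'samtools sort in.bam out_prefix' instead); the function names and the '.sorted.bam' output name are hypothetical choices for illustration.

```shell
# Succeed if an index file (<name>.bam.bai) exists next to the bam file.
bam_has_index() {
    [ -f "$1.bai" ]
}

# Sort a bam file by coordinate and index the sorted copy.
# Assumes Samtools 1.x is on the PATH; the output name is an
# arbitrary choice, not something Artemis requires.
sort_and_index_bam() {
    in_bam=$1
    sorted_bam=${in_bam%.bam}.sorted.bam
    samtools sort -o "$sorted_bam" "$in_bam" && samtools index "$sorted_bam"
}

# Example use (my_reads.bam is a placeholder file name):
#   sort_and_index_bam my_reads.bam
#   bam_has_index my_reads.sorted.bam && echo "ready to load into Artemis"
```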

The default view of the mapped reads is the 'stack' view.


Note that reads shown in blue are unique reads, whereas reads in green are 'duplicated' reads that have been mapped to exactly the same position on the reference sequence. To save space, if there are duplicated reads, only one is shown by Artemis.

If you want to filter the reads that are shown from the bam file, you can right-click, choose 'Filter reads', and choose to show only reads above a certain mapping quality.

You can change the stack view to a sliding-window plot of coverage by right-clicking on the display of the mapped reads and selecting 'Views' -> 'Coverage'.

1 comment:

mun said...

Hi, nice blog!

Was trying out your script to convert gff to embl format but got an error:

ERROR: test_read_gene_positions: failed test1

How can I fix this?