This is only of interest to Sanger users, as it's only available on the Sanger farm. The path-dev group (Jacqueline Keane's team) have made a script called annotate_bacteria for annotating bacterial genomes. It is based on PROKKA and is tailored for bacteria, archaea and viruses. It takes an assembly as input, identifies ORFs with Prodigal, and predicts RNA genes using RNAmmer and Aragorn.
It then predicts the functions of the ORFs by running BLAST against a database of proteins from RefSeq and UniProt (by default, bacterial, archaeal and viral proteins) and by comparing them to domain databases (PfamA, CDD). It also runs SignalP to predict signal peptides. Evidence codes on the description lines record the source of each functional annotation.
To run it on farm3, type:
% annotate_bacteria -a assembly.fa --dbdir /lustre/scratch108/pathogen/pathpipe/prokka --sample_name MyExample
where assembly.fa is your input assembly, /lustre/scratch108/pathogen/pathpipe/prokka is the directory containing the sequence databases to run BLAST against, and MyExample is the sample name, which is used to label the job and its output files.
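If you have several assemblies to annotate, a small shell wrapper can save some typing. The sketch below is hypothetical: the function name run_annotation and the DRY_RUN switch are made up for this example and are not part of annotate_bacteria. It assumes each assembly is named <sample>.fa, uses the file name minus .fa as the --sample_name label, and only uses the -a, --dbdir and --sample_name options described above.

```shell
# Hypothetical wrapper around the annotate_bacteria command shown above.
# Assumes each assembly file is named <sample>.fa; the file name without
# .fa becomes the --sample_name label. Set DRY_RUN=1 to print, not run.
DBDIR=/lustre/scratch108/pathogen/pathpipe/prokka

run_annotation() {
    assembly=$1
    sample=$(basename "$assembly" .fa)    # genomes/MyExample.fa -> MyExample
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # only show the command that would be run
        echo "annotate_bacteria -a $assembly --dbdir $DBDIR --sample_name $sample"
    else
        # actually run the annotation (works on farm3)
        annotate_bacteria -a "$assembly" --dbdir "$DBDIR" --sample_name "$sample"
    fi
}
```

For example, after setting DRY_RUN=1, `for fa in *.fa; do run_annotation "$fa"; done` prints the command for each assembly without running anything; unset DRY_RUN to run them for real.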
The output is written to a subdirectory called 'annotation'. It includes a file called MyExample.tbl, a feature table summarising the annotation, e.g.:
1 1254 CDS
            inference  ab initio prediction:Prodigal:2.60
            inference  similar to AA sequence:UniProtKB:Q47899
            inference  protein motif:Pfam:PF01400.18
            product    Flavastacin precursor
            product    Astacin (Peptidase family M12A)
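Because the .tbl file is plain text, it is easy to pull information out of it with standard Unix tools. As a sketch, the awk-based function below (cds_products is a made-up name) prints the coordinates and product of each CDS. It assumes the usual NCBI feature-table layout, where a feature line holds start, stop and feature key in three tab-separated columns and each qualifier line begins with three tabs followed by the qualifier name and its value; check that your own MyExample.tbl really is laid out this way before relying on it.

```shell
# Hypothetical helper: print "start<TAB>stop<TAB>product" for every product
# qualifier of every CDS in a tab-separated feature table (.tbl) file.
cds_products() {
    awk -F'\t' '
        $1 ~ /^>/ { next }                     # skip ">Feature contig" header lines
        $1 != "" && $3 != "" {                 # a new feature: start, stop, key
            start = $1; stop = $2; key = $3
            next
        }
        $1 != "" { next }                      # extra interval of a multi-part feature
        key == "CDS" && $4 == "product" { print start "\t" stop "\t" $5 }
    ' "$1"
}
```

For example, `cds_products annotation/MyExample.tbl` would list one line per CDS product, with the coordinates taken from the feature line the product belongs to.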
- This runs fine on farm3, but not on farm2.
- Prodigal does not seem to predict partial genes (lacking a start and/or stop codon).
- Contigs in your input assembly.fa that are <200 bp are discarded.
- The RNA gene prediction step takes a long time.
- If you want to run a particular version of InterProScan, you can specify it with the -e option, e.g. -e /software/pathogen/external/apps/usr/local/iprscan-5.0.7/interproscan.sh
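To see in advance which contigs the 200 bp length cut-off mentioned in the notes above will drop, you could list the short contigs in your assembly. This is a minimal awk sketch; short_contigs is a hypothetical helper, not part of annotate_bacteria. It takes the contig name to be the first word of each '>' header line and ignores whitespace within sequence lines.

```shell
# Hypothetical helper: print "name<TAB>length" for every contig in a FASTA
# file that is shorter than a minimum length (default 200 bp, the cut-off
# mentioned above). The contig name is the first word of its '>' header.
short_contigs() {
    awk -v min="${2:-200}" '
        BEGIN { min = min + 0 }                  # make sure min is numeric
        /^>/ {
            # report the previous contig before starting a new one
            if (name != "" && len < min) print name "\t" len
            name = substr($1, 2); len = 0
            next
        }
        { gsub(/[ \t\r]/, ""); len += length($0) }   # count bases, not whitespace
        END { if (name != "" && len < min) print name "\t" len }
    ' "$1"
}
```

For example, `short_contigs assembly.fa` lists the contigs below 200 bp, and `short_contigs assembly.fa 500` uses a 500 bp threshold instead.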