I've been predicting genes in bacterial assemblies using Prokka.
Prokka is described in the paper by Seemann (2014).
Prokka predicts protein-coding genes, ribosomal RNA (rRNA) genes, transfer RNA (tRNA) genes, signal leader peptides, and non-coding RNA (ncRNA) genes. It annotates each predicted gene by finding its best match in large databases such as UniProt, RefSeq and Pfam.
It's very easy to use:
% prokka --outdir myout input.fasta
where --outdir gives the directory where you want the output to go (e.g. 'myout'),
and input.fasta is the input assembly file.
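If you have many assemblies, it can be handy to launch Prokka from a script. Here's a minimal Python sketch of one way to do that. The function names build_prokka_command and run_prokka are made up for illustration, and the sketch assumes that prokka is on your PATH. It uses the --prefix and --cpus options alongside --outdir; check prokka --help for your installed version before relying on them.

```python
import shlex
import subprocess


def build_prokka_command(assembly, outdir, prefix=None, cpus=None):
    """Build the argument list for one Prokka run (hypothetical helper)."""
    cmd = ["prokka", "--outdir", str(outdir)]
    if prefix is not None:
        # --prefix sets the base name of the output files
        cmd += ["--prefix", prefix]
    if cpus is not None:
        cmd += ["--cpus", str(cpus)]
    cmd.append(str(assembly))
    return cmd


def run_prokka(assembly, outdir, **options):
    """Run Prokka and raise CalledProcessError if it exits with an error."""
    cmd = build_prokka_command(assembly, outdir, **options)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)


# Example (needs Prokka installed, so it is left commented out):
# run_prokka("input.fasta", "myout", prefix="mysample")
```

Setting a prefix makes the output file names predictable, which helps if a later script needs to find the .gff file.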
The output directory will contain a .gff file with Prokka's gene predictions.
It has lines looking like this (in the real file the nine columns are separated by tabs):
##gff-version 3
##sequence-region NZ_LT906614.1 1 2961182
##sequence-region NZ_LT906615.1 1 1072319
NZ_LT906614.1 Prodigal:002006 CDS 372 806 . - 0 ID=BEDIDOIH_00001;Name=mioC;db_xref=COG:COG0716;gene=mioC;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:P03817;locus_tag=BEDIDOIH_00001;product=Protein MioC
NZ_LT906614.1 Prodigal:002006 CDS 816 2177 . - 0 ID=BEDIDOIH_00002;eC_number=3.6.-.-;Name=mnmE;db_xref=COG:COG0486;gene=mnmE;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:P25522;locus_tag=BEDIDOIH_00002;product=tRNA modification GTPase MnmE
NZ_LT906614.1 Prodigal:002006 CDS 2271 3896 . - 0 ID=BEDIDOIH_00003;Name=yidC;gene=yidC;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:Q1R4M9;locus_tag=BEDIDOIH_00003;product=Membrane protein insertase YidC
NZ_LT906614.1 Prodigal:002006 CDS 4123 4446 . - 0 ID=BEDIDOIH_00004;eC_number=3.1.26.5;Name=rnpA;db_xref=COG:COG0594;gene=rnpA;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:P0A7Y8;locus_tag=BEDIDOIH_00004;product=Ribonuclease P protein component
NZ_LT906614.1 Prodigal:002006 CDS 4492 4629 . - 0 ID=BEDIDOIH_00005;inference=ab initio prediction:Prodigal:002006;locus_tag=BEDIDOIH_00005;product=hypothetical protein
NZ_LT906614.1 Prodigal:002006 CDS 4871 5608 . - 0 ID=BEDIDOIH_00006;eC_number=3.6.3.-;Name=yxeO;db_xref=COG:COG1126;gene=yxeO;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:P54954;locus_tag=BEDIDOIH_00006;product=putative amino-acid import ATP-binding protein YxeO
NZ_LT906614.1 Prodigal:002006 CDS 5605 6276 . - 0 ID=BEDIDOIH_00007;Name=yxeN;gene=yxeN;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:P54953;locus_tag=BEDIDOIH_00007;product=putative amino-acid permease protein YxeN
...
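To pull things out of the .gff file, you can split each feature line into its nine tab-separated columns and then split the last column into key=value pairs. Here's a minimal Python sketch that counts feature types and 'hypothetical protein' products. The function names are my own, and it assumes the file is tab-separated and that any sequence data comes after a ##FASTA line, as I'd expect in a Prokka .gff.

```python
from collections import Counter


def parse_attributes(field):
    """Turn 'ID=x;Name=y' into {'ID': 'x', 'Name': 'y'}."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def read_gff_features(lines):
    """Yield one dict per feature line, stopping at the ##FASTA section."""
    for line in lines:
        if line.startswith("##FASTA"):
            break
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment lines and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        seqid, source, ftype, start, end, score, strand, phase, attributes = cols
        yield {
            "seqid": seqid,
            "type": ftype,
            "start": int(start),
            "end": int(end),
            "strand": strand,
            "attributes": parse_attributes(attributes),
        }


def summarise(lines):
    """Return (Counter of feature types, number of hypothetical proteins)."""
    type_counts = Counter()
    hypothetical = 0
    for feature in read_gff_features(lines):
        type_counts[feature["type"]] += 1
        if feature["attributes"].get("product") == "hypothetical protein":
            hypothetical += 1
    return type_counts, hypothetical


# Example (the file name here is hypothetical):
# with open("myout/mysample.gff") as handle:
#     counts, n_hypothetical = summarise(handle)
```

The attribute values can themselves contain colons, commas and spaces (see the inference field above), which is why the sketch splits only on ';' and on the first '='.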
The output directory also has a file called something like PROKKA_12192023.txt that summarises the results, saying something like this:
organism: Genus species strain
contigs: 2
bases: 4033501
CDS: 3547
rRNA: 25
tRNA: 98
tmRNA: 1
Yay!
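If you want those summary numbers in a script, the .txt file is simple enough to read as 'key: value' lines. Here's a minimal Python sketch, assuming every line has the form shown above; the function name parse_prokka_summary is made up for illustration.

```python
def parse_prokka_summary(lines):
    """Read 'key: value' lines into a dict, converting whole numbers to int."""
    summary = {}
    for line in lines:
        if ":" not in line:
            continue  # ignore anything that is not a key: value line
        key, value = line.split(":", 1)
        key, value = key.strip(), value.strip()
        summary[key] = int(value) if value.isdigit() else value
    return summary


# Example (the path here is hypothetical):
# with open("myout/PROKKA_12192023.txt") as handle:
#     summary = parse_prokka_summary(handle)
#     print(summary["CDS"], "protein-coding genes in", summary["bases"], "bases")
```

As a quick sanity check, the two ##sequence-region lengths in the .gff example above (2961182 and 1072319) add up to the 4033501 bases reported in the summary.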