Thursday, 30 May 2013

Using BLAT to align ESTs/cDNAs to a genome

BLAT, written by Jim Kent, can be used to align ESTs or cDNAs to a genome, and it is extremely fast.
An alternative is exonerate, which is slower than BLAT but more accurate.
BLAT can also align a protein query to a protein database, or a protein query to a genome (DNA).
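The protein-to-protein case mentioned above is not shown elsewhere in this post, so here is a minimal sketch of how it might look. The wrapper function name and the file names are placeholders of mine. The -t=prot and -q=prot values are the protein settings listed in BLAT's usage message.

```shell
# Hypothetical wrapper: the function name and file names are placeholders.
align_prot_to_prot() {
  # $1 = protein database fasta, $2 = protein query fasta, $3 = output file
  # -t=prot: the database is protein; -q=prot: the query is protein
  blat "$1" "$2" "$3" -out=blast8 -t=prot -q=prot
}

# Example call (requires blat on your PATH, so it is left commented out):
# align_prot_to_prot protein_db.fa proteins.fa out.blat
```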

Aligning ESTs/cDNAs to a genome using BLAT
% blat assembly.fa ests.fa out.blat -out=blast8 -t=dna -q=dna
where assembly.fa is your assembly fasta file,
ests.fa is your fasta file of ESTs,
out.blat is the output file name,
-out=blast8 means the output will be in BLAST m8 (tabular) format (the default is psl format),
-t=dna tells BLAT the database is DNA,
-q=dna tells BLAT the query is DNA.
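Once you have the blast8 output, you will often want to keep only the better hits. The following is a rough sketch rather than a recommended pipeline: the function name and the thresholds (95% identity, 100 aligned columns) are arbitrary choices of mine. It relies on the standard BLAST m8 column order, in which column 3 is percent identity and column 4 is alignment length.

```shell
# Keep only the blast8 hits that pass minimum identity and length cut-offs.
# blast8 columns: 1 query, 2 subject, 3 % identity, 4 alignment length,
# 5 mismatches, 6 gap opens, 7 q.start, 8 q.end, 9 s.start, 10 s.end,
# 11 e-value, 12 bit score
filter_blast8() {
  # $1 = blast8 file, $2 = minimum % identity, $3 = minimum alignment length
  awk -F'\t' -v min_id="$2" -v min_len="$3" \
    '$3 >= min_id && $4 >= min_len' "$1"
}

# Example call (left commented out because out.blat must exist first):
# filter_blast8 out.blat 95 100 > out.filtered.blat
```

The cut-offs are passed as arguments, so the same function can be reused with looser thresholds for divergent ESTs.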

Aligning proteins to a genome using BLAT
% blat assembly.fa proteins.fa out.blat -out=blast8 -t=dnax -q=prot
where assembly.fa is your assembly fasta file,
proteins.fa is your fasta file of proteins,
out.blat is the output file name,
-out=blast8 means the output will be in BLAST m8 (tabular) format (the default is psl format),
-t=dnax tells BLAT the database is DNA that should be translated in six frames to protein,
-q=prot tells BLAT the query is proteins.

Aligning a short 44-bp sequence to Illumina reads using BLAT
I wanted to use BLAT to search for a short 44-bp sequence in some Illumina reads. I found that I needed to set -tileSize=8 (the default for DNA is 11). With the default, BLAT missed the 44-bp sequence in many reads that actually contain it, and also got the coordinates slightly wrong. With -tileSize=8 it works much better: it finds the cases I expect to find and gets the coordinates right.
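A possible explanation, which is my assumption and not something I have checked in the BLAT source, is that BLAT by default indexes non-overlapping tiles of the database, so a short match contains only a few complete tiles. The sketch below counts the worst-case number of complete tiles that fit inside a match of a given length. The function name and the file names in the commented command are placeholders, and treating the reads as the database and the 44-bp sequence as the query is also an assumption.

```shell
# Worst-case number of complete, non-overlapping tiles of size $2 that fall
# entirely inside a match of length $1. In the worst case the match starts
# one base after a tile boundary, so (tile size - 1) bases are wasted.
min_tiles() {
  echo $(( ($1 - ($2 - 1)) / $2 ))
}

min_tiles 44 11   # default DNA tile size
min_tiles 44 8    # smaller tile size

# Possible command (placeholders: reads.fa, query44.fa; requires blat):
# blat reads.fa query44.fa out.blat -out=blast8 -t=dna -q=dna -tileSize=8
```

With a 44-bp match this gives 3 complete tiles for a tile size of 11 and 4 for a tile size of 8, so a smaller tile leaves more seeds to survive a sequencing error.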

Thanks
Thanks to John Liechty for advice on using BLAT to align proteins to a genome.
