avrilomics: Running BLAST to align genome sequences

Tuesday, 29 September 2020

Running BLAST to align genome sequences

I'm interested in finding conserved non-coding sequences between two related species of worms.

First I took the introns, UTRs, and intergenic regions from the first species, and tried comparing them to the genome of the second species using exonerate, but that was very slow. I then tried BLAT, which was a little faster. Then I tried BLASTN, which was nice and speedy!

It's been a while since I ran BLAST on the Sanger farm, so I needed to remind myself how to run it, even though I wrote posts about this ages ago (e.g. on farm_blast and on speeding up blast jobs).

This is what I did now:

Find the BLAST module on the farm and load it (only applicable to Sanger users):

% module avail -t | grep -i blast
blast/2.7.1=h96bfa4b_5

% module load blast/2.7.1=h96bfa4b_5

Make a blast database:

% makeblastdb -in genome2.fa -dbtype nucl

Run blast:

% blastn -db genome2.fa -query genome1_intronsandutrsandintergenic.fa -out myoutput.blast -outfmt 6

One thing I always forget is what the columns in the BLAST m8 format (the tabular format produced by -outfmt 6) are, so I have to look at this nice webpage.
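As a reminder to myself, here is a minimal sketch of the 12 default columns of the tabular output, with a small awk filter on percent identity and alignment length. The function name filter_hits, the thresholds, and the example rows (including the sequence names) are made up for illustration; they are not from my real output.

```shell
# The 12 default columns of -outfmt 6, in order:
#   1 qseqid    query sequence id
#   2 sseqid    subject (database) sequence id
#   3 pident    percent identity
#   4 length    alignment length
#   5 mismatch  number of mismatches
#   6 gapopen   number of gap openings
#   7 qstart    alignment start in query
#   8 qend      alignment end in query
#   9 sstart    alignment start in subject
#  10 send      alignment end in subject
#  11 evalue    expect value
#  12 bitscore  bit score

# Hypothetical helper: keep hits with pident >= $2 and alignment length >= $3.
filter_hits() {
    awk -F'\t' -v min_pident="$2" -v min_len="$3" \
        '$3 >= min_pident && $4 >= min_len' "$1"
}

# Made-up example rows (tab-separated), written to a temporary file.
example=$(mktemp)
printf 'intron_1\tscaffold_7\t95.50\t250\t11\t0\t1\t250\t1001\t1250\t1e-100\t400\n' >  "$example"
printf 'utr_2\tscaffold_3\t82.00\t300\t54\t0\t1\t300\t500\t799\t1e-50\t200\n'       >> "$example"
printf 'intergenic_3\tscaffold_9\t99.00\t40\t0\t0\t1\t40\t10\t49\t1e-10\t70\n'      >> "$example"

# Illustrative thresholds only: at least 90% identity over at least 100 bp.
filter_hits "$example" 90 100
```

With these made-up rows, only the intron_1 hit passes both thresholds. On real output the same function could be called as filter_hits myoutput.blast 90 100, with whatever thresholds suit the question.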

Note that by default the blastn command runs Megablast, which looks for matches of high percent identity, and is a fast algorithm. I'm interested in high percent identity matches, so I used this.
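The algorithm can also be named explicitly with the -task option of blastn, which makes it easy to compare Megablast with a more sensitive setting. Below is a rough sketch, not something I ran for this post: the wrapper name run_blastn and the per-task output file names are my own assumptions, and dc-megablast (discontiguous Megablast, which NCBI describes as intended for cross-species comparisons) is just one alternative that might be worth trying if Megablast misses more diverged regions.

```shell
db=genome2.fa
query=genome1_intronsandutrsandintergenic.fa

# Hypothetical wrapper: run blastn with an explicitly named task and
# write tabular output to myoutput.<task>.blast.
run_blastn() {
    task="$1"
    blastn -task "$task" -db "$db" -query "$query" \
        -out "myoutput.${task}.blast" -outfmt 6
}

# Only attempt the runs if blastn is on the PATH and the query file exists.
if command -v blastn >/dev/null 2>&1 && [ -f "$query" ]; then
    run_blastn megablast      # same as the default behaviour of blastn
    run_blastn dc-megablast   # discontiguous Megablast, more tolerant of mismatches
fi
```

The other values accepted by -task include blastn (the traditional, slower and more sensitive algorithm) and blastn-short (for very short queries).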

Alternatives to BLAST:

An alternative to BLAST is nucmer, part of the mummer package, which I wrote a post on ages ago (see here). Note to self: nucmer is part of the mummer module on the Sanger farm.

I asked my colleagues what they are using nowadays for whole-genome alignments, and they mentioned a few other software packages:

- my colleague Eerik Aunin mentioned the software SibeliaZ, which is tailored for aligning highly similar genomes, e.g. strains of the same species,

- my colleague Faye Rodgers mentioned Cactus, which can be used to make alignments of 1000s of vertebrate genomes,

- my colleague Ana Protasio mentioned Satsuma.

Regarding finding conserved noncoding regions, my colleague James Cotton mentioned PhastCons.
