Friday 2 December 2016

Mapping Illumina reads using the SMALT aligner

Martin Hunt has written the map_splitter.py script for aligning Illumina reads to a genome assembly.

This can use the SMALT aligner (on the Sanger compute farm) as follows:

% map_splitter.py --split 500000 -k 11 -s 2 -o " -x -r 0 -y 0.7 -i 500" smalt assembly.fa seqs_1.fastq.gz seqs_2.fastq.gz

where --split 500000 tells it to split the input genome fasta file into smaller files of 500,000 sequences each;
-k 11 is the kmer for the SMALT index;
-s 2 is the step length for the SMALT index;
-o gives the mapping options to pass on to SMALT (the quoted string);
-x tells SMALT to do an exhaustive search for alignments (at the cost of speed);
-r 0 tells SMALT to assign a read with several equally good mappings to one of them at random (0 is the random seed);
-y 0.7 tells SMALT to only take reads with alignment identity of >=70%;
-i 500 tells SMALT to allow a maximum insert size of 500 bp;
assembly.fa is the input assembly file;
seqs_1.fastq.gz and seqs_2.fastq.gz are the fastq files of reads and their mates, respectively.
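
For comparison, the options above correspond to the two plain SMALT steps, 'smalt index' and 'smalt map'. The sketch below is not what map_splitter.py necessarily runs internally (it leaves out the splitting and the farm job submission), and the index name assembly_k11_s2 is a made-up example. It only prints the commands unless RUN=1 is set, and it assumes the installed SMALT can read gzipped fastq files.

```shell
#!/bin/sh
# Sketch only: the plain SMALT commands matching the options above.
# "assembly_k11_s2" is a hypothetical index name, not something map_splitter.py uses.
assembly=assembly.fa
index=assembly_k11_s2

# Step 1: build the hash index with kmer 11 and step length 2.
index_cmd="smalt index -k 11 -s 2 $index $assembly"

# Step 2: map the read pairs with the options given in the -o string.
map_cmd="smalt map -x -r 0 -y 0.7 -i 500 $index seqs_1.fastq.gz seqs_2.fastq.gz"

# Print each command; run it as well only if RUN=1 (needs smalt on the PATH).
for cmd in "$index_cmd" "$map_cmd"; do
    echo "$cmd"
    if [ "${RUN:-0}" = "1" ]; then
        $cmd
    fi
done
```

Run as written, this just echoes the two command lines, which is a quick way to check the options before submitting anything to the farm.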
