Friday, 10 January 2025

Using snippy to find SNPs in bacterial genomes

I have been learning how to use snippy by Torsten Seemann to identify SNPs in bacterial genomes.

Running snippy

To run snippy on the Sanger computer farm, I first had to type:

% module load snippy/4.6.0

Then I ran snippy on an assembly "14.fasta", comparing it to a reference genome "ref.fa". With the --ctgs option, I told snippy to infer SNPs by simulating fake 250-bp reads from the assembly "14.fasta" and comparing those to the reference genome:

% snippy --cpus 16 --outdir mysnps_test --ref ref.fa --ctgs 14.fasta

Here --outdir mysnps_test puts the output files into the directory mysnps_test, and --cpus 16 tells snippy to use 16 CPUs.

It took 8 minutes to run on that assembly.
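
If there are several assemblies to compare to the same reference, the command above could be generated once per assembly. The sketch below only prints the commands so that they can be checked before running anything. It uses just the options shown above. The function name snippy_cmd and the mysnps_<name> output directory naming are my own assumptions for illustration, not anything snippy requires.

```shell
# Print (not run) the snippy command for one assembly, so it can be checked first.
# Uses only the options shown above; the output directory naming is an assumption.
snippy_cmd() {
    ref=$1
    assembly=$2
    name=$(basename "$assembly" .fasta)   # e.g. 14.fasta -> 14
    printf 'snippy --cpus 16 --outdir mysnps_%s --ref %s --ctgs %s\n' "$name" "$ref" "$assembly"
}

# Print one command per assembly in the current directory.
for f in *.fasta; do
    [ -e "$f" ] || continue               # skip if no .fasta files are present
    snippy_cmd ref.fa "$f"
done
```

Once the printed commands look right, they could be run one by one, or submitted as separate jobs on the farm.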

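Output files from snippy

The main output file from snippy is called snps.tab. It has one line per variant and looks something like this (the columns after EVIDENCE are empty here, presumably because the reference was a plain FASTA file with no gene annotation):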

% head -10 mysnps_test/snps.tab
CHROM     POS     TYPE  REF            ALT  EVIDENCE              FTYPE  STRAND  NT_POS  AA_POS  EFFECT  LOCUS_TAG  GENE  PRODUCT
AE003852  5414    snp   G              A    A:20 G:0
AE003852  42082   snp   A              C    C:20 A:0
AE003852  137105  del   TAACAGAAACAGA  T    T:14 TAACAGAAACAGA:0
AE003852  144569  snp   G              A    A:20 G:0
AE003852  167663  snp   T              C    C:14 T:0
AE003852  167678  snp   G              A    A:14 G:0
AE003852  167684  snp   C              T    T:14 C:0
AE003852  167697  snp   A              G    G:14 A:0
AE003852  182735  snp   C              T    T:20 C:0
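
To get a quick overview of a snps.tab file like the one above, the variants of each type (snp, del and so on) can be counted. This is a minimal sketch, assuming the file is tab-delimited with a single header line and the variant type in the third column (TYPE). The function name count_variant_types is just one I made up.

```shell
# Count variants of each TYPE (snp, del, ...) in a snippy snps.tab file.
# Assumes a tab-delimited file with a one-line header and TYPE in column 3.
count_variant_types() {
    awk -F'\t' '
        NR > 1 && NF >= 3 { n[$3]++ }              # skip the header line, tally by TYPE
        END { for (t in n) print t "\t" n[t] }     # one line per type: type, count
    ' "$1" | sort
}
```

It could then be run as: count_variant_types mysnps_test/snps.tab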
  

Acknowledgements

Thanks to my colleagues Lia Bote and Vignesh Shetty for help running snippy and understanding it.