avrilomics: January 2025

Friday, 10 January 2025

Using snippy to find SNPs in bacterial genomes

I have been learning how to use snippy by Torsten Seemann to identify SNPs in bacterial genomes.

Running snippy

To run snippy on the Sanger computer farm, I first had to type:

% module load snippy/4.6.0

Then I wanted to run snippy on an assembly, "14.fasta", comparing it to a reference genome, "ref.fa". Since I had an assembly rather than sequencing reads, I used the --ctgs option, which makes snippy infer SNPs by simulating fake 250-bp reads from the assembly "14.fasta" and mapping those to the reference genome:

% snippy --cpus 16 --outdir mysnps_test --ref ref.fa --ctgs 14.fasta

Here --outdir mysnps_test puts the output files in the directory mysnps_test, and --cpus 16 tells snippy to use 16 CPUs.

It took 8 minutes to run on that assembly.
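The --ctgs step can be pictured as tiling each contig with overlapping reads. The Python sketch below only illustrates that idea and is not snippy's own code. The function name shred_contig and the 50-bp step size are assumptions, and for simplicity it emits reads from one strand only.

```python
def shred_contig(seq: str, read_len: int = 250, step: int = 50) -> list[str]:
    """Cut a contig into overlapping fixed-length pseudo-reads.

    Hypothetical helper for illustration; snippy's own shredding may differ.
    """
    # A contig no longer than one read becomes a single pseudo-read
    if len(seq) <= read_len:
        return [seq]
    # Start positions spaced `step` bp apart, each leaving room for a full read
    starts = list(range(0, len(seq) - read_len + 1, step))
    # If the tiling stopped short of the contig end, add one read flush with the end
    if starts[-1] + read_len < len(seq):
        starts.append(len(seq) - read_len)
    return [seq[s:s + read_len] for s in starts]


if __name__ == "__main__":
    contig = "ACGT" * 155  # toy 620-bp contig
    reads = shred_contig(contig)
    print(len(reads), "pseudo-reads")  # 9 pseudo-reads
```

With a 250-bp read length and a 50-bp step, bases away from the contig ends are each covered by five pseudo-reads. A smaller step would give deeper simulated coverage at the cost of more reads to map.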

Output files from snippy

The main output file from snippy is a tab-separated table called snps.tab, which looks something like this:

% head -10 mysnps_test/snps.tab
CHROM   POS     TYPE    REF     ALT     EVIDENCE        FTYPE   STRAND  NT_POS  AA_POS  EFFECT  LOCUS_TAG       GENE    PRODUCT
AE003852        5414    snp     G       A       A:20 G:0
AE003852        42082   snp     A       C       C:20 A:0
AE003852        137105  del     TAACAGAAACAGA   T       T:14 TAACAGAAACAGA:0
AE003852        144569  snp     G       A       A:20 G:0
AE003852        167663  snp     T       C       C:14 T:0
AE003852        167678  snp     G       A       A:14 G:0
AE003852        167684  snp     C       T       T:14 C:0
AE003852        167697  snp     A       G       G:14 A:0
AE003852        182735  snp     C       T       T:20 C:0
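Because snps.tab is a tab-separated table, it is easy to summarise with a short script. In the EVIDENCE column, an entry such as "A:20 G:0" gives the number of reads supporting each allele: here 20 reads support the ALT allele A and none support the REF allele G. The annotation columns (FTYPE onwards) are empty in this example, presumably because the reference was an unannotated FASTA file. The Python sketch below uses hypothetical helper names (parse_evidence and summarise_snps_tab) to count variants by TYPE and pull out the ALT read support for each variant. Its embedded sample reuses three rows from the output above.

```python
import csv
import io
from collections import Counter


def parse_evidence(evidence: str) -> dict[str, int]:
    """Turn an EVIDENCE string like 'A:20 G:0' into {'A': 20, 'G': 0}."""
    counts = {}
    for item in evidence.split():
        # Split on the last ':' into the allele and its read count
        allele, n = item.rsplit(":", 1)
        counts[allele] = int(n)
    return counts


def summarise_snps_tab(handle):
    """Count variants by TYPE and record (CHROM, POS, ALT read support) for each."""
    # restval="" fills in trailing columns that are missing from a row
    reader = csv.DictReader(handle, delimiter="\t", restval="")
    type_counts = Counter()
    alt_support = []
    for row in reader:
        type_counts[row["TYPE"]] += 1
        ev = parse_evidence(row["EVIDENCE"])
        alt_support.append((row["CHROM"], int(row["POS"]), ev.get(row["ALT"], 0)))
    return type_counts, alt_support


# Three rows copied from the snps.tab output above, rebuilt with tab separators
SAMPLE = "\n".join([
    "\t".join(["CHROM", "POS", "TYPE", "REF", "ALT", "EVIDENCE", "FTYPE", "STRAND",
               "NT_POS", "AA_POS", "EFFECT", "LOCUS_TAG", "GENE", "PRODUCT"]),
    "\t".join(["AE003852", "5414", "snp", "G", "A", "A:20 G:0"]),
    "\t".join(["AE003852", "137105", "del", "TAACAGAAACAGA", "T", "T:14 TAACAGAAACAGA:0"]),
    "\t".join(["AE003852", "167663", "snp", "T", "C", "C:14 T:0"]),
]) + "\n"

if __name__ == "__main__":
    # To read real output, use open("mysnps_test/snps.tab", newline="") instead
    types, support = summarise_snps_tab(io.StringIO(SAMPLE))
    print(dict(types))  # {'snp': 2, 'del': 1}
    print(support[1])   # ('AE003852', 137105, 14)
```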

Acknowledgements

Thanks to my colleagues Lia Bote and Vignesh Shetty for help running snippy and understanding it.