Monday, 17 November 2025

Making a map of the locations where bacterial isolates were collected

I wanted to make a map of the locations in the world where some bacterial isolates were collected. I found a nice website that gave me some useful information on plotting maps in R.

Here's what I found worked for me.

First I got a map of the world: 

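> library("ggplot2")          # for plotting
> theme_set(theme_bw())       # use a plain black-and-white theme
> library("sf")               # for handling spatial (simple features) data
> library("rnaturalearth")    # world map data from Natural Earth
> library("rnaturalearthdata")
> world <- ne_countries(scale = "medium", returnclass = "sf")   # one row per country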

Then I made a tab-delimited text file giving the number of bacterial isolates collected in each country, and each country's three-letter ISO code (the ISO_A3 codes, which are stored in the iso_a3 column of the world object):
Isolates    Country    "Country Code"
1    Bahrain    BHR
1    Ecuador    ECU
1    El Salvador    SLV
1    Gabon    GAB
1    Guatemala    GTM
...
I read in this file:
MyCountryCountData <- read.table("countrycount_data.txt",header=TRUE, sep="\t", stringsAsFactors=FALSE)
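Because read.table() has check.names = TRUE by default, the quoted header "Country Code" becomes the syntactically valid column name Country.Code, which is the name the lookup code refers to. Here is a small self-contained check of this. It pastes in the first few rows of the file shown above as text rather than reading countrycount_data.txt, and it uses the hypothetical name example_counts so that it doesn't overwrite MyCountryCountData:

```r
# A few rows of the tab-delimited file shown above, pasted in as text
# so the example runs without countrycount_data.txt
example_txt <- paste("Isolates\tCountry\t\"Country Code\"",
                     "1\tBahrain\tBHR",
                     "1\tEcuador\tECU",
                     "1\tEl Salvador\tSLV",
                     sep = "\n")
example_counts <- read.table(text = example_txt, header = TRUE, sep = "\t",
                             stringsAsFactors = FALSE)

# check.names = TRUE (the default) turns "Country Code" into Country.Code
print(names(example_counts))
print(example_counts$Country.Code)
```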
Then, for each country in the world object (in the same order as its rows), I looked up the number of isolates from that country in this file, using 0 for countries with no isolates:
numcountries <- length(world$iso_a3)
isolatecounts <- numeric(numcountries)   # one count per country, all starting at 0
for (i in 1:numcountries)
{
    mycountry <- world$iso_a3[i]
    # row(s) of the file with this country code, if any
    myindex <- which(mycountry == MyCountryCountData$Country.Code)
    if (length(myindex) > 0)
    {
       # use the first matching row, so each country gets exactly one count
       isolatecounts[i] <- MyCountryCountData$Isolates[myindex[1]]
    }
}
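The same per-country vector can also be built without an explicit loop, using base R's match(). For each element of its first argument, match() returns the position of the first match in its second argument, or NA if there is none. This is a sketch of an alternative rather than the code I used. The small example_codes and example_counts objects are hypothetical stand-ins for world$iso_a3 and MyCountryCountData, so the example runs without the map packages. It also lists any country codes in the file that have no match in the map, since isolates from those countries would otherwise silently disappear from the plot:

```r
# Hypothetical stand-ins so the example runs without the map packages:
# example_codes plays the role of world$iso_a3 (NA = a missing code),
# example_counts plays the role of MyCountryCountData
example_codes <- c("BHR", "FRA", "ECU", NA, "GAB")
example_counts <- data.frame(
  Isolates = c(1, 1, 1, 1),
  Country = c("Bahrain", "Ecuador", "Gabon", "Guatemala"),
  Country.Code = c("BHR", "ECU", "GAB", "GUA"),  # GUA is a deliberate mistake: Guatemala is GTM
  stringsAsFactors = FALSE)

# For each map country, the row of the count table with the same code (NA if none)
example_index <- match(example_codes, example_counts$Country.Code)

# Look up the counts; countries with no isolates get 0
example_isolatecounts <- example_counts$Isolates[example_index]
example_isolatecounts[is.na(example_isolatecounts)] <- 0
print(example_isolatecounts)   # [1] 1 0 1 0 1

# Codes in the file that match no country in the map: their isolates
# would be left off the plot without any warning
unmatched <- setdiff(example_counts$Country.Code, example_codes)
print(unmatched)   # [1] "GUA"
```

Note that match() uses only the first matching row. If a country code can appear on more than one line of the file, it would be safer to sum the counts per code first, for example with aggregate().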
 
Then I made a plot showing a map of the world, with the countries coloured in according to their number of isolates:
 
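I saved it as a 300 dpi TIFF file, after first adding the counts to the world object as a column:
> world$isolates <- isolatecounts   # one count per row (country) of world
> tiff("countries_histogram.tiff", units="in", width=5, height=5, res=300)
> p <- ggplot(data = world) + geom_sf(aes(fill = isolates)) + scale_fill_viridis_c(option = "plasma", trans="sqrt")
> print(p)   # needed in a script; at the console the plot is printed automatically
> dev.off()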
 
The plot looks something like this:

[Figure: map of the world, with countries coloured according to their number of isolates]

Friday, 14 November 2025

Submitting V. cholerae and MLST sequence types to PubMLST

 I've been submitting novel Vibrio cholerae MLST alleles and sequence types to the PubMLST database.

This is the website for submitting novel Vibrio cholerae alleles, genomes or MLST sequence types to PubMLST. 

 Some little things I found out along the way:

- to submit a novel MLST sequence type based on a genome sequence, there are three steps:

(i) first submit the novel allele(s) to PubMLST. This requires some information, such as the gene for the allele (e.g. pntA), the sequencing technology used (e.g. Illumina), the assembly type (e.g. de novo), the assembly software (e.g. CLC Genomics v. 7), the sequencing coverage (e.g. >100x) and the read length (e.g. 200-299 bp). I usually submit just one allele at a time for a genome.

 Note that if a gene is missing from a particular genome, that is not counted as a novel allele or MLST sequence type. 

 PubMLST will then email you back with the new identifier for this allele, e.g. allele 87 for pntA (a made-up example).

(ii) then you need to submit the genomes that have the novel allele/MLST to PubMLST. This requires that you submit some information about each genome, in the format shown below. You can submit one or more genomes at a time, e.g.

isolate    references    assembly_filename    sequence_method    country    year    species    serogroup    biosample_accession    run_accession    NCBI_assembly_accession
1223-93    35930328    1223-93.fasta.gz    Illumina    Indonesia    1993    Vibrio cholerae    O180    SAMD00180560    DRR213438    GCA_023164185.1
1003-93    35930328    1003-93.fasta.gz    Illumina    Indonesia    1993    Vibrio cholerae    O161    SAMD00180540    DRR213418    GCA_023163825.1

Note that if you don't have the NCBI assembly accession, you can just leave that column empty. If you don't know the BioSample accession or run accession, you can put 'null' in those columns.

PubMLST will then email you back with the new PubMLST identifiers for these genomes, e.g. identifier 4188 and 4189 (a made-up example).

(iii) then you can submit the novel MLST sequence type, giving an example of a genome in PubMLST in which it is found. You need to submit some information in this format, e.g. for a genome with PubMLST identifier 4184:

id    adk    gyrB    mdh    metE    pntA    purM    pyrC
4184    13    40    14    46    3    9    39
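If you have many genomes to submit, it may be less error-prone to build these tables in R and write them to files than to type them by hand. This is a minimal sketch. It assumes that the submission form accepts tab-delimited text with the column names shown above, which is worth checking on the PubMLST submission page. The output file names isolates_submission.txt and profile_submission.txt are hypothetical. The sketch uses the example rows from steps (ii) and (iii):

```r
# Isolate metadata for step (ii), using the two example rows above.
# Tab-delimited output is an assumption about what the form accepts.
isolates <- data.frame(
  isolate = c("1223-93", "1003-93"),
  references = c("35930328", "35930328"),
  assembly_filename = c("1223-93.fasta.gz", "1003-93.fasta.gz"),
  sequence_method = "Illumina",
  country = "Indonesia",
  year = 1993,
  species = "Vibrio cholerae",
  serogroup = c("O180", "O161"),
  biosample_accession = c("SAMD00180560", "SAMD00180540"),  # 'null' if unknown
  run_accession = c("DRR213438", "DRR213418"),              # 'null' if unknown
  NCBI_assembly_accession = c("GCA_023164185.1", "GCA_023163825.1"),
  stringsAsFactors = FALSE
)

# quote = FALSE keeps values such as Vibrio cholerae unquoted;
# na = "" writes any NA (e.g. an unknown NCBI assembly accession)
# as an empty field
write.table(isolates, "isolates_submission.txt", sep = "\t",
            quote = FALSE, row.names = FALSE, na = "")

# MLST profile for step (iii): allele numbers at the seven loci for
# the genome with PubMLST identifier 4184 (the example values above)
profile <- data.frame(id = 4184, adk = 13, gyrB = 40, mdh = 14, metE = 46,
                      pntA = 3, purM = 9, pyrC = 39)
write.table(profile, "profile_submission.txt", sep = "\t",
            quote = FALSE, row.names = FALSE)
```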

Acknowledgements

Thank you to Sophie Octavia for helpful advice.