I'm interested in finding all the Vibrio cholerae data in the European Nucleotide Archive.
I found a nice documentation page on 'How to Programmatically Perform a Search across ENA based on Taxonomy'.
Note that the links below return the results of particular searches directly. The same searches can also be run through the ENA's superb Advanced Search website.
Here are some things I learnt:
How to search for all sets of Vibrio cholerae reads in the ENA:
https://www.ebi.ac.uk/ena/portal/api/search?result=read_run&query=tax_tree(666)%20OR%20tax_tree(650003)&format=tsv&fields=accession,description,collection_date,fastq_ftp
This gives all the sets of reads in the ENA for Vibrio cholerae (taxonomy id. 666) or Vibrio paracholerae (taxonomy id. 650003) or any subordinate taxa.
For example, this gave me back:
run_accession	sample_accession	accession	description	collection_date	fastq_ftp
DRR014565	SAMD00008671	SAMD00008671	Illumina Genome Analyzer IIx sequencing; sequencing of V. cholera CRC711		ftp.sra.ebi.ac.uk/vol1/fastq/DRR014/DRR014565/DRR014565.fastq.gz
DRR014566	SAMD00008673	SAMD00008673	Illumina Genome Analyzer IIx sequencing; sequencing of V. cholera CRC1106		ftp.sra.ebi.ac.uk/vol1/fastq/DRR014/DRR014566/DRR014566.fastq.gz
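The fastq_ftp column is what you need in order to download the reads. Here is a minimal sketch of turning the search output into download URLs. The function name fastq_urls is my own, and two things are assumptions rather than something shown in the output above: that runs with more than one FASTQ file list them separated by ';', and that the paths (which come back without a scheme) can be prefixed with 'ftp://'. The sample text keeps only two of the columns.

```python
import csv
import io


def fastq_urls(tsv_text, scheme="ftp"):
    """Map each run accession to a list of FASTQ download URLs.

    Assumes the TSV has 'run_accession' and 'fastq_ftp' columns, and that
    multiple files for one run are separated by ';' (an assumption).
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    urls = {}
    for row in reader:
        # Skip empty pieces so runs without FASTQ files give an empty list.
        paths = [p for p in row["fastq_ftp"].split(";") if p]
        urls[row["run_accession"]] = [f"{scheme}://{p}" for p in paths]
    return urls


# Two columns taken from the example output above.
sample = (
    "run_accession\tfastq_ftp\n"
    "DRR014565\tftp.sra.ebi.ac.uk/vol1/fastq/DRR014/DRR014565/DRR014565.fastq.gz\n"
)

for run, files in fastq_urls(sample).items():
    print(run, files)
```

The resulting URLs could then be handed to a download tool of your choice.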
How to search for all Vibrio cholerae assemblies in the ENA:
https://www.ebi.ac.uk/ena/portal/api/search?result=assembly&query=tax_tree(666)%20OR%20tax_tree(650003)&format=tsv
This gives all the NCBI assemblies stored in the ENA for Vibrio cholerae (taxonomy id. 666) or Vibrio paracholerae (taxonomy id. 650003) or any subordinate taxa.
For example, this gave me back:
accession	assembly_name	assembly_title	run_ref	sample_accession	secondary_sample_accession	study_accession	strain
GCA_000006745	ASM674v1	ASM674v1 assembly for Vibrio cholerae O1 biovar El Tor str. N16961		SAMN02603969		PRJNA36	N16961
GCA_000016245	ASM1624v1	ASM1624v1 assembly for Vibrio cholerae O395		SAMN02604040		PRJNA15667	O395
GCA_000021605	ASM2160v1	ASM2160v1 assembly for Vibrio cholerae M66-2		SAMN02603897		PRJNA32851	M66-2
GCA_000021625	ASM2162v1	ASM2162v1 assembly for Vibrio cholerae O395		SAMN02603898		PRJNA32853	O395
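The reads search and the assemblies search use the same tax_tree query and differ only in the 'result' parameter and the optional 'fields' list, so the URLs can be built from a small helper. This is just a sketch using the Python standard library: the names tax_tree_query, build_search_url and fetch_tsv are my own, not part of any ENA client. fetch_tsv makes the actual request and is defined but not called here.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen

ENA_SEARCH = "https://www.ebi.ac.uk/ena/portal/api/search"


def tax_tree_query(tax_ids):
    """Join taxonomy ids into 'tax_tree(a) OR tax_tree(b) ...'."""
    return " OR ".join(f"tax_tree({tax_id})" for tax_id in tax_ids)


def build_search_url(result, tax_ids, fields=None):
    """Build an ENA portal search URL that returns TSV."""
    params = {
        "result": result,
        "query": tax_tree_query(tax_ids),
        "format": "tsv",
    }
    if fields:
        params["fields"] = ",".join(fields)
    # Keep parentheses and commas unescaped; spaces become %20.
    return ENA_SEARCH + "?" + urlencode(params, quote_via=quote, safe="(),")


def fetch_tsv(url):
    """Download the search result as text (performs a network request)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


# 666 = Vibrio cholerae, 650003 = Vibrio paracholerae
taxa = [666, 650003]
assembly_url = build_search_url("assembly", taxa)
reads_url = build_search_url(
    "read_run",
    taxa,
    fields=["accession", "description", "collection_date", "fastq_ftp"],
)
print(assembly_url)
print(reads_url)
```

The two printed URLs are the same as the ones given above for the assemblies search and the reads search.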
analysis_accession	description
ERZ2821805	Genome assembly: SAMD00006230_shovill
ERZ2885330	Genome assembly: SAMD00057587_shovill
ERZ2885331	Genome assembly: SAMD00057588_shovill
SAMD00006230 Genome of Vibrio cholerae
SAMD00008668 Vibrio cholerae NCTC9420
SAMD00008669 Vibrio cholerae NCTC5395
SAMD00008670 Vibrio cholerae E9120
sample_accession	secondary_sample_accession	run_accession	collection_date	country	serotype	strain	sample_title
SAMD00008671	DRS012884	DRR014565					Vibrio cholerae CRC711
SAMD00008673	DRS012885	DRR014566					Vibrio cholerae CRC1106
SAMD00008670	DRS012886	DRR014567					Vibrio cholerae E9120
SAMD00008672	DRS012887	DRR014568					Vibrio cholerae C5
SAMD00008669	DRS012888	DRR014569					Vibrio cholerae NCTC5395
SAMD00008668	DRS012889	DRR014570					Vibrio cholerae NCTC9420
SAMD00006230	DRS013907	DRR015799					Genome of Vibrio cholerae
SAMD00057587	DRS071898	DRR068856	2013-07-01	Viet Nam: Nam Dinh		VNND_2013Jul_3SS	Vibrio cholerae O1 str. environmental isolate VNND_2013Jul_3SS
SAMD00057588	DRS071899	DRR068857	2013-07-01	Viet Nam: Nam Dinh		VNND_2013Jul_5SS	Vibrio cholerae O1 str. environmental isolate VNND_2013Jul_5SS
SAMEA889371	ERS013257	ERR018111	2007-01-01	India	Ogawa	4605	2956_6#1
SAMEA889365	ERS013258	ERR018112	2006-01-01	India	Ogawa	4656	2956_6#2
SAMEA889366	ERS013259	ERR018113	2001-01-01	Bangladesh	Ogawa	4675	2956_6#3
SAMEA889269	ERS013260	ERR018114	1999-01-01	Bangladesh	Ogawa	4679	2956_6#4
SAMEA889268	ERS013261	ERR018115	2001-01-01	Bangladesh	Ogawa	4663	2956_6#5
SAMEA889293	ERS013263	ERR018116	2001-01-01	Bangladesh	Ogawa	4661	2956_6#6
SAMEA889314	ERS013262	ERR018117	1994-01-01	Bangladesh	Ogawa	4660	2956_6#7
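Many of the samples above have no collection date, country, serotype or strain, so whatever reads this table has to cope with empty fields. Here is a rough sketch of loading such tab-separated output with Python's csv module and turning empty fields into None. The function name read_ena_tsv is my own. The two sample rows are taken from the table above, and where the empty columns fall in the first row is my assumption (I have treated 'Vibrio cholerae CRC711' as the sample_title).

```python
import csv
import io


def read_ena_tsv(tsv_text):
    """Parse tab-separated ENA output into dicts, with '' replaced by None."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return [
        {column: (value or None) for column, value in row.items()}
        for row in reader
    ]


header = (
    "sample_accession\tsecondary_sample_accession\trun_accession\t"
    "collection_date\tcountry\tserotype\tstrain\tsample_title\n"
)
sample = (
    header
    # Assumed layout: four empty fields between the run and the title.
    + "SAMD00008671\tDRS012884\tDRR014565\t\t\t\t\tVibrio cholerae CRC711\n"
    + "SAMEA889371\tERS013257\tERR018111\t2007-01-01\tIndia\tOgawa\t4605\t2956_6#1\n"
)

rows = read_ena_tsv(sample)

# List the runs that have no collection date recorded.
undated = [row["run_accession"] for row in rows if row["collection_date"] is None]
print(undated)
```

Keeping the missing values as None rather than as empty strings makes it harder to mistake a missing date or country for a real value later on.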