I recently wrote a blog post on using the WormBase ParaSite REST API to retrieve data from their database. For C. elegans data, it's best to use the main WormBase website, which has its own WormBase REST API.
My problem today was to retrieve phenotype information (from RNAi and mutants) for C. elegans genes. Ages ago, I wrote a blog post doing this by querying the WormBase website through their standard webpage. But let's modernise and use the REST API instead!
I found that you can get the phenotypes for a gene with a single REST API query, for example by requesting:
http://rest.wormbase.org/rest/field/gene/WBGene00000079/phenotype
for gene WBGene00000079.
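The same URL pattern can be built and queried from Python. The following is only a minimal sketch: the function names field_url and fetch_field are made up for illustration, and it uses the standard library's urllib rather than any particular third-party package. The Content-Type header is the one used in the references script further down this post.

```python
import json
import urllib.request

SERVER = "http://rest.wormbase.org"  # base URL taken from the example URL above


def field_url(gene_id, field):
    """Build the REST URL for one field (e.g. 'phenotype') of one gene."""
    return f"{SERVER}/rest/field/gene/{gene_id}/{field}"


def fetch_field(gene_id, field, timeout=30):
    """Query one field for one gene and return the decoded JSON.

    Hypothetical helper: raises urllib.error.HTTPError on a failed request.
    """
    request = urllib.request.Request(
        field_url(gene_id, field),
        # Same Content-Type header as the references script in this post.
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_field("WBGene00000079", "phenotype") needs network access and should return the same JSON that the URL above shows in a browser.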
Python scripts to retrieve phenotypes for a gene, or a list of genes
I also wrote a Python script to run the same query (see here). When you run it, you should see:
% python3 retrieve_phenotypeinfo_from_wormbase.py
id= WBPhenotype:0001952 label= germline nuclear positioning variant
id= WBPhenotype:0001969 label= germ cell compartment morphology variant
id= WBPhenotype:0000186 label= oogenesis variant
id= WBPhenotype:0000640 label= egg laying variant
id= WBPhenotype:0001940 label= rachis morphology variant
id= WBPhenotype:0001980 label= germ cell compartment expansion variant
id= WBPhenotype:0000638 label= molt defect
id= WBPhenotype:0000154 label= reduced brood size
id= WBPhenotype:0001973 label= germ cell compartment size variant
id= WBPhenotype:0000059 label= larval arrest
id= WBPhenotype:0000697 label= protruding vulva
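The linked script is not reproduced in this post, so here is a rough sketch of how output like this could be produced from the decoded JSON. It deliberately does not assume the exact layout of the response: it walks the whole JSON tree and picks out every dictionary whose id starts with "WBPhenotype:". The function name find_phenotypes and the sample data are hypothetical, and the real script may order or filter the phenotypes differently.

```python
def find_phenotypes(node):
    """Yield (id, label) for every dict in a JSON tree whose id looks like a phenotype ID."""
    if isinstance(node, dict):
        ident = node.get("id")
        if isinstance(ident, str) and ident.startswith("WBPhenotype:"):
            yield ident, node.get("label", "")
        # Keep descending, since phenotype entries may be nested at any depth.
        for value in node.values():
            yield from find_phenotypes(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_phenotypes(item)


# Hand-written stand-in for a decoded response (assumed layout, not real WormBase output).
sample = {
    "phenotype": {
        "data": [
            {"phenotype": {"id": "WBPhenotype:0000640", "label": "egg laying variant"}},
            {"phenotype": {"id": "WBPhenotype:0000154", "label": "reduced brood size"}},
        ]
    }
}

for ident, label in find_phenotypes(sample):
    print("id=", ident, "label=", label)
```

Because the walker ignores the surrounding structure, it would also pick up phenotype IDs that appear in parts of the response you may not want, so check the raw JSON before relying on it.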
Then I wrote another Python script that retrieves all the phenotypes for an input list of genes: see here.
For example, for an input list of C. elegans genes:
WBGene00000079
WBGene00001484
WBGene00001948
you can run it by typing (here mytmp is the input file and mytmp.out is the output file):
% python3 retrieve_phenotypeinfo_from_wormbase_for_genelist.py mytmp mytmp.out
and the output is:
WBGene00000079 WBPhenotype:0001952 germline nuclear positioning variant
WBGene00000079 WBPhenotype:0001969 germ cell compartment morphology variant
WBGene00000079 WBPhenotype:0000186 oogenesis variant
WBGene00000079 WBPhenotype:0000640 egg laying variant
WBGene00000079 WBPhenotype:0001940 rachis morphology variant
WBGene00000079 WBPhenotype:0001980 germ cell compartment expansion variant
WBGene00000079 WBPhenotype:0000638 molt defect
WBGene00000079 WBPhenotype:0000154 reduced brood size
WBGene00000079 WBPhenotype:0001973 germ cell compartment size variant
WBGene00000079 WBPhenotype:0000059 larval arrest
WBGene00000079 WBPhenotype:0000697 protruding vulva
WBGene00001484 WBPhenotype:0000640 egg laying variant
WBGene00001948 WBPhenotype:0000062 lethal
WBGene00001948 WBPhenotype:0000054 larval lethal
WBGene00001948 WBPhenotype:0000867 embryonic arrest
WBGene00001948 WBPhenotype:0000643 locomotion variant
WBGene00001948 WBPhenotype:0000050 embryonic lethal
WBGene00001948 WBPhenotype:0000053 paralyzed arrested elongation two fold
WBGene00001948 WBPhenotype:0000535 organism morphology variant
WBGene00001948 WBPhenotype:0000406 lumpy
WBGene00001948 WBPhenotype:0000583 dumpy
WBGene00001948 WBPhenotype:0000688 sterile
WBGene00001948 WBPhenotype:0000669 sex muscle development variant
WBGene00001948 WBPhenotype:0000154 reduced brood size
WBGene00001948 WBPhenotype:0000861 body wall muscle development variant
WBGene00001948 WBPhenotype:0000861 body wall muscle development variant
WBGene00001948 WBPhenotype:0000861 body wall muscle development variant
WBGene00001948 WBPhenotype:0001913 excess coelomocytes
WBGene00001948 WBPhenotype:0000095 M lineage variant
WBGene00001948 WBPhenotype:0000031 slow growth
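The looping part of a gene-list script like this can be sketched as follows. This is not the linked script, just an illustration under a few assumptions: the input has one gene ID per line, the output columns are separated by single spaces, and the per-gene query is passed in as a function (here a fake one with canned results, so the example runs without network access). The names read_gene_ids, write_phenotype_table and fake_lookup are made up.

```python
import io


def read_gene_ids(lines):
    """Return the gene IDs from an iterable of lines, skipping blank lines."""
    return [line.strip() for line in lines if line.strip()]


def write_phenotype_table(gene_ids, lookup, out):
    """Write one 'gene phenotype_id label' line per phenotype.

    lookup(gene_id) must return an iterable of (phenotype_id, label) pairs.
    """
    for gene_id in gene_ids:
        for phenotype_id, label in lookup(gene_id):
            out.write(f"{gene_id} {phenotype_id} {label}\n")


def fake_lookup(gene_id):
    """Hypothetical stand-in for a real REST query, with canned results."""
    canned = {"WBGene00001484": [("WBPhenotype:0000640", "egg laying variant")]}
    return canned.get(gene_id, [])


buffer = io.StringIO()
write_phenotype_table(read_gene_ids(["WBGene00001484\n", "\n"]), fake_lookup, buffer)
print(buffer.getvalue(), end="")
```

In a real run, the lines would come from the opened input file, out would be the opened output file, and lookup would be a function that queries the REST API for one gene.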
Python script to retrieve references for a gene
Here is a little script to retrieve the references (e.g. papers) for a gene, and print the titles of those that contain the word 'development':
import sys

import requests

server = "http://rest.wormbase.org"
ext = "/rest/field/gene/WBGene00000079/references"

# Ask the server for JSON rather than the default representation.
r = requests.get(server + ext, headers={"Content-Type": "application/json", "Accept": ""})
if not r.ok:
    r.raise_for_status()
    sys.exit()

decoded = r.json()
# print(decoded)
# based on looking at the example http://rest.wormbase.org/rest/field/gene/WBGene00000079/references
refs = decoded["references"]["data"]["results"]

refcnt = 0
for ref in refs:
    refcnt += 1
    # The title and abstract are each stored as a list; take the first element.
    title = ref["title"][0]
    abstract = ref["abstract"][0]
    title_words = title.split()
    abstract_words = abstract.split()
    # print(refcnt, "Title=", title_words)
    if 'development' in title_words:
        print("title=", title)
    # print(refcnt, "Abs=", abstract_words)

print("FINISHED\n")
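Note that the test 'development' in title_words is an exact, case-sensitive match on whitespace-separated words, so it would miss titles containing "Development" or "development:". A more tolerant check could look like the sketch below. The helper names title_mentions and first_or_empty are hypothetical. The second helper guards against a record with a missing title or abstract, which is an assumption about the data that has not been checked here.

```python
import re


def title_mentions(title, keyword):
    """Return True if keyword occurs in title as a whole word, ignoring case."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, title, flags=re.IGNORECASE) is not None


def first_or_empty(value):
    """Return the first element of a non-empty list, otherwise an empty string."""
    if isinstance(value, list) and value:
        return value[0]
    return ""
```

In the loop above, title = first_or_empty(ref.get("title")) followed by if title_mentions(title, "development") would replace the indexing and the word test.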
Retrieving the gene name for a gene
You can also retrieve the name of a gene through the REST API, e.g. http://rest.wormbase.org/rest/field/gene/WBGene00000079/name
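Pulling the name out of the decoded JSON might look like the sketch below. It assumes the response follows the same field/"data" nesting as the references response above, and that the display name is stored under a "label" key. Both are assumptions to check against the real JSON. The function name gene_label is made up, and the sample uses a placeholder name rather than the real one.

```python
def gene_label(decoded):
    """Return the gene's display name from a decoded 'name' response, or None if absent."""
    name = decoded.get("name") or {}
    data = name.get("data")
    if isinstance(data, dict):
        return data.get("label")
    return None


# Hypothetical decoded response: key names are assumed and the label is a placeholder.
sample = {"name": {"data": {"id": "WBGene00000079", "label": "example-1"}}}
print(gene_label(sample))
```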