Friday 28 June 2019

Retrieving data from WormBase using the REST API

I recently wrote a blog post on using the WormBase ParaSite REST API to retrieve data from their database. For C. elegans data, however, it's best to use the main WormBase website, which has its own REST API.

My problem today was to retrieve phenotype information (from RNAi experiments and mutants) for C. elegans genes. Ages ago, I wrote a blog post about doing this by querying the standard WormBase webpages. But let's modernise and use the REST API instead!

I found that it's possible to do a REST API query to get the phenotypes for a gene, by typing for example:
for gene WBGene00000079.
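Judging from the references query further down (/rest/field/gene/WBGene00000079/references), each query is a plain HTTP GET on a URL of the form <server>/rest/field/gene/<gene ID>/<field>. Here is a minimal sketch of building such URLs. It assumes the phenotype field is named `phenotype` and the gene name field `name` (both assumptions on my part), and it uses a made-up placeholder host instead of the real WormBase REST server; `field_url` is a hypothetical helper:

```python
def field_url(server: str, gene_id: str, field: str) -> str:
    """Build a REST 'field' URL for one gene.

    Follows the /rest/field/gene/<gene_id>/<field> pattern of the
    references example; 'server' is the base address of the REST server.
    """
    # drop any trailing slash so we never produce '...//rest/...'
    return f"{server.rstrip('/')}/rest/field/gene/{gene_id}/{field}"


# placeholder host, NOT the real WormBase server address
SERVER = "https://rest.example.org"
for field in ("phenotype", "references", "name"):  # field names assumed
    print(field_url(SERVER, "WBGene00000079", field))
```

Pasting one of these URLs (with the real server address) into a browser should show the raw JSON, which is a handy way to see how the response is nested before writing any parsing code.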

Python scripts to retrieve phenotypes for a gene, or a list of genes
I also wrote a Python script to do the same query (see here). When you run it, you should see:
% python3
id= WBPhenotype:0001952 label= germline nuclear positioning variant
id= WBPhenotype:0001969 label= germ cell compartment morphology variant
id= WBPhenotype:0000186 label= oogenesis variant
id= WBPhenotype:0000640 label= egg laying variant
id= WBPhenotype:0001940 label= rachis morphology variant
id= WBPhenotype:0001980 label= germ cell compartment expansion variant
id= WBPhenotype:0000638 label= molt defect
id= WBPhenotype:0000154 label= reduced brood size
id= WBPhenotype:0001973 label= germ cell compartment size variant
id= WBPhenotype:0000059 label= larval arrest
id= WBPhenotype:0000697 label= protruding vulva
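A minimal sketch of such a script might look like the following. It uses only the standard library (urllib rather than requests) and assumes the `phenotype` field path. Because I haven't pinned down the exact nesting of the JSON response, it walks the whole response and keeps any object whose `id` starts with `WBPhenotype:` and that also has a `label`. The function names are my own:

```python
import json
import urllib.request


def find_phenotypes(node):
    """Recursively collect (id, label) pairs for WBPhenotype terms.

    Walks every dict and list in the decoded JSON and keeps any dict whose
    'id' starts with 'WBPhenotype:' and which also has a 'label' key, in the
    order they are encountered.
    """
    found = []
    if isinstance(node, dict):
        term_id = node.get("id")
        if isinstance(term_id, str) and term_id.startswith("WBPhenotype:") and "label" in node:
            found.append((term_id, node["label"]))
        for value in node.values():
            found.extend(find_phenotypes(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_phenotypes(item))
    return found


def get_phenotypes(server, gene_id):
    """Fetch one gene's phenotype field and return (id, label) pairs."""
    # ASSUMPTION: the field is called 'phenotype' and sits at this path
    url = f"{server.rstrip('/')}/rest/field/gene/{gene_id}/phenotype"
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(req, timeout=30) as resp:
        return find_phenotypes(json.load(resp))


def print_phenotypes(server, gene_id):
    """Print each phenotype in the 'id= ... label= ...' format."""
    for term_id, label in get_phenotypes(server, gene_id):
        print("id=", term_id, "label=", label)
```

With `server` set to the WormBase REST base URL, print_phenotypes(server, "WBGene00000079") should print lines in the id= ... label= ... format shown above, provided the response really does contain id/label pairs as assumed.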

Then I wrote another Python script that retrieves all the phenotypes for an input list of genes (see here).
For example, for an input list of C. elegans genes:
1   WBGene00000079
2   WBGene00001484
3   WBGene00001948
you can run it by typing:
% python3 mytmp mytmp.out
and the output is:
WBGene00000079  WBPhenotype:0001952     germline nuclear positioning variant
WBGene00000079  WBPhenotype:0001969     germ cell compartment morphology variant
WBGene00000079  WBPhenotype:0000186     oogenesis variant
WBGene00000079  WBPhenotype:0000640     egg laying variant
WBGene00000079  WBPhenotype:0001940     rachis morphology variant
WBGene00000079  WBPhenotype:0001980     germ cell compartment expansion variant
WBGene00000079  WBPhenotype:0000638     molt defect
WBGene00000079  WBPhenotype:0000154     reduced brood size
WBGene00000079  WBPhenotype:0001973     germ cell compartment size variant
WBGene00000079  WBPhenotype:0000059     larval arrest
WBGene00000079  WBPhenotype:0000697     protruding vulva
WBGene00001484  WBPhenotype:0000640     egg laying variant
WBGene00001948  WBPhenotype:0000062     lethal
WBGene00001948  WBPhenotype:0000054     larval lethal
WBGene00001948  WBPhenotype:0000867     embryonic arrest
WBGene00001948  WBPhenotype:0000643     locomotion variant
WBGene00001948  WBPhenotype:0000050     embryonic lethal
WBGene00001948  WBPhenotype:0000053     paralyzed arrested elongation two fold
WBGene00001948  WBPhenotype:0000535     organism morphology variant
WBGene00001948  WBPhenotype:0000406     lumpy
WBGene00001948  WBPhenotype:0000583     dumpy
WBGene00001948  WBPhenotype:0000688     sterile
WBGene00001948  WBPhenotype:0000669     sex muscle development variant
WBGene00001948  WBPhenotype:0000154     reduced brood size
WBGene00001948  WBPhenotype:0000861     body wall muscle development variant
WBGene00001948  WBPhenotype:0000861     body wall muscle development variant
WBGene00001948  WBPhenotype:0000861     body wall muscle development variant
WBGene00001948  WBPhenotype:0001913     excess coelomocytes
WBGene00001948  WBPhenotype:0000095     M lineage variant
WBGene00001948  WBPhenotype:0000031     slow growth
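Here is a sketch of the list version. It assumes the input file has one gene per line, possibly preceded by a number as in the listing above, and that the output columns are tab-separated. To keep it short and testable, the per-gene lookup is passed in as `fetch`, which can be any function that maps a gene ID to a list of (phenotype ID, label) pairs. `read_gene_ids`, `phenotype_table` and `write_phenotype_table` are hypothetical names:

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# a fetch function maps a gene ID to [(phenotype_id, label), ...]
Fetch = Callable[[str], List[Tuple[str, str]]]


def read_gene_ids(lines: Iterable[str]) -> List[str]:
    """Return the WBGene ID from each line, skipping lines without one.

    Takes the first whitespace-separated token starting with 'WBGene', so
    both 'WBGene00000079' and '1   WBGene00000079' work.
    """
    ids = []
    for line in lines:
        for token in line.split():
            if token.startswith("WBGene"):
                ids.append(token)
                break
    return ids


def phenotype_table(gene_ids: Iterable[str], fetch: Fetch) -> Iterator[str]:
    """Yield one 'gene<TAB>phenotype_id<TAB>label' line per phenotype."""
    for gene_id in gene_ids:
        for term_id, label in fetch(gene_id):
            yield f"{gene_id}\t{term_id}\t{label}"


def write_phenotype_table(in_path: str, out_path: str, fetch: Fetch) -> None:
    """Read gene IDs from in_path and write the phenotype table to out_path."""
    with open(in_path) as fin:
        gene_ids = read_gene_ids(fin)
    with open(out_path, "w") as fout:
        for row in phenotype_table(gene_ids, fetch):
            fout.write(row + "\n")
```

For example, write_phenotype_table("mytmp", "mytmp.out", fetch) mirrors the command above, with `fetch` being whatever single-gene lookup you use. The same phenotype can come back more than once for a gene (WBPhenotype:0000861 appears three times for WBGene00001948 above). This sketch keeps such repeats; wrapping a gene's pairs in list(dict.fromkeys(pairs)) would drop them while preserving order.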

Python script to retrieve references for a gene

Here is a little script to retrieve the references (e.g. papers) for a gene:
import requests, sys

server = ""  # base URL of the WormBase REST server (fill this in)
ext = "/rest/field/gene/WBGene00000079/references"

# ask for JSON back
r = requests.get(server+ext, headers={ "Content-Type" : "application/json", "Accept" : "application/json"})

if not r.ok:
    # stop if the request failed (e.g. 404 or 500)
    r.raise_for_status()
    sys.exit()

decoded = r.json()
# print(decoded)

# based on looking at the example
refs = decoded["references"]
refs = refs["data"]
refs = refs["results"]
refcnt = 0
for ref in refs:
    refcnt += 1
    title = ref["title"]
    title = title[0]
    abstract = ref["abstract"]
    abstract = abstract[0]
    title_words = title.split()
    abstract_words = abstract.split()
    if 'development' in title_words:
        # e.g. print the number and title of each matching reference
        print(refcnt, title)
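One thing to watch in the script above: title.split() only splits on whitespace and is case-sensitive, so a title containing "Development" or "development," won't match 'development'. Here is a small sketch of a more forgiving keyword check that lowercases the text, ignores punctuation, and also looks in the abstract. Like the script, it assumes 'title' and 'abstract' are lists whose first element is the text. The function names are hypothetical:

```python
import re


def mentions(text: str, keyword: str) -> bool:
    """True if keyword occurs as a whole word in text.

    Lowercases both and splits on anything that isn't a letter or digit,
    so 'Development,' matches 'development' but 'developmental' does not.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    return keyword.lower() in words


def first_text(field) -> str:
    """Return the first element of a list-valued field, or '' if missing."""
    # like the script above, assumes 'title'/'abstract' are lists of strings
    return (field or [""])[0] or ""


def matching_titles(refs, keyword: str):
    """Titles of references whose title or abstract mentions keyword."""
    return [
        first_text(ref.get("title"))
        for ref in refs
        if mentions(first_text(ref.get("title")), keyword)
        or mentions(first_text(ref.get("abstract")), keyword)
    ]
```

Calling matching_titles(refs, "development") on the `refs` list from the script returns the matching titles. Using ref.get(...) instead of ref["..."] also means a reference with no abstract is skipped rather than raising an error.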


Retrieving the gene name for a gene

You can retrieve the gene name for a gene by using the REST API e.g. .
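Here is a minimal sketch. It assumes the field is called `name` and that the response wraps its content in a "data" object, as the references field does, whose "label" is the gene name. I haven't checked that layout, so treat the key path as an assumption. The function names are mine:

```python
import json
import urllib.request


def gene_name_from_json(decoded):
    """Pull the gene name out of a decoded 'name' field response.

    ASSUMED layout (not checked against the live API):
        {"name": {"data": {"label": "<gene name>", ...}}}
    Returns None if any part of that path is missing.
    """
    data = (decoded.get("name") or {}).get("data") or {}
    return data.get("label")


def get_gene_name(server, gene_id):
    """Fetch one gene's 'name' field and return the name, or None."""
    # ASSUMPTION: the field is called 'name'
    url = f"{server.rstrip('/')}/rest/field/gene/{gene_id}/name"
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return gene_name_from_json(json.load(resp))
```

Returning None instead of raising a KeyError makes it easy to spot genes whose response doesn't match the assumed layout when looping over a list of IDs.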
