I wanted to find all descendants of a GO term in the GO hierarchy. First I downloaded the hierarchy from the GO website:
% wget http://purl.obolibrary.org/obo/go/go-basic.obo
Then I wrote a little Python script to find all the descendants of a GO term:
from collections import defaultdict
import sys
import os
#====================================================================#
# define a function to record the children of each GO term in the GO hierarchy:
def read_go_children(input_go_obo_file):
"""record the children of each GO term in the GO hierarchy"""
# first read in the input GO file, and make a list of all the GO terms, and the
# terms below them in the GO hierarchy:
# eg.
# [Term]
# id: GO:0004835
children = defaultdict(list) # children of a particular GO term, in the hierarchy
take = 0
fileObj = open(input_go_obo_file, "r")
for line in fileObj:
line = line.rstrip()
temp = line.split()
if len(temp) == 1:
if temp[0] == '[Term]':
take = 1
elif len(temp) >= 2 and take == 1:
if temp[0] == 'id:':
go = temp[1]
elif temp[0] == 'is_a:': # eg. is_a: GO:0048308 ! organelle inheritance
parent = temp[1]
children[parent].append(go) # record that a child of 'parent' is term 'go'
elif len(temp) == 0:
take = 0
fileObj.close()
return children
#====================================================================#
# define a function to find all descendants of a GO term in the GO hierarchy:
def find_all_descendants(input_go_term, children):
"""find all the descendants of a GO term in the GO hierarchy
>> find_all_descendants('GO1', {'GO1': ['GO2', 'GO3'], 'GO2': ['GO4']})
['GO1', 'GO2', 'GO3', 'GO4']
"""
descendants = set()
queue = []
queue.append(input_go_term)
while queue:
node = queue.pop(0)
if node in children and node not in descendants: # don't visit a second time
node_children = children[node]
queue.extend(node_children)
descendants.add(node)
return descendants
#====================================================================#
def main():
# check the command-line arguments:
if len(sys.argv) != 3 or os.path.exists(sys.argv[1]) == False:
print("Usage: %s input_go_obo_file input_go_term" % sys.argv[0])
sys.exit(1)
input_go_obo_file = sys.argv[1] # the input gene ontology file eg. gene_ontology.WS238.obo
input_go_term = sys.argv[2] # the input GO term of interest
# read in the children of each GO term in the GO hierachy:
children = read_go_children(input_go_obo_file)
# find all the descendants of the term 'input_go_term':
descendants = find_all_descendants(input_go_term, children)
descendantlist = ",".join(descendants)
print("DESCENDANTS:",descendantlist)
#====================================================================#
if __name__=="__main__":
main()
#====================================================================#
To run it I type, for example, to find descendants of the term GO:0008291 (acetylcholine metabolic process):
% python3 find_descendants_of_GO_term.py go-basic.obo 'GO:0008291'
This gives me:
DESCENDANTS: GO:0001507,GO:0006581,GO:0008291,GO:0008292
The descendants are:
GO:0006581 acetylcholine catabolic process
GO:0001507 acetylcholine catabolic process in synaptic cleft (descendant of acetylcholine catabolic process)
GO:0008292 acetylcholine biosynthetic process
Hurray for Python!
Notes
- I see a discussion on biostars about doing something similar in R. I didn't check if that gives the same result.
- It turns out that you can also do this using WormMine (I don't know if this is for just C. elegans-related GO terms though)
No comments:
Post a Comment