I first came up with the idea of using Dijkstra's algorithm to do this, and wrote a Python script, calc_dists_to_top_of_GO.py, for it. This worked OK, but was quite slow (it took about 15 minutes to run).
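For readers unfamiliar with the approach, here is a minimal Python sketch of using Dijkstra's algorithm to find how many steps one GO term is from the top of the hierarchy. It is not the code in calc_dists_to_top_of_GO.py. It assumes the hierarchy is already in a dictionary (called `parents` here, a hypothetical name) mapping each term ID to a list of its parent term IDs, and that every parent link counts as one step. The search stops as soon as it pops a term with no parents, which is the nearest top-level term.

```python
import heapq


def dijkstra_dist_to_top(term, parents):
    """Fewest steps from `term` up to any top-level term (one with no parents).

    `parents` is a hypothetical dict: term ID -> list of parent term IDs.
    Assumption: every parent link has weight 1.
    """
    dist = {term: 0}
    heap = [(0, term)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry; a shorter route was already found
        if not parents.get(node):
            return d  # first top-level term popped is the nearest one
        for parent in parents[node]:
            nd = d + 1  # each step up the hierarchy costs 1
            if nd < dist.get(parent, float("inf")):
                dist[parent] = nd
                heapq.heappush(heap, (nd, parent))
    return None  # no top-level term is reachable
```

On a toy hierarchy such as `{"root": [], "mid": ["root"], "leaf": ["mid"], "shortcut": ["leaf", "root"]}`, it returns 2 for "leaf" and 1 for "shortcut". Because this sketch runs a separate search for every term, computing distances for the whole ontology this way repeats a lot of work.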
My husband Noel came up with the idea of using a breadth-first search for the same purpose. Here's a Python script that he helped me to write for this (thanks Noel!): calc_dists_to_top_of_GO_using_bfs.py.
This was much faster, and only took seconds to run!
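A breadth-first search suits this problem because every step in the hierarchy has the same length: BFS visits terms in order of distance, so the first time it reaches a term is along a shortest path. The sketch below is an illustration, not the code in calc_dists_to_top_of_GO_using_bfs.py. It uses a hypothetical `parents` dictionary (term ID -> list of parent IDs, with every term present as a key), starts from all top-level terms at once, and walks down the child links, so it finds the distance for every term in a single pass.

```python
from collections import deque


def bfs_dists_to_top(parents):
    """Return {term: fewest steps to a top-level term} for every term.

    `parents` is a hypothetical dict: term ID -> list of parent term IDs.
    Assumption: every term appears as a key; top-level terms have [].
    """
    # Invert the parent links so we can walk downwards from the top.
    children = {term: [] for term in parents}
    for term, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(term)

    # Start from all top-level terms (no parents) at distance 0.
    dist = {term: 0 for term, ps in parents.items() if not ps}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in dist:  # first visit is along a shortest path
                dist[child] = dist[node] + 1
                queue.append(child)
    return dist
```

On the toy hierarchy `{"root": [], "mid": ["root"], "leaf": ["mid"], "shortcut": ["leaf", "root"]}` it returns `{"root": 0, "mid": 1, "shortcut": 1, "leaf": 2}`. Since each term and link is handled only once, a single pass like this is one plausible reason a BFS approach can finish in seconds rather than minutes.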
Both scripts take as input a GO ontology file, which contains the GO hierarchy; in this case I used 'gene_ontology.WS238.obo', which I downloaded from WormBase.
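Both approaches first need the parent links from the .obo file. Here is a minimal parsing sketch, assuming the standard OBO layout in which each term is a "[Term]" stanza with an "id:" line and zero or more "is_a:" lines (each followed by a "! name" comment). The function name parse_obo_is_a is hypothetical, and it makes two assumptions that may not match the original scripts: it follows only is_a links (ignoring relationships such as part_of), and it skips terms marked "is_obsolete: true".

```python
def parse_obo_is_a(lines):
    """Map each GO term ID to the list of its is_a parent IDs.

    Assumptions: only is_a links are followed (part_of etc. are ignored),
    obsolete terms are skipped, and non-[Term] stanzas are ignored.
    """
    parents = {}
    term_id, term_parents, obsolete, in_term = None, [], False, False

    def flush():
        # Save the stanza just finished, if it was a live [Term].
        if in_term and term_id and not obsolete:
            parents[term_id] = term_parents

    for raw in lines:
        line = raw.strip()
        if line.startswith("["):  # a new stanza begins
            flush()
            in_term = line == "[Term]"
            term_id, term_parents, obsolete = None, [], False
        elif not in_term:
            continue  # header lines or [Typedef] stanzas
        elif line.startswith("id:"):
            term_id = line[3:].strip()
        elif line.startswith("is_a:"):
            # Drop the trailing "! name" comment and keep just the ID.
            term_parents.append(line[5:].split("!")[0].strip())
        elif line == "is_obsolete: true":
            obsolete = True
    flush()  # save the final stanza
    return parents
```

For example, `with open("gene_ontology.WS238.obo") as f: parents = parse_obo_is_a(f)` would build the dictionary from the WormBase file.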
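Happily, the two scripts give the same result :) This is what we'd expect: every step in the hierarchy counts as one, and when all steps have the same length, a breadth-first search finds the same shortest distances as Dijkstra's algorithm.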
For example, for the GO term 'single strand break repair' (GO:0000012), both scripts find that it is 6 steps from a term at the top of the hierarchy (in this case, 'biological process', GO:0008150).