Tuesday, 28 January 2014

Find the number of steps from a GO term to the top of the hierarchy

I've been analysing the Gene Ontology (GO) hierarchy, and wanted to find a way to calculate the number of steps from a particular GO term to the top of the hierarchy.

My first idea was to use Dijkstra's algorithm, so I wrote a Python script, calc_dists_to_top_of_GO.py, to do this. It worked ok, but was quite slow (it took about 15 minutes to run).
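To illustrate the idea, here is a minimal sketch of Dijkstra's algorithm applied to this problem. It is not the code from calc_dists_to_top_of_GO.py: the function name, the `parents` dictionary (each term mapped to a list of its parent terms) and the toy term names are all assumptions made up for the example. It also assumes that "the top" means any term with no parents, and that every step costs 1.

```python
import heapq

def steps_to_top_dijkstra(term, parents):
    """Fewest steps from `term` to any term with no parents (a root).

    `parents` is a hypothetical dict: term -> list of parent terms.
    """
    best = {term: 0}              # best distance found so far for each term
    heap = [(0, term)]            # priority queue of (distance, term)
    while heap:
        dist, current = heapq.heappop(heap)
        if dist > best.get(current, float("inf")):
            continue              # stale queue entry, a shorter path was found
        current_parents = parents.get(current, [])
        if not current_parents:
            return dist           # first root popped is the nearest one
        for parent in current_parents:
            new_dist = dist + 1   # every step has the same cost
            if new_dist < best.get(parent, float("inf")):
                best[parent] = new_dist
                heapq.heappush(heap, (new_dist, parent))
    return None                   # no root reachable (e.g. a cycle)

# Toy hierarchy (made-up names, not real GO terms): A is the root.
toy_parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B", "C"]}
print(steps_to_top_dijkstra("D", toy_parents))
```

Because every step costs the same, the priority queue does not buy anything here, which is why a plain breadth-first search can give the same answers.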

My husband Noel came up with the idea of using a breadth-first search for the same purpose. Here's a Python script that he helped me to write for this (thanks Noel!): calc_dists_to_top_of_GO_using_bfs.py.
This was much faster, and only took seconds to run!
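
The breadth-first search idea can be sketched roughly as follows. This is only an illustration, not the code from calc_dists_to_top_of_GO_using_bfs.py: the function name, the `parents` dictionary (each term mapped to a list of its parent terms) and the toy term names are assumptions for the example, and "the top" is assumed to mean any term that has no parents.

```python
from collections import deque

def steps_to_top_bfs(term, parents):
    """Fewest steps from `term` to any term with no parents (a root).

    `parents` is a hypothetical dict: term -> list of parent terms.
    """
    queue = deque([(term, 0)])    # (term, steps taken to reach it)
    seen = {term}
    while queue:
        current, dist = queue.popleft()
        current_parents = parents.get(current, [])
        if not current_parents:
            return dist           # terms are visited in order of distance
        for parent in current_parents:
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, dist + 1))
    return None                   # no root reachable (e.g. a cycle)

# Toy hierarchy (made-up names, not real GO terms): A is the root.
toy_parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B", "C"], "E": ["D", "A"]}
print(steps_to_top_bfs("D", toy_parents))
print(steps_to_top_bfs("E", toy_parents))
```

Each term is put on the queue at most once, so a search from one term does work roughly proportional to the number of terms and parent links above it.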

Both scripts take the GO ontology file (which contains the GO hierarchy) as input, in this case 'gene_ontology.WS238.obo', which I downloaded from WormBase.
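
As a rough sketch of how the hierarchy could be read from a file in OBO format, the function below collects the 'is_a' parents of each term. It is an assumption about how one might do this, not the parsing code in either script: the function name is made up, it follows only 'is_a' lines (ignoring other relationships such as 'part_of'), and the sample stanzas use made-up 'TOY:' ids rather than real GO terms.

```python
def parse_is_a(lines):
    """Build a dict: term id -> list of parent ids, from OBO-format lines.

    Only [Term] stanzas and their 'is_a:' lines are used (an assumption).
    """
    parents = {}
    current_id = None
    in_term = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas describe terms.
            in_term = (line == "[Term]")
            current_id = None
        elif in_term and line.startswith("id: "):
            current_id = line[len("id: "):]
            parents.setdefault(current_id, [])
        elif in_term and current_id and line.startswith("is_a: "):
            # Drop the trailing '! name' comment after the parent id.
            parent_id = line[len("is_a: "):].split("!")[0].strip()
            parents[current_id].append(parent_id)
    return parents

# Made-up sample in OBO layout (not taken from gene_ontology.WS238.obo).
sample = """\
[Term]
id: TOY:0000001
name: root term

[Term]
id: TOY:0000002
name: child term
is_a: TOY:0000001 ! root term

[Typedef]
id: part_of
name: part of
"""
toy = parse_is_a(sample.splitlines())
print(toy)
```

For a real file, the lines could come from `open('gene_ontology.WS238.obo')` instead of the sample string. Note that obsolete terms have no 'is_a' lines, so with this sketch they would look like top-level terms unless they are filtered out.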

Happily the two scripts give the same result : )

For example, for the GO term 'single strand break repair' (GO:0000012), they both find that it is 6 steps from a term at the top of the hierarchy (in this case, from 'biological process', GO:0008150).
