Monday, 29 June 2015

Creating a GO-Slim, and mapping GO terms to it

Creating a GO-Slim 
I wanted to create my own GO-Slim, and found that the EBI has a nice tool for creating one. You can start with an existing GO-Slim, e.g. the 'generic GO-Slim' (which has 149 terms), and add terms to it.

Mapping GO terms to your own GO-Slim
The next thing I wanted to do was to map GO terms to my GO-Slim. I was able to do this using Map2Slim, which is part of Owltools. To install it, I typed:
% wget http://build.berkeleybop.org/userContent/owltools/owltools
% wget http://build.berkeleybop.org/userContent/owltools/owltools-runner-all.jar
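This is a Java program; it ran fine for me locally on my Mac laptop. Note that a file fetched with wget is not executable by default, so you need to make the script executable (chmod +x owltools) before calling it as ./owltools.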

To run Map2Slim you need:
(i) a list of the GO terms in your GO-Slim
(ii) the GO annotations for your gene set of interest, with respect to the full ontology
(iii) the gene ontology hierarchy file (.obo file) for the full ontology

You need a list of the GO terms in your GO-Slim. If you are using the generic GO-Slim, you can download the generic GO-Slim in obo format from the geneontology.org website. Then to get a list of the terms, you can type:
% grep "^id: GO:" goslim_generic.obo | cut -d" " -f2 > goslim_terms.txt
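
If you would rather do this step in a script, here is a minimal Python sketch of the same idea. The function name slim_term_ids is my own invention, and it assumes the usual OBO layout: each term sits in a [Term] stanza whose primary identifier is on a line starting with "id: GO:".

```python
def slim_term_ids(obo_path):
    """Return the primary GO ids of the [Term] stanzas in an OBO file."""
    ids = []
    in_term = False
    with open(obo_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith("["):
                # A new stanza begins; only [Term] stanzas hold GO terms.
                in_term = (line == "[Term]")
            elif in_term and line.startswith("id: GO:"):
                # "alt_id:" lines do not start with "id:", so they are skipped.
                ids.append(line.split()[1])
    return ids
```

You could then write the result to goslim_terms.txt with one id per line, e.g. open("goslim_terms.txt", "w").write("\n".join(slim_term_ids("goslim_generic.obo")) + "\n").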

The GO annotations for your gene set of interest need to be in GAF 2.0 format, which is tab-delimited with 17 columns. In fact, I found that some of the columns aren't necessary for Map2Slim, so the file could look like this, where the columns marked 'optional' and 'unknown' seem to be ignored by Map2Slim:
!gaf-version: 2.0
WB    482159    482159    optional    GO:0005515    pubmed    unknown    optional    unknown    optional    optional    protein    unknown    482159    WB    optional    optional
WB    482159    482159    optional    GO:0008270    pubmed    unknown    optional    unknown    optional    optional    protein    unknown    482159    WB    optional    optional
WB    644503    644503    optional    GO:0008270    pubmed    unknown    optional    unknown    optional    optional    protein    unknown    644503    WB    optional    optional
WB    644503    644503    optional    GO:0005230    pubmed    unknown    optional    unknown    optional    optional    protein    unknown    644503    WB    optional    optional
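
If your annotations start out as simple (gene, GO term) pairs, a file like the one above could be generated with a short Python sketch along these lines. The function name write_minimal_gaf is hypothetical, and the placeholder values ('optional', 'unknown', 'pubmed', 'protein') are simply copied from my example above; that Map2Slim ignores them is only what I observed, not something I have checked in its documentation.

```python
def write_minimal_gaf(annotations, out_path, db="WB"):
    """Write (gene_id, go_id) pairs as a 17-column, tab-separated GAF 2.0 file."""
    with open(out_path, "w") as handle:
        handle.write("!gaf-version: 2.0\n")
        for gene_id, go_id in annotations:
            # Same column layout as the hand-made example: only the database,
            # the gene id and the GO id carry real information here.
            columns = [
                db, gene_id, gene_id, "optional", go_id, "pubmed", "unknown",
                "optional", "unknown", "optional", "optional", "protein",
                "unknown", gene_id, db, "optional", "optional",
            ]
            handle.write("\t".join(columns) + "\n")
```

For example, write_minimal_gaf([("482159", "GO:0005515"), ("482159", "GO:0008270")], "my_gaf.txt") would produce the first two annotation lines shown above.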


You can download the latest Gene Ontology hierarchy (.obo) file, for example go-basic.obo as used below, from the geneontology.org website.

To run Map2Slim you type for example:
% ./owltools go-basic.obo --gaf my_gaf.txt --map2slim --idfile goslim_terms.txt --write-gaf my_slim.txt
The input GAF file was my_gaf.txt, the input obo file was go-basic.obo and the input list of GO-Slim terms was goslim_terms.txt.
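
If you want to run this for several gene sets, the same command can be wrapped in a small Python sketch. The function names are my own; the only options used are the ones in the command above, and it assumes the owltools script sits in the current directory.

```python
import subprocess


def map2slim_command(obo, gaf, idfile, out_gaf, owltools="./owltools"):
    """Build the argument list for the owltools Map2Slim call shown above."""
    return [owltools, obo, "--gaf", gaf, "--map2slim",
            "--idfile", idfile, "--write-gaf", out_gaf]


def run_map2slim(obo, gaf, idfile, out_gaf, owltools="./owltools"):
    """Run Map2Slim and raise CalledProcessError if owltools exits with an error."""
    subprocess.run(map2slim_command(obo, gaf, idfile, out_gaf, owltools), check=True)
```

For example, run_map2slim("go-basic.obo", "my_gaf.txt", "goslim_terms.txt", "my_slim.txt") is equivalent to the command line above.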

The output file my_slim.txt is in GAF format, but has the GO-Slim terms for your genes. Hurray!
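
To get a quick overview of the result, the output GAF can be collapsed into a gene-to-slim-terms table. This is a minimal Python sketch with a function name of my own choosing; it assumes the standard GAF layout, with the gene identifier in column 2 and the GO identifier in column 5.

```python
from collections import defaultdict


def slim_terms_per_gene(gaf_path):
    """Map each gene id (column 2) to the sorted GO ids (column 5) in a GAF file."""
    terms = defaultdict(set)
    with open(gaf_path) as handle:
        for line in handle:
            # Skip header/comment lines (starting with "!") and blank lines.
            if line.startswith("!") or not line.strip():
                continue
            columns = line.rstrip("\n").split("\t")
            if len(columns) < 5:
                continue  # not a complete annotation line
            terms[columns[1]].add(columns[4])
    return {gene: sorted(go_ids) for gene, go_ids in terms.items()}
```

Calling slim_terms_per_gene("my_slim.txt") then gives a dictionary that you can print or count over, e.g. to see how many genes fall under each GO-Slim term.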