Tuesday, 18 March 2014

Adding filters in Apple Mail

I use the Apple 'Mail' program to read my email, and a lot of the emails I get come from mailing lists, etc. To set up filters in 'Mail', you can do the following:
1. Go to the 'Mail' menu -> choose 'Preferences', click on the 'Rules' tab.
2. Click 'Add Rule' and set up a new rule for filtering messages.

Tuesday, 4 March 2014

Querying the chado database

The chado database lies behind GeneDB. To run queries, you can log into chado directly (from within Sanger) by typing:
> ssh pcs5
> chado [then type your chado password]
Then within chado, you can type queries, and put the output in a file.
For example, to get a list of all the Schistosoma mansoni genes that have a note containing the word 'manual' (to find all manually curated genes), and save them in a file 'smansoni_curated', we can type:

\o smansoni_curated

select gene.uniquename as gene
     , prop.value as note
from feature gene
join featureprop prop on gene.feature_id = prop.feature_id
join cvterm prop_type on prop.type_id = prop_type.cvterm_id
join cv prop_type_cv on prop_type.cv_id = prop_type_cv.cv_id
join organism on gene.organism_id = organism.organism_id
where prop_type_cv.name = 'feature_property' and prop_type.name = 'comment'
  and organism.genus = 'Schistosoma' and organism.species = 'mansoni'
  and prop.value like '%manual%';


I got this example from the Sample_Chado_queries website.
There are also more sample chado queries on the Useful_chado_queries website.
A third useful webpage is the Extracting_data_from_a_Chado_database website.

Thanks to my colleagues Magdalena, Matt and Anna for help.

- to exit chado, I seem to have to type CTRL+D
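
Once the query output is in 'smansoni_curated', it can be handy to pull out just the unique gene names. Here is a rough sketch of how that might be done. It assumes that the 'chado' command is a wrapper around PostgreSQL's psql client (the \o command suggests this) and that the file has psql's default aligned layout: a header line, a dashed separator line, then rows of the form ' gene | note'. The function name curated_gene_names and the output file name are made up for this example.

```shell
# Print the unique gene names from a psql-style aligned table (gene | note).
# Assumes the default psql layout: header line, dashed separator, data rows.
curated_gene_names() {
    awk -F'|' 'NR > 2 && NF >= 2 {
        gsub(/[ \t]/, "", $1)      # strip padding around the gene name
        if ($1 != "") print $1     # skip continuation lines of multi-line notes
    }' "$1" | sort -u
}

# Example call, once the query output file exists:
#   curated_gene_names smansoni_curated > smansoni_curated_genes
```

If the output was written in a different format (for example unaligned), the field separator and the number of header lines to skip would need adjusting.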

Monday, 3 March 2014

Retrieving annotations from chado (genedb)

The database behind GeneDB is called Chado.

Getting a gff file for Schistosoma mansoni from chado
To extract annotations for a species (e.g. Schistosoma mansoni) from Chado, you can use a shell script like this on the Sanger farm (for Sanger users only):


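# Directory for the dump; it is wiped and recreated on every run
export output="/lustre/scratch108/parasites/alc/50HGI_FuncAnnotn/Smansoni_chado_dump"
rm -rf "$output"
mkdir -p "$output"
# Submit to the LSF 'long' queue with a memory limit and reservation of 2500;
# the job's stdout and stderr go to bsub.o and bsub.e
bsub -o "$output/bsub.o" -e "$output/bsub.e" -q long \
        -M2500 -R "select[mem>2500] rusage[mem=2500]" \
        writedb_entries.py -t -o Smansoni -i -d 'pgsrv1:5432/pathogens?genedb' -x "$output"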

I've removed the password information from the command above, to keep the password secret!
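
After the job has run, one quick way to see whether it finished cleanly is to look in the bsub output file for the 'Successfully completed' line that LSF writes in its job report. This is only a small sketch: the function name lsf_job_succeeded is made up, and it assumes the job report was written to the file given with 'bsub -o'.

```shell
# Succeed if an LSF output file reports that the job finished cleanly.
lsf_job_succeeded() {
    grep -q 'Successfully completed' "$1"
}

# Example call, using the $output directory from the script above:
#   lsf_job_succeeded "$output/bsub.o" && ls "$output"
```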

Getting all flatworm transcripts from chado
This is from my colleague Eleanor Stanley (thanks Eleanor).
% chado_dump_transcripts_coding -o Smansoni > Sma.fa
% chado_dump_transcripts_coding -o Emultilocularis > Emu.fa
% cat Sma.fa Emu.fa > flatworm_transcripts.fa
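
As a sanity check on the concatenated file, you could count the sequences in each file and confirm that the totals agree. Below is a minimal sketch, assuming the dumps are ordinary FASTA files in which every record starts with a '>' header line. The function names fasta_count and check_concat are made up for this example.

```shell
# Print the number of FASTA records (lines starting with '>') in one file.
fasta_count() {
    grep -c '^>' "$1"
}

# Succeed only if the combined file holds exactly as many records as its parts.
# Usage: check_concat combined.fa part1.fa part2.fa ...
check_concat() {
    combined=$1
    shift
    total=0
    for part in "$@"; do
        total=$((total + $(fasta_count "$part")))
    done
    [ "$(fasta_count "$combined")" -eq "$total" ]
}

# Example call, once the three files above exist:
#   check_concat flatworm_transcripts.fa Sma.fa Emu.fa && echo "counts match"
```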