Monday, 15 October 2018

Getting raw sequence data at Sanger

This post is only useful for people at Sanger: it describes how to get raw sequence data off our Sanger irods system.

1) Request an irods account from the service desk, as explained on the irods wiki page.

2) Make sure your .softwarerc includes irods, so that you can use the irods commands.

3) (Steve told me this.) First move to the directory where you want to put the cram files, and then run the following
(e.g. for run 27104, lane 8):

kinit
< input password >   # you need to input your irods password here
icd /seq/27104
ils | grep "27104_8.*cram$" | grep -v "phix" | while read -r list; do iget "/seq/27104/$list" . ; done &

# once your cram files have downloaded, convert the crams to fastq
bsub.py 1 cram2fq ~sd21/bash_scripts/run_cram2fastq
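
To see what the two grep commands in the ils line above keep and throw away, here is a small sketch that can be run without irods. The function name filter_lane_crams and the file names in the sample listing are made up for illustration, and the real ils output may look a bit different.

```shell
# Hypothetical helper: read an ils-style listing on stdin and print only the
# cram files for one run and lane, skipping the phix control files.
filter_lane_crams() {
    run="$1"
    lane="$2"
    # Keep names containing <run>_<lane> and ending in "cram", drop phix files,
    # then strip leading whitespace (in the original loop, "read" does this).
    grep "${run}_${lane}.*cram\$" | grep -v "phix" | sed 's/^[[:space:]]*//'
}

# Made-up listing, for illustration only.
sample_listing='/seq/27104:
  27104_8#1.cram
  27104_8#1.cram.crai
  27104_8#1_phix.cram
  27104_7#1.cram
  27104_8#2.cram'

printf '%s\n' "$sample_listing" | filter_lane_crams 27104 8
```

With this sample listing, only 27104_8#1.cram and 27104_8#2.cram are printed: the .crai index does not end in "cram", the phix file is removed by grep -v, and the lane 7 file does not match 27104_8.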


Note the script ~sd21/bash_scripts/run_cram2fastq says:
for i in *cram; do samtools view -ub --threads 4 ${i} | samtools sort -n - | samtools fastq -1 ${i%.cram}_1.fastq.gz -2 ${i%.cram}_2.fastq.gz - ; done

Note: to get this script to run for me, I had to make a copy of it and change the path for samtools to be /software/pathogen/external/apps/usr/local/samtools-1.6/samtools, which was the one in Steve's .cshrc
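
The ${i%.cram} in the script above is shell parameter expansion that strips the trailing ".cram" from the file name, so each cram gives a pair of fastq files named after it. A minimal sketch of just the naming, where the function name fastq_names and the example file name are made up for illustration:

```shell
# Hypothetical helper: print the two fastq file names that the loop in
# run_cram2fastq would write for a given cram file name.
fastq_names() {
    i="$1"
    # ${i%.cram} removes the shortest match of ".cram" from the end of $i
    echo "${i%.cram}_1.fastq.gz"
    echo "${i%.cram}_2.fastq.gz"
}

# Made-up cram file name, for illustration only.
fastq_names "27104_8#1.cram"
```

For this example file name, the two names printed are 27104_8#1_1.fastq.gz and 27104_8#1_2.fastq.gz.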

More info about irods
A lot of initial questions are covered in the FAQs located at
https://gitlab.internal.sanger.ac.uk/kdj/npg_doc/blob/master/irods_in_10_minutes.adoc
http://mediawiki.internal.sanger.ac.uk/index.php/IRODS_for_Sequencing_Users

Acknowledgements
Thanks very much to Steve Doyle for help with this.
