Wednesday 29 January 2014

Merging optical map XML files using Python

My colleagues asked me to merge 341 optical map XML files (for 341 scaffolds) into one large XML file. I was able to write a Python script merge_optical_map_xml_files.py by using Python's nice ElementTree library for parsing XML. I thought it would be a tricky job but with Python it took me only about half an hour. Yay!

Postscript:
My colleagues Alan Tracey and Karen Brooks found that the MapSolver software cannot read in the merged XML file produced by my script. We figured out that the problem is with the header. The header produced by my script is:
<?xml version='1.0' encoding='iso-8859-1'?>
<aligned_maps_document><experiment>

...

In order to get MapSolver to read in the XML file, I had to manually edit the header of the XML file produced by my script so that it is:
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- @version:0.1 -->

<!-- this is a comment -->
<aligned_maps_document version="0.5">

<experiment>

...
I've now updated my Python script to make it produce this type of header in the output XML file.

No comments: