Friday 31 May 2019

Retrieving data from PDBe and UniChem using their web interface

The PDBe (Protein Data Bank in Europe) is a database of macromolecular structures, such as protein structures (note: don't confuse it with its American cousin, the RCSB PDB). Many of its entries also contain small chemical compounds bound to the protein.

I wanted to check if certain chemical compounds (for which I know the ChEMBL identifiers) are present in any PDBe entry.

To do this, I decided to learn to use the PDBe REST API, which allows you to query the PDBe database via their web interface. There is a nice webinar on using the PDBe REST API here.

I also found that, for my problem, it would be useful to learn the UniChem REST API. It lets you query UniChem, a resource that maps between the identifiers that different databases use for the same chemical compound.
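A minimal sketch of mapping a ChEMBL identifier to PDBe ligand codes through UniChem. It assumes the legacy UniChem REST URL pattern .../unichem/rest/src_compound_id/{compound_id}/{from_source}/{to_source}, that UniChem numbers ChEMBL as source 1 and PDBe as source 3, and that a successful response is a list of objects with a 'src_compound_id' field. UniChem's API may have changed since this post was written, so check these details against its current documentation. The function names are hypothetical.

```python
# Legacy UniChem REST URL pattern (an assumption; the API may have changed):
# https://www.ebi.ac.uk/unichem/rest/src_compound_id/{compound_id}/{from_source}/{to_source}
UNICHEM_BASE = "https://www.ebi.ac.uk/unichem/rest/src_compound_id"
CHEMBL_SOURCE = 1  # assumed UniChem source number for ChEMBL
PDBE_SOURCE = 3    # assumed UniChem source number for PDBe


def unichem_mapping_url(compound_id, from_source=CHEMBL_SOURCE, to_source=PDBE_SOURCE):
    """Build the URL that maps one compound ID from one UniChem source to another."""
    return f"{UNICHEM_BASE}/{compound_id}/{from_source}/{to_source}"


def extract_target_ids(unichem_json):
    """Collect target-database IDs from a UniChem mapping response.

    Assumes a list of dicts, each with a 'src_compound_id' key; any other
    shape (e.g. an error object) is treated as "no mapping".
    """
    if not isinstance(unichem_json, list):
        return []
    return [
        entry["src_compound_id"]
        for entry in unichem_json
        if isinstance(entry, dict) and "src_compound_id" in entry
    ]


def chembl_to_pdbe_ligands(chembl_id, timeout=30):
    """Return the PDBe ligand codes that UniChem lists for a ChEMBL ID."""
    import requests  # third-party; imported here so the pure helpers above work without it

    response = requests.get(unichem_mapping_url(chembl_id), timeout=timeout)
    response.raise_for_status()  # turn HTTP error codes (4xx/5xx) into exceptions
    return extract_target_ids(response.json())
```

Calling chembl_to_pdbe_ligands("CHEMBL941") would send one request and return a list of PDBe ligand codes. Depending on how the service reports a compound with no mapping, you may get an empty list or an HTTP error, so it is worth handling both.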

For my examples, I have used the Python 'requests' module, which can parse the JSON data returned by the REST API calls and convert it into native Python objects (dictionaries and lists).
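A small, hedged sketch of that pattern: one helper builds a REST URL from path segments, and another fetches it with 'requests' and decodes the JSON with response.json(). The helper names build_url and get_json are hypothetical, and the 30-second timeout is an arbitrary choice.

```python
def build_url(base, *path_parts):
    """Join a base URL and path segments with single slashes."""
    return "/".join([base.rstrip("/")] + [str(p).strip("/") for p in path_parts])


def get_json(url, timeout=30):
    """GET a REST API URL and return the decoded JSON as native Python objects."""
    import requests  # third-party: pip install requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # turn HTTP error codes (4xx/5xx) into exceptions
    return response.json()  # JSON objects become dicts, JSON arrays become lists
```

For example, get_json(build_url("https://www.ebi.ac.uk/pdbe/api", "mappings", "uniprot", "1ivv")) would fetch the SIFTS mapping for entry 1ivv discussed below. Keeping URL building separate from the network call makes the URL logic easy to test offline.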

Simple queries of PDBe via the web
There are many nice examples of how to query the PDBe using its REST API (i.e. via the web) here.
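One query that fits the goal of this post is listing the ligands bound in a given entry. The sketch below assumes a PDBe endpoint of the form .../pdbe/api/pdb/entry/ligand_monomers/{pdb_id}, which is taken to return a JSON object keyed by the lower-case entry ID and holding a list of records with a 'chem_comp_id' field. It also assumes that the API answers 404 when nothing is found. Treat these details, and the function names, as assumptions to check against the PDBe API documentation.

```python
PDBE_API = "https://www.ebi.ac.uk/pdbe/api"


def ligand_monomers_url(pdb_id):
    """URL for the (assumed) PDBe endpoint listing the ligands bound in an entry."""
    return f"{PDBE_API}/pdb/entry/ligand_monomers/{pdb_id.lower()}"


def ligand_codes(response_json, pdb_id):
    """Return the set of chemical component codes in one entry's response.

    Assumes the layout {pdb_id: [{"chem_comp_id": ..., ...}, ...]}.
    """
    records = response_json.get(pdb_id.lower(), [])
    return {rec["chem_comp_id"] for rec in records if "chem_comp_id" in rec}


def entry_contains_ligand(pdb_id, ligand_code, timeout=30):
    """True if the entry lists the given chemical component code as a ligand."""
    import requests  # third-party HTTP client used throughout this post

    response = requests.get(ligand_monomers_url(pdb_id), timeout=timeout)
    if response.status_code == 404:  # assumed: PDBe answers 404 when nothing is found
        return False
    response.raise_for_status()  # any other HTTP error becomes an exception
    return ligand_code.upper() in ligand_codes(response.json(), pdb_id)
```

This checks one entry at a time. To ask whether a compound appears in any PDBe entry, you would instead start from the compound's ligand code and look up the entries that contain it, for example via UniChem.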

...

The 'sifts' part of the PDBe REST API lets you map between the PDBe and other databases such as UniProt. There are example API calls here. For example, the API call to find the UniProt entry corresponding to the PDBe entry 1ivv would be:
https://www.ebi.ac.uk/pdbe/api/mappings/uniprot/1ivv
This gives an output like this: