Friday, 31 May 2019

Retrieving data from PDBe and UniChem using their web interface

The PDBe is a database of macromolecular structures, such as protein structures (Note: don't get this mixed up with its American cousin, the RCSB PDB). In many cases it contains structures of small chemical compounds bound to protein structures.

I wanted to check if certain chemical compounds (for which I know the ChEMBL identifiers) are present in any PDBe entry.

To do this, I decided to learn to use the PDBe REST API, which allows you to query the PDBe database via their web interface. There is a nice webinar on using the PDBe REST API here.

I also learnt that for my problem it would be useful to learn about the UniChem REST API, which lets you perform queries on UniChem, a resource which lets you map between chemical compound identifiers from different databases.

For my examples, I have used the Python 'requests' module, which can parse the JSON-format data returned from the REST API calls and convert it into native Python objects.

Simple queries of PDBe via the web
There are many nice examples of how to query the PDBe using their REST API (i.e. via the web) here.

...

The 'sifts' part of the PDBe REST API lets you map between the PDBe and other databases such as UniProt. There are example API calls here. For example, an API call to find the UniProt entry corresponding to the PDBe entry 1ivv would be https://www.ebi.ac.uk/pdbe/api/mappings/uniprot/1ivv This gives an output like this:



Simple queries of PDBe using Python
To run the same kind of ligand query using Python, i.e. to find the PDB ids. of the entries that the ligand 'ATP' appears in, we can use the Python script here, e.g. by typing:
% python3 pdb_rest_example_get_pbids_with_ligand.py -e ATP
This should give you the list of PDB ids. as output:
This is the url string:
https://www.ebi.ac.uk/pdbe/api//pdb/compound/in_pdb/ATP?pretty=true
1a0i
1a49
1a5u
1a82
1aq2
1asz
1atn
1atp
1ayl
1b0u
1b38
1b39
1b76
...
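
For readers who can't get at the script, here is a minimal sketch of how such a ligand query might look with 'requests'. This is not the script linked above: the function names are my own, and I'm assuming the reply is a dictionary keyed by the ligand code with a list of PDB ids. as its value.

```python
"""Sketch: list the PDB entries that contain a given ligand, via the PDBe REST API."""

PDBE_API = "https://www.ebi.ac.uk/pdbe/api"


def ligand_url(ligand_id):
    """Build the 'in_pdb' URL for a PDB ligand (three-letter) code."""
    return f"{PDBE_API}/pdb/compound/in_pdb/{ligand_id}"


def fetch_json(url):
    """Download a URL and decode its JSON body into native Python objects."""
    import requests  # third-party; imported here so the helpers work without it

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors such as 404
    return response.json()


def pdb_ids_from_reply(reply, ligand_id):
    """Pull the PDB ids. out of a decoded reply.

    Assumption: the reply looks like {"ATP": ["1a0i", "1a49", ...]}.
    """
    return sorted(reply.get(ligand_id, []))


def pdb_ids_for_ligand(ligand_id):
    """Fetch and return the PDB ids. containing the ligand (needs network access)."""
    return pdb_ids_from_reply(fetch_json(ligand_url(ligand_id)), ligand_id)
```

Calling pdb_ids_for_ligand("ATP") should then return a Python list of PDB ids. Keeping the parsing in its own function means it can be tried out on a saved reply without touching the network.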


Similarly, to find the UniProt id. for a particular PDB id., we can use the script here, e.g. by typing:
% python3 pdb_rest_example_get_uniprot_for_pdbid.py -e 1ivv
This should give output:
This is the url string:
https://www.ebi.ac.uk/pdbe/api//mappings/uniprot/1ivv?pretty=true
UniProt id= P46881

We can verify that we got the right answer by checking the page for 1ivv on the PDBe website.
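
A similar sketch for the UniProt lookup is below. Again this is my own illustration rather than the linked script, and it assumes the 'sifts' reply is nested as PDB id., then "UniProt", then one key per UniProt accession.

```python
"""Sketch: find the UniProt accession(s) mapped to a PDB entry via the PDBe 'sifts' API."""

PDBE_API = "https://www.ebi.ac.uk/pdbe/api"


def uniprot_mapping_url(pdb_id):
    """Build the 'mappings/uniprot' URL; lower-case ids. are used, as in the post."""
    return f"{PDBE_API}/mappings/uniprot/{pdb_id.lower()}"


def fetch_json(url):
    """Download a URL and decode its JSON body into native Python objects."""
    import requests  # third-party; imported here so the helpers work without it

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors such as 404
    return response.json()


def uniprot_ids_from_reply(reply, pdb_id):
    """Pull the UniProt accessions out of a decoded reply.

    Assumption: the reply looks like {"1ivv": {"UniProt": {"P46881": {...}}}}.
    """
    entry = reply.get(pdb_id.lower(), {})
    return sorted(entry.get("UniProt", {}))  # sorted() over a dict yields its keys


def uniprot_ids_for_pdb(pdb_id):
    """Fetch and return the UniProt accessions for a PDB id. (needs network access)."""
    return uniprot_ids_from_reply(fetch_json(uniprot_mapping_url(pdb_id)), pdb_id)
```

The function returns a list rather than a single id. because one PDB entry can contain chains from more than one protein.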

Simple queries of PDBe using Python, in a Jupyter notebook
If you are familiar with Python, an even easier way to query PDBe via the web is to write the queries directly in Python, in a Jupyter notebook.

There are some nice examples of querying PDBe using Python in a Jupyter notebook here, as well as here.

Simple queries of UniChem, via the web
You can perform simple queries of UniChem via the web using their REST API.

For example, to find out the PDB ligand identifier for a particular ChEMBL identifier, you can go to the link https://www.ebi.ac.uk/unichem/rest/src_compound_id/CHEMBL14249/1/3
where CHEMBL14249 is our ChEMBL id. of interest, and the '/1/3' at the end of the link says we want to convert from a ChEMBL id. (UniChem source 1) to a PDB ligand id. (UniChem source 3).

The output should look like this:
src_compound_id: ATP

Simple queries of UniChem, using Python
It's also quite easy to perform simple queries of UniChem using Python.
You can perform a query to find out the PDB ligand identifier for a particular ChEMBL identifier using the script here, which you can run like this:

% python3 unichem_rest_example_get_pdbligandids_for_chemblid.py -e CHEMBL14249
and you will see output:
This is the url string:
https://www.ebi.ac.uk/unichem/rest//src_compound_id/CHEMBL14249/1/3
Ligand id. in PDB: ATP
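
Here is a rough sketch of what a UniChem lookup like this might look like. It is not the linked script. The function names are hypothetical, and I'm assuming the reply is a list of small dictionaries, each with a 'src_compound_id' key (matching the output shown above), and that the endpoint still answers at the URL used in this post.

```python
"""Sketch: map a ChEMBL id. to PDB ligand id(s). via the UniChem REST API."""

UNICHEM_REST = "https://www.ebi.ac.uk/unichem/rest"
CHEMBL_SRC = 1  # the '1' in '/1/3': the source we are converting from
PDB_SRC = 3     # the '3' in '/1/3': the source we are converting to


def unichem_url(compound_id, from_src=CHEMBL_SRC, to_src=PDB_SRC):
    """Build the 'src_compound_id' URL for one compound id."""
    return f"{UNICHEM_REST}/src_compound_id/{compound_id}/{from_src}/{to_src}"


def fetch_json(url):
    """Download a URL and decode its JSON body into native Python objects."""
    import requests  # third-party; imported here so the helpers work without it

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


def ligand_ids_from_reply(reply):
    """Pull the target ids. out of a decoded reply.

    Assumption: the reply looks like [{"src_compound_id": "ATP"}].
    """
    return [item["src_compound_id"] for item in reply]


def pdb_ligand_ids_for_chembl(chembl_id):
    """Fetch and return the PDB ligand ids. for a ChEMBL id. (needs network access)."""
    return ligand_ids_from_reply(fetch_json(unichem_url(chembl_id)))
```

Because the source numbers are ordinary parameters, the same function could be pointed at other pairs of databases by passing different from_src and to_src values.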


Finding the PDB entries containing a particular ChEMBL compound
My original problem was to check if certain chemical compounds (for which I know the ChEMBL identifiers) are present in any PDBe entry.

I was able to do this by taking two steps:
(i) first finding the PDB ligand identifier (three-letter code) for my ChEMBL compounds, using the UniChem REST API (see this Python script), and
(ii) then finding the PDB entries that the PDB ligands are found in, using the PDBe REST API (see this Python script).
To then add in the UniProt ids. that correspond to those PDB entries, I again used the PDBe REST API, using this Python script.
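
To show how the steps chain together, here is a compact sketch that does all three lookups in one function. It is my own illustration, not the linked scripts. The reply layouts are assumptions (a list of 'src_compound_id' dictionaries from UniChem, a dictionary keyed by ligand code from 'in_pdb', and PDB id. then "UniProt" then accession from 'sifts'), and I've chosen to treat a 404 reply as "no data".

```python
"""Sketch: ChEMBL id. -> PDB ligand id(s). -> PDB entries -> UniProt accessions."""

UNICHEM_REST = "https://www.ebi.ac.uk/unichem/rest"
PDBE_API = "https://www.ebi.ac.uk/pdbe/api"


def fetch_json(url):
    """Download a URL and decode its JSON body; return None if nothing is found."""
    import requests  # third-party; imported here so the pipeline can be tested offline

    response = requests.get(url, timeout=30)
    if response.status_code == 404:
        return None  # design choice: treat "not found" as "no data"
    response.raise_for_status()
    return response.json()


def pdb_entries_for_chembl(chembl_id, fetch=fetch_json):
    """Return {ligand_id: {pdb_id: [UniProt accessions]}} for one ChEMBL compound.

    'fetch' is any function that turns a URL into decoded JSON (or None),
    so a stand-in can be passed instead of making real web requests.
    """
    result = {}
    # Step (i): ChEMBL id. -> PDB ligand ids. via UniChem ('/1/3' as in the post).
    reply = fetch(f"{UNICHEM_REST}/src_compound_id/{chembl_id}/1/3") or []
    for item in reply:
        ligand = item["src_compound_id"]
        # Step (ii): PDB ligand id. -> PDB entries containing it.
        in_pdb = fetch(f"{PDBE_API}/pdb/compound/in_pdb/{ligand}") or {}
        result[ligand] = {}
        for pdb_id in in_pdb.get(ligand, []):
            # Final step: PDB entry -> UniProt accessions via the 'sifts' mappings.
            mapping = fetch(f"{PDBE_API}/mappings/uniprot/{pdb_id}") or {}
            result[ligand][pdb_id] = sorted(mapping.get(pdb_id, {}).get("UniProt", {}))
    return result
```

Note that this makes one web request per PDB entry, so for a very common ligand such as ATP it would send a lot of requests; for a long list of compounds it would be kinder to the servers to pause between calls.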

Hurray! I have graduated to the modern world of REST APIs!

Acknowledgements
A big thanks to David Armstrong from PDBe for answering some of my beginner's questions on retrieving data from the PDBe. 

