Monday 25 September 2017

How to do a substructure search of ChEMBL

I wanted to search ChEMBL to see whether it contains any drugs that have reached phase III or phase IV (approved) and are carbamate benzimidazoles (like albendazole).

So I went to the ChEMBL homepage where there is a 'Marvin JS' chemical drawing tool by the company ChemAxon. Here was my picture:


The 'A' here means any atom. This 'A' doesn't seem to include H though, so I had to do another search using '-OH' instead of '-OA'.
[Later note: actually it's not necessary to draw -OA here. You can just draw '-O', which the software displays as -OH. That substructure search will still hit compounds with -O-A unless you explicitly draw -O-H (i.e. H instead of A).]
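The matching behaviour described above can be written down as a tiny toy model. This is only a sketch of the rule as observed in these searches, not Marvin JS or ChEMBL code; the function name `substituent_matches` and the convention of using `None` for "nothing drawn on the oxygen" are made up for illustration.

```python
def substituent_matches(query, target):
    """Toy model of how the group drawn on the oxygen constrains a hit.

    query  : what was drawn on the O in the sketcher:
             None (nothing drawn), 'A' (any atom), or an element symbol such as 'H'
    target : element symbol of the atom actually attached to the O in a compound
    """
    if query is None:
        # Nothing drawn: the search does not constrain what is attached.
        return True
    if query == "A":
        # 'A' appears to mean any atom except hydrogen.
        return target != "H"
    # An explicitly drawn atom (e.g. H) must match exactly.
    return query == target


# Two illustrative targets: an -O-H group and an -O-CH3 group.
targets = {"-O-H": "H", "-O-CH3": "C"}
for drawn in (None, "A", "H"):
    hits = [name for name, atom in targets.items() if substituent_matches(drawn, atom)]
    print(f"drawn on O: {drawn!r} -> hits: {hits}")
```

In this model, only an explicitly drawn H restricts the hits to -OH compounds, which is why two searches were needed when using '-OA'.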

Then I clicked on the 'Substructure search - Fetch compounds' button under the 'Marvin JS' chemical drawing tool, on the ChEMBL webpage. This brought back 248 hits.

Then I clicked on the arrow at the top of the 'Max phase' column to sort by phase. This found a few old friends at the top, albendazole and mebendazole, plus a couple of others:
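The same search and sort can probably be done without the web page, through the ChEMBL REST web services. The sketch below is untested against the live service and makes several assumptions: that the substructure endpoint lives under `https://www.ebi.ac.uk/chembl/api/data/substructure/`, that the JSON reply has a `molecules` list whose records carry a `max_phase` field, and that the example SMILES (a benzimidazole carbamate core, which is a stand-in and not the structure drawn above) is a sensible query. The function names are made up.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the ChEMBL web services.
CHEMBL_API = "https://www.ebi.ac.uk/chembl/api/data"

# Stand-in query: a benzimidazole carbamate core (not the drawing from this post).
EXAMPLE_SMILES = "COC(=O)Nc1nc2ccccc2[nH]1"


def substructure_url(smiles, limit=100):
    """Build a substructure-search URL, percent-encoding the SMILES."""
    return f"{CHEMBL_API}/substructure/{quote(smiles, safe='')}.json?limit={limit}"


def fetch_hits(smiles, limit=100):
    """Fetch one page of hits (needs network access; assumes a 'molecules' key)."""
    with urlopen(substructure_url(smiles, limit)) as response:
        return json.load(response).get("molecules", [])


def sort_by_max_phase(molecules):
    """Sort hits so the highest max phase comes first.

    max_phase may be a number, a string such as '4.0', or missing,
    so it is converted to float and missing values are treated as 0.
    """
    return sorted(molecules, key=lambda m: float(m.get("max_phase") or 0), reverse=True)
```

Calling `sort_by_max_phase(fetch_hits(EXAMPLE_SMILES))` would then play the role of clicking the arrow on the 'Max phase' column, although only for the single page of hits that was fetched.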



The fourth was a molecule I hadn't seen before:

Some things I noticed:

- For some reason the 'atom toolbar' seemed to disappear when I was using Safari. I tried Firefox instead and it was there again. I'm not sure why.

- When I searched ChEMBL for compounds containing a -SO3 substructure, it returned hits that at first glance didn't seem to contain this substructure. However, when I looked at the ChEMBL pages for those hits, they said 'Alternative forms of this compound in ChEMBL', and those alternative forms did have -SO3 substructures. So I guess it was searching the 'Alternative forms' as well.

- The 'A' atom seems to mean any atom except H.

- I found that when I search for a benzene ring with -OH attached, it also finds hits that have benzene attached to -O-(something else), so it doesn't seem to enforce that the group is -OH.

- Something is wrong with the ChEMBL results page. When I searched for a benzene ring with two -OH groups attached, the default display showed 25 hits per page, and when I clicked through the pages, I saw some compounds repeated on different hit pages, but didn't find hexylresorcinol, which should be a hit. However, when I asked for 100 hits per page, it showed hexylresorcinol! I emailed the ChEMBL helpdesk about this but no reply yet...
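The paging problem in the last point can be checked more systematically than by eye. A minimal sketch, assuming the compound IDs on each results page have been copied into a list of lists by hand or by script (the function name `check_pages` and the IDs in the example are made-up placeholders, not real ChEMBL identifiers):

```python
from collections import Counter


def check_pages(pages, expected=()):
    """Report IDs that occur more than once across pages, and expected IDs never seen.

    pages    : list of pages, each a list of compound IDs in display order
    expected : IDs that should appear somewhere in the results
    """
    counts = Counter(compound_id for page in pages for compound_id in page)
    repeated = sorted(cid for cid, n in counts.items() if n > 1)
    missing = sorted(set(expected) - set(counts))
    return repeated, missing


# Placeholder IDs: 'ID_B' shows up on two pages, 'ID_X' never appears.
example_pages = [["ID_A", "ID_B", "ID_C"], ["ID_B", "ID_D", "ID_E"]]
repeated, missing = check_pages(example_pages, expected=["ID_A", "ID_X"])
print("repeated:", repeated)
print("missing:", missing)
```

Running this once on the 25-per-page results and once on the 100-per-page results would show whether the repeats and the missing compound depend on the page size.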