Challenges of Connecting Chemistry to Pharmacology: Perspectives from Curating the IUPHAR/BPS Guide to PHARMACOLOGY

Connecting chemistry to pharmacology has been an objective of the Guide to PHARMACOLOGY (GtoPdb) and its precursor, the International Union of Basic and Clinical Pharmacology Database (IUPHAR-DB), since 2003. This has been achieved by populating our database with expert-curated relationships between documents, assays, quantitative results, chemical structures, their locations within the documents, and the protein targets in the assays (D-A-R-C-P). This perspective describes a wide range of the associated challenges, using illustrative examples from GtoPdb entries. Our selection process begins with judgments of pharmacological relevance and scientific quality. Even though we keep a stringent focus for our small-data extraction, we note that assessing the quality of papers has become more difficult over the last 15 years. We discuss the ambiguity encountered when resolving authors’ descriptions of A-R-C-P entities to standardized identifiers. We also describe developments over the same period that have made this somewhat easier, both in the publication ecosystem and through recent enhancements of our internal processes. The perspective concludes with a look at future challenges, including the wider capture of mechanistic nuances and the possible impact of text mining on automated entity extraction.
