Public domain databases for medicinal chemistry.

Medicinal chemists today find themselves in an increasingly information-rich environment. An abundance of compound activity and affinity data is being published, and medicinal chemistry data are increasingly connected with a broader world of data from the realms of bioinformatics and systems biology. In recent years, a number of publicly accessible, chemistry-oriented databases of interest to medicinal chemists have been established to facilitate access to medicinal chemistry data and their biological links, with the aim of accelerating the discovery of new medications. To maximize the usefulness of these resources, it is important that researchers in pertinent fields be fully aware of them and exploit their full potential.

Decades of worldwide growth in the pharmaceutical industry and in academic drug discovery efforts, along with technological advances that speed compound synthesis and assays1, and the advent and growth of the related fields of chemical biology and chemical genomics, have led to an ongoing flood of publications with valuable data regarding new compounds and their biological activities. On the order of 20,000 to 30,000 new compounds are now published per year in some of the main medicinal chemistry journals, and this rate has accelerated in recent years (as detailed below). However, publication in conventional journals traps data in a form that is inaccessible to computer search and retrieval. For example, it is not possible to search standard scientific articles for compounds of interest or to reliably extract machine-readable representations of compounds from chemical drawings in articles. As a consequence, the conventional publishing paradigm can severely restrict the discoverability and usability of medicinal chemistry data. The parallel growth of information technology and the emergence of the World Wide Web in the 1990s have created important new opportunities for dissemination of data.
Biologists, especially structural and molecular biologists, seized these opportunities, establishing central data resources like the Protein Data Bank2 and GenBank3 and laying the foundations for the field of bioinformatics. The first public protein-ligand database aimed at serving the drug discovery community, BindingDB, came online in late 2000. This resource has grown substantially and has since been joined by other important databases with related scopes and goals. According to Pathguide, a web resource for online databases, there are at least 43 protein-compound interaction databases4,5, and many other useful, freely available chemical databases now exist6. Such resources are of increasing value not only for basic uses like finding and downloading structure-activity relationship (SAR) data for a protein target of interest, but also for emergent applications that become possible as the medicinal chemistry dataset grows to provide a comprehensive picture of small molecules in the larger biological context. For example, if a cell-based screen reveals that a new compound inhibits apoptosis, then one might seek similar compounds that bind apoptosis-related proteins, and thus hypothesize that the new compound also binds one of these targets. Similarly, if one is prioritizing several lead compounds for further development, the observation that one lead is similar to a published compound known to bind a different target might lead one to reduce its priority, to minimize off-target effects. In another scenario, marking all the proteins in a defined signaling pathway according to which ones are already targeted by FDA-approved drugs might suggest a multidrug therapy to maximally suppress signaling.
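
The first scenario above, in which targets are hypothesized for a new compound from its similarity to annotated compounds, can be illustrated with a minimal sketch. It assumes that each compound has already been reduced to a set of fingerprint bit indices and uses the Tanimoto coefficient as the similarity measure. The compound names, fingerprints, compound-target pairings, the 0.5 threshold, and the function names `tanimoto` and `hypothesize_targets` are all hypothetical choices made for illustration; none of them is taken from the databases discussed here.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity between two sets of fingerprint bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def hypothesize_targets(
    query: FrozenSet[int],
    annotated: Dict[str, Tuple[FrozenSet[int], str]],
    threshold: float = 0.5,
) -> List[Tuple[str, str, float]]:
    """Return (target, compound, similarity) for annotated compounds whose
    similarity to the query meets the threshold, most similar first."""
    hits = []
    for name, (fingerprint, target) in annotated.items():
        score = tanimoto(query, fingerprint)
        if score >= threshold:
            hits.append((target, name, score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Toy data (hypothetical): compound name -> (fingerprint bits, annotated target)
annotated = {
    "cmpd_A": (frozenset({1, 2, 3, 4}), "caspase-3"),
    "cmpd_B": (frozenset({1, 2, 7, 8}), "Bcl-2"),
    "cmpd_C": (frozenset({10, 11, 12}), "thrombin"),
}
query = frozenset({1, 2, 3, 5})  # fingerprint of the new, unannotated compound

hits = hypothesize_targets(query, annotated)
for target, name, score in hits:
    print(f"{target}: similar to {name} (Tanimoto = {score:.2f})")
```

In practice the fingerprints would come from a cheminformatics toolkit and the annotations from a binding database, and any target suggested this way remains a hypothesis to be tested experimentally.
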
Here, we aim first to help medicinal chemists take advantage of the growing array of freely accessible medicinal chemistry-oriented databases by discussing three central resources focused on small molecule binding and bioactivity (BindingDB, ChEMBL, and PubChem) and by noting several other small molecule databases that are also of great value. (Readers interested in additional perspectives will enjoy other recent reviews7-12.) In particular, Section B seeks to help users over the initial barriers encountered when starting to use these rather complex resources, by summarizing their organization and the methods of accessing key types of data, information that is not always easy to glean from their respective websites. Subsequent sections then offer broader discussions of the field, and some readers may wish to jump directly to Section C, which uses the contents of these databases to derive informative overviews of the available medicinal chemistry data; or to Section D, which offers views towards the future of online compound databases and their applications, including the possibility of integrating related databases to minimize overlapping efforts, the challenge of getting data into databases where they can be most useful, and the role of medicinal chemistry databases in systems biology and systems pharmacology.

[1]  David Weininger,et al.  SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules , 1988, J. Chem. Inf. Comput. Sci..

[2]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[3]  M. Kanehisa A database for post-genome analysis. , 1997, Trends in genetics : TIG.

[4]  M. Kanehisa,et al.  Computation with the KEGG pathway database. , 1998, Bio Systems.

[5]  J. Janc,et al.  High-throughput screening of enzyme inhibitors: simultaneous determination of tight-binding inhibition constants and enzyme concentration. , 2000, Analytical biochemistry.

[6]  T. N. Bhat,et al.  The Protein Data Bank , 2000, Nucleic Acids Res..

[7]  Susumu Goto,et al.  KEGG: Kyoto Encyclopedia of Genes and Genomes , 2000, Nucleic Acids Res..

[8]  Ferenc Csizmadia JChem: Java Applets and Modules Supporting Chemical Database Handling from Web Browsers , 2000, J. Chem. Inf. Comput. Sci..

[9]  Darren V. S. Green,et al.  Prediction of Biological Activity for High-Throughput Screening Using Binary Kernel Discrimination , 2001, J. Chem. Inf. Comput. Sci..

[10]  B. Shoichet,et al.  A common mechanism underlying promiscuous inhibitors from virtual and high-throughput screening. , 2002, Journal of medicinal chemistry.

[11]  P. Shannon,et al.  Cytoscape: a software environment for integrated models of biomolecular interaction networks. , 2003, Genome research.

[12]  James M. Stevenson,et al.  Pipeline Pilot 2.1 By Scitegic, 9665 Chesapeake Drive, Suite 401, San Diego, CA 92123-1365. www.scitegic.com. See Web Site for Pricing Information , 2003 .

[13]  Haruki Nakamura,et al.  Announcing the worldwide Protein Data Bank , 2003, Nature Structural Biology.

[14]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[15]  R. Abagyan,et al.  Comprehensive identification of "druggable" protein ligand binding sites. , 2004, Genome informatics. International Conference on Genome Informatics.

[16]  Renxiao Wang,et al.  The PDBbind database: collection of binding affinities for protein-ligand complexes with known three-dimensional structures. , 2004, Journal of medicinal chemistry.

[17]  Robert N. Goldberg,et al.  Thermodynamics of enzyme-catalyzed reactions - a database for quantitative biochemistry , 2004, Bioinform..

[18]  Bertram Ludäscher,et al.  Kepler: an extensible system for design and execution of scientific workflows , 2004 .

[19]  E. Birney,et al.  Reactome: a knowledgebase of biological pathways , 2004, Nucleic Acids Research.

[20]  Cathy H. Wu,et al.  The Universal Protein Resource (UniProt) , 2005, Nucleic Acids Res..

[21]  Brian K. Shoichet,et al.  ZINC - A Free Database of Commercially Available Compounds for Virtual Screening , 2005, J. Chem. Inf. Model..

[22]  Michael G. Lerner,et al.  Binding MOAD (Mother Of All Databases) , 2005, Proteins.

[23]  Philip E. Bourne,et al.  Will a Biological Database Be Different from a Biological Journal? , 2005, PLoS Comput. Biol..

[24]  Michael K. Gilson,et al.  Virtual Screening of Molecular Databases Using a Support Vector Machine , 2005, J. Chem. Inf. Model..

[25]  Renxiao Wang,et al.  The PDBbind database: methodologies and updates. , 2005, Journal of medicinal chemistry.

[26]  Carole A. Goble,et al.  Taverna: a tool for building and running workflows of services , 2006, Nucleic Acids Res..

[27]  Lucia Bryant The Social Worker , 2006 .

[28]  T. Keller,et al.  A practical view of 'druggability'. , 2006, Current opinion in chemical biology.

[29]  Edward A. Lee,et al.  Taverna: Lessons in creating , 2022, Concurrency Computat.: Pract. Exper..

[30]  David S. Wishart,et al.  DrugBank: a comprehensive resource for in silico drug discovery and exploration , 2005, Nucleic Acids Res..

[31]  Gary D. Bader,et al.  Pathguide: a Pathway Resource List , 2005, Nucleic Acids Res..

[32]  Tom Halgren,et al.  New Method for Fast and Accurate Binding‐site Identification and Analysis , 2007, Chemical biology & drug design.

[33]  Alexander R. Pico,et al.  WikiPathways: Pathway Editing for the People , 2008, PLoS biology.

[34]  Robert B. Russell,et al.  SuperTarget and Matador: resources for exploring drug-target relationships , 2007, Nucleic Acids Res..

[35]  J A Peters,et al.  Guide to Receptors and Channels (GRAC), 3rd edition , 2008, British journal of pharmacology.

[36]  Yanli Wang,et al.  PubChem: Integrated Platform of Small Molecules and Biological Activities , 2008 .

[37]  Antony J Williams,et al.  Internet-based tools for communication and collaboration in chemistry. , 2008, Drug discovery today.

[38]  Anthony J Williams,et al.  Public chemical compound databases. , 2008, Current opinion in drug discovery & development.

[39]  David S. Wishart,et al.  DrugBank: a knowledgebase for drugs, drug actions and drug targets , 2007, Nucleic Acids Res..

[40]  Ruben Abagyan,et al.  New Method for the Assessment of All Drug-Like Pockets Across a Structural Genome , 2008, J. Comput. Biol..

[41]  Richard D. Smith,et al.  Binding MOAD, a high-quality protein–ligand database , 2007, Nucleic Acids Res..

[42]  Michael Darsow,et al.  ChEBI: a database and ontology for chemical entities of biological interest , 2007, Nucleic Acids Res..

[43]  Martin Jones,et al.  IUPHAR-DB: the IUPHAR database of G protein-coupled receptors and ion channels , 2008, Nucleic Acids Res..

[44]  Zhihai Liu,et al.  Comparative Assessment of Scoring Functions on a Diverse Test Set , 2009, J. Chem. Inf. Model..

[45]  Yanli Wang,et al.  PubChem: a public information system for analyzing bioactivities of small molecules , 2009, Nucleic Acids Res..

[46]  Ralf Herwig,et al.  ConsensusPathDB—a database for integrating human functional interaction networks , 2008, Nucleic Acids Res..

[47]  Kenneth H. Buetow,et al.  PID: the Pathway Interaction Database , 2008, Nucleic Acids Res..

[48]  Thorsten Meinl,et al.  KNIME - the Konstanz information miner: version 2.0 and beyond , 2009, SKDD.

[49]  James R. Brown,et al.  Thousands of chemical starting points for antimalarial lead identification , 2010, Nature.

[50]  R. Wade,et al.  Computational approaches to identifying and characterizing protein binding sites for ligand design , 2009, Journal of molecular recognition : JMR.

[51]  David S. Wishart,et al.  SMPDB: The Small Molecule Pathway Database , 2009, Nucleic Acids Res..

[52]  Renata C. Geer,et al.  The NCBI BioSystems database , 2009, Nucleic Acids Res..

[53]  John P. Overington,et al.  Role of open chemical data in aiding drug discovery and design. , 2010, Future medicinal chemistry.

[54]  Anang A. Shelat,et al.  Chemical genetics of Plasmodium falciparum , 2010, Nature.

[55]  S. Bryant,et al.  PubChem as a public resource for drug discovery. , 2010, Drug discovery today.

[56]  J. Bajorath,et al.  BindingDB and ChEMBL: online compound databases for drug discovery , 2011, Expert opinion on drug discovery.

[57]  Stephen P. H. Alexander,et al.  Guide to Receptors and Channels (GRAC), 5th edition , 2011, British journal of pharmacology.

[58]  Sameer Velankar,et al.  PDBe: Protein Data Bank in Europe , 2011, Nucleic Acids Res..

[59]  Rafael Gozalbes,et al.  Small molecule databases and chemical descriptors useful in chemoinformatics: an overview. , 2011, Combinatorial chemistry & high throughput screening.

[60]  Joanna L. Sharman,et al.  IUPHAR-DB: new receptors and tools for easy searching and visualization of pharmacological data , 2010, Nucleic Acids Res..

[61]  David S. Wishart,et al.  DrugBank 3.0: a comprehensive resource for ‘Omics’ research on drugs , 2010, Nucleic Acids Res..

[62]  D. Schrijvers,et al.  Pharmacological modulation of cell death in atherosclerosis: a promising approach towards plaque stabilization? , 2011, British journal of pharmacology.

[63]  Sean Ekins,et al.  A quality alert and call for improved curation of public chemistry databases. , 2011, Drug discovery today.

[64]  Yingyao Zhou,et al.  Imaging of Plasmodium Liver Stages to Drive Next-Generation Antimalarial Drug Discovery , 2011, Science.

[65]  Gary D Bader,et al.  PSICQUIC and PSISCORE: accessing and scoring molecular interactions , 2011, Nature Methods.

[66]  My farewell to the Journal of Medicinal Chemistry. , 2011, Journal of medicinal chemistry.

[67]  Narmada Thanki,et al.  CDD: a Conserved Domain Database for the functional annotation of proteins , 2010, Nucleic Acids Res..

[68]  Evan Bolton,et al.  PubChem's BioAssay Database , 2011, Nucleic Acids Res..

[69]  G. Georg,et al.  Transition in leadership: opportunities and challenges. , 2012, Journal of medicinal chemistry.

[70]  Yanli Wang,et al.  MMDB: 3D structures and macromolecular interactions , 2011, Nucleic Acids Res..

[71]  Shilpa Rani,et al.  IPAVS: Integrated Pathway Resources, Analysis and Visualization System , 2012, Nucleic Acids Res..

[72]  Sean Ekins,et al.  Towards a gold standard: regarding quality in public domain chemistry databases and approaches to improving the situation. , 2012, Drug discovery today.

[73]  Yang Song,et al.  Therapeutic target database update 2012: a resource for facilitating target-oriented drug discovery , 2011, Nucleic Acids Res..

[74]  John P. Overington,et al.  ChEMBL: a large-scale bioactivity database for drug discovery , 2011, Nucleic Acids Res..

[75]  Philip E. Bourne,et al.  SuperTarget goes quantitative: update on drug–target interactions , 2011, Nucleic Acids Res..

[76]  Akira R. Kinjo,et al.  Protein Data Bank Japan (PDBj): maintaining a structural data archive and resource description framework format , 2011, Nucleic Acids Res..