CREDO: A Protein–Ligand Interaction Database for Drug Discovery

Harnessing data from the growing number of protein–ligand complexes in the Protein Data Bank is an important task in drug discovery. To benefit from the abundance of three‐dimensional structures, structural data must be integrated with sequence and chemical data, and protein–small molecule interactions must be characterized structurally at the inter‐atomic level. In this study, we present CREDO, a new publicly available database of protein–ligand interactions, which represents contacts as structural interaction fingerprints, implements novel features and is completely scriptable through its application programming interface. Features of CREDO include molecular shape descriptors based on ultrafast shape recognition, fragmentation of ligands in the Protein Data Bank, sequence‐to‐structure mapping and the identification of approved drugs. Selected analyses of these key features are presented to highlight a range of potential applications of CREDO. The CREDO dataset has been released into the public domain together with the application programming interface under a Creative Commons license at http://www-cryst.bioc.cam.ac.uk/credo. We believe that the free availability and numerous features of the CREDO database will be useful not only for commercial but also for academia‐driven drug discovery programmes.
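To make the shape descriptors mentioned above more concrete, the following is a minimal sketch of ultrafast shape recognition (USR) as it is commonly described: twelve numbers summarising the distributions of atomic distances to four reference points. It is not CREDO's implementation. The function names (`usr_descriptor`, `usr_similarity`) are hypothetical, and the choice of mean, standard deviation and signed cube root of the third central moment is one common convention assumed here; other implementations use slightly different moment definitions.

```python
import math


def _moments(dists):
    """Mean, standard deviation and signed cube root of the third central moment."""
    n = len(dists)
    mean = sum(dists) / n
    var = sum((d - mean) ** 2 for d in dists) / n
    third = sum((d - mean) ** 3 for d in dists) / n
    # Signed cube root keeps the third-moment term in distance units.
    cbrt = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return [mean, math.sqrt(var), cbrt]


def usr_descriptor(coords):
    """Return a 12-element USR-style descriptor for a list of (x, y, z) atoms."""
    n = len(coords)
    # Reference point 1: molecular centroid (ctd).
    ctd = tuple(sum(c[i] for c in coords) / n for i in range(3))
    d_ctd = [math.dist(c, ctd) for c in coords]
    # Reference points 2 and 3: atoms closest to (cst) and farthest from (fct) the centroid.
    cst = coords[d_ctd.index(min(d_ctd))]
    fct = coords[d_ctd.index(max(d_ctd))]
    # Reference point 4: atom farthest from fct (ftf).
    d_fct = [math.dist(c, fct) for c in coords]
    ftf = coords[d_fct.index(max(d_fct))]

    desc = []
    for ref in (ctd, cst, fct, ftf):
        desc.extend(_moments([math.dist(c, ref) for c in coords]))
    return desc


def usr_similarity(a, b):
    """Inverse of (1 + mean absolute difference); 1.0 means identical descriptors."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))


if __name__ == "__main__":
    # Toy coordinates for illustration only, not a real ligand.
    atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0),
             (0.0, 1.2, 0.8), (2.9, 0.3, 1.1)]
    print(usr_descriptor(atoms))
```

Because the descriptor depends only on interatomic distances to internally defined reference points, it is unchanged by translating or rotating the molecule, so two conformers can be compared without first superimposing them.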
