Knowledge-based annotation of small molecule binding sites in proteins

Background

The study of protein-small molecule interactions is vital for understanding protein function and for practical applications in drug discovery. To benefit from the rapidly growing body of structural data, it is essential to improve the tools that enable large-scale binding site prediction, with greater emphasis on the biological validity of the predicted sites.

Results

We have developed a new method for annotating protein-small molecule binding sites using inference by homology, which allows annotation to be extended to protein sequences for which no experimental data are available. To ensure the biological relevance of binding sites, our method clusters similar binding sites found in homologous protein structures on the basis of their sequence and structure conservation. Binding sites that appear evolutionarily conserved across non-redundant sets of homologous proteins are given higher priority. After the binding sites are clustered, position-specific score matrices (PSSMs) are constructed from the corresponding binding site alignments. Together with other measures, the PSSMs are then used to rank binding sites, both to assess how well each site matches the query and to better gauge its biological relevance. The method also provides a succinct and informative representation of observed binding sites and of sites inferred from homologs with known three-dimensional structures, thereby providing the means to analyze the conservation and diversity of binding modes. Furthermore, the chemical properties of small molecules bound to the inferred binding sites can serve as a starting point for small molecule virtual screening. The method was validated by comparison with other binding site prediction methods and with a collection of manually curated binding site annotations.
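The PSSM construction and ranking step described in the Results can be illustrated with a minimal sketch. It assumes that each cluster's binding site alignment is available as equal-length strings of aligned binding-site residues, one string per homolog, and it scores residues as log2-odds against a uniform background with a pseudocount of 1. The function names `build_pssm` and `score_site`, the uniform background, the pseudocount, and the toy residues are hypothetical choices for illustration, not the parameters of the published method. A candidate site is scored by summing the column scores of its residues, and candidates are ranked by that score.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(
    site_alignment: Sequence[str],
    pseudocount: float = 1.0,
    background: Optional[Dict[str, float]] = None,
) -> List[Dict[str, float]]:
    """Build a log2-odds PSSM from aligned binding-site residues (hypothetical helper).

    Each string holds one homolog's binding-site residues, aligned column by
    column. Gap or unknown characters are not counted.
    """
    if not site_alignment:
        raise ValueError("alignment must contain at least one site")
    length = len(site_alignment[0])
    if any(len(site) != length for site in site_alignment):
        raise ValueError("all aligned sites must have the same length")
    if background is None:
        # Assumption: uniform background frequencies over the 20 amino acids.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

    pssm = []
    for col in range(length):
        counts = Counter(site[col] for site in site_alignment if site[col] in background)
        # A pseudocount for every residue type keeps unseen residues at a finite score.
        total = sum(counts.values()) + pseudocount * len(background)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / freq)
            for aa, freq in background.items()
        })
    return pssm


def score_site(pssm: List[Dict[str, float]], site_residues: str) -> float:
    """Sum the PSSM column scores of a candidate site's residues (gaps score 0)."""
    if len(site_residues) != len(pssm):
        raise ValueError("site length must match the PSSM length")
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, site_residues))


# Toy binding-site alignment from four hypothetical homologs.
alignment = ["HDGH", "HDGH", "HEGH", "HDGY"]
pssm = build_pssm(alignment)
candidates = {"siteA": "HDGH", "siteB": "KLPW"}
ranked = sorted(candidates, key=lambda name: score_site(pssm, candidates[name]), reverse=True)
print(ranked)  # ['siteA', 'siteB']
```

Only the PSSM score is used here; the method described above combines it with other measures, which this sketch does not attempt to reproduce.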
We show that our method achieves a sensitivity of 72% in predicting biologically relevant binding sites and can accurately discriminate sites that bind biological small molecules from those that bind non-biological ones.

Conclusions

A new algorithm has been developed to predict binding sites with high accuracy in terms of their biological validity. It also provides a common platform for function prediction, knowledge-based docking, and small molecule virtual screening. The method can be applied even to a query sequence without a known structure. The method is available at http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.cgi.
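The reported sensitivity can be read with the standard definition TP / (TP + FN): the fraction of reference (for example, manually curated) binding sites that the method recovers. The sketch below assumes that predicted and reference sites can be matched by identifier, which is a simplification; how predicted and curated sites are matched in the actual validation is not specified here. The function name `sensitivity` and the toy site labels are hypothetical, and the toy numbers are not data from the study.

```python
from typing import Hashable, Set


def sensitivity(predicted: Set[Hashable], reference: Set[Hashable]) -> float:
    """Return TP / (TP + FN): the fraction of reference sites that were predicted.

    Sites are matched by identifier (a simplifying assumption); predicted
    sites absent from the reference set are false positives and do not
    affect sensitivity.
    """
    if not reference:
        raise ValueError("reference set must not be empty")
    true_positives = len(predicted & reference)
    false_negatives = len(reference - predicted)
    return true_positives / (true_positives + false_negatives)


# Toy example: 3 of 4 curated sites recovered, plus one spurious prediction.
curated = {"site1", "site2", "site3", "site4"}
predicted = {"site1", "site2", "site3", "site9"}
print(sensitivity(predicted, curated))  # 0.75
```

Sensitivity ignores false positives, so on its own it says nothing about how well biological and non-biological binding sites are discriminated; that is a separate measurement.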