FINDSITE: a combined evolution/structure-based approach to protein function prediction

A key challenge of the post-genomic era is the identification of the function(s) of all the molecules in a given organism. Here, we review the status of sequence- and structure-based approaches to protein function inference and ligand screening, which can provide functional insights for a significant fraction of the approximately 50% of ORFs of unassigned function in an average proteome. We then describe FINDSITE, a recently developed algorithm for ligand binding site prediction, ligand screening, and molecular function prediction that is based on binding site conservation across evolutionarily distant proteins identified by threading. Importantly, FINDSITE gives comparable results whether high-resolution experimental structures or predicted protein models are used.


[78]  M. Gerstein,et al.  Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores. , 2000, Journal of molecular biology.

[79]  E. Koonin,et al.  The ancient Virus World and evolution of cells , 2006, Biology Direct.

[80]  Robert B. Russell,et al.  Annotation in three dimensions. PINTS: Patterns in Non-homologous Tertiary Structures , 2003, Nucleic Acids Res..

[81]  David S. Goodsell,et al.  Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function , 1998, J. Comput. Chem..

[82]  R. Russell,et al.  Detection of protein three-dimensional side-chain patterns: new examples of convergent evolution. , 1998, Journal of molecular biology.

[83]  B. Rost,et al.  Automatic prediction of protein function , 2003, Cellular and Molecular Life Sciences CMLS.

[84]  J. Skolnick,et al.  How well is enzyme function conserved as a function of pairwise sequence identity? , 2003, Journal of molecular biology.

[85]  Cheryl H Arrowsmith,et al.  Solution NMR in structural genomics. , 2006, Current opinion in structural biology.

[86]  I. Vakser Protein docking for low-resolution structures. , 1995, Protein engineering.

[87]  Darren A. Natale,et al.  The COG database: an updated version includes eukaryotes , 2003, BMC Bioinformatics.

[88]  D. J. Price,et al.  Assessing scoring functions for protein-ligand interactions. , 2004, Journal of medicinal chemistry.

[89]  S. Brenner A tour of structural genomics , 2001, Nature Reviews Genetics.

[90]  Todd J. A. Ewing,et al.  DOCK 4.0: Search strategies for automated molecular docking of flexible molecule databases , 2001, J. Comput. Aided Mol. Des..

[91]  Minoru Kanehisa,et al.  Using protein motif combinations to update KEGG pathway maps and orthologue tables. , 2004, Genome informatics. International Conference on Genome Informatics.

[92]  J. Skolnick,et al.  Ab initio modeling of small proteins by iterative TASSER simulations , 2007, BMC Biology.

[93]  Ivano Bertini Structural genomics. , 2003, Accounts of chemical research.

[94]  Jacquelyn S. Fetrow,et al.  Structural genomics and its importance for gene function analysis , 2000, Nature Biotechnology.

[95]  Patricia C. Babbitt,et al.  Evolutionarily Conserved Substrate Substructures for Automated Annotation of Enzyme Superfamilies , 2008, PLoS Comput. Biol..

[96]  Peter D Karp,et al.  The past, present and future of genome-wide re-annotation , 2002, Genome Biology.

[97]  J. Thornton,et al.  Tess: A geometric hashing algorithm for deriving 3D coordinate templates for searching structural databases. Application to enzyme active sites , 1997, Protein science : a publication of the Protein Society.

[98]  I. Enyedy,et al.  Discovery of small-molecule inhibitors of Bcl-2 through structure-based computer screening. , 2001, Journal of medicinal chemistry.

[99]  W Patrick Walters,et al.  A detailed comparison of current docking and scoring methods on systems of pharmaceutical relevance , 2004, Proteins.

[100]  J. Skolnick,et al.  Automated structure prediction of weakly homologous proteins on a genomic scale. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[101]  Yang Zhang,et al.  I-TASSER server for protein 3D structure prediction , 2008, BMC Bioinformatics.

[102]  Cathy H. Wu,et al.  The Universal Protein Resource (UniProt): an expanding universe of protein information , 2005, Nucleic Acids Res..

[103]  C. Frömmel,et al.  The automatic search for ligand binding sites in proteins of known three-dimensional structure using only geometric criteria. , 1996, Journal of molecular biology.

[104]  Robert D. Finn,et al.  Pfam 10 years on: 10 000 families and still growing , 2008, Briefings Bioinform..

[105]  Lydia E. Kavraki,et al.  Prediction of enzyme function based on 3D templates of evolutionarily important amino acids , 2008, BMC Bioinformatics.

[106]  David T. Jones,et al.  Threading methods for protein structure prediction , 2000 .

[107]  Yang Zhang,et al.  Tertiary structure predictions on a comprehensive benchmark of medium to large size proteins. , 2004, Biophysical journal.

[108]  S. Teague Implications of protein flexibility for drug discovery , 2003, Nature Reviews Drug Discovery.

[109]  Randy J Read,et al.  Automated server predictions in CASP7 , 2007, Proteins.

[110]  M. Schroeder,et al.  LIGSITEcsc: predicting ligand binding sites using the Connolly surface and degree of conservation , 2006, BMC Structural Biology.

[111]  Jacquelyn S Fetrow,et al.  Function first: a powerful approach to post-genomic drug discovery. , 2002, Drug discovery today.

[112]  Jeffrey Skolnick,et al.  Assessment of programs for ligand binding affinity prediction , 2008, J. Comput. Chem..

[113]  Neil Hall,et al.  Advanced sequencing technologies and their wider impact in microbiology , 2007, Journal of Experimental Biology.

[114]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[115]  E. Huang,et al.  Are predicted structures good enough to preserve functional sites? , 1999, Structure.

[116]  P. Babbitt Definitions of enzyme function for the structural genomics era. , 2003, Current opinion in chemical biology.

[117]  M. Sternberg,et al.  Automated prediction of protein function and detection of functional sites from structure. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[118]  Andreas Martin Lisewski,et al.  De-Orphaning the Structural Proteome through Reciprocal Comparison of Evolutionarily Important Structural Features , 2008, PloS one.

[119]  Tim J. P. Hubbard,et al.  Data growth and its impact on the SCOP database: new developments , 2007, Nucleic Acids Res..

[120]  Lars Malmström,et al.  Structure prediction for CASP7 targets using extensive all‐atom refinement with Rosetta@home , 2007, Proteins.

[121]  A G Murzin,et al.  SCOP: a structural classification of proteins database for the investigation of sequences and structures. , 1995, Journal of molecular biology.

[122]  Eckart Bindewald,et al.  A scoring function for docking ligands to low‐resolution protein structures , 2005, J. Comput. Chem..

[123]  Jeffrey Skolnick,et al.  DBD-Hunter: a knowledge-based method for the prediction of DNA–protein interactions , 2008, Nucleic acids research.

[124]  Randy J Read,et al.  Assessment of CASP7 predictions in the high accuracy template‐based modeling category , 2007, Proteins.

[125]  Didier Rognan,et al.  Comparative evaluation of eight docking tools for docking and virtual screening accuracy , 2004, Proteins.

[126]  Robert D. Finn,et al.  Pfam: clans, web tools and services , 2005, Nucleic Acids Res..

[127]  G J Kleywegt,et al.  Recognition of spatial motifs in protein structures. , 1999, Journal of molecular biology.

[128]  R. Fleischmann,et al.  The Minimal Gene Complement of Mycoplasma genitalium , 1995, Science.

[129]  Richard A. Lewis,et al.  Lessons in molecular recognition: the effects of ligand and protein flexibility on molecular docking accuracy. , 2004, Journal of medicinal chemistry.

[130]  W. Fitch Homology a personal view on some of the problems. , 2000, Trends in genetics : TIG.

[131]  P. Bork,et al.  Predicting functions from protein sequences—where are the bottlenecks? , 1998, Nature Genetics.

[132]  Michal Brylinski,et al.  Q‐Dock: Low‐resolution flexible ligand docking with pocket‐specific threading restraints , 2008, J. Comput. Chem..

[133]  Timothy B. Stockwell,et al.  The Sequence of the Human Genome , 2001, Science.

[134]  Irena Roterman-Konieczna,et al.  Sequence-Structure-Function Relation Characterized in silico , 2006, Silico Biol..

[135]  Seung Yup Lee,et al.  Analysis of TASSER‐based CASP7 protein structure prediction results , 2007, Proteins.

[136]  Thomas Hamelryck,et al.  Efficient identification of side‐chain patterns using a multidimensional index tree , 2003, Proteins.

[137]  Andrew A. Chien,et al.  Study of a highly accurate and fast protein–ligand docking method based on molecular dynamics , 2005, Concurr. Comput. Pract. Exp..

[138]  Dmitrij Frishman,et al.  The PEDANT genome database in 2005 , 2004, Nucleic Acids Res..

[139]  Dora M Schnur Recent trends in library design: 'rational design' revisited. , 2008, Current opinion in drug discovery & development.

[140]  J. Scott Dixon,et al.  Flexible ligand docking using a genetic algorithm , 1995, J. Comput. Aided Mol. Des..

[141]  Rachel Kolodny,et al.  Comprehensive evaluation of protein structure alignment methods: scoring by geometric measures. , 2005, Journal of molecular biology.

[142]  Leroy Hood,et al.  The impact of systems approaches on biological problems in drug discovery , 2004, Nature Biotechnology.

[143]  Michael E Phelps,et al.  Systems Biology and New Technologies Enable Predictive and Preventative Medicine , 2004, Science.

[144]  A. Lesk,et al.  The relation between the divergence of sequence and structure in proteins. , 1986, The EMBO journal.

[145]  J. Skolnick,et al.  A threading-based method (FINDSITE) for ligand-binding site prediction and functional annotation , 2008, Proceedings of the National Academy of Sciences.

[146]  Dmitrij Frishman,et al.  The PEDANT genome database , 2003, Nucleic Acids Res..

[147]  D. Eisenberg,et al.  Inference of protein function from protein structure. , 2005, Structure.

[148]  Marek Wojciechowski,et al.  Docking of small ligands to low‐resolution and theoretically predicted receptor structures , 2002, J. Comput. Chem..

[149]  Ivan Rayment,et al.  Divergent evolution in the enolase superfamily: the interplay of mechanism and specificity. , 2005, Archives of biochemistry and biophysics.

[150]  Michael J E Sternberg,et al.  The proteome: structure, function and evolution , 2006, Philosophical Transactions of the Royal Society B: Biological Sciences.

[151]  Janet M Thornton,et al.  Using electrostatic potentials to predict DNA-binding sites on DNA-binding proteins. , 2003, Nucleic acids research.

[152]  Daisuke Kihara,et al.  Microbial genomes have over 72% structure assignment by the threading algorithm PROSPECTOR_Q , 2004, Proteins.

[153]  Shiow-Fen Hwang,et al.  SODOCK: Swarm optimization for highly flexible protein–ligand docking , 2007, J. Comput. Chem..

[154]  Krzysztof Fidelis,et al.  Progress from CASP6 to CASP7 , 2007, Proteins.

[155]  J. Skolnick,et al.  TM-align: a protein structure alignment algorithm based on the TM-score , 2005, Nucleic acids research.

[156]  Janet M. Thornton,et al.  ProFunc: a server for predicting protein function from 3D structure , 2005, Nucleic Acids Res..

[157]  John-Marc Chandonia,et al.  Structural proteomics of minimal organisms: Conservation of protein fold usage and evolutionary implications , 2006, BMC Structural Biology.