Computational methods for identification of functional residues in protein structures.

The recent accumulation of experimentally determined protein 3D structures, combined with our ability to computationally model structure from amino acid sequence, has increased the importance of structure-based methods for protein function prediction. Two types of methods for function prediction have been proposed: those that accurately predict the overall biochemical or biological roles of a protein and those that predict its functional residues. Here, we review approaches used for the computational identification of functional residues in protein structures and summarize their applications to a wide variety of problems in functional proteomics, such as the prediction of catalytic residues, post-translational modification sites, or nucleic acid-binding sites. We compare several recently proposed methods on four different problems and conclude by identifying limitations and future challenges in this field.
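Methods that predict functional residues work at the level of individual residues, so a structure must first be turned into per-residue inputs. The following is a minimal sketch of one common way to do this, assuming only C-alpha coordinates are available. Each residue's spatial neighborhood is taken as the residues within a cutoff radius, and the amino-acid composition of that neighborhood becomes a feature vector that a residue-level classifier (for catalytic residues, say) could consume. The function names and the 8 Å default cutoff are illustrative assumptions, not details taken from this review.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def neighborhoods(ca_coords: Sequence[Coord], radius: float = 8.0) -> List[List[int]]:
    """For each residue, list the indices of other residues whose C-alpha
    atom lies within `radius` angstroms (hypothetical helper)."""
    n = len(ca_coords)
    result: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= radius:
                result[i].append(j)
                result[j].append(i)
    return result


def contact_counts(ca_coords: Sequence[Coord], radius: float = 8.0) -> List[int]:
    """Number of spatial neighbors per residue, a simple burial-like feature."""
    return [len(nb) for nb in neighborhoods(ca_coords, radius)]


def neighborhood_composition(sequence: str, ca_coords: Sequence[Coord],
                             radius: float = 8.0) -> List[List[int]]:
    """Per-residue vector of amino-acid counts over the residue itself plus
    its spatial neighbors; nonstandard residue codes are skipped."""
    if len(sequence) != len(ca_coords):
        raise ValueError("sequence and coordinates must have the same length")
    features = []
    for i, nb in enumerate(neighborhoods(ca_coords, radius)):
        counts = [0] * len(AMINO_ACIDS)
        for k in [i, *nb]:
            aa = sequence[k]
            if aa in AMINO_ACIDS:
                counts[AMINO_ACIDS.index(aa)] += 1
        features.append(counts)
    return features


if __name__ == "__main__":
    # Toy three-residue "structure" laid out along the x axis.
    coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
    print(neighborhoods(coords))   # residues 0 and 1 are 5 Å apart
    print(contact_counts(coords))
```

The all-pairs loop keeps the sketch dependency-free. For full-size structures, a spatial index such as a k-d tree would be the usual way to find neighbors, and real predictors typically add richer descriptors (conservation, electrostatics, side-chain geometry) on top of such neighborhood features.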