Combining specificity determining and conserved residues improves functional site prediction

Background: Predicting the location of functionally important sites from protein sequence and/or structure is a long-standing problem in computational biology. Most current approaches rely on sequence conservation, assuming that amino acid residues conserved within a protein family are the most likely to be functionally important. These approaches often overlook residues that define specific sub-functions within a family, and they make no distinction between residues important for function and those more relevant for maintaining structure (e.g. in the hydrophobic core). Many protein families bind and/or act on a variety of ligands, so conserved residues often bind only a common ligand sub-structure or perform only general catalytic activities.

Results: Here we present a novel method for functional site prediction, SDPsite, based on the identification of conserved positions together with positions responsible for determining ligand specificity. We define Specificity-Determining Positions (SDPs) as positions occupied by residues that are conserved within sub-groups of proteins in a family sharing a common specificity but that differ between sub-groups, and are thus likely to account for specific recognition events. We benchmark the approach on enzyme families of known 3D structure with bound substrates, and find that in nearly all families the residues predicted by SDPsite are in contact with the bound substrate, and that the addition of SDPs significantly improves functional site prediction accuracy. We apply SDPsite to various families of proteins that contain known three-dimensional structures but lack clear functional annotations, and discuss several illustrative examples.

Conclusion: The results suggest a better means to predict functional details for the thousands of protein structures determined prior to a clear understanding of molecular function.
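The distinction between conserved positions and SDPs can be illustrated with a toy sketch. The code below is not the SDPsite scoring procedure, which the abstract does not describe; it only applies the verbal definition literally to alignment columns. The function name `classify_columns`, the `min_identity` threshold, and the example sequences are all hypothetical and chosen for illustration.

```python
from collections import Counter


def classify_columns(groups, min_identity=1.0):
    """Label each alignment column as 'conserved', 'sdp', or 'variable'.

    groups: dict mapping a specificity-group name to a list of aligned
            sequences (all sequences must have the same length).
    min_identity: assumed threshold; fraction of a group that must share
            the most common residue for the column to count as conserved
            within that group.
    """
    length = len(next(iter(groups.values()))[0])
    labels = []
    for i in range(length):
        consensus = []
        conserved_within = True
        for seqs in groups.values():
            column = [s[i] for s in seqs]
            residue, count = Counter(column).most_common(1)[0]
            # A gap consensus or a mixed column is not conserved in this group.
            if residue == "-" or count / len(column) < min_identity:
                conserved_within = False
                break
            consensus.append(residue)
        if not conserved_within:
            labels.append("variable")
        elif len(set(consensus)) == 1:
            # Same residue in every group: conserved across the family.
            labels.append("conserved")
        else:
            # Conserved inside each group, different between groups.
            labels.append("sdp")
    return labels


# Made-up sequences for two hypothetical specificity groups.
toy_groups = {
    "group_A": ["GKDLA", "GKDIA", "GKDVA"],
    "group_B": ["GREMA", "GREFA", "GRELA"],
}
toy_labels = classify_columns(toy_groups)
print(toy_labels)
```

In this toy alignment the first and last columns are identical across the whole family, the second and third are fixed within each group but differ between the groups, and the fourth varies inside each group. A real method would need a statistical score and a significance cutoff rather than an exact-identity rule.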
