Disentangling evolutionary signals: conservation, specificity determining positions and coevolution. Implication for catalytic residue prediction

Background
A large panel of methods aims to identify residues with a critical impact on protein function based on evolutionary signals, sequence information, and structure information. However, it is not clear to what extent these methods overlap, or whether any of them has higher predictive potential than the others, in particular for the identification of catalytic residues (CR) in proteins. Using a large set of enzymatic protein families and measures based on different evolutionary signals, we sought to disentangle the components of the information content within a multiple sequence alignment and to investigate their predictive potential and degree of overlap.

Results
Our results show that the methods included in the benchmark can in general be divided into three groups with limited mutual overlap: one containing the real-value Evolutionary Trace (rvET) methods and conservation; another containing the mutual information (MI) methods; and a third containing methods designed explicitly to identify specificity determining positions (SDPs), namely integer-value Evolutionary Trace (ivET), SDPfox, and XDET. For CR prediction, using a proximity score that integrates structural information (the sum of the scores of the residues located within a given distance of the residue in question), we find that only the methods from the first two groups perform reliably. Next, we investigated to what degree the proximity scores for conservation, rvET, and cumulative MI (cMI) provide complementary information that can improve CR identification. Integrating conservation with the proximity scores for rvET and cMI achieved the highest performance. The proximity conservation score contained no complementary information when integrated with proximity rvET. Moreover, the signal from rvET provided only a limited gain in predictive performance when integrated with the MI and conservation proximity scores. Combined, these observations show that the rvET and cMI scores add complementary information to the prediction system.

Conclusions
This work contributes to the understanding of the different signals of evolution and shows that the detection of catalytic residues can be improved by integrating structural and higher-order sequence evolutionary information with sequence conservation.
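As a rough illustration of the proximity score described in the Results (the sum of the scores of the residues located within a given distance of the residue in question), the Python sketch below computes it from per-residue scores and 3D coordinates. It is not the authors' implementation: representing each residue by a single point (for example its Cα atom), the 6.0 Å default cutoff, the function name `proximity_scores`, and the `include_self` switch are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: one (x, y, z) point per residue, in angstroms.
Coord = Tuple[float, float, float]


def proximity_scores(
    scores: Sequence[float],
    coords: Sequence[Coord],
    cutoff: float = 6.0,
    include_self: bool = True,
) -> List[float]:
    """Return, for each residue, the sum of the scores of residues within `cutoff`.

    Assumptions (not taken from the source): each residue is a single point
    (e.g. its C-alpha atom), distances are Euclidean, the 6.0 A default cutoff
    is illustrative only, and `include_self` controls whether a residue's own
    score counts toward its proximity score.
    """
    if len(scores) != len(coords):
        raise ValueError("scores and coords must have the same length")

    n = len(scores)
    result: List[float] = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i == j and not include_self:
                continue  # optionally skip the residue itself
            # Add neighbour j's score if it lies within the distance cutoff.
            if math.dist(coords[i], coords[j]) <= cutoff:
                total += scores[j]
        result.append(total)
    return result


if __name__ == "__main__":
    # Three residues on a line at x = 0, 4 and 9 A; only adjacent pairs are within 6 A.
    demo_coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
    print(proximity_scores([1.0, 2.0, 3.0], demo_coords))  # [3.0, 6.0, 5.0]
```

The per-residue input could be any of the scores discussed above (conservation, rvET, or cMI). The quadratic loop is adequate for a single protein; for very large structures a spatial index such as a k-d tree would avoid comparing every pair of residues.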
