A survey on prediction of specificity-determining sites in proteins

Specificity-determining sites (SDS) are key positions in a protein family that show subfamily-specific conservation of amino acids. SDS play a crucial role in the development of functional variation within a protein family during the course of evolution. Identifying SDS is therefore important for understanding how biological functions diversify within a protein family. A wide range of computational tools has been designed to detect SDS. In this review, we examine the concept of SDS in more detail, along with the advances and drawbacks of the computational approaches designed to predict them. We further discuss the algorithms behind the approaches developed to date and provide an exhaustive comparison of the performance of each method. We also introduce a new ensemble approach, SubSite, as another tool for predicting SDS, available through a user-friendly web server at www.hpppi.iicb.res.in/subsite.
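To make the notion of subfamily-specific conservation concrete, the following is a minimal sketch and not the SubSite algorithm or any specific surveyed method. It assumes an ungapped toy alignment with known subfamily labels, and scores each column by the mutual information between residue identity and subfamily label, which is one simple score of this kind. The function names `entropy` and `column_specificity` and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def column_specificity(alignment, labels):
    """Score each alignment column by the mutual information between
    residue identity and subfamily label: H(residue) - H(residue | subfamily).

    A column scores high when residues are conserved within each
    subfamily but differ between subfamilies.
    """
    n = len(alignment)
    scores = []
    for column in zip(*alignment):  # iterate over alignment columns
        h_all = entropy(column)
        h_within = 0.0
        for label in set(labels):
            members = [res for res, lab in zip(column, labels) if lab == label]
            h_within += len(members) / n * entropy(members)
        scores.append(h_all - h_within)
    return scores


# Hypothetical toy alignment: four sequences, two assumed subfamilies.
alignment = ["AKG", "AKG", "ARG", "ARG"]
labels = ["sub1", "sub1", "sub2", "sub2"]
scores = column_specificity(alignment, labels)
# Only the middle column (K in sub1, R in sub2) separates the subfamilies.
best_column = max(range(len(scores)), key=scores.__getitem__)
```

In this sketch a fully conserved column scores zero, because it carries no information about subfamily membership. Real alignments would also need handling for gaps, small subfamilies, and phylogenetic bias.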
