Analysis and prediction of functionally important sites in proteins

The rapidly increasing volume of protein sequence and structure data makes it a daunting task to determine which sites in these proteins are functionally important. Computational methods can help characterize the biochemical and evolutionary information contained in this wealth of data, particularly at functionally important sites. We therefore performed a detailed survey of compositional and evolutionary constraints, at the level of both molecular and biological function, for a large set of known functionally important sites extracted from a wide range of protein families. We compare the degree of conservation across functional categories and provide a detailed statistical analysis of how evolutionary constraints vary among functionally important sites. The compositional and evolutionary information at these sites has been compiled into a library of functional templates. We developed a module that predicts functionally important columns (FICs) of an alignment by detecting a significant “template match score” to a library template. The template match score measures an alignment column's similarity to a library template; it combines a term that explicitly represents the column's residue composition with several evolutionary conservation scores (information content and statistics derived from position-specific scoring matrices). Our benchmarking studies show good sensitivity and specificity for the prediction of functional sites, and high accuracy in assigning the correct molecular function type to the predicted sites. The method relies only on information derived from homologous sequences and requires no structural information, so it could be extremely useful for large-scale functional annotation.
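
As an illustration of the kind of quantities a template match score combines, the following minimal Python sketch computes the information content of an alignment column and a composition similarity to a template. It is not the scoring function used in this work: the uniform background, the cosine similarity, the equal weighting, and every name in it (`column_frequencies`, `toy_match_score`, the example template) are assumptions made for illustration only, and the PSSM-derived terms are omitted.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(column):
    """Residue frequencies for one alignment column; gaps and unknown symbols are skipped."""
    counts = Counter(res for res in column.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def information_content(freqs):
    """Information content in bits against a uniform background: log2(20) - entropy."""
    if sum(freqs.values()) == 0:
        return 0.0  # a column with no residues carries no information
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return math.log2(len(AMINO_ACIDS)) - entropy


def composition_similarity(freqs, template):
    """Cosine similarity between the column and template frequency vectors."""
    dot = sum(freqs[aa] * template.get(aa, 0.0) for aa in AMINO_ACIDS)
    norm_col = math.sqrt(sum(freqs[aa] ** 2 for aa in AMINO_ACIDS))
    norm_tpl = math.sqrt(sum(template.get(aa, 0.0) ** 2 for aa in AMINO_ACIDS))
    if norm_col == 0 or norm_tpl == 0:
        return 0.0
    return dot / (norm_col * norm_tpl)


def toy_match_score(column, template, weight=0.5):
    """Hypothetical combination: weighted sum of composition similarity and scaled conservation."""
    freqs = column_frequencies(column)
    scaled_ic = information_content(freqs) / math.log2(len(AMINO_ACIDS))
    return weight * composition_similarity(freqs, template) + (1 - weight) * scaled_ic


# Made-up template resembling a metal-binding position dominated by His and Cys.
example_template = {"H": 0.9, "C": 0.1}
conserved_score = toy_match_score("HHHHHHHH", example_template)
variable_score = toy_match_score("ACDEFGHIKLMNPQRSTVWY", example_template)
```

In this toy setting a fully conserved histidine column scores close to 1 against the example template, while a column containing every residue once scores much lower. Deciding whether a score is significant would require a background distribution of scores, which this sketch does not model.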
