Prediction of Functional Sites in Proteins by Evolutionary Methods

Functional sites are well-defined regions of a protein that are relevant to its function and that contain characteristic groups of amino acids. These regions may mediate interactions between proteins and other molecules, such as other proteins, nucleic acids, small ligands, and substrates. Interaction sites have been studied in great detail in representative protein families: their relationship with natural substrates and drugs has been characterized, as has their role in the formation of protein complexes. In many cases they have also been studied for their potential in engineering protein activity. At a more general level, protein binding sites have been characterized in terms of their typical structure and their general residue preferences. However, it is the relationship between the conservation of sequence features and protein active sites and binding sites that forms the basis for the development of prediction methods. In particular, the conservation of the chemical characteristics of amino acids within specific groups of sequences, in the context of large protein families, is exploited by a growing collection of methods aimed at predicting protein binding sites at a genomic scale. In this review we analyze these methods, discuss their similarities, and describe a number of key unsolved problems.
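As a rough illustration of the two conservation signals mentioned above (positions conserved across a whole family, and positions conserved only within specific groups of sequences), the following minimal sketch labels the columns of a toy alignment. The alignment, the group labels, and the function names `column_entropy` and `classify_columns` are hypothetical and chosen only for illustration. Shannon entropy is used here as one common conservation score; it is not necessarily the measure used by any particular method covered in the review.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def classify_columns(alignment, groups):
    """Label each column as 'conserved', 'group-specific', or 'variable'.

    alignment: list of equal-length aligned sequences (hypothetical toy data).
    groups: one subfamily label per sequence, in the same order.
    """
    labels = []
    for i in range(len(alignment[0])):
        column = [seq[i] for seq in alignment]
        # Same residue in every sequence: conserved across the whole family.
        if len(set(column)) == 1:
            labels.append("conserved")
            continue
        # Collect the residues seen in each subfamily at this position.
        by_group = {}
        for residue, group in zip(column, groups):
            by_group.setdefault(group, set()).add(residue)
        # Invariant inside every subfamily but not across the family:
        # a candidate specificity-determining position.
        if all(len(residues) == 1 for residues in by_group.values()):
            labels.append("group-specific")
        else:
            labels.append("variable")
    return labels


# Toy alignment: two subfamilies (A and B) with two sequences each.
toy_alignment = ["GKSAL", "GKSAV", "GRTAI", "GRTAL"]
toy_groups = ["A", "A", "B", "B"]

for position, label in enumerate(classify_columns(toy_alignment, toy_groups)):
    residues = [seq[position] for seq in toy_alignment]
    print(position, "".join(residues), round(column_entropy(residues), 2), label)
```

In this toy case the first and fourth columns are invariant, the second and third differ between the two groups while being invariant within each, and the last column varies inside a group. Real alignments would additionally need gap handling, sequence weighting, and a similarity-aware treatment of amino acid chemistry, none of which is attempted here.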
