Detecting compensatory covariation signals in protein evolution using reconstructed ancestral sequences.

When protein sequences evolve divergently under functional constraints, an individual amino acid replacement that reverses the charge at one position (e.g. Lys to Asp) may be compensated by a replacement at a second position that reverses the charge in the opposite direction (e.g. Glu to Arg). When the two side chains are near in space (proximal), such double replacements might be driven by natural selection if either replacement alone is selectively disadvantageous but the two together fully restore the ability of the protein to contribute to fitness (that is, are together "neutral"). Accordingly, many have sought to identify pairs of positions in a protein sequence that suffer compensatory replacements, often as a way to identify positions that are near in space in the folded structure. A "charge compensatory signal" might manifest itself in two ways. First, proximal charge compensatory replacements may occur more frequently than predicted from the product of the probabilities that the individual positions suffer charge-reversing replacements independently. Second, charge compensatory pairs of changes may be observed more frequently in proximal pairs of sites than in the average pair. Normally, charge compensatory covariation is detected by comparing the sequences of extant proteins at the "leaves" of phylogenetic trees. We show here that the charge compensatory signal is more evident when it is sought by examining individual branches of the tree, between reconstructed ancestral sequences at its nodes. We find that the signal is especially strong when the position pairs lie in a single secondary structural unit (e.g. alpha helix or beta strand) that brings the side chains suffering charge compensatory covariation near in space, and that it may therefore be useful in secondary structure prediction.
Also, "node-node" and "node-leaf" compensatory covariation may be useful to identify the better of two equally parsimonious trees, in a way that is independent of the mathematical formalism used to construct the tree itself. Further, compensatory covariation may provide a signal that indicates whether an episode of sequence evolution contains more or less divergence in functional behavior. Compensatory covariation analysis on reconstructed evolutionary trees may therefore become a valuable tool for analyzing genome sequences and for extracting biomedically useful information from proteome databases.
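
The branch-wise comparison described above can be illustrated with a minimal Python sketch. It is not the procedure used in the paper. It assumes that each branch is supplied as a pair of aligned, equal-length sequences (ancestor, descendant), that Lys and Arg count as positive and Asp and Glu as negative (His is treated as uncharged), and that the expected frequency is the direction-specific product of the per-position reversal frequencies. All function names are hypothetical.

```python
from itertools import combinations

POSITIVE = set("KR")  # assumption: Lys, Arg carry a positive charge
NEGATIVE = set("DE")  # assumption: Asp, Glu carry a negative charge


def charge(aa):
    """Return +1, -1 or 0 for a one-letter amino acid code."""
    if aa in POSITIVE:
        return 1
    if aa in NEGATIVE:
        return -1
    return 0


def reversal(anc_aa, desc_aa):
    """Return the descendant's charge (+1 or -1) if the charge reversed, else 0."""
    before, after = charge(anc_aa), charge(desc_aa)
    return after if before * after == -1 else 0


def compensatory_pairs(anc, desc):
    """List position pairs whose charge reversals on one branch are opposite."""
    revs = {}
    for pos, (a, d) in enumerate(zip(anc, desc)):
        r = reversal(a, d)
        if r:
            revs[pos] = r
    return [(i, j) for i, j in combinations(sorted(revs), 2)
            if revs[i] == -revs[j]]


def observed_vs_expected(branches, i, j):
    """Compare the observed compensatory frequency at positions i != j with
    the frequency expected if the two positions reversed independently."""
    n = len(branches)
    counts = {(i, 1): 0, (i, -1): 0, (j, 1): 0, (j, -1): 0}
    both = 0
    for anc, desc in branches:
        ri = reversal(anc[i], desc[i])
        rj = reversal(anc[j], desc[j])
        if ri:
            counts[(i, ri)] += 1
        if rj:
            counts[(j, rj)] += 1
        if ri and ri == -rj:
            both += 1
    observed = both / n
    expected = (counts[(i, 1)] * counts[(j, -1)]
                + counts[(i, -1)] * counts[(j, 1)]) / n ** 2
    return observed, expected


# Toy data: four branches, one of which carries Lys->Asp and Glu->Arg together.
branches = [("KAE", "DAR"), ("KAE", "KAE"), ("KAE", "KAE"), ("KAE", "KAE")]
pairs = compensatory_pairs(*branches[0])
observed, expected = observed_vs_expected(branches, 0, 2)
```

On this toy input the observed frequency exceeds the independence expectation. With real data the same comparison would be made over all branches of a reconstructed tree, and separately for proximal and non-proximal position pairs.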
