Identification of Coevolving Residues and Coevolution Potentials Emphasizing Structure, Bond Formation and Catalytic Coordination in Protein Evolution

The structure and function of a protein depend on coordinated interactions between its residues. The selective pressures associated with a mutation at one site should therefore depend on the amino acid identity of the sites it interacts with. Mutual information has previously been applied to multiple sequence alignments as a means of detecting coevolutionary interactions. Here, we introduce a refinement of the mutual information method that 1) removes a significant, non-coevolutionary bias and 2) accounts for heteroscedasticity. Using a large, non-overlapping database of protein alignments, we demonstrate that predicted coevolving residue pairs tend to lie in close physical proximity. We introduce coevolution potentials as a novel measure of the propensity of the 20 amino acids to pair among predicted coevolutionary interactions. Pairs capable of forming ionic, hydrogen, and disulfide bonds exhibited the highest potentials. Finally, we demonstrate that pairs of catalytic residues are significantly more likely to be identified as coevolving. These correlations with distinct protein features support the accuracy of our algorithm and are consistent with a model of coevolution in which selective pressures towards preserving residue interactions shape the mutational landscape of a protein by restricting the set of admissible neutral mutations.
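For readers unfamiliar with the baseline quantity, the following is a minimal sketch of plain mutual information between two alignment columns. It does not implement the bias removal or the heteroscedasticity correction described above, whose details are not given here. The function name `mutual_information`, the choice to skip gapped sequences, and the use of bits (log base 2) are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b, gap="-"):
    """Plain mutual information (in bits) between two alignment columns.

    col_a and col_b hold one residue per sequence, in the same sequence
    order. Sequences with a gap in either column are skipped (an
    assumption of this sketch, not a rule taken from the paper).
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must come from the same alignment")

    pairs = [(a, b) for a, b in zip(col_a, col_b) if a != gap and b != gap]
    n = len(pairs)
    if n == 0:
        return 0.0

    count_a = Counter(a for a, _ in pairs)   # marginal counts, column A
    count_b = Counter(b for _, b in pairs)   # marginal counts, column B
    count_ab = Counter(pairs)                # joint counts

    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Toy columns (hypothetical data): perfectly covarying vs. independent.
covarying = mutual_information("AACC", "DDEE")
independent = mutual_information("AACC", "DEDE")
```

In this toy case the covarying columns carry one bit of shared information, while the independent columns and any fully conserved column carry none. The second point illustrates why raw mutual information is sensitive to column variability as well as to coevolution.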
