Amino acid composition of proteins reduces deleterious impact of mutations

The evolutionary origin of amino acid occurrence frequencies in proteins (their composition) is not yet fully understood. We suggest that protein composition works alongside the genetic code to minimize the impact of mutations on protein structure. First, we propose a novel method for estimating the thermodynamic stability of proteins whose sequences are constrained to a fixed composition. Second, we quantify the average deleterious impact of substituting one amino acid with another. Natural proteome compositions are special in at least two ways: (1) although natural compositions do not generate more stable proteins than the average random composition, they result in proteins that are less susceptible to damage from mutations; and (2) natural proteome compositions that result in more stable proteins (i.e., those of thermophiles) are also tuned to have a higher tolerance for mutations. The second point is consistent with the observation that environmental factors selecting for more stable proteins also enhance the deleterious impact of mutations.
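As a rough illustration of what "average deleterious impact of a substitution under a given composition" could mean in practice, the sketch below weights a per-substitution cost by amino acid frequency. It is not the method of the paper: the cost function (absolute difference in Kyte-Doolittle hydropathy), the assumption that a residue mutates to any of the other 19 amino acids with equal probability (which ignores the genetic code), and the function names `composition` and `mean_substitution_cost` are all assumptions made here for illustration.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale, used here only as a stand-in cost proxy
# (an assumption of this sketch, not the paper's stability model).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(sequence):
    """Return the frequency of each amino acid occurring in `sequence`."""
    counts = Counter(sequence)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def mean_substitution_cost(comp):
    """Composition-weighted mean cost of a single random substitution.

    Assumes (hypothetically) that each residue is replaced by any of the
    other 19 amino acids with equal probability, and that the cost of a
    substitution is the absolute hydropathy difference.
    """
    total = 0.0
    for aa, freq in comp.items():
        others = [b for b in KD if b != aa]
        cost = sum(abs(KD[aa] - KD[b]) for b in others) / len(others)
        total += freq * cost
    return total


if __name__ == "__main__":
    comp = composition("MKTAYIAKQRQISFVKSHFSRQ")  # arbitrary example sequence
    print(round(mean_substitution_cost(comp), 3))
```

Under this toy cost, a composition rich in residues near the middle of the hydropathy scale yields a lower mean cost than one dominated by extreme residues, which is the kind of composition dependence the abstract describes; the actual analysis relies on a thermodynamic stability estimate rather than this proxy.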
