Random Amino Acid Mutations and Protein Misfolding Lead to Shannon Limit in Sequence-Structure Communication

The transmission of genomic information from coding sequence to protein structure during protein synthesis is subject to stochastic errors. To analyze transmission limits in the presence of such errors, Shannon's noisy channel theorem is applied to a communication channel between amino acid sequences and their structures; the channel is established from a large-scale statistical analysis of protein atomic coordinates. Shannon's theorem confirms that in close-to-native conformations information is transmitted with limited error probability. However, additional random errors in sequence (amino acid substitutions) and in structure (structural defects) decrease the communication capacity toward a Shannon limit of 0.010 bits per amino acid symbol, at which communication breaks down. In several controls, simulated error rates above a critical threshold and models of unfolded structures always produce capacities below this limiting value. Thus an essential biological system can be realistically modeled as a digital communication channel that is (a) sensitive to random errors and (b) restricted by a Shannon error limit. This forms a novel basis for predictions consistent with observed rates of defective ribosomal products during protein synthesis, and with the estimated excess of mutual information in protein contact potentials.
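
The qualitative effect described above, random substitutions eroding the information a channel carries, can be illustrated with a minimal sketch. It is not the authors' method: the 2x2 joint distribution, the uniform-substitution noise model, and the names `mutual_information`, `add_substitution_noise`, `critical_error_rate`, and `toy_joint` are all assumptions made for illustration. Only the 0.010 bit value is taken from the abstract, and here it is applied to a toy channel rather than to real protein statistics.

```python
import math


def mutual_information(joint):
    """Mutual information I(X;Y) in bits for a joint table joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for x, row in enumerate(joint):
        for y, p in enumerate(row):
            if p > 0.0:
                mi += p * math.log2(p / (px[x] * py[y]))
    return mi


def add_substitution_noise(joint, eps):
    """With probability eps, replace the input symbol by a uniformly
    random symbol that is independent of the output (assumed noise model)."""
    n = len(joint)
    py = [sum(col) for col in zip(*joint)]
    return [
        [(1.0 - eps) * joint[x][y] + eps * py[y] / n for y in range(len(py))]
        for x in range(n)
    ]


def critical_error_rate(joint, limit_bits, step=0.01):
    """Smallest scanned error rate at which the information drops below limit_bits."""
    steps = int(round(1.0 / step))
    for k in range(steps + 1):
        eps = k * step
        if mutual_information(add_substitution_noise(joint, eps)) < limit_bits:
            return eps
    return None


# Hypothetical toy channel: two "sequence" symbols, two "structure" symbols.
toy_joint = [[0.4, 0.1],
             [0.1, 0.4]]

# 0.010 bits is the limit quoted in the abstract, used here only as a threshold.
print(mutual_information(toy_joint))
print(critical_error_rate(toy_joint, limit_bits=0.010))
```

Under this noise model the mutual information is a convex function of the error rate that reaches zero when every symbol is substituted, so it can only decrease as errors are added. The scan simply reports where it crosses the chosen threshold.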
