A sequence-compatible amount of native burial information is sufficient for determining the structure of small globular proteins

Protein tertiary structures are known to be encoded in amino acid sequences, but the problem of structure prediction from sequence continues to be a challenge. With this problem in mind, recent simulations have shown that atomic burials, as expressed by atom distances to the molecular geometrical center, are sufficiently informative for determining native conformations of small globular proteins. Here we use a simple computational experiment to estimate the amount of this required burial information and find it to be surprisingly small, actually comparable with the stringent limit imposed by sequence statistics. Atomic burials appear to satisfy, therefore, minimal requirements for a putative dominating property in the folding code because they provide an amount of information sufficiently large for structural determination but, at the same time, sufficiently small to be encodable in sequences. In a simple analogy with human communication, atomic burials could correspond to the actual “language” encoded in the amino acid “script” from which the complexity of native conformations is recovered during the folding process.
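As a rough illustration of the quantity discussed above, the following Python sketch computes each atom's burial as its distance to the geometrical center and then coarse-grains the burials into a small number of layers. The function names, the equal-population layering, the toy coordinates, and the choice of two layers are all assumptions made for illustration; the abstract does not specify how burials were discretized or how the information content was estimated.

```python
import math

def atomic_burials(coords):
    """Distance of each atom to the geometrical center (unweighted centroid)."""
    n = len(coords)
    center = tuple(sum(c[i] for c in coords) / n for i in range(3))
    return [math.dist(c, center) for c in coords]

def burial_layers(burials, n_layers):
    """Assign each atom to one of n_layers layers of roughly equal population.

    Layer 0 is the most buried. Equal-population layering is an assumption
    of this sketch, not a procedure taken from the paper.
    """
    order = sorted(range(len(burials)), key=lambda i: burials[i])
    layers = [0] * len(burials)
    for rank, i in enumerate(order):
        layers[i] = rank * n_layers // len(burials)
    return layers

def max_bits_per_atom(n_layers):
    """Upper bound on information per atom when layers are equiprobable."""
    return math.log2(n_layers)

# Hypothetical toy coordinates (not a real protein), centroid at the origin.
coords = [(0, 0, 0), (1, 0, 0), (-1, 0, 0), (0, 3, 0), (0, -3, 0)]
burials = atomic_burials(coords)
layers = burial_layers(burials, n_layers=2)
bits = max_bits_per_atom(2)
```

With these toy coordinates the three atoms nearest the center fall into layer 0 and the two outer atoms into layer 1, so each atom carries at most one bit of burial information under this layering. Fewer layers mean less information per atom, which is the kind of trade-off between structural determination and sequence encodability that the abstract describes.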
