The Amino Acid Alphabet and the Architecture of the Protein Sequence-Structure Map. I. Binary Alphabets

The correspondence between protein sequences and structures, or sequence-structure map, relates to fundamental aspects of structural, evolutionary and synthetic biology. The specifics of the mapping, such as the fraction of accessible sequences and structures, or the ability of sequences to fold quickly, are dictated by the type of interactions between the monomers that compose the sequences. The set of possible interactions between monomers is encapsulated by the potential energy function. In this study, I explore how the relative strengths of the forces in the potential shape the architecture of the sequence-structure map. My observations rely on simple exact models of proteins and on random samples of the space of potential energy functions of binary alphabets. I adopt a graph perspective and study the distribution of viable sequences and the structures they produce as networks of sequences connected by point mutations. I observe that the relative proportion of attractive, neutral and repulsive forces defines types of potentials that induce sequence-structure maps of vastly different architectures. I characterize the properties underlying these differences and relate them to the structure of the potential. Among these properties are the expected number and relative distribution of sequences associated with specific structures, and the diversity of structures as a function of sequence divergence. I study the types of binary potentials observed among natural amino acids and show that there is a strong bias towards only some types of potentials, a bias that seems to characterize the folding code of natural proteins. I discuss implications of these observations for the architecture of the sequence-structure map of natural proteins, the construction of random libraries of peptides, and the early evolution of the natural amino acid alphabet.
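
The kind of computation described above can be illustrated with a minimal sketch, which is not the study's actual code. It assumes a two-dimensional square lattice, represents a structure by its contact map, uses the standard HP potential as one example of a binary potential, and calls a sequence viable when it has a unique minimum-energy contact map. All function names are hypothetical, and the tie test relies on integer-valued energies.

```python
from itertools import product

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def self_avoiding_walks(n):
    """Enumerate self-avoiding walks of n monomers; first step fixed to +x."""
    walks = []

    def extend(path):
        if len(path) == n:
            walks.append(tuple(path))
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in path:
                extend(path + [nxt])

    extend([(0, 0), (1, 0)])
    return walks


def contact_map(walk):
    """Set of index pairs (i, j), j > i + 1, that sit on adjacent lattice sites."""
    pos = {p: i for i, p in enumerate(walk)}
    contacts = set()
    for i, (x, y) in enumerate(walk):
        for dx, dy in MOVES:
            j = pos.get((x + dx, y + dy))
            if j is not None and j > i + 1:
                contacts.add((i, j))
    return frozenset(contacts)


def fold(seq, structures, potential):
    """Return the unique minimum-energy contact map, or None if the minimum is tied."""
    energies = sorted(
        (sum(potential[seq[i] + seq[j]] for i, j in cm), idx)
        for idx, cm in enumerate(structures)
    )
    if len(energies) > 1 and energies[0][0] == energies[1][0]:
        return None  # degenerate ground state: sequence is not viable
    return structures[energies[0][1]]


def sequence_structure_map(n, potential, alphabet="HP"):
    """Map every viable sequence of length n to its ground-state contact map."""
    structures = list({contact_map(w) for w in self_avoiding_walks(n)})
    mapping = {}
    for letters in product(alphabet, repeat=n):
        seq = "".join(letters)
        native = fold(seq, structures, potential)
        if native is not None:
            mapping[seq] = native
    return mapping


def neutral_edges(mapping):
    """Pairs of viable sequences one point mutation apart with the same structure."""
    edges = set()
    for seq, native in mapping.items():
        for k, a in enumerate(seq):
            b = "P" if a == "H" else "H"
            neighbour = seq[:k] + b + seq[k + 1:]
            if seq < neighbour and mapping.get(neighbour) == native:
                edges.add((seq, neighbour))
    return edges


# Assumed example potential: the HP model (H-H contacts attractive, others neutral).
HP_POTENTIAL = {"HH": -1, "HP": 0, "PH": 0, "PP": 0}

if __name__ == "__main__":
    smap = sequence_structure_map(4, HP_POTENTIAL)
    print(sorted(smap))
    print(sorted(neutral_edges(smap)))
```

Changing the entries of `HP_POTENTIAL` (for example, making `"PP"` repulsive) is one way to explore, on very short chains, how the balance of attractive, neutral and repulsive terms changes which sequences are viable and how they are connected.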
