A comparison of genotype-phenotype maps for RNA and proteins.

The relationship between the genotype (sequence) and the phenotype (structure) of macromolecules affects their ability to evolve new structures and functions. Here we compare the genotype space organization of proteins and RNA molecules to identify differences that may affect this ability. To this end, we computationally study the genotype-phenotype relationship for short RNA molecules and lattice proteins with a reduced monomer alphabet, which makes exhaustive analysis and direct comparison of their genotype spaces feasible. We find that far fewer protein sequences than RNA sequences fold, but that those which do fold adopt many more distinct structures than RNA. As a consequence, protein phenotypes have smaller genotype networks, and the member genotypes of these networks tend to be more similar to one another than those of RNA phenotypes. A neighborhood of a given radius in sequence space around an RNA molecule contains more novel structures than the corresponding neighborhood around a protein molecule. We compare this property to evidence from natural RNA and protein molecules, and conclude that RNA genotype space may be more conducive to the evolution of new structure phenotypes.
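The notion of a sequence-space neighborhood of a given radius can be illustrated with a minimal sketch. It enumerates all sequences within a given Hamming distance of a reference sequence and collects the phenotypes that differ from the reference phenotype. The function names and the `toy_fold` map are hypothetical stand-ins chosen for illustration; they are not the RNA folding or lattice protein models used in the study.

```python
from itertools import combinations, product


def hamming_ball(seq, radius, alphabet):
    """Yield every sequence at Hamming distance 1..radius from seq."""
    for d in range(1, radius + 1):
        for positions in combinations(range(len(seq)), d):
            # At each chosen position, allow every letter except the original.
            choices = [[a for a in alphabet if a != seq[p]] for p in positions]
            for subs in product(*choices):
                mutant = list(seq)
                for p, a in zip(positions, subs):
                    mutant[p] = a
                yield "".join(mutant)


def novel_phenotypes(seq, radius, alphabet, fold):
    """Phenotypes found in the neighborhood that differ from seq's own.

    fold is any genotype-to-phenotype function; a return value of None is
    treated as "does not fold" and is not counted as a phenotype.
    """
    own = fold(seq)
    found = {fold(m) for m in hamming_ball(seq, radius, alphabet)}
    found.discard(own)
    found.discard(None)
    return found


def toy_fold(seq):
    """Hypothetical toy map (assumption, not a real folding model).

    Position i is "paired" if it and its mirror position hold G and C.
    The phenotype is the tuple of paired positions in the first half.
    """
    n = len(seq)
    pairs = tuple(i for i in range(n // 2) if {seq[i], seq[n - 1 - i]} == {"G", "C"})
    return pairs or None


if __name__ == "__main__":
    reference = "GGCC"
    for r in (1, 2):
        size = sum(1 for _ in hamming_ball(reference, r, "GC"))
        novel = novel_phenotypes(reference, r, "GC", toy_fold)
        print(r, size, sorted(novel))
```

Replacing `toy_fold` with a real structure predictor for each molecule class would allow the same neighborhood count to be compared between RNA and proteins, under the assumption that both maps return hashable phenotype labels.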
