The organization of biological sequences into constrained and unconstrained parts determines fundamental properties of genotype–phenotype maps

Biological information is stored in DNA, RNA and protein sequences, which can be understood as genotypes that are translated into phenotypes. The properties of genotype–phenotype (GP) maps have been studied in great detail for RNA secondary structure. These include a highly biased distribution of genotypes per phenotype, a negative correlation between genotypic robustness and evolvability, a positive correlation between phenotypic robustness and evolvability, shape-space covering, and a roughly logarithmic scaling of phenotypic robustness with phenotypic frequency. More recently, similar properties have been discovered in other GP maps, suggesting that they may be fundamental to biological GP maps in general rather than specific to the RNA secondary structure map. Here we propose that these properties arise from the fundamental organization of biological information into ‘constrained’ and ‘unconstrained’ sequences, in the broadest possible sense. We describe as ‘constrained’ those sequences that affect the phenotype more immediately and are therefore more sensitive to mutations, such as protein-coding DNA or the stems in RNA secondary structure. ‘Unconstrained’ sequences, on the other hand, can mutate more freely without affecting the phenotype, such as intronic or intergenic DNA or the loops in RNA secondary structure. To test our hypothesis, we consider a highly simplified GP map whose genotypes have ‘coding’ and ‘non-coding’ parts. We term this the Fibonacci GP map, as it is equivalent to the Fibonacci code in information theory. Despite its simplicity, the Fibonacci GP map exhibits all of the above properties of much more complex and biologically realistic GP maps. These properties are therefore likely to be fundamental to many biological GP maps.
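To make the coding/non-coding idea concrete, the following is a minimal Python sketch of one plausible reading of such a map. The abstract does not specify the construction, so several choices here are assumptions made for illustration: genotypes are binary strings, the pattern `11` acts as a stop codon, the phenotype is the prefix before the first stop codon, everything after it is non-coding, and genotypes without a stop codon share a single undefined phenotype (`None`). The function names are hypothetical.

```python
from collections import Counter
from itertools import product

STOP = "11"  # assumed stop codon; not taken from the abstract


def phenotype(genotype):
    """Return the coding prefix before the first stop codon.

    Genotypes that contain no stop codon map to None (an assumed
    'undefined' phenotype).
    """
    i = genotype.find(STOP)
    return None if i < 0 else genotype[:i]


def neutral_set_sizes(length):
    """Count how many binary genotypes of the given length map to each phenotype."""
    counts = Counter()
    for bits in product("01", repeat=length):
        counts[phenotype("".join(bits))] += 1
    return counts


def genotype_robustness(genotype):
    """Fraction of single-point mutations that leave the phenotype unchanged."""
    flip = {"0": "1", "1": "0"}
    original = phenotype(genotype)
    neutral = 0
    for j, base in enumerate(genotype):
        mutant = genotype[:j] + flip[base] + genotype[j + 1:]
        if phenotype(mutant) == original:
            neutral += 1
    return neutral / len(genotype)


if __name__ == "__main__":
    sizes = neutral_set_sizes(6)
    for pheno, n in sizes.most_common():
        print(repr(pheno), n)
    print(genotype_robustness("001101"))
```

Under these assumptions, a phenotype with a coding prefix of length k in genotypes of length L has 2^(L-k-2) genotypes, because only the non-coding tail is free to vary. Short phenotypes are therefore exponentially more frequent than long ones, the number of phenotypes at each prefix length follows the Fibonacci sequence, and a genotype's robustness equals the fraction of its positions that are non-coding.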
