DNA, dichotomic classes and frame synchronization: a quasi-crystal framework

In this article, we show how a new mathematical model of the genetic code can be used to investigate the almost periodic properties of protein-coding DNA and mRNA sequences. We present the main mathematical features of the model and highlight its connections with both number theory and group theory. The group-theoretic framework presents interesting analogies with the theory of crystals. Moreover, we exploit the information provided by dichotomic classes, which are binary variables derived naturally from the mathematical model, to build statistical classifiers that retrieve and predict the normal reading frame used by the ribosome in protein synthesis. The results show that coding sequences possess a local informational structure that can be related to frame synchronization processes. The information needed to retrieve the normal reading frame implies the existence of short-range correlations and almost periodic structures related to the organization of codons, and it offers an interesting analogy with the properties of quasi-crystals. From a theoretical point of view, our results might help clarify the relation between biological information and shape in nucleic acids and proteins. From an applied point of view, we present promising new tools for designing efficient algorithms for frame synchronization, which plays a crucial role in the faithful synthesis of proteins.
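To make the idea of a dichotomic class concrete, the following minimal sketch maps a nucleotide sequence to a binary sequence in each of the three possible reading frames. It assumes Rumer's classical partition as a stand-in: a codon is labelled 1 when its first two bases already determine the amino acid in the standard genetic code (degeneracy 4) and 0 otherwise. The dichotomic classes derived from the mathematical model are not defined in this text and are not reproduced here; the names `FOURFOLD_ROOTS`, `rumer_class` and `binary_sequence` are illustrative only.

```python
# Dinucleotide roots whose four codons all encode the same amino acid
# in the standard genetic code (Leu CT, Val GT, Ser TC, Pro CC,
# Thr AC, Ala GC, Arg CG, Gly GG). Assumption: this classical Rumer
# partition stands in for the dichotomic classes of the abstract.
FOURFOLD_ROOTS = {"TC", "CT", "CC", "CG", "AC", "GT", "GC", "GG"}


def rumer_class(codon):
    """Return 1 if the codon belongs to the degeneracy-4 class, else 0."""
    codon = codon.upper().replace("U", "T")  # accept RNA or DNA letters
    if len(codon) != 3 or any(base not in "ACGT" for base in codon):
        raise ValueError(f"not a valid codon: {codon!r}")
    return 1 if codon[:2] in FOURFOLD_ROOTS else 0


def binary_sequence(seq, frame=0):
    """Binary class sequence of `seq` read in codons starting at offset `frame`."""
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    seq = seq.upper().replace("U", "T")
    # Stop at len(seq) - 2 so that only complete codons are read.
    return [rumer_class(seq[i:i + 3]) for i in range(frame, len(seq) - 2, 3)]


if __name__ == "__main__":
    example = "ATGGCCCTG"  # hypothetical toy sequence, not from the article
    for frame in range(3):
        print(frame, binary_sequence(example, frame))
```

Reading the same sequence in the three frames generally yields three different binary sequences. A frame classifier of the kind described above would take statistics of such sequences as its input features; the classifier itself is not sketched here.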
