n-Nucleotide circular codes in graph theory

The circular code theory proposes that genes are constituted of two trinucleotide codes: the classical genetic code, with 61 trinucleotides coding for the 20 amino acids (the three stop codons {TAA,TAG,TGA} excepted), and a circular code based on 20 trinucleotides for retrieving, maintaining and synchronizing the reading frame. It relies on two main results: the identification of a maximal C3 self-complementary trinucleotide circular code X in genes of bacteria, eukaryotes, plasmids and viruses (Michel 2015 J. Theor. Biol. 380, 156–177. (doi:10.1016/j.jtbi.2015.04.009); Arquès & Michel 1996 J. Theor. Biol. 182, 45–58. (doi:10.1006/jtbi.1996.0142)) and the finding of X circular code motifs in tRNAs and rRNAs, in particular in the ribosome decoding centre (Michel 2012 Comput. Biol. Chem. 37, 24–37. (doi:10.1016/j.compbiolchem.2011.10.002); El Soufi & Michel 2014 Comput. Biol. Chem. 52, 9–17. (doi:10.1016/j.compbiolchem.2014.08.001)). The universally conserved nucleotides A1492 and A1493 and the conserved nucleotide G530 are included in X circular code motifs. Recently, dinucleotide circular codes were also investigated (Michel & Pirillo 2013 ISRN Biomath. 2013, 538631. (doi:10.1155/2013/538631); Fimmel et al. 2015 J. Theor. Biol. 386, 159–165. (doi:10.1016/j.jtbi.2015.08.034)). As genetic motifs of different lengths are ubiquitous in genes and genomes, we introduce a new approach based on graph theory to study n-nucleotide circular codes X in full generality, i.e. of length 2 (dinucleotide), 3 (trinucleotide), 4 (tetranucleotide), etc. We prove that an n-nucleotide code X is circular if and only if the corresponding graph is acyclic. Moreover, the maximal length of a path in this graph corresponds to the window of nucleotides needed in a sequence to detect the correct reading frame. Finally, the graph theory of tournaments is applied to the study of dinucleotide circular codes. This graph approach is fully equivalent to both the combinatorics theory (Michel & Pirillo 2013 ISRN Biomath. 2013, 538631. (doi:10.1155/2013/538631)) and the group theory (Fimmel et al. 2015 J. Theor. Biol. 386, 159–165. (doi:10.1016/j.jtbi.2015.08.034)) of dinucleotide circular codes, while its mathematical approach is simpler.
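
The acyclicity criterion stated above can be illustrated with a minimal Python sketch. The abstract does not spell out how the graph is built, so the construction here is an assumption: for every word of the code and every proper split of that word, a directed edge goes from the prefix to the suffix. The function names (`code_graph`, `is_acyclic`, `is_circular`, `longest_path_length`) are hypothetical and chosen only for this illustration.

```python
def code_graph(code):
    """Assumed construction: one edge prefix -> suffix per proper split of each word."""
    graph = {}
    for word in code:
        for i in range(1, len(word)):
            prefix, suffix = word[:i], word[i:]
            graph.setdefault(prefix, set()).add(suffix)
            graph.setdefault(suffix, set())
    return graph


def is_acyclic(graph):
    """Depth-first search with three colours; a grey successor means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {v: WHITE for v in graph}

    def visit(v):
        colour[v] = GREY
        for w in graph[v]:
            if colour[w] == GREY:
                return False
            if colour[w] == WHITE and not visit(w):
                return False
        colour[v] = BLACK
        return True

    return all(colour[v] != WHITE or visit(v) for v in graph)


def is_circular(code):
    """Circularity test via the acyclicity criterion (under the assumed graph)."""
    return is_acyclic(code_graph(code))


def longest_path_length(graph):
    """Number of edges on a longest path; only meaningful for an acyclic graph."""
    memo = {}

    def depth(v):
        if v not in memo:
            memo[v] = max((1 + depth(w) for w in graph[v]), default=0)
        return memo[v]

    return max((depth(v) for v in graph), default=0)


if __name__ == "__main__":
    for code in ({"AC"}, {"AC", "CA"}, {"AAA"}, {"ACG", "CGA"}):
        print(sorted(code), "circular" if is_circular(code) else "not circular")
```

Under this construction, a code containing two words that are circular permutations of each other, such as {AC, CA} or {ACG, CGA}, produces a cycle and is rejected, as is the periodic word AAA. The function `longest_path_length` should be called only after `is_acyclic` has returned true, since it does not terminate on a cyclic graph.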
