Evolutionary conservation and functional implications of circular code motifs in eukaryotic genomes

A set X of 20 trinucleotides has been found to have the highest average occurrence in the reading frame, compared to the two shifted frames, in genes of bacteria, archaea, eukaryotes, plasmids and viruses (Michel, 2015, 2017; Arquès and Michel, 1996). This set X has an interesting mathematical property: it is a maximal C3 self-complementary trinucleotide circular code (Arquès and Michel, 1996). Furthermore, any motif obtained from this circular code X has the capacity to retrieve, maintain and synchronize the reading frame in genes. A recent study of X motifs in the complete genome of the yeast Saccharomyces cerevisiae showed that they are significantly enriched in the reading frame of the genes (protein-coding regions) of the genome (Michel et al., 2017). It was suggested that these X motifs may be evolutionary relics of a primitive code originally used for gene translation. The aim of this paper is to address two questions: are X motifs conserved during evolution, and do they continue to play a functional role in the processes of genome decoding and protein production? In a large-scale analysis involving complete genomes from four mammals and nine different yeast species, we highlight specific evolutionary pressures on the X motifs in the genes of all the genomes, and identify important new properties of X motif conservation at the level of the encoded amino acids. We then compare the occurrence of X motifs with existing experimental data concerning protein expression and protein production, and report a significant correlation between the number of X motifs in a gene and increased protein abundance. More generally, this work suggests that motifs from circular codes, i.e. motifs having the property of reading frame retrieval, may represent functional elements located within the coding regions of extant genomes.
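To make the notion of an X motif concrete, the following is a minimal Python sketch, not the pipeline used in the study. It lists the 20 trinucleotides of X as reported by Arquès and Michel (1996), checks the self-complementarity property, and scans one frame of a sequence for runs of consecutive trinucleotides from X. The function names and the `min_codons` length threshold are illustrative assumptions; the abstract does not state the motif length criterion used in the analysis.

```python
# The 20 trinucleotides of the circular code X (Arquès and Michel, 1996).
X = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over {A, C, G, T}."""
    return seq.translate(COMPLEMENT)[::-1]


def is_self_complementary(code):
    """True if the reverse complement of every trinucleotide is also in the code."""
    return all(reverse_complement(t) in code for t in code)


def x_motifs(seq, frame=0, min_codons=4):
    """Find maximal runs of consecutive X trinucleotides in one frame.

    Returns a list of (start, end) index pairs (end exclusive).
    `min_codons` is a hypothetical threshold chosen for illustration.
    """
    motifs = []
    run_start = None
    # Index just past the last complete trinucleotide in this frame.
    last = len(seq) - (len(seq) - frame) % 3
    for i in range(frame, last, 3):
        if seq[i:i + 3] in X:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and (i - run_start) // 3 >= min_codons:
                motifs.append((run_start, i))
            run_start = None
    # Close a run that extends to the end of the sequence.
    if run_start is not None and (last - run_start) // 3 >= min_codons:
        motifs.append((run_start, last))
    return motifs


# Toy sequence: four X trinucleotides, then AAA (not in X), then TTC (in X).
example = "GAAGAGGATGTC" + "AAA" + "TTC"
found = x_motifs(example, frame=0, min_codons=4)
```

On the toy sequence, only the first run of four trinucleotides meets the threshold; lowering `min_codons` to 1 would also report the single trailing TTC. Scanning with `frame=1` or `frame=2` gives the counts in the two shifted frames, which is the kind of comparison the enrichment statements above refer to.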

[1] C J Michel, et al. A complementary circular code in the protein coding genes, 1996, Journal of theoretical biology.

[2] Christian J. Michel, et al. Circular code motifs in transfer RNAs, 2013, Comput. Biol. Chem.

[3] Christian J. Michel, et al. Circular code motifs near the ribosome decoding center, 2015, Comput. Biol. Chem.

[4] Michael Hiller, et al. Increased alignment sensitivity improves the usage of genome alignments for comparative gene annotation, 2017, Nucleic acids research.

[5] Lutz Strüngmann, et al. Strong Comma-Free Codes in Genetic Information, 2017, Bulletin of mathematical biology.

[6] Gang Wu, et al. SGDB: a database of synthetic genes re-designed for optimizing protein over-expression, 2006, Nucleic Acids Res.

[7] Christian J Michel, et al. The maximal C(3) self-complementary trinucleotide circular code X in genes of bacteria, eukaryotes, plasmids and viruses, 2015, Journal of theoretical biology.

[8] K. Hughes, et al. Case for the genetic code as a triplet of triplets, 2017, Proceedings of the National Academy of Sciences.

[9] Hervé Seligmann, et al. Evolution of Nucleotide Punctuation Marks: From Structural to Linear Signals, 2017, Front. Genet.

[10] Christian J Michel, et al. A genetic scale of reading frame coding, 2014, Journal of theoretical biology.

[11] Lutz Strüngmann, et al. Mathematical fundamentals for the noise immunity of the genetic code, 2017, Biosyst.

[12] Christian J. Michel, et al. Circular code motifs in the ribosome decoding center, 2014, Comput. Biol. Chem.

[13] M. Nei, et al. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions, 1986, Molecular biology and evolution.

[14] C. J. Michel. WITHDRAWN: The maximal C3 self-complementary trinucleotide circular code X in genes of bacteria, archaea, eukaryotes, plasmids and viruses, 2017, Journal of theoretical biology.

[15] E. Grayhack, et al. Synonymous Codons: Choose Wisely for Expression, 2017, Trends in genetics: TIG.

[16] Christian J. Michel, et al. Unitary circular code motifs in genomes of eukaryotes, 2017, Biosyst.

[17] Hervé Seligmann, et al. Error compensation of tRNA misacylation by codon-anticodon mismatch prevents translational amino acid misinsertion, 2011, Comput. Biol. Chem.

[18] Christian J. Michel, et al. A 2006 review of circular codes in genes, 2008, Comput. Math. Appl.

[19] Lutz Strüngmann, et al. n-Nucleotide circular codes in graph theory, 2016, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.

[20] J. Thompson, et al. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice, 1994, Nucleic acids research.

[21] Christian J. Michel, et al. Circular code motifs in transfer and 16S ribosomal RNAs: A possible translation code in genes, 2012, Comput. Biol. Chem.

[22] J. Thompson, et al. Enrichment of Circular Code Motifs in the Genes of the Yeast Saccharomyces cerevisiae, 2017, Life.

[23] Luis A Diambra, et al. Differential bicodon usage in lowly and highly abundant proteins, 2017, PeerJ.

[24] Stanley Fields, et al. Adjacent Codons Act in Concert to Modulate Translation Efficiency in Yeast, 2016, Cell.

[25] Hervé Seligmann, et al. Genetic Code Optimization for Cotranslational Protein Folding: Codon Directional Asymmetry Correlates with Antiparallel Betasheets, tRNA Synthetase Classes, 2017, Computational and structural biotechnology journal.

[26] Hervé Seligmann, et al. The ambush hypothesis: hidden stop codons prevent off-frame gene reading, 2004, DNA and cell biology.