Modeling Global and Local Codon Bias with Deep Language Models

Codon bias, the pattern of synonymous codon usage when a protein sequence is encoded as nucleotides, is a biological phenomenon that is not yet fully understood. Several methods exist to represent the codon bias of an organism: the codon adaptation index (CAI) [1], individual codon usage (ICU), hidden stop codons (HSC) [2], and codon context (CC) [3]. These methods are often used to optimize heterologous gene expression, with the aim of increasing the accuracy and rate of translation. However, they have notable shortcomings because they do not take into account the local and global context of a gene. We present a method for modeling global and local codon bias with deep language models that is more robust than current methods because it captures more contextual information and long-range dependencies.
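To make the baseline measures named above concrete, the following is a minimal Python sketch of three of them computed from raw coding sequences: ICU (the frequency of each codon among the synonymous codons of its amino acid), CAI (the geometric mean of each codon's relative adaptiveness with respect to a reference gene set), and HSC (a count of stop codons appearing in the +1 and +2 reading frames). It assumes the standard genetic code, DNA rather than RNA input, and in-frame sequences. All function names are hypothetical helpers for illustration, not part of the method presented here, and codon context is omitted.

```python
import math
from collections import Counter

# Standard genetic code (DNA alphabet), built from the canonical TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOP_CODONS = {c for c, aa in CODON_TABLE.items() if aa == "*"}


def codons(seq):
    """Split a sequence into consecutive codons, dropping a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def individual_codon_usage(genes):
    """ICU: frequency of each codon relative to all synonymous codons of its amino acid."""
    counts = Counter(c for g in genes for c in codons(g) if c in CODON_TABLE)
    per_aa = Counter()
    for c, n in counts.items():
        per_aa[CODON_TABLE[c]] += n
    return {c: n / per_aa[CODON_TABLE[c]] for c, n in counts.items()}


def relative_adaptiveness(reference_genes):
    """w_c = count(c) / count(most frequent synonymous codon) in a reference gene set."""
    counts = Counter(c for g in reference_genes for c in codons(g) if c in CODON_TABLE)
    best = {}
    for c, n in counts.items():
        aa = CODON_TABLE[c]
        best[aa] = max(best.get(aa, 0), n)
    return {c: n / best[CODON_TABLE[c]] for c, n in counts.items()}


def cai(gene, weights):
    """CAI: geometric mean of w_c over the codons of `gene`.

    Met, Trp and stop codons are skipped because they have no synonymous
    alternatives. Codons absent from `weights` are also skipped; this is an
    assumption, and other implementations use pseudocounts instead.
    """
    logs = [math.log(weights[c]) for c in codons(gene)
            if c in weights and CODON_TABLE[c] not in "MW*"]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


def hidden_stop_codons(gene):
    """HSC: number of stop codons read in the +1 and +2 off-frames of a coding sequence."""
    gene = gene.upper()
    return sum(c in STOP_CODONS for shift in (1, 2) for c in codons(gene[shift:]))
```

For example, with weights estimated from a set of highly expressed genes, `cai(gene, relative_adaptiveness(highly_expressed))` scores a candidate gene between 0 and 1, where `highly_expressed` is a hypothetical list of reference sequences. Each of these measures summarizes codon usage per codon or per gene, which is the lack of wider context that the abstract identifies as their shortcoming.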

[1] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.

[2] Kuldip K. Paliwal, et al. Bidirectional recurrent neural networks, 1997, IEEE Trans. Signal Process.

[3] Yoshua Bengio, et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, 2014, ArXiv.

[4] N. Blüthgen, et al. Molecular Systems Biology 9:675, doi:10.1038/msb.2013.32, 2022.

[5] Catherine Putonti, et al. CBDB: The codon bias database, 2012, BMC Bioinformatics.

[6] Measurements of translation initiation from all 64 codons in E. coli, 2016.

[7] T. Ikemura. Codon usage and tRNA content in unicellular and multicellular organisms, 1985, Molecular Biology and Evolution.

[8] P. Spencer, et al. Genetic code redundancy and its influence on the encoded polypeptides, 2012, Computational and Structural Biotechnology Journal.

[9] J. R. Coleman, et al. Virus Attenuation by Genome-Scale Changes in Codon Pair Bias, 2008, Science.

[10] P. Sharp, et al. The codon Adaptation Index--a measure of directional synonymous codon usage bias, and its potential applications, 1987, Nucleic Acids Research.

[11] Sriram Kosuri, et al. Causes and Effects of N-Terminal Codon Bias in Bacterial Genes, 2013, Science.

[12] S. Osawa, et al. Levels of tRNAs in bacterial cells as affected by amino acid usage in proteins, 1991, Nucleic Acids Research.

[13] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.

[14] V. Gladyshev, et al. Dual functions of codons in the genetic code, 2010, Critical Reviews in Biochemistry and Molecular Biology.

[15] Judith Frydman, et al. Evolutionary conservation of codon optimality reveals hidden signatures of co-translational folding, 2012, Nature Structural & Molecular Biology.

[16] Bartek Wilczynski, et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics, 2009, Bioinformatics.

[17] Trevor Darrell, et al. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition, 2013, ICML.

[18] Eva Maria Novoa, et al. Speeding with control: codon usage, tRNAs, and ribosomes, 2012, Trends in Genetics.

[19] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.

[20] Andre R. O. Cavalcanti, et al. Factors influencing codon usage bias in genomes, 2008.

[21] D. Hoover, et al. DNAWorks: an automated method for designing oligonucleotides for PCR-based gene synthesis, 2002, Nucleic Acids Research.

[22] John Salvatier, et al. Theano: A Python framework for fast computation of mathematical expressions, 2016, ArXiv.

[23] H. Hellinga, et al. Multifactorial determinants of protein expression in prokaryotic open reading frames, 2010, Journal of Molecular Biology.

[24] Ruth Hershberg, et al. Selection on codon bias, 2008, Annual Review of Genetics.

[25] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[26] Kai Zeng, et al. Forces that influence the evolution of codon bias, 2010, Philosophical Transactions of the Royal Society B: Biological Sciences.

[27] Alan Villalobos, et al. Design Parameters to Control Synthetic Gene Expression in Escherichia coli, 2009, PLoS ONE.

[28] Alison K. Hottes, et al. Codon usage between genomes is constrained by genome-wide mutational processes, 2004, Proceedings of the National Academy of Sciences of the United States of America.

[29] J. Plotkin, et al. Synonymous but not the same: the causes and consequences of codon bias, 2011, Nature Reviews Genetics.

[30] Hervé Seligmann, et al. The ambush hypothesis: hidden stop codons prevent off-frame gene reading, 2004, DNA and Cell Biology.