Two Proteins for the Price of One: The Design of Maximally Compressed Coding Sequences

The emerging field of synthetic biology moves beyond conventional genetic manipulation to construct novel life forms that do not occur in nature. We explore the problem of designing the provably shortest genomic sequence that encodes a given set of genes by exploiting alternate reading frames. We present an algorithm for designing the shortest DNA sequence that simultaneously encodes two given amino acid sequences. We show that the coding sequences of naturally occurring pairs of overlapping genes approach maximum compression. We also investigate the impact of alternate coding matrices on overlapping sequence design. Finally, we discuss an interesting application of overlapping gene design: interleaving an antibiotic resistance gene into a target gene inserted into a virus or plasmid for amplification.
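The abstract does not spell out the design algorithm, so the following is only a minimal sketch of the core constraint behind overlapping coding. It assumes the standard genetic code and the simplest case: two proteins of equal length n, fully overlapped so that frame 0 encodes the first and frame +1 encodes the second, giving a DNA string of length 3n+1. It is not the paper's method, which also handles partial overlaps when searching for the shortest sequence. The names `overlap_plus_one` and `translate` are hypothetical. The search is a dynamic program over codon choices for the first protein. Each frame +1 codon is formed by the last two bases of one frame 0 codon plus the first base of the next, so feasibility at each position depends only on the previous codon.

```python
# Standard genetic code (assumption: NCBI table 1), built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Reverse map: amino acid -> list of synonymous codons.
CODONS_FOR = {}
for _codon, _aa in CODE.items():
    CODONS_FOR.setdefault(_aa, []).append(_codon)


def translate(dna, frame=0):
    """Translate the complete codons of dna starting at offset `frame`."""
    return "".join(CODE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3))


def overlap_plus_one(prot_a, prot_b):
    """Hypothetical helper: return a DNA string of length 3n+1 whose frame 0
    encodes prot_a and frame +1 encodes prot_b (both of length n), or None."""
    n = len(prot_a)
    if n == 0 or len(prot_b) != n:
        return None
    # layers[i] maps each feasible codon for prot_a[i] to a predecessor codon
    # (None at i == 0), keeping only codons consistent with prot_b so far.
    layers = [{c: None for c in CODONS_FOR.get(prot_a[0], [])}]
    for i in range(1, n):
        layer = {}
        for prev in layers[-1]:
            for c in CODONS_FOR.get(prot_a[i], []):
                # Frame +1 codon i-1 = last two bases of prev + first base of c.
                if c not in layer and CODE[prev[1:] + c[0]] == prot_b[i - 1]:
                    layer[c] = prev
        if not layer:
            return None
        layers.append(layer)
    # The last frame +1 codon needs one extra base after prot_a's last codon.
    for last in layers[-1]:
        for extra in BASES:
            if CODE[last[1:] + extra] == prot_b[n - 1]:
                # Walk predecessors back to recover the chain of frame 0 codons.
                chain = [last]
                for i in range(n - 1, 0, -1):
                    chain.append(layers[i][chain[-1]])
                return "".join(reversed(chain)) + extra
    return None


print(overlap_plus_one("MA", "WR"))  # → ATGGCGT
```

Here ATG GCG reads as MA in frame 0, while TGG CGT reads as WR in frame +1. Storing a single predecessor per codon is enough because the constraint only couples adjacent codons; a search for the shortest sequence with partial overlaps would need additional states for the offset between the two genes.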
