Enzymatic DNA synthesis for digital information storage

DNA is an emerging storage medium for digital data, but its adoption is hampered by the limitations of phosphoramidite chemistry, which was developed for the single-base accuracy required for biological functionality. Here, we establish a de novo enzymatic DNA synthesis strategy designed from the bottom up for information storage. We harness a template-independent DNA polymerase for controlled synthesis of sequences with user-defined information content. We demonstrate retrieval of 144 bits, including addressing, from perfectly synthesized DNA strands using batch-processed Illumina and real-time Oxford Nanopore sequencing. We then develop a codec for data retrieval from populations of diverse but imperfectly synthesized DNA strands, each with a ~30% error tolerance. With this codec, we experimentally validate a kilobyte-scale design that stores 1 bit per nucleotide. Simulations of the codec support reliable and robust storage of information for large-scale systems. This work paves the way for alternative synthesis and sequencing strategies to advance information storage in DNA.
