Towards Practical and Robust DNA-based Data Archiving by Codec System Named ‘Yin-Yang’

Motivation: DNA has been reported as a promising data storage medium because of its remarkable durability and high storage density. Here, we propose a robust DNA-based data storage method built on a new codec algorithm named ‘Yin-Yang’.

Results: Using this strategy, we successfully stored files of different formats in a single synthetic DNA oligonucleotide pool. Compared to most DNA-based data storage coding schemes presented to date, this codec system can efficiently satisfy a variety of user-defined constraints (e.g. limiting homopolymer length to at most 3 or 4, keeping GC content between 40% and 60%, and keeping secondary structure simple, with a Gibbs free energy above −30 kcal/mol). We tested the codec in an end-to-end experiment comprising encoding, DNA synthesis, sequencing and decoding, and retrieved 2.02 megabits of data across 3 files. The original information was fully recovered after sequencing and decoding. Compared to previously reported methods, our strategy shows strong potential for achieving high storage density (230 PB/gram) and high fidelity of data recovery.
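The sequence constraints named above (maximum homopolymer length and a GC content window) can be illustrated with a small screening function. This is a minimal sketch, not the Yin-Yang codec itself: the function names `max_homopolymer`, `gc_fraction` and `passes_screen` are hypothetical, the default thresholds are simply the values quoted in the abstract, and the free-energy criterion is left out because it would require an external secondary-structure predictor.

```python
from itertools import groupby


def max_homopolymer(seq: str) -> int:
    """Length of the longest run of identical bases (0 for an empty sequence)."""
    return max((sum(1 for _ in run) for _, run in groupby(seq)), default=0)


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def passes_screen(seq: str, max_run: int = 4,
                  gc_min: float = 0.40, gc_max: float = 0.60) -> bool:
    """Check a candidate sequence against the homopolymer and GC constraints.

    Thresholds are assumptions taken from the abstract's examples; the
    Gibbs free-energy check is intentionally not implemented here.
    """
    seq = seq.upper()
    return (max_homopolymer(seq) <= max_run
            and gc_min <= gc_fraction(seq) <= gc_max)


if __name__ == "__main__":
    for candidate in ("ACGTACGTAC", "AAAAACGCGT", "ATATATATAT"):
        print(candidate, passes_screen(candidate))
```

A codec of this kind would presumably apply such a screen to each candidate oligonucleotide and discard or re-encode any sequence that fails; how Yin-Yang actually does this is not described in the abstract.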
