Portable and Error-Free DNA-Based Data Storage

DNA-based data storage is an emerging nonvolatile memory technology of potentially unprecedented density, durability, and replication efficiency. The basic implementation steps are synthesizing DNA strings that contain user information and subsequently retrieving them via high-throughput sequencing technologies. Existing architectures enable reading and writing but do not offer random-access, error-free data recovery from low-cost, portable devices, which is crucial for making the storage technology competitive with classical recorders. Here we show for the first time that a portable, random-access platform may be implemented in practice using nanopore sequencers. The novelty of our approach is an integrated processing pipeline that encodes data to avoid costly synthesis and sequencing errors, enables random access through addressing, and leverages efficient portable sequencing via new iterative alignment and deletion error-correcting codes. Our work represents the only known random-access DNA-based data storage system that uses error-prone nanopore sequencers while still producing error-free readouts with the highest reported information rate and density. As such, it represents a crucial step towards the practical employment of DNA molecules as storage media.
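
To make the idea of random access through addressing concrete, the following is a minimal illustrative sketch and not the coding scheme of this work. It assumes a naive mapping of 2 bits per nucleotide and a fixed-length address prefix on every strand; the function names (`bytes_to_dna`, `address`, `make_pool`, `random_access`) and the 4-base address length are hypothetical choices made for illustration only. The sketch ignores the constraints that the abstract says the real pipeline handles, namely avoiding error-prone sequences and correcting deletion errors.

```python
BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11


def bytes_to_dna(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bits first."""
    return "".join(BASES[(b >> shift) & 3] for b in data for shift in (6, 4, 2, 0))


def dna_to_bytes(strand: str) -> bytes:
    """Inverse of bytes_to_dna; expects a length that is a multiple of 4."""
    vals = [BASES.index(c) for c in strand]
    return bytes(
        (vals[i] << 6) | (vals[i + 1] << 4) | (vals[i + 2] << 2) | vals[i + 3]
        for i in range(0, len(vals), 4)
    )


def address(index: int, addr_len: int = 4) -> str:
    """Write a block index as a fixed-length base-4 string of nucleotides."""
    return "".join(
        BASES[(index >> (2 * (addr_len - 1 - k))) & 3] for k in range(addr_len)
    )


def make_pool(blocks, addr_len: int = 4):
    """Prefix every encoded block with its address (hypothetical layout)."""
    return [address(i, addr_len) + bytes_to_dna(b) for i, b in enumerate(blocks)]


def random_access(pool, index: int, addr_len: int = 4):
    """Return the payload of the strand whose prefix matches the address."""
    prefix = address(index, addr_len)
    for strand in pool:
        if strand.startswith(prefix):
            return dna_to_bytes(strand[addr_len:])
    return None


pool = make_pool([b"DNA", b"data"])
print(random_access(pool, 1))
```

In this toy model, the string-prefix match stands in for selecting one block out of the pool by its address. In a physical system the addresses would also have to be distinguishable from one another and from the payloads, which this sketch does not attempt.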
