Construction of Bio-Constrained Code for DNA Data Storage

With its extremely high density and durable preservation, DNA data storage has become one of the most promising techniques for long-term data storage. Like traditional storage systems, which impose restrictions on the form of the encoded data, DNA storage systems subject the stored data to two biochemical constraints: a limit on the maximum homopolymer run and a balanced GC content. Previous studies satisfied these two constraints in successive steps, which results in low efficiency and high complexity. In this letter, we propose a novel content-balanced run-length limited code with an efficient code construction method, which generates short DNA sequences that satisfy both constraints simultaneously. In addition, we develop an encoding method that maps binary data into long DNA sequences for DNA data storage and ensures that the biochemical constraints are satisfied both locally and globally. The proposed encoding method achieves a high effective code rate of 1.917 bits per nucleotide with low coding complexity.
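The two constraints named above can be illustrated with a minimal checker. This is only a sketch of the constraints themselves, not the code construction proposed in the letter. The function names and the thresholds (a maximum run of 3 and a GC content between 40% and 60%) are assumptions chosen for illustration; the abstract does not state the exact limits.

```python
def max_homopolymer_run(seq: str) -> int:
    """Return the length of the longest run of identical nucleotides."""
    longest = current = 0
    prev = None
    for base in seq:
        # Extend the current run if the base repeats, otherwise start a new run.
        current = current + 1 if base == prev else 1
        longest = max(longest, current)
        prev = base
    return longest


def gc_content(seq: str) -> float:
    """Return the fraction of nucleotides that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def satisfies_constraints(seq: str, max_run: int = 3,
                          gc_low: float = 0.4, gc_high: float = 0.6) -> bool:
    """Check both constraints at once; the default thresholds are illustrative assumptions."""
    return (max_homopolymer_run(seq) <= max_run
            and gc_low <= gc_content(seq) <= gc_high)
```

For example, `satisfies_constraints("ACGTACGT")` holds under these assumed thresholds, while `"AAAACGCG"` fails the run-length limit and `"ATATATAT"` fails the GC balance.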
