Mutually Uncorrelated Primers for DNA-Based Data Storage

We introduce the notion of weakly mutually uncorrelated (WMU) sequences, motivated by applications in DNA-based data storage systems and in synchronization between communication devices. WMU sequences are characterized by the property that no sufficiently long suffix of one sequence is the prefix of the same or another sequence. WMU sequences used for primer design in DNA-based data storage systems are also required to be at large mutual Hamming distance from each other, to have balanced symbol compositions, and to avoid primer-dimer byproducts. We derive bounds on the size of WMU codes and of various constrained WMU codes, and present a number of constructions for balanced, error-correcting, primer-dimer-free WMU codes based on Dyck paths, prefix-synchronized codes, and cyclic codes.
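The constraints above can be checked mechanically for a small candidate primer set. The following is a minimal sketch, not the authors' construction. It rests on three assumptions. First, "sufficiently long" is modeled as a hypothetical threshold `kappa`: overlaps of length `kappa <= l < n` are forbidden. Second, all codewords have the same length `n`. Third, "balanced" is read as GC-balanced, meaning exactly half of the symbols are G or C. The helper names `is_wmu`, `min_hamming_distance`, and `is_gc_balanced` are illustrative only, and primer-dimer avoidance is not checked.

```python
from itertools import combinations, product


def is_wmu(code, kappa):
    """Check the assumed kappa-WMU property: for every ordered pair (x, y),
    with x == y allowed, no suffix of x of length l, kappa <= l < n,
    equals the prefix of y of the same length."""
    if kappa < 1:
        raise ValueError("kappa must be at least 1")
    n = len(code[0])
    if any(len(w) != n for w in code):
        raise ValueError("all codewords must have the same length")
    for x, y in product(code, repeat=2):
        for l in range(kappa, n):
            if x[n - l:] == y[:l]:  # suffix of x overlaps prefix of y
                return False
    return True


def min_hamming_distance(code):
    """Smallest number of positions in which two distinct codewords differ.
    Returns None when the code has fewer than two codewords."""
    return min(
        (sum(a != b for a, b in zip(x, y)) for x, y in combinations(code, 2)),
        default=None,
    )


def is_gc_balanced(word):
    """Assumed meaning of 'balanced': exactly half of the symbols are G or C."""
    return 2 * sum(c in "GC" for c in word) == len(word)


primers = ["ACGT", "TGCA"]
print(is_wmu(primers, kappa=2))                 # True
print(is_wmu(primers, kappa=1))                 # False: suffix "T" of ACGT is the prefix of TGCA
print(min_hamming_distance(primers))            # 4
print(all(is_gc_balanced(w) for w in primers))  # True
```

The brute-force check compares every ordered pair at every overlap length, so it costs O(M^2 n^2) for M codewords of length n. That is fine for verifying a handful of primers but is no substitute for the explicit constructions described above.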
