Reconstructing Mixtures of Coded Strings from Prefix and Suffix Compositions

The problem of string reconstruction from substring information has found many applications because of its relevance to DNA- and polymer-based data storage. One practically important and challenging paradigm requires reconstructing mixtures of strings from the union of the compositions of their prefixes and suffixes, as generated by mass spectrometry readouts. We describe new coding methods that allow for unique joint reconstruction of subsets of strings selected from a code, and we provide matching upper and lower bounds on the asymptotic rate of the underlying codebooks. Under certain mild constraints on the problem parameters, we show that the largest possible rate of a codebook in which every subcollection of at most $h$ codestrings is uniquely reconstructable from the prefix-suffix information equals $1/h$.
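To make the readout model concrete, the following is a minimal sketch of how the prefix-suffix information of a mixture could be computed. It rests on assumptions not stated in the abstract: strings are binary, the composition of a string is the pair (number of 0s, number of 1s), and the union over the mixture is taken as a multiset. The function names are hypothetical and chosen only for illustration.

```python
from collections import Counter

def composition(s):
    # Assumed definition: a composition records symbol counts and ignores order.
    return (s.count("0"), s.count("1"))

def prefix_suffix_compositions(s):
    # Multiset of compositions of all non-empty prefixes and suffixes of s.
    n = len(s)
    out = Counter()
    for i in range(1, n + 1):
        out[composition(s[:i])] += 1      # prefix of length i
        out[composition(s[n - i:])] += 1  # suffix of length i
    return out

def mixture_readout(strings):
    # Assumed model: the readout of a mixture is the multiset union
    # of the prefix-suffix compositions of its strings.
    total = Counter()
    for s in strings:
        total += prefix_suffix_compositions(s)
    return total

# A string and its reversal exchange prefixes and suffixes, so an uncoded
# readout cannot tell them apart.
same = mixture_readout(["001"]) == mixture_readout(["100"])
```

Here `same` evaluates to `True`, which shows one reason the strings have to be drawn from a suitably designed codebook before unique reconstruction is possible.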
