Exact sequence reconstruction for insertion-correcting codes

We study the problem of perfectly reconstructing sequences from traces. The sequences are codewords from a deletion/insertion-correcting code, and the traces are the result of corruption by a fixed number of symbol insertions (larger than the minimum edit distance of the code). This generalizes a problem studied by Levenshtein for uncoded sequences. We introduce an exact formula for the maximum number of common supersequences shared by sequences at a given edit distance, yielding a tight upper bound on the number of distinct traces necessary to guarantee exact reconstruction. We apply our results to the well-known single deletion/insertion-correcting Varshamov-Tenengolts (VT) codes and show that a significant number of VT codeword pairs achieve the worst-case number of traces needed for exact reconstruction.
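
To make these quantities concrete, the following brute-force Python sketch (an illustration, not the authors' method) builds the binary VT code VT_a(n) = {x : sum of i*x_i over i = 1..n is congruent to a mod n+1}, enumerates each codeword's t-insertion supersequences, and finds the largest number of supersequences shared by any two distinct codewords. It assumes Levenshtein's framework, in which exact reconstruction is guaranteed once the number of distinct traces exceeds this maximum, so the traces needed are that maximum plus one. The function names are hypothetical, and exhaustive enumeration is only practical for very small n and t.

```python
from itertools import combinations, product


def vt_code(n, a=0):
    """Binary VT_a(n): words x of length n with sum_{i=1}^{n} i*x_i = a (mod n+1)."""
    return [x for x in product((0, 1), repeat=n)
            if sum(i * b for i, b in enumerate(x, start=1)) % (n + 1) == a]


def supersequences(x, t, alphabet=(0, 1)):
    """All sequences obtainable from x by exactly t symbol insertions."""
    current = {tuple(x)}
    for _ in range(t):
        nxt = set()
        for s in current:
            # Insert every alphabet symbol at every position; the set drops duplicates.
            for pos in range(len(s) + 1):
                for sym in alphabet:
                    nxt.add(s[:pos] + (sym,) + s[pos:])
        current = nxt
    return current


def max_common_supersequences(code, t):
    """Largest |I_t(x) & I_t(y)| over distinct codeword pairs (0 if fewer than two)."""
    sup = {x: supersequences(x, t) for x in code}
    return max((len(sup[x] & sup[y]) for x, y in combinations(code, 2)), default=0)


if __name__ == "__main__":
    n, t = 6, 2
    code = vt_code(n)
    worst = max_common_supersequences(code, t)
    # Assumption: under Levenshtein's framework, worst + 1 distinct traces suffice.
    print(len(code), worst, worst + 1)
```

With t = 1 the maximum is 0, since VT codes correct a single insertion; with t = 2 it becomes positive. For binary sequences, the size of each supersequence set depends only on n and t and equals the classical count sum_{i=0}^{t} C(n+t, i).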

[1] Ankur A. Kulkarni et al., "Nonasymptotic Upper Bounds for Deletion Correcting Codes," IEEE Transactions on Information Theory, 2012.

[2] Ilan Shomorony et al., "Do read errors matter for genome assembly?," 2015 IEEE International Symposium on Information Theory (ISIT), 2015.

[3] Vladimir I. Levenshtein et al., "Efficient Reconstruction of Sequences from Their Subsequences or Supersequences," Journal of Combinatorial Theory, Series A, 2001.

[4] Vladimir I. Levenshtein et al., "Efficient reconstruction of sequences," IEEE Transactions on Information Theory, 2001.

[5] Sampath Kannan et al., "Reconstructing strings from random traces," SODA '04, 2004.

[6] Alon Orlitsky et al., "String Reconstruction from Substring Compositions," SIAM Journal on Discrete Mathematics, 2014.

[7] Rina Panigrahy et al., "Trace reconstruction with constant deletion probability and related results," SODA '08, 2008.

[8] Frederic Sala et al., "Exact Reconstruction From Insertions in Synchronization Codes," IEEE Transactions on Information Theory, 2016.

[9] Frederic Sala et al., "Three novel combinatorial theorems for the insertion/deletion channel," 2015 IEEE International Symposium on Information Theory (ISIT), 2015.