We are given a collection of $m$ random subsequences (traces) of a string $t$ of length $n$, where each trace is obtained by deleting each bit of the string independently with probability $q$. Our goal is to reconstruct $t$ exactly from these observed traces. We initiate here a study of the deletion rates for which the original string can be successfully reconstructed from a small number of samples.

We investigate a simple reconstruction algorithm, Bitwise Majority Alignment, which uses majority voting (with suitable shifts) to determine each bit of the original string. We show that for random strings $t$, the original string can be reconstructed with high probability for $q = O(1/\log n)$ using only $O(\log n)$ samples. For arbitrary strings $t$, we show that a simple modification of Bitwise Majority Alignment reconstructs, with high probability, a string that has identical structure to the original string for $q = O(1/n^{1/2+\varepsilon})$ using $O(1)$ samples. In this case, $O(n \log n)$ samples suffice to reconstruct the original string exactly.

Our setting can be viewed as the study of an idealized biological evolutionary process in which the only possible mutations are random deletions. Our goal is to understand at which mutation rates a small number of observed samples can be correctly aligned to reconstruct the parent string. In the process of establishing these results, we show that Bitwise Majority Alignment has an interesting self-correcting property: local distortions in the traces do not generate errors in the reconstruction and are eventually corrected.
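To make the trace model and the idea of "majority voting with suitable shifts" concrete, here is a minimal Python sketch. It assumes the basic variant in which every trace keeps a pointer to its current bit, the majority of those bits becomes the next output bit, and only traces that agree with the majority advance. The function names `deletion_trace` and `bitwise_majority_alignment` are hypothetical. The tie-breaking rule toward `"0"` is an arbitrary choice, and the reconstructor is assumed to know $n$. This is an illustration, not the authors' exact procedure, and it omits the modification used for arbitrary strings.

```python
import random
from typing import List, Sequence


def deletion_trace(t: str, q: float, rng: random.Random) -> str:
    """Deletion channel: drop each bit of t independently with probability q."""
    return "".join(bit for bit in t if rng.random() >= q)


def bitwise_majority_alignment(traces: Sequence[str], n: int) -> str:
    """Basic majority-voting-with-shifts sketch (hypothetical helper).

    Each trace has a pointer to its current bit. At every step the majority
    of the current bits is emitted. Traces that agree with it advance. Traces
    that disagree are assumed (by this sketch) to have lost a bit and keep
    their pointer, which is the "shift".
    """
    pointers = [0] * len(traces)
    out: List[str] = []
    for _ in range(n):
        # Only traces with unread bits take part in the vote.
        active = [j for j, tr in enumerate(traces) if pointers[j] < len(tr)]
        if not active:
            break
        ones = sum(traces[j][pointers[j]] == "1" for j in active)
        # Ties are broken toward "0" (an arbitrary choice for this sketch).
        bit = "1" if 2 * ones > len(active) else "0"
        out.append(bit)
        for j in active:
            if traces[j][pointers[j]] == bit:
                pointers[j] += 1
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200
    t = "".join(rng.choice("01") for _ in range(n))
    # Illustrative parameters only, not constants from the analysis.
    traces = [deletion_trace(t, 0.02, rng) for _ in range(31)]
    guess = bitwise_majority_alignment(traces, n)
    print("exact reconstruction:", guess == t)
```

Leaving a disagreeing trace's pointer in place implements the shift. A trace that has lost a bit is reading one position ahead of the majority, so it waits until the majority output matches its current bit and then continues in step.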