Polynomial-time trace reconstruction in the smoothed complexity model

In the \emph{trace reconstruction problem}, an unknown source string $x \in \{0,1\}^n$ is sent through a probabilistic \emph{deletion channel} which independently deletes each bit with probability $\delta$ and concatenates the surviving bits, yielding a \emph{trace} of $x$. The problem is to reconstruct $x$ given independent traces. This problem has received much attention in recent years both in the worst-case setting where $x$ may be an arbitrary string in $\{0,1\}^n$ \cite{DOS17,NazarovPeres17,HHP18,HL18,Chase19} and in the average-case setting where $x$ is drawn uniformly at random from $\{0,1\}^n$ \cite{PeresZhai17,HPP18,HL18,Chase19}. This paper studies trace reconstruction in the \emph{smoothed analysis} setting, in which a ``worst-case'' string $x^{\worst}$ is chosen arbitrarily from $\{0,1\}^n$, and then a perturbed version $\bx$ of $x^{\worst}$ is formed by independently replacing each coordinate by a uniform random bit with probability $\sigma$. The problem is to reconstruct $\bx$ given independent traces from it. Our main result is an algorithm which, for any constant perturbation rate $0 < \sigma < 1$ and any constant deletion rate $0 < \delta < 1$, uses $\poly(n)$ running time and traces and succeeds with high probability in reconstructing the string $\bx$. This stands in contrast with the worst-case version of the problem, for which $\exp(O(n^{1/3}))$ is the best known time and sample complexity \cite{DOS17,NazarovPeres17}. Our approach is based on reconstructing $\bx$ from the multiset of its short subwords and is quite different from previous algorithms for either the worst-case or average-case versions of the problem. The heart of our work is a new $\poly(n)$-time procedure for reconstructing the multiset of all $O(\log n)$-length subwords of any source string $x \in \{0,1\}^n$ given access to traces of $x$.
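The two random processes in the model, and the final step of going from a multiset of short subwords back to a string, can be illustrated with a minimal Python sketch. It rests on several assumptions. First, "subword" is read as a contiguous substring. Second, the function names `perturb`, `trace`, `subword_multiset`, and `reconstruct_from_subwords` are hypothetical. Third, the chaining routine is a standard illustration that only works when no length-(k-1) subword occurs twice in the string, and it is not claimed to be the paper's procedure. Finally, the sketch computes the subword multiset directly from the perturbed string (called $\bx$ above) rather than from traces. Recovering that multiset from traces is the paper's main technical contribution and is not attempted here.

```python
import math
import random
from collections import Counter


def perturb(x_worst: str, sigma: float, rng: random.Random) -> str:
    """Smoothed model: each bit is independently replaced by a uniform
    random bit with probability sigma."""
    return "".join(rng.choice("01") if rng.random() < sigma else b
                   for b in x_worst)


def trace(x: str, delta: float, rng: random.Random) -> str:
    """Deletion channel: each bit is deleted independently with
    probability delta; the survivors are concatenated in order."""
    return "".join(b for b in x if rng.random() >= delta)


def subword_multiset(x: str, k: int) -> Counter:
    """Multiset of all length-k contiguous subwords of x."""
    return Counter(x[i:i + k] for i in range(len(x) - k + 1))


def reconstruct_from_subwords(deck: Counter, k: int) -> str:
    """Chain the length-k subwords of an unknown string back together.

    Assumption of this sketch: no length-(k-1) subword occurs twice in the
    source, so each k-subword occurs once and every extension step is forced.
    Raises ValueError when the input does not look like such a multiset.
    """
    if k < 2 or not deck or any(c != 1 for c in deck.values()):
        raise ValueError("need k >= 2 and no repeated k-subword")
    by_prefix = {w[:-1]: w for w in deck}   # (k-1)-prefix -> k-subword
    if len(by_prefix) != len(deck):
        raise ValueError("repeated (k-1)-prefix: chaining is ambiguous")
    suffixes = {w[1:] for w in deck}
    # The first k-subword is the only one whose prefix is nobody's suffix.
    starts = [w for w in deck if w[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("no unique starting subword")
    out = starts[0]
    for _ in range(len(deck) - 1):
        nxt = by_prefix.get(out[len(out) - (k - 1):])
        if nxt is None:
            raise ValueError("chain broke before all subwords were used")
        out += nxt[-1]                       # extend by one bit
    if subword_multiset(out, k) != deck:
        raise ValueError("chained string does not reproduce the multiset")
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200
    x = perturb("0" * n, sigma=0.5, rng=rng)     # worst case: all zeros
    print("trace length:", len(trace(x, delta=0.3, rng=rng)))
    k = 8 * math.ceil(math.log2(n))              # hypothetical constant C = 8
    deck = subword_multiset(x, k)                # computed from x, not traces
    print("recovered:", reconstruct_from_subwords(deck, k) == x)
```

The demo's choice $k = 8\lceil \log_2 n \rceil$ is an arbitrary illustrative constant. Heuristically, chaining should usually succeed on a smoothed string because, for constant $\sigma$, two distinct windows of length $k-1$ agree only with probability exponentially small in $k$. An unperturbed string such as the all-zeros $x^{\worst}$, by contrast, has many repeated windows.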

[1] Krishnamurthy Viswanathan, et al. Improved string reconstruction over insertion-deletion channels, 2008, SODA '08.

[2] Rocco A. Servedio, et al. Efficient average-case population recovery in the presence of insertions and deletions, 2019, APPROX-RANDOM.

[3] Ryan O'Donnell, et al. Optimal mean-based algorithms for trace reconstruction, 2017, STOC.

[4] Yuval Peres, et al. Trace reconstruction with varying deletion probabilities, 2018, ANALCO.

[5] Yuval Peres, et al. Subpolynomial trace reconstruction for random strings and arbitrary deletion probability, 2018, COLT.

[6] Shang-Hua Teng, et al. Smoothed analysis of algorithms: why the simplex algorithm usually takes polynomial time, 2001, STOC '01.

[7] Vladimir I. Levenshtein, et al. Efficient reconstruction of sequences from their subsequences or supersequences, 2001, J. Comb. Theory A.

[8] Akshay Krishnamurthy, et al. Trace reconstruction: generalized and parameterized, 2019, ESA.

[9] Sofya Vorotnikova, et al. Trace reconstruction revisited, 2014, ESA.

[10] Yuval Peres, et al. Trace reconstruction with $\exp(O(n^{1/3}))$ samples, 2017, STOC.

[11] T. Sanders, et al. Analysis of Boolean functions, 2012, ArXiv.

[12] Shyam Narayanan, et al. Population recovery from the deletion channel: nearly matching trace reconstruction bounds, 2020, ArXiv.

[13] Yuval Peres, et al. Average-case reconstruction for the deletion channel: subpolynomially many traces suffice, 2017, FOCS.

[14] Vladimir I. Levenshtein, et al. Efficient reconstruction of sequences, 2001, IEEE Trans. Inf. Theory.

[15] Russell Lyons, et al. Lower bounds for trace reconstruction, 2018, ArXiv.

[16] Rocco A. Servedio, et al. Beyond trace reconstruction: population recovery from the deletion channel, 2019, FOCS.

[17] Zachary Chase. New lower bounds for trace reconstruction, 2021.

[18] Zachary Chase. New upper bounds for trace reconstruction, 2020, ArXiv.

[19] Rina Panigrahy, et al. Trace reconstruction with constant deletion probability and related results, 2008, SODA '08.

[20] Sampath Kannan, et al. Reconstructing strings from random traces, 2004, SODA '04.

[21] Sampath Kannan, et al. More on reconstructing strings from random traces: insertions and deletions, 2005, ISIT.