Improved Algorithms for Population Recovery from the Deletion Channel

The population recovery problem asks one to recover an unknown distribution over $n$-bit strings given access to independent noisy samples of strings drawn from the distribution. Recently, Ban et al. [BCF+19] studied the problem where the noise is induced by the deletion channel, which deletes each bit independently with a fixed probability and concatenates the surviving bits. This problem generalizes the well-known trace reconstruction problem, where one wishes to learn a single string under the deletion channel. Ban et al. showed how to learn $\ell$-sparse distributions over strings (distributions supported on at most $\ell$ strings) using $\exp\big(n^{1/2} \cdot (\log n)^{O(\ell)}\big)$ samples. In this work, we learn the distribution using only $\exp\big(\tilde{O}(n^{1/3}) \cdot \ell^2\big)$ samples by developing a higher-moment analog of the algorithms of [DOS17, NP17], which solve trace reconstruction in $\exp\big(\tilde{O}(n^{1/3})\big)$ samples. We also give the first algorithm whose runtime is subexponential in $n$, solving population recovery in $\exp\big(\tilde{O}(n^{1/3}) \cdot \ell^3\big)$ samples and time. Notably, our dependence on $n$ nearly matches the upper bound of [DOS17, NP17] when $\ell = O(1)$, and we reduce the dependence on $\ell$ from doubly to singly exponential. As a result, we can learn much larger mixtures of strings: while Ban et al.'s algorithm can learn a mixture of only $O(\log n/\log \log n)$ strings with a subexponential number of samples, we can learn a mixture of $n^{o(1)}$ strings in $\exp\big(n^{1/3 + o(1)}\big)$ samples and time.
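To make the sampling model concrete, here is a minimal Python sketch, assuming a fixed deletion probability `p` and a mixture given as a list of bit-lists with weights. It draws population-recovery samples (pick a string from the mixture, then pass it through the deletion channel) and computes the mean trace, the first-moment statistic that mean-based trace reconstruction algorithms work with. All function names are hypothetical illustrations, not code from [BCF+19], [DOS17], or [NP17], and the sketch does not implement the higher-moment algorithm described above. The exact mean uses the fact that bit $i$ lands at trace position $j$ exactly when it survives and exactly $j$ of the $i$ earlier bits survive, which happens with probability $\binom{i}{j} q^{j+1} p^{i-j}$, where $q = 1 - p$.

```python
import random
from math import comb


def deletion_channel(x, p, rng):
    """Delete each bit of x independently with probability p; survivors keep their order."""
    return [b for b in x if rng.random() >= p]


def sample_trace(strings, weights, p, rng):
    """One population-recovery sample: draw a string from the mixture, then apply the channel."""
    x = rng.choices(strings, weights=weights, k=1)[0]
    return deletion_channel(x, p, rng)


def empirical_mean_trace(traces, n):
    """Average the traces after (implicitly) zero-padding each one to length n."""
    sums = [0.0] * n
    for t in traces:
        for j, b in enumerate(t):
            sums[j] += b
    return [s / len(traces) for s in sums]


def expected_mean_trace(strings, weights, p, n):
    """Exact E[trace bit j], summed over the mixture.

    Bit i of x lands at position j iff it survives and exactly j of the
    i earlier bits survive: probability comb(i, j) * q**(j + 1) * p**(i - j).
    """
    q = 1.0 - p
    total = sum(weights)
    means = [0.0] * n
    for x, w in zip(strings, weights):
        for j in range(n):
            means[j] += (w / total) * sum(
                x[i] * comb(i, j) * q ** (j + 1) * p ** (i - j) for i in range(j, n)
            )
    return means


if __name__ == "__main__":
    rng = random.Random(0)
    strings = [[1, 0, 1, 1, 0, 0, 1, 0], [0, 1, 1, 0, 1, 0, 0, 1]]
    weights = [0.7, 0.3]
    traces = [sample_trace(strings, weights, 0.3, rng) for _ in range(20000)]
    emp = empirical_mean_trace(traces, 8)
    exact = expected_mean_trace(strings, weights, 0.3, 8)
    # Only sampling error separates the two vectors.
    print(max(abs(a - b) for a, b in zip(emp, exact)))
```

Because the expected mean trace is linear in the strings, it depends on a mixture only through the weighted average string $\sum_k \pi_k x^{(k)}$. For example, the mixtures $\{00, 11\}$ and $\{01, 10\}$ with equal weights have identical mean traces. This illustrates why mean-based statistics alone cannot separate mixtures, and why a higher-moment analog is needed for population recovery.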

[1] Cyrus Rashtchian, et al. Reconstructing Trees from Traces, 2019, COLT.

[2] Rocco A. Servedio, et al. Polynomial-time trace reconstruction in the smoothed complexity model, 2020, ArXiv.

[3] Rina Panigrahy, et al. Trace reconstruction with constant deletion probability and related results, 2008, SODA '08.

[4] Sampath Kannan, et al. Reconstructing strings from random traces, 2004, SODA '04.

[5] Krishnamurthy Viswanathan, et al. Improved string reconstruction over insertion-deletion channels, 2008, SODA '08.

[6] Ryan O'Donnell, et al. Sharp bounds for population recovery, 2017, ArXiv.

[7] T. Tao. Topics in Random Matrix Theory, 2012.

[8] Rocco A. Servedio, et al. Efficient average-case population recovery in the presence of insertions and deletions, 2019, APPROX-RANDOM.

[9] Yuval Peres, et al. Subpolynomial trace reconstruction for random strings and arbitrary deletion probability, 2018, COLT.

[10] Olgica Milenkovic, et al. Coded Trace Reconstruction, 2019, 2019 IEEE Information Theory Workshop (ITW).

[11] Yuval Peres, et al. Trace reconstruction with varying deletion probabilities, 2018, ANALCO.

[12] Rocco A. Servedio, et al. Beyond Trace Reconstruction: Population Recovery from the Deletion Channel, 2019, 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS).

[13] Shyam Narayanan, et al. Circular Trace Reconstruction, 2020, ArXiv.

[14] Avi Wigderson, et al. Restriction access, 2012, ITCS '12.

[15] Michael O. Rabin, et al. Probabilistic Algorithms in Finite Fields, 1980, SIAM J. Comput.

[16] Ryan O'Donnell, et al. Optimal mean-based algorithms for trace reconstruction, 2017, STOC.

[17] Russell Impagliazzo, et al. Finding Heavy Hitters from Lossy or Noisy Data, 2013, APPROX-RANDOM.

[18] Ananda Theertha Suresh, et al. Sample complexity of population recovery, 2017, COLT.

[19] Avi Wigderson, et al. Population recovery and partial identification, 2012, 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science.

[20] Vladimir I. Levenshtein, et al. Efficient Reconstruction of Sequences from Their Subsequences or Supersequences, 2001, J. Comb. Theory A.

[21] Yuval Peres, et al. Average-Case Reconstruction for the Deletion Channel: Subpolynomially Many Traces Suffice, 2017, 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS).

[22] Zachary Chase. New Upper Bounds for Trace Reconstruction, 2020, ArXiv.

[23] Russell Lyons, et al. Lower bounds for trace reconstruction, 2018, ArXiv.

[24] Vladimir I. Levenshtein, et al. Efficient reconstruction of sequences, 2001, IEEE Trans. Inf. Theory.

[25] László Lovász, et al. Factoring polynomials with rational coefficients, 1982.

[26] Sampath Kannan, et al. More on reconstructing strings from random traces: insertions and deletions, 2005, Proceedings. International Symposium on Information Theory, 2005. ISIT 2005.

[27] Shachar Lovett, et al. Improved Noisy Population Recovery, and Reverse Bonami-Beckner Inequality for Sparse Functions, 2014, Electron. Colloquium Comput. Complex.

[28] Bruce Spang, et al. Coded trace reconstruction in a constant number of traces, 2020, 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS).

[29] Michael E. Saks, et al. Noisy Population Recovery in Polynomial Time, 2016, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[30] Akshay Krishnamurthy, et al. Trace Reconstruction: Generalized and Parameterized, 2019, ESA.

[31] Michael E. Saks, et al. A Polynomial Time Algorithm for Lossy Population Recovery, 2013, 2013 IEEE 54th Annual Symposium on Foundations of Computer Science.

[32] Sofya Vorotnikova, et al. Trace Reconstruction Revisited, 2014, ESA.