Limitations of Mean-Based Algorithms for Trace Reconstruction at Small Distance

Trace reconstruction considers the task of recovering an unknown string $x \in \{0,1\}^n$ given a number of independent “traces”, i.e., subsequences of $x$ obtained by randomly and independently deleting every symbol of $x$ with some probability $p$. The information-theoretic limit on the number of traces needed to recover a string of length $n$ is still unknown. This limit is essentially the same as the number of traces needed to determine, given strings $x$ and $y$ and traces of one of them, which string is the source. The most studied class of algorithms for the worst-case version of the problem is that of “mean-based” algorithms. These are a restricted class of distinguishers that use only the mean value of each coordinate over the given samples. In this work, we study limitations of mean-based algorithms on strings at small Hamming or edit distance. On the one hand, we show that distinguishing strings that are nearby in Hamming distance is “easy” for such distinguishers. On the other hand, we show that distinguishing strings that are nearby in edit distance is “hard” for mean-based algorithms. Along the way, we also describe a connection to the famous Prouhet-Tarry-Escott (PTE) problem, which shows a barrier to finding explicit hard-to-distinguish strings: namely, such strings would imply explicit short solutions to the PTE problem, a well-known difficult problem in number theory. Our techniques rely on complex-analytic arguments involving careful trigonometric estimates, and on algebraic techniques that include applications of Descartes' rule of signs for polynomials over the reals. A full version of this paper is accessible at: https://arxiv.org/abs/2011.13737
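To make “the mean value of each coordinate” concrete, here is a minimal sketch. It assumes the standard deletion channel, in which each symbol is deleted independently with probability $p$ and kept with probability $q = 1 - p$, and traces are zero-padded to length $n$. The function names (`expected_mean_trace`, `empirical_mean_trace`, `mean_gap`) are hypothetical illustrations, not code from the paper. The sketch computes the exact expected mean trace, estimates it from simulated traces, and reports the largest coordinate-wise gap between two strings.

```python
import math
import random


def expected_mean_trace(x, p):
    """Exact E[trace_j], j = 0..n-1, for a zero-padded trace of bit string x.

    Assumed model: each symbol is deleted independently with probability p
    (kept with probability q = 1 - p). Bit x[i] (0-indexed) ends up at trace
    position j iff it is kept and exactly j of the i earlier bits are kept,
    which happens with probability C(i, j) * q**(j + 1) * p**(i - j).
    """
    n = len(x)
    q = 1.0 - p
    return [
        sum(x[i] * math.comb(i, j) * q ** (j + 1) * p ** (i - j) for i in range(j, n))
        for j in range(n)
    ]


def sample_trace(x, p, rng):
    """Draw one trace: keep each symbol independently with probability 1 - p."""
    return [b for b in x if rng.random() >= p]


def empirical_mean_trace(x, p, num_traces, seed=0):
    """Coordinate-wise average of num_traces zero-padded traces of x.

    This vector of averages is the only statistic a mean-based
    distinguisher is allowed to look at.
    """
    rng = random.Random(seed)
    totals = [0] * len(x)  # positions past a trace's end count as zeros
    for _ in range(num_traces):
        for j, b in enumerate(sample_trace(x, p, rng)):
            totals[j] += b
    return [t / num_traces for t in totals]


def mean_gap(x, y, p):
    """Largest coordinate-wise gap between the mean traces of equal-length x and y."""
    ex = expected_mean_trace(x, p)
    ey = expected_mean_trace(y, p)
    return max(abs(a - b) for a, b in zip(ex, ey))


if __name__ == "__main__":
    # Shift by one position: Hamming distance 8, but edit distance 2
    # (delete the leading 1, append a 1).
    x = [1, 0, 1, 0, 1, 0, 1, 0]
    y = [0, 1, 0, 1, 0, 1, 0, 1]
    print(mean_gap(x, y, 0.5))
```

Because a mean-based distinguisher knows $x$ and $y$, it can compare the empirical average at the maximizing coordinate against the midpoint of the two expected values. By a Hoeffding bound, on the order of `1 / mean_gap(x, y, p)**2` traces then suffice. This is why the size of this gap is the quantity that governs such algorithms.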

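The PTE connection can be illustrated with a short sketch. It uses the standard correspondence between string pairs and PTE instances, which is not necessarily the exact reduction used in the paper. For equal-length strings $x, y$, let $A$ be the positions where $x$ has a 1 and $y$ a 0, and let $B$ be the positions where $y$ has a 1 and $x$ a 0. Then $\sum_i (x_i - y_i) z^i = \sum_{a \in A} z^a - \sum_{b \in B} z^b$. This polynomial vanishes to order $k+1$ at $z = 1$ exactly when $A$ and $B$ have equal power sums in every degree $0, \dots, k$, i.e., when $(A, B)$ is a degree-$k$ PTE solution. Under the deletion channel with $q = 1 - p$, the zero-padded mean trace satisfies $\sum_j \mathbb{E}[\tilde{x}_j] w^j = q \sum_i x_i (p + qw)^i$. This suggests, heuristically, why a high-order zero at $z = 1$ is a natural source of pairs with nearly equal mean traces. All helper names below are hypothetical. Prouhet's Thue-Morse construction serves as a test case, although it needs sets of size $2^{m-1}$ to reach degree $m-1$, which is far from short.

```python
def thue_morse(m):
    """First 2**m bits of the Thue-Morse sequence: parity of the binary digit sum."""
    return [bin(i).count("1") % 2 for i in range(2 ** m)]


def difference_sets(x, y):
    """Positions where x has 1 and y has 0 (A), and where y has 1 and x has 0 (B)."""
    A = [i for i, (a, b) in enumerate(zip(x, y)) if a == 1 and b == 0]
    B = [i for i, (a, b) in enumerate(zip(x, y)) if a == 0 and b == 1]
    return A, B


def pte_degree(A, B, max_degree=64):
    """Largest k <= max_degree with sum(a**d) == sum(b**d) for all d = 0..k.

    Returns -1 if even the sizes differ (d = 0, since 0**0 == 1 in Python);
    if A and B are both empty, every degree matches and max_degree is returned.
    """
    k = -1
    for d in range(max_degree + 1):
        if sum(a ** d for a in A) != sum(b ** d for b in B):
            break
        k = d
    return k


def vanishing_order_at_one(coeffs):
    """Multiplicity of z = 1 as a root of sum_i coeffs[i] * z**i (integer coeffs)."""
    if not any(coeffs):
        raise ValueError("the zero polynomial vanishes to every order")
    poly = list(reversed(coeffs))  # highest-degree coefficient first
    order = 0
    while True:
        # Synthetic division by (z - 1): the running sums are the quotient's
        # coefficients, and the final running sum is the remainder P(1).
        acc, quotient = 0, []
        for c in poly:
            acc += c
            quotient.append(acc)
        if quotient.pop() != 0:
            return order
        poly = quotient
        order += 1


if __name__ == "__main__":
    # Prouhet's pair for m = 3: y is Thue-Morse and x is its complement.
    y = thue_morse(3)
    x = [1 - b for b in y]
    A, B = difference_sets(x, y)
    coeffs = [a - b for a, b in zip(x, y)]
    print(A, B, pte_degree(A, B), vanishing_order_at_one(coeffs))
    # prints: [0, 3, 5, 6] [1, 2, 4, 7] 2 3
```

For $m = 3$, this recovers the classical pair $\{0, 3, 5, 6\}$ and $\{1, 2, 4, 7\}$. Their sums and sums of squares agree, but their sums of cubes (368 and 416) differ. This matches a zero of order exactly 3 at $z = 1$.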