Circular Trace Reconstruction

Trace reconstruction is the problem of learning an unknown string $x$ from independent traces of $x$, where each trace is generated by independently deleting each bit of $x$ with some deletion probability $q$. In this paper, we initiate the study of circular trace reconstruction, where the unknown string $x$ is circular and each trace is additionally rotated by a random cyclic shift. Trace reconstruction is related to many problems in computational biology involving DNA, and this is a primary motivation for the circular variant as well, since many types of DNA are known to be circular. Our main results are as follows. First, we prove that arbitrary circular strings of length $n$ can be reconstructed using $\exp\big(\tilde{O}(n^{1/3})\big)$ traces for any constant deletion probability $q$, as long as $n$ is prime or the product of two primes. For $n$ of this form, this nearly matches the best known bound of $\exp\big(O(n^{1/3})\big)$ for standard trace reconstruction. Next, we prove that random circular strings can be reconstructed with high probability using $n^{O(1)}$ traces for any constant deletion probability $q$. Finally, we prove a lower bound of $\tilde{\Omega}(n^3)$ traces for arbitrary circular strings, which is greater than the best known lower bound of $\tilde{\Omega}(n^{3/2})$ for standard trace reconstruction.
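
To make the trace model concrete, the following is a minimal sketch of how a single circular trace might be sampled. It assumes the cyclic shift is drawn uniformly from the $n$ possible rotations and is applied before the deletions; the abstract does not fix either detail. The function name `circular_trace` and the parameter `rng` are illustrative and do not come from the paper.

```python
import random

def circular_trace(x: str, q: float, rng: random.Random) -> str:
    """Sample one trace of the circular string x (illustrative sketch).

    Assumptions (not stated in the abstract): the shift is uniform over
    the len(x) rotations and is applied before the deletion channel.
    """
    n = len(x)
    # Pick a uniformly random rotation of the circular string.
    shift = rng.randrange(n) if n else 0
    rotated = x[shift:] + x[:shift]
    # Deletion channel: each bit is deleted independently with probability q,
    # i.e. kept when the uniform draw in [0, 1) is at least q.
    return "".join(bit for bit in rotated if rng.random() >= q)

if __name__ == "__main__":
    rng = random.Random(0)
    x = "0110100"
    for _ in range(3):
        print(circular_trace(x, q=0.3, rng=rng))
```

With $q = 0$ the sampler returns a rotation of $x$, and with $q = 1$ it returns the empty string, which are the two degenerate cases of the model.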
