Reconstructing Trees from Traces

We study the problem of learning a node-labeled tree given independent traces from an appropriately defined deletion channel. This problem, tree trace reconstruction, generalizes string trace reconstruction, which corresponds to the tree being a path. For many classes of trees, including complete trees and spiders, we provide algorithms that reconstruct the labels using only a polynomial number of traces. This stands in stark contrast to string trace reconstruction, where the best known algorithms require exponentially many traces and where a central open problem is to determine whether a polynomial number of traces suffices. Our techniques combine novel combinatorial and complex analytic methods.
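As a rough illustration of the string (path) special case only, the standard deletion channel can be sketched as follows. The abstract does not specify the deletion channel for trees, so this sketch does not attempt it; the function names, the deletion probability `q`, and the seeding scheme are assumptions made for illustration.

```python
import random


def deletion_channel(s, q, rng):
    """Return one trace of s: each symbol is deleted independently with probability q."""
    # rng.random() lies in [0, 1), so a symbol is kept with probability 1 - q.
    return "".join(ch for ch in s if rng.random() >= q)


def sample_traces(s, q, num_traces, seed=0):
    """Draw num_traces independent traces of s (hypothetical helper)."""
    rng = random.Random(seed)
    return [deletion_channel(s, q, rng) for _ in range(num_traces)]


def is_subsequence(t, s):
    """Check that t can be obtained from s by deleting symbols."""
    it = iter(s)
    return all(ch in it for ch in t)


if __name__ == "__main__":
    for trace in sample_traces("0110100111", q=0.3, num_traces=5):
        print(trace)
```

Every trace is a random subsequence of the input, and the reconstruction task is to recover the input from many such independent traces.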
