Paper information - Adaptive learning of compressible strings

Adaptive learning of compressible strings

Suppose an oracle knows a string S that is unknown to us and that we want to determine. The oracle can answer queries of the form “Is s a substring of S?”. In 1995, Skiena and Sundaram showed that, in the worst case, any algorithm needs to ask the oracle σn/4 − O(n) queries to reconstruct the hidden string, where σ is the size of the alphabet of S and n is its length, and gave an algorithm that spends (σ − 1)n + O(σ√n) queries to reconstruct S. The main contribution of our paper is to improve this upper bound when the string is compressible. We first present a universal algorithm that, given a (computable) compressor that compresses the string to τ bits, performs q = O(τ) substring queries; this algorithm, however, runs in exponential time. For this reason, the second part of the paper focuses on more time-efficient algorithms whose number of queries is bounded by specific compressibility measures. We first show that any string of length n over an integer alphabet of size σ with rle runs (maximal blocks of a repeated character) can be reconstructed with q = O(rle(σ + log(n/rle))) substring queries in linear time and space. We then present an algorithm that spends q = O(σ g log n) substring queries and runs in O(n(log n + log σ) + q) time using linear space, where g is the size of a smallest straight-line program generating the string.
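To make the query model concrete, the following is a minimal Python sketch of a substring oracle together with a simple greedy reconstruction. It is not the algorithm of Skiena and Sundaram, nor any of the algorithms of this paper: it is a textbook-style baseline that first extends a known substring to the right until it becomes a suffix of S, then to the left until it equals S. The names `SubstringOracle`, `reconstruct_naive`, and `count_runs` are hypothetical and introduced only for illustration.

```python
from itertools import groupby


class SubstringOracle:
    """Hypothetical oracle: hides a string and counts the queries it answers."""

    def __init__(self, hidden: str) -> None:
        self._hidden = hidden
        self.queries = 0

    def is_substring(self, s: str) -> bool:
        """Answer the query 'Is s a substring of S?' and count it."""
        self.queries += 1
        return s in self._hidden


def reconstruct_naive(oracle: SubstringOracle, alphabet: str) -> str:
    """Greedy baseline (an assumption, not the paper's method)."""
    t = ""  # invariant: t is always a substring of the hidden string

    # Phase 1: extend t to the right. When no character extends it,
    # every occurrence of t ends at the end of S, so t is a suffix of S.
    extended = True
    while extended:
        extended = False
        for c in alphabet:
            if oracle.is_substring(t + c):
                t += c
                extended = True
                break

    # Phase 2: extend t to the left. t stays a suffix of S; when no
    # character extends it, t must start at position 0, so t equals S.
    extended = True
    while extended:
        extended = False
        for c in alphabet:
            if oracle.is_substring(c + t):
                t = c + t
                extended = True
                break

    return t


def count_runs(s: str) -> int:
    """Number of maximal runs of equal characters (the measure rle)."""
    return sum(1 for _ in groupby(s))


if __name__ == "__main__":
    oracle = SubstringOracle("abracadabra")
    print(reconstruct_naive(oracle, "abcdr"), oracle.queries)
    print(count_runs("aaabbbbc"))
```

Each successful extension costs at most σ queries and each phase ends with one failing round of σ queries, so this baseline asks at most σ(n + 2) queries regardless of how compressible S is. The bounds in the abstract improve on this kind of dependence when rle or g is small.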

[1] Aldo de Luca, et al. Words and special factors, 2001, Theor. Comput. Sci.

[2] Alon Orlitsky, et al. String Reconstruction from Substring Compositions, 2014, SIAM J. Discret. Math.

[3] Michael A. Bender, et al. Cache-oblivious string B-trees, 2006, PODS '06.

[4] Amihood Amir, et al. Adaptive Exact Learning in a Mixed-Up World: Dealing with Periodicity, Errors, and Jumbled-Index Queries in String Reconstruction, 2020, SPIRE.

[5] Francis Dominick Murgolo. Approximation algorithms for combinatorial optimization problems, 1985.

[6] Abraham Lempel, et al. A universal algorithm for sequential data compression, 1977, IEEE Trans. Inf. Theory.

[7] Andreas W. M. Dress, et al. Reconstructing Words from Subwords in Linear Time, 2005.

[8] P. Alam. ‘A’, 2021, Composites Engineering: An A–Z Guide.

[9] Tao Jiang, et al. DNA sequencing and string learning, 2005, Mathematical systems theory.

[10] Moni Naor. String Matching with Preprocessing of Text and Pattern, 1991, ICALP.

[11] Esko Ukkonen, et al. On-line construction of suffix trees, 1995, Algorithmica.

[12] Davide Della Giustina, et al. A New Linear-Time Algorithm for Centroid Decomposition, 2019, SPIRE.

[13] Steven Skiena, et al. Reconstructing strings from substrings in rounds, 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.

[14] C. Jordan. Sur les assemblages de lignes, 1869.

[15] Paolo Ferragina, et al. Compressed Cache-Oblivious String B-Tree, 2013, ESA.

[16] Kazuo Iwama, et al. Reconstructing Strings from Substrings: Optimal Randomized and Average-Case Algorithms, 2018, ArXiv.

[17] Anna Pagh, et al. The Complexity of Constructing Evolutionary Trees Using Experiments, 2001, ICALP.

[18] Steven Skiena, et al. Reconstructing Strings from Substrings, 1995, J. Comput. Biol.

[19] Dekel Tsur. Tight Bounds for String Reconstruction Using Substring Queries, 2005, APPROX-RANDOM.

[20] Gonzalo Navarro. Indexing Highly Repetitive String Collections, 2020, ArXiv.

[21] Antonio Restivo, et al. Word assembly through minimal forbidden words, 2006, Theor. Comput. Sci.