Conversion from RLBWT to LZ77

Converting one compressed representation of a string into another without explicitly decompressing it is one of the central research topics in string processing. We discuss the problem of converting the run-length Burrows-Wheeler Transform (RLBWT) of a string into the Lempel-Ziv 77 (LZ77) phrases of the reversed string. The first result, Policriti and Prezza's conversion algorithm [Algorithmica 2018], runs in $O(n \log r)$ time and $O(r)$ working space, where $n$ is the length of the string, $r$ is the number of runs in the RLBWT, and $z$ is the number of LZ77 phrases. A more recent result, Kempa's conversion algorithm [SODA 2019], runs in $O(n / \log n + r \log^{9} n + z \log^{9} n)$ time and $O(n / \log_{\sigma} n + r \log^{8} n)$ working space, where $\sigma$ is the alphabet size of the RLBWT. In this paper, we present a new conversion algorithm that improves on Policriti and Prezza's algorithm, which relies on general-purpose dynamic data structures. We argue that these dynamic data structures can be replaced, and we present new data structures that make the conversion faster. With the new data structures, our conversion algorithm runs in $O(n \min \{ \log \log n, \sqrt{\frac{\log r}{\log\log r}} \})$ time and $O(r)$ working space.
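To make the input and output of the conversion concrete, the following minimal sketch computes the RLBWT of a small string and the LZ77 phrases of its reversal by brute force. It is not the conversion algorithm discussed here: it works on the plain text, takes quadratic time or worse, and exists only to show what $r$ and $z$ count. The function names, the `$` terminator, and the particular LZ77 variant (longest previous occurrence with overlaps allowed, or a single fresh character) are assumptions made for illustration; the paper may use a different phrase definition.

```python
def bwt(text: str) -> str:
    """Naive BWT via sorted rotations.

    Assumes '$' does not occur in `text` and sorts before every other symbol.
    """
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Collapse maximal runs of equal characters into (char, length) pairs."""
    runs: list[tuple[str, int]] = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def lz77_phrases(text: str) -> list[str]:
    """Greedy LZ77 factorization (illustrative variant).

    Each phrase is the longest prefix of the remaining text that also starts
    at some earlier position (the copy may overlap the phrase itself), or a
    single character if no such prefix exists.
    """
    phrases = []
    n = len(text)
    i = 0
    while i < n:
        best = 0
        for j in range(i):  # try every earlier starting position
            length = 0
            while i + length < n and text[j + length] == text[i + length]:
                length += 1
            best = max(best, length)
        step = max(best, 1)  # a fresh character forms a phrase of length 1
        phrases.append(text[i:i + step])
        i += step
    return phrases


text = "banana"
rlbwt = run_length_encode(bwt(text))   # r = len(rlbwt)
phrases = lz77_phrases(text[::-1])     # z = len(phrases)
print(rlbwt)    # [('a', 1), ('n', 2), ('b', 1), ('$', 1), ('a', 2)]
print(phrases)  # ['a', 'n', 'ana', 'b']
```

For this toy input the RLBWT has $r = 5$ runs and the reversed string has $z = 4$ phrases. The conversion problem asks for the second list given only the first, using space proportional to $r$ rather than to $n$.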

[1] Hideo Bannai, et al. Efficient LZ78 Factorization of Grammar Compressed Text, 2012, SPIRE.

[2] Johannes Fischer, et al. Alphabet-Dependent String Searching with Wexponential Search Trees, 2015, CPM.

[3] Alberto Policriti, et al. LZ77 Computation Based on the Run-Length Encoded BWT, 2018, Algorithmica.

[4] Wing-Kai Hon, et al. Succinct data structures for Searchable Partial Sums with optimal worst-case performance, 2011, Theor. Comput. Sci.

[5] Hideo Bannai, et al. Converting SLP to LZ78 in almost Linear Time, 2013, CPM.

[6] Volker Heun, et al. Space-Efficient Preprocessing Schemes for Range Minimum Queries on Static Arrays, 2011, SIAM J. Comput.

[7] Abraham Lempel, et al. On the Complexity of Finite Sequences, 1976, IEEE Trans. Inf. Theory.

[8] Dominik Kempa. Optimal Construction of Compressed Indexes for Highly Repetitive Texts, 2019, SODA.

[9] Philip Bille, et al. Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation, 2017, Algorithmica.

[10] Gonzalo Navarro, et al. Optimal-Time Text Indexing in BWT-runs Bounded Space, 2017, SODA.

[11] Artur Jez, et al. A really simple approximation of smallest grammar, 2014, Theor. Comput. Sci.

[12] Eugene W. Myers, et al. Suffix arrays: a new method for on-line string searches, 1993, SODA '90.

[13] D. J. Wheeler, et al. A Block-sorting Lossless Data Compression Algorithm, 1994.

[14] Hiroshi Sakamoto, et al. RePair in Compressed Space and Time, 2019, 2019 Data Compression Conference (DCC).

[15] Wojciech Rytter. Application of Lempel-Ziv factorization to the approximation of grammar-based compression, 2003, Theor. Comput. Sci.

[16] Peter Sanders, et al. Simple Linear Work Suffix Array Construction, 2003, ICALP.

[17] Faith Ellen, et al. Optimal Bounds for the Predecessor Problem and Related Problems, 2002, J. Comput. Syst. Sci.