Paper information - HOLZ: High-Order Entropy Encoding of Lempel-Ziv Factor Distances

We propose a new representation of the offsets of the Lempel–Ziv (LZ) factorization based on the co-lexicographic order of the text’s prefixes. The encoded size of the selected offsets tends to approach the k-th order empirical entropy of the text. Our evaluations show that this choice of offsets is superior to the rightmost and bit-optimal LZ parsings on datasets with small high-order entropy.
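
As a point of reference for the entropy measure named in the abstract, the following is a minimal Python sketch of the standard definition of k-th order empirical entropy: the zeroth-order entropy of the symbols that follow each length-k context, weighted by how often that context occurs. The function names `h0` and `hk` are illustrative and do not come from the paper; the sketch only computes the measure and does not implement the HOLZ encoding.

```python
from collections import Counter, defaultdict
from math import log2


def h0(counts):
    """Zeroth-order empirical entropy (bits per symbol) of a symbol multiset."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def hk(text, k):
    """k-th order empirical entropy of text, in bits per symbol.

    Assumption: the first k symbols have no full context and are skipped,
    which is the usual convention for this definition.
    """
    if not text:
        return 0.0
    # Map each length-k context to the counts of the symbols that follow it.
    followers = defaultdict(Counter)
    for i in range(k, len(text)):
        followers[text[i - k:i]][text[i]] += 1
    # Weight each context's zeroth-order entropy by its number of occurrences.
    return sum(sum(c.values()) * h0(c) for c in followers.values()) / len(text)
```

On a highly regular text such as `abababab`, `hk(text, 0)` is 1 bit per symbol while `hk(text, 1)` is zero, since each symbol is fully determined by its predecessor; texts of this kind are what "small high-order entropy" refers to.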

Gonzalo Navarro | Dominik Köppl | Nicola Prezza

[1] D. J. Wheeler, et al. A Block-sorting Lossless Data Compression Algorithm, 1994.

[2] Abraham Lempel, et al. A universal algorithm for sequential data compression, 1977, IEEE Trans. Inf. Theory.

[3] Alberto Policriti, et al. LZ77 Computation Based on the Run-Length Encoded BWT, 2018, Algorithmica.

[4] Enno Ohlebusch, et al. Lempel-Ziv Factorization Revisited, 2011, CPM.

[5] Simon J. Puglisi, et al. Range Predecessor and Lempel-Ziv Parsing, 2016, SODA.

[6] G. Navarro. Indexing Highly Repetitive String Collections, Part I: Repetitiveness Measures, 2020.

[7] Paolo Ferragina, et al. On the Bit-Complexity of Lempel-Ziv Compression, 2009, SIAM J. Comput.

[8] J. Ian Munro, et al. Compressed Data Structures for Dynamic Sequences, 2015, ESA.

[9] Giovanni Manzini, et al. Opportunistic data structures with applications, 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[10] Abraham Lempel, et al. On the Complexity of Finite Sequences, 1976, IEEE Trans. Inf. Theory.

[11] A. D. Wyner, et al. The sliding-window Lempel-Ziv algorithm is asymptotically optimal, 1994, Proc. IEEE.

[12] Thomas G. Szymanski, et al. Data compression via textual substitution, 1982.

[13] Eugene W. Myers, et al. Suffix arrays: a new method for on-line string searches, 1993, SODA '90.

[14] Roberto Grossi, et al. High-order entropy-compressed text indexes, 2003, SODA '03.

[15] Lucian Ilie, et al. A Simple Algorithm for Computing the Lempel Ziv Factorization, 2008, Data Compression Conference (DCC 2008).