
Analyzing Relative Lempel-Ziv Reference Construction

Relative Lempel-Ziv (RLZ) is a popular algorithm designed to compress sets of strings relative to a given reference string, which acts as a kind of dictionary. It can still be applied when there is no obvious natural reference string for a dataset, by sampling substrings from the dataset and concatenating them to obtain an artificial reference. This works well in practice, but a theoretical analysis has been lacking. In this paper we provide such an analysis and verify it experimentally.
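The two ingredients described above can be sketched briefly. The first is parsing a string into phrases copied from a reference. The second is building an artificial reference by concatenating sampled substrings. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `build_reference`, `rlz_parse` and `rlz_decode`, the uniform fixed-length sampling scheme, and the `(char, 0)` literal encoding are all hypothetical choices made here for clarity.

```python
import random


def build_reference(docs, sample_len, num_samples, seed=0):
    """Build an artificial reference by concatenating random substrings.

    Hypothetical scheme: each sample is a substring of fixed length
    `sample_len`, drawn uniformly from the concatenation of all documents
    (so a sample may straddle a document boundary). This is one simple
    sampling strategy, not necessarily the one analyzed in the paper.
    """
    rng = random.Random(seed)
    text = "".join(docs)
    if len(text) <= sample_len:
        return text
    pieces = []
    for _ in range(num_samples):
        start = rng.randrange(len(text) - sample_len + 1)
        pieces.append(text[start:start + sample_len])
    return "".join(pieces)


def rlz_parse(s, ref):
    """Greedy RLZ factorization of `s` relative to `ref`.

    Each phrase is either (pos, length): the longest prefix of the
    remaining suffix of `s` that occurs in `ref`, or (char, 0): a literal
    for a character that does not occur in `ref` at all.
    The match search uses str.find for brevity; practical implementations
    index the reference (e.g. with a suffix array) instead.
    """
    phrases = []
    i = 0
    while i < len(s):
        pos, length = -1, 0
        # Extend the match one character at a time while it still occurs in ref.
        while i + length < len(s):
            p = ref.find(s[i:i + length + 1])
            if p == -1:
                break
            pos, length = p, length + 1
        if length == 0:
            phrases.append((s[i], 0))  # literal: character absent from ref
            i += 1
        else:
            phrases.append((pos, length))
            i += length
    return phrases


def rlz_decode(phrases, ref):
    """Invert rlz_parse: copy each phrase from ref, or emit the literal."""
    return "".join(a if length == 0 else ref[a:a + length] for a, length in phrases)


if __name__ == "__main__":
    print(rlz_parse("abcab", "abc"))  # [(0, 3), (0, 2)]
    docs = ["abracadabra", "cadabra", "abracadabra!"]
    ref = build_reference(docs, sample_len=4, num_samples=3)
    for d in docs:
        phrases = rlz_parse(d, ref)
        assert rlz_decode(phrases, ref) == d
        print(d, "->", len(phrases), "phrases")
```

Roughly speaking, the compressed size is the reference plus the encoded phrases. The number of phrases each document produces against a given sampled reference is therefore the quantity that a choice of sampling parameters affects.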

Simon J. Puglisi | Travis Gagie | Daniel Valenzuela