A Space-Saving Linear-Time Algorithm for Grammar-Based Compression

We present a space-efficient linear-time approximation algorithm for the grammar-based compression problem, which asks for a smallest context-free grammar deriving a given string. The algorithm uses only O(g* log g*) space and achieves a worst-case approximation ratio of O(log g* log n), where n is the length of the input and g* is the size of an optimum grammar. Experimental results on typical benchmarks demonstrate that our algorithm is practical and efficient.
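To make the problem statement concrete, the following minimal sketch builds a context-free grammar that derives exactly one string and measures its size. It is not the algorithm of this paper: it uses a naive greedy pair replacement (in the spirit of Re-Pair), takes quadratic time in the worst case, and carries no approximation guarantee. The names `build_grammar`, `expand`, and `grammar_size` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def build_grammar(text):
    """Naive greedy pair replacement; an illustration only, NOT the paper's algorithm.

    Terminals are single-character strings, nonterminals are integers.
    Returns (rules, start), where rules maps a nonterminal to a pair of
    symbols and start is the right-hand side of the start rule.
    """
    seq = list(text)
    rules = {}
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no adjacent pair is counted twice, so stop
        name = len(rules)  # fresh nonterminal
        rules[name] = pair
        out, i = [], 0
        while i < len(seq):
            # Replace non-overlapping occurrences of the pair, left to right.
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return rules, seq


def expand(rules, start):
    """Derive the unique string generated by the grammar."""
    out = []
    stack = list(reversed(start))
    while stack:
        sym = stack.pop()
        if isinstance(sym, int):
            left, right = rules[sym]
            stack.append(right)  # pushed first so that left is expanded first
            stack.append(left)
        else:
            out.append(sym)
    return "".join(out)


def grammar_size(rules, start):
    """Grammar size measured as the total length of all right-hand sides."""
    return 2 * len(rules) + len(start)


rules, start = build_grammar("abababab")
print(expand(rules, start), grammar_size(rules, start))
```

For the input `abababab` (n = 8) the sketch produces two rules plus a start rule of length two, so the grammar has size 6. The grammar-based compression problem asks for the smallest such grammar over all grammars deriving the input.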
