Online Grammar Transformation Based on Re-Pair Algorithm

The Re-Pair algorithm (Re-Pair), proposed by Larsson and Moffat, is a simple grammar-based compression method that achieves a good compression ratio. Although Re-Pair runs in O(n) time and space for an input of length n, it cannot be used with a large input, because it runs offline and consumes substantial memory space. In this paper, we propose an online grammar transformation algorithm based on a modified Re-Pair, along with a compression method using the algorithm. The proposed algorithm runs in O(n log ĥ) time using O(g) space, where g and ĥ are the number of production rules in a grammar and the maximum height of syntax trees generated by the rules, respectively. We implemented our method and demonstrated that it significantly reduces memory usage with little sacrifice of compression ratio in comparison with the original Re-Pair.
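For readers unfamiliar with the baseline, the following is a minimal sketch of the classic offline Re-Pair idea: repeatedly replace the most frequent adjacent pair of symbols with a fresh nonterminal. It is not the online algorithm proposed here, and it does not reach the O(n) bound of Larsson and Moffat's implementation, since it recounts all pairs on every pass. The names `repair`, `expand`, and the `("R", k)` nonterminal encoding are assumptions made for illustration only.

```python
from collections import Counter


def repair(text):
    """Naive offline Re-Pair sketch (illustrative, not linear time).

    Returns (sequence, rules), where rules maps each hypothetical
    nonterminal ("R", k) to the pair of symbols it replaces.
    """
    seq = list(text)
    rules = {}
    next_id = 0
    while True:
        # Count adjacent pairs. Overlapping occurrences (as in "aaa")
        # are counted naively, which a real implementation avoids.
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:
            break  # no pair repeats, so nothing more to gain
        nonterminal = ("R", next_id)  # tuple, so it cannot clash with a character
        next_id += 1
        rules[nonterminal] = pair
        # Replace occurrences of the pair, scanning left to right.
        out = []
        i = 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nonterminal)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Recursively expand a symbol back into the string it derives."""
    if symbol not in rules:
        return symbol  # terminal character
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)
```

Concatenating `expand(s, rules)` over the returned sequence reproduces the input. Because the whole sequence stays in memory and is rescanned on every pass, the sketch also shows why an offline method of this kind becomes impractical for large inputs.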
