Adaptive Dictionary Sharing Method for Re-Pair Algorithm

We address the problem of applying the Re-Pair algorithm [LM99] to large texts. Re-Pair, proposed by Larsson and Moffat in 1999, is a simple grammar-based compression method that achieves an extremely good compression ratio. However, Re-Pair cannot be applied directly to a large text because it consumes a large amount of memory and runs offline. Dividing the input into a sequence of consecutive smaller blocks and compressing each block separately solves this problem, but it worsens the compression ratio. In our previous research, we proposed a method that shares part of the Re-Pair dictionary among blocks to reduce this loss, where the shared dictionary is constructed once at the beginning. In this paper, we present a new method that adaptively reconstructs the shared dictionary. We implemented our method and investigated its performance through several experiments. The results show that our method runs much faster than an existing method and achieves a compression ratio comparable to those of well-known compression tools.
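For readers unfamiliar with Re-Pair, the core idea can be illustrated with a minimal sketch: repeatedly replace the most frequent pair of adjacent symbols with a fresh nonterminal until no pair occurs twice. The code below is a naive, quadratic-time illustration only. It is not the linear-time implementation of Larsson and Moffat, and it does not implement the block division or dictionary sharing discussed in this paper. The function names (`count_pairs`, `replace_pair`, `repair_compress`, `repair_expand`) and the choice of integers as nonterminals are assumptions made for this example.

```python
from collections import Counter


def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent pair."""
    counts = Counter()
    skip = False
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        # In a run such as "aaa", the pair at i overlaps the one at i - 1.
        if skip and pair == (seq[i - 1], seq[i]):
            skip = False
            continue
        counts[pair] += 1
        skip = pair[0] == pair[1]
    return counts


def replace_pair(seq, pair, symbol):
    """Replace occurrences of `pair` with `symbol`, scanning left to right."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def repair_compress(text):
    """Return (sequence, rules); rules[k] is the pair for nonterminal k."""
    seq = list(text)   # terminals are single-character strings
    rules = []         # nonterminals are integer indices into this list
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:   # no pair repeats, so no further rule pays off
            break
        seq = replace_pair(seq, pair, len(rules))
        rules.append(pair)
    return seq, rules


def repair_expand(seq, rules):
    """Expand nonterminals back into the original text."""
    out = []
    stack = list(reversed(seq))
    while stack:
        sym = stack.pop()
        if isinstance(sym, int):
            left, right = rules[sym]
            stack.append(right)
            stack.append(left)
        else:
            out.append(sym)
    return "".join(out)
```

Here `rules` plays the role of the dictionary. In a block-wise setting, each block would produce its own `rules` list, which is the redundancy that dictionary sharing aims to reduce.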

[1] A. Moffat, et al. Offline dictionary-based compression, 2000, Proceedings DCC'99 Data Compression Conference (Cat. No. PR00096).

[2] Alistair Moffat, et al. Block Merging for Off-Line Compression, 2002, CPM.