High-Order Text Compression on Hierarchical Edge-Guided

High-order word-based modeling can achieve competitive compression ratios by exploiting k-order text statistics. However, it can become impractical because of the large number of relationships between words. This paper focuses on how the 1-order Edge-Guided (E-G) technique can be enhanced to support modeling and coding on high-order text statistics. First, an improved revision of E-G, called E-G1, is presented. Next, a grammar-based construction is used in a first pass to identify significant high-order contexts, which are then used to encode the text with an extended version of the E-G coding scheme. The resulting approach, E-Gk, yields a competitive space/efficiency trade-off with respect to comparable approaches.
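
To make the idea of 1-order word-based modeling concrete, the following is a minimal illustrative sketch, not the E-G, E-G1, or E-Gk algorithm itself. It assumes a simplified scheme in which each word is a context, the words that have followed it are ranked by frequency, and the next word is emitted as its rank in that list. The names `encode`, `decode`, and `ESC` are hypothetical, and the final entropy coding of the ranks is omitted.

```python
from collections import defaultdict

ESC = -1  # hypothetical escape marker: the transition has not been seen before


def _ranked(counts):
    # Followers ordered by descending count; ties keep first-seen order
    # (sorted() is stable and dicts preserve insertion order).
    return sorted(counts, key=lambda w: -counts[w])


def encode(words):
    """Turn a word sequence into ranks (known transition) or (ESC, word) pairs."""
    followers = defaultdict(dict)  # context word -> {next word: count}
    out, prev = [], None           # None acts as the start-of-text context
    for w in words:
        counts = followers[prev]
        if w in counts:
            out.append(_ranked(counts).index(w))
        else:
            out.append((ESC, w))
        counts[w] = counts.get(w, 0) + 1  # adaptive update, mirrored in decode
        prev = w
    return out


def decode(symbols):
    """Rebuild the word sequence by replaying the same adaptive model."""
    followers = defaultdict(dict)
    words, prev = [], None
    for s in symbols:
        counts = followers[prev]
        w = s[1] if isinstance(s, tuple) else _ranked(counts)[s]
        counts[w] = counts.get(w, 0) + 1
        words.append(w)
        prev = w
    return words


text = "to be or not to be or not to be".split()
codes = encode(text)
```

In this sketch, repeated transitions collapse to small ranks, which a statistical coder could then represent in few bits. A k-order extension would use longer word sequences as contexts, which is where the number of relationships grows quickly.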
