Training Parse Trees for Efficient VF Coding

We address the problem of improving variable-length-to-fixed-length codes (VF codes), which have favourable properties for fast compressed pattern matching but only moderate compression ratios. The compression ratio of a VF code depends on the parse tree used as its dictionary. We propose a method that trains a parse tree by scanning the input text repeatedly, and we show experimentally that it rapidly improves the compression ratio of VF codes to the level of state-of-the-art compression methods.

Satoshi Yoshida | Takuya Kida | Tatsuya Asai | Seishi Okamoto | Takashi Uemura
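The abstract does not spell out the training procedure, so the Python sketch below is only an illustration of the general idea under our own assumptions. The parse tree is stored as a prefix-closed set of phrases, text is parsed by greedy longest match down the tree, and every parsed phrase is emitted as a fixed-width codeword of ceil(log2(number of nodes)) bits. The training rule (after each scan, grow the tree along edges the parser tried to follow, and swap rarely used leaves for more frequent extensions) is a hypothetical heuristic, not the authors' algorithm. The names `parse`, `train`, `encode`, `decode`, `size`, `passes`, and `max_swaps` are illustrative.

```python
import math
from collections import Counter


def parse(text, phrases):
    """Greedy longest-match parsing against a prefix-closed phrase set (a parse tree)."""
    out, i, n = [], 0, len(text)
    while i < n:
        j = i + 1  # every single symbol is a node, so one symbol always matches
        while j < n and text[i:j + 1] in phrases:
            j += 1  # follow one more edge down the tree
        out.append(text[i:j])
        i = j
    return out


def train(text, size, passes=5, max_swaps=64):
    """Hypothetical training heuristic (not the paper's method): rescan the text
    each pass, then grow/reshape the tree while keeping it prefix-closed and at
    most `size` nodes. Assumes `size` >= number of distinct symbols in `text`."""
    phrases = set(text)  # start from the depth-1 tree: one node per symbol
    for _ in range(passes):
        use, cand = Counter(), Counter()
        pos, n = 0, len(text)
        for p in parse(text, phrases):
            use[p] += 1
            pos += len(p)
            if pos < n:
                cand[p + text[pos]] += 1  # the edge the parser wanted but lacked
        for _ in range(max_swaps):
            # Only children of existing nodes keep the phrase set prefix-closed.
            ext = [c for c in cand if c not in phrases and c[:-1] in phrases]
            if not ext:
                break
            best = max(ext, key=lambda c: (cand[c], c))
            if len(phrases) < size:
                phrases.add(best)  # spare capacity: simply grow the tree
                continue
            parents = {q[:-1] for q in phrases if len(q) > 1}
            leaves = [q for q in phrases
                      if len(q) > 1 and q not in parents and q != best[:-1]]
            if not leaves:
                break
            worst = min(leaves, key=lambda q: (use[q], q))
            if cand[best] <= use[worst]:
                break  # no remaining swap looks worthwhile on this pass
            phrases.remove(worst)  # drop a rarely used leaf...
            phrases.add(best)      # ...and replace it with a frequent extension
    return phrases


def encode(text, phrases):
    """Map each parsed phrase to a fixed-width integer codeword."""
    table = {p: k for k, p in enumerate(sorted(phrases))}
    width = max(1, math.ceil(math.log2(len(table))))
    return [table[p] for p in parse(text, phrases)], width


def decode(codes, phrases):
    """Invert encode() by one table lookup per codeword."""
    table = sorted(phrases)
    return "".join(table[k] for k in codes)


if __name__ == "__main__":
    text = "abracadabra " * 200
    base = set(text)
    tree = train(text, size=64)
    for name, t in (("untrained", base), ("trained", tree)):
        codes, width = encode(text, t)
        print(name, len(codes), "codewords x", width, "bits =", len(codes) * width)
    assert decode(encode(text, tree)[0], tree) == text
```

Because every codeword has the same width, codeword boundaries in the compressed stream can be located without decoding, which is one reason VF codes are convenient for compressed pattern matching. In this sketch the compression ratio is governed entirely by which phrases the tree holds, so repeated scans that lengthen frequently used paths reduce the number of codewords emitted.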