Paper information - Improving PPM with Dynamic Parameter Updates

This article makes several improvements to the classic PPM algorithm, resulting in a new algorithm that compresses human text more effectively. Our algorithm differs from classic PPM in three ways: (A) instead of the original escape mechanism, we use a generalised blending method with explicit hyper-parameters that control how symbol counts are combined to form predictions; (B) different hyper-parameters are used for different classes of contexts; and (C) these hyper-parameters are updated dynamically using gradient information. The resulting algorithm (PPM-DP) compresses human text better than all currently published variants of PPM, CTW, DMC, LZ, CSE and BWT, with a runtime only slightly slower than that of classic PPM.
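
The three ideas in the abstract can be illustrated with a small sketch. It is not the authors' PPM-DP implementation, and the abstract does not give the blending formula: the code assumes a discount/strength interpolation in the style of interpolated Kneser-Ney smoothing, treats the context order as the "class" of a context, and estimates the gradient by central finite differences rather than analytically. The names BlendedPPM, alpha, beta, lr and code_length_bits are hypothetical, and the counts are updated at every order (no update exclusion).

```python
import math
from collections import defaultdict


class BlendedPPM:
    """Toy order-K context model with discount/strength blending (illustrative only)."""

    def __init__(self, alphabet, max_order=2, alpha=0.5, beta=0.5, lr=0.01):
        self.alphabet = list(alphabet)
        self.max_order = max_order
        self.lr = lr
        # One (alpha, beta) pair per context order: an assumed notion of "context class".
        self.alpha = [alpha] * (max_order + 1)  # strength parameters
        self.beta = [beta] * (max_order + 1)    # discount parameters
        # counts[context][symbol] = number of times symbol followed context
        self.counts = defaultdict(lambda: defaultdict(int))

    def prob(self, history, symbol):
        """Blend predictions from the shortest context up to the longest one seen."""
        p = 1.0 / len(self.alphabet)  # base distribution: uniform
        for k in range(min(self.max_order, len(history)) + 1):
            table = self.counts.get(history[len(history) - k:])
            if not table:
                break  # an unseen context implies all longer ones are unseen too
            n, u = sum(table.values()), len(table)
            a, b = self.alpha[k], self.beta[k]
            # Assumed blending rule: discounted count plus weighted lower-order estimate.
            p = (max(table.get(symbol, 0) - b, 0.0) + (b * u + a) * p) / (n + a)
        return p

    def update(self, history, symbol, adapt=True, eps=1e-4):
        """Take one gradient-ascent step on log P(symbol), then update the counts."""
        if adapt:
            grads = []
            for params in (self.alpha, self.beta):
                for k in range(self.max_order + 1):
                    old = params[k]
                    params[k] = old + eps
                    up = math.log(self.prob(history, symbol))
                    params[k] = old - eps
                    down = math.log(self.prob(history, symbol))
                    params[k] = old
                    grads.append((params, k, (up - down) / (2 * eps)))
            for params, k, g in grads:
                # Clamp so that the blended values remain a valid distribution.
                lo, hi = (0.01, 10.0) if params is self.alpha else (0.01, 0.99)
                params[k] = min(hi, max(lo, params[k] + self.lr * g))
        for k in range(min(self.max_order, len(history)) + 1):
            self.counts[history[len(history) - k:]][symbol] += 1


def code_length_bits(text, adapt=True, max_order=2):
    """Ideal code length of text (in bits) under the adaptive model."""
    model = BlendedPPM(sorted(set(text)), max_order=max_order)
    bits = 0.0
    for i, s in enumerate(text):
        history = text[max(0, i - max_order):i]
        bits -= math.log2(model.prob(history, s))
        model.update(history, s, adapt=adapt)
    return bits


if __name__ == "__main__":
    sample = "abracadabra" * 20
    print("fixed parameters:  ", code_length_bits(sample, adapt=False))
    print("dynamic parameters:", code_length_bits(sample, adapt=True))
```

Running the script prints the ideal code length of a short repetitive string with fixed and with dynamically updated parameters. Whether the dynamic variant wins on a given input depends on the learning rate and the initial values, so the sketch makes no claim about the results reported for PPM-DP.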
