Paper Information - Fast Text Compression with Neural Networks

Fast Text Compression with Neural Networks

Neural networks have the potential to extend data compression algorithms beyond the character-level n-gram models now in use, but have usually been avoided because they are too slow to be practical. We introduce a model that produces better compression than popular Lempel-Ziv compressors (zip, gzip, compress), and is competitive in time, space, and compression ratio with PPM and Burrows-Wheeler algorithms, currently the best known. The compressor, a bit-level predictive arithmetic encoder using a 2-layer, 4 × 10^6 by 1 network, is fast (about 10^4 characters/second) because only 4-5 connections are simultaneously active and because it uses a variable learning rate optimized for one-pass training.
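
The idea in the abstract, predicting one bit at a time with a single-output network in which only a handful of inputs are active, can be illustrated with a minimal sketch. This is not the paper's implementation: the hashing scheme, the table size `N_WEIGHTS`, the context orders, and the learning-rate schedule are all assumptions chosen for brevity, and the arithmetic coder is replaced by a running total of the ideal code length, -log2 of the probability assigned to each bit.

```python
import math

N_WEIGHTS = 1 << 16  # assumed table size; much smaller than the 4 x 10^6 in the abstract
MAX_ORDER = 4        # assumed: one active input per context of the last 1..4 bytes
MIN_RATE = 0.05      # assumed floor for the per-weight learning rate


def squash(x):
    """Logistic function: maps a weighted sum to a probability."""
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def context_hash(order, ctx, partial):
    """Deterministic FNV-style hash of (order, context bytes, partial byte)."""
    h = order
    for b in ctx:
        h = (h * 16777619 ^ b) & 0xFFFFFFFF
    h = (h * 16777619 ^ partial) & 0xFFFFFFFF
    return (h ^ (h >> 15)) % N_WEIGHTS


def active_inputs(history, partial):
    """Select the few weight indices that are active for the next bit."""
    return [context_hash(order, bytes(history[-order:]), partial)
            for order in range(1, MAX_ORDER + 1)]


def estimate_compressed_bits(data):
    """One-pass online prediction; returns the ideal code length in bits."""
    weights = [0.0] * N_WEIGHTS
    counts = [0] * N_WEIGHTS
    history = bytearray()
    total_bits = 0.0
    for byte in data:
        partial = 1  # bits of the current byte seen so far, with a leading 1
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            idxs = active_inputs(history, partial)
            # Output is the squashed sum of the active weights only.
            p1 = squash(sum(weights[i] for i in idxs))
            p1 = min(max(p1, 1e-6), 1.0 - 1e-6)
            total_bits += -math.log2(p1 if bit else 1.0 - p1)
            # Online update: rate shrinks as a weight is seen more often
            # (an assumed schedule, not the one used in the paper).
            err = bit - p1
            for i in idxs:
                counts[i] += 1
                weights[i] += err * max(MIN_RATE, 1.0 / counts[i])
            partial = (partial << 1) | bit
        history.append(byte)
        del history[:-MAX_ORDER]  # keep only the bytes the contexts need
    return total_bits


if __name__ == "__main__":
    sample = b"the quick brown fox jumps over the lazy dog. " * 50
    bits = estimate_compressed_bits(sample)
    print(f"{bits / len(sample):.3f} bits per character")
```

A real compressor would feed each predicted probability to an arithmetic coder and run the identical model in the decoder; the sketch only reports how many bits such a coder would ideally need.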

Matthew V. Mahoney
