Better OPM/L Text Compression

An OPM/L data compression scheme suggested by Ziv and Lempel, LZ77, is applied to text compression. A slightly modified version suggested by Storer and Szymanski, LZSS, is found to achieve compression ratios as good as those of most existing schemes for a wide range of texts. LZSS decoding is very fast, and comparatively little memory is required for encoding and decoding. Although the time complexity of LZ77 and LZSS encoding is O(M) for a text of M characters, straightforward implementations are very slow. The time-consuming step of these algorithms is the search for the longest string match. Here a binary search tree is used to find the longest string match, and experiments show that this results in a dramatic increase in encoding speed. The binary tree algorithm can also be used to speed up other OPM/L schemes, and other applications where a longest string match is required. Although the LZSS scheme imposes a limit on the length of a match, the binary tree algorithm will work without any limit.