Coding for compression in full-text retrieval systems

Witten, Bell and Nevill (see ibid., p.23, 1991) have described compression models for use in full-text retrieval systems. The authors discuss alternative coding methods for use with the same models, and give results showing that their scheme achieves virtually identical compression while decoding more than forty times faster. A main feature of their implementation is the complete absence of arithmetic coding, which is part of the reason for its high speed. The implementation is also particularly suited to slow devices such as CD-ROM, because answering a query requires one disk access for each term in the query and one disk access for each answer. All words and numbers are indexed, and there are no stop words. The authors have built two compressed databases.
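The abstract does not say which codes replace arithmetic coding. As an illustration only, the sketch below shows a Golomb code, one of the bit-oriented prefix codes cited in the reference list, which can be decoded with a few shifts and comparisons and no arithmetic coder. The function names `golomb_encode` and `golomb_decode`, the bit-string representation, and the parameter `m` are assumptions made for this example and are not taken from the paper.

```python
def golomb_encode(n, m):
    """Encode a non-negative integer n with Golomb parameter m >= 1.

    Returns a string of '0'/'1' characters (illustrative, not packed bits).
    """
    if n < 0 or m < 1:
        raise ValueError("need n >= 0 and m >= 1")
    q, r = divmod(n, m)
    bits = "1" * q + "0"              # quotient in unary, terminated by 0
    if m == 1:
        return bits                   # no remainder bits are needed
    b = (m - 1).bit_length()          # ceil(log2(m))
    cutoff = (1 << b) - m             # remainders below this use b-1 bits
    if r < cutoff:
        bits += format(r, "0{}b".format(b - 1))
    else:
        bits += format(r + cutoff, "0{}b".format(b))
    return bits


def golomb_decode(bits, m, pos=0):
    """Decode one integer starting at bits[pos]; return (value, next_pos)."""
    q = 0
    while bits[pos] == "1":           # count the unary quotient
        q += 1
        pos += 1
    pos += 1                          # skip the terminating 0
    if m == 1:
        return q, pos
    b = (m - 1).bit_length()
    cutoff = (1 << b) - m
    r = int(bits[pos:pos + b - 1], 2) if b > 1 else 0
    pos += b - 1
    if r >= cutoff:                   # long codeword: read one more bit
        r = (r << 1) + int(bits[pos]) - cutoff
        pos += 1
    return q * m + r, pos


def decode_all(bits, m):
    """Decode a concatenation of Golomb codewords into a list of integers."""
    values, pos = [], 0
    while pos < len(bits):
        value, pos = golomb_decode(bits, m, pos)
        values.append(value)
    return values
```

For example, `decode_all("".join(golomb_encode(n, 3) for n in [3, 1, 7, 2]), 3)` recovers the original list. Each codeword is decoded by inspecting individual bits, which is the kind of operation that tends to be much cheaper than the interval arithmetic an arithmetic decoder performs per symbol.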

[1] Ian H. Witten, et al. Arithmetic coding for data compression, 1987, CACM.

[2] David C. van Voorhis, et al. Optimal source codes for geometrically distributed integer alphabets (Corresp.), 1975, IEEE Trans. Inf. Theory.

[3] S. Golomb. Run-length encodings, 1966.

[4] Robert G. Gallager, et al. Variations on a theme by Huffman, 1978, IEEE Trans. Inf. Theory.

[5] D. Huffman. A Method for the Construction of Minimum-Redundancy Codes, 1952.

[6] Ian H. Witten, et al. Models for compression in full-text retrieval systems, 1991, [1991] Proceedings. Data Compression Conference.

[7] Daniel S. Hirschberg, et al. Efficient decoding of prefix codes, 1990, CACM.

[8] Alistair Moffat, et al. Implementing the PPM data compression scheme, 1990, IEEE Trans. Commun.

[9] Alistair Moffat, et al. Word‐based text compression, 1989, Softw. Pract. Exp.

[10] Solomon W. Golomb, et al. Run-length encodings (Corresp.), 1966, IEEE Trans. Inf. Theory.

[11] M. D. McIlroy, et al. Development of a Spelling List, 1982, IEEE Trans. Commun.