
Information Retrieval Systems for Large Document Collections

Practical information retrieval systems must manage large volumes of data, often divided into several collections that may be held on separate machines. Techniques for locating matches to queries must therefore identify the collections likely to hold answers as well as the documents that are probable answers. Furthermore, the large amounts of data involved motivate the use of compression, but in a dynamic environment compression is problematic: as new text is added, the compression model gradually becomes inappropriate for the text it must code. In this paper we describe solutions to both of these problems. We show that the use of centralised blocked indexes can reduce overall query processing costs in a multi-collection environment, and that careful application of text compression techniques allows collections to grow by several orders of magnitude without recompression becoming necessary.
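
The abstract does not spell out the compression scheme, so the following is only a minimal sketch of the general idea under stated assumptions. A semi-static, word-based Huffman model is built once from an initial sample of the collection and reserves a hypothetical escape symbol for words absent from that sample. Later text is coded against the fixed model, so adding text never forces recompression; drift in the collection shows up as more escaped words, each spelled out at higher cost. The names `build_model`, `encoded_bits`, and `ESCAPE`, the weighting of the escape symbol by vocabulary size, and the plain 8-bit spelling of novel words are all illustrative assumptions, not the authors' method.

```python
import heapq
import itertools
import re
from collections import Counter

# Hypothetical escape symbol for words that are not in the model.
# It cannot collide with a real token because tokens are [a-z0-9]+.
ESCAPE = "<ESC>"


def tokenize(text):
    """Split text into lowercase alphanumeric words (spaces and punctuation are ignored here)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a minimum-redundancy (Huffman) code."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    tie = itertools.count()  # tie-breaker so heapq never has to compare the dicts
    heap = [(f, next(tie), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; every symbol in them moves one level deeper.
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in itertools.chain(a.items(), b.items())}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def build_model(sample_text):
    """Build a semi-static word model from an initial sample, reserving an escape code."""
    counts = Counter(tokenize(sample_text))
    # Assumption: weight the escape symbol by the number of distinct sample words.
    counts[ESCAPE] = max(1, len(counts))
    return huffman_code_lengths(counts)


def encoded_bits(model, text, bits_per_char=8):
    """Estimate the compressed size of text under a fixed (never rebuilt) model.

    Known words cost their Huffman code length. Novel words cost the escape
    code plus a plain spelling: bits_per_char per character plus a terminator.
    Returns (total_bits, number_of_novel_words).
    """
    total = 0
    novel = 0
    for word in tokenize(text):
        if word in model:
            total += model[word]
        else:
            novel += 1
            total += model[ESCAPE] + bits_per_char * (len(word) + 1)
    return total, novel


if __name__ == "__main__":
    model = build_model("the index maps each term to the documents that contain the term")
    for text in ("the term maps to the index", "compression of new collections"):
        bits, novel = encoded_bits(model, text)
        print(f"{text!r}: {bits} bits, {novel} novel words")
```

Text drawn from the sample's vocabulary codes cheaply, while text full of new terms costs more per word, so compression effectiveness degrades gradually rather than failing outright. A practical system would code novel words with a better character-level model than plain 8-bit spelling.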

Alistair Moffat | Justin Zobel