Paper information - Making Retrieval Faster Through Document Clustering

This work addresses the problem of reducing the time between query submission and the output of results in a retrieval system. The goal is achieved by considering as small a fraction of the database as possible during retrieval. Our approach is based on a new clustering technique, which we compare with other clustering methods from the literature. Our algorithm is shown to outperform the other techniques: retrieval performance close to that obtained with the whole corpus is achieved by selecting only 5% of the data.
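The general idea of cluster-based retrieval described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the authors' new clustering technique, so this example swaps in plain spherical k-means as a stand-in; the function names, the whitespace tokenizer, the term-frequency weighting, and the toy documents are all assumptions made for illustration only.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term-frequency vector, L2-normalised (assumed weighting)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}

def cosine(u, v):
    """Dot product of two sparse unit vectors, i.e. their cosine similarity."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def centroid(vectors):
    """Normalised sum of the member vectors of one cluster."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values()))
    return {t: w / norm for t, w in total.items()} if norm else {}

def kmeans(vectors, k, iterations=10):
    """Plain spherical k-means (NOT the paper's method); seeds are the first k documents."""
    centroids = [dict(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        assignment = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = centroid(members)
    return centroids, assignment

def retrieve(query, docs, vectors, centroids, assignment, fraction, top_n=3):
    """Score only the documents in the clusters closest to the query.

    Whole clusters are added until at least `fraction` of the corpus is
    covered, so the number of documents examined may exceed the budget.
    """
    q = vectorize(query)
    budget = max(1, math.ceil(fraction * len(docs)))
    ranked = sorted(range(len(centroids)),
                    key=lambda c: cosine(q, centroids[c]), reverse=True)
    candidates = []
    for c in ranked:
        if len(candidates) >= budget:
            break
        candidates.extend(i for i, a in enumerate(assignment) if a == c)
    best = sorted(candidates, key=lambda i: cosine(q, vectors[i]), reverse=True)
    return [docs[i] for i in best[:top_n]], len(candidates)

# Toy corpus (hypothetical data, for illustration only).
docs = [
    "stock market shares fall",
    "football match goal score",
    "market shares rise stock",
    "goal score football team",
]
vectors = [vectorize(d) for d in docs]
centroids, assignment = kmeans(vectors, k=2)
results, examined = retrieve("stock market", docs, vectors,
                             centroids, assignment, fraction=0.5)
```

In this sketch the query is compared with two centroids and then with only the two documents in the nearest cluster, rather than with all four documents. The time saving reported in the abstract comes from the same principle applied at a much smaller fraction of a much larger corpus.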

Alessandro Vinciarelli | David Grangier
