Cluster-based retrieval using language models

Previous research on cluster-based retrieval has been inconclusive as to whether it improves retrieval effectiveness over document-based retrieval. Recent developments in the language modeling approach to IR have motivated us to re-examine this problem within this new retrieval framework. We propose two new models for cluster-based retrieval and evaluate them on several TREC collections. We show that cluster-based retrieval can perform consistently across collections of realistic size, and that significant improvements over document-based retrieval can be obtained in a fully automatic manner, without relevance information provided by humans.
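To make the idea concrete, here is a minimal sketch of one way clusters can enter the query-likelihood framework: each document's language model is smoothed toward a mixture of its cluster's model and the collection model, using Dirichlet-style weighting. This is an illustrative assumption, not necessarily either of the two models proposed in the paper; the function names, the parameters `mu` and `beta`, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def ml_prob(counts: Counter, term: str) -> float:
    """Maximum-likelihood estimate P_ML(term | counts)."""
    total = sum(counts.values())
    return counts[term] / total if total else 0.0


def cluster_smoothed_score(query, doc, cluster, collection, mu=2000.0, beta=0.5):
    """Hypothetical scorer: log P(Q | D), where the document model is smoothed
    toward beta * P(w | cluster) + (1 - beta) * P(w | collection)."""
    if mu <= 0.0:
        raise ValueError("mu must be positive")
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    doc_len = sum(doc.values())
    lam = doc_len / (doc_len + mu)  # Dirichlet-style weight on the document itself
    score = 0.0
    for term in query:
        p_coll = ml_prob(collection, term)
        if p_coll == 0.0:
            continue  # skip query terms that never occur in the collection
        background = beta * ml_prob(cluster, term) + (1.0 - beta) * p_coll
        score += math.log(lam * ml_prob(doc, term) + (1.0 - lam) * background)
    return score


# Hypothetical toy collection with a fixed, hand-made clustering.
docs = {
    "d1": Counter("apple banana".split()),
    "d2": Counter("apple cherry".split()),
    "d3": Counter("dog cat".split()),
}
cluster_of = {"d1": "fruit", "d2": "fruit", "d3": "pets"}
clusters = {}
for doc_id, counts in docs.items():
    # A cluster model here is just the pooled term counts of its members.
    clusters.setdefault(cluster_of[doc_id], Counter()).update(counts)
collection = sum(docs.values(), Counter())

query = ["cherry"]
for beta in (0.0, 0.5):
    scores = {
        d: cluster_smoothed_score(query, docs[d], clusters[cluster_of[d]],
                                  collection, mu=2.0, beta=beta)
        for d in docs
    }
    print(beta, scores)
```

Setting `beta = 0` reduces the score to ordinary document-based query likelihood with Dirichlet smoothing, so `d1` and `d3` tie for the query `cherry`. With `beta = 0.5`, `d1` scores higher than `d3` because its cluster (shared with `d2`) contains the term, even though neither document does.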

W. Bruce Croft | Xiaoyong Liu