Paper information - Reexamining the cluster hypothesis: scatter/gather on retrieval results

Reexamining the cluster hypothesis: scatter/gather on retrieval results

We present Scatter/Gather, a cluster-based document browsing method, as an alternative to ranked titles for the organization and viewing of retrieval results. We systematically evaluate Scatter/Gather in this context and find significant improvements over similarity search ranking alone. This result provides evidence validating the cluster hypothesis, which states that relevant documents tend to be more similar to each other than to non-relevant documents. We describe a system employing Scatter/Gather and demonstrate that users are able to use this system close to its full potential.

Marti A. Hearst | Jan O. Pedersen

[1] Peter Willett,et al. Recent trends in hierarchic document clustering: A critical review , 1988, Inf. Process. Manag..

[2] Ellen M. Voorhees,et al. The cluster hypothesis revisited , 1985, SIGIR '85.

[3] David R. Karger,et al. Scatter/Gather as a Tool for the Navigation of Retrieval Results , 1995 .

[4] Peter Willett,et al. Using interdocument similarity information in document retrieval systems , 1997 .

[5] Jan O. Pedersen,et al. An object-oriented architecture for text retrieval , 1991, RIAO.

[6] P. Willett,et al. Using interdocument similarity information in document retrieval systems , 1997, J. Am. Soc. Inf. Sci..

[7] W. Bruce Croft. A model of cluster searching based on classification , 1980, Inf. Syst..

[8] Marti A. Hearst,et al. Scatter/gather browsing communicates the topic structure of a very large text collection , 1996, CHI.

[9] David R. Karger,et al. Constant interaction-time scatter/gather browsing of very large document collections , 1993, SIGIR.

[10] Ray R. Larson. Experiments in automatic Library of Congress Classification , 1992 .

[11] Gerard Salton,et al. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer , 1989 .