Reexamining the cluster hypothesis: scatter/gather on retrieval results

We present Scatter/Gather, a cluster-based document browsing method, as an alternative to ranked titles for organizing and viewing retrieval results. We systematically evaluate Scatter/Gather in this context and find significant improvements over similarity search ranking alone. This result provides evidence for the cluster hypothesis, which states that relevant documents tend to be more similar to each other than to non-relevant documents. We describe a system employing Scatter/Gather and demonstrate that users are able to use this system close to its full potential.
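
As a rough illustration of what the cluster hypothesis claims, the sketch below compares the mean cosine similarity among relevant documents with the mean similarity between relevant and non-relevant documents. It is a minimal toy under stated assumptions, not the evaluation used in the paper: the term-frequency vectors, the whitespace tokenizer, the function name `cluster_hypothesis_gap`, and the example documents are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations, product


def tf_vector(text):
    """Raw term-frequency vector from whitespace tokens (an assumed, simplistic weighting)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def cluster_hypothesis_gap(relevant, non_relevant):
    """Return (mean relevant-relevant similarity, mean relevant-non-relevant similarity)."""
    rel = [tf_vector(d) for d in relevant]
    non = [tf_vector(d) for d in non_relevant]
    within = mean(cosine(a, b) for a, b in combinations(rel, 2))
    across = mean(cosine(a, b) for a, b in product(rel, non))
    return within, across


# Hypothetical toy documents, chosen only to make the contrast visible.
relevant_docs = [
    "document clustering for retrieval",
    "clustering retrieval results",
    "retrieval with document clustering",
]
non_relevant_docs = [
    "cooking pasta at home",
    "home garden tips",
]

within, across = cluster_hypothesis_gap(relevant_docs, non_relevant_docs)
print(f"relevant-relevant: {within:.3f}, relevant-non-relevant: {across:.3f}")
```

If the hypothesis holds for a query, the first value should exceed the second, which is what makes grouping retrieval results into clusters a plausible way to concentrate relevant documents.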
