Evaluating document clustering for interactive information retrieval

We consider the problem of organizing and browsing the top-ranked portion of the documents returned by an information retrieval system. We study how effectively a document organization helps a user locate the relevant material among the retrieved documents as quickly as possible. In this context we examine a set of clustering algorithms and show experimentally that clustering the retrieved documents can be significantly more effective than the traditional ranked-list approach. We also show that the clustering approach can be as effective as interactive relevance feedback based on query expansion while retaining an important advantage: it gives the user a valuable sense of control over the feedback process.
