Using clustering and classification approaches in interactive retrieval

Satisfying non-trivial information needs involves collecting information from multiple resources and synthesizing an answer that organizes that information. Traditional recall/precision-oriented information retrieval focuses on just one phase of that process: how to efficiently and effectively identify documents likely to be relevant to a specific, focused query. The goal of the TREC Interactive Track is to locate documents that pertain to different instances of a query topic, with no reward for duplicated coverage of topic instances. This task is similar to the task of organizing answer components into a complete answer. Clustering and classification are two mechanisms for organizing documents into groups. In this paper, we present an ongoing series of experiments that test the feasibility and effectiveness of using clustering and classification as an aid to instance retrieval and, ultimately, answer construction. Our results show that users prefer such structured presentations of the candidate result set to a list-based approach. Assessment of the structured organizations based on the subjective judgements of the experiment subjects suggests that the structured organization can be more effective; however, assessment based on objective judgements shows mixed results. These results indicate that a full determination of the success of the approach depends on assessing the quality of the final answers generated by users, rather than on performance during the intermediate stages of answer construction.
