Exploiting clustering and phrases for context-based information retrieval

This paper explores how the synergy between document clustering and phrasal analysis can be exploited to automatically construct a context-based retrieval system. A context consists of two components: a cluster of logically related articles (its extension) and a small set of salient concepts, represented by words and phrases and organized by the cluster’s key terms (its intension). At run-time, the system presents the contexts that best match the result list of a user’s natural language query. The user can then choose a context and manipulate its intensional component both to browse the context’s extension and to launch new searches over the entire database. We argue that the focused relevance feedback provided by contexts, at a level of abstraction higher than individual documents and lower than the database as a whole, provides a natural way for users to refine vague information needs and helps to blur the distinction between searching and browsing. The Paraphrase interface, running over a database of business-related news articles, is used to illustrate the advantages of such a context-based retrieval paradigm.
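The two-part structure of a context, and the run-time step of matching contexts against a query's result list, can be illustrated with a minimal sketch. Everything named here is hypothetical: the `Context` class, the `rank_contexts` function, and the overlap score are assumptions made for illustration, since the abstract does not say how the system represents contexts or scores the match.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Context:
    """Hypothetical context: an extension (cluster) plus an intension (terms)."""
    name: str
    extension: Set[str]  # IDs of the logically related articles
    intension: Dict[str, List[str]] = field(default_factory=dict)  # key term -> phrases


def rank_contexts(contexts: List[Context], result_ids: List[str], top_k: int = 3) -> List[Context]:
    """Rank contexts by the share of the result list found in each extension.

    The overlap score is an assumed stand-in, not the paper's matching method.
    Contexts with no overlap are dropped.
    """
    results = set(result_ids)
    scored = []
    for ctx in contexts:
        overlap = len(ctx.extension & results)
        if overlap:
            scored.append((overlap / len(results), ctx))
    # Sort on the score only, highest first; ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ctx for _, ctx in scored[:top_k]]


# Toy data: three contexts and a result list of three article IDs.
contexts = [
    Context("mergers", {"d1", "d2", "d3"}, {"merger": ["merger agreement", "hostile merger"]}),
    Context("earnings", {"d4", "d5"}, {"earnings": ["quarterly earnings"]}),
    Context("sports", {"d9"}),
]
ranked = rank_contexts(contexts, ["d1", "d2", "d4"])
print([c.name for c in ranked])
```

In this toy setup the user would be offered the "mergers" and "earnings" contexts, and could then pick phrases from a chosen context's intension either to browse its extension or to seed a new search.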
