Automatic query refinement using lexical affinities with maximal information gain

This work describes an automatic query refinement technique that focuses on improving the precision of the top-ranked documents. The terms used for refinement are lexical affinities (LAs): pairs of closely related words that contain exactly one of the original query terms. Adding these terms to the query is equivalent to re-ranking the search results, so precision improves while recall is preserved. We describe a novel method that selects the most "informative" LAs for refinement, namely those LAs that best separate relevant documents from irrelevant documents in the set of results. The information gain of candidate LAs is determined using an unsupervised estimation based on the scoring function of the search engine. The method is thus fully automatic, and its quality depends on the quality of the scoring function. Experiments we conducted with TREC data clearly show a significant improvement in the precision of the top-ranked documents.
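
As a rough illustration of the selection step, the sketch below ranks candidate LAs by information gain over a result set. It is not the estimator used in this work. It assumes that each result carries a relevance probability already derived from the search engine score (for example, by normalizing scores to [0, 1]), and that the entropy of a document set is the binary entropy of its mean relevance probability. The names `information_gain` and `best_las`, and the data layout, are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def set_entropy(probs):
    """Assumed set entropy: binary entropy of the mean relevance probability."""
    if not probs:
        return 0.0
    return binary_entropy(sum(probs) / len(probs))


def information_gain(results, la):
    """Entropy reduction from splitting results on the presence of one LA.

    results: list of (relevance_prob, set_of_las) pairs, one per document.
    la: a candidate lexical affinity, represented here as a word pair.
    """
    n = len(results)
    if n == 0:
        return 0.0
    all_probs = [p for p, _ in results]
    with_la = [p for p, las in results if la in las]
    without_la = [p for p, las in results if la not in las]
    return (set_entropy(all_probs)
            - len(with_la) / n * set_entropy(with_la)
            - len(without_la) / n * set_entropy(without_la))


def best_las(results, candidates, k=1):
    """Return the k candidate LAs with the highest information gain."""
    ranked = sorted(candidates,
                    key=lambda la: information_gain(results, la),
                    reverse=True)
    return ranked[:k]


# Toy result set: two likely relevant and two likely irrelevant documents.
LA_A = ("query", "refinement")   # occurs only in the likely relevant documents
LA_B = ("query", "language")     # occurs in one document of each kind
results = [
    (0.9, {LA_A, LA_B}),
    (0.9, {LA_A}),
    (0.1, {LA_B}),
    (0.1, set()),
]
top = best_las(results, [LA_A, LA_B], k=1)
```

In this toy data, `LA_A` splits the likely relevant documents from the likely irrelevant ones, so it gets a higher gain than `LA_B`, which occurs equally on both sides. How scores map to relevance probabilities is left open here and would depend on the search engine's scoring function.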
