Mining anchor text for query refinement

In large hypertext document collections, ambiguous queries often return too many results. Query refinement is an interactive process of query modification that narrows the scope of the search results. We propose a new method for automatically generating refinements, or related terms, for queries by mining the anchor text of a large hypertext document collection. We show that using anchor text as the basis for query refinement produces high-quality suggestions whose perceived usefulness is significantly higher than that of refinements derived from document content. Furthermore, our study suggests that anchor text refinements can augment traditional query refinement algorithms based on query logs, since the two approaches typically differ in coverage and produce different refinements. Our results are based on experiments with an anchor text collection from a large corporate intranet.
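The abstract does not spell out the mining procedure, so the following is only a minimal sketch of one plausible reading: treat every anchor text that contains all of the query terms plus at least one additional term as a candidate refinement, and rank candidates by how often they occur. The function name `suggest_refinements`, the `(anchor_text, target_url)` input format, and frequency-based ranking are assumptions made for illustration, not the method evaluated in the paper.

```python
from collections import Counter


def suggest_refinements(query, anchors, top_k=5):
    """Rank anchor texts that contain every query term plus at least one more.

    `anchors` is an iterable of (anchor_text, target_url) pairs; this input
    format is a hypothetical one chosen for the sketch.
    """
    query_terms = set(query.lower().split())
    counts = Counter()
    for anchor_text, _target_url in anchors:
        # Normalize case and whitespace so variants are counted together.
        normalized = " ".join(anchor_text.lower().split())
        terms = set(normalized.split())
        # Proper-subset test: the anchor must strictly extend the query.
        if query_terms < terms:
            counts[normalized] += 1
    # Assumption: more frequent anchor texts make better suggestions.
    return [text for text, _ in counts.most_common(top_k)]


# Toy data, invented for illustration only.
anchors = [
    ("Java tutorial", "http://example.com/a"),
    ("java tutorial", "http://example.com/b"),
    ("Java coffee", "http://example.com/c"),
    ("Java", "http://example.com/d"),
    ("Python tutorial", "http://example.com/e"),
]
print(suggest_refinements("java", anchors))
```

On this toy input the ambiguous query "java" yields two candidate refinements, one per sense of the word. A real system would likely also weight candidates by the number of distinct source or target pages rather than by raw counts alone.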
