Ambiguous queries: test collections need more sense

Although there are many papers examining ambiguity in Information Retrieval, this paper shows that there is a whole class of ambiguous words that past research has barely explored. This class is shown to be more ambiguous than other word types and to be commonly used in queries. The lack of test collections containing ambiguous queries is highlighted, and a method for creating such collections from existing resources is described. Tests using the new collection show the impact of query ambiguity on an IR system: conventional systems are incapable of dealing effectively with such queries, and current assumptions about how to improve search effectiveness do not hold when searching on this common query type.
