Learning Query Ambiguity Models by Using Search Logs

Abstractdentifying ambiguous queries is crucial to research on personalized Web search and search result diversity. Intuitively, query logs contain valuable information on how many intentions users have when issuing a query. However, previous work showed user clicks alone are misleading in judging a query as being ambiguous or not. In this paper, we address the problem of learning a query ambiguity model by using search logs. First, we propose enriching a query by mining the documents clicked by users and the relevant follow up queries in a session. Second, we use a text classifier to map the documents and the queries into predefined categories. Third, we propose extracting features from the processed data. Finally, we apply a state-of-the-art algorithm, Support Vector Machine (SVM), to learn a query ambiguity classifier. Experimental results verify that the sole use of click based features or session based features perform worse than the previous work based on top retrieved documents. When we combine the two sets of features, our proposed approach achieves the best effectiveness, specifically 86% in terms of accuracy. It significantly improves the click based method by 5.6% and the session based method by 4.6%.

[1]  Ellen M. Voorhees,et al.  Using WordNet to disambiguate word senses for text retrieval , 1993, SIGIR.

[2]  David R. Karger,et al.  Less is More Probabilistic Models for Retrieving Fewer Relevant Documents , 2006 .

[3]  Ji-Rong Wen,et al.  WWW 2007 / Track: Search Session: Personalization A Largescale Evaluation and Analysis of Personalized Search Strategies ABSTRACT , 2022 .

[4]  Yong Yu,et al.  Identification of ambiguous queries in web search , 2009, Inf. Process. Manag..

[5]  Filip Radlinski,et al.  Improving personalized web search using result diversification , 2006, SIGIR.

[6]  John C. Platt,et al.  Fast training of support vector machines using sequential minimal optimization, advances in kernel methods , 1999 .

[7]  W. Bruce Croft,et al.  Lexical ambiguity and information retrieval , 1992, TOIS.

[8]  John D. Lafferty,et al.  A risk minimization framework for information retrieval , 2006, Inf. Process. Manag..

[9]  Luis Gravano,et al.  Categorizing web queries according to geographical locality , 2003, CIKM '03.

[10]  Ying Li,et al.  Detecting online commercial intention (OCI) , 2006, WWW '06.

[11]  Qiang Yang,et al.  Q2C@UST: our winning solution to query classification in KDDCUP 2005 , 2005, SKDD.

[12]  Sreenivas Gollapudi,et al.  Diversifying search results , 2009, WSDM '09.

[13]  Mark Sanderson,et al.  Ambiguous queries: test collections need more sense , 2008, SIGIR '08.

[14]  Enhong Chen,et al.  Context-aware query suggestion by mining click-through and session data , 2008, KDD.

[15]  Jade Goldstein-Stewart,et al.  The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries , 1998, SIGIR Forum.

[16]  Vladimir Vapnik,et al.  Principles of Risk Minimization for Learning Theory , 1991, NIPS.

[17]  Jianhua Lin,et al.  Divergence measures based on the Shannon entropy , 1991, IEEE Trans. Inf. Theory.

[18]  Zhenyu Liu,et al.  Automatic identification of user goals in Web search , 2005, WWW '05.

[19]  Ying Li,et al.  KDD CUP-2005 report: facing a great challenge , 2005, SKDD.