Using Word-Sense Disambiguation Methods to Classify Web Queries by Intent

Three methods are proposed to classify queries by intent (CQI), such as navigational, informational, or commercial. As in mixed-initiative dialog systems, search engines should distinguish navigational queries, where the user is taking the initiative, from other queries, where there are more opportunities for system initiative (e.g., suggestions, ads). Query intent classification has a number of useful applications for search engines: it affects how many (if any) advertisements to display, which results to return, and how to arrange the results page. Click logs are used as a substitute for annotation: clicks on ads are evidence for commercial intent; other types of clicks are evidence for other intents. We start with a simple Naive Bayes baseline that works well when there is plenty of training data. When training data is less plentiful, we back off to nearby URLs in a click graph, using a method similar to Word-Sense Disambiguation. Thus, we can infer that the query "designer trench" is commercial because it is close to www.saksfifthavenue.com, which is known to be commercial. The baseline method was designed for precision and the backoff method for recall. Both methods are fast and do not require crawling webpages. As a strong baseline for the CQI task, we recommend a third method, a hybrid of the two, that does no harm when there is plenty of training data and generalizes better when there isn't.
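The abstract does not give the features, thresholds, or exact backoff rule, so the following is only a minimal sketch of the general idea under stated assumptions: a word-level Naive Bayes classifier trained on click labels, a one-hop backoff to the URLs a query clicked on, and a hybrid that picks between them by click count. The toy click log, the function names, the `min_clicks` threshold, and the 0.5 cutoff are all invented for illustration and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

# Toy click log: (query, clicked URL, click type). All rows are invented.
CLICKS = [
    ("cheap flights", "www.expedia.com", "ad"),
    ("cheap flights", "www.expedia.com", "ad"),
    ("cheap hotels", "www.expedia.com", "ad"),
    ("weather boston", "www.weather.com", "organic"),
    ("weather boston", "www.weather.com", "organic"),
    ("weather radar", "www.weather.com", "organic"),
    ("burberry coat", "www.saksfifthavenue.com", "ad"),
    ("saks shoes", "www.saksfifthavenue.com", "ad"),
    ("designer trench", "www.saksfifthavenue.com", "organic"),
]

def label(click_type):
    # Clicks stand in for annotation: ad clicks count as commercial evidence.
    return "commercial" if click_type == "ad" else "other"

class_rows = Counter()              # number of clicks per intent label
word_counts = defaultdict(Counter)  # label -> query word -> count
query_clicks = Counter()            # query -> total clicks observed
query_urls = defaultdict(Counter)   # query -> URL -> clicks (click graph edges)
url_labels = defaultdict(Counter)   # URL -> label -> clicks

for query, url, click_type in CLICKS:
    y = label(click_type)
    class_rows[y] += 1
    word_counts[y].update(query.split())
    query_clicks[query] += 1
    query_urls[query][url] += 1
    url_labels[url][y] += 1

VOCAB = {w for counts in word_counts.values() for w in counts}

def naive_bayes(query):
    """Word-level Naive Bayes with add-one smoothing (assumed feature set)."""
    total_rows = sum(class_rows.values())
    scores = {}
    for y in class_rows:
        total_words = sum(word_counts[y].values())
        score = math.log(class_rows[y] / total_rows)
        for w in query.split():
            score += math.log((word_counts[y][w] + 1) / (total_words + len(VOCAB)))
        scores[y] = score
    return max(scores, key=scores.get)

def backoff(query):
    """Label a query by the click-weighted commercial rate of its clicked URLs."""
    urls = query_urls.get(query)
    if not urls:
        return None  # query has no edges in the click graph
    total = sum(urls.values())
    p = sum(
        n * url_labels[u]["commercial"] / sum(url_labels[u].values())
        for u, n in urls.items()
    ) / total
    return "commercial" if p >= 0.5 else "other"

def hybrid(query, min_clicks=2):
    """Use Naive Bayes when the query is well attested, else back off to URLs."""
    if query_clicks[query] >= min_clicks:
        return naive_bayes(query)
    return backoff(query) or naive_bayes(query)
```

On this toy data, `naive_bayes("designer trench")` sees only one non-ad click and labels the query "other", while `backoff("designer trench")` labels it "commercial" because most clicks on www.saksfifthavenue.com are ad clicks. `hybrid` follows the backoff for that sparse query and keeps the Naive Bayes decision for a query with more clicks, such as "cheap flights".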
