Deriving query intents from web search engine queries

The purpose of this article is to test the reliability of query intents derived from queries, either by the user who entered the query or by another juror. We report the findings of three studies. First, we conducted a large-scale classification study (~50,000 queries) using a crowdsourcing approach. Next, we used clickthrough data from a search engine log to validate the judgments given by the jurors in the crowdsourcing study. Finally, we conducted an online survey on a commercial search engine's portal. Because we used the same queries in all three studies, we were also able to compare the results and the effectiveness of the different approaches. We found that neither the crowdsourcing approach, in which jurors classified queries originating from other users, nor the questionnaire approach, in which searchers were asked about the query they had just entered into a Web search engine, led to satisfactory results. This leads us to conclude that there was little understanding of the classification tasks, even though both groups of jurors were given detailed instructions. Although we used manual classification, our research also has important implications for automatic classification. We must question the success of approaches that use automatic classification and compare its performance to a baseline from human jurors. © 2012 Wiley Periodicals, Inc.
