What Users Ask a Search Engine: Analyzing One Billion Russian Question Queries

We analyze the question queries submitted to a large commercial web search engine to gain insight into what people ask and to better tailor search results to users' needs. Based on a dataset of about one billion question queries submitted during 2012, we investigate askers' querying behavior with the support of automatic query categorization. While the importance of question queries is likely to increase, they currently make up only 3-4% of total search traffic. Because questions are such a small part of the query stream, and are more likely to be unique than shorter queries, click-through information for them is typically sparse. As a result, query categorization methods based on the categories of clicked web documents do not work well for questions. As an alternative, we propose a robust question query classification method that uses labeled questions from a large community question answering (CQA) platform as a training set. The resulting classifier is then transferred to the web search questions. Even though questions on CQA platforms tend to differ from web search questions, our categorization method proves competitive with strong baselines in terms of classification accuracy. To show the scalability of the proposed method, we apply the classifiers to about one billion question queries and discuss the trade-offs between performance and accuracy that different classification models offer.
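The transfer idea can be illustrated with a minimal sketch: train a text classifier on category-labeled CQA questions, then apply it unchanged to web search question queries. The multinomial Naive Bayes model, the whitespace tokenizer, the function names (`tokenize`, `train_nb`, `classify`), and the toy English examples are all assumptions made for illustration; they are not the classifiers, features, or data used in this work.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train_nb(labeled_questions):
    """Collect per-category counts from (question_text, category) pairs."""
    class_counts = Counter()            # number of training questions per category
    word_counts = defaultdict(Counter)  # token counts per category
    vocab = set()
    for text, category in labeled_questions:
        class_counts[category] += 1
        for token in tokenize(text):
            word_counts[category][token] += 1
            vocab.add(token)
    return class_counts, word_counts, vocab


def classify(query, model):
    """Return the most probable category for a query under Naive Bayes."""
    class_counts, word_counts, vocab = model
    total_questions = sum(class_counts.values())
    best_category, best_score = None, -math.inf
    for category in class_counts:
        # Log prior of the category.
        score = math.log(class_counts[category] / total_questions)
        # Denominator for add-one (Laplace) smoothing.
        denom = sum(word_counts[category].values()) + len(vocab)
        for token in tokenize(query):
            if token in vocab:  # tokens unseen in training are skipped
                score += math.log((word_counts[category][token] + 1) / denom)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Toy stand-in for category-labeled CQA questions (invented examples).
cqa_questions = [
    ("how to cook rice in a pan", "food"),
    ("what is the best recipe for soup", "food"),
    ("how to install windows on a laptop", "computers"),
    ("why is my laptop so slow", "computers"),
]
model = train_nb(cqa_questions)

# The trained model is applied as-is to (hypothetical) web search question queries.
print(classify("how to cook soup", model))
print(classify("why is my laptop slow", model))
```

A count-based model of this kind needs only one pass over the training data and a few lookups per query, which is the sort of cost profile that matters when classifying on the order of a billion queries; more accurate models generally cost more per query.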
