An Intent Taxonomy for Questions Asked in Web Search

We present a new, multi-faceted taxonomy for classifying questions asked in web search engines along four facets: the question intent, the types of entities mentioned, the types of question words, and the granularity of the expected answer. Built from an inspection of 1,000 real-life questions issued to a web search engine, the taxonomy reflects recent user search behavior and enables a deeper understanding of user intents, goals, and expected answers. It is more fine-grained than previous query taxonomies and is designed with the ultimate goal of reducing the inherent ambiguity in determining the intent of questions. In addition, we describe the formal procedure for conducting an editorial study of the taxonomy, including its evaluation. The adopted procedure aims to increase assessor agreement without incurring too much overhead. Our results demonstrate that, despite being more fine-grained, the proposed intent categories yield higher agreement between assessors than an existing, commonly used taxonomy.
