Comparative Web Search Questions

\begin{abstract}
We analyze comparative questions, i.e., questions asking to compare different items, that were submitted to Yandex in 2012. Responses to such questions might be quite different from the simple ``ten blue links'' and could, for example, aggregate pros and cons of the different options as direct answers. However, changing the result presentation is an intricate decision such that the classification of comparative questions forms a highly precision-oriented task. From a year-long Yandex log, we annotate a random sample of 50,000~questions, 2.8\%~of which are comparative. For these annotated questions, we develop a precision-oriented classifier by combining carefully hand-crafted lexico-syntactic rules with feature-based and neural approaches, achieving a recall of~0.6 at a perfect precision of~1.0. After running the classifier on the full year log (on average, there is at least one comparative question per second), we analyze 6,250~comparative questions using more fine-grained subclasses (e.g., should the answer be a ``simple'' fact or rather a more verbose argument) for which individual classifiers are trained. An important insight is that more than 65\%~of the comparative questions demand argumentation and opinions, i.e., reliable direct answers to comparative questions require more than the facts from a search engine's knowledge graph. In addition, we present a qualitative analysis of the underlying comparative information needs (separated into 14~categories like \emph{consumer electronics} or \emph{health}), their seasonal dynamics, and possible answers from community question answering platforms.
\end{abstract}
