Generating Clarifying Questions for Information Retrieval

Search queries are often short, and the underlying user intent may be ambiguous. This makes it challenging for search engines to predict the possible intents, only one of which may pertain to the current user. To address this issue, search engines often diversify the result list and present documents relevant to multiple intents of the query. An alternative approach is to ask the user a question to clarify her information need. Asking clarifying questions is particularly important for scenarios with “limited bandwidth” interfaces, such as speech-only and small-screen devices. In addition, our user studies and large-scale online experiments show that asking clarifying questions is also useful in web search. Although some recent studies have pointed out the importance of asking clarifying questions, generating them for open-domain search tasks remains unstudied and is the focus of this paper. The task is challenging because training data is lacking, even within major search engines. To mitigate this issue, we first identify a taxonomy of clarification for open-domain search queries by analyzing large-scale query reformulation data sampled from Bing search logs. This taxonomy leads us to a set of question templates and a simple yet effective slot filling algorithm. We then use this model as a source of weak supervision to automatically generate clarifying questions for training. Building on this weak supervision data, we propose supervised and reinforcement learning models for generating clarifying questions. We also investigate methods for generating candidate answers for each clarifying question, so that users can select from a set of pre-defined answers. Human evaluation of the clarifying questions and candidate answers for hundreds of search queries demonstrates the effectiveness of the proposed solutions.
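
As an illustration only, the template-and-slot-filling idea and its use as a weak supervision source might be sketched as follows. The two templates, the toy entity-type lookup, and all function names here are hypothetical and are not taken from the paper; they only show the general shape of filling a question template from a query.

```python
from typing import Dict, List, Tuple

# Hypothetical question templates (assumption: not the paper's actual templates).
TEMPLATE_WITH_TYPE = "What do you want to know about this {entity_type}?"
TEMPLATE_GENERIC = "What do you want to know about {query}?"

# Toy lookup standing in for whatever resource maps a query to an entity type.
ENTITY_TYPES: Dict[str, str] = {
    "jaguar": "animal",
    "python": "programming language",
}


def fill_template(query: str, entity_types: Dict[str, str]) -> str:
    """Pick a template and fill its slot; fall back to the generic template."""
    cleaned = query.strip()
    entity_type = entity_types.get(cleaned.lower())
    if entity_type is not None:
        return TEMPLATE_WITH_TYPE.format(entity_type=entity_type)
    return TEMPLATE_GENERIC.format(query=cleaned)


def weak_supervision_pairs(
    queries: List[str], entity_types: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Build (query, question) pairs that a learned generator could train on."""
    return [(q, fill_template(q, entity_types)) for q in queries]


if __name__ == "__main__":
    for query, question in weak_supervision_pairs(
        ["jaguar", "solar panels"], ENTITY_TYPES
    ):
        print(f"{query!r} -> {question}")
```

In this sketch the template output plays the role of a noisy label: the pairs it produces could serve as training data for a supervised or reinforcement learning generator, which is the role the abstract assigns to weak supervision.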
