Passage Similarity and Diversification in Non-factoid Question Answering

The rise in popularity of mobile and voice search has led to a shift in focus from document retrieval to the retrieval of short answer passages for non-factoid questions. Some of these questions have multiple answers, and the aim is to retrieve a set of relevant answer passages that covers all of these alternatives. Compared to documents, answers are more specific and typically form more clearly defined types or groups. Grouping answer passages based on strong similarity measures may therefore provide a means of identifying these types. Typically, kNN clustering in combination with term-based representations has been used in Information Retrieval (IR) scenarios. An alternative is to use pre-trained distributional representations such as GloVe and BERT, which capture additional semantic relationships. The recent success of trained neural models on a variety of tasks motivates generating more task-specific representations; however, in the absence of large datasets that incorporate passage-level similarity information, a more feasible alternative is weak supervision based training. The resulting similarity information can then be used to generate a final ranked list of diversified answers using standard diversification algorithms. In this paper, we introduce a new dataset, NFPassageQA_Sim, with human-annotated similarity labels for pairs of answer passages corresponding to each question. These similarity labels are then processed to produce a second dataset, NFPassageQA_Div, which consists of answer types for these questions. Using the similarity labels, we demonstrate the effectiveness of weak supervision signals derived from GloVe for fine-tuning and training a BERT model for the task of answer passage clustering. Finally, we introduce a model that incorporates these clusters into an MMR (Maximal Marginal Relevance) framework and significantly outperforms other diversification baselines on both diversity and relevance metrics.
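
As a rough illustration of the two ideas summarized above, the sketch below first derives a weak similarity label for a passage pair from the cosine similarity of their (for example, averaged GloVe) embeddings, and then performs a greedy MMR-style re-ranking in which redundancy is measured by cluster membership rather than pairwise similarity. All function names, the similarity threshold, and the cluster-penalty formulation are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def weak_similarity_label(vec_a, vec_b, threshold=0.8):
    """Derive a weak 'similar / not similar' label for a passage pair from the
    cosine similarity of their embedding vectors. The threshold is an
    arbitrary placeholder, not a value reported in the paper."""
    cos = float(np.dot(vec_a, vec_b) /
                (np.linalg.norm(vec_a) * np.linalg.norm(vec_b) + 1e-9))
    return 1 if cos >= threshold else 0

def cluster_mmr(query_vec, passage_vecs, passage_clusters, k=10, lam=0.5):
    """Greedy MMR-style re-ranking where the redundancy term is driven by
    cluster membership: passages whose cluster is already represented in the
    selected set are penalized, so the top-k list spreads across answer types."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    candidates = list(range(len(passage_vecs)))
    selected, covered = [], set()
    while candidates and len(selected) < k:
        best, best_score = None, -np.inf
        for i in candidates:
            relevance = cos(query_vec, passage_vecs[i])
            # A passage from a not-yet-covered cluster incurs no redundancy penalty.
            redundancy = 1.0 if passage_clusters[i] in covered else 0.0
            score = lam * relevance - (1.0 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        covered.add(passage_clusters[best])
        candidates.remove(best)
    return selected
```

In this sketch, selecting a passage marks its cluster as covered, so subsequent passages from the same cluster are down-weighted; the model proposed in the paper may combine relevance and cluster coverage differently.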

[1] W. Bruce Croft, et al. Neural Ranking Models with Weak Supervision, 2017, SIGIR.

[2] Jimmy J. Lin, et al. End-to-End Open-Domain Question Answering with BERTserini, 2019, NAACL.

[3] C. J. van Rijsbergen, et al. The use of hierarchic clustering in information retrieval, 1971, Inf. Storage Retr.

[4] Oren Kurland, et al. Corpus structure, language models, and ad hoc information retrieval, 2004, SIGIR '04.

[5] Xueqi Cheng, et al. Learning for search result diversification, 2014, SIGIR.

[6] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[7] Kyunghyun Cho, et al. Passage Re-ranking with BERT, 2019, ArXiv.

[8] W. Bruce Croft, et al. Investigating the Successes and Failures of BERT for Passage Re-Ranking, 2019, ArXiv.

[9] W. Bruce Croft, et al. LDA-based document models for ad-hoc retrieval, 2006, SIGIR.

[10] W. Bruce Croft, et al. End to End Long Short Term Memory Networks for Non-Factoid Question Answering, 2016, ICTIR.

[11] W. Bruce Croft, et al. Learning a Better Negative Sampling Policy with Deep Neural Networks for Search, 2019, ICTIR.

[12] W. Bruce Croft, et al. A Hybrid Embedding Approach to Noisy Answer Passage Retrieval, 2018, ECIR.

[13] Kenton Lee, et al. A BERT Baseline for the Natural Questions, 2019, ArXiv.

[14] W. Bruce Croft, et al. Evaluating Text Representations for Retrieval of the Best Group of Documents, 2008, ECIR.

[15] Oren Kurland, et al. Ranking document clusters using markov random fields, 2013, SIGIR.

[16] Jaap Kamps, et al. Learning to Learn from Weak Supervision by Full Supervision, 2017, ArXiv.

[17] Christopher De Sa, et al. Data Programming: Creating Large Training Sets, Quickly, 2016, NIPS.

[18] W. Bruce Croft, et al. On the Theory of Weak Supervision for Information Retrieval, 2018, ICTIR.

[19] Craig MacDonald, et al. Exploiting query reformulations for web search result diversification, 2010, WWW '10.

[20] Idan Szpektor, et al. Novelty based Ranking of Human Answers for Community Questions, 2016, SIGIR.

[21] Christopher Potts, et al. A large annotated corpus for learning natural language inference, 2015, EMNLP.

[22] Ali Montazeralghaem, et al. Search Result Diversification with Guarantee of Topic Proportionality, 2020, ICTIR.

[23] W. Bruce Croft, et al. Beyond Factoid QA: Effective Methods for Non-factoid Answer Sentence Retrieval, 2016, ECIR.

[24] J. Shane Culpepper, et al. Neural Query Performance Prediction using Weak Supervision from Multiple Signals, 2018, SIGIR.

[25] Ramesh Nallapati, et al. Multi-passage BERT: A Globally Normalized BERT Model for Open-domain Question Answering, 2019, EMNLP.

[26] Brendan T. O'Connor, et al. Exploring Diversification In Non-factoid Question Answering, 2018, ICTIR.

[27] W. Bruce Croft, et al. Cluster-based retrieval using language models, 2004, SIGIR '04.

[28] Samuel R. Bowman, et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, 2017, NAACL.

[29] Samuel R. Bowman, et al. Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks, 2018, ArXiv.

[30] W. Bruce Croft, et al. Term level search result diversification, 2013, SIGIR.

[31] Tetsuya Sakai, et al. Search Result Diversification Based on Hierarchical Intents, 2015, CIKM.

[32] Oren Kurland, et al. Testing the Cluster Hypothesis with Focused and Graded Relevance Judgments, 2018, SIGIR.

[33] W. Bruce Croft, et al. Diversity by proportionality: an election-based approach to search result diversification, 2012, SIGIR '12.

[34] Xiaodong Liu, et al. Multi-Task Deep Neural Networks for Natural Language Understanding, 2019, ACL.

[35] James Allan, et al. A Comparative Study of Utilizing Topic Models for Information Retrieval, 2009, ECIR.

[36] Chris Callison-Burch, et al. Simple PPDB: A Paraphrase Database for Simplification, 2016, ACL.

[37] W. Bruce Croft, et al. ANTIQUE: A Non-factoid Question Answering Benchmark, 2019, ECIR.

[38] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.

[39] Charles L. A. Clarke, et al. Overview of the TREC 2011 Web Track, 2011, TREC.

[40] Di Wang, et al. A Long Short-Term Memory Model for Answer Sentence Selection in Question Answering, 2015, ACL.

[41] Ramesh Nallapati, et al. Passage Ranking with Weak Supervision, 2019, ArXiv.

[42] Xueqi Cheng, et al. Learning Maximal Marginal Relevance Model via Directly Optimizing Diversity Evaluation Measures, 2015, SIGIR.

[43] Oren Kurland, et al. The opposite of smoothing: a language model approach to ranking query-specific document clusters, 2008, SIGIR '08.

[44] Jade Goldstein-Stewart, et al. The use of MMR, diversity-based reranking for reordering documents and producing summaries, 1998, SIGIR '98.