Weighting of Passages in Question Answering

Modern text retrieval systems employ text segmentation when indexing documents. We show that, rather than returning individual passages to the user, combining all passages from a document into a single result with an aggregate similarity score yields significant improvements on the semantic text similarity task on question answering (QA) datasets. Following an analysis of the SemEval-2016 and SemEval-2017 Task 3 datasets, we develop a weighted averaging operator that achieves state-of-the-art results on subtask B and can be integrated into existing search engines. Our results show that segmentation matters in information retrieval: paying attention to important passages through a task-specific weighting method leads to the best results on these QA retrieval tasks.
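
The aggregation step can be illustrated with a minimal sketch. It assumes each document has already been split into passages and that each passage has a similarity score against the query. The function name `aggregate_score`, the example scores, and the weights are hypothetical and chosen only for illustration; the abstract does not specify the weighting scheme that was actually developed.

```python
from typing import Sequence


def aggregate_score(similarities: Sequence[float], weights: Sequence[float]) -> float:
    """Combine per-passage similarity scores into one document-level score.

    This is a plain weighted average. The weights are an assumption of this
    sketch; the task-specific weights from the paper are not reproduced here.
    """
    if len(similarities) != len(weights):
        raise ValueError("similarities and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(w * s for w, s in zip(weights, similarities))
    return weighted_sum / total_weight


# Hypothetical similarity scores for three passages of one document.
passage_similarities = [0.8, 0.5, 0.2]

# Uniform weights reduce to the ordinary mean of the passage scores.
uniform = aggregate_score(passage_similarities, [1.0, 1.0, 1.0])

# Hypothetical weights that emphasize the first passage.
emphasized = aggregate_score(passage_similarities, [2.0, 1.0, 1.0])

print(f"uniform={uniform:.3f} emphasized={emphasized:.3f}")
```

With uniform weights every passage contributes equally; raising the weight of a passage pulls the document score toward that passage's similarity, which is the sense in which a weighting method can pay attention to important passages.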
