Beyond Factoid QA: Effective Methods for Non-factoid Answer Sentence Retrieval

Retrieving finer-grained text units, such as passages or sentences, as answers for non-factoid Web queries is becoming increasingly important for applications such as mobile Web search. In this work, we introduce the answer sentence retrieval task for non-factoid Web queries and investigate how this task can be solved effectively under a learning-to-rank framework. We design two types of features, namely semantic and context features, that go beyond traditional text matching features. We compare learning-to-rank methods with multiple baseline methods, including query likelihood and the state-of-the-art convolutional neural network based method, on an answer-annotated version of the TREC GOV2 collection. Results show that features used previously to retrieve topical sentences and factoid answer sentences are not sufficient for retrieving answer sentences for non-factoid queries, but that with semantic and context features we can significantly outperform the baseline methods.
