Ranking Documents by Answer-Passage Quality

Evidence derived from passages that closely represent likely answers to a posed query can be useful input to the ranking process. Based on a novel use of Community Question Answering data, we present an approach for creating such passages. We propose a general framework for extracting answer passages and estimating their quality, and integrate this evidence into ranking models. Our experiments on two web collections show that quality estimates from answer passages provide a strong indication of document relevance and compare favorably to previous passage-based methods. Combining this evidence with a set of state-of-the-art ranking models, including Quality-Biased Ranking, External Expansion, and a combination of both, yields significant improvements over those models. A final ranking model that incorporates all quality estimates achieves further improvements on both collections.
