The Answer is at your Fingertips: Improving Passage Retrieval for Web Question Answering with Search Behavior Data

Passage retrieval is a crucial first step of automatic Question Answering (QA). Existing passage retrieval algorithms are effective at selecting the document passages most similar to the question, or those that contain the expected answer types, but they do not take into account which parts of the document searchers actually found useful. We present what is, to the best of our knowledge, the first successful attempt to incorporate searcher examination data into passage retrieval for question answering. Specifically, we exploit detailed examination data, such as mouse cursor movements and scrolling, to infer the parts of the document the searcher found interesting, and then incorporate this signal into passage retrieval for QA. Our extensive experiments and analysis demonstrate that our method significantly improves passage retrieval compared to using textual features alone. As an additional contribution, we make the code and the search behavior data used in this study available to the research community, in the hope of encouraging further research in this area.
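The idea of combining a textual signal with a behavior-derived signal can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the method of the paper: the `Passage` type, the term-overlap `text_score`, the cursor-position `behavior_score`, and the linear `weight` are hypothetical stand-ins, and the actual features and ranking model are not specified in this abstract.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    text: str
    top: float     # hypothetical: vertical pixel offset where the passage starts
    bottom: float  # hypothetical: vertical pixel offset where the passage ends


def text_score(question: str, passage_text: str) -> float:
    """Toy textual feature: fraction of question terms found in the passage."""
    q_terms = set(question.lower().split())
    p_terms = set(passage_text.lower().split())
    return len(q_terms & p_terms) / len(q_terms) if q_terms else 0.0


def behavior_score(passage: Passage, cursor_ys: list[float]) -> float:
    """Toy behavior feature: share of cursor samples inside the passage's extent."""
    if not cursor_ys:
        return 0.0
    hits = sum(1 for y in cursor_ys if passage.top <= y < passage.bottom)
    return hits / len(cursor_ys)


def rank_passages(question: str, passages: list[Passage],
                  cursor_ys: list[float], weight: float = 0.5) -> list[Passage]:
    """Rank passages by a linear mix of the two scores (an assumed combination)."""
    scored = [
        ((1 - weight) * text_score(question, p.text)
         + weight * behavior_score(p, cursor_ys), p)
        for p in passages
    ]
    # Sort on the score only, so Passage objects never need to be comparable.
    return [p for _, p in sorted(scored, key=lambda sp: sp[0], reverse=True)]
```

In this sketch, two passages that match the question equally well on text alone are separated by where the cursor spent its time; setting `weight` to 0 falls back to the purely textual ranking.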
