Do People and Neural Nets Pay Attention to the Same Words: Studying Eye-tracking Data for Non-factoid QA Evaluation

We investigated how users evaluate passage-length answers to non-factoid questions. We conducted a study in which answers were presented to users, sometimes with automatic word highlighting. Users were asked to evaluate answer quality, correctness, completeness, and conciseness. Words in the answer were also annotated, both explicitly through user markup and implicitly through gaze data obtained from eye-tracking. Our results show that the correctness of an answer depends strongly on its completeness, whereas conciseness is less important. Analysis of the annotated words showed that correct and incorrect answers were assessed differently. Automatic highlighting helped users evaluate answers more quickly while maintaining accuracy, particularly when the highlighting was similar to the users' annotations. We fine-tuned a BERT model on a non-factoid QA task to examine whether the model attends to words similar to those annotated. We found such similarity and consequently propose a method that exploits the BERT attention map to generate suggestions that simulate eye gaze during user evaluation.
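As an illustration only, the comparison between model attention and user annotation could be sketched as a set overlap between the most-attended words and the words a user marked. The abstract does not state which similarity measure was used, so the Jaccard coefficient, the helper names `top_k_words` and `jaccard`, and all scores below are assumptions made up for this example.

```python
from typing import Dict, Set


def top_k_words(scores: Dict[str, float], k: int) -> Set[str]:
    """Return the k highest-scoring words (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return {word for word, _ in ranked[:k]}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |a & b| / |a | b|; defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical per-word attention weights for one answer (not data from the study).
# Using words as keys assumes each word occurs once in the answer.
attention = {
    "aspirin": 0.30,
    "reduces": 0.10,
    "fever": 0.25,
    "and": 0.02,
    "pain": 0.20,
    "quickly": 0.13,
}

# Hypothetical set of words a user marked as important.
human_marked = {"aspirin", "reduces", "fever"}

# Compare the model's top words against the same number of human-marked words.
model_top = top_k_words(attention, k=len(human_marked))
overlap = jaccard(model_top, human_marked)
print(sorted(model_top), overlap)
```

In practice the attention weights would come from a fine-tuned model and would have to be aggregated from subword tokens to words before a comparison like this is meaningful; that step is omitted here.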
