Teach Machine How to Read: Reading Behavior Inspired Relevance Estimation

Retrieval models aim to estimate the relevance of a document to a given query. Although existing retrieval models have achieved considerable success both in deepening our understanding of information-seeking behavior and in building practical retrieval systems (e.g., Web search engines), they work quite differently from how humans make relevance judgments. In this paper, we reexamine existing models and propose new ones based on findings about how humans read documents during relevance judgment. First, we summarize a number of reading heuristics from practical user behavior patterns and categorize them into implicit and explicit heuristics. Reviewing a variety of existing retrieval models, we find that most of them satisfy only some of these reading heuristics. To evaluate the effectiveness of each heuristic, we conduct an ablation study and find that most heuristics have a positive impact on retrieval performance. We then integrate all the effective heuristics into a new retrieval model named Reading Inspired Model (RIM). Specifically, implicit reading heuristics are incorporated into the model framework, while explicit reading heuristics are modeled as a Markov Decision Process and learned by reinforcement learning. Experimental results on a large-scale publicly available benchmark dataset and two test sets from the NTCIR WWW tasks show that RIM outperforms most existing models, which illustrates the effectiveness of the reading heuristics. We believe that this work contributes to constructing retrieval models with both higher retrieval performance and better explainability.
