Learning to Reinforce Search Effectiveness

Session search is an Information Retrieval (IR) task that handles a series of queries issued for a single search task. In this paper, we propose a novel reinforcement learning (RL) style information retrieval framework and develop a new feedback learning algorithm that models user feedback, including clicks and query reformulations, as reinforcement signals and generates rewards in the RL framework. From a new perspective, we view session search as a cooperative game played between two agents, the user and the search engine. We study the communication between the two agents, who always exchange opinions on "whether the current stage of search is relevant" and "whether we should explore now." The algorithm infers user feedback models from query logs using an EM algorithm. We evaluate our algorithm on the most recent TREC 2012 to 2014 Session Tracks and compare it against several state-of-the-art session search algorithms. The experimental results demonstrate that our approach is highly effective in improving session search accuracy.
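To make the idea of inferring a feedback model with EM more concrete, the following is a minimal sketch, not the model used in this paper. It assumes, purely for illustration, that each query-log entry is reduced to two binary signals (whether the user clicked and whether the user reformulated the query) and that a hidden binary variable stands for the user's opinion that the current stage of search is relevant. The function name `em_feedback_model`, the two-signal encoding, and the initial parameter values are all hypothetical.

```python
def em_feedback_model(observations, n_iter=50):
    """Fit a two-component Bernoulli mixture with EM.

    observations: list of (clicked, reformulated) pairs with 0/1 entries.
    The latent variable z = 1 is read as "the user judged the current
    search stage relevant" (an illustrative assumption, not the paper's).
    Returns (prior, theta) where prior = P(z = 1) and
    theta[z][j] = P(signal j = 1 | z).
    """
    # Asymmetric starting values so the two components can separate.
    prior = 0.5
    theta = [[0.3, 0.6], [0.7, 0.4]]
    n = len(observations)
    for _ in range(n_iter):
        # E-step: posterior responsibility P(z = 1 | observation).
        resp = []
        for obs in observations:
            lik = []
            for z in (0, 1):
                p = prior if z == 1 else 1.0 - prior
                for j, x in enumerate(obs):
                    p *= theta[z][j] if x else 1.0 - theta[z][j]
                lik.append(p)
            resp.append(lik[1] / (lik[0] + lik[1]))
        # M-step: re-estimate parameters from the soft assignments.
        n1 = sum(resp)
        n0 = n - n1
        prior = n1 / n
        for j in range(2):
            s1 = sum(r * obs[j] for r, obs in zip(resp, observations))
            s0 = sum((1.0 - r) * obs[j] for r, obs in zip(resp, observations))
            # Small smoothing keeps probabilities strictly inside (0, 1).
            theta[1][j] = (s1 + 1e-3) / (n1 + 2e-3)
            theta[0][j] = (s0 + 1e-3) / (n0 + 2e-3)
    return prior, theta


# Toy query log: (clicked, reformulated) per query.
log = [(1, 0), (1, 0), (1, 1), (0, 1), (0, 1), (0, 0), (1, 0), (0, 1)]
prior, theta = em_feedback_model(log, n_iter=30)
```

In an RL setting of the kind described above, the fitted posteriors could in principle be turned into reward estimates, but how the paper does this is not specified in the abstract.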
