Learning to Coordinate Multiple Reinforcement Learning Agents for Diverse Query Reformulation

We propose a method to efficiently learn diverse strategies in reinforcement learning for query reformulation in the tasks of document retrieval and question answering. In the proposed framework an agent consists of multiple specialized sub-agents and a meta-agent that learns to aggregate the answers from sub-agents to produce a final answer. Sub-agents are trained on disjoint partitions of the training data, while the meta-agent is trained on the full training set. Our method makes learning faster, because it is highly parallelizable, and has better generalization performance than strong baselines, such as an ensemble of agents trained on the full data. We show that the improved performance is due to the increased diversity of reformulation strategies.

[1]  Yoon Kim,et al.  Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[2]  Filip Radlinski,et al.  TREC Complex Answer Retrieval Overview , 2018, TREC.

[3]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[4]  Andrew Yates,et al.  Contextualized PACRR for Complex Answer Retrieval , 2017, TREC.

[5]  John D. Lafferty,et al.  A study of smoothing methods for language models applied to Ad Hoc information retrieval , 2001, SIGIR '01.

[6]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[7]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[8]  Geoffrey E. Hinton,et al.  Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer , 2017, ICLR.

[9]  Gerard de Melo,et al.  PACRR: A Position-Aware Neural IR Model for Relevance Matching , 2017, EMNLP.

[10]  George Kurian,et al.  Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.

[11]  Ashwin K. Vijayakumar,et al.  Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models , 2016, ArXiv.

[12]  Luca Bascetta,et al.  Adaptive Step-Size for Policy Gradient Methods , 2013, NIPS.

[13]  Marc G. Bellemare,et al.  The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res..

[14]  Jeff Clune,et al.  Curiosity Search: Producing Generalists by Encouraging Individuals to Continually Explore and Acquire Skills throughout Their Lifetime , 2016, PloS one.

[15]  Wojciech Zaremba,et al.  OpenAI Gym , 2016, ArXiv.

[16]  Robert A. Jacobs,et al.  Hierarchical Mixtures of Experts and the EM Algorithm , 1993, Neural Computation.

[17]  Thomas G. Dietterich Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..

[18]  Benjamin Van Roy,et al.  Deep Exploration via Bootstrapped DQN , 2016, NIPS.

[19]  W. Bruce Croft,et al.  Relevance-Based Language Models , 2001, SIGIR '01.

[20]  Kenneth O. Stanley,et al.  Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents , 2017, NeurIPS.

[21]  D. Sculley,et al.  Web-scale k-means clustering , 2010, WWW '10.

[22]  William B. Dolan,et al.  Collecting Highly Parallel Data for Paraphrase Evaluation , 2011, ACL.

[23]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[24]  Yoav Freund,et al.  Boosting a weak learning algorithm by majority , 1990, COLT '90.

[25]  Joelle Pineau,et al.  An Actor-Critic Algorithm for Sequence Prediction , 2016, ICLR.

[26]  Razvan Pascanu,et al.  Policy Distillation , 2015, ICLR.

[27]  Ronald J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[28]  Satinder P. Singh,et al.  Reinforcement Learning with a Hierarchy of Abstract Models , 1992, AAAI.

[29]  Alex Graves,et al.  Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.

[30]  Daniel Jurafsky,et al.  A Simple, Fast Diverse Decoding Algorithm for Neural Generation , 2016, ArXiv.

[31]  Richard Socher,et al.  A Deep Reinforced Model for Abstractive Summarization , 2017, ICLR.

[32]  Marc'Aurelio Ranzato,et al.  Sequence Level Training with Recurrent Neural Networks , 2015, ICLR.

[33]  Geoffrey E. Hinton,et al.  Feudal Reinforcement Learning , 1992, NIPS.

[34]  Joelle Pineau,et al.  A Deep Reinforcement Learning Chatbot , 2017, ArXiv.

[35]  Kyunghyun Cho,et al.  Task-Oriented Query Reformulation with Reinforcement Learning , 2017, EMNLP.

[36]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[37]  Kyunghyun Cho,et al.  End-to-End Goal-Driven Web Navigation , 2016, NIPS.

[38]  Wei Wang,et al.  Analyzing Language Learned by an Active Question Answering Agent , 2018, ArXiv.

[39]  Ali Farhadi,et al.  Bidirectional Attention Flow for Machine Comprehension , 2016, ICLR.

[40]  Colin Cherry,et al.  A Systematic Comparison of Smoothing Techniques for Sentence-Level BLEU , 2014, WMT@ACL.

[41]  Zhiyuan Liu,et al.  Denoising Distantly Supervised Open-Domain Question Answering , 2018, ACL.

[42]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[43]  Wei Zhang,et al.  R3: Reinforced Reader-Ranker for Open-Domain Question Answering , 2017, ArXiv.

[44]  Jannis Bulian,et al.  Ask the Right Questions: Active Question Reformulation with Reinforcement Learning , 2017, ICLR.

[45]  Long-Ji Lin,et al.  Reinforcement learning for robots using neural networks , 1992 .

[46]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[47]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[48]  Geoffrey E. Hinton,et al.  Adaptive Mixtures of Local Experts , 1991, Neural Computation.

[49]  Geoffrey E. Hinton,et al.  Large scale distributed neural network training through online distillation , 2018, ICLR.

[50]  Michael Fairbank,et al.  The divergence of reinforcement learning algorithms with value-iteration and function approximation , 2011, The 2012 International Joint Conference on Neural Networks (IJCNN).

[51]  Wei Zhang,et al.  Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering , 2017, ICLR.

[52]  Andrew W. Moore,et al.  Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..

[53]  Kyunghyun Cho,et al.  SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine , 2017, ArXiv.

[54]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[55]  Gerard Salton,et al.  The SMART Retrieval System—Experiments in Automatic Document Processing , 1971 .

[56]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.