Paper Information - Balancing Reinforcement Learning Training Experiences in Interactive Information Retrieval

Interactive Information Retrieval (IIR) and Reinforcement Learning (RL) share many commonalities, including an agent that learns while it interacts, a long-term and complex goal, and an algorithm that explores and adapts. To successfully apply RL methods to IIR, one challenge is obtaining sufficient relevance labels to train RL agents, which are notoriously sample-inefficient. However, in a text corpus annotated for a given query, irrelevant documents, not relevant ones, predominate. This yields highly unbalanced training experiences for the agent and prevents it from learning an effective policy. Our paper addresses this issue by using domain randomization to synthesize more relevant documents for training. Our experimental results on the Text REtrieval Conference (TREC) Dynamic Domain (DD) 2017 Track show that the proposed method boosts an RL agent's learning effectiveness by 22% in dealing with unseen situations.

Grace Hui Yang | Limin Chen | Zhiwen Tang
