ACTRCE: Augmenting Experience via Teacher's Advice For Multi-Goal Reinforcement Learning

Sparse reward is one of the most challenging problems in reinforcement learning (RL). Hindsight Experience Replay (HER) attempts to address this issue by converting a failed experience into a successful one by relabeling the goal. Despite its effectiveness, HER has limited applicability because it lacks a compact and universal goal representation. We present Augmenting experienCe via TeacheR's adviCE (ACTRCE), an efficient reinforcement learning technique that extends the HER framework by using natural language as the goal representation. We first analyze the differences among goal representations and show that ACTRCE can efficiently solve difficult reinforcement learning problems in challenging 3D navigation tasks, whereas HER with a non-language goal representation fails to learn. We also show that with language goal representations, the agent can generalize to unseen instructions, and even to instructions with unseen lexicons. We further demonstrate that hindsight advice is crucial for solving challenging tasks, and that even a small amount of advice is sufficient for the agent to achieve good performance.
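
To make the hindsight-relabeling idea concrete, below is a minimal sketch of HER-style relabeling with a language teacher. The `Teacher.describe` call, the transition layout, and the replay-buffer interface are hypothetical illustrations for exposition, not the paper's implementation; the only assumption taken from the abstract is that a teacher supplies a natural-language description of what the episode actually achieved, which is then stored as an alternative goal.

```python
import random
from collections import deque


class ReplayBuffer:
    """Minimal FIFO replay buffer for (state, action, reward, next_state, goal) tuples."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))


def relabel_with_advice(episode, original_goal, teacher, buffer):
    """Store an episode twice: once under the original language goal, and once
    relabeled with the teacher's hindsight description of the achieved outcome.

    `teacher.describe(final_state)` is a hypothetical call returning a
    natural-language sentence describing the state the agent actually reached.
    """
    final_state = episode[-1][3]  # next_state of the last transition
    hindsight_goal = teacher.describe(final_state)

    for (state, action, reward, next_state) in episode:
        # Original goal: the sparse reward is kept as observed (often 0 on failure).
        buffer.add((state, action, reward, next_state, original_goal))
        # Hindsight goal: the final transition is rewarded, since the achieved
        # outcome matches the teacher's description by construction.
        hindsight_reward = 1.0 if next_state is final_state else 0.0
        buffer.add((state, action, hindsight_reward, next_state, hindsight_goal))
```

In this sketch the relabeled language goals would be encoded (e.g., by a sentence encoder) and fed to a goal-conditioned policy or value function, so failed episodes still provide positive learning signal.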
