Addressing Sample Complexity in Visual Tasks Using HER and Hallucinatory GANs

Reinforcement learning (RL) algorithms typically require millions of environment interactions to learn successful policies in sparse-reward settings. Hindsight Experience Replay (HER) was introduced to improve sample efficiency by relabeling the goals of unsuccessful trajectories so that they can be treated as successes. However, HER cannot be applied directly to visual environments, where goal states are often characterized by the presence of distinct visual features. In this work, we show how visual trajectories can be hallucinated to appear successful by altering agent observations with a generative model trained on relatively few snapshots of the goal. We then combine this model with HER to train RL agents in visual settings. We validate our approach on 3D navigation tasks and a simulated robotics application, showing marked improvement over baselines derived from previous work.
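
For intuition, the sketch below shows one way the ideas above could fit together: a replay buffer that, for failed episodes, adds a hindsight transition whose final observation has been hallucinated to contain the goal. This is a minimal illustrative sketch, not the paper's implementation; the class name HallucinatoryHERBuffer, the hallucinate callable standing in for the generative model, and the sparse success reward of 1.0 are assumptions made for the example.

import random
from collections import deque


class HallucinatoryHERBuffer:
    """Replay buffer sketch combining hindsight relabeling with hallucinated
    visual goals. `hallucinate` stands in for a generative model, trained on a
    few goal snapshots, that edits an observation so it appears to show the goal."""

    def __init__(self, hallucinate, capacity=100_000):
        self.hallucinate = hallucinate  # callable: observation -> goal-like observation
        self.buffer = deque(maxlen=capacity)

    def store_episode(self, episode, success):
        # episode: list of (obs, action, reward, next_obs, done) transitions
        self.buffer.extend(episode)
        if not success:
            # Hindsight pass: pretend the episode ended at the goal by
            # hallucinating goal features into the final observation and
            # giving that transition the sparse success reward.
            obs, action, _, next_obs, _ = episode[-1]
            self.buffer.append((obs, action, 1.0, self.hallucinate(next_obs), True))

    def sample(self, batch_size):
        # Uniformly sample transitions for off-policy training (e.g. DQN/DDPG).
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

In a full system, hallucinated transitions could also be inserted at intermediate time steps, and the generative model would be trained offline from the goal snapshots before RL training begins; both are omitted here for brevity.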
