Contextual Imagined Goals for Self-Supervised Robotic Learning

While reinforcement learning provides an appealing formalism for learning individual skills, a general-purpose robotic system must be able to master an extensive repertoire of behaviors. Instead of learning a large collection of skills individually, can we enable a robot to propose and practice its own behaviors automatically, learning about the affordances and behaviors that it can perform in its environment, such that it can then repurpose this knowledge once a new task is commanded by the user? In this paper, we study this question in the context of self-supervised goal-conditioned reinforcement learning. A central challenge in this learning regime is the problem of goal setting: in order to practice useful skills, the robot must be able to autonomously set goals that are feasible but diverse. When the robot's environment and available objects vary, as they do in most open-world settings, the robot must propose to itself only those goals that it can accomplish in its present setting with the objects that are at hand. Previous work studies self-supervised goal-conditioned RL only in single-environment settings, where goals proposed from the robot's past experience or from a generative model are sufficient. In more diverse settings, this frequently leads to impossible goals and, as we show experimentally, prevents effective learning. We propose a conditional goal-setting model that aims to propose goals that are feasible from the robot's current state. We demonstrate that this enables self-supervised goal-conditioned off-policy learning with raw image observations in the real world, enabling a robot to manipulate a variety of objects and generalize to new objects that were not seen during training.
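To make the conditional goal-setting idea concrete, the sketch below shows one plausible way to realize it: a context-conditioned variational autoencoder in PyTorch that encodes and decodes goal images while conditioning on the robot's current image, so that goals sampled from the prior are decoded against the current scene and tend to be feasible in it. This is a minimal illustration under our own assumptions; the class name `ContextConditionedVAE`, the `propose_goal` helper, the 48x48 image resolution, and all layer sizes are hypothetical and are not taken from the abstract.

```python
# Hypothetical sketch of a context-conditioned goal-setting model.
# The robot's current image acts as the "context": the VAE is trained to
# reconstruct goal images given that context, and at proposal time a latent
# is drawn from the prior and decoded against the current scene.
import torch
import torch.nn as nn

class ContextConditionedVAE(nn.Module):
    def __init__(self, latent_dim: int = 16):
        super().__init__()
        self.latent_dim = latent_dim
        # Encoder sees the goal image concatenated with the context image
        # along the channel axis (3 + 3 = 6 input channels).
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc_mu = nn.LazyLinear(latent_dim)
        self.fc_logvar = nn.LazyLinear(latent_dim)
        # Separate network that embeds the context image; its features are
        # fed to the decoder alongside the latent code.
        self.context_net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(128), nn.ReLU(),
        )
        # Decoder maps (latent code, context features) to a goal image.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + 128, 1024), nn.ReLU(),
            nn.Linear(1024, 3 * 48 * 48), nn.Sigmoid(),
        )

    def forward(self, goal_img, context_img):
        # Standard VAE encode step, with the context concatenated in.
        h = self.encoder(torch.cat([goal_img, context_img], dim=1))
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        c = self.context_net(context_img)
        recon = self.decoder(torch.cat([z, c], dim=1))
        return recon.view(-1, 3, 48, 48), mu, logvar

    @torch.no_grad()
    def propose_goal(self, context_img):
        # Sample a latent from the prior and decode it against the current
        # scene, yielding a goal intended to be reachable from this state.
        z = torch.randn(context_img.size(0), self.latent_dim,
                        device=context_img.device)
        c = self.context_net(context_img)
        return self.decoder(torch.cat([z, c], dim=1)).view(-1, 3, 48, 48)
```

Such a model would be trained with the usual VAE objective (image reconstruction loss plus a KL penalty on the latent), with the important difference that the context image is only conditioned on, never reconstructed, so the latent is free to capture what can vary in the current scene rather than the scene itself.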
