IRIS: Implicit Reinforcement without Interaction at Scale for Learning Control from Offline Robot Manipulation Data

Learning from offline task demonstrations is a problem of great interest in robotics. For simple short-horizon manipulation tasks with modest variation in task instances, offline learning from a small set of demonstrations can produce controllers that successfully solve the task. However, leveraging a fixed batch of data can be problematic for larger datasets and longer-horizon tasks with greater variation: the data can exhibit substantial diversity and consist of suboptimal solution approaches. In this paper, we propose Implicit Reinforcement without Interaction at Scale (IRIS), a novel framework for learning from large-scale demonstration datasets. IRIS factorizes the control problem into a goal-conditioned low-level controller that imitates short demonstration sequences and a high-level goal selection mechanism that sets goals for the low-level controller and selectively combines parts of suboptimal solutions, leading to more successful task completions. We evaluate IRIS across three datasets, including the RoboTurk Cans dataset collected by humans via crowdsourcing, and show that performant policies can be learned purely offline. Additional results are available at https://sites.google.com/stanford.edu/iris/.
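The abstract's factorization can be illustrated with a minimal sketch of the control loop: a high-level mechanism proposes candidate goals and selects the most promising one, and a goal-conditioned low-level controller then imitates short demonstration segments toward that goal. The module names (GoalProposer, ValueFunction, GoalConditionedPolicy), the network shapes, and the rollout horizon T below are illustrative assumptions, not the paper's exact architecture or API.

```python
# Minimal sketch of the IRIS control loop described above (assumed interfaces).
import torch
import torch.nn as nn

class GoalProposer(nn.Module):
    """Samples candidate future states (goals) conditioned on the current state."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, state_dim))

    def forward(self, state, n_samples=8):
        # Crude stand-in for a generative goal model: perturb a predicted
        # future state with noise to get a set of candidate goals.
        noise = torch.randn(n_samples, state.shape[-1])
        return self.net(state) + noise

class ValueFunction(nn.Module):
    """Scores candidate goals; trained offline on the demonstration data."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, goals):
        return self.net(goals).squeeze(-1)

class GoalConditionedPolicy(nn.Module):
    """Low-level controller imitating short demonstration segments toward a goal."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, action_dim))

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))

def iris_step(state, proposer, value_fn, policy, env_step, T=10):
    """One high-level decision: pick the highest-value proposed goal, then
    roll out the low-level policy toward it for T steps. `env_step` is an
    assumed environment transition function (state, action) -> next state."""
    goals = proposer(state)                 # candidate future states
    goal = goals[value_fn(goals).argmax()]  # select the most promising goal
    for _ in range(T):
        action = policy(state, goal)
        state = env_step(state, action)
    return state
```

The key design choice this sketch conveys is that no online interaction is required for training: both the goal selection mechanism and the low-level imitation policy are fit to the fixed demonstration batch, and stitching together the best parts of suboptimal demonstrations happens through goal selection at execution time.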
