End-to-End Robotic Reinforcement Learning without Reward Engineering

The combination of deep neural network models and reinforcement learning algorithms can make it possible to learn policies for robotic behaviors that directly read in raw sensory inputs, such as camera images, effectively subsuming both estimation and control into one model. However, real-world applications of reinforcement learning must specify the goal of the task by means of a manually programmed reward function, which in practice requires either designing the very same perception pipeline that end-to-end reinforcement learning promises to avoid, or else instrumenting the environment with additional sensors to determine if the task has been performed successfully. In this paper, we propose an approach that removes the need for manual engineering of reward specifications by enabling a robot to learn from a modest number of examples of successful outcomes, followed by actively solicited queries, in which the robot shows the user a state and asks for a label indicating whether that state represents successful completion of the task. While requesting labels for every single state would amount to asking the user to manually provide the reward signal, our method requires labels for only a tiny fraction of the states seen during training, making it an efficient and practical approach for learning skills without manually engineered rewards. We evaluate our method on real-world robotic manipulation tasks in which the observations consist of images captured by the robot's camera. In our experiments, our method effectively learns to arrange objects, place books, and drape cloth, directly from images, without any manually specified reward functions, and with only 1-4 hours of interaction with the real world.
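To make the idea concrete, the sketch below shows one way a classifier-based reward with actively solicited label queries might be wired up. This is a minimal illustration in PyTorch under our own assumptions, not the authors' implementation: the network architecture, the log-probability reward, and the confidence-based query rule (select_query_state) are illustrative stand-ins for the method described in the abstract, and all names are hypothetical.

    # Minimal sketch (not the authors' code): a binary success classifier
    # over images serves as the learned reward, and the user is queried
    # for labels on a small number of selected states.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SuccessClassifier(nn.Module):
        """Binary classifier over image observations: p(success | image)."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, 1),
            )

        def forward(self, images):            # images: (N, 3, H, W)
            return self.net(images).squeeze(-1)   # success logits, shape (N,)

    def reward(classifier, image):
        """Use log p(success | image) as the reward for the RL learner."""
        with torch.no_grad():
            logit = classifier(image.unsqueeze(0))[0]
            return F.logsigmoid(logit).item()

    def select_query_state(classifier, candidate_images):
        """Pick the state to show the user: here, the one the classifier
        rates most likely to be a success (an assumed query heuristic)."""
        with torch.no_grad():
            logits = classifier(candidate_images)
        return int(torch.argmax(logits))

    def update_classifier(classifier, optimizer, images, labels):
        """One gradient step on user-provided success/failure labels."""
        logits = classifier(images)
        loss = F.binary_cross_entropy_with_logits(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    if __name__ == "__main__":
        clf = SuccessClassifier()
        opt = torch.optim.Adam(clf.parameters(), lr=1e-4)
        # Stand-in batch of user-labeled outcome images (64x64 RGB).
        imgs = torch.randn(8, 3, 64, 64)
        labels = torch.tensor([1., 1., 0., 0., 1., 0., 0., 1.])
        update_classifier(clf, opt, imgs, labels)
        print(reward(clf, imgs[0]))            # reward for one observation
        print(select_query_state(clf, imgs))   # index of next query state

In this setup, the robot's policy would be trained with any off-the-shelf RL algorithm using reward() as its reward signal, while update_classifier() is called whenever the user answers a query, so the labeling burden stays at a small fraction of the states seen during training.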
