Deep Reinforcement Learning for Industrial Insertion Tasks with Visual Inputs and Natural Rewards

Connector insertion and many other tasks common in modern manufacturing involve complex contact dynamics and friction. Because it is difficult to capture the relevant physical effects with first-order modeling, traditional control methods often yield brittle, inaccurate controllers that must be tuned by hand. Reinforcement learning (RL) methods can learn such controllers from autonomous interaction with the environment, but running RL algorithms in the real world poses sample-efficiency and safety challenges. Moreover, in practical real-world settings we cannot assume access to perfect state information or dense reward signals. In this paper, we consider a variety of difficult industrial insertion tasks with visual inputs and two natural reward specifications: sparse rewards and goal images. We show that methods that combine RL with prior information, such as classical controllers or demonstrations, can solve these tasks from a reasonable amount of real-world interaction.
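To make the two ingredients concrete, below is a minimal Python sketch of (a) superposing a learned residual action on a hand-designed controller and (b) computing a sparse reward from a goal image. This is not the paper's actual implementation; `PController`, `residual_action`, `sparse_goal_image_reward`, and all parameter values are hypothetical placeholders for illustration.

```python
# Hypothetical sketch of residual RL plus a sparse goal-image reward.
# All class/function names and constants here are illustrative assumptions.
import numpy as np


class PController:
    """Simple proportional controller toward a nominal insertion pose."""

    def __init__(self, target_pose, gain=1.0):
        self.target_pose = np.asarray(target_pose, dtype=np.float64)
        self.gain = gain

    def act(self, pose):
        # Drive the end-effector toward the target pose; such a controller
        # is brittle under contact, which the learned residual corrects.
        return self.gain * (self.target_pose - np.asarray(pose))


def residual_action(controller_action, policy_action, scale=0.05):
    """Superpose a small, bounded learned residual on the base controller."""
    return controller_action + scale * np.clip(policy_action, -1.0, 1.0)


def sparse_goal_image_reward(image, goal_image, threshold=0.05):
    """Binary reward: 1.0 if the camera image is close to the goal image.

    Assumes both images are float arrays normalized to [0, 1].
    """
    distance = np.mean((image - goal_image) ** 2)
    return float(distance < threshold)


if __name__ == "__main__":
    controller = PController(target_pose=[0.0, 0.0, 0.1])
    pose = np.array([0.02, -0.01, 0.15])
    policy_out = np.random.uniform(-1, 1, size=3)  # stand-in for a learned policy
    action = residual_action(controller.act(pose), policy_out)

    goal = np.zeros((64, 64, 3), dtype=np.float32)
    obs = goal + np.random.normal(0, 0.01, goal.shape).astype(np.float32)
    print(action, sparse_goal_image_reward(obs, goal))
```

Keeping the residual bounded (the `scale` and `np.clip` terms above) reflects the design intuition behind combining RL with a prior controller: the base controller guarantees reasonable behavior while the policy only needs to learn a small correction, which helps with both sample efficiency and safety.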
