Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning

Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state. Reinforcement learning approaches, by contrast, can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle. In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that overcomes inaccuracies in the robotic perception/actuation pipeline while requiring minimal interactions with the environment. This is achieved by leveraging uncertainty estimates to divide the space into regions where the given model-based policy is reliable and regions where it may be flawed or not well defined. In these uncertain regions, we show that a locally learned policy can be used directly with raw sensory inputs. We test our algorithm, Guided Uncertainty-Aware Policy Optimization (GUAPO), on a real-world robot performing peg insertion. Videos are available at: https://sites.google.com/view/guapo-rl.
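
To make the switching idea concrete, below is a minimal Python sketch of the uncertainty-gated policy selection the abstract describes. Everything in it is assumed for illustration: the perception system is taken to return a goal-pose estimate with a per-dimension standard deviation, and `model_based_policy`, `learned_policy`, and the threshold value are hypothetical placeholders rather than the paper's actual interfaces.

```python
import numpy as np

# Illustrative sketch only: `model_based_policy`, `learned_policy`,
# `pose_std`, and the threshold are assumed names, not the paper's API.

UNCERTAINTY_THRESHOLD = 0.02  # assumed pose std. dev. threshold, in meters


def select_action(obs, goal_pose, pose_std, model_based_policy, learned_policy):
    """Uncertainty-gated switch between the two policies.

    goal_pose: pose estimate of the target (e.g., the hole) from perception.
    pose_std:  per-dimension standard deviation of that estimate.
    """
    if np.max(pose_std) < UNCERTAINTY_THRESHOLD:
        # Perception is confident: the model-based policy is reliable here,
        # so let it drive the robot toward the estimated goal.
        return model_based_policy(obs, goal_pose)
    # Inside the uncertain region around the goal: hand control over to the
    # locally learned policy, which acts directly on raw sensory inputs.
    return learned_policy(obs)
```

In this reading, the model-based controller handles the free-space approach where its assumptions hold, while the learned policy is invoked only in the small region where perception uncertainty makes the model unreliable, which is what keeps the required environment interactions minimal.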
