Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning

Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state. Reinforcement learning approaches, by contrast, can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle. In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that overcomes inaccuracies in the robotic perception/actuation pipeline while requiring minimal interactions with the environment. This is achieved by leveraging uncertainty estimates to divide the space into regions where the given model-based policy is reliable and regions where it may be flawed or not well defined. In these uncertain regions, we show that a locally learned policy can be used directly with raw sensory inputs. We test our algorithm, Guided Uncertainty-Aware Policy Optimization (GUAPO), on a real-world robot performing peg insertion. Videos are available at: https://sites.google.com/view/guapo-rl.
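
To make the switching idea concrete, below is a minimal Python sketch of the uncertainty-gated policy selection the abstract describes. Everything in it is assumed for illustration: the perception system is taken to return a goal-pose estimate with a per-dimension standard deviation, and `model_based_policy`, `learned_policy`, and the threshold value are hypothetical placeholders rather than the paper's actual interfaces.

```python
import numpy as np

# Illustrative sketch only: `model_based_policy`, `learned_policy`,
# `pose_std`, and the threshold are assumed names, not the paper's API.

UNCERTAINTY_THRESHOLD = 0.02  # assumed pose std. dev. threshold, in meters


def select_action(obs, goal_pose, pose_std, model_based_policy, learned_policy):
    """Uncertainty-gated switch between the two policies.

    goal_pose: pose estimate of the target (e.g., the hole) from perception.
    pose_std:  per-dimension standard deviation of that estimate.
    """
    if np.max(pose_std) < UNCERTAINTY_THRESHOLD:
        # Perception is confident: the model-based policy is reliable here,
        # so let it drive the robot toward the estimated goal.
        return model_based_policy(obs, goal_pose)
    # Inside the uncertain region around the goal: hand control over to the
    # locally learned policy, which acts directly on raw sensory inputs.
    return learned_policy(obs)
```

In this reading, the model-based controller handles the free-space approach where its assumptions hold, while the learned policy is invoked only in the small region where perception uncertainty makes the model unreliable, which is what keeps the required environment interactions minimal.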
