Paper Information - Learning Deep Visuomotor Policies for Dexterous Hand Manipulation

Multi-fingered dexterous hands are versatile and capable of acquiring a diverse set of skills such as grasping, in-hand manipulation, and tool use. To fully utilize their versatility in real-world scenarios, we require algorithms and policies that can control them using on-board sensing capabilities, without relying on external tracking or motion capture systems. Cameras and tactile sensors are the most widely used on-board sensors that do not require instrumentation of the world. In this work, we demonstrate an imitation-learning-based approach to train deep visuomotor policies for a variety of manipulation tasks with a simulated five-fingered dexterous hand. These policies directly control the hand using high-dimensional visual observations of the world and proprioceptive observations from the robot, and can be trained efficiently with a few hundred expert demonstration trajectories. We also find that using touch sensing information enables faster learning and better asymptotic performance for tasks with a high degree of occlusion. Video demonstrations of our results are available at: https://sites.google.com/view/hand-vil/
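As a rough illustration of the idea in the abstract (a policy that maps visual plus proprioceptive observations to hand actions, fit to expert demonstrations), the sketch below uses plain behavior cloning with a linear policy. This is an assumption-laden stand-in, not the paper's method: the paper trains deep networks, and its exact training procedure is not described in the abstract. All function names, array sizes, and the synthetic "expert" are hypothetical and chosen only so the example runs on its own.

```python
import numpy as np


def make_observation(image, proprio):
    """Flatten an image and append the proprioceptive vector (hypothetical layout)."""
    return np.concatenate([image.ravel(), proprio])


def fit_linear_policy(observations, actions, ridge=1e-6):
    """Behavior cloning as ridge regression from observations to expert actions."""
    # Append a constant column so the policy can learn a bias term.
    X = np.hstack([observations, np.ones((observations.shape[0], 1))])
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ actions)


def act(weights, observation):
    """Apply the linear policy to a single observation."""
    return np.append(observation, 1.0) @ weights


# Synthetic stand-in data: sizes are arbitrary, not taken from the paper.
rng = np.random.default_rng(0)
n_demos, img_shape, n_proprio, n_actions = 200, (8, 8), 24, 24
images = rng.normal(size=(n_demos, *img_shape))
proprio = rng.normal(size=(n_demos, n_proprio))
obs = np.stack([make_observation(i, p) for i, p in zip(images, proprio)])

# Hypothetical expert: a fixed linear map from observations to actions.
expert_map = rng.normal(size=(obs.shape[1], n_actions))
expert_actions = obs @ expert_map

weights = fit_linear_policy(obs, expert_actions)
pred = act(weights, obs[0])
```

In a visuomotor setting the linear map would be replaced by a convolutional network over the camera images, with tactile readings appended to the proprioceptive vector when touch sensing is available; the supervised regression structure stays the same.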
