Learning Dexterous Manipulation Policies from Experience and Imitation

We explore learning-based approaches for feedback control of a dexterous five-finger hand performing non-prehensile manipulation. First, we learn local controllers that are able to perform the task starting from a predefined initial state. These controllers are constructed using trajectory optimization with respect to locally linear, time-varying models learned directly from sensor data. In some cases, we initialize the optimizer with human demonstrations collected via teleoperation in a virtual environment. We demonstrate that such controllers can perform the task robustly, both in simulation and on the physical platform, for a limited range of initial conditions around the trained starting state. We then consider two interpolation methods for generalizing to a wider range of initial conditions: deep learning and nearest neighbors. We find that the nearest-neighbors method achieves higher performance. Nevertheless, the neural network has its advantages: it uses only tactile and proprioceptive feedback, with no visual feedback about the object (i.e., it performs the task blind), and it learns a time-invariant policy. In contrast, the nearest-neighbors method switches between time-varying local controllers based on the proximity of initial object states sensed via motion capture. While both generalization methods leave room for improvement, our work shows that (i) local trajectory-based controllers for complex non-prehensile manipulation tasks can be constructed from surprisingly small amounts of training data, and (ii) collections of such controllers can be interpolated to form more global controllers. Results are summarized in the supplementary video: this https URL
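The nearest-neighbors generalization scheme described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each local controller is a time-varying linear feedback law around a nominal trajectory, u_t = ū_t + K_t (x_t − x̄_t), and that "proximity" means Euclidean distance between initial object states. The names `LocalController` and `select_controller` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


@dataclass
class LocalController:
    """Hypothetical time-varying linear feedback controller (an assumption).

    At step t it applies u_t = u_nom[t] + K[t] @ (x_t - x_nom[t]),
    i.e. linear feedback around a nominal trajectory.
    """
    initial_object_state: Vector  # object state this controller was trained from
    x_nom: List[Vector]           # nominal state for each time step
    u_nom: List[Vector]           # nominal control for each time step
    K: List[Matrix]               # one (m x n) feedback gain matrix per time step

    def act(self, t: int, x: Sequence[float]) -> Vector:
        # Deviation of the measured state from the nominal state at step t.
        dx = [xi - xn for xi, xn in zip(x, self.x_nom[t])]
        # Nominal control plus the gain matrix applied to the deviation.
        return [u + sum(k * d for k, d in zip(row, dx))
                for u, row in zip(self.u_nom[t], self.K[t])]


def select_controller(library: Sequence[LocalController],
                      object_state: Sequence[float]) -> LocalController:
    """Return the controller whose training initial object state is nearest
    (Euclidean distance, an assumed metric) to the sensed object state."""
    return min(library,
               key=lambda c: math.dist(c.initial_object_state, object_state))
```

In this sketch the controller is chosen once, from the object state sensed at the start of an episode, and then executed with its own time index; the selected local policy stays time-varying, unlike the single time-invariant neural-network policy.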
