Optimal control with learned local models: Application to dexterous manipulation

We describe a method for learning dexterous manipulation skills with a pneumatically-actuated tendon-driven 24-DoF hand. The method combines iteratively refitted time-varying linear models with trajectory optimization, and can be seen as an instance of model-based reinforcement learning or as adaptive optimal control. Its appeal lies in the ability to handle challenging problems with surprisingly little data. We show that we can achieve sample-efficient learning of tasks that involve intermittent contact dynamics and under-actuation. Furthermore, we can control the hand directly at the level of the pneumatic valves, without the use of a prior model that describes the relationship between valve commands and joint torques. We compare results from learning in simulation and on the physical system. Even though the learned policies are local, they are able to control the system in the face of substantial variability in initial state.

[1]  R Bellman,et al.  A MATHEMATICAL THEORY OF ADAPTIVE CONTROL PROCESSES. , 1959, Proceedings of the National Academy of Sciences of the United States of America.

[2]  Gerald Tesauro,et al.  TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.

[3]  B. Pasik-Duncan,et al.  Adaptive Control , 1996, IEEE Control Systems.

[4]  Jun Morimoto,et al.  Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach , 2002, NIPS.

[5]  Jeff G. Schneider,et al.  Covariant policy search , 2003, IJCAI 2003.

[6]  Emanuel Todorov,et al.  Iterative Linear Quadratic Regulator Design for Nonlinear Biological Movement Systems , 2004, ICINCO.

[7]  H. Sebastian Seung,et al.  Stochastic policy gradient reinforcement learning on a simple 3D biped , 2004, 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No.04CH37566).

[8]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[9]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[10]  Stefan Schaal,et al.  2008 Special Issue: Reinforcement learning of motor skills with policy gradients , 2008 .

[11]  Stefan Schaal,et al.  Learning and generalization of motor skills by learning from demonstration , 2009, 2009 IEEE International Conference on Robotics and Automation.

[12]  Yasemin Altun,et al.  Relative Entropy Policy Search , 2010 .

[13]  Sethu Vijayakumar,et al.  Adaptive Optimal Feedback Control with Learned Internal Dynamics Models , 2010, From Motor Learning to Interaction Learning in Robots.

[14]  Carl E. Rasmussen,et al.  Learning to Control a Low-Cost Manipulator using Data-Efficient Reinforcement Learning , 2011, Robotics: Science and Systems.

[15]  Jan Peters,et al.  Reinforcement Learning to Adjust Robot Movements to New Situations , 2010, IJCAI.

[16]  Heinrich M. Jaeger,et al.  A Positive Pressure Universal Gripper Based on the Jamming of Granular Material , 2012, IEEE Transactions on Robotics.

[17]  Yuval Tassa,et al.  MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[18]  Sergey Levine,et al.  Guided Policy Search , 2013, ICML.

[19]  Jan Peters,et al.  Reinforcement learning in robotics: A survey , 2013, Int. J. Robotics Res..

[20]  Jan Peters,et al.  A Survey on Policy Search for Robotics , 2013, Found. Trends Robotics.

[21]  Vikash Kumar,et al.  Fast, strong and compliant pneumatic actuation for dexterous tendon-driven hands , 2013, 2013 IEEE International Conference on Robotics and Automation.

[22]  Jan Peters,et al.  Sample-based informationl-theoretic stochastic optimal control , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[23]  Sergey Levine,et al.  Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics , 2014, NIPS.

[24]  Nolan Wagener,et al.  Learning contact-rich manipulation skills with guided policy search , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[25]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[26]  Sergey Levine,et al.  End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..