Learning manipulation skills from a single demonstration

We consider the scenario where a robot is shown a manipulation skill once and must then use only a few trials of its own to learn to reproduce, optimize, and generalize that skill. A manipulation skill is generally a high-dimensional policy, so achieving the desired sample efficiency requires exploiting the inherent structure of the problem. Our approach decomposes the problem into analytically known objectives, such as motion smoothness, and black-box objectives, such as trial success or reward, which depend on interaction with the environment. This decomposition lets us combine (i) constrained optimization methods to address the analytic objectives, (ii) constrained Bayesian optimization to explore the black-box objectives, and (iii) inverse optimal control to extract a generalizable skill representation. We evaluate the algorithm on a synthetic benchmark and compare it with state-of-the-art learning methods, and we also demonstrate its performance in real-robot experiments with a PR2.
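To make the decomposition concrete, the following is a minimal sketch of the black-box exploration step: a constrained Bayesian optimization loop over a low-dimensional parameterization of the demonstrated trajectory, where an analytic smoothness penalty is evaluated in closed form and trial reward and success are queried from rollouts. It is an illustration under assumed placeholders, not the paper's implementation; the `rollout` function, `smoothness_cost` penalty, parameter bounds, and dimensionality are hypothetical, and the acquisition (expected improvement weighted by the probability of success) is one standard choice for constrained Bayesian optimization.

```python
# Illustrative sketch only: constrained Bayesian optimization over a
# hypothetical 3-dimensional trajectory parameterization. The rollout,
# smoothness cost, and bounds are placeholders, not the paper's setup.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
dim = 3                                   # assumed number of parameters
bounds = np.array([[-1.0, 1.0]] * dim)    # assumed search bounds

def smoothness_cost(theta):
    # Analytic objective: stand-in for a smoothness penalty on the
    # perturbed trajectory (e.g., sum of squared accelerations).
    return 0.1 * np.sum(theta ** 2)

def rollout(theta):
    # Black-box objective: execute the perturbed trajectory and return
    # (reward, success). Placeholder dynamics for illustration.
    reward = -np.sum((theta - 0.4) ** 2) + 0.05 * rng.standard_normal()
    success = float(np.all(np.abs(theta) < 0.8))
    return reward, success

def expected_improvement(mu, sigma, best):
    z = (mu - best) / np.maximum(sigma, 1e-9)
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

# A few initial trials around the demonstration (theta = 0).
X = 0.1 * rng.standard_normal((5, dim))
init = [rollout(x) for x in X]
R = np.array([r - smoothness_cost(x) for (r, s), x in zip(init, X)])
S = np.array([s for r, s in init])

for trial in range(20):
    # GP models of the combined objective and of the success indicator.
    gp_r = GaussianProcessRegressor(kernel=RBF(0.5), alpha=1e-3).fit(X, R)
    gp_s = GaussianProcessRegressor(kernel=RBF(0.5), alpha=1e-3).fit(X, S)

    # Acquisition: expected improvement weighted by probability of success,
    # maximized over random candidates within the bounds.
    cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(2000, dim))
    mu_r, sd_r = gp_r.predict(cand, return_std=True)
    mu_s, sd_s = gp_s.predict(cand, return_std=True)
    p_success = norm.cdf((mu_s - 0.5) / np.maximum(sd_s, 1e-9))
    acq = expected_improvement(mu_r, sd_r, R.max()) * p_success

    theta = cand[np.argmax(acq)]
    reward, success = rollout(theta)
    X = np.vstack([X, theta])
    R = np.append(R, reward - smoothness_cost(theta))
    S = np.append(S, success)

print("best parameters:", X[np.argmax(R)], "objective:", R.max())
```

In a full pipeline of the kind described above, the analytic objectives would instead be handled by constrained trajectory optimization, and the trials collected here would feed an inverse optimal control step that extracts a generalizable cost representation of the skill.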
