Model-based imitation learning by probabilistic trajectory matching

One of the most elegant ways of teaching new skills to robots is to provide demonstrations of a task and let the robot imitate this behavior. Such imitation learning is non-trivial: differing anatomies of robot and teacher, and limited robustness to changes in the control task, are two major difficulties. We present an imitation-learning approach that efficiently learns a task from expert demonstrations. Instead of finding policies indirectly, either via state-action mappings (behavioral cloning) or via cost-function learning (inverse reinforcement learning), we find policies directly, such that predicted trajectories match observed ones. To this end, we model both the teacher's trajectory and the predicted robot trajectory as probability distributions and match these distributions by minimizing their Kullback-Leibler divergence. In this paper, we propose to learn probabilistic forward models to compute the probability distribution over predicted trajectories. We compare our approach to model-based reinforcement learning methods with hand-crafted cost functions, and we evaluate our method in experiments on a real, compliant robot.
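
The central objective, matching the teacher's trajectory distribution against the robot's predicted trajectory distribution via their KL divergence, can be sketched as follows. This is a minimal illustrative sketch only: it assumes both distributions are represented by Gaussian marginals at each time step and that the KL divergence is summed over time steps; the function names `kl_gaussian` and `imitation_loss` are hypothetical and not taken from the paper.

```python
import numpy as np

def kl_gaussian(mu_p, cov_p, mu_q, cov_q):
    """KL(p || q) between multivariate Gaussians N(mu_p, cov_p) and N(mu_q, cov_q)."""
    d = mu_p.shape[0]
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    _, logdet_p = np.linalg.slogdet(cov_p)  # log-determinants for numerical stability
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (np.trace(cov_q_inv @ cov_p)   # trace term
                  + diff @ cov_q_inv @ diff     # Mahalanobis term
                  - d                           # dimensionality
                  + logdet_q - logdet_p)        # log-det ratio

def imitation_loss(expert_marginals, predicted_marginals):
    """Sum of per-time-step KL divergences between the teacher's trajectory
    distribution and the predicted robot trajectory distribution, both given
    as sequences of Gaussian marginals (mean, covariance)."""
    return sum(kl_gaussian(mu_e, cov_e, mu_r, cov_r)
               for (mu_e, cov_e), (mu_r, cov_r)
               in zip(expert_marginals, predicted_marginals))
```

In a full implementation, the predicted marginals would be obtained by propagating the current policy through a learned probabilistic forward model of the robot's dynamics, and this loss would then be minimized with respect to the policy parameters.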
