Skills transfer across dissimilar robots by learning context-dependent rewards

Robot programming by demonstration encompasses a wide range of learning strategies, from simple mimicry of the demonstrator's actions to higher-level extraction of the underlying intent. Focusing on the latter, we study the problem of extracting, from a set of candidate reward functions, the reward function that explains the demonstrations, and of using this information for self-refinement of the skill. This formulation is related to inverse reinforcement learning, in which the robot autonomously extracts a reward function that defines the goal of the task. Relying on Gaussian mixture models, the proposed approach learns how the candidate reward functions are combined, and in which contexts or phases of the task they are relevant for explaining the user's demonstrations. The extracted reward profile is then exploited to improve the skill with a self-refinement approach based on expectation-maximization, allowing the imitator to reach a skill level that goes beyond the demonstrations. The approach can be used to reproduce a skill in different ways or to transfer tasks across robots of different structures. It is tested in simulation with a new type of continuum robot (STIFF-FLOP), using kinesthetic demonstrations recorded with a Barrett WAM manipulator.
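To make the two ingredients of the abstract concrete, the sketch below illustrates one plausible reading: a Gaussian mixture model over a context variable (e.g., time or task phase) provides responsibilities that blend a set of candidate reward functions, and an expectation-maximization-style reward-weighted update refines policy parameters. This is a minimal illustration, not the authors' implementation; the function names (`fit_context_model`, `combined_reward`, `em_policy_update`), the `mixing_weights` parameterization, and the use of scikit-learn's `GaussianMixture` are all assumptions made for the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_context_model(contexts, n_components=3, seed=0):
    """Fit a GMM over the context variable observed in the demonstrations.

    contexts: array of shape (n_samples, n_context_dims), e.g. a time/phase value.
    """
    gmm = GaussianMixture(n_components=n_components, random_state=seed)
    gmm.fit(contexts)
    return gmm


def combined_reward(gmm, mixing_weights, candidate_rewards, context, state):
    """Blend candidate rewards with context-dependent GMM responsibilities.

    mixing_weights: (n_components, n_candidates) array; one weight vector per
        mixture component (assumed to be learned from the demonstrations).
    candidate_rewards: list of callables r_k(state) -> float.
    """
    resp = gmm.predict_proba(np.atleast_2d(context))[0]        # h_i(context)
    r_vals = np.array([r(state) for r in candidate_rewards])   # r_k(state)
    return float(resp @ mixing_weights @ r_vals)


def em_policy_update(param_samples, returns, beta=5.0):
    """EM-style (reward-weighted) self-refinement of policy parameters.

    New mean = sum_j w_j * theta_j, with w_j proportional to exp(beta * R_j),
    where R_j is the return of sampled parameters theta_j under the
    extracted reward profile.
    """
    w = np.exp(beta * (returns - returns.max()))
    w /= w.sum()
    return (w[:, None] * param_samples).sum(axis=0)
```

In this reading, `combined_reward` scores rollouts of the imitator, and `em_policy_update` iteratively shifts the exploration distribution toward high-reward parameters, which is what allows the reproduced skill to be refined beyond the original demonstrations.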
