An ensemble inverse optimal control approach for robotic task learning and adaptation

This paper contributes a novel framework to efficiently learn cost-to-go function representations for robotic tasks with latent modes. The proposed approach relies on the principle behind ensemble methods, where improved performance is obtained by aggregating a group of simple models, each of which can be efficiently learned. The maximum-entropy approximation is adopted as an effective initialization, and the quality of this surrogate is guaranteed by a theoretical bound. Our approach also offers an alternative perspective on the popular mixture of Gaussians within the framework of inverse optimal control. We further propose to impose dynamics on the model ensemble, using Kalman estimation to infer and modulate the model modes. This allows robots to exploit redundancy in the demonstrations and to adapt to human interventions, especially in tasks where sensory observations are non-Markovian. The framework is demonstrated with a synthetic inverted pendulum example and online adaptation tasks, including robotic handwriting and mail delivery.
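The claim that a mixture of Gaussians can be viewed through the lens of inverse optimal control has a concrete textbook form: the negative log-density of a Gaussian mixture is a soft minimum (log-sum-exp) over per-component quadratic costs, so each mixture component acts like a simple quadratic cost-to-go and the mixture composes them. The sketch below illustrates only this standard identity; the function and variable names are ours, and it does not reproduce the paper's actual construction.

```python
# Illustrative sketch: -log p(x) of a Gaussian mixture written as a soft
# minimum over per-component quadratic costs, i.e. a mixture of Gaussians
# read as a composite cost-to-go. Names are ours, not the paper's.
import numpy as np
from scipy.special import logsumexp


def gmm_cost(x, weights, means, covs):
    """Return -log p(x) for a GMM, expressed as a softmin of quadratic costs.

    Each component k contributes a quadratic cost
        c_k(x) = 0.5 (x - mu_k)^T Sigma_k^{-1} (x - mu_k) + const_k,
    and the mixture cost is -logsumexp over (log pi_k - c_k).
    """
    x = np.asarray(x, dtype=float)
    log_terms = []
    for pi_k, mu_k, Sigma_k in zip(weights, means, covs):
        d = x - mu_k
        _, logdet = np.linalg.slogdet(Sigma_k)
        quad = 0.5 * d @ np.linalg.solve(Sigma_k, d)   # component quadratic cost
        const = 0.5 * (len(x) * np.log(2 * np.pi) + logdet)
        log_terms.append(np.log(pi_k) - quad - const)
    return -logsumexp(log_terms)  # soft minimum over component costs


if __name__ == "__main__":
    # Tiny usage example with two 2-D components.
    weights = [0.6, 0.4]
    means = [np.zeros(2), np.array([2.0, 0.0])]
    covs = [np.eye(2), 0.5 * np.eye(2)]
    print(gmm_cost(np.array([1.0, 0.0]), weights, means, covs))
```

The soft minimum is what connects this to the maximum-entropy approximation mentioned above: as the component costs separate, the log-sum-exp approaches the hard minimum over components.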
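The abstract also describes imposing dynamics on the model ensemble and using Kalman estimation to infer and modulate the latent modes. A minimal sketch of that idea is given below, assuming quadratic cost-to-go members, a linear-Gaussian mode dynamics `A`, and a linear observation model `H`; all of these modeling choices and names are our own assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch, not the paper's method: an ensemble of simple
# cost-to-go models whose mode weights follow assumed linear-Gaussian
# dynamics and are tracked with a standard Kalman filter.
import numpy as np


class QuadraticCostToGo:
    """One ensemble member: a quadratic cost-to-go (x - mu)^T P (x - mu)."""

    def __init__(self, mu, P):
        self.mu = np.asarray(mu, dtype=float)
        self.P = np.asarray(P, dtype=float)

    def __call__(self, x):
        d = np.asarray(x, dtype=float) - self.mu
        return float(d @ self.P @ d)


class KalmanModeEnsemble:
    """Ensemble aggregation with Kalman-tracked mode weights.

    Assumed model (ours, for illustration):
        w_{t+1} = A w_t + process noise (covariance Q)
        y_t     = H w_t + measurement noise (covariance R)
    The filtered mode vector w modulates how members are aggregated.
    """

    def __init__(self, models, A, H, Q, R):
        self.models = models
        self.A, self.H, self.Q, self.R = A, H, Q, R
        k = len(models)
        self.w = np.full(k, 1.0 / k)   # mode estimate
        self.S = np.eye(k)             # mode covariance

    def step(self, y):
        # Predict: propagate the mode estimate through the assumed dynamics.
        w_pred = self.A @ self.w
        S_pred = self.A @ self.S @ self.A.T + self.Q
        # Update: correct the modes with the new observation y.
        K = S_pred @ self.H.T @ np.linalg.inv(self.H @ S_pred @ self.H.T + self.R)
        self.w = w_pred + K @ (y - self.H @ w_pred)
        self.S = (np.eye(len(self.w)) - K @ self.H) @ S_pred

    def cost_to_go(self, x):
        # Aggregate members with non-negative, normalized mode weights.
        weights = np.clip(self.w, 0.0, None)
        weights = weights / (weights.sum() + 1e-12)
        return float(sum(wi * m(x) for wi, m in zip(weights, self.models)))
```

In this reading, the predict/update cycle is what lets the ensemble adapt online: a human intervention shifts the observations, the filter shifts the mode estimate, and the aggregated cost-to-go follows, even when the raw observations are non-Markovian.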
