Inverse KKT: Learning Cost Functions of Manipulation Tasks from Demonstrations

Inverse optimal control (IOC) assumes that demonstrations are solutions to an optimal control problem with unknown underlying costs, and recovers the parameters of those costs. We propose the framework of inverse Karush–Kuhn–Tucker (KKT), which assumes that the demonstrations fulfill the KKT conditions of an unknown underlying constrained optimization problem, and recovers the parameters of that problem. This allows us to extract the relevant task spaces and cost-function parameters for skills that involve contact. For the typical case of cost functions that are linear in their parameters, the problem reduces to a quadratic program, which can be solved efficiently with guaranteed convergence; the framework also accommodates arbitrary non-linear parameterizations of cost functions. We further present a non-parametric variant of inverse KKT that represents the cost function as a functional in a reproducing kernel Hilbert space. The aim of our approach is to push learning from demonstration toward more complex manipulation scenarios that involve interaction with objects, and therefore contacts and constraints within the motion. We demonstrate the approach on manipulation tasks such as sliding a box, closing a drawer, and opening a door.
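A minimal sketch of the core construction (the notation here is ours, introduced only for illustration): let $x$ be a demonstrated trajectory, let $g(x) \le 0$ collect the task constraints (e.g., contact constraints), and let the cost be linear in features, $f(x, w) = w^\top \phi(x)$. If the demonstration were optimal, KKT stationarity would hold for some multipliers $\lambda$:

\[
J_\phi(x)^\top w + J_g(x)^\top \lambda = 0,
\]

where $J_\phi$ and $J_g$ denote the Jacobians of $\phi$ and $g$. Inverse KKT therefore seeks parameters $w$ that minimize the violation of this condition over the demonstrations,

\[
\ell(w) = \sum_{d} \min_{\lambda_d} \big\| J_\phi(x_d)^\top w + J_g(x_d)^\top \lambda_d \big\|^2,
\]

with $\lambda_d$ restricted to the constraints active in demonstration $d$. The inner minimization over $\lambda_d$ has a closed-form solution, leaving $\ell$ quadratic in $w$; together with linear constraints on $w$ (e.g., non-negativity and a normalization excluding the trivial solution $w = 0$), this yields the quadratic program referred to above.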
