Direct Loss Minimization Inverse Optimal Control

Inverse Optimal Control (IOC) has strongly impacted the systems engineering process, enabling automated planner tuning through straightforward and intuitive demonstration. The most successful and established applications, though, have been in lower-dimensional problems such as navigation planning, where exact optimal planning or control is feasible. In higher-dimensional systems, such as humanoid robots, research has made substantial progress toward generalizing these ideas to model-free or locally optimal settings, but such systems are complicated to the point where demonstration itself can be difficult. Real-world applications are therefore typically restricted to demonstrations that are at best noisy, and often partial or incomplete, which existing frameworks handle poorly. This work derives a very flexible method of IOC based on a form of Structured Prediction known as Direct Loss Minimization. The resulting algorithm is essentially Policy Search on a reward function that rewards similarity to demonstrated behavior (using Covariance Matrix Adaptation (CMA) in our experiments). Our framework blurs the distinction between IOC, other forms of Imitation Learning, and Reinforcement Learning, enabling us to derive simple, versatile, and practical algorithms that blend imitation and reinforcement signals into a unified framework. Our experiments analyze various aspects of its performance and demonstrate its efficacy in conveying preferences for motion shaping and combined reach-and-grasp quality optimization.
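
To make the core idea concrete, the following is a minimal sketch of direct loss minimization viewed as policy search over cost-function parameters, assuming the `cma` Python package for Covariance Matrix Adaptation. The planner, demonstration, and loss are hypothetical toy stand-ins for illustration only, not the system described above.

```python
# Minimal sketch: IOC as policy search that directly minimizes a structured
# loss measuring dissimilarity to a demonstrated trajectory.
# Assumes the "cma" package (pip install cma); planner/demo are toy examples.
import numpy as np
import cma

T, D = 20, 2                                  # trajectory length, workspace dim
start, goal = np.zeros(D), np.ones(D)
# Hypothetical demonstration: straight line with a slight bend.
demo = np.linspace(start, goal, T) + 0.05 * np.sin(
    np.linspace(0, np.pi, T))[:, None]

def plan(theta):
    """Toy 'planner': interpolates start->goal and bends the path by a
    parameterized offset; theta plays the role of cost-function weights."""
    base = np.linspace(start, goal, T)
    bend = np.sin(np.linspace(0, np.pi, T))[:, None] * theta[None, :]
    return base + bend

def imitation_loss(theta):
    """Structured loss: squared distance between the trajectory induced by
    theta and the demonstrated trajectory."""
    return float(np.sum((plan(np.asarray(theta)) - demo) ** 2))

# CMA-ES searches the cost-parameter space to minimize the loss directly,
# i.e., it maximizes a reward for similarity to the demonstration.
es = cma.CMAEvolutionStrategy(np.zeros(D), 0.5)
while not es.stop():
    thetas = es.ask()
    es.tell(thetas, [imitation_loss(th) for th in thetas])
theta_star = es.result.xbest                  # learned cost parameters
```

Because only loss evaluations of planned trajectories are needed, the same loop accommodates noisy or partial demonstrations by changing the loss, and reinforcement signals can be blended in by adding task-reward terms to the objective.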
