Structured Apprenticeship Learning

We propose a graph-based algorithm for apprenticeship learning when the reward features are noisy. Previous apprenticeship learning techniques learn a reward function using only local state features. This can be a limitation in practice, as some features are often misspecified or subject to measurement noise. Our graphical framework, inspired by work on Markov Random Fields, alleviates this problem by propagating information between states and rewarding policies that choose similar actions in adjacent states. We demonstrate the advantage of the proposed approach on grid-world navigation problems and on the problem of teaching a robot to grasp novel objects in simulation.
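
The sketch below is only a rough illustration, under assumed details, of the kind of graph-based smoothing described above: it builds a 4-connected grid graph over states, propagates locally estimated rewards between adjacent states by repeated neighbour averaging, and counts how often a policy chooses different actions in adjacent states. The helper names, the averaging scheme, and the toy disagreement count are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

# Illustrative sketch only: a toy grid world showing graph-based smoothing of
# noisy, locally estimated rewards and a simple pairwise action-disagreement
# count. All details here are assumptions made for illustration.

def grid_adjacency(n_rows, n_cols):
    """Edges between 4-connected neighbours of an n_rows x n_cols grid."""
    edges = []
    for r in range(n_rows):
        for c in range(n_cols):
            s = r * n_cols + c
            if c + 1 < n_cols:
                edges.append((s, s + 1))       # right neighbour
            if r + 1 < n_rows:
                edges.append((s, s + n_cols))  # bottom neighbour
    return edges

def smoothed_rewards(features, weights, edges, lam=0.5, n_iters=50):
    """Linear local rewards blended with repeated neighbour averaging,
    a simple stand-in for MRF-style information propagation."""
    local = features @ weights                 # noisy, purely local estimate
    n = len(local)
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    A = A / np.maximum(A.sum(axis=1, keepdims=True), 1.0)  # row-normalise
    r = local.copy()
    for _ in range(n_iters):
        r = (1 - lam) * local + lam * (A @ r)  # propagate between states
    return r

def pairwise_disagreement(policy, edges):
    """Number of adjacent state pairs where the policy picks different
    actions; policies with many disagreements would be penalised."""
    return int(sum(policy[i] != policy[j] for i, j in edges))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_rows, n_cols, n_feat = 5, 5, 3
    features = rng.normal(size=(n_rows * n_cols, n_feat))  # noisy local features
    weights = rng.normal(size=n_feat)
    edges = grid_adjacency(n_rows, n_cols)

    r = smoothed_rewards(features, weights, edges)
    policy = rng.integers(0, 4, size=n_rows * n_cols)      # random 4-action policy
    print("smoothed reward range:", r.min(), r.max())
    print("adjacent-state action disagreements:", pairwise_disagreement(policy, edges))
```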
