Apprenticeship learning with few examples

We consider the problem of imitation learning when the examples provided by a human expert are scarce. Apprenticeship learning via inverse reinforcement learning provides an efficient tool for generalizing from such examples, based on the assumption that the expert's policy maximizes a value function that is a linear combination of state and action features. Most apprenticeship learning algorithms summarize the expert's policy using only the simple empirical averages of the features observed in the demonstrations. However, this statistic is accurate only when the number of examples is large enough to cover most of the states, or when the dynamics of the system is nearly deterministic. In this paper, we show that the quality of the learned policies is sensitive to the error in estimating the feature averages when the dynamics of the system is stochastic. To reduce this error, we introduce two new approaches for bootstrapping the demonstrations, assuming that the expert is near-optimal and the dynamics of the system is known. In the first approach, the expert's examples are used to learn a reward function, and additional examples are then generated from the corresponding optimal policy. The second approach uses a transfer technique, known as graph homomorphism, to generalize the expert's actions to unvisited regions of the state space. Empirical results on simulated robot navigation problems show that our approaches learn good policies from a significantly smaller number of examples.
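The feature-average statistic discussed above is, in the standard apprenticeship-learning formulation, the empirical discounted feature expectation of the expert's policy, mu_E = (1/m) * sum over demonstrations i and steps t of gamma^t * phi(s_{i,t}). A minimal sketch of this estimator follows; the feature map `phi` and the trajectory format are hypothetical choices for illustration, not part of the paper:

```python
import numpy as np

def empirical_feature_expectations(trajectories, phi, gamma=0.99):
    """Estimate the expert's discounted feature expectations
    mu_E = (1/m) * sum_i sum_t gamma^t * phi(s_{i,t}),
    the statistic most apprenticeship-learning algorithms try to match.
    `trajectories` is a list of state sequences and `phi` maps a state
    to a feature vector (both names are assumptions of this sketch)."""
    mu = None
    for states in trajectories:
        for t, s in enumerate(states):
            f = (gamma ** t) * np.asarray(phi(s), dtype=float)
            mu = f if mu is None else mu + f
    return mu / len(trajectories)

# Toy usage: states are integers 0..2, features are one-hot indicator vectors.
phi = lambda s: np.eye(3)[s]
demos = [[0, 1, 2], [0, 2, 2]]
mu_E = empirical_feature_expectations(demos, phi, gamma=0.5)
# mu_E is the per-trajectory average of discounted one-hot features.
```

With so few, short trajectories the estimate is noisy, which is precisely the failure mode in stochastic domains that motivates the bootstrapping approaches above.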
