Active Learning for Reward Estimation in Inverse Reinforcement Learning

Inverse reinforcement learning addresses the general problem of recovering a reward function from samples of a policy provided by an expert or demonstrator. In this paper, we introduce active learning for inverse reinforcement learning. We propose an algorithm that allows the agent to query the demonstrator for samples at specific states, instead of relying only on samples provided at "arbitrary" states. The purpose of our algorithm is to estimate the reward function with accuracy similar to that of other methods from the literature while reducing the number of policy samples required from the expert. We also discuss the use of our algorithm in higher-dimensional problems, using both Monte Carlo and gradient methods. We present illustrative results of our algorithm on several simulated examples of varying complexity.
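
The abstract does not state the exact query criterion, so the following is only an illustration of the general idea of choosing where to ask. The sketch uses a standard uncertainty-sampling heuristic and is not necessarily the authors' algorithm. It keeps a set of candidate reward functions, computes each candidate's greedy policy by value iteration, and queries the state where the candidates disagree most, measured by the entropy of the weighted distribution over their greedy actions. The candidates here form a hand-enumerated set of "goal state" rewards on a hypothetical 5-state chain. In a Bayesian setting they might instead be posterior samples, for example drawn by MCMC. Each answer from the demonstrator reweights the candidates under an assumed noise level `eps`. The chain MDP, `eps`, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy MDP: a 1-D chain of states with deterministic moves.
N_STATES = 5
ACTIONS = ("left", "right")
GAMMA = 0.9


def step(s, a):
    """Move one cell left or right, staying put at the chain's ends."""
    return max(s - 1, 0) if a == "left" else min(s + 1, N_STATES - 1)


def greedy_policy(reward, iters=200):
    """Value iteration for a state reward r(s); returns the greedy action per state."""
    v = [0.0] * N_STATES
    for _ in range(iters):
        v = [reward[s] + GAMMA * max(v[step(s, a)] for a in ACTIONS)
             for s in range(N_STATES)]
    # Ties are broken in favour of the first action listed in ACTIONS.
    return [max(ACTIONS, key=lambda a: v[step(s, a)]) for s in range(N_STATES)]


def action_entropy(policies, weights):
    """Per state, entropy of the weighted distribution over the candidates' greedy actions."""
    total = sum(weights)
    entropies = []
    for s in range(N_STATES):
        mass = Counter()
        for pi, w in zip(policies, weights):
            mass[pi[s]] += w / total
        entropies.append(-sum(p * math.log(p) for p in mass.values() if p > 0))
    return entropies


def update_weights(policies, weights, state, action, eps=0.1):
    """Reweight candidates by how well they explain the demonstrated action.
    eps is an assumed probability that the demonstrator deviates from its greedy action."""
    return [w * ((1 - eps) if pi[state] == action else eps)
            for pi, w in zip(policies, weights)]


def active_irl(candidate_rewards, demonstrator, n_queries):
    """Repeatedly query the state whose action is most uncertain across candidates."""
    policies = [greedy_policy(r) for r in candidate_rewards]
    weights = [1.0] * len(candidate_rewards)
    queried = []
    for _ in range(n_queries):
        ent = action_entropy(policies, weights)
        s = max(range(N_STATES), key=lambda i: ent[i])
        weights = update_weights(policies, weights, s, demonstrator(s))
        queried.append(s)
    best = max(range(len(weights)), key=lambda i: weights[i])
    return best, queried, weights


# Hypothetical candidates: "the goal is state g" (reward 1 at g, 0 elsewhere).
candidates = [[1.0 if s == g else 0.0 for s in range(N_STATES)] for g in range(N_STATES)]
true_policy = greedy_policy(candidates[3])  # the hidden expert prefers goal state 3
best, queried, weights = active_irl(candidates, lambda s: true_policy[s], n_queries=3)
print("queried states:", queried, "-> most likely goal:", best)
```

In this toy setting, three entropy-selected queries are enough to single out goal state 3. A sample at state 0 would carry less information, because four of the five candidates already agree on the action there.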
