Discriminative Apprenticeship Learning with Both Preference and Non-preference Behavior

Considering that expert demonstrations are usually suboptimal and that failed demonstrations often carry useful guidance, this paper proposes a Discriminative Apprenticeship Learning algorithm in which the apprentice is also taught with failed attempts. The apprentice thereby acquires the ability to discriminate between preference and non-preference cases and can actively take the corresponding action. Since a robot usually encounters changing environments, the algorithm takes generalization ability into account by recovering the reward function under an evaluation of the generalization error. The problem of representation error is also analyzed and incorporated into the algorithm. A theoretical guarantee on the algorithm's performance is presented. Experiments on a simple car-driving robot, including a comparison with a variety of inverse reinforcement learning methods, show that the proposed method is an effective and promising alternative.
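The abstract does not give the algorithm's equations, so the following is only a rough illustration of the central idea: learning a reward that separates preferred (successful) from non-preferred (failed) demonstrations. It assumes a linear reward r(s) = w · φ(s), the discounted feature expectations used in apprenticeship learning via inverse reinforcement learning, and a logistic loss with an L2 penalty standing in for the paper's generalization-error term. None of these choices, nor the names `feature_expectation`, `fit_reward_weights`, or `reward`, come from the paper, and the representation-error analysis is not modeled.

```python
import math
from typing import List, Sequence

Vector = List[float]
Trajectory = Sequence[Sequence[float]]  # one feature vector phi(s_t) per time step


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def feature_expectation(trajectory: Trajectory, gamma: float = 0.9) -> Vector:
    """Discounted feature sum mu(tau) = sum_t gamma^t * phi(s_t) for one trajectory."""
    mu = [0.0] * len(trajectory[0])
    for t, phi in enumerate(trajectory):
        for i, value in enumerate(phi):
            mu[i] += (gamma ** t) * value
    return mu


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_reward_weights(
    preferred: Sequence[Trajectory],
    non_preferred: Sequence[Trajectory],
    gamma: float = 0.9,
    lr: float = 0.1,
    l2: float = 0.01,
    epochs: int = 500,
) -> Vector:
    """Hypothetical discriminator: fit w so that w . mu(tau) is high for preferred
    (successful) and low for non-preferred (failed) trajectories.

    Minimizes mean logistic loss + (l2 / 2) * ||w||^2 by full-batch gradient
    descent. The L2 term is only an assumed stand-in for generalization control.
    """
    data = [(feature_expectation(tr, gamma), 1.0) for tr in preferred]
    data += [(feature_expectation(tr, gamma), -1.0) for tr in non_preferred]
    n = len(data)
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        grad = [l2 * wi for wi in w]  # gradient of the L2 penalty
        for mu, y in data:
            # d/dw log(1 + exp(-y * w.mu)) = -y * sigmoid(-y * w.mu) * mu
            coeff = -y * sigmoid(-y * dot(w, mu)) / n
            for i, m in enumerate(mu):
                grad[i] += coeff * m
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def reward(w: Sequence[float], phi: Sequence[float]) -> float:
    """Recovered linear reward r(s) = w . phi(s)."""
    return dot(w, phi)


if __name__ == "__main__":
    # Toy per-state features (hypothetical): [stays_on_road, collides]
    preferred = [[[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]]
    non_preferred = [[[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]]
    w = fit_reward_weights(preferred, non_preferred)
    # On this toy data, staying on the road gets positive weight, collisions negative.
    print("weights:", w)
    print("r(on road) =", reward(w, [1.0, 0.0]), " r(collision) =", reward(w, [0.0, 1.0]))
```

The failed trajectory starts like the successful one and then collides, so the discriminator has to push weight onto the collision feature rather than simply rewarding everything the expert did. A hinge (max-margin) loss would be an equally plausible stand-in for the logistic loss used here.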

Yi Wang | Xihong Wu | Dingsheng Luo