Inverse Reinforcement Learning from Failure

Inverse reinforcement learning (IRL) allows autonomous agents to learn to solve complex tasks from successful demonstrations. However, in many settings, e.g., when a human learns the task by trial and error, failed demonstrations are also readily available. In addition, in some tasks, purposely generating failed demonstrations may be easier than generating successful ones. Since existing IRL methods cannot make use of failed demonstrations, in this paper we propose inverse reinforcement learning from failure (IRLF), which exploits both successful and failed demonstrations. Starting from the state-of-the-art maximum causal entropy IRL method, we propose a new constrained optimisation formulation that accommodates both types of demonstrations while remaining convex. We then derive update rules for learning reward functions and policies. Experiments on both simulated and real-robot data demonstrate that IRLF converges faster and generalises better than maximum causal entropy IRL, especially when few successful demonstrations are available.

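For context, below is a minimal sketch of the maximum causal entropy IRL baseline referenced in the abstract, assuming a linear reward R_theta(s,a) = theta^T phi(s,a) and empirical feature counts \tilde{\phi}^{S} from the successful demonstrations. The final update is only an illustrative assumption of how feature counts \tilde{\phi}^{F} from failed demonstrations might enter through an extra multiplier lambda; it is not the exact constrained formulation derived in the paper.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}

% Background: maximum causal entropy IRL with a linear reward
% R_\theta(s,a) = \theta^\top \phi(s,a). The policy's causal entropy is
% maximised subject to matching the successful demonstrations' empirical
% feature counts \tilde{\phi}^{S}.
\begin{align}
  \max_{\pi}\quad & -\,\mathbb{E}_{\pi}\Big[\textstyle\sum_{t}\log \pi(a_t \mid s_t)\Big] \\
  \text{s.t.}\quad & \mathbb{E}_{\pi}\Big[\textstyle\sum_{t}\phi(s_t,a_t)\Big] = \tilde{\phi}^{S}.
\end{align}

% Its dual yields the familiar gradient step on the reward weights:
\begin{equation}
  \theta \leftarrow \theta + \alpha\big(\tilde{\phi}^{S} - \mathbb{E}_{\pi_\theta}[\phi]\big).
\end{equation}

% Illustrative assumption only (not the paper's exact constraint): failed
% demonstrations with feature counts \tilde{\phi}^{F} could enter through a
% second multiplier \lambda that discourages matching them, e.g.
\begin{equation}
  \theta \leftarrow \theta
    + \alpha\Big[\big(\tilde{\phi}^{S} - \mathbb{E}_{\pi_\theta}[\phi]\big)
    - \lambda\big(\tilde{\phi}^{F} - \mathbb{E}_{\pi_\theta}[\phi]\big)\Big].
\end{equation}

\end{document}
```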