Multi-task Maximum Entropy Inverse Reinforcement Learning

Multi-task Inverse Reinforcement Learning (IRL) is the problem of inferring multiple reward functions from expert demonstrations. Prior work, built on Bayesian IRL, is unable to scale to complex environments due to computational constraints. This paper contributes a formulation of multi-task IRL in the more computationally efficient Maximum Causal Entropy (MCE) IRL framework. Experiments show our approach can perform one-shot imitation learning in a gridworld environment that single-task IRL algorithms need hundreds of demonstrations to solve. We outline preliminary work using meta-learning to extend our method to the function approximator setting of modern MCE IRL algorithms. Evaluating on multi-task variants of common simulated robotics benchmarks, we discover serious limitations of these IRL algorithms, and conclude with suggestions for further work.
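
As background for the framework the abstract builds on, the following is a minimal sketch of the single-task Maximum Causal Entropy IRL problem, following Ziebart et al.'s formulation. The feature map \(\phi\), horizon \(T\), and linear reward parameterisation \(r_\theta(s,a) = \theta^\top \phi(s,a)\) are standard assumptions of that formulation, not details taken from this paper's abstract:

\[
\begin{aligned}
\max_{\pi} \quad & \mathbb{E}_{\pi}\!\left[ -\sum_{t=1}^{T} \log \pi(a_t \mid s_t) \right]
&& \text{(causal entropy)} \\
\text{s.t.} \quad & \mathbb{E}_{\pi}\!\left[ \sum_{t=1}^{T} \phi(s_t, a_t) \right]
= \mathbb{E}_{\mathcal{D}}\!\left[ \sum_{t=1}^{T} \phi(s_t, a_t) \right]
&& \text{(feature expectation matching)}
\end{aligned}
\]

The dual of this program yields a soft-optimal policy \(\pi_\theta(a \mid s) = \exp\!\big(Q^{\mathrm{soft}}_\theta(s,a) - V^{\mathrm{soft}}_\theta(s)\big)\), with the reward parameters \(\theta\) fit by maximising the causal likelihood of the demonstrations \(\mathcal{D}\). The multi-task formulation contributed here presumably shares structure across per-task reward parameters; the abstract does not specify the mechanism, so this sketch covers only the single-task building block.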
