Generalized Maximum Causal Entropy for Inverse Reinforcement Learning

We consider the problem of learning from demonstrated trajectories via inverse reinforcement learning (IRL). Motivated by a limitation of the classical maximum entropy model in capturing the structure of the network of states, we propose an IRL model based on a generalized version of the causal entropy maximization problem, which yields a class of maximum entropy IRL models. Our generalized model has the advantage of recovering, in addition to a reward function, a second expert function that (partially) captures the impact of the connectivity structure of the states on the expert's decisions. Empirical evaluation on a real-world dataset and a grid-world dataset shows that our generalized model outperforms the classical ones at recovering reward functions and reproducing demonstrated trajectories.
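For concreteness, the classical maximum causal entropy IRL program that this work generalizes can be sketched as follows; this is the standard formulation of Ziebart et al., and the generalized entropy functional G mentioned afterwards is a hedged placeholder, since the paper's exact objective is not reproduced here.

% Classical maximum causal entropy IRL: find a stochastic policy that
% maximizes the causal entropy of actions given states while matching
% the expert's empirical feature expectations.
\begin{align*}
\max_{\pi}\quad & H(A_{1:T} \,\|\, S_{1:T}) = \mathbb{E}_{\pi}\Big[-\sum_{t=1}^{T} \log \pi(a_t \mid s_t)\Big] \\
\text{s.t.}\quad & \mathbb{E}_{\pi}\Big[\sum_{t=1}^{T} \phi(s_t, a_t)\Big] = \hat{\mathbb{E}}\Big[\sum_{t=1}^{T} \phi(s_t, a_t)\Big],
\end{align*}

where \phi is the feature map defining the reward and the right-hand side is the empirical feature expectation over the demonstrated trajectories. The generalized model described in the abstract can be read as replacing the Shannon causal entropy H with a more general entropy functional G(A_{1:T} \,\|\, S_{1:T}) under the same feature-matching constraint, which produces a family of maximum entropy IRL models and the additional recovered function tied to the state-network structure.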
