Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning

While most approaches to the problem of Inverse Reinforcement Learning (IRL) focus on estimating a reward function that best explains an expert agent's policy or demonstrated behavior on a control task, it is often the case that such behavior is more succinctly represented by a simple reward combined with a set of hard constraints. In this setting, the agent attempts to maximize cumulative reward subject to these constraints on its behavior. We reformulate the problem of IRL on Markov Decision Processes (MDPs) such that, given a nominal model of the environment and a nominal reward function, we seek to estimate state, action, and feature constraints in the environment that motivate an agent's behavior. Our approach is based on the Maximum Entropy IRL framework, which allows us to reason about the likelihood of an expert agent's demonstrations given our knowledge of an MDP. Using our method, we can infer which constraints can be added to the MDP to most increase the likelihood of observing these demonstrations. We present an algorithm that iteratively infers the maximum likelihood constraint that best explains observed behavior, and we evaluate its efficacy using both simulated behavior and recorded data of humans navigating around an obstacle.
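
To make the iterative procedure concrete, the following is a minimal sketch of greedy maximum-likelihood constraint inference on a small tabular MDP. The function names (maxent_visitation, empirical_visitation, infer_constraints), the finite-horizon soft value iteration, and the use of a large negative reward to emulate a hard state constraint are illustrative assumptions, not the authors' implementation; the sketch only conveys the structure of repeatedly forbidding the state that the current MaxEnt model expects to visit most often but the expert never visits.

```python
# Sketch: greedy maximum-likelihood state-constraint inference on a tabular MDP.
# Assumptions: finite horizon, small discrete state/action spaces, state-based
# rewards, and a large negative reward standing in for a hard constraint.
import numpy as np
from scipy.special import logsumexp


def maxent_visitation(P, r, start_dist, horizon):
    """Expected state-visitation counts under the MaxEnt (soft-optimal) policy.

    P: (S, A, S) transition probabilities, r: (S,) state rewards,
    start_dist: (S,) initial state distribution.
    """
    S, A, _ = P.shape
    V = np.zeros(S)                        # soft value with zero steps remaining
    policies = []
    for _ in range(horizon):               # backward pass (soft value iteration)
        Q = r[:, None] + P @ V             # (S, A)
        V = logsumexp(Q, axis=1)
        policies.append(np.exp(Q - V[:, None]))
    policies.reverse()                     # policies[t] is the policy at time t

    D = start_dist.copy()                  # forward pass: occupancy per time step
    visitation = np.zeros(S)
    for t in range(horizon):
        visitation += D
        # push probability mass through the policy and the dynamics
        D = np.einsum('s,sa,sap->p', D, policies[t], P)
    return visitation


def empirical_visitation(demos, n_states):
    """State-visitation counts averaged over the expert demonstrations."""
    counts = np.zeros(n_states)
    for traj in demos:
        for s in traj:
            counts[s] += 1.0
    return counts / max(len(demos), 1)


def infer_constraints(P, r, start_dist, demos, horizon, n_constraints):
    """Greedily add the state constraint that most increases demo likelihood.

    At each iteration, among states the expert never visits, forbid the one
    the current MaxEnt model expects to occupy most often.
    """
    S = P.shape[0]
    r = r.copy()
    emp = empirical_visitation(demos, S)
    constraints = []
    for _ in range(n_constraints):
        expected = maxent_visitation(P, r, start_dist, horizon)
        candidates = [c for c in np.where(emp == 0)[0] if c not in constraints]
        if not candidates:
            break
        best = max(candidates, key=lambda c: expected[c])
        constraints.append(best)
        r[best] = -1e6   # emulate a hard constraint with a large penalty
    return constraints
```

The greedy choice reflects the intuition behind the method: forbidding a state that the nominal MaxEnt policy frequents but the expert never enters yields the largest increase in the likelihood of the demonstrations, so constraints are added one at a time in order of their expected visitation under the current model.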
