Inverse Constrained Reinforcement Learning

Standard reinforcement learning (RL) algorithms train agents to maximize a given reward function. However, many real-world applications of RL also require agents to satisfy constraints, motivated, for example, by safety concerns. Constrained RL algorithms address this by training agents to maximize a given reward function while respecting \textit{explicitly} defined constraints. In many cases, however, manually designing accurate constraints is challenging. In this work, given a reward function and a set of demonstrations from an expert that maximizes this reward function while respecting \textit{unknown} constraints, we propose a framework to learn the most likely constraints that the expert respects. We then train agents to maximize the given reward function subject to the learned constraints. Previous work in this direction has mostly been restricted to tabular settings or to specific types of constraints, or has assumed knowledge of the environment's transition dynamics. In contrast, we empirically show that our framework is able to learn arbitrary \textit{Markovian} constraints in high dimensions in a model-free setting.
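For orientation, the constrained RL problem referred to above is commonly written in the constrained Markov decision process form sketched below. This is the standard textbook formulation, not necessarily the exact objective used in this work; the symbols $\pi$ (policy), $\tau$ (trajectory), $\gamma$ (discount factor), $T$ (horizon), $r$ (reward), $c$ (constraint cost), and $\alpha$ (cost budget) are notation assumed here for illustration.

```latex
% Assumed notation: a standard constrained-MDP objective, shown for illustration only.
\begin{aligned}
\max_{\pi}\quad & J_r(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t)\Big] \\
\text{s.t.}\quad & J_c(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{T} \gamma^{t}\, c(s_t, a_t)\Big] \le \alpha
\end{aligned}
```

In this notation, a Markovian constraint is one whose cost $c(s_t, a_t)$ depends only on the current state and action. In the inverse setting described above, $r$ is given while $c$ is unknown and must be inferred from the expert demonstrations.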
