Maximum Likelihood Constraint Inference from Stochastic Demonstrations

When an expert operates a safety-critical dynamic system, constraint information is tacitly contained in their demonstrated trajectories and controls. These constraints can be inferred by modeling the system and operator as a constrained Markov Decision Process and finding the constraint most likely to generate the demonstrated controls. Prior constraint inference work has focused mainly on deterministic dynamics. Stochastic dynamics, however, can capture the uncertainty inherent to real applications and the risk tolerance that such uncertainty requires. This paper extends maximum likelihood constraint inference to stochastic applications by using maximum causal entropy likelihoods. Moreover, the extension comes at no increase in computational cost: we derive an algorithm that computes constraint likelihood and risk tolerance in a unified Bellman backup, preserving the same computational complexity as the deterministic case.
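The computational core of this style of inference is a maximum causal entropy soft Bellman backup followed by a likelihood evaluation of the demonstrations. Below is a minimal NumPy sketch of that pattern; the function names, the -inf masking of constrained state-action pairs, and the tabular setup are illustrative assumptions, and the paper's unified backup additionally folds in risk tolerance, which this sketch omits.

```python
import numpy as np

def soft_bellman_backup(R, P, gamma=0.95, n_iters=200):
    """Maximum causal entropy soft value iteration (illustrative sketch).

    R : (S, A) reward array. A candidate constraint can be imposed by
        masking disallowed state-action pairs with -np.inf (an
        illustrative choice, not necessarily the paper's formulation).
    P : (S, A, S) transition tensor for the stochastic dynamics.
    Returns the soft values V (S,) and the maximum causal entropy
    stochastic policy pi (S, A).
    """
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(n_iters):
        # Q(s, a) = R(s, a) + gamma * E_{s' ~ P(.|s, a)}[V(s')]
        Q = R + gamma * (P @ V)
        # Soft maximum over actions: V(s) = log sum_a exp(Q(s, a))
        V = np.logaddexp.reduce(Q, axis=1)
    # Policy pi(a|s) = exp(Q(s, a) - V(s)); masked pairs get probability 0.
    pi = np.exp(Q - V[:, None])
    return V, pi

def demo_log_likelihood(pi, demos):
    """Log-likelihood of demonstrated (state, action) pairs under pi."""
    return sum(np.log(pi[s, a]) for traj in demos for s, a in traj)
```

To score a candidate constraint under this sketch, one would mask its state-action pairs in R, rerun the backup, and compare demo_log_likelihood values across candidates; the constraint maximizing the likelihood is the inferred one.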
