Risk-sensitive Inverse Reinforcement Learning via Coherent Risk Models

The literature on Inverse Reinforcement Learning (IRL) typically assumes that humans take actions to minimize the expected value of a cost function, i.e., that humans are risk neutral. Yet, in practice, humans are often far from risk neutral. To fill this gap, this paper devises a framework for risk-sensitive IRL that explicitly accounts for an expert's risk sensitivity. To this end, we propose a flexible class of models based on coherent risk metrics, which capture an entire spectrum of risk preferences, from risk-neutral to worst-case. We propose efficient algorithms based on Linear Programming for inferring an expert's underlying risk metric and cost function in a rich class of static and dynamic decision-making settings. The resulting approach is demonstrated on a simulated driving game with ten human participants. Our method infers and mimics a wide range of qualitatively different driving styles, from highly risk-averse to risk-neutral, in a data-efficient manner. Moreover, comparisons of the Risk-Sensitive (RS) IRL approach with a risk-neutral model show that the RS-IRL framework captures observed participant behavior more accurately, both qualitatively and quantitatively.
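
As a brief illustration of the risk spectrum referenced above (a sketch for intuition, not the paper's own formulation): coherent risk metrics admit a dual representation as a worst-case expectation over a convex set of probability distributions, which, loosely speaking, is what makes LP-based inference tractable, and the conditional value-at-risk (CVaR) family traces the full spectrum as its confidence level alpha varies. The helper `cvar` below is a hypothetical empirical estimator introduced here only to show how one parameter interpolates between risk-neutral and worst-case evaluation of the same cost distribution.

```python
import numpy as np

def cvar(costs, alpha):
    """Empirical conditional value-at-risk (expected shortfall).

    Approximated here as the mean of the worst ceil(alpha * n) sampled
    costs: alpha = 1.0 recovers the risk-neutral expectation, while
    alpha -> 0 approaches the worst-case cost. Illustrative only; this
    is not the paper's inference algorithm.
    """
    costs = np.sort(np.asarray(costs, dtype=float))[::-1]  # largest cost first
    k = max(1, int(np.ceil(alpha * len(costs))))
    return costs[:k].mean()

rng = np.random.default_rng(0)
costs = rng.normal(1.0, 0.2, size=10_000)  # nominal cost of a maneuver
costs[:100] += 10.0                        # rare catastrophic tail (1% of samples)

for alpha in (1.0, 0.1, 0.01):
    print(f"CVaR at alpha={alpha}: {cvar(costs, alpha):.2f}")
```

A risk-neutral expert model corresponds to fixing alpha = 1; two experts with the same cost function but different alpha values rank the same risky maneuver very differently, and it is this kind of behavioral signal that a risk-sensitive IRL framework can exploit while a risk-neutral one cannot.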
