Active Learning for Risk-Sensitive Inverse Reinforcement Learning

A typical assumption in inverse reinforcement learning (IRL) is that human experts act to optimize the expected utility of a stochastic cost with a fixed distribution. This assumption deviates from how humans actually behave under ambiguity. Risk-sensitive inverse reinforcement learning (RS-IRL) bridges this gap by assuming that humans evaluate a random cost with respect to a set of subjectively distorted distributions rather than a single fixed one. This assumption provides the additional flexibility to model a human's risk preferences, represented by a risk envelope, in safety-critical tasks. However, like other learning-from-demonstration techniques, RS-IRL can suffer from inefficient learning caused by redundant demonstrations. Inspired by active learning, this research derives a probabilistic disturbance sampling scheme that enables an RS-IRL agent to query the expert with disturbances that are likely to expose unrevealed boundaries of the expert's risk envelope. Experimental results confirm that our approach accelerates the convergence of RS-IRL algorithms with lower variance while still guaranteeing unbiased convergence.

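The abstract does not spell out the sampling scheme, so the following is only a minimal, illustrative sketch of the idea of querying where the risk envelope is still unresolved. It assumes (these are not details from the paper) a finite set of disturbance outcomes, an outer polytope approximation of the envelope {q >= 0, 1^T q = 1, A q <= b} built from constraints revealed by previous expert responses, and hypothetical helper names envelope_width and disturbance_sampling_probs; the supporting linear programs use scipy.optimize.linprog. Candidate disturbances whose induced cost directions still leave the envelope estimate wide receive higher sampling probability.

```python
# Minimal sketch of a probabilistic disturbance sampling scheme for active RS-IRL.
# Assumptions (not taken from the paper's text): the expert's risk envelope is an
# unknown polytope over K disturbance outcomes, and each observed expert response
# so far has contributed one linear constraint a^T q <= b to an outer approximation
# of that polytope. Candidates are scored by how unresolved the approximation is
# along the cost direction each disturbance induces.

import numpy as np
from scipy.optimize import linprog


def envelope_width(direction, A, b):
    """Width of the outer approximation {q >= 0, 1^T q = 1, A q <= b}
    along `direction`, i.e. max d^T q minus min d^T q over the polytope."""
    K = direction.shape[0]
    A_eq = np.ones((1, K))
    b_eq = np.array([1.0])
    bounds = [(0.0, None)] * K
    hi = linprog(-direction, A_ub=A, b_ub=b, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    lo = linprog(direction, A_ub=A, b_ub=b, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    if not (hi.success and lo.success):
        return 0.0
    return (-hi.fun) - lo.fun


def disturbance_sampling_probs(cost_directions, A, b, temperature=1.0):
    """Softmax distribution over candidate disturbances: directions along which
    the envelope estimate is still wide (boundary unrevealed) score higher."""
    scores = np.array([envelope_width(z, A, b) for z in cost_directions])
    logits = scores / temperature
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K = 4  # number of disturbance outcomes (illustrative)
    # Constraints gathered from previous expert responses (hypothetical values).
    A = rng.normal(size=(3, K))
    b = np.ones(3)
    # Candidate disturbances, each mapped to a cost direction over outcomes.
    candidates = rng.normal(size=(6, K))
    probs = disturbance_sampling_probs(candidates, A, b, temperature=0.5)
    query = rng.choice(len(candidates), p=probs)
    print("sampling probabilities:", np.round(probs, 3))
    print("query disturbance index:", query)
```

The softmax temperature trades off exploration against greedily querying the single widest direction; the actual scheme in the paper may weight or constrain candidates differently.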