Active Learning for Risk-Sensitive Inverse Reinforcement Learning

A typical assumption in inverse reinforcement learning (IRL) is that human experts act to optimize the expected utility of a stochastic cost with a fixed distribution. This assumption deviates from how humans actually behave under ambiguity. Risk-sensitive inverse reinforcement learning (RS-IRL) bridges this gap by assuming that humans evaluate a random cost with respect to a set of subjectively distorted distributions rather than a single fixed one. This assumption provides the additional flexibility to model a human's risk preferences, represented by a risk envelope, in safety-critical tasks. However, like other learning-from-demonstration techniques, RS-IRL can suffer from inefficient learning caused by redundant demonstrations. Inspired by active learning, this research derives a probabilistic disturbance sampling scheme that enables an RS-IRL agent to query the expert with disturbances that are likely to expose unrevealed boundaries of the expert's risk envelope. Experimental results confirm that our approach accelerates the convergence of RS-IRL algorithms with lower variance while still guaranteeing unbiased convergence.

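The abstract does not spell out the sampling scheme, so the following is only a minimal, illustrative sketch of the idea of querying where the risk envelope is still unresolved. It assumes (these are not details from the paper) a finite set of disturbance outcomes, an outer polytope approximation of the envelope {q >= 0, 1^T q = 1, A q <= b} built from constraints revealed by previous expert responses, and hypothetical helper names envelope_width and disturbance_sampling_probs; the supporting linear programs use scipy.optimize.linprog. Candidate disturbances whose induced cost directions still leave the envelope estimate wide receive higher sampling probability.

```python
# Minimal sketch of a probabilistic disturbance sampling scheme for active RS-IRL.
# Assumptions (not taken from the paper's text): the expert's risk envelope is an
# unknown polytope over K disturbance outcomes, and each observed expert response
# so far has contributed one linear constraint a^T q <= b to an outer approximation
# of that polytope. Candidates are scored by how unresolved the approximation is
# along the cost direction each disturbance induces.

import numpy as np
from scipy.optimize import linprog


def envelope_width(direction, A, b):
    """Width of the outer approximation {q >= 0, 1^T q = 1, A q <= b}
    along `direction`, i.e. max d^T q minus min d^T q over the polytope."""
    K = direction.shape[0]
    A_eq = np.ones((1, K))
    b_eq = np.array([1.0])
    bounds = [(0.0, None)] * K
    hi = linprog(-direction, A_ub=A, b_ub=b, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    lo = linprog(direction, A_ub=A, b_ub=b, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    if not (hi.success and lo.success):
        return 0.0
    return (-hi.fun) - lo.fun


def disturbance_sampling_probs(cost_directions, A, b, temperature=1.0):
    """Softmax distribution over candidate disturbances: directions along which
    the envelope estimate is still wide (boundary unrevealed) score higher."""
    scores = np.array([envelope_width(z, A, b) for z in cost_directions])
    logits = scores / temperature
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K = 4  # number of disturbance outcomes (illustrative)
    # Constraints gathered from previous expert responses (hypothetical values).
    A = rng.normal(size=(3, K))
    b = np.ones(3)
    # Candidate disturbances, each mapped to a cost direction over outcomes.
    candidates = rng.normal(size=(6, K))
    probs = disturbance_sampling_probs(candidates, A, b, temperature=0.5)
    query = rng.choice(len(candidates), p=probs)
    print("sampling probabilities:", np.round(probs, 3))
    print("query disturbance index:", query)
```

The softmax temperature trades off exploration against greedily querying the single widest direction; the actual scheme in the paper may weight or constrain candidates differently.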