Online Inverse Reinforcement Learning via Bellman Gradient Iteration

This paper develops an online inverse reinforcement learning algorithm that efficiently recovers a reward function from ongoing observations of an agent's actions. To reduce the computation time and storage space required for reward estimation, this work assumes that each observed action implies a change in the Q-value distribution, and relates that change to the reward function via the gradient of the Q-value with respect to the reward function parameters. The gradients are computed with a novel Bellman Gradient Iteration method, which allows the reward function to be updated whenever a new observation becomes available, and the method is proved to converge to a local optimum. The proposed method is tested in two simulated environments, and its performance is evaluated under both a linear and a non-linear reward function. The results show that the algorithm requires only limited computation time and storage space, while its accuracy increases as the number of observations grows. A potential application to home cleaning robots is also presented.
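
To make the per-observation update concrete, the following is a minimal sketch of a Bellman-gradient-style online update in Python. It assumes a small discrete MDP with known transition probabilities, a linear reward r = Φθ, a log-sum-exp smoothing of the Bellman max so that Q is differentiable in θ, and a Boltzmann observation model P(a|s) ∝ exp(b·Q(s,a)). The function names (soft_bellman_q, online_update), the smoothing constant k, and all hyper-parameters are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def soft_bellman_q(theta, phi, P, gamma=0.9, k=10.0, iters=100):
    """Q-values and their gradient dQ/dtheta via fixed-point iteration
    of a smoothed Bellman equation (illustrative sketch, not the
    paper's exact algorithm).
    phi: (S, A, D) reward features, P: (S, A, S) transition probabilities."""
    S, A, D = phi.shape
    r = phi @ theta                            # (S, A) linear reward
    Q = np.zeros((S, A))
    dQ = np.zeros((S, A, D))
    for _ in range(iters):
        # soft value: V(s) = (1/k) log sum_a exp(k Q(s,a))
        m = Q.max(axis=1, keepdims=True)
        w = np.exp(k * (Q - m))                # stabilized exponentials
        V = m[:, 0] + np.log(w.sum(axis=1)) / k
        w = w / w.sum(axis=1, keepdims=True)   # softmax weights over actions
        dV = np.einsum('sa,sad->sd', w, dQ)    # dV/dtheta by the chain rule
        Q = r + gamma * np.einsum('sat,t->sa', P, V)
        dQ = phi + gamma * np.einsum('sat,td->sad', P, dV)
    return Q, dQ

def online_update(theta, obs, phi, P, lr=0.05, b=5.0):
    """One stochastic-gradient step on the log-likelihood of a single
    observed (state, action) pair under P(a|s) ∝ exp(b Q(s,a))."""
    s, a = obs
    Q, dQ = soft_bellman_q(theta, phi, P)
    p = np.exp(b * (Q[s] - Q[s].max()))
    p /= p.sum()                               # Boltzmann action probabilities
    # d log P(a|s) / dtheta = b * (dQ[s,a] - E_p[dQ[s,:]])
    grad = b * (dQ[s, a] - p @ dQ[s])
    return theta + lr * grad
```

In an online setting, theta = online_update(theta, (s, a), phi, P) would be called once per incoming state-action observation; a real implementation would warm-start the fixed-point iteration from the previous Q and dQ rather than recomputing from scratch as this sketch does, which is what keeps the per-observation cost low.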
