Inverse Reinforcement Learning with PI²

We present an algorithm that recovers an unknown cost function from expert-demonstrated trajectories in continuous space. We assume that the cost function is a weighted linear combination of features, and we learn weights under which the expert-demonstrated trajectories are optimal. Unlike previous approaches [1], [2], our algorithm does not require repeatedly solving the forward problem (i.e., finding optimal trajectories under a candidate cost function). At the core of our approach is the PI² (Policy Improvement with Path Integrals) reinforcement learning algorithm [3], which optimizes a parameterized policy in continuous, high-dimensional spaces. PI² converges an order of magnitude faster than previous trajectory-based reinforcement learning algorithms on typical problems. We solve for the unknown cost function by enforcing the constraint that the expert-demonstrated trajectory does not change under the PI² update rule, and hence is locally optimal.
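To make the fixed-point idea concrete, the following is a minimal sketch (not the paper's implementation) of the core computation. It assumes a linear cost S_k = w·Φ_k over hypothetical per-rollout feature vectors Φ_k, the standard PI² softmax reweighting of sampled parameter perturbations, and a temperature λ; all names and dimensions are illustrative. The IRL condition described in the abstract corresponds to choosing w so that this update is (near) zero at the expert trajectory.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: K noisy rollouts around an expert trajectory.
K, n_features, n_params = 10, 3, 5
Phi = rng.random((K, n_features))       # feature counts of each sampled rollout
eps = rng.standard_normal((K, n_params))  # parameter perturbations per rollout
lam = 1.0                               # PI^2 temperature

def pi2_update(w):
    """One PI^2-style parameter update under cost weights w.

    Rollout cost S_k = w . Phi[k]; rollouts are reweighted by the
    softmax exp(-S_k / lam), and the update is the probability-weighted
    mean of the perturbations eps_k."""
    S = Phi @ w
    P = np.exp(-(S - S.min()) / lam)    # shift by min for numerical stability
    P /= P.sum()
    return P @ eps                      # weighted average of the noise

# Learning the cost amounts to finding w for which the expert trajectory
# is a fixed point, i.e. this update vanishes.
w = np.ones(n_features)
print(np.linalg.norm(pi2_update(w)))
```

In this sketch the constraint "the expert trajectory does not change under the PI² update" becomes a system of equations in w, which is what removes the need to solve the forward problem for each candidate cost.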