We present an algorithm that recovers an unknown cost function from expert-demonstrated trajectories in continuous space. We assume that the cost function is a weighted linear combination of features, and we learn weights under which the expert-demonstrated trajectories are optimal. Unlike previous approaches [1], [2], our algorithm does not require repeatedly solving the forward problem (i.e., finding optimal trajectories under a candidate cost function). At the core of our approach is the PI^2 (Policy Improvement with Path Integrals) reinforcement learning algorithm [3], which optimizes a parameterized policy in continuous, high-dimensional spaces and converges an order of magnitude faster than previous trajectory-based reinforcement learning algorithms on typical problems. We solve for the unknown cost function by enforcing the constraint that the expert-demonstrated trajectory does not change under the PI^2 update rule, and hence is locally optimal.
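To make the core constraint concrete, here is a minimal, self-contained sketch (not the paper's implementation) of the idea that the expert demonstration should be a fixed point of the PI^2 update under the learned cost. The feature function, temperature, noise model, and the crude random search over nonnegative weights are all illustrative assumptions; the paper's actual optimization procedure for the weights is not reproduced here.

```python
# Sketch: learn cost weights w so that the PI^2 probability-weighted update,
# computed from noisy rollouts around the expert demonstration, is ~zero.
# All names and values (features, lam, K, etc.) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

K, D, F = 50, 10, 4          # rollouts, policy-parameter dim, feature dim
lam = 0.1                    # PI^2 temperature
theta_expert = rng.normal(size=D)          # expert policy parameters (demonstration)
eps = rng.normal(scale=0.05, size=(K, D))  # exploration noise around the expert

def features(theta):
    """Hypothetical trajectory features Phi(tau) for policy parameters theta."""
    return np.array([theta @ theta,
                     np.sum(np.abs(theta)),
                     np.sum(np.diff(theta) ** 2),
                     theta[0] ** 2])

# Features of each noisy rollout theta_expert + eps_k, shape (K, F).
Phi = np.stack([features(theta_expert + e) for e in eps])

def pi2_update(w):
    """Probability-weighted parameter update of PI^2 for the cost S = w . Phi."""
    S = Phi @ w
    P = np.exp(-(S - S.min()) / lam)
    P /= P.sum()
    return P @ eps                       # delta-theta applied to the demonstration

def objective(w):
    """Squared norm of the PI^2 update: zero means the expert is a fixed point."""
    return np.sum(pi2_update(w) ** 2)

# Crude random search over nonnegative weights, just to illustrate the constraint.
best_w, best_val = None, np.inf
for _ in range(2000):
    w = rng.uniform(size=F)
    val = objective(w)
    if val < best_val:
        best_w, best_val = w, val

print("learned weights:", best_w, " ||PI^2 update||^2:", best_val)
```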
[1] Jun Nakanishi et al., "Learning Attractor Landscapes for Learning Motor Primitives," NIPS, 2002.
[2] Pieter Abbeel et al., "Apprenticeship learning via inverse reinforcement learning," ICML, 2004.
[3] Stefan Schaal et al., "Reinforcement learning of motor skills with policy gradients," 2008.
[4] David Silver et al., "Learning to search: Functional gradient techniques for imitation learning," Auton. Robots, 2009.
[5] Stefan Schaal et al., "Reinforcement learning of motor skills in high dimensions: A path integral approach," 2010 IEEE International Conference on Robotics and Automation, 2010.