Paper information - Apprenticeship learning using linear programming

Apprenticeship learning using linear programming

In apprenticeship learning, the goal is to learn a policy in a Markov decision process (MDP) that is at least as good as a policy demonstrated by an expert. The difficulty is that the MDP's true reward function is assumed to be unknown. We show how to frame apprenticeship learning as a linear programming (LP) problem, and show that solving this problem with an off-the-shelf LP solver substantially improves running time over existing methods, by up to two orders of magnitude in our experiments. Additionally, our approach produces stationary policies, whereas all existing methods for apprenticeship learning output "mixed" policies, i.e., randomized combinations of stationary policies. The technique used is general enough to convert any mixed policy to a stationary policy.
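
The abstract does not spell out how a mixed policy is converted to a stationary one. The sketch below illustrates one standard route, under the assumption that the conversion goes through discounted occupancy measures: average the occupancy measures of the component policies, then normalize per state. The two-state MDP, the mixing weight `lam`, and the helper names `occupancy` and `to_stationary` are all hypothetical and chosen only for illustration.

```python
# Hypothetical 2-state, 2-action MDP; every number here is illustrative only.
GAMMA = 0.9
N_STATES, N_ACTIONS = 2, 2
# P[s][a][t] = probability of moving from state s to state t under action a
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.1, 0.9]],
]
ALPHA = [0.5, 0.5]  # initial state distribution (strictly positive)


def occupancy(policy, iters=2000):
    """Discounted occupancy x[s][a] = sum_t GAMMA^t * Pr(s_t = s, a_t = a)."""
    x = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    dist = ALPHA[:]   # state distribution at time t
    weight = 1.0      # GAMMA^t
    for _ in range(iters):
        nxt = [0.0] * N_STATES
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                mass = dist[s] * policy[s][a]
                x[s][a] += weight * mass
                for t in range(N_STATES):
                    nxt[t] += mass * P[s][a][t]
        dist = nxt
        weight *= GAMMA
    return x


def to_stationary(x):
    """Normalize an occupancy measure per state: pi(a|s) = x[s][a] / sum_a x[s][a]."""
    return [[x[s][a] / sum(x[s]) for a in range(N_ACTIONS)]
            for s in range(N_STATES)]


# Two deterministic stationary policies, mixed by choosing one at time 0.
pi_1 = [[1.0, 0.0], [1.0, 0.0]]  # always take action 0
pi_2 = [[0.0, 1.0], [0.0, 1.0]]  # always take action 1
lam = 0.3                        # probability of following pi_1

x1, x2 = occupancy(pi_1), occupancy(pi_2)
# The occupancy measure of the mixed policy is the convex combination.
x_mixed = [[lam * x1[s][a] + (1 - lam) * x2[s][a] for a in range(N_ACTIONS)]
           for s in range(N_STATES)]

pi_stationary = to_stationary(x_mixed)
x_stat = occupancy(pi_stationary)

# Largest entrywise difference between the two occupancy measures.
max_gap = max(abs(x_mixed[s][a] - x_stat[s][a])
              for s in range(N_STATES) for a in range(N_ACTIONS))
print(pi_stationary, max_gap)
```

Because the expected discounted return of a policy is linear in its occupancy measure for any reward defined on state-action pairs, matching occupancy measures means the stationary policy matches the mixed policy's value even when the reward is unknown. The normalization step assumes every state has positive occupancy, which holds here because `ALPHA` is strictly positive.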
