Recent research on imitation learning has shown that Markov decision processes (MDPs) provide a powerful framework for characterizing this problem. Inverse reinforcement learning (IRL) seeks to explain observed behavior by recovering a reward function (or, equivalently, a cost function) through the solution of a Markov decision problem. This paper presents three approaches to finding a policy that mimics observed behavior; their differences and practical issues are pointed out and compared on several applications. The first approach handles three cases: the policy and the states are finite and known, the state space is continuous, and the policy is known only through a finite set of observed trajectories. The second approach, LEARCH, extends Maximum Margin Planning; it is simpler to implement than many other approaches while satisfying constraints on the cost function in a more natural way. The last approach is based on the principle of maximum entropy and reduces learning to the problem of recovering a utility function that closely mimics the demonstrated behavior; a small illustrative sketch of this idea follows.
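To make the maximum-entropy approach concrete, the sketch below follows the spirit of the method of Ziebart et al. [5]: fit a linear reward so that the expected feature counts of the induced soft-optimal policy match the feature counts of the demonstrations. The five-state gridworld, one-hot state features, hand-written demonstration trajectories, learning rate, and iteration counts are illustrative assumptions of this sketch, not details taken from the cited papers.

```python
# Minimal maximum-entropy IRL sketch on a hypothetical 1-D gridworld.
# All environment and hyperparameter choices here are illustrative assumptions.
import numpy as np

N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9   # actions: 0 = left, 1 = right

def transition_matrix():
    """P[a, s, s'] for deterministic left/right moves on a line of states."""
    P = np.zeros((N_ACTIONS, N_STATES, N_STATES))
    for s in range(N_STATES):
        P[0, s, max(s - 1, 0)] = 1.0              # move left
        P[1, s, min(s + 1, N_STATES - 1)] = 1.0   # move right
    return P

def soft_value_iteration(P, reward, n_iters=100):
    """Soft-optimal policy pi(a|s) proportional to exp(Q(s,a)), via log-sum-exp backups."""
    V = np.zeros(N_STATES)
    for _ in range(n_iters):
        Q = reward[None, :] + GAMMA * (P @ V)                     # Q[a, s]
        Qmax = Q.max(axis=0)
        V = Qmax + np.log(np.exp(Q - Qmax).sum(axis=0))           # stable log-sum-exp
    return np.exp(Q - V[None, :])                                 # pi[a, s]

def expected_svf(P, policy, start_dist, horizon):
    """Expected state-visitation frequencies under the soft policy."""
    d = start_dist.copy()
    svf = d.copy()
    for _ in range(horizon - 1):
        # next-state distribution: sum over a, s of d(s) * pi(a|s) * P(s'|s,a)
        d = np.einsum('s,as,ast->t', d, policy, P)
        svf += d
    return svf

def maxent_irl(features, P, demo_states, epochs=200, lr=0.05):
    """Gradient ascent on the MaxEnt log-likelihood with a linear reward."""
    theta = np.zeros(features.shape[1])
    # empirical per-trajectory feature counts from the demonstrated state sequences
    f_demo = features[np.concatenate(demo_states)].sum(axis=0) / len(demo_states)
    start_dist = np.bincount([t[0] for t in demo_states],
                             minlength=N_STATES).astype(float)
    start_dist /= start_dist.sum()
    for _ in range(epochs):
        reward = features @ theta
        policy = soft_value_iteration(P, reward)
        svf = expected_svf(P, policy, start_dist, horizon=len(demo_states[0]))
        grad = f_demo - features.T @ svf              # match expected feature counts
        theta += lr * grad
    return features @ theta                           # recovered reward per state

if __name__ == "__main__":
    P = transition_matrix()
    features = np.eye(N_STATES)                       # one-hot state features
    # demonstrations that walk toward the right-most state
    demos = [[0, 1, 2, 3, 4], [1, 2, 3, 4, 4], [2, 3, 4, 4, 4]]
    print("recovered reward per state:",
          np.round(maxent_irl(features, P, demos), 2))
```

Running the script prints a recovered reward that increases toward the right-most state, which is the kind of utility needed to reproduce the rightward-walking demonstrations under the maximum-entropy model.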
[1] Stuart J. Russell. Learning agents for uncertain environments (extended abstract). COLT '98, 1998.
[2] Andrew Y. Ng, et al. Algorithms for Inverse Reinforcement Learning. ICML, 2000.
[3] Richard S. Sutton, et al. Reinforcement Learning: An Introduction. MIT Press, 1998.
[4] Nathan D. Ratliff, et al. Maximum margin planning. ICML, 2006.
[5] Brian D. Ziebart, et al. Maximum Entropy Inverse Reinforcement Learning. AAAI, 2008.
[6] William W. Cohen, et al. (editors). Proceedings of the 23rd International Conference on Machine Learning. ICML, 2006.
[7] Nathan D. Ratliff, et al. Learning to search: Functional gradient techniques for imitation learning. Autonomous Robots, 2009.
[8] Brian D. Ziebart, et al. Modeling Interaction via the Principle of Maximum Causal Entropy. ICML, 2010.