
Inverse Optimal Heuristic Control for Imitation Learning

One common approach to imitation learning is behavioral cloning (BC), which employs straightforward supervised learning (i.e., classification) to directly map observations to controls. A second approach is inverse optimal control (IOC), which formalizes the problem of learning sequential decision-making behavior over long horizons as a problem of recovering a utility function that explains observed behavior. This paper presents inverse optimal heuristic control (IOHC), a novel approach to imitation learning that capitalizes on the strengths of both paradigms. It employs long-horizon IOC-style modeling in a low-dimensional space where inference remains tractable, while incorporating an additional descriptive set of BC-style features to guide action selection in the higher-dimensional overall space. We provide experimental results demonstrating the capabilities of our model on a simple illustrative problem as well as on two real-world problems: turn prediction for taxi drivers and pedestrian prediction within an office environment.
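The abstract describes the combination of IOC and BC only at a high level. The sketch below is one hypothetical reading of it, not the paper's exact model: it assumes that action probabilities take a Gibbs (softmax) form whose energy adds a linear BC-style feature cost to an exact long-horizon cost-to-go computed on a small low-dimensional graph. Dijkstra's algorithm is used as the planner purely for illustration, and the names `cost_to_go`, `iohc_action_probs`, the toy graph, and the weight `w_bc` are all illustrative assumptions. Learning the weights from demonstrations is omitted.

```python
import heapq
import math
from typing import Dict, List, Tuple

# Hypothetical low-dimensional graph: node -> list of (action, next_node, edge_cost).
# In an IOC-style model the edge costs would come from learned weights applied to
# edge features; here they are supplied directly to keep the sketch small.
Graph = Dict[str, List[Tuple[str, str, float]]]
# Hypothetical BC-style features, keyed by (state, action).
BCFeatures = Dict[Tuple[str, str], List[float]]


def cost_to_go(graph: Graph, goal: str) -> Dict[str, float]:
    """Long-horizon cost-to-go V(s) to `goal`, via Dijkstra on the reversed graph."""
    reverse: Dict[str, List[Tuple[str, float]]] = {}
    for src, edges in graph.items():
        for _action, dst, cost in edges:
            reverse.setdefault(dst, []).append((src, cost))
    dist: Dict[str, float] = {goal: 0.0}
    frontier = [(0.0, goal)]
    while frontier:
        d, node = heapq.heappop(frontier)
        if d > dist.get(node, math.inf):
            continue  # stale queue entry; a shorter path was already found
        for pred, cost in reverse.get(node, []):
            nd = d + cost
            if nd < dist.get(pred, math.inf):
                dist[pred] = nd
                heapq.heappush(frontier, (nd, pred))
    return dist  # nodes that cannot reach the goal are absent


def iohc_action_probs(
    graph: Graph,
    values: Dict[str, float],
    state: str,
    bc_features: BCFeatures,
    w_bc: List[float],
) -> Dict[str, float]:
    """Gibbs (softmax) distribution over the actions available at `state`.

    Energy of an action = local BC-style cost w_bc . f(state, action)
    + IOC-style edge cost + cost-to-go of the successor node.
    Assumes at least one successor can reach the goal.
    """
    energies: Dict[str, float] = {}
    for action, nxt, edge_cost in graph[state]:
        local = sum(w * f for w, f in zip(w_bc, bc_features[(state, action)]))
        energies[action] = local + edge_cost + values.get(nxt, math.inf)
    best = min(energies.values())
    # Shift by the minimum energy before exponentiating, for numerical stability.
    weights = {a: math.exp(-(e - best)) for a, e in energies.items()}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}


# Toy example: from A, "left" leads to a cheap route, "right" to an expensive one.
graph: Graph = {
    "A": [("left", "B", 1.0), ("right", "C", 1.0)],
    "B": [("go", "G", 1.0)],
    "C": [("go", "G", 3.0)],
    "G": [],
}
bc_features: BCFeatures = {("A", "left"): [0.0], ("A", "right"): [1.0]}
values = cost_to_go(graph, "G")
probs = iohc_action_probs(graph, values, "A", bc_features, w_bc=[0.5])
print(values)
print(probs)
```

In this toy setup the cost-to-go alone already favors `left`, and the BC-style feature on `right` shifts probability further toward `left`. Setting `w_bc` to zero reduces the sketch to a softmax over cost-to-go alone, which is one way to see how the two kinds of terms contribute separately.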
