Inverse Optimal Heuristic Control for Imitation Learning

One common approach to imitation learning is behavioral cloning (BC), which uses straightforward supervised learning (i.e., classification) to map observations directly to controls. A second approach is inverse optimal control (IOC), which formalizes the problem of learning sequential decision-making behavior over long horizons as that of recovering a utility function that explains the observed behavior. This paper presents inverse optimal heuristic control (IOHC), a novel approach to imitation learning that capitalizes on the strengths of both paradigms. IOHC applies long-horizon, IOC-style modeling in a low-dimensional space where inference remains tractable, and incorporates an additional, more descriptive set of BC-style features to guide action selection in the full, higher-dimensional space. We present experimental results demonstrating the capabilities of our model on a simple illustrative problem and on two real-world problems: turn prediction for taxi drivers, and pedestrian prediction within an office environment.
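The combination of a long-horizon IOC-style term with BC-style action features can be illustrated with a small sketch. This is not the authors' formulation. It assumes a Gibbs (softmax) policy in which each action is scored by the sum of a cost-to-go, computed by dynamic programming in a hypothetical low-dimensional space (a 1-D corridor), and a linear BC term over per-action features. The corridor, the names `step`, `cost_to_go`, `stop_feature`, `iohc_policy`, and `log_likelihood`, and the specific scoring rule are all illustrative assumptions.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy setting (assumption, not from the paper): a 1-D corridor of
# states 0..n-1 with a goal state; actions move left (-1), stay (0), or right (+1).
ACTIONS: Tuple[int, ...] = (-1, 0, 1)


def step(s: int, a: int, n: int) -> int:
    """Deterministic transition, clipped to the corridor boundaries."""
    return min(max(s + a, 0), n - 1)


def cost_to_go(costs: Sequence[float], goal: int) -> List[float]:
    """IOC-style long-horizon term computed in the low-dimensional space.

    Repeats Bellman updates until convergence to get the shortest-path
    cost-to-go V(s), where entering state s' costs costs[s'] (assumed > 0).
    """
    n = len(costs)
    v = [math.inf] * n
    v[goal] = 0.0
    changed = True
    while changed:
        changed = False
        for s in range(n):
            if s == goal:
                continue
            best = min(costs[step(s, a, n)] + v[step(s, a, n)] for a in ACTIONS)
            if best < v[s]:
                v[s] = best
                changed = True
    return v


def stop_feature(s: int, a: int) -> List[float]:
    """Hypothetical BC-style feature: indicator that the action is 'stay'."""
    return [1.0 if a == 0 else 0.0]


def iohc_policy(
    s: int,
    v: Sequence[float],
    costs: Sequence[float],
    bc_weights: Sequence[float],
    bc_features: Callable[[int, int], List[float]],
) -> Dict[int, float]:
    """Assumed Gibbs policy: p(a | s) proportional to exp(-(Q(s, a) + w_b . f(s, a)))."""
    n = len(v)
    scores = []
    for a in ACTIONS:
        s_next = step(s, a, n)
        q = costs[s_next] + v[s_next]  # long-horizon (IOC-style) cost of taking a
        bc = sum(w * x for w, x in zip(bc_weights, bc_features(s, a)))  # BC-style term
        scores.append(-(q + bc))
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(x - m) for x in scores]
    z = sum(exps)
    return {a: e / z for a, e in zip(ACTIONS, exps)}


def log_likelihood(
    demo: Sequence[Tuple[int, int]],
    v: Sequence[float],
    costs: Sequence[float],
    bc_weights: Sequence[float],
    bc_features: Callable[[int, int], List[float]],
) -> float:
    """Log-probability of a demonstrated (state, action) sequence under the policy."""
    return sum(
        math.log(iohc_policy(s, v, costs, bc_weights, bc_features)[a]) for s, a in demo
    )


if __name__ == "__main__":
    costs = [1.0] * 5
    v = cost_to_go(costs, goal=4)
    print(v)  # [4.0, 3.0, 2.0, 1.0, 0.0]
    print(iohc_policy(1, v, costs, [0.0], stop_feature))
```

With zero BC weights this policy follows the cost-to-go and prefers moving right toward the goal; a sufficiently negative weight on the hypothetical stop feature makes "stay" the most probable action even though planning alone disfavors it. Fitting the weights would amount to maximizing `log_likelihood` over demonstrations. In this sketch the dynamic program only sees the corridor position, while the BC term could depend on richer per-action features without enlarging the space the planner searches.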
