Imitation Learning in Relational Domains: A Functional-Gradient Boosting Approach

Imitation learning refers to the problem of learning how to behave by observing a teacher in action. We consider imitation learning in relational domains, in which the number of objects and the relations among them can vary. In prior work, simple relational policies were learned by casting imitation learning as supervised learning of a function from states to actions. For propositional worlds, functional-gradient methods have proved beneficial: they are simpler to implement than most existing methods, more efficient, more naturally satisfy common constraints on the cost function, and better represent our prior beliefs about the form of the function. Building on recent generalizations of functional-gradient boosting to relational representations, we develop a functional-gradient boosting approach to imitation learning in relational domains. In particular, given a set of traces from the human teacher, our system learns a policy in the form of a set of relational regression trees that additively approximate the functional gradients. The use of multiple additive trees combined with a relational representation allows the learned policies to be more expressive than those of previous approaches. We demonstrate the usefulness of our approach in several different domains.
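To make the additive-trees idea concrete, here is a minimal sketch of functional-gradient boosting for a discrete-action policy. It uses scikit-learn's propositional DecisionTreeRegressor as a stand-in for the relational regression trees described above, and the names (boost_policy, score, act) and hyperparameters are illustrative assumptions, not the authors' implementation. Each boosting round fits one tree per action to the pointwise functional gradient I[a = a_demo] - P(a | s) of the log-likelihood of the demonstration traces.

```python
# Sketch: functional-gradient boosting of a policy from teacher traces.
# Propositional stand-in for the paper's relational regression trees;
# function names and hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_policy(states, actions, n_actions, n_trees=20, max_depth=3):
    """Learn potentials psi_a(s) = sum_m tree_m(s), one tree list per
    action, such that P(a | s) is proportional to exp(psi_a(s)).
    states: (n, d) feature array; actions: (n,) teacher action labels."""
    trees = [[] for _ in range(n_actions)]
    for _ in range(n_trees):
        # Current potentials and softmax action probabilities.
        psi = np.stack([score(states, trees[a]) for a in range(n_actions)], axis=1)
        probs = np.exp(psi - psi.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        for a in range(n_actions):
            # Pointwise functional gradient of the log-likelihood:
            # I[a = teacher's action] - P(a | s) at each training state.
            grad = (actions == a).astype(float) - probs[:, a]
            tree = DecisionTreeRegressor(max_depth=max_depth).fit(states, grad)
            trees[a].append(tree)
    return trees

def score(states, tree_list):
    """Sum the predictions of all trees fit so far for one action."""
    if not tree_list:
        return np.zeros(len(states))
    return sum(t.predict(states) for t in tree_list)

def act(state, trees):
    """Greedy policy: pick the action with the highest potential."""
    psi = np.array([score(state.reshape(1, -1), ts)[0] for ts in trees])
    return int(np.argmax(psi))
```

Under this view, each new tree corrects the residual error of the current policy on the teacher's traces; at test time the potentials are summed over all trees and the highest-scoring action is taken (or sampled from the softmax, for a stochastic policy). The relational version replaces the feature-vector trees with first-order regression trees whose inner nodes test logical conditions over objects and relations.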
