Generalized Inverse Planning: Learning Lifted non-Markovian Utility for Generalizable Task Representation

In searching for a generalizable representation of temporally extended tasks, we identify two necessary constituents: the utility needs to be non-Markovian, so that temporal relations invariant to probability shifts can be transferred, and it needs to be lifted, so that specific grounding objects are abstracted away. In this work, we study learning such a utility from human demonstrations. While inverse reinforcement learning (IRL) has been accepted as a general framework for utility learning, its fundamental formulation is a single concrete Markov Decision Process, so the learned reward function does not specify the task independently of the environment. Going beyond that, we define a domain of generalization that spans a set of planning problems following a schema, and we propose a new quest, Generalized Inverse Planning, for utility learning in this domain. We further outline a computational framework, Maximum Entropy Inverse Planning (MEIP), that learns non-Markovian utility and associated concepts in a generative manner. The learned utility and concepts form a task representation that generalizes under both probability shifts and structural changes. Since the proposed generalization problem has not been widely studied, we carefully define an evaluation protocol and use it to illustrate the effectiveness of MEIP on two proof-of-concept domains and one challenging task: learning to fold from demonstrations.
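
As a concrete anchor, the following is a minimal sketch of what a maximum-entropy formulation over whole trajectories could look like. It assumes MEIP generalizes MaxEnt IRL by replacing the Markovian reward sum with a non-Markovian utility of lifted concept features; the symbols $\tau$, $\phi$, $U_\theta$, and $Z(\theta)$ are illustrative notation, not taken from the paper:

\[
P_\theta(\tau) \;=\; \frac{\exp\!\big(U_\theta(\phi(\tau))\big)}{Z(\theta)},
\qquad
Z(\theta) \;=\; \sum_{\tau'} \exp\!\big(U_\theta(\phi(\tau'))\big),
\]

where $\tau = (s_0, a_0, \dots, s_T)$ is an entire trajectory and $\phi(\tau)$ are lifted, object-abstracted concept features of that trajectory. Learning would then maximize the likelihood of the demonstrations $D = \{\tau_i\}$,

\[
\theta^\star \;=\; \arg\max_\theta \sum_{\tau_i \in D} \log P_\theta(\tau_i),
\]

so that demonstrated trajectories become exponentially more probable in proportion to their utility, while the maximum-entropy principle keeps the model otherwise uncommitted.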
