Few-Shot Bayesian Imitation Learning with Logic over Programs

We describe an expressive class of policies that can be efficiently learned from a few demonstrations. Policies are represented as logical combinations of programs drawn from a small domain-specific language (DSL). We define a prior over policies with a probabilistic grammar and derive an approximate Bayesian inference algorithm to learn policies from demonstrations. In experiments, we study five strategy games played on a 2D grid, all sharing a single DSL. After a few demonstrations of each game, the inferred policies generalize to new game instances that differ substantially from the demonstrations. We argue that the proposed method is an apt choice for policy-learning tasks with scarce training data and significant, structured variation between task instances.
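To make the setup concrete, here is a minimal sketch of Bayesian inference over logical program policies. Everything in it is an illustrative assumption rather than the paper's actual construction: the toy DSL primitives (`cell_is`, `neighbor_is`), the grid encoding, a description-length prior standing in for the probabilistic grammar, a likelihood that is uniform over the actions a formula admits, and brute-force MAP enumeration in place of the paper's approximate inference algorithm.

```python
import itertools
import math

# ---- Toy DSL: each program maps (grid, cell) -> bool ----------------------
def cell_is(label):
    """True iff the chosen cell carries the given label."""
    def prog(grid, cell):
        r, c = cell
        return grid[r][c] == label
    prog.__name__ = f"cell_is({label!r})"
    return prog

def neighbor_is(dr, dc, label):
    """True iff the cell offset by (dr, dc) is in bounds and has the label."""
    def prog(grid, cell):
        r, c = cell[0] + dr, cell[1] + dc
        return 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == label
    prog.__name__ = f"neighbor_is({dr},{dc},{label!r})"
    return prog

PRIMITIVES = [cell_is("goal"), cell_is("empty"),
              neighbor_is(-1, 0, "agent"), neighbor_is(0, -1, "agent")]

# ---- Policies: disjunctions of conjunctions (DNF) over DSL programs -------
def holds(dnf, grid, cell):
    return any(all(p(grid, cell) for p in clause) for clause in dnf)

def log_prior(dnf):
    # Stand-in for a probabilistic-grammar prior: each primitive in the
    # formula costs log|DSL| nats, so longer formulas are less probable.
    n_symbols = sum(len(clause) for clause in dnf)
    return -n_symbols * math.log(len(PRIMITIVES))

def log_likelihood(dnf, demos):
    # Each demonstrated action must satisfy the formula; probability mass
    # is spread uniformly over the actions the formula admits in that state.
    total = 0.0
    for grid, cell in demos:
        cells = itertools.product(range(len(grid)), range(len(grid[0])))
        admitted = [c for c in cells if holds(dnf, grid, c)]
        if cell not in admitted:
            return -math.inf
        total -= math.log(len(admitted))
    return total

def map_policy(demos, max_clauses=2, max_len=2):
    # Brute-force MAP search over small DNF formulas; the paper derives a
    # far more efficient approximate inference scheme.
    clauses = [c for n in range(1, max_len + 1)
               for c in itertools.combinations(PRIMITIVES, n)]
    best, best_score = None, -math.inf
    for k in range(1, max_clauses + 1):
        for dnf in itertools.combinations(clauses, k):
            score = log_prior(dnf) + log_likelihood(dnf, demos)
            if score > best_score:
                best, best_score = dnf, score
    return best

# Usage: a single demonstration on a 2x2 grid, where the teacher clicks
# the goal cell. The MAP policy recovered is the formula cell_is('goal').
grid = [["empty", "goal"], ["agent", "empty"]]
policy = map_policy([(grid, (0, 1))])
print([[p.__name__ for p in clause] for clause in policy])
```

The prior does the heavy lifting here: many formulas fit one demonstration, but the description-length penalty selects the shortest one that admits only the demonstrated actions, which is what lets the inferred policy generalize beyond the grids seen in training.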
