Few-Shot Bayesian Imitation Learning with Logical Program Policies

Humans can learn many novel tasks from a very small number (1–5) of demonstrations, in stark contrast to the data requirements of nearly tabula rasa deep learning methods. We propose an expressive class of policies, a strong but general prior, and a learning algorithm that, together, can learn interesting policies from very few examples. We represent policies as logical combinations of programs drawn from a domain-specific language (DSL), define a prior over policies with a probabilistic grammar, and derive an approximate Bayesian inference algorithm to learn policies from demonstrations. In experiments, we study six strategy games played on a 2D grid with one shared DSL. After a few demonstrations of each game, the inferred policies generalize to new game instances that differ substantially from the demonstrations. Our policy learning is 20–1,000x more data-efficient than convolutional and fully convolutional policy learning and many orders of magnitude more computationally efficient than vanilla program induction. We argue that the proposed method is an apt choice for tasks that have scarce training data and feature significant, structured variation between task instances.

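The sketch below illustrates the general idea in miniature; it is not the authors' implementation. It assumes a toy setup in which candidate policies are conjunctions of small feature programs drawn from a shared DSL of grid predicates, a grammar-style prior penalizes longer programs, and a MAP policy is found by enumerating candidates and scoring each by prior plus likelihood of the demonstrated actions. All identifiers (FEATURE_PROGRAMS, infer_policy, the specific predicates) are hypothetical.

```python
import itertools
import math

# Hypothetical DSL of feature programs: each maps (grid, row, col) -> bool.
# The paper draws these from one shared domain-specific language; here we
# hard-code a few grid predicates purely for illustration.
FEATURE_PROGRAMS = {
    "cell_is_empty":  lambda grid, r, c: grid[r][c] == 0,
    "cell_is_wall":   lambda grid, r, c: grid[r][c] == 1,
    "left_is_wall":   lambda grid, r, c: c > 0 and grid[r][c - 1] == 1,
    "above_is_empty": lambda grid, r, c: r > 0 and grid[r - 1][c] == 0,
}

def policy_log_prior(clause):
    """Grammar-style prior: longer conjunctions are exponentially less likely."""
    return -len(clause) * math.log(len(FEATURE_PROGRAMS))

def clause_matches(clause, grid, r, c):
    """A clause is a conjunction of (feature_name, truth_value) literals."""
    return all(FEATURE_PROGRAMS[f](grid, r, c) == v for f, v in clause)

def log_likelihood(clause, demos):
    """Demos are (grid, demonstrated_cell) pairs; the demonstrated cell must be
    among the cells the policy selects, chosen uniformly at random among them."""
    total = 0.0
    for grid, (dr, dc) in demos:
        chosen = [(r, c)
                  for r in range(len(grid))
                  for c in range(len(grid[0]))
                  if clause_matches(clause, grid, r, c)]
        if (dr, dc) not in chosen:
            return -math.inf  # policy is inconsistent with this demonstration
        total += -math.log(len(chosen))
    return total

def infer_policy(demos, max_literals=2):
    """Enumerate small conjunctions of DSL literals and return the MAP policy."""
    literals = [(f, v) for f in FEATURE_PROGRAMS for v in (True, False)]
    best, best_score = None, -math.inf
    for k in range(1, max_literals + 1):
        for clause in itertools.combinations(literals, k):
            score = policy_log_prior(clause) + log_likelihood(clause, demos)
            if score > best_score:
                best, best_score = clause, score
    return best
```

In this toy version the posterior is searched by brute-force enumeration; the paper's contribution is precisely to make this kind of inference tractable for a much richer policy class and DSL.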