Deep Bayesian Nonparametric Learning of Rules and Plans from Demonstrations with a Learned Automaton Prior

We introduce a method to learn interpretable and manipulable imitative policies from expert demonstrations. We achieve interpretability by modeling the interactions between high-level actions as an automaton with connections to formal logic. We achieve manipulability by integrating this automaton into planning, so that changes to the automaton have predictable effects on the learned behavior. These qualities allow a human user to first understand what the model has learned, and then either correct the learned behavior or generalize zero-shot to new, similar tasks. Unlike previous work, our method does not require additional supervised information, which is hard to collect in practice. We achieve this by using a deep Bayesian nonparametric hierarchical model. We test our model on several domains and also show results for a real-world implementation on a mobile robotic arm platform.
