Learning to Plan with Logical Automata

This paper introduces the Logic-based Value Iteration Network (LVIN) framework, which combines imitation learning with logical automata to enable agents to learn complex behaviors from demonstrations. We address two problems in learning from expert knowledge: (1) how to generalize policies learned for one task to a larger class of tasks, and (2) how to account for erroneous demonstrations. Our LVIN model solves finite gridworld environments by instantiating a recurrent, convolutional neural network as a value iteration procedure over a learned Markov Decision Process (MDP) that factors into two MDPs: a small finite state automaton (FSA) corresponding to logical rules, and a larger MDP corresponding to motions in the environment. The parameters of LVIN (value function, reward map, FSA transitions, large MDP transitions) are approximately learned from expert trajectories. Because the learned rules are represented as an FSA, the model is interpretable; because the FSA is integrated into planning, the agent's behavior can be manipulated by modifying the FSA transitions. We demonstrate these abilities in several domains of interest, including a lunchbox-packing manipulation task and a driving domain.
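To make the factored planning procedure concrete, the sketch below runs tabular value iteration on the product of FSA states and grid cells: the agent's state is a pair (f, s), motion updates s, and the FSA transition function updates f based on the proposition labeling the cell entered. This is a minimal illustration under simplifying assumptions (deterministic motion, a hand-specified labeling function and FSA transition table; all names here are illustrative, not from the paper). LVIN instead learns these quantities from demonstrations and realizes the iteration as a recurrent convolutional network, as in Value Iteration Networks.

```python
import numpy as np

GAMMA = 0.95
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def product_value_iteration(reward, fsa_next, labels, n_fsa, iters=100):
    """Value iteration over the product MDP (FSA state x grid cell).

    reward:   (n_fsa, H, W) array, reward for being in cell (i, j)
              while in FSA state f (in LVIN this is learned)
    fsa_next: fsa_next[f][p] -> successor FSA state after reading
              proposition p (in LVIN this is learned)
    labels:   (H, W) integer array, proposition index of each cell
    """
    H, W = reward.shape[1], reward.shape[2]
    V = np.zeros((n_fsa, H, W))
    for _ in range(iters):
        Q = np.empty((n_fsa, H, W, len(ACTIONS)))
        for f in range(n_fsa):
            for i in range(H):
                for j in range(W):
                    for a, (di, dj) in enumerate(ACTIONS):
                        ni = min(max(i + di, 0), H - 1)  # clamp at boundary
                        nj = min(max(j + dj, 0), W - 1)
                        nf = fsa_next[f][labels[ni, nj]]  # logical transition
                        Q[f, i, j, a] = reward[f, i, j] + GAMMA * V[nf, ni, nj]
        V = Q.max(axis=-1)  # Bellman backup over actions
    return V, Q
```

A greedy policy simply takes the argmax of Q at the current (f, s) pair; because the FSA state is part of the planning state, the same reward map yields different motions depending on which logical subgoals have already been satisfied. In LVIN the max and expectation steps are implemented with convolution and max-pooling so the whole procedure is differentiable and trainable end to end.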
