Combining neural networks and tree search for task and motion planning in challenging environments

Task and motion planning subject to Linear Temporal Logic (LTL) specifications in complex, dynamic environments requires efficient exploration of many possible future worlds. Model-free reinforcement learning has proven successful in a number of challenging tasks, but shows poor performance on tasks that require long-term planning. In this work, we integrate Monte Carlo Tree Search with hierarchical neural net policies trained on expressive LTL specifications. We use reinforcement learning to find deep neural networks representing both low-level control policies and task-level “option policies” that achieve high-level goals. Our combined architecture generates safe and responsive motion plans that respect the LTL constraints. We demonstrate our approach in a simulated autonomous driving setting, where a vehicle must drive down a road in traffic, avoid collisions, and navigate an intersection, all while obeying rules of the road.

[1]  Richard Bellman,et al.  Introduction to the mathematical theory of control processes , 1967 .

[2]  Viktor Schuppan,et al.  Linear Encodings of Bounded LTL Model Checking , 2006, Log. Methods Comput. Sci..

[3]  Michèle Sebag,et al.  Continuous Rapid Action Value Estimates , 2011, ACML 2011.

[4]  Jan Peters,et al.  Reinforcement learning in robotics: A survey , 2013, Int. J. Robotics Res..

[5]  Hadas Kress-Gazit,et al.  Sorry Dave, I'm Afraid I Can't Do That: Explaining Unachievable Robot Tasks Using Natural Language , 2013, Robotics: Science and Systems.

[6]  Alex Graves,et al.  Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.

[7]  Jan Peters,et al.  Reinforcement learning in robotics: A survey , 2013, Int. J. Robotics Res..

[8]  Calin Belta,et al.  Optimal Control of Markov Decision Processes With Linear Temporal Logic Constraints , 2014, IEEE Transactions on Automatic Control.

[9]  Satyandra K. Gupta,et al.  Towards Integrating Hierarchical Goal Networks and Motion Planners to Support Planning for Human-Robot Teams , 2014 .

[10]  Honglak Lee,et al.  Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning , 2014, NIPS.

[11]  Marc Toussaint,et al.  Logic-Geometric Programming: An Optimization-Based Approach to Combined Task and Motion Planning , 2015, IJCAI.

[12]  Peter Kulchyski and , 2015 .

[13]  Yuval Tassa,et al.  Continuous control with deep reinforcement learning , 2015, ICLR.

[14]  Xin Zhang,et al.  End to End Learning for Self-Driving Cars , 2016, ArXiv.

[15]  Michael H. Bowling,et al.  Monte Carlo Tree Search in Continuous Action Spaces with Execution Uncertainty , 2016, IJCAI.

[16]  Alex Graves,et al.  Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.

[17]  Amnon Shashua,et al.  Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving , 2016, ArXiv.

[18]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[19]  Erion Plaku,et al.  Motion planning with temporal-logic specifications: Progress and challenges , 2015, AI Commun..

[20]  Sergey Levine,et al.  Continuous Deep Q-Learning with Model-based Acceleration , 2016, ICML.

[21]  Dan Klein,et al.  Modular Multitask Reinforcement Learning with Policy Sketches , 2016, ICML.

[22]  Yang Gao,et al.  End-to-End Learning of Driving Models from Large-Scale Video Datasets , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Etienne Perot,et al.  Deep Reinforcement Learning framework for Autonomous Driving , 2017, Autonomous Vehicles and Machines.