Learning Heuristic Search via Imitation

Robotic motion planning problems are typically solved by constructing a search tree of valid maneuvers from a start to a goal configuration. Limited onboard computation and real-time planning constraints cap how large this search tree can grow. Heuristics play a crucial role in such settings by guiding the search toward promising directions and thereby minimizing search effort. Moreover, a heuristic must infer such directions efficiently, using only the information uncovered by the search so far. However, state-of-the-art methods do not address the problem of computing a heuristic that explicitly minimizes search effort. In this paper, we do so by training a heuristic policy that maps the partial information available to the search to a decision about which node of the search tree to expand. Unfortunately, naively training such policies leads to slow convergence and poor local minima. We present SaIL, an efficient algorithm that trains heuristic policies by imitating "clairvoyant oracles": oracles that have full information about the world and demonstrate decisions that minimize search effort. We leverage the fact that such oracles can be computed efficiently using dynamic programming, and we derive performance guarantees for the learned heuristic. We validate the approach on a spectrum of environments and show that SaIL consistently outperforms state-of-the-art algorithms. Our approach paves the way for learning heuristics with an anytime nature: finding feasible solutions quickly and incrementally refining them over time.
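As a concrete illustration of the training scheme the abstract describes, below is a minimal Python sketch. Everything in it is an illustrative assumption rather than the authors' implementation: the world is a 2D occupancy grid, the "clairvoyant oracle" is a backward Dijkstra (dynamic programming) on the fully known map, the partial-information features and the 0.8 annealing rate are placeholders, and an off-the-shelf regressor stands in for the heuristic policy. The loop is DAgger-style: roll out a mixture of oracle and learner to pick which open-list node to expand, aggregate (feature, oracle cost-to-go) pairs, and refit.

```python
# A minimal sketch of SaIL-style training on a 2D grid world.
# All names and parameters here are illustrative assumptions,
# not the paper's implementation.
import heapq
import random
import numpy as np
from sklearn.linear_model import SGDRegressor

def oracle_cost_to_go(grid, goal):
    """Backward Dijkstra on the fully known grid (the clairvoyant oracle).
    Returns a dict mapping each reachable free cell to its true cost-to-go."""
    dist = {goal: 0.0}
    pq = [(0.0, goal)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if d > dist.get((r, c), float('inf')):
            continue
        for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            nb = (r + dr, c + dc)
            if (0 <= nb[0] < grid.shape[0] and 0 <= nb[1] < grid.shape[1]
                    and grid[nb] == 0 and d + 1 < dist.get(nb, float('inf'))):
                dist[nb] = d + 1
                heapq.heappush(pq, (d + 1, nb))
    return dist

def node_features(node, goal, closed):
    """Features built from *partial* search information only (no map access)."""
    return np.array([abs(node[0] - goal[0]) + abs(node[1] - goal[1]),  # Manhattan h
                     float(len(closed))])                              # effort so far

def run_search_and_collect(grid, start, goal, policy, beta, max_expansions=500):
    """Best-first search whose expansion order is chosen by a mixture of the
    oracle (with prob. beta) and the learned policy; returns training pairs."""
    oracle = oracle_cost_to_go(grid, goal)
    open_list, closed, data = [start], set(), []
    for _ in range(max_expansions):
        if not open_list:
            break
        # Label every frontier node with the oracle's true cost-to-go.
        feats = [node_features(n, goal, closed) for n in open_list]
        labels = [oracle.get(n, 1e6) for n in open_list]
        data.extend(zip(feats, labels))
        # The mixture policy picks which node to expand next.
        if policy is None or random.random() < beta:
            idx = int(np.argmin(labels))                    # oracle's choice
        else:
            idx = int(np.argmin(policy.predict(np.stack(feats))))
        node = open_list.pop(idx)
        if node == goal:
            break
        closed.add(node)
        for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            nb = (node[0] + dr, node[1] + dc)
            if (0 <= nb[0] < grid.shape[0] and 0 <= nb[1] < grid.shape[1]
                    and grid[nb] == 0 and nb not in closed and nb not in open_list):
                open_list.append(nb)
    return data

def train_sail(worlds, iterations=10):
    """DAgger-style outer loop: aggregate data, refit, anneal oracle mixing."""
    policy, dataset = None, []
    for i in range(iterations):
        beta = 0.8 ** i                                     # decay toward the learner
        for grid, start, goal in worlds:
            dataset.extend(run_search_and_collect(grid, start, goal, policy, beta))
        X = np.stack([f for f, _ in dataset])
        y = np.array([l for _, l in dataset])
        policy = SGDRegressor(max_iter=1000).fit(X, y)
    return policy
```

At test time the learned policy would replace the oracle entirely (beta = 0), ranking open-list nodes using only the information the search has uncovered so far, which is what lets it generalize to worlds whose maps are not known in advance.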
