Synthesis of Search Heuristics for Temporal Planning via Reinforcement Learning

Automated temporal planning is the problem of synthesizing, from a model of a system, a course of actions that achieves a desired goal when temporal constraints, such as deadlines, are present. Despite considerable successes in the literature, scalability remains a severe limitation of existing planners, especially when confronted with real-world, industrial scenarios. In this paper, we exploit recent advances in reinforcement learning to synthesize heuristics for temporal planning. Starting from a set of problems of interest for a specific domain, we use a customized reinforcement learning algorithm to construct a value function that estimates the expected reward for as many problems as possible. We use a reward schema that captures the semantics of the temporal planning problem, and we show how the value function can be transformed into a planning heuristic for a semi-symbolic heuristic search exploration of the planning model. We show on two case studies how this method can extend the reach of current temporal planning technology, with encouraging results.
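The idea of turning a learned value function into a search heuristic can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a simple, hypothetical reward schema (a reward of -1 per step and 0 at the goal, undiscounted), under which the value of a state approximates the negated number of steps to the goal, so that h(s) = -V(s). The names `heuristic_from_value` and `greedy_best_first`, the toy graph, and the tabular values are all illustrative assumptions, and the search is a plain greedy best-first search rather than the semi-symbolic exploration described above.

```python
import heapq
from typing import Callable, Dict, Hashable, Iterable, List, Optional

State = Hashable


def heuristic_from_value(value_fn: Callable[[State], float]) -> Callable[[State], float]:
    """Assumed reward schema: -1 per step, 0 at the goal, no discounting.
    Then V(s) is roughly -(steps to goal), so h(s) = -V(s), clamped at 0."""
    return lambda s: max(0.0, -value_fn(s))


def greedy_best_first(
    start: State,
    is_goal: Callable[[State], bool],
    successors: Callable[[State], Iterable[State]],
    h: Callable[[State], float],
) -> Optional[List[State]]:
    """Expand the state with the lowest heuristic value first."""
    counter = 0  # tie-breaker so the heap never compares states directly
    frontier = [(h(start), counter, start)]
    parent: Dict[State, Optional[State]] = {start: None}
    while frontier:
        _, _, state = heapq.heappop(frontier)
        if is_goal(state):
            # Rebuild the path by following parent links back to the start.
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                counter += 1
                heapq.heappush(frontier, (h(nxt), counter, nxt))
    return None


# Hypothetical toy state space and a hand-written "learned" value table.
GRAPH = {"a": ["b", "c"], "b": ["d"], "c": ["goal"], "d": [], "goal": []}
VALUES = {"a": -2.0, "b": -5.0, "c": -1.0, "d": -6.0, "goal": 0.0}

H = heuristic_from_value(VALUES.__getitem__)
PLAN = greedy_best_first("a", lambda s: s == "goal", GRAPH.__getitem__, H)
```

In this toy instance the search expands `c` before `b` because the value table rates `c` as closer to the goal. In practice the table would be replaced by a learned function approximator, and the mapping from value to heuristic depends on the reward schema actually used.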
