ASNets: Deep Learning for Generalised Planning

In this paper, we discuss the learning of generalised policies for probabilistic and classical planning problems using Action Schema Networks (ASNets). The ASNet is a neural network architecture that exploits the relational structure of (P)PDDL planning problems to learn a common set of weights that can be applied to any problem in a domain. By mimicking the actions chosen by a traditional, non-learning planner on a handful of small problems in a domain, ASNets can learn a generalised reactive policy that quickly solves much larger instances from the same domain. This work extends the ASNet architecture to make it more expressive while remaining invariant to a range of symmetries that exist in PPDDL problems. We also present a thorough experimental evaluation of ASNets, including a comparison with heuristic search planners on seven probabilistic and deterministic domains, an extended evaluation on over 18,000 Blocksworld instances, and an ablation study. Finally, we show that sparsity-inducing regularisation can produce ASNets that are compact enough for humans to understand, yielding insights into how the structure of ASNets allows them to generalise across a domain.
