ASNets: Deep Learning for Generalised Planning

In this paper, we discuss the learning of generalised policies for probabilistic and classical planning problems using Action Schema Networks (ASNets). The ASNet is a neural network architecture that exploits the relational structure of (P)PDDL planning problems to learn a common set of weights that can be applied to any problem in a domain. By mimicking the actions chosen by a traditional, non-learning planner on a handful of small problems in a domain, ASNets are able to learn a generalised reactive policy that can quickly solve much larger instances from the domain. This work extends the ASNet architecture to make it more expressive, while still remaining invariant to a range of symmetries that exist in PPDDL problems. We also present a thorough experimental evaluation of ASNets, including a comparison with heuristic search planners on seven probabilistic and deterministic domains, an extended evaluation on over 18,000 Blocksworld instances, and an ablation study. Finally, we show that sparsity-inducing regularisation can produce ASNets that are compact enough for humans to understand, yielding insights into how the structure of ASNets allows them to generalise across a domain.
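The weight-sharing idea described above can be illustrated with a minimal sketch. It is a single-layer toy, not the architecture from the paper: the helper names (`make_schema_weights`, `policy`), the schema names, and the feature vectors are all hypothetical. It assumes that each action schema owns one set of weights, that every ground action of that schema reuses them, and that the policy is a softmax over ground actions.

```python
import math
import random


def make_schema_weights(schemas, in_dim, seed=0):
    """Create one (weights, bias) pair per action schema (hypothetical helper)."""
    rng = random.Random(seed)
    return {s: ([rng.uniform(-1.0, 1.0) for _ in range(in_dim)], 0.0) for s in schemas}


def policy(ground_actions, features, weights):
    """Return a softmax distribution over ground actions.

    ground_actions: list of (action_name, schema_name) pairs.
    features: dict mapping action_name to a feature list of length in_dim.
    """
    scores = []
    for name, schema in ground_actions:
        w, b = weights[schema]  # shared by every ground action of this schema
        scores.append(sum(wi * xi for wi, xi in zip(w, features[name])) + b)
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {name: e / total for (name, _), e in zip(ground_actions, exps)}


# Assumed toy domain with two schemas; the parameter count depends only on the domain.
weights = make_schema_weights(["pickup", "putdown"], in_dim=2)

# A small and a larger problem from the same toy domain reuse the same weights.
small = [("pickup(a)", "pickup"), ("putdown(a)", "putdown")]
large = small + [("pickup(b)", "pickup"), ("pickup(c)", "pickup"), ("putdown(b)", "putdown")]
feats = {name: [1.0, 0.0] for name, _ in large}  # placeholder features

pi_small = policy(small, feats, weights)
pi_large = policy(large, feats, weights)
```

Because the parameters are indexed by schema rather than by ground action, the same `weights` dictionary yields a policy for both the two-action and the five-action problem. In this sketch that is the sense in which one set of weights applies to any problem in a domain.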

Lexing Xie | Sylvie Thiébaux | Felipe W. Trevizan | Sam Toyer
