Neural Network Heuristics for Classical Planning: A Study of Hyperparameter Space

Neural networks (NNs) have been shown to be powerful state-value predictors in several complex games. Can similar successes be achieved in classical planning? Towards a systematic exploration of that question, we contribute a study of hyperparameter space in the most canonical setup: input = state, feed-forward NN, supervised learning, generalization only over the initial state. We investigate a broad range of hyperparameters pertaining to NN design and training, and we evaluate the resulting networks by using them as heuristic functions in Fast Downward. Results on IPC benchmarks show that highly competitive heuristics can be learned, yielding substantially smaller search spaces than standard techniques in some domains. However, the learned heuristic functions are costly to evaluate, and the range of domains where useful heuristics are learned is limited. Our study provides a basis for further research addressing these weaknesses.
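To make the canonical setup concrete (state as input, feed-forward network, supervised regression, learned network used as a heuristic), here is a minimal pure-Python sketch. It is not the study's implementation and does not use Fast Downward's interface. Assumptions: a state is a frozenset of true facts from a fixed fact list, training labels are goal distances assumed to be obtained elsewhere (for example by solving from sampled states), the network has a single tanh hidden layer (chosen here for illustration), and greedy best-first search is used as the example search algorithm. The names `encode`, `FeedForwardHeuristic`, `greedy_best_first_search`, and the toy robot task are hypothetical.

```python
import heapq
import math
import random
from itertools import count


def encode(state, facts):
    """Binary input vector: 1.0 for each fact of the (hypothetical) fact list true in the state."""
    return [1.0 if f in state else 0.0 for f in facts]


class FeedForwardHeuristic:
    """Hypothetical net: one tanh hidden layer, one linear output estimating goal distance."""

    def __init__(self, n_in, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return hidden, sum(w * h for w, h in zip(self.w2, hidden)) + self.b2

    def predict(self, x):
        return self._forward(x)[1]

    def mse(self, data):
        return sum((self.predict(x) - y) ** 2 for x, y in data) / len(data)

    def fit(self, data, epochs=1000, lr=0.05):
        """Supervised regression: plain SGD on the loss 0.5 * (prediction - label) ** 2."""
        for _ in range(epochs):
            for x, y in data:
                hidden, out = self._forward(x)
                err = out - y  # gradient of the loss w.r.t. the network output
                for j, h in enumerate(hidden):
                    g = err * self.w2[j] * (1.0 - h * h)  # backprop through tanh (uses old w2[j])
                    self.w1[j] = [w - lr * g * xi for w, xi in zip(self.w1[j], x)]
                    self.b1[j] -= lr * g
                    self.w2[j] -= lr * err * h
                self.b2 -= lr * err


def greedy_best_first_search(init, goal, successors, h):
    """Always expand the open state with the lowest h value; return a plan or None."""
    tie = count()  # tie-breaker so heapq never has to compare states
    open_list = [(h(init), next(tie), init, [])]
    closed = set()
    while open_list:
        _, _, state, plan = heapq.heappop(open_list)
        if goal <= state:
            return plan
        if state in closed:
            continue
        closed.add(state)
        for action, succ in successors(state):
            if succ not in closed:
                heapq.heappush(open_list, (h(succ), next(tie), succ, plan + [action]))
    return None


# Hypothetical toy task: a robot on positions 0..4 that must reach position 4.
FACTS = [f"at-{i}" for i in range(5)]
GOAL = frozenset({"at-4"})


def successors(state):
    (fact,) = state
    i = int(fact.split("-")[1])
    return [(f"move-{i}-{k}", frozenset({f"at-{k}"})) for k in (i - 1, i + 1) if 0 <= k < 5]


# Training pairs (encoded state, goal distance); here the labels are known by construction.
data = [(encode(frozenset({f"at-{i}"}), FACTS), float(4 - i)) for i in range(5)]
net = FeedForwardHeuristic(len(FACTS))
initial_mse = net.mse(data)
net.fit(data)
final_mse = net.mse(data)
plan = greedy_best_first_search(frozenset({"at-0"}), GOAL, successors,
                                lambda s: net.predict(encode(s, FACTS)))
```

Every node expansion calls the network once, which in this sketch costs a full forward pass over all hidden units. That illustrates why evaluation cost, and not only heuristic accuracy, matters when a learned network is used as a heuristic function.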
