A* Search Without Expansions: Learning Heuristic Functions with Deep Q-Networks

A* search is an informed search algorithm that uses a heuristic function to guide the order in which nodes are expanded. Because the computation required to expand a node and to compute the heuristic values of all of its generated children grows linearly with the size of the action space, A* search can become impractical for problems with large action spaces. This burden becomes even more pronounced when the heuristic function is a general, but computationally expensive, deep neural network. To address this problem, we introduce DeepCubeAQ, a deep reinforcement learning and search algorithm that builds on the DeepCubeA algorithm and deep Q-networks. DeepCubeAQ learns a heuristic function that, with a single forward pass through a deep neural network, computes, for every child of a node, the sum of the transition cost and the child's heuristic value, without explicitly generating any of the children, thereby eliminating the need for node expansions. DeepCubeAQ then employs a novel variant of A* search, called AQ* search, that relies on the deep Q-network to guide the search. We use DeepCubeAQ to solve the Rubik's cube when it is formulated with a large action space containing 1872 meta-actions. This 157-fold increase in the size of the action space incurs less than a 4-fold increase in computation time when performing AQ* search, and AQ* search is orders of magnitude faster than A* search.
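
To make the mechanism concrete, below is a minimal sketch of the AQ* idea in Python. It assumes a hypothetical `qnet` callable that returns, in a single forward pass, a vector whose a-th entry estimates cost(s, a) plus the heuristic value of the child of s under action a, and a hypothetical environment object with `is_goal(state)` and `next_state(state, action)` methods. These names are illustrative assumptions, not the paper's actual API; states are assumed hashable.

```python
import heapq
import itertools

def aq_star(start, qnet, env, num_actions):
    """Best-first search that enqueues children without generating them.

    Classic A* expands each popped node: it generates every child and runs
    the heuristic network once per child. Here, one qnet call on the popped
    state supplies estimated values for all children at once; a child state
    is only materialized when it is itself popped from the queue.
    """
    if env.is_goal(start):
        return [start]
    tie = itertools.count()            # break priority ties deterministically
    best_g = {start: 0.0}              # cheapest known cost-to-reach per state
    parent_of = {start: None}          # (parent, action) links for path recovery
    open_heap = []
    q_values = qnet(start)             # q_values[a] ~ cost(start, a) + h(child_a)
    for a in range(num_actions):
        heapq.heappush(open_heap, (q_values[a], next(tie), 0.0, start, a))

    while open_heap:
        _, _, g_parent, parent, a = heapq.heappop(open_heap)
        child, cost = env.next_state(parent, a)   # child generated only now
        g = g_parent + cost
        if child in best_g and best_g[child] <= g:
            continue                              # already reached as cheaply
        best_g[child] = g
        parent_of[child] = (parent, a)
        if env.is_goal(child):
            path = [child]                        # walk parent links back to start
            while parent_of[path[-1]] is not None:
                path.append(parent_of[path[-1]][0])
            return list(reversed(path))
        q_values = qnet(child)                    # one forward pass per pop
        for a2 in range(num_actions):
            heapq.heappush(open_heap, (g + q_values[a2], next(tie), g, child, a2))
    return None
```

Under this sketch, the per-node cost of the heuristic network is a single forward pass regardless of the number of actions; only the comparatively cheap priority-queue operations scale with the action space. This is consistent with the abstract's observation that a 157-fold larger action space costs less than a 4-fold increase in search time.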
