Subgoal Search For Complex Reasoning Tasks

Humans excel at solving complex reasoning tasks through a mental process of moving from one idea to a related one. Inspired by this, we propose the Subgoal Search (kSubS) method. Its key component is a learned subgoal generator that produces a diverse set of subgoals that are both achievable and closer to the solution. Using subgoals reduces the search space and induces a high-level search graph suitable for efficient planning. In this paper, we implement kSubS using a transformer-based subgoal module coupled with the classical best-first search framework. We show that the simple approach of generating subgoals k steps ahead is surprisingly efficient on three challenging domains: two popular puzzle games, Sokoban and the Rubik's Cube, and the inequality-proving benchmark INT. kSubS achieves strong results, including state-of-the-art on INT, within a modest computational budget.
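To make the combination of a learned subgoal generator with best-first search concrete, the sketch below shows a generic high-level search loop in Python. It is a minimal illustration under stated assumptions, not the authors' implementation: the interfaces `generate_subgoals` (standing in for the transformer-based subgoal module producing k-step-ahead candidates), `value_estimate` (a learned value function), and `reach` (a low-level policy that verifies a subgoal is achievable) are all hypothetical.

```python
import heapq
import itertools


def subgoal_best_first_search(start, is_solved, generate_subgoals,
                              value_estimate, reach, max_nodes=10_000):
    """Best-first search over a high-level graph whose edges are subgoals.

    Assumed (hypothetical) interfaces:
      generate_subgoals(state) -> iterable of candidate subgoal states,
          e.g. sampled from a model predicting states k steps ahead.
      value_estimate(state)    -> float; higher means closer to a solution.
      reach(state, subgoal)    -> list of low-level actions, or None if the
          subgoal turns out to be unreachable from `state`.
    States are assumed to be hashable.
    """
    counter = itertools.count()  # tie-breaker so states are never compared
    frontier = [(-value_estimate(start), next(counter), start, [])]
    visited = {start}
    expanded = 0

    while frontier and expanded < max_nodes:
        _, _, state, plan = heapq.heappop(frontier)
        expanded += 1
        if is_solved(state):
            return plan  # concatenated low-level actions reaching the goal

        for subgoal in generate_subgoals(state):
            if subgoal in visited:
                continue
            actions = reach(state, subgoal)  # verify the subgoal is achievable
            if actions is None:
                continue
            visited.add(subgoal)
            heapq.heappush(
                frontier,
                (-value_estimate(subgoal), next(counter),
                 subgoal, plan + actions),
            )
    return None  # search budget exhausted without finding a solution
```

Because the search expands subgoals rather than single environment steps, the branching factor and search depth are both reduced, which is what makes the high-level graph cheap to plan over.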
