Classical Planning in Deep Latent Space: Bridging the Subsymbolic-Symbolic Boundary

Current domain-independent classical planners require symbolic models of the problem domain and instance as input, which creates a knowledge acquisition bottleneck. Meanwhile, although deep learning has achieved significant success in many fields, its knowledge is encoded in a subsymbolic representation that is incompatible with symbolic systems such as planners. We propose LatPlan, an unsupervised architecture combining deep learning and classical planning. Given only an unlabeled set of image pairs showing a subset of the transitions allowed in the environment (training inputs) and a pair of images representing the initial and goal states (planning inputs), LatPlan finds a plan to the goal state in a symbolic latent space and returns a visualized plan execution. The contribution of this paper is twofold: (1) the State Autoencoder, which finds a propositional state representation of the environment using a Variational Autoencoder. It generates a discrete latent vector from the images, from which a PDDL model can be constructed and then solved by an off-the-shelf planner. (2) The Action Autoencoder / Discriminator, a neural architecture that jointly finds the action symbols and the implicit action models (preconditions/effects) and provides a successor function for the implicit graph search. We evaluate LatPlan using image-based versions of three planning domains: 8-puzzle, Towers of Hanoi, and LightsOut.
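To make the idea of planning in a propositional latent space concrete, the following is a minimal sketch, not LatPlan's implementation. It assumes the images have already been encoded into bit tuples by some state encoder (not shown), and it replaces the PDDL planner and the learned successor function with a plain breadth-first search over the observed transitions only. The function name `plan_in_latent_space` and the toy 3-bit states are hypothetical.

```python
from collections import deque


def plan_in_latent_space(transitions, init, goal):
    """Breadth-first search over observed latent transitions.

    transitions: iterable of (before, after) pairs of bit tuples
                 (assumed to come from an image encoder, not shown here).
    Returns the list of latent states from init to goal, or None.
    """
    # Build a successor table from the observed transition pairs.
    successors = {}
    for before, after in transitions:
        successors.setdefault(before, []).append(after)

    parent = {init: None}  # also serves as the visited set
    queue = deque([init])
    while queue:
        state = queue.popleft()
        if state == goal:
            # Walk parent links back to the initial state.
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors.get(state, []):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


# Toy example: 3-bit latent states standing in for encoded images.
transitions = [
    ((0, 0, 1), (0, 1, 1)),
    ((0, 1, 1), (1, 1, 1)),
    ((0, 0, 1), (1, 0, 1)),
    ((1, 0, 1), (1, 0, 0)),
]
plan = plan_in_latent_space(transitions, (0, 0, 1), (1, 1, 1))
print(plan)
```

In the architecture described above, each latent state in the returned sequence would then be decoded back into an image to produce the visualized plan execution. This sketch can only reuse transitions it has seen, whereas the abstract describes learned action models that provide a successor function for the search.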
