Compact and efficient encodings for planning in factored state and action spaces with learned Binarized Neural Network transition models

In this paper, we leverage the efficiency of Binarized Neural Networks (BNNs) to learn complex state transition models of planning domains with discretized factored state and action spaces. To exploit this transition structure directly for planning, we present two novel compilations of the learned factored planning problem with BNNs, based on reductions to Weighted Partial Maximum Boolean Satisfiability (FD-SAT-Plan+) and to Binary Linear Programming (FD-BLP-Plan+). Theoretically, we show that our SAT-based Bi-Directional Neuron Activation Encoding is asymptotically the most compact encoding relative to the current literature and that it supports Unit Propagation (UP), an important property that facilitates efficiency in SAT solvers. Experimentally, we validate the computational efficiency of our Bi-Directional Neuron Activation Encoding in comparison to an existing neuron activation encoding, and we demonstrate that BNNs can learn complex transition models. We test the runtime efficiency of both FD-SAT-Plan+ and FD-BLP-Plan+ on the learned factored planning problem, showing that FD-SAT-Plan+ scales better with increasing BNN size and complexity. Finally, we present a finite-time incremental constraint generation algorithm based on generalized landmark constraints to improve the planning accuracy of our encodings through simulated or real-world interaction.
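To make the UP property concrete, here is a minimal Python sketch under stated assumptions; it is not the paper's Bi-Directional Neuron Activation Encoding. It assumes a neuron with inputs and weights in {-1, +1}, followed by batch normalization with a positive scale gamma and a sign activation. Under that model the neuron fires exactly when at least p inputs agree with their weights, so its output y can be encoded as y <-> (sum of literals >= p). The sketch encodes this with a standard sequential counter (Sinz, 2005) whose auxiliary definitions are written in both directions, then checks that unit propagation alone fixes y once every input is assigned. The functions `activation_threshold`, `encode_neuron`, and `unit_propagate`, and the parameter names `gamma`, `beta`, `mu`, `sigma`, are hypothetical illustrations.

```python
import itertools
import math


def activation_threshold(n, gamma, beta, mu, sigma):
    """Assumed neuron: w_i, x_i in {-1, +1}, s = sum(w_i * x_i), and the
    neuron fires iff gamma * (s - mu) / sigma + beta >= 0 (batch norm, then sign).
    Returns p such that it fires iff at least p inputs agree with their weights."""
    assert gamma > 0 and sigma > 0, "sketch only handles a positive BN scale"
    tau = mu - beta * sigma / gamma  # fires iff s >= tau
    return math.ceil((n + tau) / 2)  # a agreements give s = 2a - n


def encode_neuron(x_vars, weights, y, p, next_var):
    """Clauses (lists of signed ints) for y <-> (at least p of the l_i are true),
    with l_i = x_i if w_i > 0 else NOT x_i. Variable 1 is reserved as constant TRUE.
    Returns (clauses, next unused variable)."""
    TRUE = 1
    lits = [x if w > 0 else -x for x, w in zip(x_vars, weights)]
    n = len(lits)
    clauses = [[TRUE]]
    if p <= 0:
        return clauses + [[y]], next_var   # always fires
    if p > n:
        return clauses + [[-y]], next_var  # never fires
    # Sequential counter: r[i][j] <-> "at least j of l_1..l_i are true".
    # Row 0 and column 0 are the constants FALSE and TRUE.
    r = [[TRUE] + [-TRUE] * p]
    for i in range(1, n + 1):
        r.append([TRUE] + list(range(next_var, next_var + p)))
        next_var += p
    for i in range(1, n + 1):
        li = lits[i - 1]
        for j in range(1, p + 1):
            cur, a, b = r[i][j], r[i - 1][j], r[i - 1][j - 1]
            # cur <-> a OR (b AND li), written in both directions
            clauses += [[-a, cur], [-b, -li, cur], [-cur, a, b], [-cur, a, li]]
    clauses += [[-y, r[n][p]], [y, -r[n][p]]]  # y <-> r[n][p]
    return clauses, next_var


def unit_propagate(clauses, assignment):
    """Plain unit propagation to a fixed point; returns None on a conflict."""
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = []
            for lit in clause:
                val = assignment.get(abs(lit))
                if val is None:
                    unassigned.append(lit)
                elif val == (lit > 0):
                    break  # clause already satisfied
            else:
                if not unassigned:
                    return None  # every literal is false
                if len(unassigned) == 1:
                    assignment[abs(unassigned[0])] = unassigned[0] > 0
                    changed = True
    return assignment


weights = [1, -1, 1, 1]
gamma, beta, mu, sigma = 1.0, 0.5, 1.0, 1.0
n = len(weights)
p = activation_threshold(n, gamma, beta, mu, sigma)  # p == 3
x_vars = list(range(2, n + 2))  # variable 1 is the constant TRUE
y = n + 2
clauses, _ = encode_neuron(x_vars, weights, y, p, next_var=n + 3)

for xs in itertools.product([-1, 1], repeat=n):
    s = sum(w * x for w, x in zip(weights, xs))
    fires = gamma * (s - mu) / sigma + beta >= 0
    result = unit_propagate(clauses, {v: x > 0 for v, x in zip(x_vars, xs)})
    assert result is not None and result[y] == fires  # UP alone fixes y
print("UP determined the output for all", 2 ** n, "inputs")
```

The counter introduces n·p auxiliary variables and about 4·n·p clauses. Because each counter variable is defined by an equivalence rather than a one-way implication, fixing all inputs lets UP derive every counter value in turn and hence y, with no search.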
