Paper information - Compact and efficient encodings for planning in factored state and action spaces with learned Binarized Neural Network transition models

Compact and efficient encodings for planning in factored state and action spaces with learned Binarized Neural Network transition models

In this paper, we leverage the efficiency of Binarized Neural Networks (BNNs) to learn complex state transition models of planning domains with discretized factored state and action spaces. To exploit this transition structure directly for planning, we present two novel compilations of the learned factored planning problem with BNNs, based on reductions to Weighted Partial Maximum Boolean Satisfiability (FD-SAT-Plan+) and to Binary Linear Programming (FD-BLP-Plan+). Theoretically, we show that our SAT-based Bi-Directional Neuron Activation Encoding is asymptotically the most compact encoding relative to the current literature and supports Unit Propagation (UP) -- an important property that facilitates efficiency in SAT solvers. Experimentally, we validate the computational efficiency of our Bi-Directional Neuron Activation Encoding in comparison to an existing neuron activation encoding and demonstrate the ability to learn complex transition models with BNNs. We test the runtime efficiency of both FD-SAT-Plan+ and FD-BLP-Plan+ on the learned factored planning problem, showing that FD-SAT-Plan+ scales better with increasing BNN size and complexity. Finally, we present a finite-time incremental constraint generation algorithm based on generalized landmark constraints to improve the planning accuracy of our encodings through simulated or real-world interaction.
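As a rough illustration of why a binarized neuron lends itself to SAT and BLP compilations, the sketch below checks that a neuron with inputs and weights in {-1, +1} fires exactly when enough inputs agree with their weights, which is a Boolean cardinality constraint. This is not the paper's Bi-Directional Neuron Activation Encoding. The function names, the integer bias, and the convention that a pre-activation of zero maps to +1 are all assumptions made for the example.

```python
from itertools import product


def bnn_neuron(inputs, weights, bias=0):
    """Binarized neuron (illustrative): inputs and weights are in {-1, +1}.

    Assumed convention: output +1 when the pre-activation is >= 0, else -1.
    """
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total >= 0 else -1


def neuron_as_cardinality(inputs, weights, bias=0):
    """The same neuron written as a counting (cardinality) condition.

    Let a = number of positions where input and weight agree. Then
    sum(w * x) = a - (n - a) = 2a - n, so the neuron fires iff
    2a - n + bias >= 0, i.e. a >= ceil((n - bias) / 2).
    """
    n = len(weights)
    agreements = sum(1 for w, x in zip(weights, inputs) if w == x)
    threshold = -((bias - n) // 2)  # integer ceil((n - bias) / 2)
    return 1 if agreements >= threshold else -1


if __name__ == "__main__":
    weights = (1, -1, 1, 1, -1)  # hypothetical learned weights
    for bias in range(-3, 4):
        for inputs in product((-1, 1), repeat=len(weights)):
            assert bnn_neuron(inputs, weights, bias) == neuron_as_cardinality(inputs, weights, bias)
    print("sign activation and cardinality form agree on all inputs")
```

A condition of the form "at least k of these literals are true" is what a SAT or pseudo-Boolean encoding would then translate into clauses or linear constraints; how compactly that translation is done is where encodings differ.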

Scott Sanner | Buser Say
