Nonlinear Hybrid Planning with Deep Net Learned Transition Models and Mixed-Integer Linear Programming

In many real-world hybrid (mixed discrete-continuous) planning problems such as Reservoir Control, Heating, Ventilation and Air Conditioning (HVAC), and Navigation, it is difficult to obtain a model of the complex nonlinear dynamics that govern state evolution. However, the ubiquity of modern sensors allows us to collect large quantities of data from each of these complex systems and build accurate, nonlinear deep network models of their state transitions. But one major problem remains for the task of control: how can we plan with deep-network-learned transition models without resorting to Monte Carlo Tree Search and other black-box techniques that ignore model structure and do not easily extend to mixed discrete and continuous domains? In this paper, we make the critical observation that the popular Rectified Linear Unit (ReLU) transfer function for deep networks not only allows accurate nonlinear deep network model learning, but also permits a direct compilation of the deep network transition model to a Mixed-Integer Linear Program (MILP) encoding in a planner we call Hybrid Deep MILP Planning (HD-MILP-PLAN). We identify deep-network-specific optimizations and a simple sparsification method for HD-MILP-PLAN that improve performance over a naïve encoding, and show that we are able to plan optimally with respect to the learned deep network.
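
For concreteness, the key compilation step can be illustrated with the standard big-M MILP encoding of a single ReLU unit z = max(0, x). The sketch below is not taken from the paper's exact encoding; it is a minimal illustration of the general technique, assuming an a-priori bound M on the pre-activation value and using the PuLP modeling library with illustrative variable names.

```python
# Minimal sketch (assumption: not the paper's exact HD-MILP-PLAN encoding) of the
# standard big-M MILP encoding of one ReLU unit z = max(0, x), with |x| <= M.
from pulp import LpProblem, LpVariable, LpMinimize, LpBinary

M = 100.0  # assumed known bound on the pre-activation value
prob = LpProblem("relu_unit", LpMinimize)

x = LpVariable("x", lowBound=-M, upBound=M)  # pre-activation (e.g., one unit of W*s + b)
z = LpVariable("z", lowBound=0, upBound=M)   # post-activation output
b = LpVariable("b", cat=LpBinary)            # indicator: 1 if the unit is active (x >= 0)

# Big-M constraints enforcing z = max(0, x):
prob += z >= x                # z is never below x
prob += z <= x + M * (1 - b)  # if b = 1, forces z <= x, so z = x
prob += z <= M * b            # if b = 0, forces z <= 0, so z = 0

# Placeholder objective; a planner would instead optimize a reward/cost over states and actions.
prob += z
prob.solve()
```

Stacking one such block per hidden unit and per planning step, together with linear equalities for the affine layers and constraints tying network outputs to successor states, yields a MILP of the kind the abstract describes, which an off-the-shelf solver can then optimize directly.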
