Reinforcement Learning with Function-Valued Action Spaces for Partial Differential Equation Control

Recent work has shown that reinforcement learning (RL) is a promising approach for controlling dynamical systems described by partial differential equations (PDEs). This paper shows how to use RL to tackle more general PDE control problems that have continuous, high-dimensional action spaces with spatial relationships among the action dimensions. In particular, we propose the concept of action descriptors, which encode regularities among spatially extended action dimensions and enable the agent to control PDEs with high-dimensional actions. We provide theoretical evidence suggesting that this approach can be more sample-efficient than a conventional approach that treats each action dimension separately and does not explicitly exploit the spatial regularity of the action space. The action descriptor approach is then used within the deep deterministic policy gradient algorithm. Experiments on two PDE control problems, with up to 256-dimensional continuous actions, show the advantage of the proposed approach over the conventional one.
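To make the idea of action descriptors concrete, the following is a minimal sketch and not the paper's implementation. It assumes that each action dimension is identified by a descriptor (here, a hypothetical 1-D spatial coordinate in [0, 1]) and that a single small network maps a (state, descriptor) pair to one scalar action. All names (`init_params`, `descriptor_policy`, `num_params`), the layer sizes, and the toy state are assumptions for illustration only; training with deep deterministic policy gradient is not shown.

```python
import numpy as np

def init_params(state_dim, descriptor_dim, hidden=32, seed=0):
    """Randomly initialise a tiny two-layer network (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    in_dim = state_dim + descriptor_dim
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, 1)),
        "b2": np.zeros(1),
    }

def descriptor_policy(params, state, descriptors):
    """Return one bounded scalar action per descriptor.

    state:       array of shape (state_dim,)
    descriptors: array of shape (k, descriptor_dim), one row per action dimension
    """
    k = descriptors.shape[0]
    # The same state is paired with every descriptor, so the network
    # weights are shared across all action dimensions.
    tiled_state = np.broadcast_to(state, (k, state.shape[0]))
    x = np.concatenate([tiled_state, descriptors], axis=1)
    h = np.tanh(x @ params["W1"] + params["b1"])
    return np.tanh(h @ params["W2"] + params["b2"])[:, 0]

def num_params(params):
    """Total number of scalar parameters in the network."""
    return sum(p.size for p in params.values())

# Toy usage: the same parameters produce actions on a coarse and a fine grid.
state = np.linspace(-1.0, 1.0, 16)            # stand-in for a discretised PDE state
params = init_params(state_dim=16, descriptor_dim=1)
coarse = np.linspace(0.0, 1.0, 64)[:, None]   # 64 actuator locations
fine = np.linspace(0.0, 1.0, 256)[:, None]    # 256 actuator locations
actions_coarse = descriptor_policy(params, state, coarse)
actions_fine = descriptor_policy(params, state, fine)
```

In this sketch the parameter count does not grow with the number of action dimensions, whereas a conventional policy head with one output unit per action dimension would need a separate set of output weights for each. This is one informal way to read the sample-efficiency argument in the abstract.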
