Safe Chance Constrained Reinforcement Learning for Batch Process Control

Reinforcement Learning (RL) controllers have generated excitement within the control community. The primary advantage of RL controllers relative to existing methods is their ability to optimize uncertain systems without requiring an explicit characterization of the process uncertainty. Recent work on engineering applications has focused on the development of safe RL controllers. Previous works have proposed approaches that account for constraint satisfaction through constraint tightening, borrowed from the domain of stochastic model predictive control. Here, we extend these approaches to account for plant-model mismatch. Specifically, we propose a data-driven approach that utilizes Gaussian processes for the offline simulation model and uses the associated posterior uncertainty prediction to account for joint chance constraints and plant-model mismatch. The method is benchmarked against nonlinear model predictive control via case studies. The results demonstrate the ability of the methodology to account for process uncertainty, enabling satisfaction of joint chance constraints even in the presence of plant-model mismatch.
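To make the constraint-tightening idea concrete, the following is a minimal sketch, not the authors' implementation, of how a Gaussian process posterior can be used to back off a constraint so that a chance constraint holds despite plant-model mismatch. It uses a distribution-free Cantelli back-off and a Bonferroni split for the joint constraint, both standard devices in the stochastic MPC literature; the data, state dimensions, and names such as `train_states`, `observed_g`, and `candidate_states` are hypothetical placeholders.

```python
# Sketch: GP-based constraint tightening (back-off) for a joint chance constraint.
# Assumes scikit-learn is available; all data below is synthetic and illustrative.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def cantelli_backoff(delta_i):
    """Back-off multiplier k such that mu + k*sigma <= 0 implies
    P(g > 0) <= delta_i for any distribution (Cantelli's inequality)."""
    return np.sqrt((1.0 - delta_i) / delta_i)

# --- offline: fit a GP to observed constraint values g(x) from plant/simulation data ---
rng = np.random.default_rng(0)
train_states = rng.uniform(-2.0, 2.0, size=(30, 2))             # hypothetical state samples
observed_g = train_states[:, 0] ** 2 - train_states[:, 1] - 1.0  # hypothetical constraint data
observed_g += 0.05 * rng.standard_normal(observed_g.shape)       # mismatch / measurement noise

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(train_states, observed_g)

# --- online: accept a candidate state only if the tightened constraint holds ---
joint_delta = 0.05   # allowed joint violation probability
n_constraints = 1    # single constraint here; Bonferroni split delta/n for several
delta_i = joint_delta / n_constraints
k = cantelli_backoff(delta_i)

candidate_states = rng.uniform(-2.0, 2.0, size=(5, 2))
mu, sigma = gp.predict(candidate_states, return_std=True)
tightened = mu + k * sigma  # posterior uncertainty inflates the nominal constraint
print("satisfies tightened chance constraint:", tightened <= 0.0)
```

In this sketch the GP posterior standard deviation plays the role of the back-off: the less the offline model knows about a region of the state space, the more conservative the tightened constraint becomes there.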
