Reinforcement Learning - Overview of recent progress and implications for process control

Abstract

This paper provides an introduction to Reinforcement Learning (RL) technology, summarizes recent developments in this area, and discusses their potential implications for the field of process control and, more generally, for operational decision-making. The paper begins with an introduction to RL, a technology that allows an agent to learn, through trial and error, the best way to accomplish a task. We then highlight new developments in RL that have led to the recent wave of applications and media interest. A comparison of the key features of RL and mathematical programming based methods (e.g., model predictive control) is then presented to clarify their similarities and differences. This is followed by an assessment of several ways that RL technology can potentially be used in process control and operational decision applications. A final section summarizes our conclusions and lists directions for future RL research that may improve its relevance for the process systems engineering field.
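As context for the trial-and-error learning described in the abstract, the sketch below shows tabular Q-learning on a toy problem. It is not taken from the paper; the environment, parameter values, and all names are illustrative assumptions.

```python
# Minimal sketch (not from the paper): tabular Q-learning illustrating
# trial-and-error learning of a task. The toy chain environment and all
# parameter values are illustrative assumptions.
import random

n_states, n_actions = 5, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.95, 0.1  # learning rate, discount factor, exploration rate

def step(state, action):
    """Toy environment: action 1 moves right, action 0 moves left.
    Reaching the last state yields a reward of 1 and ends the episode."""
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    done = next_state == n_states - 1
    return next_state, reward, done

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy exploration: mostly exploit, occasionally try something new.
        if random.random() < epsilon:
            action = random.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: Q[state][a])
        next_state, reward, done = step(state, action)
        # Q-learning update: move the estimate toward reward + discounted best future value.
        target = reward + gamma * max(Q[next_state])
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state

# After training, the greedy policy should prefer moving right in every state.
print([max(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states)])
```

The agent is never given a model of the environment; it improves its value estimates solely from the rewards observed while interacting with it, which is the distinguishing feature of RL relative to model-based optimization methods such as model predictive control.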
