Continuous control of a polymerization system with deep reinforcement learning

Abstract Reinforcement learning is a branch of machine learning in which an agent gradually learns control behaviors through self-exploration of its environment. In this paper, we present a controller based on deep reinforcement learning (DRL) with the Deep Deterministic Policy Gradient (DDPG) algorithm for a non-linear semi-batch polymerization reaction. Several adaptations required to apply DRL to chemical process control are addressed, including the Markov state assumption, action boundaries, and the reward definition. This work illustrates that a DRL controller is capable of handling complicated control tasks for chemical processes with multiple inputs, non-linearity, large time delays, and measurement noise. The application of this AI-based framework, built on DRL, is a promising direction in the field of chemical process control toward the goal of smart manufacturing.
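To make the two adaptations named in the abstract concrete, the sketch below shows one common way a DDPG-style controller can respect physical action boundaries (a tanh-bounded actor output rescaled to engineering units) and one common shape for a setpoint-tracking reward. This is a minimal illustration, not the authors' exact formulation: the manipulated variables, their bounds, and the move penalty are hypothetical placeholders.

```python
# Illustrative sketch (assumed, not the paper's exact design): bounded actions
# and a tracking reward for a DDPG-style polymerization controller.
import numpy as np

# Hypothetical physical bounds for two manipulated variables,
# e.g. monomer feed rate [L/min] and jacket temperature [deg C].
ACTION_LOW = np.array([0.0, 20.0])
ACTION_HIGH = np.array([2.0, 90.0])


def scale_action(raw_action):
    """Map a tanh-bounded actor output in [-1, 1] to the physical action range."""
    return ACTION_LOW + 0.5 * (raw_action + 1.0) * (ACTION_HIGH - ACTION_LOW)


def tracking_reward(measured, setpoint, prev_action, action, move_penalty=0.01):
    """Negative squared tracking error plus a penalty on aggressive control moves.

    This is one typical reward definition for setpoint tracking; the reward
    used in the paper may differ.
    """
    error = measured - setpoint
    move = np.sum((action - prev_action) ** 2)
    return -float(error ** 2) - move_penalty * float(move)


if __name__ == "__main__":
    # Example: a raw actor output of [0.3, -0.8] becomes a feasible control move.
    raw = np.array([0.3, -0.8])
    action = scale_action(raw)
    print("physical action:", action)
    print("reward:", tracking_reward(measured=1.02, setpoint=1.00,
                                     prev_action=ACTION_LOW, action=action))
```

Bounding the actor output with tanh and rescaling keeps every proposed action feasible by construction, which avoids clipping artifacts during exploration; the move penalty discourages the aggressive input changes that batch reactors tolerate poorly.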
