Application of twin delayed deep deterministic policy gradient learning for the control of transesterification process

Persistent depletion of fossil fuels has encouraged the search for alternative fuels that are renewable and environment-friendly. One promising renewable alternative is biodiesel produced by batch transesterification. Controlling the batch transesterification process is difficult because of its complex, nonlinear dynamics. Some of these challenges may be addressed by control strategies that interact directly with the process and learn from experience. To this end, this study explores the feasibility of reinforcement learning (RL) based control of the batch transesterification process. In particular, it applies twin delayed deep deterministic policy gradient (TD3) based RL to the continuous control of the process. The results show that the TD3-based controller is able to control the batch transesterification process and can be a promising direction towards the goal of artificial intelligence based control in the process industries.
