Model-free control based on reinforcement learning for a wastewater treatment problem

This article proposes a control scheme, based on the model-free learning control (MFLC) approach, for the advanced oxidation process in wastewater treatment plants. The work is motivated by two observations: many organic pollutants in industrial wastewaters are resistant to conventional biological treatments, and advanced oxidation processes, when regulated by learning controllers that measure the oxidation-reduction potential (ORP), offer a cost-effective alternative. The proposed automation strategy, denoted MFLC-MSA, integrates reinforcement learning with multiple step actions. This allows a suitable control strategy to be learned directly from the process response to selected control inputs, without a process model. The methodology is therefore well suited to oxidation processes in wastewater treatment plants, where developing a model adequate for control design is usually too costly. The proposed algorithm has been tested in a laboratory pilot plant in which phenolic wastewater is oxidized to carboxylic acids and carbon dioxide. The experimental results show that the MFLC-MSA strategy can achieve good performance, guaranteeing on-specification discharge at the maximum degradation rate while using only readily available measurements, namely pH and ORP, which serve as inferential measurements of oxidation kinetics and peroxide consumption, respectively.
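To make the idea of reinforcement learning with multiple step actions more concrete, the following is a minimal sketch, not the authors' MFLC-MSA algorithm. It assumes a tabular Q-learning agent whose actions are pairs (change in oxidant dose, number of sampling steps to hold it), updated with a semi-MDP style rule in which the bootstrap term is discounted by `GAMMA ** k`. The plant `toy_plant`, the error discretization `band`, the reward, and every constant are hypothetical placeholders chosen only for illustration.

```python
import random

# All names and numbers below are illustrative assumptions, not values from the paper.
N_STATES = 7                       # discrete tracking-error bands; band 3 is on-target
ACTIONS = [(du, k) for du in (-1.0, 0.0, 1.0) for k in (1, 3)]  # (dose change, hold steps)
ALPHA, GAMMA, EPSILON = 0.2, 0.95, 0.1


def toy_plant(y, u):
    """Hypothetical first-order surrogate; it does not model the real oxidation process."""
    return y + 0.3 * (u - y)


def band(error):
    """Map a tracking error to one of N_STATES discrete bands."""
    idx = int(round(error)) + N_STATES // 2
    return max(0, min(N_STATES - 1, idx))


def run_msa_action(y, u, action, setpoint):
    """Apply one dose change, hold it for k steps, and accumulate discounted reward."""
    du, k = action
    u = max(0.0, min(6.0, u + du))          # clip the dose to an assumed valid range
    total, discount = 0.0, 1.0
    for _ in range(k):
        y = toy_plant(y, u)
        reward = 1.0 if abs(setpoint - y) < 0.5 else -1.0
        total += discount * reward
        discount *= GAMMA
    return y, u, total, discount            # discount equals GAMMA ** k here


def train(episodes=300, decisions=30, setpoint=3.0, seed=0):
    """Learn a Q-table purely from the (toy) process response, with no process model."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        y, u = 0.0, 0.0
        s = band(setpoint - y)
        for _ in range(decisions):
            if rng.random() < EPSILON:      # epsilon-greedy exploration
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
            y, u, r, disc = run_msa_action(y, u, ACTIONS[a], setpoint)
            s2 = band(setpoint - y)
            # Semi-MDP style update: the next-state value is discounted by GAMMA ** k.
            q[s][a] += ALPHA * (r + disc * max(q[s2]) - q[s][a])
            s = s2
    return q
```

Calling `train()` returns a table with one row per error band and one column per (dose change, hold steps) pair. In a real application the state would be built from the measured pH and ORP rather than from a simulated output, and the reward would encode the discharge specification.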
