Data-driven modeling approaches to support wastewater treatment plant operation

Data-driven modeling techniques are applied to process data from wastewater treatment plants to provide additional information for optimal plant control. Applying these techniques carries some risk, however, because the generated models are non-mechanistic and thus do not always describe the plant processes appropriately. In this study, a procedure for building software sensors from sensor data available in the process information system is defined and used to compare several techniques suitable for data-driven modeling, including generalized least squares regression, artificial neural networks, self-organizing maps and random forests. Three degrees of expert knowledge are defined and considered, mainly for optimum input signal selection and model interpretation. Software sensors are created in two full-scale experiments. The experiments reveal that accurate software sensors can be generated automatically even with linear modeling techniques. This justifies selecting the most parsimonious and transparent models and motivates investigating them in light of the available expert knowledge. A high degree of expert knowledge is valuable for long-term accuracy but can decrease performance in short-term predictions. For safe on-site deployment, uncertainty measures must be considered to prevent misinterpretation of software-sensor outputs in the case of rare events or model input failures.
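As a rough illustration of the idea of a linear software sensor with a simple safeguard against rare events or input failures, the following Python sketch may help. It is not the procedure used in the study: it substitutes ordinary least squares for generalized least squares, uses synthetic data, and treats "inputs inside the training range" as a crude stand-in for an uncertainty measure. The names `fit_linear_sensor` and `predict` are hypothetical.

```python
import numpy as np

def fit_linear_sensor(X, y):
    """Fit y ~ b0 + X @ b by ordinary least squares (assumed stand-in for GLS)."""
    A = np.column_stack([np.ones(len(X)), X])      # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sigma = resid.std(ddof=A.shape[1])             # residual standard deviation
    # Remember the range of each input signal seen during training.
    return {"coef": coef, "sigma": sigma,
            "lo": X.min(axis=0), "hi": X.max(axis=0)}

def predict(model, X):
    """Return predictions and a flag that is False outside the training range."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    A = np.column_stack([np.ones(len(X)), X])
    y_hat = A @ model["coef"]
    # NaN inputs (e.g. a failed sensor) compare False and are flagged as well.
    trusted = np.all((X >= model["lo"]) & (X <= model["hi"]), axis=1)
    return y_hat, trusted

# Synthetic example: two input signals, one target signal (all values assumed).
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 10.0, size=(200, 2))
y_train = 1.0 + 2.0 * X_train[:, 0] - 0.5 * X_train[:, 1] + rng.normal(0.0, 0.1, 200)

model = fit_linear_sensor(X_train, y_train)
y_hat, trusted = predict(model, [[5.0, 5.0], [50.0, 5.0]])
```

In this sketch the second query point lies far outside the training range, so its prediction is returned but marked as untrusted; an on-site deployment would need a more careful uncertainty measure than a range check.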
