A random forest model for inflow prediction at wastewater treatment plants

Influent flow to wastewater treatment plants (WWTPs) is a crucial variable for plant operation and management. In this study, a random forest (RF) model was applied to predict daily wastewater inflow, and a new probabilistic prediction approach was applied, for the first time, to quantify the uncertainty associated with wastewater inflow prediction. The RF model uses an ensemble of regression trees to capture the nonlinear relationship between wastewater inflow and various influencing factors, such as weather features and domestic water usage patterns. The proposed model was applied to daily inflow prediction at two WWTPs in Ontario, Canada: the Humber WWTP and one confidential plant. For the confidential WWTP, the coefficient of determination (R²) was 0.971 for training and 0.722 for testing. At the Humber WWTP, R² was 0.957 for training and 0.584 for testing. Compared with other approaches, such as multilayer perceptron (MLP) neural network models and autoregressive integrated moving average (ARIMA) models, the results show that the RF model performs well in predicting inflow. In addition, probabilistic predictions of daily inflow were generated. For the Humber station, 93.56% of the testing samples fall within their corresponding prediction intervals.
For the confidential plant, 78 of the 89 observed testing values fall within their corresponding intervals, i.e., 87.64% of the testing samples. These results show that the probabilistic approach can provide robust decision support for the operation, management, and optimization of WWTPs.
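
The workflow summarized above (fit an ensemble of regression trees on daily predictors, then report R² separately for training and testing) can be sketched as follows. This is a minimal illustration, assuming scikit-learn's `RandomForestRegressor` and `r2_score`; the predictors (rainfall, temperature, day of week as a stand-in for water usage patterns), the synthetic data-generating process and the hyperparameters are hypothetical and are not the study's data or settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n_days = 730

# Hypothetical daily predictors (NOT the study's data):
# rainfall (mm), mean air temperature (deg C), day of week (0 = Monday)
rain = rng.gamma(shape=0.6, scale=8.0, size=n_days)
temp = 10 + 12 * np.sin(2 * np.pi * np.arange(n_days) / 365) + rng.normal(0, 2, n_days)
dow = np.arange(n_days) % 7
X = np.column_stack([rain, temp, dow])

# Synthetic inflow: baseline + nonlinear rainfall response
# + weekday usage pattern + weak temperature effect + noise
y = 400 + 60 * np.log1p(rain) + 5 * (dow < 5) - 0.5 * temp + rng.normal(0, 15, n_days)

# Chronological split (no shuffling), as is usual for daily time series
split = int(0.8 * n_days)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Random forest: bootstrap-aggregated regression trees with random feature subsets
model = RandomForestRegressor(n_estimators=300, random_state=0)
model.fit(X_train, y_train)

r2_train = r2_score(y_train, model.predict(X_train))
r2_test = r2_score(y_test, model.predict(X_test))
print(f"R2 train = {r2_train:.3f}, R2 test = {r2_test:.3f}")
```

Because fully grown trees fit the training days very closely, training R² is typically much higher than testing R², which is the same kind of gap seen in the reported values for both plants.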

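The coverage figures quoted for the probabilistic predictions (93.56% and 87.64%) are the fraction of testing observations that fall inside their predicted intervals. The sketch below only computes that coverage metric; it does not reproduce the study's probabilistic prediction method. The function name `interval_coverage` and the toy intervals are hypothetical, chosen so that 78 of 89 observations lie inside, which reproduces the arithmetic reported for the confidential plant.

```python
import numpy as np


def interval_coverage(y_obs, lower, upper):
    """Return the fraction of observations lying within [lower, upper] (inclusive)."""
    y_obs = np.asarray(y_obs, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    if not (y_obs.shape == lower.shape == upper.shape):
        raise ValueError("y_obs, lower and upper must have the same shape")
    inside = (y_obs >= lower) & (y_obs <= upper)
    return float(inside.mean())


# Toy check (hypothetical intervals): 89 testing samples, 78 of them inside.
n_total, n_inside = 89, 78
y_obs = np.zeros(n_total)
lower = np.full(n_total, -1.0)
upper = np.full(n_total, 1.0)
y_obs[n_inside:] = 2.0  # the last 11 observations fall outside their intervals

coverage = interval_coverage(y_obs, lower, upper)
print(f"coverage = {coverage:.2%}")  # 78 / 89 -> 87.64%
```

Coverage alone does not reward narrow intervals (very wide intervals trivially contain every observation), so it is usually read alongside interval width.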