Hybrid decision tree-based machine learning models for short-term water quality prediction.

Water resources underpin daily life and economic development and are closely tied to public health and the environment. Accurate water quality prediction is key to improving water management and pollution control. This paper proposes two novel hybrid decision tree-based machine learning models to obtain more accurate short-term water quality predictions. The base models are extreme gradient boosting (XGBoost) and random forest (RF), each combined with an advanced data denoising technique, complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN). Taking the Gales Creek site in the Tualatin River Basin as an example (the Tualatin River being one of the most polluted rivers in the world), a total of 1875 hourly observations from May 1, 2019 to July 20, 2019 are collected. The two hybrid models are used to predict six water quality indicators: water temperature, dissolved oxygen, pH, specific conductance, turbidity, and fluorescent dissolved organic matter. Six error metrics are used for performance evaluation, and the results of the two models are compared with those of four conventional models. The results reveal that: (1) CEEMDAN-RF performs best in predicting water temperature, dissolved oxygen, and specific conductance, with mean absolute percentage errors (MAPEs) of 0.69%, 1.05%, and 0.90%, respectively, while CEEMDAN-XGBoost performs best in predicting pH, turbidity, and fluorescent dissolved organic matter, with MAPEs of 0.27%, 14.94%, and 1.59%, respectively; (2) the CEEMDAN-RF and CEEMDAN-XGBoost models have the smallest average MAPEs, 3.90% and 3.71% respectively, indicating the best overall prediction performance. The stability of the prediction models is also discussed, and the analysis shows that CEEMDAN-RF and CEEMDAN-XGBoost are more stable than the benchmark models.
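
The mean absolute percentage error quoted above can be illustrated with a minimal sketch. It assumes the standard definition, the mean of |observed - predicted| / |observed| expressed in percent; the function name `mape` and the sample values are hypothetical and are not taken from the paper's data or code.

```python
from typing import Sequence


def mape(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Return the mean absolute percentage error, in percent.

    Assumes the standard definition: 100 * mean(|(o - p) / o|).
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) == 0:
        raise ValueError("at least one observation is required")
    if any(o == 0 for o in observed):
        # The ratio is undefined when an observed value is zero.
        raise ValueError("MAPE is undefined when an observed value is zero")
    total = sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


# Hypothetical hourly readings (illustrative only, not data from the paper).
observed = [10.0, 20.0, 25.0, 40.0]
predicted = [11.0, 19.0, 25.0, 38.0]
example_mape = mape(observed, predicted)
print(f"MAPE = {example_mape:.2f}%")
```

Because each error is scaled by the observed value, MAPE lets indicators with very different units and magnitudes (for example pH and turbidity) be compared on one percentage scale, which is presumably why an average MAPE across the six indicators is reported.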
