A Data Cleaning Framework for Water Quality Based on NLDIW-PSO Based Optimal SVR

Water quality monitoring is an essential part of water big data analysis, but spatiotemporal variations in water quality and constraints on measurement make the data complex to handle. The objective of this study is to establish a time-series-based data cleaning framework for water quality and to apply it to the inlet data of the Gaobeidian Sewage Treatment Plant in Beijing. The Pauta criterion was used to identify abnormal values in each individual water quality indicator. Abnormal and missing values that are discontinuously distributed over time were filled with the average of the non-abnormal data from the three days before and after; abnormal and missing values that are continuously distributed over time were forecast using Support Vector Regression (SVR) optimized by the non-linear decreasing inertia weight particle swarm optimization algorithm (NLDIW-PSO). Pearson's correlation coefficient was used to reduce the dimension of the model inputs, and k-fold cross-validation was used to train the model. Model performance was evaluated in terms of the coefficient of determination (R²) and Pearson's correlation coefficient. Water quality data from the Gaobeidian wastewater treatment plant inlet in Beijing, China, were taken as the case study to examine the effectiveness of this approach. The experimental results revealed that the proposed model has advantages in stability and time savings compared with other data-driven models, including the traditional BP ANN, Bayesian network, and decision tree models. The framework can also serve as an effective approach for cleaning general time series data.
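The first two cleaning steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes one sample per day, reads the Pauta criterion as the usual three-standard-deviation rule, and treats a point as "discontinuous" when neither adjacent sample is abnormal or missing. The function names `pauta_outliers` and `fill_isolated`, the `window` parameter, and the sample series are hypothetical.

```python
from statistics import mean, stdev


def pauta_outliers(values, k=3.0):
    """Return indices whose value lies more than k sample standard
    deviations from the mean (Pauta criterion, assumed k = 3).
    Missing samples are given as None and are skipped."""
    present = [v for v in values if v is not None]
    mu = mean(present)
    sigma = stdev(present)
    return [i for i, v in enumerate(values)
            if v is not None and abs(v - mu) > k * sigma]


def fill_isolated(values, bad, window=3):
    """Fill isolated abnormal/missing points with the mean of the valid
    samples up to `window` days before and after. Points that belong to
    a continuous run are left untouched for the regression model."""
    invalid = set(bad) | {i for i, v in enumerate(values) if v is None}
    filled = list(values)
    for i in sorted(invalid):
        if (i - 1) in invalid or (i + 1) in invalid:
            continue  # continuous gap: not handled by averaging
        lo, hi = max(0, i - window), min(len(values), i + window + 1)
        neighbours = [values[j] for j in range(lo, hi)
                      if j != i and j not in invalid]
        if neighbours:
            filled[i] = mean(neighbours)
    return filled


# Hypothetical daily series: one spike, one isolated gap, one two-day gap.
series = [10.0] * 20
series[10] = 100.0           # abnormal value
series[3] = None             # isolated missing value
series[15] = series[16] = None  # continuous missing run

outliers = pauta_outliers(series)
filled = fill_isolated(series, outliers)
```

In this sketch the two-day gap is deliberately left as `None`, since the abstract assigns continuous runs to the NLDIW-PSO optimized SVR forecast rather than to neighbour averaging.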
