An Approach for Predicting River Water Quality Using Data Mining Technique

Water contains many chemical, physical, and biological impurities. Some impurities are benign, while others are toxic. The quality of water is defined in terms of its physical, chemical, and biological parameters, and ascertaining that quality is crucial before the water is used for purposes such as drinking, agriculture, and industry. Various water analysis methods are employed to determine water quality parameters such as DO, COD, BOD, pH, TDS, salinity, chlorophyll-a, coliform, and organic contaminants such as pesticides. The list of potential water contaminants is extensive, and testing for all of them is impractical. Such water testing is also often costly and time-consuming. This paper presents the application of a data mining technique to build a model that predicts a widely used gross water quality parameter, biochemical oxygen demand (BOD). BOD is a measure of the amount of dissolved oxygen consumed by microbial oxidation of organic matter in wastewater. The standard method for measuring BOD is a 5-day process. Correct results require dilution of the sample, constant pH and nutrient content, a temperature of 20 °C, and storage in the dark. High levels of nitrogen compounds yield false BOD results. Winkler titration, which is also used to measure BOD, is a chemical-intensive process. Hence, an automatic prediction model for BOD is sought for accurate, cost-effective, and time-saving measurement. Based on data available from BOD measurements, this paper describes the development of a prediction model for BOD using a data mining technique, namely support vector machines (SVM). A correlation coefficient of 0.9471 and an RMSE of 0.5019 were obtained for the BOD prediction model on river water quality data. The performance of the proposed model was also compared with that of two other models, namely an artificial neural network (ANN) and regression by discretization. Simulation results show that the proposed model performs better than the other two in terms of correlation coefficient and RMSE.
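The two evaluation measures named above, the correlation coefficient and the root mean square error (RMSE), can be illustrated with a minimal Python sketch. It assumes the correlation coefficient is the Pearson coefficient between observed and predicted BOD, which the abstract does not state explicitly. The function names and the sample values are hypothetical and are not taken from the paper's data.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sums of centered cross-products and centered squares
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    if n != len(predicted) or n == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Hypothetical BOD values (assumed to be in mg/L), for illustration only
observed_bod = [2.1, 3.4, 1.8, 4.0, 2.9]
predicted_bod = [2.3, 3.1, 2.0, 3.7, 3.0]

print("correlation coefficient:", pearson_r(observed_bod, predicted_bod))
print("RMSE:", rmse(observed_bod, predicted_bod))
```

A correlation coefficient closer to 1 and an RMSE closer to 0 both indicate closer agreement between predicted and measured BOD, which is the sense in which the models are compared above.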
