An ensemble method for predicting biochemical oxygen demand in river water using data mining techniques

ABSTRACT Biochemical oxygen demand (BOD) measures the amount of dissolved oxygen consumed by the microbial oxidation of organic matter. It is a water quality parameter that indicates, in particular, the extent of pollution. Water from wastewater treatment plants has high BOD values and therefore requires treatment, so BOD serves as an indicator of the quality of the water being discharged. The standard method for measuring BOD is a 5-day process. Correct results require sample dilution, constant pH and nutrient content, a temperature of 20°C, and incubation in the dark. High levels of nitrogen compounds yield false BOD results. Winkler titration, which is used to measure dissolved oxygen (DO) as part of BOD measurement, is a chemical-intensive process. Hence, an automatic prediction model for BOD is needed for accurate, cost-effective and time-saving measurement. Based on data available for BOD measurements, the present study focuses on devising a prediction model for BOD using ensemble techniques in data mining. The proposed BOD prediction model achieved a correlation coefficient of 0.9541 and a root mean-squared error of 0.4679 on river water quality data. The proposed model was also compared with existing models built for the same data set.
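The two reported evaluation metrics can be illustrated with a minimal sketch. The code below assumes the correlation coefficient is Pearson's r between observed and predicted BOD, and it uses plain averaging of base-model outputs as a stand-in ensemble; the abstract does not specify the study's actual ensemble scheme, so `average_ensemble` and all function names here are hypothetical and chosen only for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean-squared error between observed and predicted values."""
    n = len(observed)
    squared_errors = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared_errors / n)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    spread_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (spread_o * spread_p)


def average_ensemble(predictions_per_model):
    """Combine base models by averaging their predictions sample by sample.

    Assumption: simple averaging is used only as an illustrative ensemble,
    not as the method of the study.
    """
    return [sum(values) / len(values) for values in zip(*predictions_per_model)]
```

In use, each base model would supply one list of BOD predictions for the same test samples; `average_ensemble` merges those lists, and `rmse` and `pearson_r` then score the merged predictions against the measured BOD values.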
