Induction of Model Trees for Predicting BOD in River Water: A Data Mining Perspective

Water is a primary natural resource, and its quality is degraded by various anthropogenic activities. The deterioration of water bodies has prompted serious management efforts in many countries. Biochemical oxygen demand (BOD) is an important water quality parameter because it measures the amount of biodegradable organic matter in water. Testing for BOD is time-consuming: it takes 5 days from sample collection to analysis because the samples require lengthy incubation. In addition, interpolations of BOD results and their implications are mired in uncertainty. There is therefore a need for a suitable secondary (indirect) method for predicting BOD. This paper proposes a model tree for predicting BOD in river water from a data mining perspective. The proposed model is compared with two other tree-based predictive methods, namely decision stumps and regression trees. The predictive accuracy of the models is evaluated using two metrics: the correlation coefficient and the root mean squared error (RMSE). Results show that the model tree achieves a correlation coefficient of 0.9397, higher than either of the other two methods, and the lowest RMSE of the three models, 0.5339.
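The two evaluation metrics named above can be illustrated with a minimal sketch. It assumes the correlation coefficient is the Pearson correlation between observed and predicted BOD, which the abstract does not state explicitly. The function names and the sample values are hypothetical and are not taken from the paper's data.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    squared_errors = ((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sum(squared_errors) / n)


def correlation(observed, predicted):
    """Pearson correlation coefficient (assumed definition of the metric)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Hypothetical BOD readings (mg/L) and model predictions, for illustration only.
observed_bod = [2.0, 3.0, 4.0, 5.0]
predicted_bod = [2.5, 2.5, 4.5, 4.5]

print(f"RMSE: {rmse(observed_bod, predicted_bod):.4f}")
print(f"Correlation: {correlation(observed_bod, predicted_bod):.4f}")
```

A higher correlation and a lower RMSE both indicate a closer fit, which is the sense in which the abstract ranks the model tree above the decision stump and the regression tree.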
