The Study on the Accuracy of Classifiers for Water Quality Application

Dirty water is the world's biggest health risk. When rainwater runs off roads into rivers, it picks up toxic chemicals, dirt, trash, and disease-carrying organisms along the way. Many of our water resources lack basic protections, making them vulnerable to pollution from factory farms and industrial plants. A classification model is therefore needed to assess the quality of the water environment. In this paper, data mining techniques, specifically classification, are applied to water quality assessment. Several classifiers were studied to identify the most accurate and most suitable classifier for the dataset. This paper compares the accuracies of five classifiers: Naive Bayes (NB), Multilayer Perceptron (MLP), the J48 decision tree, SMO (a support vector machine), and IBk (k-nearest neighbours). The comparison uses 10-fold cross-validation as the test method on water quality data from the Kinta River, Perak, Malaysia. The selected attributes were DO Sat, DO Mgl, BOD Mgl, COD Mgl, TS Mgl, DO Index, AN Index, SS Index, Class, and Degree of pollution. The dataset consists of 166 instances obtained from the East Coast Environmental Research Institute (ESERI) of Universiti Sultan Zainal Abidin (UniSZA). MLP and IBk performed best on the Kinta River dataset, both achieving the highest accuracy of 91.57%. In future work, we will propose a multi-classifier approach that fuses these classifiers at the classification level to achieve higher classification accuracy.
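The evaluation protocol described above (ranking classifiers by 10-fold cross-validation accuracy) and the planned classification-level fusion can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' setup. The classifier names (J48, SMO, IBk) are those of the WEKA toolkit, which stratifies cross-validation folds by class for nominal classes, whereas this sketch uses a plain shuffled split. The `knn` learner only loosely resembles IBk and does not normalise attributes. Plurality voting is just one possible form of classification-level fusion. All function names, the toy data, and its labels are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Row = Tuple[List[float], str]  # (attribute values, class label)
Predictor = Callable[[Sequence[Row], List[float]], str]


def k_fold_indices(n: int, k: int = 10, seed: int = 1) -> List[List[int]]:
    """Shuffle instance indices 0..n-1 and deal them round-robin into k folds
    (plain, non-stratified split; an assumption of this sketch)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn(k: int = 1) -> Predictor:
    """Hypothetical IBk-like learner: k nearest neighbours by squared Euclidean
    distance; attributes are NOT normalised in this sketch."""
    def predict(train: Sequence[Row], x: List[float]) -> str:
        nearest = sorted(
            train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x))
        )[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict


def majority_vote(predictors: Sequence[Predictor]) -> Predictor:
    """Fuse predictors at the classification (label) level by plurality vote;
    on a tie, the label first predicted in predictor order wins."""
    def fused(train: Sequence[Row], x: List[float]) -> str:
        votes = Counter(p(train, x) for p in predictors)
        return votes.most_common(1)[0][0]
    return fused


def cross_val_accuracy(data: Sequence[Row], predict: Predictor,
                       k: int = 10, seed: int = 1) -> float:
    """Pooled k-fold accuracy: correct predictions summed over all held-out
    folds, divided by the total number of instances."""
    correct = 0
    for fold in k_fold_indices(len(data), k, seed):
        held_out = set(fold)
        train = [row for i, row in enumerate(data) if i not in held_out]
        correct += sum(predict(train, data[i][0]) == data[i][1] for i in fold)
    return correct / len(data)


# Hypothetical toy data (NOT the Kinta River dataset): two well-separated groups.
toy = [([i / 10, 0.0], "clean") for i in range(10)] + \
      [([10 + i / 10, 0.0], "polluted") for i in range(10)]

acc_1nn = cross_val_accuracy(toy, knn(1))
acc_fused = cross_val_accuracy(
    toy, majority_vote([knn(1), knn(3), lambda train, x: "clean"])
)
print(acc_1nn, acc_fused)  # 1.0 1.0 on this separable toy set
```

Because the accuracy is pooled over all held-out instances, a 10-fold result of 91.57% on 166 instances would correspond to 152 correctly classified instances (152/166 ≈ 0.9157), assuming the reported figure was computed this way. Passing any list of predictors to `majority_vote` yields a fused predictor that can be scored with the same `cross_val_accuracy` call, so single and fused classifiers are compared under an identical protocol.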
