A study of the effect of different types of noise on the precision of supervised learning techniques

Machine learning techniques often have to deal with noisy data, which may affect the accuracy of the resulting models. Effectively handling noise is therefore a key aspect of supervised learning if reliable models are to be obtained from data. Although several authors have studied the effect of noise on particular learners, comparisons of its effect across different learners are lacking. In this paper, we address this issue by systematically comparing how different degrees of noise affect four supervised learners belonging to different paradigms: the Naïve Bayes probabilistic classifier, the C4.5 decision tree, the IBk instance-based learner and the SMO support vector machine. These four methods allow us to contrast different learning paradigms, and they are considered to be four of the top ten algorithms in data mining (Wu et al. 2007). We test them on a collection of data sets perturbed with noise in the input attributes and noise in the output class. As an initial hypothesis, we assign the techniques to two groups according to their expected sensitivity to noise: NB with C4.5, hypothesized to be the less sensitive pair, and IBk with SMO. The analysis enables us to extract key observations about the effect of different types and degrees of noise on these learning techniques. In general, we find that Naïve Bayes appears to be the most robust algorithm and SMO the least robust, relative to the other two techniques. However, the underlying empirical behavior of the techniques is more complex, varying with the noise type and the specific data set being processed. Overall, noise in the training data set is found to cause the learners the most difficulty.
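To make the experimental setup concrete, the following minimal Python sketch illustrates the mechanics of such a study: inject class noise (randomly flipped labels) or attribute noise (random perturbation of input values) into the training data at increasing rates, then compare test accuracy across four learners from the same paradigms. All concrete choices here are illustrative assumptions rather than the paper's exact protocol: scikit-learn stand-ins replace the original implementations (DecisionTreeClassifier implements CART rather than C4.5, KNeighborsClassifier approximates IBk, SVC is fitted with an SMO-style solver), and the Gaussian attribute-noise model and the specific noise rates are assumed.

# Illustrative noise-injection experiment (assumed setup, not the paper's exact protocol).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier   # rough analogue of IBk
from sklearn.svm import SVC                          # fitted with an SMO-style solver
from sklearn.tree import DecisionTreeClassifier      # CART stand-in for C4.5

rng = np.random.default_rng(0)

def add_class_noise(y, rate):
    # Flip each training label to a different class with probability `rate`.
    y = y.copy()
    classes = np.unique(y)
    for i in np.flatnonzero(rng.random(len(y)) < rate):
        y[i] = rng.choice(classes[classes != y[i]])
    return y

def add_attribute_noise(X, rate):
    # Perturb each attribute value, with probability `rate`, by Gaussian noise
    # scaled to that attribute's standard deviation (one plausible noise model).
    X = X.copy()
    mask = rng.random(X.shape) < rate
    X[mask] += rng.normal(0.0, X.std(axis=0), size=X.shape)[mask]
    return X

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

learners = {
    "NaiveBayes": GaussianNB(),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "kNN": KNeighborsClassifier(n_neighbors=3),
    "SVM": SVC(kernel="rbf"),
}

for noise_type in ("class", "attribute"):
    for rate in (0.0, 0.1, 0.2, 0.4):
        if noise_type == "class":
            Xn, yn = X_tr, add_class_noise(y_tr, rate)
        else:
            Xn, yn = add_attribute_noise(X_tr, rate), y_tr
        scores = {name: round(accuracy_score(y_te, clf.fit(Xn, yn).predict(X_te)), 3)
                  for name, clf in learners.items()}
        print(noise_type, rate, scores)

Note that, as in the paper's framing, noise is injected only into the training data here; a fuller study would repeat the comparison over many data sets and noise levels and apply rank-based statistical tests to the results.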

[1] D.W. Aha, D. Kibler, M.K. Albert. Instance-based learning algorithms. Machine Learning, 1991.

[2] D. Nettleton, V. Torra. A comparison of active set method and genetic algorithm approaches for learning weighting vectors in some aggregation operators. Int. J. Intell. Syst., 2001.

[3] J. Fürnkranz. Noise-Tolerant Windowing. IJCAI, 1997.

[4] M. Friedman. The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of Variance. Journal of the American Statistical Association, 1937.

[5] G.H. John, P. Langley. Estimating Continuous Distributions in Bayesian Classifiers. UAI, 1995.

[6] J.R. Quinlan. Induction of Decision Trees. Machine Learning, 1986.

[7] X. Wu, V. Kumar, et al. Top 10 algorithms in data mining. Knowledge and Information Systems, 2007.

[8] M. Kearns. Efficient noise-tolerant learning from statistical queries. STOC, 1993.

[9] V. Torra. The weighted OWA operator. Int. J. Intell. Syst., 1997.

[10] D. Angluin, P. Laird. Learning from noisy examples. Machine Learning, 1988.

[11] R.H. Sloan. Corrigendum to types of noise in data for concept learning. COLT, 1992.

[12] J.R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1992.

[13] V.N. Vapnik. The Nature of Statistical Learning Theory. Springer, 2000.

[14] X. Zhu, X. Wu, Q. Chen. Eliminating Class Noise in Large Datasets. ICML, 2003.

[15] I.H. Witten, E. Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2000.

[16] M. Friedman. A Comparison of Alternative Tests of Significance for the Problem of m Rankings. Annals of Mathematical Statistics, 1940.

[17] J.C. Platt. Fast training of support vector machines using sequential minimal optimization. In: Advances in Kernel Methods: Support Vector Learning. MIT Press, 1999.

[18] B.H. Blott, et al. EIT data noise evaluation in the clinical environment. Physiological Measurement, 1996.

[19] D.F. Nettleton, et al. Processing and representation of meta-data for sleep apnea diagnosis with an artificial intelligence approach. Int. J. Medical Informatics, 2001.

[20] X. Zhu, X. Wu. Class Noise vs. Attribute Noise: A Quantitative Study. Artificial Intelligence Review, 2003.

[21] E.B. Hunt, J. Marin, P.J. Stone. Experiments in Induction. Academic Press, 1966.

[22] S.A. Goldman, R.H. Sloan. Can PAC learning algorithms tolerate random attribute noise? Algorithmica, 1995.

[23] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I.H. Witten. The WEKA data mining software: an update. SIGKDD Explorations, 2009.

[24] R.H. Sloan. Four Types of Noise in Data for PAC Learning. Information Processing Letters, 1995.

[25] D.W. Aha. Tolerating Noisy, Irrelevant and Novel Attributes in Instance-Based Learning Algorithms. Int. J. Man-Machine Studies, 1992.