Using Decision Trees and Soft Labeling to Filter Mislabeled Data

In this paper we present a new noise filtering method, called the soft decision tree noise filter (SDTNF), to identify and remove mislabeled items from a data set. In this method, a sequence of decision trees is built from a data set in which each item is assigned a soft class label (in the form of a class probability vector). A modified decision tree algorithm adjusts the soft class labels during tree construction. After each decision tree is built, the soft class label of each item in the data set is updated using the tree's predictions as learning targets. In the next iteration, a new decision tree is built from the data set with the updated soft class labels. This tree building process repeats until the labeling of the data set converges. The procedure provides a mechanism that gradually modifies and corrects mislabeled items; SDTNF uses it as a filter by flagging items whose classes have been relabeled by the decision trees as mislabeled. The performance of SDTNF is evaluated on 16 data sets drawn from the UCI repository. The results show that it identifies a substantial amount of noise in most of the tested data sets and significantly improves the performance of nearest neighbor classifiers over a wide range of noise levels. We also compare SDTNF to the consensus and majority voting methods proposed by Brodley and Friedl [1996, 1999] for noise filtering. SDTNF filters more efficiently and with better balance than these two methods, removing more mislabeled items while retaining more correctly labeled ones. Its filtering capability also significantly improves the performance of nearest neighbor classifiers, especially at high noise levels: at a noise level of 40%, consensus voting improves nearest neighbor accuracy by 13.1% and majority voting by 18.7%, while SDTNF achieves an improvement of 31.3%.
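
The iterative soft-relabeling loop described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it substitutes scikit-learn's DecisionTreeClassifier for the paper's modified decision tree learner, assumes integer-coded class labels, and the blending weight alpha, the leaf-size setting, and the convergence tolerance are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def sdtnf_filter(X, y, n_classes, alpha=0.5, max_iter=20, tol=1e-3):
    """Return indices of items whose labels were changed by the soft-label process.

    Assumes `y` holds integer class labels 0..n_classes-1. The parameters
    `alpha`, `min_samples_leaf`, and `tol` are illustrative, not from the paper.
    """
    y = np.asarray(y)
    # Start each item with a one-hot class probability vector (its given label).
    soft = np.eye(n_classes)[y].astype(float)

    for _ in range(max_iter):
        # Train a tree on the hard labels implied by the current soft labels.
        hard = soft.argmax(axis=1)
        tree = DecisionTreeClassifier(min_samples_leaf=5).fit(X, hard)

        # Use the tree's class-probability predictions as learning targets;
        # pad to n_classes because predict_proba only covers classes seen in `hard`.
        proba = tree.predict_proba(X)
        targets = np.zeros_like(soft)
        targets[:, tree.classes_] = proba

        # Move each soft label one step toward the tree's predictions.
        new_soft = (1 - alpha) * soft + alpha * targets

        converged = np.abs(new_soft - soft).max() < tol
        soft = new_soft
        if converged:
            break

    # Items whose final label differs from the original are flagged as mislabeled.
    return np.where(soft.argmax(axis=1) != y)[0]
```

The indices returned by this sketch would then be removed from the training set before building the nearest neighbor classifier, mirroring the filtering step described in the abstract.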

[1] Ron Kohavi, et al. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection, 1995, IJCAI.

[2] Carla E. Brodley, et al. Identifying Mislabeled Training Data, 1999, J. Artif. Intell. Res.

[3] Tony R. Martinez, et al. A noise filtering method using neural networks, 2003, IEEE International Workshop on Soft Computing Techniques in Instrumentation, Measurement and Related Applications (SCIMA 2003).

[4] Ralph Martinez, et al. Reduction Techniques for Exemplar-Based Learning Algorithms, 1998.

[5] G. Gates, The Reduced Nearest Neighbor Rule, 1998.

[6] G. Gates, et al. The reduced nearest neighbor rule (Corresp.), 1972, IEEE Trans. Inf. Theory.

[7] Choh-Man Teng, Evaluating Noise Correction, 2000, PRICAI.

[8] Tony R. Martinez, et al. Instance Pruning Techniques, 1997, ICML.

[9] Leo Breiman, et al. Classification and Regression Trees, 1984.

[10] Peter E. Hart, et al. The condensed nearest neighbor rule (Corresp.), 1968, IEEE Trans. Inf. Theory.

[11] Tony R. Martinez, et al. Reduction Techniques for Instance-Based Learning Algorithms, 2000, Machine Learning.

[12] Dennis L. Wilson, et al. Asymptotic Properties of Nearest Neighbor Rules Using Edited Data, 1972, IEEE Trans. Syst. Man Cybern.

[13] David W. Aha, et al. Instance-Based Learning Algorithms, 1991, Machine Learning.

[14] Catherine Blake, et al. UCI Repository of machine learning databases, 1998.

[15] David W. Aha, et al. Noise-Tolerant Instance-Based Learning Algorithms, 1989, IJCAI.

[16] Bianca Zadrozny, et al. Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers, 2001, ICML.

[17] J. J. Hopfield, et al. “Neural” computation of decisions in optimization problems, 1985, Biological Cybernetics.

[18] Saso Dzeroski, et al. Noise Elimination in Inductive Concept Learning: A Case Study in Medical Diagnosis, 1996, ALT.

[19] C. G. Hilborn, The Condensed Nearest Neighbor Rule, 1967.

[20] Peter E. Hart, et al. Nearest neighbor pattern classification, 1967, IEEE Trans. Inf. Theory.

[21] Choh-Man Teng, et al. Correcting Noisy Data, 1999, ICML.

[22] Tony R. Martinez, et al. An algorithm for correcting mislabeled data, 2001, Intell. Data Anal.

[23] Carla E. Brodley, et al. Identifying and Eliminating Mislabeled Training Instances, 1996, AAAI/IAAI, Vol. 1.

[24] George H. John, Robust Decision Trees: Removing Outliers from Databases, 1995, KDD.

[25] Joseph Drish, Obtaining Calibrated Probability Estimates from Support Vector Machines, 2001.

[26] Belur V. Dasarathy, et al. Nosing Around the Neighborhood: A New System Structure and Classification Rule for Recognition in Partially Exposed Environments, 1980, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27] J. Ross Quinlan, C4.5: Programs for Machine Learning, 1992.

[28] Patrick Henry Winston, Learning structural descriptions from examples, 1970.

[29] Pedro M. Domingos, et al. Tree Induction for Probability-Based Ranking, 2003, Machine Learning.