Bagging Decision Trees on Data Sets with Classification Noise

In many real applications of supervised classification, the data sets used to learn the models contain classification noise (some instances have an incorrectly assigned class label), mainly due to deficiencies in the data capture process. Bagging ensembles of decision trees are considered among the best-performing supervised classification models in these situations. In this paper, we propose Bagging ensembles of credal decision trees, which are based on imprecise probabilities, via the Imprecise Dirichlet Model, and on information-based uncertainty measures, via the maximum-entropy function. We remark that our method can be applied to data sets with continuous variables and missing data. In an experimental study, we show that Bagging credal decision trees outperforms more complex Bagging approaches on data sets with classification noise. Furthermore, using a bias-variance decomposition of the error, we justify the performance of our approach by showing that it achieves a stronger and more robust reduction of the variance component of the error.
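As a rough illustration of the split criterion behind credal decision trees, the sketch below (Python; the names `max_entropy_idm` and `imprecise_info_gain` are illustrative, not taken from the paper) computes the maximum entropy of the credal set induced by the Imprecise Dirichlet Model at a tree node, assuming the standard IDM intervals [n_c/(N+s), (n_c+s)/(N+s)] with hyper-parameter s, and scores a candidate split by the reduction in that upper entropy. Bagging such trees then amounts to training each one on a bootstrap resample of the data and aggregating their votes.

```python
import numpy as np

def max_entropy_idm(counts, s=1.0):
    """Upper (maximum) entropy, in bits, of the IDM credal set at a node.

    counts : class frequencies observed at the node.
    s      : IDM hyper-parameter (s = 1 is a common choice).

    Under the IDM each class probability lies in
    [n_c / (N + s), (n_c + s) / (N + s)]; the entropy-maximising
    distribution is obtained by 'water-filling' the extra mass s
    onto the smallest class counts.
    """
    c = np.sort(np.asarray(counts, dtype=float))
    k, n = len(c), c.sum()
    remaining, i = float(s), 0
    while remaining > 0 and i < k - 1:
        # mass needed to lift the i+1 smallest counts up to the next level
        gap = (c[i + 1] - c[i]) * (i + 1)
        if gap >= remaining:
            c[: i + 1] += remaining / (i + 1)
            remaining = 0.0
        else:
            c[: i + 1] = c[i + 1]
            remaining -= gap
            i += 1
    if remaining > 0:            # enough mass left to make the node uniform
        c += remaining / k
    p = c / (n + s)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def imprecise_info_gain(parent_counts, children_counts, s=1.0):
    """Split score of a credal tree: reduction in the upper entropy."""
    n = float(sum(parent_counts))
    weighted_children = sum(
        (sum(ch) / n) * max_entropy_idm(ch, s) for ch in children_counts
    )
    return max_entropy_idm(parent_counts, s) - weighted_children


# Toy usage: a node with class counts [6, 2] split into children [5, 0] and [1, 2].
print(imprecise_info_gain([6, 2], [[5, 0], [1, 2]], s=1.0))
```

The water-filling step pours the extra IDM mass s onto the least-frequent classes, so the upper entropy is more conservative than the precise entropy on small or noisy samples; this caution is what makes the resulting trees simpler and, under bagging, what drives the variance reduction discussed above.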
