Avoiding Boosting Overfitting by Removing Confusing Samples

Boosting methods are known to exhibit noticeable overfitting on some datasets, while being immune to overfitting on others. In this paper we show that standard boosting algorithms are not appropriate when the classes overlap. This inadequacy is likely to be the major source of boosting overfitting on real-world data. To verify this conclusion we use the fact that any task with overlapping classes can be reduced to a deterministic task with the same Bayesian separating surface. This reduction is achieved by removing "confusing samples": samples that are misclassified by a "perfect" Bayesian classifier. We propose an algorithm for removing confusing samples and experimentally study the behavior of AdaBoost trained on the resulting data sets. Experiments confirm that removing confusing samples helps boosting reduce the generalization error and avoid overfitting on both synthetic and real-world data sets. The process of removing confusing samples also provides an accurate error prediction based on the training sets alone.
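
The abstract does not specify the removal procedure in detail, so the following is only a minimal sketch of the confusing-sample idea: the unknown Bayesian posterior is approximated by out-of-fold probability estimates from a cross-validated ensemble, samples whose estimated posterior for their own label falls below a threshold are treated as confusing and dropped, and AdaBoost is trained on the remainder. The RandomForest proxy, the 0.5 threshold, and the function names are illustrative assumptions, not the authors' exact algorithm.

```python
# Sketch: remove likely "confusing samples" before training AdaBoost.
# Assumptions: X is a NumPy feature matrix, y holds integer labels 0..K-1,
# and a cross-validated RandomForest is an acceptable stand-in for the
# unknown Bayesian classifier.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict


def remove_confusing_samples(X, y, n_folds=5, threshold=0.5, random_state=0):
    """Return indices of samples kept after removing likely confusing ones."""
    y = np.asarray(y)
    proxy = RandomForestClassifier(n_estimators=200, random_state=random_state)
    # Out-of-fold class-probability estimates stand in for the Bayesian
    # posterior p(class | x); each sample is scored by models that never
    # saw it during training.
    proba = cross_val_predict(proxy, X, y, cv=n_folds, method="predict_proba")
    # A sample is "confusing" if the estimated posterior of its own label is
    # below the threshold, i.e. a (near-)Bayes classifier would misclassify it.
    own_class_proba = proba[np.arange(len(y)), y]
    return np.where(own_class_proba >= threshold)[0]


def train_adaboost_on_cleaned(X, y, **kwargs):
    """Drop confusing samples, then fit AdaBoost on the cleaned training set."""
    keep_idx = remove_confusing_samples(X, y, **kwargs)
    model = AdaBoostClassifier(n_estimators=200, random_state=0)
    model.fit(X[keep_idx], np.asarray(y)[keep_idx])
    return model, keep_idx
```

If the proxy is reasonable, the fraction of samples flagged as confusing can also serve as a rough training-set-based estimate of the attainable error, in the spirit of the error prediction mentioned in the abstract; this reading of the error-prediction claim is an assumption, not a statement of the authors' method.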
