Robust Loss Functions for Boosting

Boosting can be viewed as a gradient descent algorithm over loss functions. It is often pointed out that the typical boosting algorithm, AdaBoost, is strongly affected by outliers. In this letter, we study loss functions for robust boosting. Based on the concept of robust statistics, we propose a transformation of loss functions that makes boosting algorithms robust against extreme outliers. We then apply the truncation of loss functions to contamination models that describe the occurrence of mislabels near decision boundaries. Numerical experiments illustrate that, compared with other loss functions, the proposed loss functions derived from the contamination models are useful for handling highly noisy data.
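To give a concrete feel for why bounding a boosting loss matters, the short Python sketch below compares AdaBoost's exponential loss with a generic truncated variant on three margins, one of which is an extreme outlier. The function truncated_exp_loss and its cap parameter are illustrative assumptions only; they are not the specific transformation or the contamination-model losses proposed in this letter.

import numpy as np

def exp_loss(margin):
    # AdaBoost's exponential loss; it grows without bound as the margin decreases,
    # so a single extreme outlier can dominate the example weights.
    return np.exp(-margin)

def truncated_exp_loss(margin, cap=np.e):
    # Hypothetical bounded variant: truncate the loss at a constant 'cap',
    # so badly mislabeled points contribute only a bounded amount.
    return np.minimum(np.exp(-margin), cap)

# Margins y * F(x): a well-classified point, a borderline mistake, an extreme outlier.
margins = np.array([2.0, -0.5, -8.0])

for name, loss in [("exponential", exp_loss), ("truncated", truncated_exp_loss)]:
    values = loss(margins)
    weights = values / values.sum()  # relative example weights for the next round
    print(name, "weights:", np.round(weights, 3))

Under the exponential loss the outlier receives essentially all of the weight, whereas the truncated loss keeps its influence of the same order as that of the other examples; this is the qualitative behavior that robust losses for boosting aim to achieve.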
