Empirical Bernstein Boosting

Concentration inequalities that incorporate variance information (such as Bernstein’s or Bennett’s inequality) are often significantly tighter than counterparts (such as Hoeffding’s inequality) that disregard variance. Nevertheless, many state-of-the-art machine learning algorithms for classification, such as AdaBoost and support vector machines (SVMs), rely extensively on Hoeffding’s inequality to justify empirical risk minimization and its variants. This article proposes a novel boosting algorithm based on a recently introduced principle, sample variance penalization, which is motivated by an empirical version of Bernstein’s inequality. This framework leads to an efficient algorithm that is as easy to implement as AdaBoost and strictly generalizes it. Experiments on a large number of datasets show significant performance gains over AdaBoost. These results suggest that sample variance penalization could be a viable alternative to empirical risk minimization.
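
To make the contrast between the two kinds of bound concrete, the following is a minimal sketch, not the boosting algorithm proposed in the article. It assumes per-example losses in [0, 1], a commonly stated form of the empirical Bernstein deviation term, sqrt(2 V ln(2/delta) / n) + 7 ln(2/delta) / (3 (n - 1)), where V is the unbiased sample variance, and a penalized objective of the form mean + lam * sqrt(V / n). The function names and the trade-off parameter `lam` are hypothetical and chosen only for illustration.

```python
import math


def sample_variance(losses):
    """Unbiased sample variance of a list of per-example losses."""
    n = len(losses)
    mean = sum(losses) / n
    return sum((x - mean) ** 2 for x in losses) / (n - 1)


def hoeffding_width(n, delta):
    """One-sided Hoeffding deviation term for n losses in [0, 1]."""
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def empirical_bernstein_width(losses, delta):
    """Empirical Bernstein deviation term (assumed form, losses in [0, 1])."""
    n = len(losses)
    log_term = math.log(2.0 / delta)
    variance_part = math.sqrt(2.0 * sample_variance(losses) * log_term / n)
    range_part = 7.0 * log_term / (3.0 * (n - 1))
    return variance_part + range_part


def svp_objective(losses, lam):
    """Sample-variance-penalized risk: empirical mean plus a variance penalty.

    `lam` is a hypothetical trade-off parameter; lam = 0 recovers the
    plain empirical risk.
    """
    n = len(losses)
    mean = sum(losses) / n
    return mean + lam * math.sqrt(sample_variance(losses) / n)


if __name__ == "__main__":
    delta = 0.05
    low_var = [0.1] * 5000        # nearly constant losses
    high_var = [0.0, 1.0] * 2500  # maximally spread losses in [0, 1]
    print("Hoeffding:           ", hoeffding_width(5000, delta))
    print("Bernstein (low var): ", empirical_bernstein_width(low_var, delta))
    print("Bernstein (high var):", empirical_bernstein_width(high_var, delta))
```

Under these assumptions, the variance-aware term is much smaller than the Hoeffding term when the losses barely vary, and slightly larger when the variance is near its maximum. The penalized objective likewise prefers, among predictors with equal empirical risk, the one whose losses have lower sample variance.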