Agnostic Boosting

We extend the boosting paradigm to the realistic setting of agnostic learning, that is, to a setting where the training sample is generated by an arbitrary (unknown) probability distribution over examples and labels. We define a β-weak agnostic learner with respect to a hypothesis class F as follows: given a distribution P, it outputs some hypothesis h ∈ F whose error is at most er_P(F) + β, where er_P(F) is the minimal error of a hypothesis from F under the distribution P (note that for some distributions this bound may exceed 1/2). We show a boosting algorithm that, given access to such a weak agnostic learner, computes a hypothesis whose error is at most max{c1(β) · er_P(F)^{c2(β)}, ε}, in time polynomial in 1/ε. While this generalization guarantee is significantly weaker than the one obtained by the known PAC boosting algorithms, one should note that the assumption required of a β-weak agnostic learner is much weaker. In fact, an important virtue of the notion of weak agnostic learning is that in many cases such learning is achieved by efficient algorithms.
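
Written out, and reading c2(β) as an exponent on er_P(F) (an assumption about the plain-text rendering of the bound above), the quantities in the abstract can be restated as follows; the names h_weak and h_boost are introduced here only to label the weak learner's output and the boosted hypothesis.

\[
\mathrm{er}_P(F) \;=\; \min_{h \in F}\ \Pr_{(x,y)\sim P}\bigl[h(x) \neq y\bigr],
\qquad
\mathrm{er}_P(h_{\mathrm{weak}}) \;\le\; \mathrm{er}_P(F) + \beta,
\]
\[
\mathrm{er}_P(h_{\mathrm{boost}}) \;\le\; \max\bigl\{\, c_1(\beta)\,\mathrm{er}_P(F)^{c_2(\beta)},\ \varepsilon \,\bigr\},
\]

with the boosting algorithm running in time polynomial in 1/ε.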
