Distribution-Specific Agnostic Boosting

We consider the problem of boosting the accuracy of weak learning algorithms in the agnostic learning framework of Haussler (1992) and Kearns et al. (1992). Known algorithms for this problem (Ben-David et al., 2001; Gavinsky, 2002; Kalai et al., 2008) follow the same strategy as boosting algorithms in the PAC model: the weak learner is executed on the same target function but over different distributions on the domain. We demonstrate boosting algorithms for the agnostic learning framework that only modify the distribution on the labels of the points (or, equivalently, modify the target function). This allows boosting a distribution-specific weak agnostic learner to a strong agnostic learner with respect to the same distribution. When applied to the weak agnostic parity learning algorithm of Goldreich and Levin (1989), our algorithm yields a simple PAC learning algorithm for DNF and an agnostic learning algorithm for decision trees over the uniform distribution using membership queries. These results substantially simplify Jackson's famous DNF learning algorithm (1994) and the recent result of Gopalan et al. (2008). We also strengthen the connection to hard-core set constructions discovered by Klivans and Servedio (1999) by demonstrating that hard-core set constructions achieving the optimal hard-core set size (given by Holenstein (2005) and Barak et al. (2009)) imply distribution-specific agnostic boosting algorithms. Conversely, our boosting algorithm gives a simple hard-core set construction with an (almost) optimal hard-core set size.
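
To make the label-modification idea concrete, the following Python sketch shows a boosting loop in this spirit. It is an illustration closest to potential-based agnostic boosting (Kalai et al., NIPS 2009; reference [9] below), not the paper's exact algorithm, and the weak-learner interface weak_learn(X, z, w), which takes points, relabeled ±1 targets, and label weights and returns a ±1-valued hypothesis as a callable, is a hypothetical interface assumed here for illustration.

```python
import numpy as np

def agnostic_boost_by_relabeling(X, y, weak_learn, rounds=50, eta=0.1):
    """Minimal sketch: boost a distribution-specific weak agnostic learner.

    The marginal distribution over the points X is never reweighted.
    Each round presents the weak learner with the same points and
    modifies only the labels (and their weights), via the residual
    between the true labels y (in {-1, +1}) and the clipped running
    score H.  `weak_learn(X, z, w)` is an assumed interface returning
    a hypothesis h: X -> {-1, +1} with a small correlation advantage.
    """
    n = len(y)
    H = np.zeros(n)                            # real-valued running score
    hypotheses, coeffs = [], []
    for _ in range(rounds):
        residual = y - np.clip(H, -1.0, 1.0)   # signed disagreement with y
        z = np.where(residual >= 0, 1, -1)     # relabeled targets for weak learner
        w = np.abs(residual)                   # label weights; marginal on X unchanged
        if w.sum() == 0:                       # clipped score already matches y
            break
        h = weak_learn(X, z, w / w.sum())
        H = H + eta * h(X)                     # additive, potential-style update
        hypotheses.append(h)
        coeffs.append(eta)
    return lambda Xq: np.sign(sum(c * h(Xq) for c, h in zip(coeffs, hypotheses)))
```

Because the marginal distribution over X is untouched, a weak learner that is only guaranteed to work under a single fixed distribution, e.g., the Goldreich and Levin parity learner under the uniform distribution with membership queries, can be plugged in as weak_learn and is boosted with respect to that same distribution.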

[1] Oded Goldreich and Leonid A. Levin. A hard-core predicate for all one-way functions. STOC, 1989.

[2] Dmitry Gavinsky. Optimally-Smooth Adaptive Boosting and Application to Agnostic Learning. J. Mach. Learn. Res., 2003.

[3] Yoav Freund. An Adaptive Version of the Boost by Majority Algorithm. COLT, 1999.

[4] Ryan O'Donnell et al. Learning DNF from random walks. FOCS, 2003.

[5] Vitaly Feldman et al. On Agnostic Learning of Parities, Monomials, and Halfspaces. SIAM J. Comput., 2009.

[6] Russell Impagliazzo. Hard-core distributions for somewhat hard problems. FOCS, 1995.

[7] Yishay Mansour et al. Boosting Using Branching Programs. J. Comput. Syst. Sci., 2000.

[8] Yishay Mansour et al. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. STOC, 1994.

[9] Adam Tauman Kalai et al. Potential-Based Agnostic Boosting. NIPS, 2009.

[10] Leslie G. Valiant et al. Cryptographic Limitations on Learning Boolean Formulae and Finite Automata. Machine Learning: From Theory to Applications, 1993.

[11] Eyal Kushilevitz et al. Learning decision trees using the Fourier spectrum. STOC, 1991.

[12] Shai Ben-David et al. Agnostic Boosting. COLT/EuroCOLT, 2001.

[13] Rocco A. Servedio et al. Boosting in the presence of noise. STOC, 2003.

[14] R. Schapire. The Strength of Weak Learnability. Machine Learning, 1990.

[15] Alexander A. Sherstov et al. Cryptographic Hardness for Learning Intersections of Halfspaces. FOCS, 2006.

[16] Vitaly Feldman. A Complete Characterization of Statistical Query Learning with Applications to Evolvability. FOCS, 2009.

[17] Thomas G. Dietterich. An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization. Machine Learning, 2000.

[18] Adam R. Klivans and Rocco A. Servedio. Boosting and Hard-Core Set Construction. Machine Learning, 2003.

[19] Parikshit Gopalan et al. Agnostically learning decision trees. STOC, 2008.

[20] David Haussler. Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications. Inf. Comput., 1992.

[21] Vitaly Feldman et al. New Results for Learning Noisy Parities and Halfspaces. FOCS, 2006.

[22] Pavel Pudlák et al. Threshold circuits of bounded depth. FOCS, 1987.

[23] Adam Tauman Kalai et al. On agnostic boosting and parity learning. STOC, 2008.

[24] Rocco A. Servedio et al. Agnostically learning halfspaces. FOCS, 2005.

[25] Nader H. Bshouty et al. More efficient PAC-learning of DNF with membership queries under the uniform distribution. COLT, 1999.

[26] Leslie G. Valiant. A theory of the learnable. STOC, 1984.

[27] Boaz Barak et al. The uniform hardcore lemma via approximate Bregman projections. SODA, 2009.

[28] Jeffrey C. Jackson. An efficient membership-query algorithm for learning DNF with respect to the uniform distribution. FOCS, 1994.

[29] Michael Kearns. Efficient noise-tolerant learning from statistical queries. STOC, 1993.

[30] Thomas Holenstein. Key agreement from weak bit agreement. STOC, 2005.

[31] Osamu Watanabe et al. MadaBoost: A Modification of AdaBoost. COLT, 2000.

[32] Rocco A. Servedio. Smooth Boosting and Learning with Malicious Noise. J. Mach. Learn. Res., 2001.

[33] Michael Kearns et al. Toward efficient agnostic learning. COLT, 1992.

[34] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. EuroCOLT, 1995.