Agnostically learning halfspaces

We give the first algorithm that (under distributional assumptions) efficiently learns halfspaces in the notoriously difficult agnostic framework of Kearns, Schapire, & Sellie, where a learner is given access to labeled examples drawn from a distribution, without restriction on the labels (e.g. adversarial noise). The algorithm constructs a hypothesis whose error rate on future examples is within an additive $\epsilon$ of the optimal halfspace, in time $\mathrm{poly}(n)$ for any constant $\epsilon > 0$, under the uniform distribution over $\{-1, 1\}^n$ or the unit sphere in $\mathbb{R}^n$, as well as under any log-concave distribution over $\mathbb{R}^n$. It also agnostically learns Boolean disjunctions in time $2^{\tilde{O}(\sqrt{n})}$ with respect to any distribution. The new algorithm, essentially $L_1$ polynomial regression, is a noise-tolerant arbitrary-distribution generalization of the "low degree" Fourier algorithm of Linial, Mansour, & Nisan. We also give a new algorithm for PAC learning halfspaces under the uniform distribution on the unit sphere with the current best bounds on tolerable rate of "malicious noise".
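The abstract names the technique but gives no procedure, so the following is only a minimal sketch of what L1 polynomial regression over {-1, 1}^n could look like, not the authors' implementation. It rests on several assumptions that are not in the text above: the polynomial is expanded in multilinear monomials of bounded degree; plain subgradient descent stands in for an exact L1 solver (a linear program would be the usual choice); and the final hypothesis is obtained by thresholding the fitted polynomial p at a value t picked on the sample. All function names, the demo target (majority of the first three coordinates), and the noise rate are hypothetical.

```python
import itertools
import random


def monomial_features(x, degree):
    """Values of all multilinear monomials of degree <= `degree` at x in {-1, 1}^n."""
    feats = []
    for d in range(degree + 1):
        for idx in itertools.combinations(range(len(x)), d):
            value = 1.0
            for i in idx:
                value *= x[i]
            feats.append(value)
    return feats


def evaluate(w, x, degree):
    """p(x) for the polynomial with coefficient vector w."""
    return sum(wj * fj for wj, fj in zip(w, monomial_features(x, degree)))


def l1_loss(w, X, y, degree):
    """Empirical L1 loss (1/m) * sum_i |p(x_i) - y_i|."""
    return sum(abs(evaluate(w, x, degree) - yi) for x, yi in zip(X, y)) / len(X)


def fit_l1(X, y, degree, steps=200, lr=0.5):
    """Approximate L1 regression by subgradient descent, keeping the best iterate."""
    Phi = [monomial_features(x, degree) for x in X]
    m, k = len(Phi), len(Phi[0])
    w = [0.0] * k
    best_w, best_loss = list(w), float("inf")
    for t in range(1, steps + 1):
        grad, loss = [0.0] * k, 0.0
        for phi, yi in zip(Phi, y):
            r = sum(wj * fj for wj, fj in zip(w, phi)) - yi
            loss += abs(r)
            s = (r > 0) - (r < 0)  # sign of the residual: a subgradient of |r|
            for j in range(k):
                grad[j] += s * phi[j]
        if loss / m < best_loss:
            best_loss, best_w = loss / m, list(w)
        w = [wj - (lr / t ** 0.5) * gj / m for wj, gj in zip(w, grad)]
    return best_w


def choose_threshold(w, X, y, degree):
    """Pick t minimising the sample error of h(x) = +1 if p(x) >= t else -1."""
    scores = [evaluate(w, x, degree) for x in X]
    # Every distinct labelling of the sample is realised by one of these values.
    candidates = sorted(set(scores)) + [max(scores) + 1.0]

    def mistakes(t):
        return sum((1 if s >= t else -1) != yi for s, yi in zip(scores, y))

    return min(candidates, key=mistakes)


def empirical_error(w, t, X, y, degree):
    """Fraction of sample points misclassified by h(x) = sign(p(x) - t)."""
    preds = [1 if evaluate(w, x, degree) >= t else -1 for x in X]
    return sum(p != yi for p, yi in zip(preds, y)) / len(X)


def demo(n=6, m=200, degree=2, noise=0.1, seed=0):
    """Fit on a hypothetical noisy-majority sample; return (L1 loss, sample error)."""
    rng = random.Random(seed)
    X = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]
    y = []
    for x in X:
        label = 1 if x[0] + x[1] + x[2] > 0 else -1
        y.append(-label if rng.random() < noise else label)
    w = fit_l1(X, y, degree)
    t = choose_threshold(w, X, y, degree)
    return l1_loss(w, X, y, degree), empirical_error(w, t, X, y, degree)


if __name__ == "__main__":
    loss, err = demo()
    print(f"L1 loss: {loss:.3f}, sample error after thresholding: {err:.3f}")
```

The thresholding step is what ties the regression objective to classification error in this sketch: for labels in {-1, 1}, a threshold t drawn uniformly from [-1, 1] misclassifies a point with probability at most |p(x) - y| / 2, so the best threshold on the sample has error at most half the empirical L1 loss. A small L1 loss therefore forces a small sample error, whatever process generated the labels.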
