Agnostic Proper Learning of Halfspaces under Gaussian Marginals

We study the problem of agnostically learning halfspaces under the Gaussian distribution. Our main result is the first proper learning algorithm for this problem whose sample complexity and computational complexity qualitatively match those of the best known improper agnostic learner. Building on this result, we also obtain the first proper polynomial-time approximation scheme (PTAS) for agnostically learning homogeneous halfspaces. Our techniques naturally extend to agnostically learning linear models with respect to other non-linear activations, yielding in particular the first proper agnostic algorithm for ReLU regression.
