Learning Halfspaces with Massart Noise Under Structured Distributions

We study the problem of learning halfspaces with Massart noise in the distribution-specific PAC model. We give the first computationally efficient algorithm for this problem with respect to a broad family of distributions, including log-concave distributions. This resolves an open question posed in a number of prior works. Our approach is extremely simple: We identify a smooth *non-convex* surrogate loss with the property that any approximate stationary point of this loss defines a halfspace that is close to the target halfspace. Given this structural result, we can use SGD to solve the underlying learning problem.
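The approach described above can be illustrated with a small sketch. The following is *not* the paper's exact construction: the specific sigmoidal surrogate, the step-size schedule, and the Gaussian marginal are assumptions made for illustration. It generates labels from a target halfspace under Massart noise (each label flipped with probability at most `eta`), then runs projected SGD on a smooth non-convex sigmoidal loss of the margin, constrained to the unit sphere.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, eta = 5, 20000, 0.2

# Target halfspace and Gaussian (log-concave) examples.
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)

# Massart noise: example-dependent flip probability eta(x) <= eta.
flip = rng.random(n) < eta * rng.random(n)
y[flip] *= -1

def grad(w, x, yi, sigma=1.0):
    """Gradient of the smooth non-convex surrogate sigmoid(-margin/sigma)."""
    m = yi * (x @ w)
    s = 1.0 / (1.0 + np.exp(m / sigma))   # sigmoid of -m/sigma
    return -(s * (1.0 - s) / sigma) * yi * x

# Projected SGD: one pass over the data, re-normalizing onto the sphere.
w = rng.normal(size=d)
w /= np.linalg.norm(w)
for t in range(n):
    w -= 0.5 / np.sqrt(t + 1) * grad(w, X[t], y[t])
    w /= np.linalg.norm(w)

angle = np.arccos(np.clip(w @ w_star, -1.0, 1.0))
print(f"angle to target: {angle:.3f} rad")
```

The key structural point mirrored here is that only (approximate) stationarity of the surrogate on the sphere is needed, so plain SGD suffices; no convexity of the loss is assumed.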
