Forster Decomposition and Learning Halfspaces with Noise

A Forster transform is an operation that turns a distribution into one with good anticoncentration properties. While a Forster transform does not always exist, we show that any distribution can be efficiently decomposed as a disjoint mixture of a small number of distributions, for each of which a Forster transform exists and can be computed efficiently. As the main application of this result, we obtain the first polynomial-time algorithm for distribution-independent PAC learning of halfspaces in the Massart noise model with strongly polynomial sample complexity, i.e., independent of the bit complexity of the examples. Previous algorithms for this learning problem incurred sample complexity scaling polynomially with the bit complexity, even though such a dependence is not information-theoretically necessary.

Supported by NSF Award CCF-1652862 (CAREER) and a Sloan Research Fellowship. Supported by NSF Award CCF-1553288 (CAREER) and a Sloan Research Fellowship.
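For readers who want the underlying notion made concrete, the following short LaTeX sketch records the standard definition of a Forster transform (equivalently, a map placing a distribution in radial isotropic position). The symbols $D$, $A$, $d$, $x$, $y$, and $v$ are notation introduced here for illustration and are not taken from the paper; the paper's precise quantitative guarantees are not reproduced.

% Sketch of the standard definition; notation chosen here for illustration.
A Forster transform of a distribution $D$ on $\mathbb{R}^d \setminus \{0\}$ is an
invertible matrix $A \in \mathbb{R}^{d \times d}$ such that the normalized image
$y = Ax / \lVert Ax \rVert_2$, with $x \sim D$, is in radial isotropic position:
\[
  \mathbb{E}_{x \sim D}\!\left[\frac{(Ax)(Ax)^{\top}}{\lVert Ax \rVert_2^{2}}\right]
  = \frac{1}{d}\, I_d .
\]
% Why this gives anticoncentration: for any unit vector $v$,
% $\mathbb{E}[\langle v, y\rangle^2] = 1/d$ while $|\langle v, y\rangle| \le 1$,
% so by a reverse Markov argument
% $\Pr\big[\,|\langle v, y\rangle| \ge 1/\sqrt{2d}\,\big] \ge 1/(2d)$:
% the transformed distribution cannot place almost all of its mass
% arbitrarily close to any hyperplane through the origin.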
