A Polynomial Time Algorithm for Learning Halfspaces with Tsybakov Noise

We study the problem of PAC learning homogeneous halfspaces in the presence of Tsybakov noise. In the Tsybakov noise model, the label of every sample is independently flipped with an adversarially controlled probability that can be arbitrarily close to $1/2$ for a fraction of the samples. {\em We give the first polynomial-time algorithm for this fundamental learning problem.} Our algorithm learns the true halfspace within any desired accuracy $\epsilon$ and succeeds under a broad family of well-behaved distributions, including log-concave distributions. The only previous algorithm for this problem required runtime quasi-polynomial in $1/\epsilon$. Our algorithm employs a recently developed reduction \cite{DKTZ20b} from learning to certifying the non-optimality of a candidate halfspace. That prior work developed a quasi-polynomial-time certificate algorithm based on polynomial regression. {\em The main technical contribution of the current paper is the first polynomial-time certificate algorithm.} Starting from a non-trivial warm-start, our algorithm performs a novel ``win-win'' iterative process that, at each step, either finds a valid certificate or improves the angle between the current halfspace and the true one. Our warm-start algorithm for isotropic log-concave distributions involves a number of analytic tools that may be of broader interest. These include a new efficient method for reweighting the distribution in order to recenter it and a novel characterization of the spectrum of the degree-$2$ Chow parameters.
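
For concreteness, the noise condition above is typically formalized as follows (one standard parameterization from the Tsybakov noise literature; the constants $A$ and $\alpha$ are generic): writing $\eta(x) \le 1/2$ for the probability that the label of example $x$ is flipped, there exist parameters $\alpha \in (0,1]$ and $A \ge 1$ such that
\[
\Pr_{x \sim D_x}\!\left[\eta(x) \ge \frac{1}{2} - t\right] \;\le\; A\, t^{\frac{\alpha}{1-\alpha}} \qquad \text{for all } 0 < t \le \frac{1}{2},
\]
where $D_x$ denotes the marginal distribution on examples; smaller $\alpha$ permits a larger fraction of points with noise rate near $1/2$. In this notation, the degree-$2$ Chow parameters mentioned at the end of the abstract are, up to normalization, the entries of the label-weighted second-moment matrix $\mathbf{E}_{(x,y)}\left[y\, x x^{\top}\right]$.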

[1] Jie Shen et al. Efficient active learning of sparse halfspaces with arbitrary bounded noise. NeurIPS, 2020.

[2] Christos Tzamos et al. Non-Convex SGD Learns Halfspaces with Adversarial Label Noise. NeurIPS, 2020.

[3] Yuchen Zhang et al. A Hitting Time Analysis of Stochastic Gradient Langevin Dynamics. COLT, 2017.

[4] Christos Tzamos et al. Distribution-Independent PAC Learning of Halfspaces with Massart Noise. NeurIPS, 2019.

[5] D. Angluin et al. Learning From Noisy Examples. Machine Learning, 1988.

[6] Amit Daniely. Complexity theoretic limitations on learning halfspaces. STOC, 2016.

[7] Adam R. Klivans et al. Statistical-Query Lower Bounds via Functional Gradients. NeurIPS, 2020.

[8] Liu Yang et al. Minimax Analysis of Active Learning. J. Mach. Learn. Res., 2015.

[9] Daniel M. Kane et al. Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and ReLUs under Gaussian Marginals. NeurIPS, 2020.

[10] Jerry Li et al. Being Robust (in High Dimensions) Can Be Practical. ICML, 2017.

[11] Ankur Moitra et al. Classification Under Misspecification: Halfspaces, Generalized Linear Models, and Connections to Evolvability. arXiv, 2020.

[12] Pravesh Kothari et al. Efficient Algorithms for Outlier-Robust Regression. COLT, 2018.

[13] Robert H. Sloan. Corrigendum to types of noise in data for concept learning. COLT, 1992.

[14] Saeed Ghadimi et al. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming, 2016.

[15] Maria-Florina Balcan et al. The Power of Localization for Efficiently Learning Linear Separators with Noise. J. ACM, 2017.

[16] Alan M. Frieze et al. A Polynomial-Time Algorithm for Learning Noisy Linear Threshold Functions. Algorithmica, 1996.

[17] Maria-Florina Balcan et al. Efficient Learning of Linear Separators under Bounded Noise. COLT, 2015.

[18] Maria-Florina Balcan et al. Margin Based Active Learning. COLT, 2007.

[19] E. Mammen et al. Smooth Discrimination Analysis. Ann. Statist., 1999.

[20] Yoav Freund et al. A decision-theoretic generalization of on-line learning and an application to boosting. EuroCOLT, 1995.

[21] P. Massart et al. Risk bounds for statistical learning. arXiv:math/0702683, 2007.

[22] Chicheng Zhang et al. Revisiting Perceptron: Efficient and Label-Optimal Learning of Halfspaces. NIPS, 2017.

[23] P. Bartlett et al. Local Rademacher complexities. arXiv:math/0508275, 2005.

[24] S. Boucheron et al. Theory of classification: a survey of some recent advances. ESAIM: Probab. Stat., 2005.

[25] Daniel M. Kane et al. Learning geometric concepts with nasty noise. STOC, 2018.

[26] Albert B. Novikoff. On Convergence Proofs for Perceptrons. 1963.

[27] Amit Daniely. A PTAS for Agnostically Learning Halfspaces. COLT, 2015.

[28] Luc Devroye et al. Combinatorial Methods in Density Estimation. Springer Series in Statistics, 2001.

[29] Elad Hazan. Introduction to Online Convex Optimization. Found. Trends Optim., 2016.

[30] Santosh S. Vempala et al. Eldan's Stochastic Localization and the KLS Hyperplane Conjecture: An Improved Lower Bound for Expansion. FOCS, 2017.

[31] Prasad Raghavendra et al. Hardness of Learning Halfspaces with Noise. FOCS, 2006.

[32] A. Carbery et al. Distributional and $L^q$ norm inequalities for polynomials over convex bodies in $\mathbb{R}^n$. Math. Res. Lett., 2001.

[33] Jerry Li et al. Sever: A Robust Meta-Algorithm for Stochastic Optimization. ICML, 2019.

[34] Vitaly Feldman et al. New Results for Learning Noisy Parities and Halfspaces. FOCS, 2006.

[35] Vladimir Vapnik. Statistical Learning Theory. Wiley, 1998.

[36] Philip M. Long et al. Baum's Algorithm Learns Intersections of Halfspaces with Respect to Log-Concave Distributions. APPROX-RANDOM, 2009.

[37] Maria-Florina Balcan et al. Learning and 1-bit Compressed Sensing under Asymmetric Noise. COLT, 2016.

[38] R. Schapire et al. Toward efficient agnostic learning. COLT, 1992.

[39] Daniel M. Kane et al. Recent Advances in Algorithmic High-Dimensional Robust Statistics. arXiv, 2019.

[40] Christos Tzamos et al. Learning Halfspaces with Tsybakov Noise. arXiv, 2020.

[41] Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press, 2018.

[42] Rocco A. Servedio et al. Agnostically learning halfspaces. FOCS, 2005.

[43] Leslie G. Valiant. A theory of the learnable. STOC, 1984.

[44] Santosh S. Vempala et al. The geometry of logconcave functions and sampling algorithms. Random Struct. Algorithms, 2007.

[45] F. Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 1958.

[46] Christos Tzamos et al. Learning Halfspaces with Massart Noise Under Structured Distributions. COLT, 2020.

[47] David Haussler. Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications. Inf. Comput., 1992.

[48] Ilias Diakonikolas et al. Efficient Algorithms and Lower Bounds for Robust Linear Regression. SODA, 2019.

[49] Jerry Li et al. Robustly Learning a Gaussian: Getting Optimal Error, Efficiently. SODA, 2018.

[50] Daniel M. Kane et al. Robust Estimators in High Dimensions without the Computational Intractability. FOCS, 2016.

[51] A. Tsybakov. Optimal aggregation of classifiers in statistical learning. 2003.

[52] Steve Hanneke. Rates of convergence in active learning. arXiv:1103.1790, 2011.

[53] Santosh S. Vempala et al. Agnostic Estimation of Mean and Covariance. FOCS, 2016.

[54] Michael I. Jordan et al. Convexity, Classification, and Risk Bounds. J. Amer. Statist. Assoc., 2006.

[55] Rocco A. Servedio et al. Learning Halfspaces with Malicious Noise. ICALP, 2009.

[56] Wolfgang Maass et al. How fast can a threshold gate learn? COLT, 1994.

[57] Marvin Minsky et al. Perceptrons: An Introduction to Computational Geometry. MIT Press, 1969.