Learning geometric concepts with nasty noise

We study the efficient learnability of geometric concept classes, specifically low-degree polynomial threshold functions (PTFs) and intersections of halfspaces, when a fraction of the training data is adversarially corrupted. We give the first polynomial-time PAC learning algorithms for these concept classes with dimension-independent error guarantees in the presence of nasty noise under the Gaussian distribution. In the nasty noise model, an omniscient adversary can arbitrarily corrupt a small fraction of both the unlabeled data points and their labels. This model generalizes well-studied noise models, including the malicious noise model and the agnostic (adversarial label noise) model. Prior to our work, the only concept class for which efficient malicious learning algorithms were known was the class of origin-centered halfspaces.

At the core of our results is an efficient algorithm to approximate the low-degree Chow parameters of any bounded function in the presence of nasty noise. This robust approximation algorithm provides near-optimal error guarantees for a range of distribution families satisfying mild concentration bounds and moment conditions. At the technical level, it employs an iterative "spectral" technique for outlier detection and removal, inspired by recent work in robust unsupervised learning, that makes essential use of low-degree multivariate polynomials.

Our robust learning algorithm for low-degree PTFs provides dimension-independent error guarantees for a class of tame distributions, including Gaussians and, more generally, any log-concave distribution with (approximately) known low-degree moments. For linear threshold functions (LTFs, i.e., degree-1 PTFs) under the Gaussian distribution, we use a refinement of the localization technique to give a polynomial-time algorithm achieving near-optimal error O(ε), where ε is the noise rate. Our robust learning algorithm for intersections of halfspaces proceeds by projecting down to an appropriate low-dimensional subspace; its correctness makes essential use of a novel robust inverse independence lemma that is of independent interest.
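
For context, the degree-d Chow parameters of a bounded function f with respect to a distribution D are the low-degree moments E_{x~D}[f(x)·m(x)], taken over all monomials m of degree at most d; for LTFs under the Gaussian, the degree-1 Chow parameters E[y·x] already determine the target up to small error. Below is a minimal, hypothetical Python sketch of the kind of iterative spectral filter the abstract alludes to, specialized to the degree-1 case under N(0, I). It illustrates the general technique rather than the paper's exact algorithm; the function name robust_chow_degree1, the eigenvalue threshold, the pruning quantile, and the round limit are all assumptions made for the example.

```python
import numpy as np

def robust_chow_degree1(X, y, eps, slack=10.0, max_rounds=100):
    """Hypothetical sketch: robustly estimate the degree-1 Chow parameters
    E[y * x] from an eps-corrupted sample (X, y), where the clean x ~ N(0, I)
    and |y| <= 1. Repeatedly examines the empirical covariance of the
    label-weighted points z_i = y_i * x_i; an abnormally large top eigenvalue
    signals outliers, which are pruned along the top eigenvector."""
    Z = y[:, None] * X                       # label-weighted points z_i = y_i x_i
    mask = np.ones(len(Z), dtype=bool)
    for _ in range(max_rounds):
        mu = Z[mask].mean(axis=0)
        cov = np.cov(Z[mask], rowvar=False)  # centered second moments
        eigvals, eigvecs = np.linalg.eigh(cov)
        lam, v = eigvals[-1], eigvecs[:, -1] # top eigenvalue and direction
        # Clean data has variance at most 1 in every direction; accept the
        # empirical mean once the spectrum is consistent with that (up to slack).
        if lam <= 1.0 + slack * eps * np.log(1.0 / eps):
            return mu
        proj = np.abs((Z[mask] - mu) @ v)    # deviations along top direction
        keep = proj <= np.quantile(proj, 1.0 - eps / 2)  # prune the far tail
        idx = np.flatnonzero(mask)
        mask[idx[~keep]] = False
    return Z[mask].mean(axis=0)
```

For instance, on clean data with y = sign(w·x) for a unit vector w, E[y·x] = sqrt(2/π)·w, so the returned estimate is approximately proportional to w; the point of the filter is that an ε-fraction of adversarial points can shift the estimate by only roughly O(ε·sqrt(log(1/ε))) rather than arbitrarily. The paper's actual algorithm works with the full vector of degree-at-most-d monomials and tighter, distribution-specific thresholds.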
