Nearly Tight Bounds for Robust Proper Learning of Halfspaces with a Margin

We study the problem of {\em properly} learning large-margin halfspaces in the agnostic PAC model. In more detail, we study the complexity of properly learning $d$-dimensional halfspaces on the unit ball within misclassification error $\alpha \cdot \mathrm{OPT}_{\gamma} + \epsilon$, where $\mathrm{OPT}_{\gamma}$ is the optimal $\gamma$-margin error rate and $\alpha \geq 1$ is the approximation ratio. For all values of the approximation ratio $\alpha \geq 1$, we give learning algorithms and computational hardness results that are nearly matching for a range of parameters. Specifically, for the natural setting where $\alpha$ is any constant greater than one, we provide an essentially tight complexity characterization. On the positive side, we give an $\alpha = 1.01$-approximate proper learner that uses $O(1/(\epsilon^2\gamma^2))$ samples (which is optimal) and runs in time $\mathrm{poly}(d/\epsilon) \cdot 2^{\tilde{O}(1/\gamma^2)}$. On the negative side, we show that {\em any} constant-factor approximate proper learner has runtime $\mathrm{poly}(d/\epsilon) \cdot 2^{(1/\gamma)^{2-o(1)}}$, assuming the Exponential Time Hypothesis.
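To make the objective concrete, the sketch below (our illustration, not code from the paper) computes the empirical $\gamma$-margin error of a candidate halfspace: the fraction of labeled examples that are misclassified or classified correctly with margin below $\gamma$. The function name `gamma_margin_error` and the strict-inequality convention for counting margin errors are our assumptions.

```python
import numpy as np

def gamma_margin_error(w: np.ndarray, X: np.ndarray, y: np.ndarray, gamma: float) -> float:
    """Empirical gamma-margin error of the halfspace x -> sign(<w, x>).

    An example (x_i, y_i) with y_i in {-1, +1} counts as a margin error
    when y_i * <w, x_i> < gamma: it is misclassified, or correct but with
    margin below gamma. Assumes w is a unit vector and the rows of X lie
    in the unit ball, matching the paper's setup.
    """
    margins = y * (X @ w)              # signed margin of each example
    return float(np.mean(margins < gamma))

# In the paper's objective, the learner must output a halfspace whose
# zero-one misclassification error over the distribution is at most
#     alpha * OPT_gamma + eps,
# where OPT_gamma is the minimum, over unit vectors w, of the
# distributional gamma-margin error defined above.

# Usage: on separable data with margin gamma, the true direction
# achieves gamma-margin error 0, so OPT_gamma = 0.
rng = np.random.default_rng(0)
w_star = np.array([1.0, 0.0])
X = rng.normal(size=(1000, 2))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # project onto the unit sphere
y = np.sign(X @ w_star)
print(gamma_margin_error(w_star, X, y, gamma=0.1))
```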
