Agnostic Learning of Monomials by Halfspaces Is Hard

We prove the following strong hardness result for learning: given a distribution on labeled examples from the hypercube such that there exists a monomial (i.e., a conjunction) consistent with a (1 - ε)-fraction of the examples, it is NP-hard to find a halfspace that is correct on a (1/2 + ε)-fraction of the examples, for an arbitrary constant ε > 0. In learning-theory terms, weak agnostic learning of monomials by halfspaces is NP-hard. This hardness result bridges and subsumes two previous results that showed similar hardness for the proper learning of monomials and of halfspaces. As immediate corollaries, we obtain the first optimal hardness results for weak agnostic learning of decision lists and majorities. Our techniques differ substantially from previous hardness proofs for learning. We use an invariance principle and a sparse approximation of halfspaces from recent work on fooling halfspaces to give a new, natural list decoding of a halfspace in the context of dictatorship tests and Label Cover reductions. Moreover, unlike previous invariance-principle-based proofs, which are only known to yield Unique Games hardness, we give a reduction from a smooth version of Label Cover that is known to be NP-hard.
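For concreteness, the main result can be restated formally as follows. This is a paraphrase of the abstract, not the paper's verbatim theorem statement; the symbols D, c, and h are our own notation.

```latex
\textbf{Theorem (informal restatement).}
For every constant $\varepsilon > 0$, the following problem is NP-hard:
given a distribution $D$ on labeled examples
$(x, y) \in \{0,1\}^n \times \{-1,1\}$, with the promise that some
monomial (conjunction) $c$ satisfies
\[
  \Pr_{(x,y) \sim D}\bigl[c(x) = y\bigr] \;\ge\; 1 - \varepsilon,
\]
find a halfspace $h(x) = \operatorname{sgn}(w \cdot x - \theta)$ such that
\[
  \Pr_{(x,y) \sim D}\bigl[h(x) = y\bigr] \;\ge\; \tfrac{1}{2} + \varepsilon.
\]
```

Note that the trivial constant hypothesis already achieves agreement 1/2, so the (1/2 + ε) threshold makes this a hardness result even for weak learning.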
