Agnostic Learning of Monomials by Halfspaces Is Hard

We prove the following strong hardness result for learning: given a distribution of labeled examples from the hypercube such that there exists a monomial consistent with a $(1-\epsilon)$ fraction of the examples, it is NP-hard to find a halfspace that is correct on a $(1/2+\epsilon)$ fraction of the examples, for arbitrary constant $\epsilon>0$. In learning-theory terms, weak agnostic learning of monomials is hard even if one is allowed to output a hypothesis from the much bigger concept class of halfspaces. This hardness result subsumes a long line of previous results, including two recent hardness results for the proper learning of monomials and halfspaces. As an immediate corollary of our result, we show that weak agnostic learning of decision lists is NP-hard. Our techniques are quite different from previous hardness proofs for learning. We define distributions on positive and negative examples for monomials whose first few moments match. We use the invariance principle to argue that regular halfspaces (all of whose coeffic...
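The core idea of the construction above, distributions that agree on their first few moments yet carry opposite labels, can be illustrated with a deliberately tiny toy example (this is *not* the paper's actual construction, just a sketch of the phenomenon): two distributions on $\{-1,+1\}^2$ whose first moments match exactly, but which a degree-2 statistic such as the parity $x_1 x_2$ separates completely.

```python
# Toy illustration of moment matching (hypothetical example, not the
# paper's construction): D_pos and D_neg are uniform over their listed
# points. Every first moment E[x_i] agrees between the two distributions,
# yet the second moment E[x_1 * x_2] distinguishes them perfectly.
D_pos = [(1, 1), (-1, -1)]
D_neg = [(1, -1), (-1, 1)]

def moment(points, idxs):
    """Average over `points` of the product of the coordinates in `idxs`."""
    total = 0.0
    for p in points:
        v = 1.0
        for i in idxs:
            v *= p[i]
        total += v
    return total / len(points)

# First moments match: both coordinates have mean 0 under each distribution.
assert moment(D_pos, [0]) == moment(D_neg, [0]) == 0.0
assert moment(D_pos, [1]) == moment(D_neg, [1]) == 0.0

# The degree-2 moment separates them: +1 under D_pos, -1 under D_neg.
print(moment(D_pos, [0, 1]), moment(D_neg, [0, 1]))  # 1.0 -1.0
```

A statistic (or hypothesis) that depends only on the matched low-degree moments cannot tell the two distributions apart; the paper's argument pushes this idea much further, matching enough moments that the invariance principle applies to regular halfspaces.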
