Hardness of Learning Halfspaces with Noise

Learning an unknown halfspace (also called a perceptron) from labeled examples is one of the classic problems in machine learning. In the noise-free case, when a halfspace consistent with all the training examples exists, the problem can be solved in polynomial time using linear programming. However, under the promise that a halfspace consistent with a fraction (1 - ε) of the examples exists (for some small constant ε > 0), it was not known how to efficiently find a halfspace that is correct on even 51% of the examples, nor was a hardness result known that ruled out getting agreement on more than 99.9% of the examples. In this work, we close this gap in our understanding and prove that even a tiny amount of worst-case noise makes the problem of learning halfspaces intractable in a strong sense. Specifically, for arbitrary ε, δ > 0, we prove that given a set of example-label pairs from the hypercube, a fraction (1 - ε) of which can be explained by a halfspace, it is NP-hard to find a halfspace that correctly labels a fraction (1/2 + δ) of the examples. The hardness result is tight, since it is trivial to get agreement on half of the examples. In learning-theory parlance, we prove that weak proper agnostic learning of halfspaces is hard. This settles a question raised by Blum et al. in their work on learning halfspaces in the presence of random classification noise (Blum et al., 1996), and in some more recent works as well. Along the way, we also obtain strong hardness for another basic computational problem: solving a linear system over the rationals.
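The noise-free case mentioned in the abstract reduces to a linear-programming feasibility problem: find weights w and a threshold b such that every labeled example lies on the correct side of the hyperplane. The sketch below illustrates this with `scipy.optimize.linprog`; the function and variable names are illustrative and not taken from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def find_consistent_halfspace(X, y):
    """Return (w, b) with y_i * (w . x_i + b) >= 1 for every example,
    or None if no consistent halfspace exists.

    Any halfspace that strictly separates the data can be rescaled to
    satisfy the margin-1 constraints, so this is equivalent to exact
    consistency on finitely many points.
    """
    n, d = X.shape
    # Variables are [w_1, ..., w_d, b].
    # Rewrite y_i * (w . x_i + b) >= 1 as -y_i * (w . x_i + b) <= -1.
    A_ub = -y[:, None] * np.hstack([X, np.ones((n, 1))])
    b_ub = -np.ones(n)
    c = np.zeros(d + 1)  # zero objective: pure feasibility check
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (d + 1))
    if not res.success:
        return None
    return res.x[:d], res.x[d]
```

With worst-case noise, the natural analogue (maximizing the number of satisfied constraints) is exactly the problem the paper proves NP-hard to approximate beyond the trivial 1/2 bound.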

[1] Carsten Lund et al. Proof verification and the hardness of approximation problems. JACM, 1998.

[2] Uri Zwick et al. Finding almost-satisfying assignments. STOC '98.

[3] S. Agmon. The Relaxation Method for Linear Inequalities. Canadian Journal of Mathematics, 1954.

[4] Alan M. Frieze et al. A Polynomial-Time Algorithm for Learning Noisy Linear Threshold Functions. Algorithmica, 1996.

[5] Magnús M. Halldórsson et al. Approximations of Weighted Independent Set and Hereditary Subset Problems. Journal of Graph Algorithms and Applications.

[6] Prasad Raghavendra et al. A 3-query PCP over integers. STOC '07.

[7] R. Schapire et al. Toward Efficient Agnostic Learning. 1994.

[8] Nader H. Bshouty et al. Maximizing Agreements and CoAgnostic Learning. ALT, 2002.

[9] Moses Charikar et al. Near-optimal algorithms for maximum constraint satisfaction problems. SODA '07.

[10] Noga Alon et al. The space complexity of approximating the frequency moments. STOC '96.

[11] Jacques Stern et al. The hardness of approximate optima in lattices, codes, and systems of linear equations. FOCS 1993.

[12] Vitaly Feldman et al. New Results for Learning Noisy Parities and Halfspaces. FOCS 2006.

[13] Johan Håstad. Some optimal inapproximability results. JACM, 2001.

[14] Carsten Lund et al. Proof verification and hardness of approximation problems. FOCS 1992.

[15] Ran Raz. A parallel repetition theorem. STOC '95.

[16] Vitaly Feldman. Optimal hardness results for maximizing agreements with monomials. CCC 2006.

[17] Moni Naor et al. Derandomized Constructions of k-Wise (Almost) Independent Permutations. Algorithmica, 2005.

[18] Rocco A. Servedio et al. Agnostically learning halfspaces. FOCS 2005.

[19] Edoardo Amaldi et al. On the Approximability of Minimizing Nonzero Variables or Unsatisfied Relations in Linear Systems. Theor. Comput. Sci., 1998.

[20] N. Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm. FOCS 1987.

[21] Umesh V. Vazirani et al. An Introduction to Computational Learning Theory. 1994.

[22] Noga Alon et al. A Fast and Simple Randomized Parallel Algorithm for the Maximal Independent Set Problem. J. Algorithms, 1985.

[23] Noga Alon et al. The Probabilistic Method.

[24] Shai Ben-David et al. On the difficulty of approximately maximizing agreements. J. Comput. Syst. Sci., 2000.

[25] Edith Cohen. Learning noisy perceptrons by a perceptron in polynomial time. FOCS 1997.

[26] Uriel Feige et al. On the hardness of approximating Max-Satisfy. Inf. Process. Lett., 2006.

[27] Hunter S. Snevily. Combinatorics of finite sets. 1991.

[28] P. Erdős. On a lemma of Littlewood and Offord. 1945.