Hardness of Learning Halfspaces with Noise

Learning an unknown halfspace (also called a perceptron) from labeled examples is one of the classic problems in machine learning. In the noise-free case, when a halfspace consistent with all the training examples exists, the problem can be solved in polynomial time using linear programming. However, under the promise that a halfspace consistent with a fraction (1-\varepsilon) of the examples exists (for some small constant \varepsilon > 0), it was not known how to efficiently find a halfspace that is correct on even 51% of the examples. Nor was any hardness result known that ruled out getting agreement on more than 99.9% of the examples. In this work, we close this gap in our understanding and prove that even a tiny amount of worst-case noise makes the problem of learning halfspaces intractable in a strong sense. Specifically, for arbitrary constants \varepsilon, \delta > 0, we prove that, given a set of example-label pairs from the hypercube, a fraction (1-\varepsilon) of which can be explained by a halfspace, it is NP-hard to find a halfspace that correctly labels a fraction (1/2 + \delta) of the examples. The hardness result is tight, since it is trivial to get agreement on 1/2 of the examples. In learning theory parlance, we prove that weak proper agnostic learning of halfspaces is hard. This settles a question that was raised by Blum et al. in their work on learning halfspaces in the presence of random classification noise [7], and in some more recent works as well. Along the way, we also obtain a strong hardness result for another basic computational problem: solving a linear system over the rationals.
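The tightness remark can be illustrated with a minimal sketch, which is not taken from the paper. It assumes labels in {-1, +1} and a halfspace written as "w·x >= theta"; the names `halfspace_label`, `agreement`, and `trivial_halfspace` are hypothetical. With w = 0, one of the two constant halfspaces always agrees with at least half of any labeled sample.

```python
from typing import List, Sequence, Tuple

# An example is (point in {0,1}^n, label in {-1,+1}); this encoding is an assumption.
Example = Tuple[Sequence[int], int]


def halfspace_label(w: Sequence[float], theta: float, x: Sequence[int]) -> int:
    """Label assigned to x by the halfspace w.x >= theta."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else -1


def agreement(w: Sequence[float], theta: float, examples: List[Example]) -> float:
    """Fraction of examples whose label matches the halfspace."""
    hits = sum(halfspace_label(w, theta, x) == y for x, y in examples)
    return hits / len(examples)


def trivial_halfspace(examples: List[Example]) -> Tuple[List[float], float]:
    """Return the better of the two constant halfspaces (w = 0)."""
    n = len(examples[0][0])
    w = [0.0] * n
    always_plus = (w, 0.0)   # 0 >= 0 holds, so every point is labeled +1
    always_minus = (w, 1.0)  # 0 >= 1 fails, so every point is labeled -1
    return max((always_plus, always_minus),
               key=lambda h: agreement(h[0], h[1], examples))


if __name__ == "__main__":
    sample = [((0, 1), 1), ((1, 1), -1), ((1, 0), -1)]
    w, theta = trivial_halfspace(sample)
    print(w, theta, agreement(w, theta, sample))
```

The two constant hypotheses have agreements that sum to 1, so the larger one is at least 1/2. The hardness result says that beating this baseline by any constant \delta is NP-hard, even when a halfspace with agreement (1-\varepsilon) exists.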

[1]  Edith Cohen, et al. Learning noisy perceptrons by a perceptron in polynomial time, 1997, Proceedings 38th Annual Symposium on Foundations of Computer Science.

[2]  Uri Zwick, et al. Finding almost-satisfying assignments, 1998, STOC '98.

[3]  Rocco A. Servedio, et al. Agnostically Learning Halfspaces, 2005, FOCS.

[4]  Vitaly Feldman. Optimal hardness results for maximizing agreements with monomials, 2006, 21st Annual IEEE Conference on Computational Complexity (CCC'06).

[5]  Edoardo Amaldi, et al. On the Approximability of Minimizing Nonzero Variables or Unsatisfied Relations in Linear Systems, 1998, Theor. Comput. Sci.

[6]  Shai Ben-David, et al. On the difficulty of approximately maximizing agreements, 2000, J. Comput. Syst. Sci.

[7]  Uriel Feige, et al. On the hardness of approximating Max-Satisfy, 2006, Inf. Process. Lett.

[8]  Linda Sellie, et al. Toward efficient agnostic learning, 1992, COLT '92.

[9]  Magnús M. Halldórsson, et al. Approximations of Weighted Independent Set and Hereditary Subset Problems, 2022, Journal of Graph Algorithms and Applications.

[10]  Vitaly Feldman, et al. New Results for Learning Noisy Parities and Halfspaces, 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[11]  Johan Håstad, et al. Some optimal inapproximability results, 2001, JACM.

[12]  N. Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm, 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).

[13]  Ran Raz. A Parallel Repetition Theorem, 1998, SIAM J. Comput.

[14]  S. Agmon. The Relaxation Method for Linear Inequalities, 1954, Canadian Journal of Mathematics.

[15]  Alan M. Frieze, et al. A Polynomial-Time Algorithm for Learning Noisy Linear Threshold Functions, 1996, Algorithmica.

[16]  Noga Alon, et al. The Space Complexity of Approximating the Frequency Moments, 1999.

[17]  Noga Alon, et al. A Fast and Simple Randomized Parallel Algorithm for the Maximal Independent Set Problem, 1985, J. Algorithms.

[18]  Noga Alon, et al. The Probabilistic Method, 2015, Fundamentals of Ramsey Theory.

[19]  Moses Charikar, et al. Near-optimal algorithms for maximum constraint satisfaction problems, 2007, SODA '07.

[20]  Prasad Raghavendra, et al. A 3-query PCP over integers, 2007, STOC '07.

[21]  I. Anderson. Combinatorics of Finite Sets, 1987.

[22]  Carsten Lund, et al. Proof verification and the hardness of approximation problems, 1998, JACM.

[23]  P. Erdös. On a lemma of Littlewood and Offord, 1945.

[24]  Moni Naor, et al. Derandomized Constructions of k-Wise (Almost) Independent Permutations, 2005, APPROX-RANDOM.

[25]  Jacques Stern, et al. The hardness of approximate optima in lattices, codes, and systems of linear equations, 1993, Proceedings of 1993 IEEE 34th Annual Foundations of Computer Science.

[26]  Nader H. Bshouty, et al. Maximizing agreements and coagnostic learning, 2006, Theor. Comput. Sci.