A Polynomial-Time Algorithm for Learning Noisy Linear Threshold Functions

Abstract. We consider the problem of learning a linear threshold function (a half-space in $n$ dimensions, also called a ``perceptron''). Methods for solving this problem generally fall into two categories. In the absence of noise, the problem can be formulated as a linear program and solved in polynomial time with the Ellipsoid Algorithm or interior-point methods. Alternatively, simple greedy algorithms such as the Perceptron Algorithm are often used in practice and have certain provable noise-tolerance properties; however, their running time depends on a separation parameter quantifying the amount of ``wiggle room'' available for a solution, and can be exponential in the description length of the input. In this paper we show how simple greedy methods can be used to find weak hypotheses (hypotheses that correctly classify noticeably more than half of the examples) in polynomial time, with no dependence on any separation parameter. Suitably combining these weak hypotheses yields a polynomial-time algorithm for learning linear threshold functions in the PAC model in the presence of random classification noise, and likewise a polynomial-time algorithm in the Statistical Query model of Kearns [14]. Our algorithm is based on a new method for removing outliers in data. Specifically, for any set $S$ of points in $\mathbb{R}^n$, each given to $b$ bits of precision, we show that one can remove only a small fraction of $S$ so that in the remaining set $T$, for every vector $v$, $\max_{x \in T} (v \cdot x)^2 \le \mathrm{poly}(n,b) \, \mathbf{E}_{x \in T} (v \cdot x)^2$; that is, for any hyperplane through the origin, the maximum squared distance from a point in $T$ to the plane is at most polynomially larger than the average. After removing these outliers, we show that a modified version of the Perceptron Algorithm finds a weak hypothesis in polynomial time, even in the presence of random classification noise.
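To make the outlier-removal condition concrete, here is a minimal sketch in Python (assuming numpy); the function name `remove_outliers` and threshold `beta` are illustrative, and `beta` stands in for the paper's $\mathrm{poly}(n,b)$ bound. The sketch uses the standard linear-algebra fact that requiring $(v \cdot x)^2 \le \beta \, \mathbf{E}_{y \in T}(v \cdot y)^2$ for every direction $v$ simultaneously is equivalent to the single test $x^\top M^{+} x \le \beta$, where $M = \mathbf{E}_{y \in T}[y y^\top]$ is the second-moment matrix of the surviving set $T$. The naive repeat-until-clean loop below only illustrates the condition being enforced; it is not the paper's actual procedure and does not by itself carry the paper's guarantee that only a small fraction of $S$ is removed.

```python
import numpy as np

def remove_outliers(S: np.ndarray, beta: float) -> np.ndarray:
    """Prune rows of S (an m x n array) until every remaining point x
    satisfies x^T M^+ x <= beta, where M is the second-moment matrix of
    the surviving set T.  By the equivalence above, this enforces
    max_{x in T} (v.x)^2 <= beta * E_{x in T} (v.x)^2 for every vector v.
    """
    T = S
    while len(T) > 0:
        M = T.T @ T / len(T)            # M = E_{x in T}[x x^T]
        M_pinv = np.linalg.pinv(M)      # pseudo-inverse handles rank-deficient M
        # scores[i] = x_i^T M^+ x_i, a Mahalanobis-type leverage score;
        # these scores average to rank(M) <= n over the surviving points.
        scores = np.einsum('ij,jk,ik->i', T, M_pinv, T)
        keep = scores <= beta
        if keep.all():
            return T
        T = T[keep]                     # drop current outliers, recompute M
    return T
```

For instance, with a hypothetical threshold of $10n$, a single planted extreme point is discarded while typical points survive:

```python
rng = np.random.default_rng(0)
S = rng.normal(size=(1000, 5))
S[0] *= 100.0                            # plant one extreme outlier
T = remove_outliers(S, beta=10 * S.shape[1])
print(len(S) - len(T), "points removed")  # expect roughly 1
```

The abstract's second step, running a modified Perceptron Algorithm on the cleaned set $T$ to extract a weak hypothesis under random classification noise, is not sketched here.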

References

[1] S. Agmon. The Relaxation Method for Linear Inequalities. Canadian Journal of Mathematics, 1954.

[2] F. Rosenblatt. Principles of Neurodynamics. Spartan Books, 1962.

[3] M. Minsky and S. Papert. Perceptrons: An Introduction to Computational Geometry. MIT Press, 1969.

[4] V. N. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications, 1971.

[5] L. G. Khachiyan. A polynomial algorithm in linear programming. Soviet Mathematics Doklady, 1979.

[6] L. G. Khachiyan. Polynomial algorithms in linear programming. USSR Computational Mathematics and Mathematical Physics, 1980.

[7] N. Karmarkar. A new polynomial-time algorithm for linear programming. STOC '84, 1984.

[8] J. A. Anderson and E. Rosenfeld, editors. Neurocomputing: Foundations of Research. MIT Press, 1988.

[9] W. Maass and G. Turán. On the complexity of learning from counterexamples. 30th Annual Symposium on Foundations of Computer Science (FOCS), 1989.

[10] S. I. Gallant. Perceptron-based learning algorithms. IEEE Transactions on Neural Networks, 1990.

[11] R. E. Schapire. The strength of weak learnability. Machine Learning, 1990.

[12] Y. Freund. An improved boosting algorithm and its implications on learning complexity. COLT '92, 1992.

[13] T. Bylander. Polynomial learnability of linear threshold approximations. COLT '93, 1993.

[14] M. Kearns. Efficient noise-tolerant learning from statistical queries. STOC '93, 1993.

[15] J. A. Aslam and S. E. Decatur. General bounds on statistical query learning and PAC learning with noise via hypothesis boosting. 34th Annual Symposium on Foundations of Computer Science (FOCS), 1993.

[16] J. Aspnes, R. Beigel, M. L. Furst, and S. Rudich. The expressive power of voting polynomials. Combinatorica, 1994.

[17] E. Amaldi. From finding maximum feasible subsystems of linear systems to feedforward neural network design. 1994.

[18] M. J. Kearns and U. V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.

[19] J. A. Aslam and S. E. Decatur. Improved Noise-Tolerant Learning and Generalized Statistical Queries. 1994.

[20] T. Bylander. Learning linear threshold functions in the presence of classification noise. COLT '94, 1994.

[21] E. Cohen. Learning noisy perceptrons by a perceptron in polynomial time. 38th Annual Symposium on Foundations of Computer Science (FOCS), 1997.