Large Margin Classification Using the Perceptron Algorithm

We introduce and analyze a new algorithm for linear classification which combines Rosenblatt's perceptron algorithm with Helmbold and Warmuth's leave-one-out method. Like Vapnik's maximal-margin classifier, our algorithm takes advantage of data that are linearly separable with large margins. Compared to Vapnik's algorithm, however, ours is much simpler to implement and much more efficient in terms of computation time. We also show that our algorithm can be used efficiently in very high-dimensional spaces by means of kernel functions. We ran experiments with our algorithm, and with several variants of it, on classifying images of handwritten digits. Its performance is close to, but not as good as, that of maximal-margin classifiers on the same problem, while it saves significantly on computation time and programming effort.
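The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way a perceptron can be combined with a voting scheme: every intermediate weight vector is stored together with the number of examples it classified correctly before the next mistake, and predictions are made by a weighted vote. The function names, the `epochs` parameter, and the tie-breaking rule are assumptions made for illustration, not details taken from the text. The kernel form is not shown.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def train_voted_perceptron(X, y, epochs=10):
    """Return a list of (weight_vector, survival_count) pairs.

    X is a list of feature vectors, y a list of labels in {-1, +1}.
    """
    w = [0.0] * len(X[0])
    c = 0  # number of examples the current w has classified correctly
    history = []
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            if y_i * dot(w, x_i) <= 0:
                # Mistake: retire the current vector, then apply the
                # standard perceptron update.
                history.append((w, c))
                w = [w_j + y_i * x_j for w_j, x_j in zip(w, x_i)]
                c = 1
            else:
                c += 1
    history.append((w, c))
    return history


def predict_voted(history, x):
    """Weighted majority vote over all stored weight vectors."""
    vote = 0.0
    for w, c in history:
        s = dot(w, x)
        vote += c * (1 if s > 0 else -1 if s < 0 else 0)
    # Assumption: ties are broken in favor of the positive class.
    return 1 if vote >= 0 else -1
```

Training needs only additions and inner products, which is consistent with the abstract's claim of a simple implementation; the stored history is what distinguishes this sketch from a plain perceptron that keeps only the final weight vector.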
