Large Margin Classification Using the Perceptron Algorithm

We introduce and analyze a new algorithm for linear classification which combines Rosenblatt's perceptron algorithm with Helmbold and Warmuth's leave-one-out method. Like Vapnik's maximal-margin classifier, our algorithm takes advantage of data that are linearly separable with large margins. Compared to Vapnik's algorithm, however, ours is much simpler to implement and much more efficient in terms of computation time. We also show that our algorithm can be used efficiently in very high-dimensional spaces by means of kernel functions. We performed experiments using our algorithm, and some variants of it, to classify images of handwritten digits. Our algorithm performs close to, but not as well as, maximal-margin classifiers on the same problem, while saving significantly on computation time and programming effort.
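The abstract gives no pseudocode, so the following is only a minimal sketch of one way a perceptron can be combined with a vote over its intermediate prediction vectors and with kernel functions. It assumes that each intermediate vector is weighted by the number of training examples it classified correctly before the next mistake. All names (`train_voted_perceptron`, `predict_voted`, `linear_kernel`, `poly_kernel`) and the `epochs` parameter are hypothetical and chosen for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(a: Vector, b: Vector) -> float:
    """Plain inner product."""
    return float(sum(x * y for x, y in zip(a, b)))


def poly_kernel(degree: int) -> Kernel:
    """Polynomial kernel (1 + a.b)^degree (an illustrative choice)."""
    def k(a: Vector, b: Vector) -> float:
        return (1.0 + linear_kernel(a, b)) ** degree
    return k


def sign(value: float) -> int:
    """Sign with ties broken towards -1 (an assumption of this sketch)."""
    return 1 if value > 0 else -1


def train_voted_perceptron(
    xs: Sequence[Vector],
    ys: Sequence[int],
    kernel: Kernel = linear_kernel,
    epochs: int = 1,
) -> Tuple[List[Tuple[Vector, int]], List[int]]:
    """Run perceptron updates and record how long each vector survives.

    Returns the mistaken examples (x, y) in order, plus counts where
    counts[k] is the number of examples classified correctly by the
    vector built from the first k mistakes.
    """
    mistakes: List[Tuple[Vector, int]] = []
    counts: List[int] = [0]  # counts[0] belongs to the initial zero vector
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            # The current vector is implicit: sum of y_i * phi(x_i) over mistakes.
            score = sum(yi * kernel(xi, x) for xi, yi in mistakes)
            if y * score <= 0:
                mistakes.append((x, y))  # perceptron update in kernel form
                counts.append(1)
            else:
                counts[-1] += 1
    return mistakes, counts


def predict_voted(
    mistakes: List[Tuple[Vector, int]],
    counts: List[int],
    x: Vector,
    kernel: Kernel = linear_kernel,
) -> int:
    """Weighted vote of all intermediate vectors, each weighted by its count."""
    vote = 0
    score = 0.0
    for (xi, yi), c in zip(mistakes, counts[1:]):
        score += yi * kernel(xi, x)  # score of the next intermediate vector
        vote += c * sign(score)
    return sign(vote)


if __name__ == "__main__":
    xs = [(2.0, 1.0), (1.0, 2.0), (-1.0, -2.0), (-2.0, -1.0)]
    ys = [1, 1, -1, -1]
    mistakes, counts = train_voted_perceptron(xs, ys, epochs=3)
    print(predict_voted(mistakes, counts, (3.0, 3.0)))
```

Because the vectors are never formed explicitly, only kernel evaluations against the stored mistakes are needed, which is what makes a high-dimensional feature space affordable in this sketch.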

[1]  F. Rosenblatt, et al.  The perceptron: a probabilistic model for information storage and organization in the brain, 1958, Psychological Review.

[2]  A. A. Mullin, et al.  Principles of neurodynamics, 1962.

[3]  H. D. Block.  The perceptron: a model for brain functioning. I, 1962.

[4]  Albert B. Novikoff, et al.  On convergence proofs for perceptrons, 1963.

[5]  M. Aizerman, et al.  Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning, 1964.

[6]  Shun-ichi Amari, et al.  A Theory of Pattern Recognition, 1968.

[7]  Marvin Minsky, et al.  Perceptrons: An Introduction to Computational Geometry, 1969.

[8]  Nick Littlestone, et al.  From on-line to batch learning, 1989, COLT '89.

[9]  Bernhard E. Boser, et al.  A training algorithm for optimal margin classifiers, 1992, COLT '92.

[10]  David Haussler, et al.  How to use expert advice, 1993, STOC.

[11]  Manfred K. Warmuth, et al.  Additive versus exponentiated gradient updates for linear prediction, 1995, STOC '95.

[12]  Hans Ulrich Simon, et al.  From noise-free to noise-tolerant and from on-line to batch learning, 1995, COLT '95.

[13]  Manfred K. Warmuth, et al.  On Weak Learning, 1995, J. Comput. Syst. Sci.

[14]  Harris Drucker, et al.  Comparison of learning algorithms for handwritten digit recognition, 1995.

[15]  Vladimir Vapnik, et al.  Statistical learning theory, 1998.

[16]  N. Cristianini, et al.  Robust Bounds on Generalization from the Margin Distribution, 1998.

[17]  Nello Cristianini, et al.  The Kernel-Adatron: A fast and simple learning procedure for support vector machines, 1998, ICML 1998.