A training algorithm for optimal margin classifiers

A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented. The technique is applicable to a wide variety of the classification functions, including Perceptrons, polynomials, and Radial Basis Functions. The effective number of parameters is adjusted automatically to match the complexity of the problem. The solution is expressed as a linear combination of supporting patterns. These are the subset of training patterns that are closest to the decision boundary. Bounds on the generalization performance based on the leave-one-out method and the VC-dimension are given. Experimental results on optical character recognition problems demonstrate the good generalization obtained when compared with other learning algorithms.

[1]  Dr. M. G. Worster Methods of Mathematical Physics , 1947, Nature.

[2]  R. Courant,et al.  Methods of Mathematical Physics , 1962 .

[3]  A. A. Mullin,et al.  Principles of neurodynamics , 1962 .

[4]  M. Aizerman,et al.  Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning , 1964 .

[5]  F. A. Lootsma,et al.  Numerical methods for non-linear optimization , 1974 .

[6]  S. Vajda,et al.  Numerical Methods for Non-Linear Optimization , 1973 .

[7]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[8]  John Darzentas,et al.  Problem Complexity and Method Efficiency in Optimization , 1983 .

[9]  David G. Luenberger,et al.  Linear and nonlinear programming , 1984 .

[10]  W. Krauth,et al.  Learning algorithms with optimal stability in neural networks , 1987 .

[11]  David S. Broomhead,et al.  Multivariable Functional Interpolation and Adaptive Networks , 1988, Complex Syst..

[12]  David Haussler,et al.  Predicting {0,1}-functions on randomly drawn points , 1988, COLT '88.

[13]  John Moody,et al.  Fast Learning in Networks of Locally-Tuned Processing Units , 1989, Neural Computation.

[14]  David Haussler,et al.  What Size Net Gives Valid Generalization? , 1989, Neural Computation.

[15]  Y. Le Cun,et al.  Comparing different neural network architectures for classifying handwritten digits , 1989, International 1989 Joint Conference on Neural Networks.

[16]  Lawrence D. Jackel,et al.  Handwritten Digit Recognition with a Back-Propagation Network , 1989, NIPS.

[17]  Naftali Tishby,et al.  Consistent inference of probabilities in layered networks: predictions and generalizations , 1989, International 1989 Joint Conference on Neural Networks.

[18]  T Poggio,et al.  Regularization Algorithms for Learning That Are Equivalent to Multilayer Networks , 1990, Science.

[19]  Stephen M. Omohundro,et al.  Bumptrees for Efficient Function, Constraint and Classification Learning , 1990, NIPS.

[20]  Isabelle Guyon,et al.  Structural Risk Minimization for Character Recognition , 1991, NIPS.

[21]  D. Mackay,et al.  A Practical Bayesian Framework for Backprop Networks , 1991 .

[22]  Yann LeCun,et al.  Tangent Prop - A Formalism for Specifying Selected Invariances in an Adaptive Network , 1991, NIPS.

[23]  Isabelle Guyon,et al.  Computer aided cleaning of large databases for character recognition , 1992, Proceedings., 11th IAPR International Conference on Pattern Recognition. Vol.II. Conference B: Pattern Recognition Methodology and Systems.

[24]  Elie Bienenstock,et al.  Neural Networks and the Bias/Variance Dilemma , 1992, Neural Computation.