The Kernel-Adatron Algorithm: A Fast and Simple Learning Procedure for Support Vector Machines

Support Vector Machines work by mapping training data for classification tasks into a high-dimensional feature space. In the feature space they then find a maximal margin hyperplane which separates the data. This hyperplane is usually found using a quadratic programming routine, which is computationally intensive and nontrivial to implement. In this paper we propose an adaptation of the Adatron algorithm for classification with kernels in high-dimensional spaces. The algorithm is simple and can find a solution very rapidly, with an exponentially fast rate of convergence (in the number of iterations) towards the optimal solution. Experimental results with real and artificial datasets are provided.
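The abstract does not spell out the update rule, so the following is only a minimal sketch of how a kernel version of the Adatron is commonly described: each multiplier alpha_i is nudged towards a functional margin of 1 and clipped at zero, which avoids a quadratic programming solver altogether. The function names, the Gaussian kernel, the learning rate `eta`, the fixed epoch count, and the way the bias is chosen are all assumptions made for illustration, not details taken from the paper.

```python
import math


def rbf_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel; the choice of kernel and sigma is an assumption."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_adatron(X, y, kernel=rbf_kernel, eta=0.5, epochs=500):
    """Sketch of a kernel Adatron-style update (hard margin, labels in {-1, +1}).

    eta and epochs are illustrative defaults, not values from the paper.
    Returns the multipliers alpha and a bias b.
    """
    n = len(X)
    # Precompute the kernel (Gram) matrix once.
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n

    for _ in range(epochs):
        for i in range(n):
            # Current output for point i, without a bias term.
            z_i = sum(alpha[j] * y[j] * K[i][j] for j in range(n))
            # Move alpha_i towards margin 1 and keep it non-negative.
            alpha[i] = max(0.0, alpha[i] + eta * (1.0 - y[i] * z_i))

    # Place the bias midway between the closest points of each class.
    z = [sum(alpha[j] * y[j] * K[i][j] for j in range(n)) for i in range(n)]
    z_pos = min(z[i] for i in range(n) if y[i] == 1)
    z_neg = max(z[i] for i in range(n) if y[i] == -1)
    b = -0.5 * (z_pos + z_neg)
    return alpha, b


def predict(X, y, alpha, b, x, kernel=rbf_kernel):
    """Classify a new point x using the learned multipliers."""
    f = sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X)) + b
    return 1 if f >= 0 else -1


if __name__ == "__main__":
    # XOR-like toy data: not linearly separable in input space.
    X = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
    y = [1, 1, -1, -1]
    alpha, b = kernel_adatron(X, y)
    print([predict(X, y, alpha, b, x) for x in X])
```

The per-point update touches one multiplier at a time and needs only kernel evaluations, which is what makes this style of procedure simple to implement compared with a general quadratic programming routine. A practical version would replace the fixed epoch count with a stopping test on the margin.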
