A Study on Sigmoid Kernels for SVM and the Training of non-PSD Kernels by SMO-type Methods

The sigmoid kernel was quite popular for support vector machines due to its origin in neural networks. Although it is known that its kernel matrix may not be positive semi-definite (PSD), its other properties have not been fully studied. In this paper, we discuss such non-PSD kernels from the viewpoint of separability. The results help to validate the possible use of non-PSD kernels. One example shows that the sigmoid kernel matrix is conditionally positive definite (CPD) for certain parameters, so the sigmoid kernel is a valid kernel there. However, we also explain that the sigmoid kernel is not better than the RBF kernel in general. Experiments are given to illustrate our analysis. Finally, we discuss how to solve the non-convex dual problems by SMO-type decomposition methods. Suitable modifications for any symmetric non-PSD kernel matrix are proposed, with convergence proofs.
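
To make the PSD and CPD notions concrete, the following is a minimal sketch, assuming the usual sigmoid kernel form tanh(a x^T y + r). The two data points and the parameter values a = 1, r = -1 are illustrative assumptions, not values taken from the paper, and the helper names are hypothetical. The sketch exhibits a vector v with v^T K v < 0, so this K is not PSD, and then evaluates the quadratic form on a vector whose entries sum to zero, which is the only direction (up to scaling) that the CPD condition constrains for a 2 x 2 matrix.

```python
import math


def sigmoid_kernel(x, y, a, r):
    """Sigmoid kernel tanh(a * <x, y> + r) for two equal-length vectors."""
    inner = sum(xi * yi for xi, yi in zip(x, y))
    return math.tanh(a * inner + r)


def kernel_matrix(points, a, r):
    """Full (symmetric) kernel matrix as a list of rows."""
    return [[sigmoid_kernel(p, q, a, r) for q in points] for p in points]


def quadratic_form(K, v):
    """Compute v^T K v."""
    n = len(v)
    return sum(v[i] * K[i][j] * v[j] for i in range(n) for j in range(n))


# Illustrative assumptions: two 1-D points and parameters a = 1, r = -1.
points = [[0.5], [1.0]]
K = kernel_matrix(points, a=1.0, r=-1.0)

# PSD requires v^T K v >= 0 for every v; v = e_1 picks out K[0][0].
psd_witness = quadratic_form(K, [1.0, 0.0])

# CPD only requires v^T K v >= 0 for v with sum(v) == 0.
cpd_value = quadratic_form(K, [1.0, -1.0])

print("v = (1, 0):  v^T K v =", psd_witness)
print("v = (1, -1): v^T K v =", cpd_value)
```

With these assumed values the first quadratic form is negative and the second is positive, so this particular matrix is not PSD yet still satisfies the CPD condition. This is a single numerical instance and says nothing about which parameter ranges guarantee the property in general.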
