Feature space interpretation of SVMs with indefinite kernels

Kernel methods are becoming increasingly popular for various machine learning tasks, the most prominent being the support vector machine (SVM) for classification. The SVM is well understood when used with conditionally positive definite (cpd) kernel functions. In practice, however, non-cpd kernels arise and are plugged into the SVM anyway. This often yields good empirical classification results, but the resulting classifiers are hard to interpret because the geometric and theoretical understanding is missing. In this paper, we take a step toward understanding SVM classifiers in these situations. We give a geometric interpretation of the SVM with indefinite kernel functions: such SVMs are optimal hyperplane classifiers not by margin maximization, but by minimization of distances between convex hulls in pseudo-Euclidean spaces. This yields a sound framework and motivation for indefinite SVMs. The interpretation is the basis for further theoretical analysis, e.g., of uniqueness, and for the derivation of practical guidelines, such as characterizing when indefinite SVMs are suitable.
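
To make the "plugging in" step concrete, the sketch below (a minimal illustration, not code from the paper) trains a standard SVM on a precomputed sigmoid (tanh) kernel, a classic kernel that is indefinite for many parameter choices, and inspects the Gram matrix's eigenvalues to confirm indefiniteness. The toy data, the parameter choices (gamma, coef0), and the use of scikit-learn's SVC with a precomputed Gram matrix are illustrative assumptions.

```python
# Minimal sketch (illustrative, not from the paper): training an SVM with an
# indefinite kernel by passing a precomputed Gram matrix to scikit-learn's SVC.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy two-class data (hypothetical).
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(1.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

def sigmoid_kernel(A, B, gamma=1.0, coef0=-1.0):
    """tanh kernel; known to be non-cpd for many (gamma, coef0) settings."""
    return np.tanh(gamma * A @ B.T + coef0)

K = sigmoid_kernel(X, X)

# If the Gram matrix has both positive and negative eigenvalues, the kernel is
# indefinite and implicitly embeds the data in a pseudo-Euclidean space.
eigs = np.linalg.eigvalsh(K)
print(f"min eigenvalue: {eigs.min():.3f}, max eigenvalue: {eigs.max():.3f}")

# "Plugging in" the indefinite kernel: SVC accepts any precomputed Gram matrix,
# although the underlying QP is then no longer guaranteed to be convex.
clf = SVC(kernel="precomputed").fit(K, y)
print("training accuracy:", clf.score(K, y))
```

For this parameterization the eigenvalue check typically reports negative eigenvalues, so the usual reproducing-kernel-Hilbert-space margin interpretation does not apply; the convex-hull view in pseudo-Euclidean space developed in the paper is what accounts for why training can nevertheless produce a sensible hyperplane classifier.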
