Generalized Support Vector Machines

By setting apart the two functions of a support vector machine: separation of points by a nonlinear surface in the original space of patterns, and maximizing the distance between separating planes in a higher dimensional space, we are able to define indefinite, possibly discontinuous, kernels, not necessarily inner product ones, that generate highly nonlinear separating surfaces. Maximizing the distance between the separating planes in the higher dimensional space is surrogated by support vector suppression, which is achieved by minimizing any desired norm of support vector multipliers. The norm may be one induced by the separation kernel if it happens to be positive definite, or a Euclidean or a polyhedral norm. The latter norm leads to a linear program whereas the former norms lead to convex quadratic programs, all with an arbitrary separation kernel. A standard support vector machine can be recovered by using the same kernel for separation and support vector suppression. On a simple test example, all models perform equally well when a positive definite kernel is used. When a negative definite kernel is used, we are unable to solve the nonconvex quadratic program associated with a conventional support vector machine, while all other proposed models remain convex and easily generate a surface that separates all given points.
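To make the abstract's construction concrete, the following is a minimal sketch of the linear-programming variant it describes: support vector suppression by minimizing a polyhedral (1-)norm of the multipliers, with an arbitrary separation kernel. The function names, the Gaussian kernel, and the slack weight `nu` are illustrative assumptions, not the paper's notation or code; the point of the sketch is that the program stays linear for any kernel matrix, positive definite or not.

```python
import numpy as np
from scipy.optimize import linprog

def gaussian_kernel(A, B, mu=1.0):
    # K(A, B')_ij = exp(-mu * ||A_i - B_j||^2); any kernel matrix could be
    # substituted here, since the program below is linear regardless.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-mu * sq)

def gsvm_1norm(A, d, kernel, nu=1.0):
    """1-norm GSVM sketch: minimize ||u||_1 + nu * e'y subject to
    D(K(A,A') D u - e*gamma) + y >= e, y >= 0, posed as a linear program."""
    m = A.shape[0]
    K = kernel(A, A)
    D = np.diag(d)
    DKD = D @ K @ D
    # variable order: u (m), s (m), gamma (1), y (m); s bounds |u|
    c = np.concatenate([np.zeros(m), np.ones(m), [0.0], nu * np.ones(m)])
    # separation constraint rewritten as: -D K D u + gamma*d - y <= -e
    A1 = np.hstack([-DKD, np.zeros((m, m)), d.reshape(-1, 1), -np.eye(m)])
    # |u| <= s, split into u - s <= 0 and -u - s <= 0
    A2 = np.hstack([np.eye(m), -np.eye(m), np.zeros((m, 1)), np.zeros((m, m))])
    A3 = np.hstack([-np.eye(m), -np.eye(m), np.zeros((m, 1)), np.zeros((m, m))])
    A_ub = np.vstack([A1, A2, A3])
    b_ub = np.concatenate([-np.ones(m), np.zeros(m), np.zeros(m)])
    bounds = ([(None, None)] * m + [(0, None)] * m + [(None, None)]
              + [(0, None)] * m)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    u, gamma = res.x[:m], res.x[2 * m]
    # separating surface: K(x', A') D u = gamma
    return lambda X: np.sign(kernel(X, A) @ (d * u) - gamma)

# Illustrative use on hypothetical XOR-style data:
A = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
d = np.array([1., 1., -1., -1.])
f = gsvm_1norm(A, d, gaussian_kernel, nu=10.0)
print(f(A))  # should reproduce the labels when the data are separable
```

Swapping the 1-norm objective for a quadratic one (e.g., the Euclidean norm of u, or u'DKDu when K is positive definite) yields the convex quadratic programs mentioned above; only the objective changes, the constraints and the arbitrary kernel stay the same.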
