A support vector machine with a hybrid kernel and minimal Vapnik-Chervonenkis dimension

We present a method for training support vector machines (SVMs) with a hybrid kernel and minimal Vapnik-Chervonenkis (VC) dimension. After characterizing the VC dimension of sets of separating hyperplanes in the high-dimensional feature space induced by a kernel-related mapping from the input space, we propose an optimization criterion that designs SVMs by minimizing an upper bound on the VC dimension. This method realizes structural risk minimization and employs a flexible kernel function so that superior generalization on test data can be obtained. To obtain such a flexible kernel, we develop a hybrid kernel function, together with a sufficient condition for it to be an admissible Mercer kernel, built from common Mercer kernels (polynomial, radial basis function, two-layer neural network, etc.). The nonnegative combination coefficients and parameters of the hybrid kernel are determined by minimizing the upper bound on the VC dimension of the learning machine. The use of the hybrid kernel yields better performance than any single common kernel. Experimental results illustrate the proposed method and show that an SVM with the hybrid kernel outperforms one with a single common kernel in terms of generalization power.
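As a minimal sketch of the kernel construction only (not the authors' VC-bound optimization procedure), the example below forms a hybrid kernel as a nonnegative combination of an RBF kernel and a polynomial kernel and supplies it to an off-the-shelf SVM as a callable Gram-matrix function. The mixing weight lam and the kernel parameters are fixed by hand here for illustration, whereas the paper selects them by minimizing the upper bound on the VC dimension; the use of scikit-learn and all names (hybrid_kernel, lam) are assumptions of this sketch, not part of the original work.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel

def hybrid_kernel(X, Y, lam=0.6, gamma=0.5, degree=3, coef0=1.0):
    """Nonnegative combination of an RBF and a polynomial Mercer kernel.

    K(x, y) = lam * K_rbf(x, y) + (1 - lam) * K_poly(x, y), with lam in [0, 1].
    The combination remains an admissible Mercer kernel because positive
    semidefinite kernels are closed under nonnegative linear combination.
    Here lam, gamma, degree, and coef0 are fixed; in the paper they would be
    chosen by minimizing the upper bound on the VC dimension.
    """
    return (lam * rbf_kernel(X, Y, gamma=gamma)
            + (1.0 - lam) * polynomial_kernel(X, Y, degree=degree, coef0=coef0))

# Toy usage: fit an SVM with the hybrid kernel on synthetic 2-D data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] > 0.5).astype(int)

clf = SVC(kernel=hybrid_kernel)  # SVC accepts a callable that returns the Gram matrix
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```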
