Weighted Mahalanobis Distance Kernels for Support Vector Machines

The support vector machine (SVM) has proven to be a very effective classifier in many applications, but its performance is still limited because information about the data distribution is underutilized in determining the decision hyperplane. Most existing kernels employed in nonlinear SVMs measure the similarity between a pair of pattern images based on the Euclidean inner product or the Euclidean distance of the corresponding input patterns, which ignores the distributional tendency of the data and makes the SVM essentially a “local” classifier. In this paper, we take a step toward a new paradigm of kernels by incorporating data-specific knowledge into existing kernels. We first find the data structure of each class adaptively in the input space via agglomerative hierarchical clustering (AHC), and then construct weighted Mahalanobis distance (WMD) kernels using the detected distribution information. In a WMD kernel, the similarity between two pattern images is determined not only by the Mahalanobis distance (MD) between their corresponding input patterns but also by the sizes of the clusters in which they reside. Although WMD kernels are not guaranteed to be positive definite (pd) or conditionally positive definite (cpd), satisfactory classification results can still be achieved because the regularizers in SVMs with WMD kernels are empirically positive in pseudo-Euclidean (pE) spaces. Experimental results on both synthetic and real-world data sets show the effectiveness of “plugging” data structure into existing kernels.
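To make the construction concrete, below is a minimal sketch of one plausible form of such a kernel. The abstract does not specify the exact weighting scheme or kernel form, so the details here are assumptions: Ward-linkage AHC recovers the clusters, each cluster supplies its own Mahalanobis metric from its sample covariance, and the kernel value between two points is scaled by the relative sizes of their clusters. All names and parameters (cluster_stats, wmd_kernel, gamma) are illustrative, not the paper's.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def cluster_stats(X, n_clusters):
    """Ward-linkage AHC, then per-cluster (mean, inverse covariance, relative size)."""
    labels = fcluster(linkage(X, method="ward"), n_clusters, criterion="maxclust")
    stats = {}
    for c in np.unique(labels):
        Xc = X[labels == c]
        # Guard singleton clusters, then regularize for invertibility.
        cov = np.cov(Xc, rowvar=False) if len(Xc) > 1 else np.eye(X.shape[1])
        cov = cov + 1e-6 * np.eye(X.shape[1])
        stats[c] = (Xc.mean(axis=0), np.linalg.inv(cov), len(Xc) / len(X))
    return labels, stats

def wmd_kernel(x, cx, y, cy, stats, gamma=1.0):
    """RBF-style kernel on a Mahalanobis distance, weighted by cluster sizes.

    x, y     : input patterns; cx, cy : their cluster labels.
    The two clusters' metrics are averaged so the kernel stays symmetric.
    """
    _, inv_cov_x, wx = stats[cx]
    _, inv_cov_y, wy = stats[cy]
    d = x - y
    md2 = 0.5 * (d @ inv_cov_x @ d + d @ inv_cov_y @ d)  # squared MD
    return np.sqrt(wx * wy) * np.exp(-gamma * md2)       # size-weighted similarity
```

A Gram matrix assembled from wmd_kernel can be fed to an SVM as a precomputed kernel (e.g., sklearn.svm.SVC(kernel="precomputed")), subject to the caveat from the abstract that positive definiteness is not guaranteed.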
