The responsibility weighted Mahalanobis kernel for semi-supervised training of support vector machines for classification

The responsibility weighted Mahalanobis (RWM) kernel considers structure information in data with the help of a parametric density model. It is well suited for semi-supervised learning because the parameters of the density model can be estimated in an unsupervised way. For semi-supervised learning, the RWM kernel outperforms several other kernel functions, including the Laplacian kernel (Laplacian SVM). An SVM with an RWM kernel can be parameterized as easily as an SVM with a standard RBF kernel, since known heuristics for the RBF kernel can be transferred to the new kernel. Standard training techniques such as SMO and standard SVM implementations such as LIBSVM can be used with the RWM kernel without any algorithmic adjustments or extensions. Results are shown for 20 publicly available benchmark data sets.

Kernel functions in support vector machines (SVM) are needed to assess the similarity of input samples, for instance in order to classify these samples. Besides standard kernels such as Gaussian (i.e., radial basis function, RBF) or polynomial kernels, there are also specific kernels tailored to consider structure in the data for similarity assessment. In this paper, we capture structure in data by means of probabilistic mixture density models, for example Gaussian mixtures in the case of real-valued input spaces. From the distance measures that are inherently contained in these models, e.g., Mahalanobis distances in the case of Gaussian mixtures, we derive a new kernel, the responsibility weighted Mahalanobis (RWM) kernel. Basically, this kernel emphasizes the influence of the model components from which any two compared samples are assumed to originate (that is, the "responsible" model components). We will see that this kernel outperforms the RBF kernel and other kernels capturing structure in data (such as the LAP kernel in Laplacian SVM) in many applications where only partially labeled data are available, i.e., for semi-supervised training of SVM. Other key advantages are that the RWM kernel can easily be used with standard SVM implementations and training algorithms such as sequential minimal optimization, and that heuristics known for the parametrization of RBF kernels in a C-SVM can easily be transferred to this new kernel. Properties of the RWM kernel are demonstrated with 20 benchmark data sets and an increasing percentage of labeled samples in the training data.
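To make the idea concrete, the following minimal sketch illustrates how a kernel of this kind could be computed and plugged into an unmodified SVM implementation. It assumes (as an illustration, not as the paper's exact definition) that the RWM distance between two samples is a sum of per-component Mahalanobis distances, each weighted by the mean of the two samples' responsibilities for that mixture component, and that this distance replaces the squared Euclidean distance of an RBF kernel. The Gaussian mixture is fitted in an unsupervised way on all samples, labeled and unlabeled; the toy data, the number of mixture components, gamma, C, and all variable names are illustrative. scikit-learn's SVC (a LIBSVM wrapper) is used with a precomputed kernel matrix.

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

def rwm_kernel(XA, XB, gmm, gamma=1.0):
    # Kernel matrix between the rows of XA and XB. For each mixture
    # component j, the Mahalanobis distance (with that component's
    # covariance) is weighted by the mean responsibility of the two
    # compared samples for component j; the weighted sum replaces the
    # squared Euclidean distance of an RBF kernel.
    resp_a = gmm.predict_proba(XA)                 # responsibilities, shape (nA, J)
    resp_b = gmm.predict_proba(XB)                 # shape (nB, J)
    inv_covs = np.linalg.inv(gmm.covariances_)     # (J, d, d) for 'full' covariances
    diff = XA[:, None, :] - XB[None, :, :]         # pairwise differences, (nA, nB, d)
    d2 = np.zeros((len(XA), len(XB)))
    for j in range(gmm.n_components):
        maha = np.einsum('abi,ij,abj->ab', diff, inv_covs[j], diff)
        weight = 0.5 * (resp_a[:, j][:, None] + resp_b[:, j][None, :])
        d2 += weight * maha
    return np.exp(-gamma * d2)

# Toy data standing in for one of the benchmark sets: two Gaussian blobs,
# of which only 10% of the samples carry a class label.
rng = np.random.default_rng(0)
X_all = np.vstack([rng.normal(-2.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y_all = np.repeat([0, 1], 100)
labeled = rng.choice(len(X_all), size=20, replace=False)

# Unsupervised step: fit the density model on ALL samples, labeled and unlabeled.
gmm = GaussianMixture(n_components=2, covariance_type='full', random_state=0).fit(X_all)

# Supervised step: train an unmodified C-SVM on the precomputed RWM kernel
# matrix of the labeled samples only.
X_lab, y_lab = X_all[labeled], y_all[labeled]
svm = SVC(C=1.0, kernel='precomputed').fit(rwm_kernel(X_lab, X_lab, gmm), y_lab)
y_pred = svm.predict(rwm_kernel(X_all, X_lab, gmm))
print('accuracy on all samples:', np.mean(y_pred == y_all))

Because the kernel has the same exponential form and the same two free parameters (gamma and C) as a standard RBF kernel, the usual grid-search or heuristic parametrization strategies for RBF-based C-SVM can be reused without change, which is the practical point the abstract makes about parameterization.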
