Distance metric learning-based kernel gram matrix learning for pattern analysis tasks in kernel feature space

Approaches to distance metric learning (DML) for Mahalanobis distance metric involve estimating a parametric matrix that is associated with a linear transformation. For complex pattern analysis tasks, it is necessary to consider the approaches to DML that involve estimating a parametric matrix that is associated with a nonlinear transformation. One such approach involves performing the DML of Mahalanobis distance in the feature space of a Mercer kernel. In this approach, the problem of estimation of a parametric matrix of Mahalanobis distance is formulated as a problem of learning an optimal kernel gram matrix from the kernel gram matrix of a base kernel by minimizing the logdet divergence between the kernel gram matrices. We propose to use the optimal kernel gram matrices learnt from the kernel gram matrix of the base kernels in pattern analysis tasks such as clustering, multi-class pattern classification and nonlinear principal component analysis. We consider the commonly used kernels such as linear kernel, polynomial kernel, radial basis function kernel and exponential kernel as well as hyper-ellipsoidal kernels as the base kernels for optimal kernel learning. We study the performance of the DML-based class-specific kernels for multi-class pattern classification using support vector machines. Results of our experimental studies on benchmark datasets demonstrate the effectiveness of the DML-based kernels for different pattern analysis tasks.

[1]  Peter E. Hart,et al.  Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.

[2]  Kilian Q. Weinberger,et al.  Fast solvers and efficient implementations for distance metric learning , 2008, ICML '08.

[3]  Glenn Fung,et al.  Proximal support vector machine classifiers , 2001, KDD '01.

[4]  Chellu Chandra Sekhar,et al.  Class-specific GMM based intermediate matching kernel for classification of varying length patterns of long duration speech using support vector machines , 2014, Speech Commun..

[5]  A. Atiya,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2005, IEEE Transactions on Neural Networks.

[6]  Bernhard E. Boser,et al.  A training algorithm for optimal margin classifiers , 1992, COLT '92.

[7]  Napoleon H. Reyes,et al.  A New 2D Static Hand Gesture Colour Image Dataset for ASL Gestures , 2011 .

[8]  Rong Jin,et al.  Distance Metric Learning: A Comprehensive Survey , 2006 .

[9]  Inderjit S. Dhillon,et al.  Metric and Kernel Learning Using a Linear Transformation , 2009, J. Mach. Learn. Res..

[10]  Chellu Chandra Sekhar,et al.  Class-Specific Mahalanobis Distance Metric Learning for Biological Image Classification , 2012, ICIAR.

[11]  Bernhard Schölkopf,et al.  Kernel Principal Component Analysis , 1997, ICANN.

[12]  Kilian Q. Weinberger,et al.  Distance Metric Learning for Kernel Machines , 2012, ArXiv.

[13]  Stephen Tyree,et al.  Non-linear Metric Learning , 2012, NIPS.

[14]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[15]  Nasser M. Nasrabadi,et al.  Pattern Recognition and Machine Learning , 2006, Technometrics.

[16]  Kilian Q. Weinberger,et al.  Convex Optimizations for Distance Metric Learning and Pattern Classification [Applications Corner] , 2010, IEEE Signal Processing Magazine.

[17]  A. Asuncion,et al.  UCI Machine Learning Repository, University of California, Irvine, School of Information and Computer Sciences , 2007 .

[18]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[19]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[20]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[21]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[22]  Inderjit S. Dhillon,et al.  Information-theoretic metric learning , 2006, ICML '07.

[23]  Inderjit S. Dhillon,et al.  Matrix Nearness Problems with Bregman Divergences , 2007, SIAM J. Matrix Anal. Appl..

[24]  Michael I. Jordan,et al.  Distance Metric Learning with Application to Clustering with Side-Information , 2002, NIPS.

[25]  Inderjit S. Dhillon,et al.  Low-Rank Kernel Learning with Bregman Matrix Divergences , 2009, J. Mach. Learn. Res..

[26]  P. Mahalanobis On the generalized distance in statistics , 1936 .

[27]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[28]  Xun Liang,et al.  Hyperellipsoidal Statistical Classifications in a Reproducing Kernel Hilbert Space , 2011, IEEE Transactions on Neural Networks.

[29]  L. Bregman The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming , 1967 .

[30]  Geoffrey E. Hinton,et al.  Neighbourhood Components Analysis , 2004, NIPS.

[31]  Nello Cristianini,et al.  Learning the Kernel Matrix with Semidefinite Programming , 2002, J. Mach. Learn. Res..

[32]  Pannagadatta K. Shivaswamy Ellipsoidal Kernel Machines , 2007 .