Learning Hierarchical Feature Space Using CLAss-specific Subspace Multiple Kernel - Metric Learning for Classification

Metric learning for classification has been intensively studied over the last decade. The idea is to learn a metric space, induced from a normed vector space, in which data from different classes are well separated. Different measures of this separation lead to different designs of the objective function in the metric learning model. One classical choice is the Mahalanobis distance, in which a linear transformation matrix is learned and applied to the original data to obtain a new subspace equipped with the Euclidean norm. Kernelized versions have also been developed, followed by multiple-kernel learning models. In this paper, we view metric learning as identifying the kernel function that yields the highest class separability in the corresponding metric space. The contribution is twofold: 1) no pairwise computations are required, unlike in most metric learning techniques; 2) better flexibility and lower computational complexity are achieved with the proposed CLAss-Specific (Multiple) Kernel - Metric Learning (CLAS(M)K-ML). The proposed techniques can serve as a preprocessing step for any kernel method or kernel approximation technique. We also propose an extension to a hierarchical learning structure that further improves classification performance: on each layer, the CLASMK is computed on a selected "marginal" subset, and feature vectors are constructed by concatenating the features from all previous layers.
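To make the kernel-selection view of metric learning concrete, the sketch below picks a kernel by a class-separability score. It assumes an RBF kernel family and uses the ratio of between-class to within-class scatter in the kernel-induced feature space, computed directly from the Gram matrix, as the score; the kernel family, the exact criterion, and the function names (rbf_kernel, class_separability, select_kernel) are illustrative assumptions, not the paper's precise formulation.

```python
import numpy as np

def rbf_kernel(X, Z, gamma):
    """Gram matrix K[i, j] = exp(-gamma * ||X[i] - Z[j]||^2)."""
    sq = (X ** 2).sum(1)[:, None] + (Z ** 2).sum(1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def class_separability(K, y):
    """Between-class / within-class scatter ratio in the kernel-induced
    feature space, evaluated entirely from the Gram matrix K."""
    n = len(y)
    total_mean_sq = K.sum() / n ** 2              # ||m||^2, squared norm of the global mean
    tr_sb, tr_sw = 0.0, 0.0
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        n_c = len(idx)
        Kcc = K[np.ix_(idx, idx)]
        m_c_sq = Kcc.sum() / n_c ** 2             # ||m_c||^2, squared norm of the class mean
        m_c_dot_m = K[idx, :].sum() / (n_c * n)   # inner product of class mean and global mean
        tr_sw += np.trace(Kcc) - Kcc.sum() / n_c  # within-class scatter for class c
        tr_sb += n_c * (m_c_sq - 2.0 * m_c_dot_m + total_mean_sq)
    return tr_sb / max(tr_sw, 1e-12)

def select_kernel(X, y, gammas):
    """Return the RBF bandwidth whose induced space best separates the classes."""
    scores = [class_separability(rbf_kernel(X, X, g), y) for g in gammas]
    return gammas[int(np.argmax(scores))], scores

# Toy usage: two Gaussian blobs, candidate bandwidths on a log grid.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(2, 1, (50, 5))])
y = np.array([0] * 50 + [1] * 50)
best_gamma, scores = select_kernel(X, y, gammas=np.logspace(-3, 1, 9))
print("selected gamma:", best_gamma)
```

In the hierarchical extension described above, each layer would repeat such a selection on a chosen "marginal" subset and concatenate the resulting kernel features with those of all previous layers; that stacking step and the subset-selection rule are not shown here.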
