Ideal regularization for learning kernels from labels

In this paper, we propose a new form of regularization for learning kernels that exploits the label information of a data set. The proposed regularization, referred to as ideal regularization, is a linear function of the kernel matrix to be learned, which allows us to develop efficient algorithms for exploiting labels. We consider three applications of the ideal regularization. First, we use it to incorporate the labels into a standard kernel, making the resulting kernel more appropriate for learning tasks. Second, we employ it to learn a data-dependent kernel matrix from an initial kernel matrix that encodes prior similarity information, geometric structure, and labels of the data. Finally, we incorporate the ideal regularization into several state-of-the-art kernel learning problems; with this regularization, these problems can be reformulated as simpler ones that admit more efficient solvers. Empirical results show that the ideal regularization exploits labels both effectively and efficiently.
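As a concrete illustration (a sketch, not the paper's own algorithm): in the kernel-target alignment literature, the "ideal" kernel built from labels is the matrix T with T_ij = 1 when x_i and x_j share a label and 0 otherwise, and a regularizer that is linear in the learned kernel K can be written as the trace inner product tr(KT). The NumPy sketch below shows this construction under those assumptions; the function names, the trade-off parameter mu, and the additive update K + mu*T are illustrative choices, not taken from the paper.

```python
import numpy as np

def ideal_kernel(labels):
    """Ideal kernel: T[i, j] = 1 if labels agree, 0 otherwise."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def ideal_regularizer(K, T):
    """Linear (in K) regularization term trace(K T): large when K
    assigns high similarity to same-label pairs."""
    return np.trace(K @ T)

# Toy usage: an RBF kernel on random data, nudged toward the labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
y = np.array([0, 0, 0, 1, 1, 1])

sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq_dists)      # standard RBF kernel, gamma = 1
T = ideal_kernel(y)

# One simple way to incorporate labels into a standard kernel:
# K_new = K + mu * T, with mu a hypothetical trade-off parameter.
mu = 0.5
K_new = K + mu * T
print("trace(K T)  =", ideal_regularizer(K, T))
print("trace(K' T) =", ideal_regularizer(K_new, T))
```

Because the term tr(KT) is linear in K, adding it to a kernel learning objective leaves the problem class unchanged (e.g., a semidefinite program stays a semidefinite program), which is consistent with the abstract's claim that the regularization yields simpler problems with more efficient solvers.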
