Kernel Learning by Unconstrained Optimization

We study the problem of learning a kernel matrix from an a priori kernel and training data. An unconstrained convex optimization formulation is proposed, with an arbitrary smooth convex loss function on the kernel entries and a LogDet divergence for regularization. Since the number of variables is of order O(n^2), standard Newton and quasi-Newton methods are too time-consuming. An operator-form Hessian is used to develop an O(n^3) trust-region inexact Newton method, in which the Newton direction is computed by a few conjugate gradient steps on the Hessian operator equation. On the USPST dataset, our algorithm can handle 2 million optimization variables within one hour. Experiments are shown for both linear (Mahalanobis) metric learning and kernel learning. The convergence rate, speed, and performance of several loss functions and algorithms are discussed.
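The inexact Newton idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the squared loss, the target matrix `T`, the 0/1 entry mask `M`, the weight `lam`, and all function names are assumptions made for illustration, and a simple backtracking line search stands in for the trust-region safeguard. The point it shows is that the Hessian is never formed explicitly; it is only applied to a matrix direction `D` as an operator, at the cost of a few n-by-n matrix products.

```python
import numpy as np

def objective(K, K0_inv, T, M, lam):
    # Assumed loss: squared error on the masked kernel entries.
    loss = 0.5 * np.sum(M * (K - T) ** 2)
    # LogDet divergence: tr(K K0^-1) - log det(K K0^-1) - n.
    n = K.shape[0]
    logdet = np.linalg.slogdet(K)[1] + np.linalg.slogdet(K0_inv)[1]
    return loss + lam * (np.trace(K @ K0_inv) - logdet - n)

def gradient(K, K_inv, K0_inv, T, M, lam):
    return M * (K - T) + lam * (K0_inv - K_inv)

def hessian_op(D, K_inv, M, lam):
    # Hessian applied to a direction D; costs O(n^3) via matrix products.
    return M * D + lam * (K_inv @ D @ K_inv)

def truncated_cg(G, K_inv, M, lam, max_steps=10, tol=1e-10):
    # A few conjugate gradient steps on the operator equation H[D] = -G.
    D = np.zeros_like(G)
    R = -G.copy()
    P = R.copy()
    rs = np.sum(R * R)
    for _ in range(max_steps):
        HP = hessian_op(P, K_inv, M, lam)
        alpha = rs / np.sum(P * HP)
        D += alpha * P
        R -= alpha * HP
        rs_new = np.sum(R * R)
        if np.sqrt(rs_new) < tol:
            break
        P = R + (rs_new / rs) * P
        rs = rs_new
    return D

def is_pd(K):
    try:
        np.linalg.cholesky(K)
        return True
    except np.linalg.LinAlgError:
        return False

def learn_kernel(K0, T, M, lam=1.0, iters=20):
    K0_inv = np.linalg.inv(K0)
    K = K0.copy()
    for _ in range(iters):
        K_inv = np.linalg.inv(K)
        G = gradient(K, K_inv, K0_inv, T, M, lam)
        if np.linalg.norm(G) < 1e-8:
            break
        D = truncated_cg(G, K_inv, M, lam)
        D = 0.5 * (D + D.T)  # remove rounding asymmetry
        f0 = objective(K, K0_inv, T, M, lam)
        slope = np.sum(G * D)
        step, accepted = 1.0, False
        # Backtrack until K stays positive definite and the objective decreases.
        while step > 1e-10:
            K_new = K + step * D
            if is_pd(K_new) and objective(K_new, K0_inv, T, M, lam) <= f0 + 1e-4 * step * slope:
                K, accepted = K_new, True
                break
            step *= 0.5
        if not accepted:
            break
    return K
```

A call such as `learn_kernel(np.eye(6), np.outer(y, y), np.ones((6, 6)))` with a label vector `y` of +1/-1 entries pulls the identity prior toward the label outer product while the LogDet term keeps the result positive definite.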
