Low-Rank Kernel Learning with Bregman Matrix Divergences

In this paper, we study low-rank matrix nearness problems, with a focus on learning low-rank positive semidefinite (kernel) matrices for machine learning applications. We propose efficient algorithms whose running time scales linearly in the number of data points and quadratically in the rank of the input matrix. By contrast, existing algorithms for learning kernel matrices often scale poorly, with running times that are cubic in the number of data points. We employ Bregman matrix divergences as the measures of nearness; these divergences are natural for learning low-rank kernels because they preserve rank as well as positive semidefiniteness. Special cases of our framework yield faster algorithms for various existing learning problems, and experimental results demonstrate that our algorithms can effectively learn both low-rank and full-rank kernel matrices.
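
To make the notion of a Bregman matrix divergence concrete, the sketch below evaluates two standard instances, the LogDet divergence and the von Neumann divergence, for full-rank symmetric positive definite matrices. The abstract does not name specific divergences, so the choice of these two is an assumption, and the function names `logdet_divergence` and `von_neumann_divergence` are hypothetical. This is a direct dense evaluation of the textbook formulas for illustration only; it is not the efficient low-rank algorithm proposed in the paper.

```python
import numpy as np


def logdet_divergence(X, Y):
    """LogDet divergence tr(X Y^-1) - log det(X Y^-1) - n.

    Assumes X and Y are symmetric positive definite (full rank).
    """
    n = X.shape[0]
    # Y^-1 X has the same trace and determinant as X Y^-1.
    M = np.linalg.solve(Y, X)
    _, logabsdet = np.linalg.slogdet(M)
    return np.trace(M) - logabsdet - n


def _logm_spd(A):
    """Matrix logarithm of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(A)
    # Scale the eigenvector columns by log-eigenvalues, then rotate back.
    return (V * np.log(w)) @ V.T


def von_neumann_divergence(X, Y):
    """von Neumann divergence tr(X log X - X log Y - X + Y).

    Assumes X and Y are symmetric positive definite (full rank).
    """
    return np.trace(X @ _logm_spd(X) - X @ _logm_spd(Y) - X + Y)


if __name__ == "__main__":
    X = np.diag([2.0, 1.0])
    Y = np.eye(2)
    print("LogDet divergence:     ", logdet_divergence(X, Y))
    print("von Neumann divergence:", von_neumann_divergence(X, Y))
```

Both functions return zero when the two arguments coincide and a positive value otherwise. In this dense form they require full-rank inputs, since the inverse and the logarithm are undefined for singular matrices; handling the low-rank case is the subject of the paper itself.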
