Mirror Descent for Metric Learning: A Unified Approach

Most metric learning methods are characterized by diverse loss functions and projection methods, which naturally begs the question: is there a wider framework that can generalize many of these methods? In addition, ever persistent issues are those of scalability to large data sets and the question of kernelizability. We propose a unified approach to Mahalanobis metric learning: an online regularized metric learning algorithm based on the ideas of composite objective mirror descent (comid). The metric learning problem is formulated as a regularized positive semi-definite matrix learning problem, whose update rules can be derived using the comid framework. This approach aims to be scalable, kernelizable, and admissible to many different types of Bregman and loss functions, which allows for the tailoring of several different classes of algorithms. The most novel contribution is the use of the trace norm, which yields a sparse metric in its eigenspectrum, thus simultaneously performing feature selection along with metric learning.

[1]  Tomer Hertz,et al.  Learning Distance Functions using Equivalence Relations , 2003, ICML.

[2]  Lei Wang,et al.  Positive Semidefinite Metric Learning with Boosting , 2009, NIPS.

[3]  Ambuj Tewari,et al.  Applications of strong convexity--strong smoothness duality to learning with matrices , 2009, ArXiv.

[4]  Trevor F. Cox,et al.  Metric multidimensional scaling , 2000 .

[5]  Boonserm Kijsirikul,et al.  On Kernelization of Supervised Mahalanobis Distance Learners , 2008, ArXiv.

[6]  L. Bregman The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming , 1967 .

[7]  S. Eisenstat,et al.  A Stable and Efficient Algorithm for the Rank-One Modification of the Symmetric Eigenproblem , 1994, SIAM J. Matrix Anal. Appl..

[8]  Emmanuel J. Candès,et al.  A Singular Value Thresholding Algorithm for Matrix Completion , 2008, SIAM J. Optim..

[9]  Inderjit S. Dhillon,et al.  Online Metric Learning and Fast Similarity Search , 2008, NIPS.

[10]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[11]  Andrzej Stachurski,et al.  Parallel Optimization: Theory, Algorithms and Applications , 2000, Parallel Distributed Comput. Pract..

[12]  Marc Teboulle,et al.  Mirror descent and nonlinear projected subgradient methods for convex optimization , 2003, Oper. Res. Lett..

[13]  Michael I. Jordan,et al.  Distance Metric Learning with Application to Clustering with Side-Information , 2002, NIPS.

[14]  Inderjit S. Dhillon,et al.  Low-Rank Kernel Learning with Bregman Matrix Divergences , 2009, J. Mach. Learn. Res..

[15]  Cordelia Schmid,et al.  Is that you? Metric learning approaches for face identification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[16]  Kilian Q. Weinberger,et al.  Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.

[17]  Peter E. Hart,et al.  Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.

[18]  Pablo A. Parrilo,et al.  Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization , 2007, SIAM Rev..

[19]  Yann LeCun,et al.  Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[20]  Adrian Lewis,et al.  The mathematics of eigenvalue optimization , 2003, Math. Program..

[21]  R. M. Redheffer,et al.  SOME APPLICATIONS OF MONOTONE OPERATORS IN MARKOV PROCESSES , 1965 .

[22]  Geoffrey E. Hinton,et al.  Neighbourhood Components Analysis , 2004, NIPS.

[23]  Inderjit S. Dhillon,et al.  Information-theoretic metric learning , 2006, ICML '07.

[24]  Charles R. Johnson,et al.  Matrix analysis , 1985, Statistical Inference for Engineers and Data Scientists.

[25]  Yoram Singer,et al.  Online and batch learning of pseudo-metrics , 2004, ICML.

[26]  Robert Tibshirani,et al.  Discriminant Adaptive Nearest Neighbor Classification , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[27]  A. Lewis The Convex Analysis of Unitarily Invariant Matrix Functions , 1995 .

[28]  S. Kakade,et al.  On the duality of strong convexity and strong smoothness : Learning applications and matrix regularization , 2009 .

[29]  G. Watson Characterization of the subdifferential of some matrix norms , 1992 .

[30]  Ambuj Tewari,et al.  Composite objective mirror descent , 2010, COLT 2010.

[31]  Thorsten Joachims,et al.  Learning a Distance Metric from Relative Comparisons , 2003, NIPS.

[32]  Claire Cardie,et al.  Proceedings of the Eighteenth International Conference on Machine Learning, 2001, p. 577–584. Constrained K-means Clustering with Background Knowledge , 2022 .

[33]  Anne Lohrli Chapman and Hall , 1985 .