MIRROR DESCENT FOR METRIC LEARNING

Introduction. The concepts of similarity, distance, and metric are central to many well-known and popular algorithms such as k-means clustering [13], the nearest neighbor algorithm [6], locally linear embedding [14], multi-dimensional scaling [7] and semi-supervised clustering [18]. While there are many approaches to metric learning, a large body of work is focused on learning the Mahalanobis distance, which amounts to learning a linear transformation and computing the distance in the transformed space. Among these approaches are the work of Xing et al. [20], relevant components analysis [17], the large-margin nearest neighbor (LMNN) algorithm [19], Globerson and Roweis' method of collapsing classes [10], information-theoretic metric learning (ITML) [8] and BoostMetric [16]. Aside from the batch approaches above, online algorithms such as the online ITML algorithm [8] and the pseudo-metric online learning algorithm (POLA) [15] have also proven successful. All these approaches are characterized by diverse loss functions and projection methods, which naturally raises the question: is there a wider framework that can generalize many of these existing methods? In addition, ever-persistent issues are those of scalability to large data sets and the question of kernelizability. Thus, we propose a unified approach to Mahalanobis metric learning: an online regularized metric learning algorithm based on the ideas of composite objective mirror descent (COMID) [9]. We formulate metric learning as a regularized positive semi-definite matrix learning problem whose update rules can be derived within the COMID framework. This approach aims to be scalable, kernelizable, and admissible to many different types of Bregman and loss functions, which allows several different classes of algorithms to be tailored from it. The most novel contribution is the use of the trace norm, which yields a metric that is sparse in its eigenspectrum and thus performs feature selection simultaneously with metric learning.

Unifying Framework. The goal is to incrementally learn a squared Mahalanobis metric $d_M(x_t, z_t)^2 = (x_t - z_t)^\top M (x_t - z_t)$, given training data of the form $(x_t, z_t, y_t)_{t=1}^T$, where labels $y_t = \pm 1$ indicate whether the pair $(x_t, z_t)$ is similar ($y_t = +1$) or dissimilar ($y_t = -1$).
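To make the flavor of such an update concrete, below is a minimal Python/NumPy sketch of one online step of this kind: a gradient step on a pairwise hinge loss, followed by the trace-norm proximal step, which soft-thresholds the eigenvalues of M and keeps it positive semi-definite. This is an illustration under simplifying assumptions, not the paper's exact derivation: it fixes the squared-Frobenius norm as the Bregman function (so the mirror step reduces to ordinary gradient descent), and the function name comid_metric_step, the hinge loss, the step size eta, the regularization weight rho, and the margin are all hypothetical choices made for the example.

```python
import numpy as np

def comid_metric_step(M, x, z, y, eta=0.1, rho=0.05, margin=2.0):
    """One illustrative online update for a Mahalanobis matrix M.

    Loss step: subgradient of a pairwise hinge loss, taken as a plain
    gradient step (squared-Frobenius Bregman function assumed).
    Regularization step: proximal operator of the trace norm, i.e.
    soft-thresholding of the eigenvalues, which sparsifies the
    eigenspectrum and restores positive semi-definiteness.
    """
    d = x - z
    dist2 = d @ M @ d                      # squared Mahalanobis distance
    # Hinge loss max(0, 1 - y*(margin - dist2)) is active when the pair
    # violates the margin: similar pairs (y = +1) want dist2 <= margin - 1,
    # dissimilar pairs (y = -1) want dist2 >= margin + 1.
    if y * (margin - dist2) < 1.0:
        M = M - eta * y * np.outer(d, d)   # gradient step on the hinge loss
    # Trace-norm prox: soft-threshold the eigenvalues of the symmetrized M.
    w, V = np.linalg.eigh(0.5 * (M + M.T))
    w = np.maximum(w - eta * rho, 0.0)
    return (V * w) @ V.T

# Toy usage on a synthetic stream of (x, z, y) triples.
rng = np.random.default_rng(0)
M = np.eye(5)
for _ in range(200):
    x, z = rng.normal(size=5), rng.normal(size=5)
    y = 1 if np.linalg.norm(x - z) < 3.0 else -1
    M = comid_metric_step(M, x, z, y)
print("learned eigenvalues:", np.round(np.linalg.eigvalsh(M), 3))
```

The eigenvalue soft-thresholding in the last step is what the trace-norm regularizer buys: directions whose eigenvalues shrink to zero are effectively dropped, so feature selection happens as a side effect of the metric update.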

[1] Amir Globerson, et al. Metric Learning by Collapsing Classes, 2005, NIPS.

[2] Lei Wang, et al. Positive Semidefinite Metric Learning with Boosting, 2009, NIPS.

[3] Robert Tibshirani, et al. Discriminant Adaptive Nearest Neighbor Classification, 1995, IEEE Trans. Pattern Anal. Mach. Intell.

[4] Yann LeCun, et al. Learning a similarity metric discriminatively, with application to face verification, 2005, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[5] Boonserm Kijsirikul, et al. On Kernelization of Supervised Mahalanobis Distance Learners, 2008, ArXiv.

[6] L. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming, 1967.

[7] Misha Pavel, et al. Adjustment Learning and Relevant Component Analysis, 2002, ECCV.

[8] J. Bunch, et al. Rank-one modification of the symmetric eigenproblem, 1978.

[9] Peter E. Hart, et al. Nearest neighbor pattern classification, 1967, IEEE Trans. Inf. Theory.

[10] A. Lewis. The Convex Analysis of Unitarily Invariant Matrix Functions, 1995.

[11] Adrian Lewis, et al. The mathematics of eigenvalue optimization, 2003, Math. Program.

[12] Ambuj Tewari, et al. Composite objective mirror descent, 2010, COLT.

[13] Geoffrey E. Hinton, et al. Neighbourhood Components Analysis, 2004, NIPS.

[14] Charles R. Johnson, et al. Matrix Analysis, 1985.

[15] Kilian Q. Weinberger, et al. Distance Metric Learning for Large Margin Nearest Neighbor Classification, 2005, NIPS.

[16] Michael I. Jordan, et al. Distance Metric Learning with Application to Clustering with Side-Information, 2002, NIPS.

[17] Inderjit S. Dhillon, et al. Information-theoretic metric learning, 2007, ICML.

[18] Pablo A. Parrilo, et al. Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization, 2007, SIAM Rev.

[19] G. Watson. Characterization of the subdifferential of some matrix norms, 1992.

[20] Thorsten Joachims, et al. Learning a Distance Metric from Relative Comparisons, 2003, NIPS.

[21] S. Eisenstat, et al. A Stable and Efficient Algorithm for the Rank-One Modification of the Symmetric Eigenproblem, 1994, SIAM J. Matrix Anal. Appl.

[22] Inderjit S. Dhillon, et al. Online Metric Learning and Fast Similarity Search, 2008, NIPS.

[23] Claire Cardie, et al. Constrained K-means Clustering with Background Knowledge, 2001, Proceedings of the Eighteenth International Conference on Machine Learning (ICML), pp. 577–584.

[24] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.

[25] Emmanuel J. Candès, et al. A Singular Value Thresholding Algorithm for Matrix Completion, 2008, SIAM J. Optim.

[26] Marc Teboulle, et al. Mirror descent and nonlinear projected subgradient methods for convex optimization, 2003, Oper. Res. Lett.

[27] Yoram Singer, et al. Online and batch learning of pseudo-metrics, 2004, ICML.

[28] S. T. Roweis, et al. Nonlinear dimensionality reduction by locally linear embedding, 2000, Science.

[29] Emmanuel J. Candès, et al. Exact Matrix Completion via Convex Optimization, 2009, Found. Comput. Math.

[30] Inderjit S. Dhillon, et al. Low-Rank Kernel Learning with Bregman Matrix Divergences, 2009, J. Mach. Learn. Res.

[31] Tomer Hertz, et al. Learning Distance Functions using Equivalence Relations, 2003, ICML.

[33] Cordelia Schmid, et al. Is that you? Metric learning approaches for face identification, 2009, IEEE 12th International Conference on Computer Vision (ICCV).

[34] Y. Censor, et al. Parallel Optimization: Theory, Algorithms, and Applications, 1997.