An efficient method for clustered multi-metric learning

Abstract: Distance metric learning, which aims to find a distance metric that separates examples of one class from examples of the other classes, is key to the success of many machine learning tasks. Despite increasing interest in this field, a single global distance metric is insufficient to obtain satisfactory results on heterogeneously distributed data. A simple way to handle such data is to use kernel embedding methods. However, kernel embedding quickly becomes computationally intractable as the number of examples increases. In this paper, we propose an efficient method that learns multiple local distance metrics instead of a single global one. More specifically, the training examples are divided into several disjoint clusters, and within each cluster a distance metric is trained to separate the data locally. Additionally, a global regularization term is introduced to preserve properties common to the different clusters in the learned metric space. By learning multiple distance metrics jointly within a single unified optimization framework, our method consistently outperforms single distance metric learning methods, while being more efficient than other state-of-the-art multi-metric learning methods.
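To make the structure described in the abstract concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes that each cluster k carries its own positive semidefinite matrix M_k defining a squared Mahalanobis distance, and that the global regularization penalizes the squared Frobenius distance between each M_k and a shared matrix M0. The function names, the nearest-center assignment rule, and the choice of regularizer are all illustrative assumptions.

```python
import numpy as np

def assign_clusters(X, centers):
    """Return the index of the nearest center (Euclidean) for each row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def local_distance(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y) for a PSD matrix M."""
    diff = x - y
    return float(diff @ M @ diff)

def global_regularizer(metrics, M0):
    """Assumed regularizer: sum over clusters of ||M_k - M0||_F^2."""
    return float(sum(np.linalg.norm(M - M0, "fro") ** 2 for M in metrics))

# Toy data: two well-separated cluster centers and three points (hypothetical).
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
X = np.array([[0.5, -0.2], [9.0, 11.0], [0.1, 0.3]])
labels = assign_clusters(X, centers)

# One metric per cluster, plus a shared reference metric (hypothetical values).
metrics = [np.eye(2), 2.0 * np.eye(2)]
M0 = np.eye(2)

# Distance between two points of the same cluster, under that cluster's metric.
d = local_distance(X[0], X[2], metrics[labels[0]])
reg = global_regularizer(metrics, M0)
```

In a full method the matrices in `metrics` would be learned jointly by minimizing a loss over training constraints plus the regularizer; here they are fixed by hand only to show how the pieces fit together.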
