Robustness and generalization for metric learning

Abstract: Metric learning has attracted considerable interest over the last decade, but the generalization ability of such methods has not been thoroughly studied. In this paper, we introduce an adaptation of the notion of algorithmic robustness (previously introduced by Xu and Mannor) that can be used to derive generalization bounds for metric learning. We further show that a weak notion of robustness is in fact a necessary and sufficient condition for a metric learning algorithm to generalize. To illustrate the applicability of the proposed framework, we derive generalization results for a large family of existing metric learning algorithms, including some sparse formulations that are not covered by previous results.

[1]  J. Meigs,et al.  WHO Technical Report , 1954, The Yale Journal of Biology and Medicine.

[2]  Shie Mannor,et al.  Sparse Algorithms Are Not Stable , 2012 .

[3]  Jude W. Shavlik,et al.  Mirror Descent for Metric Learning: A Unified Approach , 2012, ECML/PKDD.

[4]  Chi-Kwong Li,et al.  Isometries for the vector (p,q) norm and the induced (p,q) norm , 1995 .

[5]  Daphna Weinshall,et al.  Online Learning in The Manifold of Low-Rank Matrices , 2010, NIPS.

[6]  Inderjit S. Dhillon,et al.  Online Metric Learning and Fast Similarity Search , 2008, NIPS.

[7]  Kilian Q. Weinberger,et al.  Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.

[8]  Trevor Darrell,et al.  What you saw is not what you get: Domain adaptation using asymmetric kernel transforms , 2011, CVPR 2011.

[9]  Ali Mustafa Qamar.  Generalized Cosine and Similarity Metrics: A Supervised Learning Approach based on Nearest Neighbors. (Mesures de similarité et cosinus généralisé : une approche d'apprentissage supervisé fondée sur les k plus proches voisins) , 2010 .

[10]  Yuan Shi,et al.  Sparse Compositional Metric Learning , 2014, AAAI.

[11]  Bao Qi Feng,et al.  Equivalence constants for certain matrix norms , 2003 .

[12]  Yiming Ying,et al.  Guaranteed Classification via Regularized Similarity Learning , 2013, Neural Computation.

[13]  Bo Geng,et al.  DAML: Domain Adaptation Metric Learning , 2011, IEEE Transactions on Image Processing.

[14]  Alexandros Kalousis,et al.  Parametric Local Metric Learning for Nearest Neighbor Classification , 2012, NIPS.

[15]  Dacheng Tao,et al.  Constrained Empirical Risk Minimization Framework for Distance Metric Learning , 2012, IEEE Transactions on Neural Networks and Learning Systems.

[16]  Michael I. Jordan,et al.  Distance Metric Learning with Application to Clustering with Side-Information , 2002, NIPS.

[17]  André Elisseeff,et al.  Stability and Generalization , 2002, J. Mach. Learn. Res.

[18]  Dacheng Tao,et al.  Learning a Distance Metric by Empirical Loss Minimization , 2011, IJCAI.

[19]  Glenn Fung,et al.  Learning sparse metrics via linear programming , 2006, KDD '06.

[20]  A. Kolmogorov,et al.  Entropy and ε-capacity of sets in functional spaces , 1961 .

[21]  Rong Jin,et al.  Regularized Distance Metric Learning: Theory and Algorithm , 2009, NIPS.

[22]  Brian Kulis,et al.  Metric Learning: A Survey , 2013, Found. Trends Mach. Learn.

[23]  Marc Sebban,et al.  Similarity Learning for Provably Accurate Sparse Linear Classification , 2012, ICML.

[24]  Roni Khardon,et al.  Generalization Bounds for Online Learning Algorithms with Pairwise Loss Functions , 2012, COLT.

[25]  Jon A. Wellner,et al.  Weak Convergence and Empirical Processes: With Applications to Statistics , 1996 .

[26]  Shie Mannor,et al.  Robustness and generalization , 2010, Machine Learning.

[27]  Marc Sebban,et al.  A Survey on Metric Learning for Feature Vectors and Structured Data , 2013, ArXiv.

[28]  Inderjit S. Dhillon,et al.  Information-theoretic metric learning , 2006, ICML '07.

[29]  Qiong Cao,et al.  Generalization bounds for metric and similarity learning , 2012, Machine Learning.

[30]  Shie Mannor,et al.  Sparse algorithms are not stable: A no-free-lunch theorem , 2008, Allerton 2008.

[31]  Tat-Seng Chua,et al.  An efficient sparse metric learning in high-dimensional space via l1-penalized log-determinant regularization , 2009, ICML '09.

[32]  Kaizhu Huang,et al.  Sparse Metric Learning via Smooth Optimization , 2009, NIPS.

[33]  Shie Mannor,et al.  Sparse Algorithms Are Not Stable: A No-Free-Lunch Theorem , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Samy Bengio,et al.  An Online Algorithm for Large Scale Image Similarity Learning , 2009, NIPS.

[35]  Thomas G. Dietterich.  What is machine learning? , 2020, Archives of Disease in Childhood.

[36]  Gert R. G. Lanckriet,et al.  Metric Learning to Rank , 2010, ICML.

[37]  Thorsten Joachims,et al.  Learning a Distance Metric from Relative Comparisons , 2003, NIPS.

[38]  Yoram Singer,et al.  Online and batch learning of pseudo-metrics , 2004, ICML.

[39]  Jean-Philippe Vial,et al.  Robust Optimization , 2021, ICORES.

[40]  Prateek Jain,et al.  Similarity-based Learning via Data Driven Embeddings , 2011, NIPS.