A Probabilistic Theory of Supervised Similarity Learning for Pointwise ROC Curve Optimization

The performance of many machine learning techniques depends on the choice of an appropriate similarity or distance measure on the input space. Similarity learning (or metric learning) aims to build such a measure from training data, so that observations sharing a label are as close as possible and observations with different labels are as far apart as possible. In this paper, similarity learning is investigated from the perspective of pairwise bipartite ranking, where the goal is to rank the elements of a database in decreasing order of the probability that they share the same label as a query data point, based on the similarity scores. A natural performance criterion in this setting is pointwise ROC optimization: maximizing the true positive rate subject to a fixed false positive rate. We study this novel perspective on similarity learning through a rigorous probabilistic framework. The empirical version of the problem gives rise to a constrained optimization formulation involving U-statistics, for which we derive universal learning rates as well as faster rates under a noise assumption on the data distribution. We also address the large-scale setting by analyzing the effect of sampling-based approximations. Our theoretical results are supported by illustrative numerical experiments.
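
To make the pointwise ROC criterion concrete, here is a minimal sketch of how the empirical true positive rate at a fixed false positive rate could be evaluated for one given similarity function. It is an illustration under stated assumptions, not the paper's learning procedure: the function name `pairwise_tpr_at_fpr`, the thresholding rule, and the toy similarity (negative Euclidean distance) are all hypothetical choices. Positive pairs are pairs of observations with the same label, negative pairs are pairs with different labels, and both rates are averages over pairs of observations.

```python
import numpy as np

def pairwise_tpr_at_fpr(X, y, similarity, alpha):
    """Evaluate a fixed similarity function on all pairs (hypothetical helper).

    Returns (tpr, fpr, threshold): the threshold is chosen so that the
    fraction of different-label pairs scoring strictly above it is at most
    alpha, and tpr is the fraction of same-label pairs scoring above it.
    """
    n = len(y)
    i, j = np.triu_indices(n, k=1)  # all index pairs with i < j
    scores = np.array([similarity(X[a], X[b]) for a, b in zip(i, j)])
    same = y[i] == y[j]
    pos, neg = scores[same], scores[~same]

    # Allow at most k negative pairs strictly above the threshold.
    k = int(np.floor(alpha * len(neg)))
    neg_desc = np.sort(neg)[::-1]
    threshold = neg_desc[k] if k < len(neg) else -np.inf

    tpr = float(np.mean(pos > threshold))
    fpr = float(np.mean(neg > threshold))
    return tpr, fpr, threshold

# Toy data: two well-separated one-dimensional clusters (assumed for illustration).
X = np.array([[0.0], [0.1], [5.0], [5.1]])
y = np.array([0, 0, 1, 1])
neg_distance = lambda a, b: -np.linalg.norm(a - b)  # assumed similarity

tpr, fpr, threshold = pairwise_tpr_at_fpr(X, y, neg_distance, alpha=0.25)
```

Learning would then amount to searching a class of similarity functions for one that maximizes this true positive rate while the false positive rate stays below `alpha`. Enumerating every pair costs a number of evaluations quadratic in the sample size, which is the cost that sampling-based approximations are meant to reduce in the large-scale setting.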
