On Tree-based Methods for Similarity Learning

In many situations, the choice of an adequate similarity measure or metric on the feature space dramatically determines the performance of machine learning methods. Automatically building such measures is the specific purpose of metric/similarity learning. In Vogel et al. (2018), similarity learning is formulated as a pairwise bipartite ranking problem: ideally, the larger the probability that two observations in the feature space belong to the same class (or share the same label), the higher the similarity measure between them. From this perspective, the ROC curve is an appropriate performance criterion, and the goal of this article is to extend recursive tree-based ROC optimization techniques in order to propose efficient similarity learning algorithms. The validity of such iterative partitioning procedures in the pairwise setting is established by means of results from the theory of U-processes, and, from a practical angle, we discuss at length how to implement them with splitting rules specifically tailored to the similarity learning task. Beyond these theoretical and methodological contributions, numerical experiments are presented that provide strong empirical evidence of the performance of the algorithmic approaches we propose.
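To make the pairwise bipartite ranking view concrete, the following is a minimal illustrative sketch, not the algorithm studied in the paper: pairs of observations are labeled positive when they share a class, a CART-style tree is fit on a symmetric pair representation, and the resulting similarity score is evaluated by the pairwise ROC AUC. The choice of the pair representation |x_i - x_j|, the tree depth, and the synthetic data are assumptions made purely for illustration.

```python
import numpy as np
from itertools import combinations
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

# Illustrative sketch only (not the authors' method): similarity learning
# cast as pairwise bipartite ranking. A pair (x_i, x_j) is "positive" when
# the two observations share a label; a tree scores pairs, and ranking
# quality is read off the pairwise ROC AUC.

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))          # synthetic features (assumption)
y = rng.integers(0, 3, size=60)       # synthetic labels (assumption)

# Build all pairs with a symmetric representation |x_i - x_j| (one of many
# possible choices) and pairwise labels z_ij = 1{y_i == y_j}.
pairs, labels = [], []
for i, j in combinations(range(len(X)), 2):
    pairs.append(np.abs(X[i] - X[j]))
    labels.append(int(y[i] == y[j]))
pairs, labels = np.array(pairs), np.array(labels)

# A CART-style tree on pair features; its predicted probability of
# "same class" serves as the learned similarity score.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(pairs, labels)
scores = tree.predict_proba(pairs)[:, 1]

# Pairwise AUC: probability that a same-class pair is ranked above a
# different-class pair by the learned similarity.
print("pairwise AUC:", roc_auc_score(labels, scores))
```

In this toy setup the AUC is computed on the training pairs for brevity; in practice one would evaluate on held-out pairs, and the recursive partitioning would target the ROC curve directly rather than a plain classification criterion.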

[1] Leo Breiman, et al. Classification and Regression Trees, 1984.

[2] Kilian Q. Weinberger, et al. Distance Metric Learning for Large Margin Nearest Neighbor Classification, 2005, NIPS.

[3] Kristin Branson, et al. Sample Complexity of Learning Mahalanobis Distance Metrics, 2015, NIPS.

[4] Marc Sebban, et al. Metric Learning, 2015, Metric Learning.

[5] Brian Kulis, et al. Metric Learning: A Survey, 2013, Found. Trends Mach. Learn.

[6] Stéphan Clémençon, et al. A Probabilistic Theory of Supervised Similarity Learning for Pointwise ROC Curve Optimization, 2018, ICML.

[7] J. Ross Quinlan, et al. Induction of Decision Trees, 1986, Machine Learning.

[8] Stéphan Clémençon, et al. Adaptive partitioning schemes for bipartite ranking, 2011, Machine Learning.

[9] Lalit Jain, et al. Learning Low-Dimensional Metrics, 2017, NIPS.

[10] Alan J. Lee, et al. U-Statistics: Theory and Practice, 1990.

[11] Tom Fawcett, et al. An introduction to ROC analysis, 2006, Pattern Recognit. Lett.

[12] Yinghuan Shi, et al. Cross-Modal Metric Learning for AUC Optimization, 2018, IEEE Transactions on Neural Networks and Learning Systems.

[13] Rong Jin, et al. Regularized Distance Metric Learning: Theory and Algorithm, 2009, NIPS.

[14] Arun Ross, et al. An introduction to biometric recognition, 2004, IEEE Transactions on Circuits and Systems for Video Technology.

[15] Stéphan Clémençon, et al. Tree-Based Ranking Methods, 2009, IEEE Transactions on Information Theory.

[16] Gábor Lugosi, et al. Introduction to Statistical Learning Theory, 2004, Advanced Lectures on Machine Learning.

[17] G. Lugosi, et al. Ranking and empirical minimization of U-statistics, 2006, math/0603123.

[18] Boris Vrdoljak, et al. Statistical hierarchical clustering algorithm for outlier detection in evolving data streams, 2020, Mach. Learn.

[19] Gert R. G. Lanckriet, et al. Metric Learning to Rank, 2010, ICML.

[20] Sharath Pankanti, et al. Biometric Identification, 2000.

[21] Stéphan Clémençon, et al. Ranking forests, 2013, J. Mach. Learn. Res.

[22] Qiong Cao, et al. Generalization bounds for metric and similarity learning, 2012, Machine Learning.

[23] Amaury Habrard, et al. Robustness and generalization for metric learning, 2012, Neurocomputing.