Differentiable Ranking and Sorting using Optimal Transport

Sorting an array is a fundamental routine in machine learning, one that is used to compute rank-based statistics, cumulative distribution functions (CDFs), quantiles, or to select closest neighbors and labels. The sorting function is, however, piecewise constant (the sorting permutation of a vector does not change if the entries of that vector are infinitesimally perturbed) and therefore carries no gradient information to back-propagate. We propose a framework to sort elements that is algorithmically differentiable. We leverage the fact that sorting can be seen as a particular instance of the optimal transport (OT) problem on $\mathbb{R}$, from input values to a predefined array of sorted values (e.g., $1,2,\dots,n$ if the input array has $n$ elements). Building upon this link, we propose generalized CDF and quantile operators by varying the size and weights of the target presorted array. Because this amounts to using the so-called Kantorovich formulation of OT, we call these quantities K-sorts, K-CDFs and K-quantiles. We recover differentiable algorithms by adding an entropic regularization to the OT problem and approximating it with a few Sinkhorn iterations. We call the resulting operators S-sorts, S-CDFs and S-quantiles, and use them in various learning settings: we benchmark them against the recently proposed NeuralSort [Grover et al., 2019], propose applications to quantile regression, and introduce differentiable formulations of the top-k accuracy that deliver state-of-the-art performance.
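To make the construction concrete, recall the Kantorovich formulation in standard entropic-OT notation (as in [15]): sorting $n$ input values $x_1,\dots,x_n$ with weights $a$ against $m$ presorted target values $y_1 < \dots < y_m$ with weights $b$ amounts to solving

$$\min_{P \in U(a,b)} \langle P, C \rangle - \varepsilon H(P), \qquad U(a,b) = \{P \in \mathbb{R}_+^{n \times m} : P\mathbf{1}_m = a,\ P^\top \mathbf{1}_n = b\},$$

where $C_{ij}$ is the cost of transporting $x_i$ onto $y_j$ (e.g., $C_{ij} = (x_i - y_j)^2$) and $H(P) = -\sum_{ij} P_{ij}(\log P_{ij} - 1)$. For $\varepsilon > 0$ the solution is unique and differentiable in $x$; the unregularized case $\varepsilon = 0$ with $m = n$ and $a = b = \mathbf{1}_n / n$ recovers hard sorting.

The sketch below illustrates this recipe in plain NumPy: it runs Sinkhorn iterations on the entropic problem and reads soft ranks and soft sorted values off the resulting transport plan. The target grid, the quadratic cost, `epsilon`, and the particular read-outs are illustrative choices for exposition, not the paper's exact operators.

```python
import numpy as np

def sinkhorn_soft_sort(x, m=None, epsilon=1e-2, n_iters=200):
    """Differentiable (soft) ranking and sorting via entropy-regularized OT.

    Transports the n input values (uniform weights a) onto m presorted
    target values (uniform weights b), then reads soft ranks and soft
    sorted values off the regularized transport plan.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    m = n if m is None else m
    a = np.full(n, 1.0 / n)          # source weights
    b = np.full(m, 1.0 / m)          # target weights
    y = np.linspace(0.0, 1.0, m)     # presorted target grid

    # Rescale inputs to [0, 1] so that epsilon has a consistent scale.
    x_scaled = (x - x.min()) / (x.max() - x.min() + 1e-12)

    C = (x_scaled[:, None] - y[None, :]) ** 2  # quadratic ground cost
    K = np.exp(-C / epsilon)                   # Gibbs kernel

    # Sinkhorn fixed-point iterations on the scaling vectors (u, v).
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]            # regularized transport plan

    soft_ranks = (P @ np.cumsum(b)) / a        # in (0, 1], roughly rank / n
    soft_sorted = (P.T @ x) / b                # barycentric projection onto targets
    return soft_ranks, soft_sorted

ranks, sorted_vals = sinkhorn_soft_sort(np.array([0.3, -1.2, 2.5, 0.7]))
print(ranks)        # ~ [0.5, 0.25, 1.0, 0.75]  (i.e. ranks 2, 1, 4, 3 over n=4)
print(sorted_vals)  # ~ [-1.2, 0.3, 0.7, 2.5]
```

For small `epsilon` the outputs approach the hard ranks and the sorted vector; increasing `epsilon` smooths both, trading fidelity for better-behaved gradients.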

[1] Carlos Eduardo Scheidegger, et al. Certifying and Removing Disparate Impact, 2014, KDD.

[2] Tao Qin, et al. A general approximation framework for direct optimization of information retrieval measures, 2010, Information Retrieval.

[3] J. Lorenz, et al. On the scaling of multidimensional matrices, 1989.

[4] Yann Brenier, et al. Rearrangement, convection, convexity and entropy, 2013, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.

[5] Arnaud Doucet, et al. Fast Computation of Wasserstein Barycenters, 2013, ICML.

[6] Julien Rabin, et al. Wasserstein Barycenter and Its Application to Texture Mixing, 2011, SSVM.

[7] Filippo Santambrogio. Optimal Transport for Applied Mathematicians: Calculus of Variations, PDEs, and Modeling, 2015.

[8] G. Lugosi, et al. Regularization, sparse recovery, and median-of-means tournaments, 2017, Bernoulli.

[9] Tommi S. Jaakkola, et al. Learning Population-Level Diffusions with Generative RNNs, 2016, ICML.

[10] Matthieu Lerasle, et al. Robust Machine Learning by Median-of-Means: Theory and Practice, 2019.

[11] Marco Cuturi, et al. Sinkhorn Distances: Lightspeed Computation of Optimal Transport, 2013, NIPS.

[12] Stephen E. Robertson, et al. SoftRank: optimizing non-smooth rank metrics, 2008, WSDM '08.

[13] Jaana Kekäläinen, et al. Cumulated gain-based evaluation of IR techniques, 2002, TOIS.

[14] Robert E. Tarjan, et al. Dynamic trees as search trees via Euler tours, applied to the network simplex algorithm, 1997, Math. Program.

[15] Gabriel Peyré, et al. Computational Optimal Transport, 2018, Found. Trends Mach. Learn.

[16] John N. Tsitsiklis, et al. Introduction to Linear Optimization, 1997, Athena Scientific Optimization and Computation Series.

[17] Julie Delon, et al. Local Matching Indicators for Transport Problems with Concave Costs, 2011, SIAM J. Discret. Math.

[18] Andrew Zisserman, et al. Smooth Loss Functions for Deep Top-k Classification, 2018, ICLR.

[19] Tian Xia, et al. Direct 0-1 Loss Minimization and Margin Maximization with Boosting, 2013, NIPS.

[20] Kilian Q. Weinberger, et al. Distance Metric Learning for Large Margin Nearest Neighbor Classification, 2005, NIPS.

[21] Silvia Chiappa, et al. Wasserstein Fair Classification, 2019, UAI.

[22] A. Wilson, et al. Use of entropy maximizing models in theory of trip distribution, mode split and route split, 1969.

[23] Qiang Wu, et al. Learning to Rank Using an Ensemble of Lambda-Gradient Models, 2010, Yahoo! Learning to Rank Challenge.

[24] Jean-Philippe Vert, et al. Supervised Quantile Normalisation, 2017, arXiv.

[25] Gabriel Peyré, et al. Wasserstein barycentric coordinates, 2016, ACM Trans. Graph.

[26] Scott W. Linderman, et al. Learning Latent Permutations with Gumbel-Sinkhorn Networks, 2018, ICLR.

[27] Scott Sanner, et al. Algorithms for Direct 0-1 Loss Optimization in Binary Classification, 2013, ICML.

[28] A. Galichon, et al. Matching with Trade-Offs: Revealed Preferences Over Competing Characteristics, 2009, arXiv:2102.12811.

[29] Stefano Ermon, et al. Stochastic Optimization of Sorting Networks via Continuous Relaxations, 2019, ICLR.

[30] Alan L. Yuille, et al. The invisible hand algorithm: Solving the assignment problem with statistical physics, 1994, Neural Networks.

[31] Ryan P. Adams, et al. Ranking via Sinkhorn Propagation, 2011, arXiv.

[32] Stephen P. Boyd, et al. Accuracy at the Top, 2012, NIPS.

[33] Julien Rabin, et al. Sliced and Radon Wasserstein Barycenters of Measures, 2014, Journal of Mathematical Imaging and Vision.

[34] Yaniv Romano, et al. Conformalized Quantile Regression, 2019, NeurIPS.

[35] R. Koenker, et al. An interior point algorithm for nonlinear quantile regression, 1996.

[36] Bernhard Schmitzer, et al. Stabilized Sparse Scaling Algorithms for Entropy Regularized Transport Problems, 2016, SIAM J. Sci. Comput.

[37] Nicolas Courty, et al. Wasserstein discriminant analysis, 2016, Machine Learning.

[38] I. Barrodale, et al. An Improved Algorithm for Discrete $l_1$ Linear Approximation, 1973.

[39] Yang Zou, et al. Sliced Wasserstein Kernels for Probability Distributions, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40] Thomas Hofmann, et al. Learning to Rank with Nonsmooth Cost Functions, 2006, NIPS.

[41] P. Rousseeuw. Least Median of Squares Regression, 1984.