Low-Rank Sinkhorn Factorization

Several recent applications of optimal transport (OT) theory to machine learning have relied on regularization, notably entropy and the Sinkhorn algorithm. Because matrix-vector products are pervasive in the Sinkhorn algorithm, several works have proposed to approximate the kernel matrices appearing in its iterations with low-rank factors. Another route lies instead in imposing low nonnegative-rank constraints on the feasible set of couplings considered in OT problems, with no approximation of either the cost or the kernel matrix. This route was first explored by Forrow et al. (2018), who proposed an algorithm tailored to the squared Euclidean ground cost, using a proxy objective that can be solved through the machinery of regularized 2-Wasserstein barycenters. Building on this, we introduce a generic approach that aims to solve, in full generality, the OT problem under low nonnegative-rank constraints with arbitrary costs. Our algorithm relies on an explicit factorization of low-rank couplings as a product of sub-coupling factors linked by a common marginal; similar to an NMF approach, we alternately update these factors. We prove the non-asymptotic stationary convergence of this algorithm and illustrate its efficiency on benchmark experiments.
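To make the factorization concrete, here is a minimal NumPy sketch. It assumes a rank-r coupling is written P = Q diag(1/g) R^T, with Q a coupling between a and the common inner marginal g, and R a coupling between b and g; this parametrization is an assumption spelled out here, not stated in the abstract. For simplicity the sketch holds g fixed at uniform and alternates KL-proximal (mirror descent) steps on Q and R, each solved by a plain Sinkhorn projection. This is a simplified illustration, not the authors' algorithm, which also optimizes g. The function names `sinkhorn_projection` and `low_rank_ot_fixed_g` and the parameters `gamma` and `n_outer` are hypothetical.

```python
import numpy as np


def sinkhorn_projection(K, row_marg, col_marg, tol=1e-12, max_iter=5000):
    """KL projection of a positive matrix K onto the transport polytope
    {P >= 0 : P 1 = row_marg, P^T 1 = col_marg}, computed by Sinkhorn scaling.
    Each iteration costs two matrix-vector products with K."""
    u = np.ones(K.shape[0])
    v = np.ones(K.shape[1])
    for _ in range(max_iter):
        u = row_marg / (K @ v)    # rescale so row sums match row_marg
        v = col_marg / (K.T @ u)  # rescale so column sums match col_marg
        if np.abs(u * (K @ v) - row_marg).max() < tol:
            break
    return u[:, None] * K * v[None, :]


def low_rank_ot_fixed_g(C, a, b, r, gamma=0.5, n_outer=10, seed=0):
    """Hypothetical sketch: alternating KL-proximal (mirror descent) steps on
    <C, Q diag(1/g) R^T> over Q in Pi(a, g) and R in Pi(b, g), with the
    inner marginal g held fixed at uniform (the paper's method also optimizes g)."""
    rng = np.random.default_rng(seed)
    g = np.full(r, 1.0 / r)
    # Random positive initialization: the rank-one start Q = a g^T, R = b g^T
    # is a fixed point of these updates, so symmetry must be broken.
    Q = sinkhorn_projection(rng.uniform(0.5, 1.5, (len(a), r)), a, g)
    R = sinkhorn_projection(rng.uniform(0.5, 1.5, (len(b), r)), b, g)

    def cost(Q, R):
        # <C, Q diag(1/g) R^T> evaluated without forming the n x m coupling
        return float(np.sum(Q * (C @ R / g)))

    history = [cost(Q, R)]
    for _ in range(n_outer):
        # For fixed R the objective is linear in Q, with gradient C R diag(1/g),
        # so the KL-proximal step is a Sinkhorn projection of Q * exp(-gamma * grad).
        Q = sinkhorn_projection(Q * np.exp(-gamma * (C @ R / g)), a, g)
        # Symmetrically for R, with gradient C^T Q diag(1/g).
        R = sinkhorn_projection(R * np.exp(-gamma * (C.T @ Q / g)), b, g)
        history.append(cost(Q, R))
    P = (Q / g) @ R.T  # explicit rank-<=r coupling, formed here only for checking
    return P, Q, R, history


# Usage on a small random problem with squared Euclidean cost.
rng = np.random.default_rng(1)
x, y = rng.uniform(size=(12, 2)), rng.uniform(size=(15, 2))
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
C = C / C.max()  # keep costs in [0, 1] so the multiplicative updates stay well scaled
a, b = np.full(12, 1 / 12), np.full(15, 1 / 15)
P, Q, R, history = low_rank_ot_fixed_g(C, a, b, r=3)
print("rank:", np.linalg.matrix_rank(P), "cost:", history[0], "->", history[-1])
```

Because the objective is linear in each factor when the other is fixed, every block update is an exact KL-proximal step, so the recorded cost does not increase (up to Sinkhorn tolerance). The product P inherits marginals a and b from the factors and has rank at most r by construction. Letting g vary couples the constraints on Q and R, so each subproblem is no longer a single Sinkhorn projection; that is the harder case the full method addresses.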

References

[1] Dirk A. Lorenz, et al. Entropic regularization of continuous optimal transport problems, 2019, arXiv:1906.01333.

[2] Michael I. Jordan, et al. On Efficient Optimal Transport: An Analysis of Greedy and Accelerated Mirror Descent Algorithms, 2019, ICML.

[3] Jean-Luc Starck, et al. Wasserstein Dictionary Learning: Optimal Transport-based unsupervised non-linear dictionary learning, 2017, SIAM J. Imaging Sci.

[4] Jason Altschuler, et al. Polynomial-time algorithms for Multimarginal Optimal Transport problems with structure, 2020, ArXiv.

[5] Bernhard Schmitzer, et al. Stabilized Sparse Scaling Algorithms for Entropy Regularized Transport Problems, 2016, SIAM J. Sci. Comput.

[6] Joel E. Cohen, et al. Nonnegative ranks, decompositions, and factorizations of nonnegative matrices, 1993.

[7] Mohamed-Jalal Fadili, et al. Wasserstein Control of Mirror Langevin Monte Carlo, 2020, COLT.

[8] R. Dykstra. An Algorithm for Restricted Least Squares Regression, 1983.

[9] Saeed Ghadimi, et al. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization, 2013, Mathematical Programming.

[10] David P. Woodruff, et al. Sublinear Time Low-Rank Approximation of Distance Matrices, 2018, NeurIPS.

[11] P. Rigollet, et al. Optimal-Transport Analysis of Single-Cell Gene Expression Identifies Developmental Trajectories in Reprogramming, 2019, Cell.

[12] Jonathan Weed, et al. Statistical Optimal Transport via Factored Couplings, 2018, AISTATS.

[13] Alessandro Rudi, et al. Massively scalable Sinkhorn distances via the Nyström method, 2018, NeurIPS.

[14] Gabriel Peyré, et al. Computational Optimal Transport, 2018, Found. Trends Mach. Learn.

[15] Saradha Venkatachalapathy, et al. Predicting cell lineages using autoencoders and optimal transport, 2020, PLoS Comput. Biol.

[16] David P. Woodruff, et al. Sample-Optimal Low-Rank Approximation of Distance Matrices, 2019, COLT.

[17] Alexander Gasnikov, et al. Computational Optimal Transport: Complexity by Accelerated Gradient Descent Is Better Than by Sinkhorn's Algorithm, 2018, ICML.

[18] A. Guillin, et al. On the rate of convergence in Wasserstein distance of the empirical measure, 2013, arXiv:1312.2128.

[19] Lénaïc Chizat, et al. Faster Wasserstein Distance Estimation with the Sinkhorn Divergence, 2020, NeurIPS.

[20] Jason Altschuler, et al. Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration, 2017, NIPS.

[21] Marco Cuturi, et al. Sinkhorn Distances: Lightspeed Computation of Optimal Transport, 2013, NIPS.

[22] Marc Teboulle, et al. A Descent Lemma Beyond Lipschitz Gradient Continuity: First-Order Methods Revisited and Applications, 2017, Math. Oper. Res.

[23] Bertrand Thirion, et al. Multi-subject MEG/EEG source imaging with sparse multi-task regression, 2019, NeuroImage.

[24] J. Lorenz, et al. On the scaling of multidimensional matrices, 1989.

[25] Marco Cuturi, et al. Linear Time Sinkhorn Divergences using Positive Features, 2020, NeurIPS.

[26] Sewoong Oh, et al. Optimal transport mapping via input convex neural networks, 2019, ICML.

[27] Nicolas Papadakis, et al. Regularized Optimal Transport and the Rot Mover's Distance, 2016, J. Mach. Learn. Res.

[28] Joanna Wardlaw, et al. Optimal Mass Transport with Lagrangian Workflow Reveals Advective and Diffusion Driven Solute Transport in the Glymphatic System, 2020, Scientific Reports.

[29] L. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming, 1967.

[30] Guillaume Carlier, et al. Barycenters in the Wasserstein Space, 2011, SIAM J. Math. Anal.

[31] Gabriel Peyré, et al. Sample Complexity of Sinkhorn Divergences, 2018, AISTATS.

[32] Elwood S. Buffa, et al. Graph Theory with Applications, 1977.

[33] Yurii Nesterov, et al. Relatively Smooth Convex Optimization by First-Order Methods, and Applications, 2016, SIAM J. Optim.

[34] Evgeny Burnaev, et al. Continuous Wasserstein-2 Barycenter Estimation without Minimax Optimization, 2021, ICLR.

[35] Gabriel Peyré, et al. Iterative Bregman Projections for Regularized Transportation Problems, 2014, SIAM J. Sci. Comput.

[36] Hongchao Zhang, et al. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization, 2016.

[37] Hongyuan Zha, et al. A Fast Proximal Point Method for Computing Exact Wasserstein Distance, 2018, UAI.

[38] Arnaud Doucet, et al. Fast Computation of Wasserstein Barycenters, 2013, ICML.

[39] Heinz H. Bauschke, et al. Dykstra's algorithm with Bregman projections: A convergence proof, 2000.

[40] David Coeurjolly, et al. Ground Metric Learning on Graphs, 2019, ArXiv.

[41] David van Dijk, et al. TrajectoryNet: A Dynamic Optimal Transport Network for Modeling Cellular Dynamics, 2020, ICML.