Projection Robust Wasserstein Distance and Riemannian Optimization

Projection robust Wasserstein (PRW) distance, also known as Wasserstein projection pursuit (WPP), is a robust variant of the Wasserstein distance. Recent work suggests that this quantity is more robust than the standard Wasserstein distance, in particular when comparing probability measures in high dimensions. However, it has largely been ruled out for practical applications because the underlying optimization model is non-convex and non-smooth, which is widely believed to make computation intractable. Our contribution in this paper is to revisit the original motivation behind WPP/PRW and to take the harder route of showing that, despite its non-convexity and non-smoothness, and despite hardness results proved by~\citet{Niles-2019-Estimation} in a minimax sense, the original formulation of PRW/WPP \textit{can} be computed efficiently in practice using Riemannian optimization, and in relevant cases it behaves better than its convex relaxation. More specifically, we provide three simple algorithms with solid theoretical guarantees on their complexity bounds (one of them is deferred to the appendix), and we demonstrate their effectiveness and efficiency through extensive experiments on synthetic and real data. This paper provides a first step toward a computational theory of the PRW distance and establishes links between optimal transport and Riemannian optimization.
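To make the object under study concrete: writing $\mathrm{St}(d,k) = \{U \in \mathbb{R}^{d\times k} : U^\top U = I_k\}$ for the Stiefel manifold and $\Pi(\mu,\nu)$ for the set of couplings of $\mu$ and $\nu$, the PRW distance with projection dimension $k$ is usually stated (as in the subspace-robust Wasserstein line of work cited below, e.g. [58]; the abstract itself does not fix notation) as the max-min problem
\[
\mathcal{P}_k(\mu,\nu) \;=\; \Big( \max_{U \in \mathrm{St}(d,k)} \;\min_{\pi \in \Pi(\mu,\nu)} \int \|U^\top x - U^\top y\|^2 \,\mathrm{d}\pi(x,y) \Big)^{1/2}.
\]
The inner minimization is a standard optimal transport problem, while the outer maximization runs over a Riemannian manifold, which is why Riemannian optimization is the natural computational tool. The following minimal sketch (assuming discrete measures, entropic regularization of the inner problem, and plain Riemannian gradient ascent with a QR retraction; the function names, step sizes, and iteration counts are illustrative and not taken from the paper) shows how such a max-min problem can be attacked in practice:

```python
import numpy as np

def sinkhorn(C, r, c, eta=0.1, n_iter=200):
    """Entropy-regularized OT: approximate optimal coupling for cost matrix C and
    marginals r, c (a log-domain implementation is preferable for small eta)."""
    K = np.exp(-C / eta)
    u, v = np.ones_like(r), np.ones_like(c)
    for _ in range(n_iter):
        u = r / (K @ v)
        v = c / (K.T @ u)
    return u[:, None] * K * v[None, :]

def prw_gradient_ascent(X, Y, r, c, k, step=0.01, eta=0.1, n_iter=100, seed=0):
    """Sketch of Riemannian gradient ascent for an entropy-regularized PRW objective:
    maximize, over U in the Stiefel manifold St(d, k), the regularized OT cost
    between the projected point clouds X @ U and Y @ U."""
    d = X.shape[1]
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((d, k)))   # random point on St(d, k)
    for _ in range(n_iter):
        # Pairwise squared distances in the projected k-dimensional space.
        XU, YU = X @ U, Y @ U
        C = ((XU[:, None, :] - YU[None, :, :]) ** 2).sum(axis=-1)
        pi = sinkhorn(C, r, c, eta)
        # Euclidean gradient of sum_ij pi_ij * ||U^T (x_i - y_j)||^2 w.r.t. U,
        # i.e. 2 * V @ U with V the pi-weighted second-moment matrix.
        D = X[:, None, :] - Y[None, :, :]
        V = np.einsum('ij,ijk,ijl->kl', pi, D, D)
        G = 2.0 * V @ U
        # Riemannian gradient: project G onto the tangent space at U.
        xi = G - U @ (0.5 * (U.T @ G + G.T @ U))
        # QR retraction back onto the Stiefel manifold.
        U, _ = np.linalg.qr(U + step * xi)
    return U
```

The paper's actual algorithms and their complexity analysis differ in important details (step sizes, stopping criteria, and how the inner Sinkhorn subproblem is handled), so this sketch only illustrates the alternating projection/transport structure rather than the proposed methods.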

[1] A. Hoffman, On approximate solutions of systems of linear inequalities, 1952.

[2] M. Sion, On general minimax theorems, 1958.

[3] R. Dudley, The Speed of Mean Glivenko-Cantelli Convergence, 1969.

[4] W. Boothby, An introduction to differentiable manifolds and Riemannian geometry, 1975.

[5] Jean-Philippe Vial et al., Strong and Weak Convexity of Sets and Functions, 1983, Math. Oper. Res.

[6] Katta G. Murty et al., Some NP-complete problems in quadratic and nonlinear programming, 1987, Math. Program.

[7] Wu Li, Sharp Lipschitz Constants for Basic Optimal Solutions and Basic Feasible Solutions of Linear Programs, 1994.

[8] Uriel G. Rothblum et al., Approximations to Solutions to Systems of Linear Inequalities, 1995, SIAM J. Matrix Anal. Appl.

[9] Diethard Klatte et al., Error bounds for solutions of linear equations and inequalities, 1995, Math. Methods Oper. Res.

[10] Robert E. Tarjan et al., Dynamic trees as search trees via Euler tours, applied to the network simplex algorithm, 1997, Math. Program.

[11] Alan Edelman et al., The Geometry of Algorithms with Orthogonality Constraints, 1998, SIAM J. Matrix Anal. Appl.

[12] O. P. Ferreira et al., Subgradient Algorithm on Riemannian Manifolds, 1998.

[13] O. P. Ferreira et al., Proximal Point Algorithm on Riemannian Manifolds, 2002.

[14] C. Villani, Topics in Optimal Transportation, 2003.

[15] David W. Jacobs et al., Approximate earth mover's distance in linear time, 2008, IEEE Conference on Computer Vision and Pattern Recognition.

[16] C. Villani, Optimal Transport: Old and New, 2008.

[17] Levent Tunçel et al., Optimization algorithms on matrix manifolds, 2009, Math. Comput.

[18] R. Bishop et al., Manifolds of negative curvature, 1969.

[19] Julien Rabin et al., Wasserstein Barycenter and Its Application to Texture Mixing, 2011, SSVM.

[20] M. V. D. Panne et al., Displacement Interpolation Using Lagrangian Mass Transport, 2011.

[21] Lorenzo Rosasco et al., Learning Probability Measures with respect to Optimal Transport Metrics, 2012, NIPS.

[22] W. Yang et al., Optimality conditions for the nonlinear programming problems on Riemannian manifolds, 2012.

[23] Marco Cuturi, Sinkhorn Distances: Lightspeed Computation of Optimal Transport, 2013, NIPS.

[24] A. Guillin et al., On the rate of convergence in Wasserstein distance of the empirical measure, 2013, arXiv:1312.2128.

[25] Silvere Bonnabel, Stochastic Gradient Descent on Riemannian Manifolds, 2011, IEEE Transactions on Automatic Control.

[26] Wotao Yin et al., A feasible method for optimization with orthogonality constraints, 2013, Math. Program.

[27] Chih-Jen Lin et al., Iteration complexity of feasible descent methods for convex optimization, 2014, J. Mach. Learn. Res.

[28] Arnaud Doucet et al., Fast Computation of Wasserstein Barycenters, 2013, ICML.

[29] Julien Rabin et al., Sliced and Radon Wasserstein Barycenters of Measures, 2014, Journal of Mathematical Imaging and Vision.

[30] Nhat Ho et al., Convergence rates of parameter estimation for some weakly identifiable finite mixtures, 2016.

[31] Volkan Cevher et al., WASP: Scalable Bayes via barycenters of subset posteriors, 2015, AISTATS.

[32] A. Zaslavski, Proximal Point Algorithm, 2016.

[33] Suvrit Sra et al., First-order Methods for Geodesically Convex Optimization, 2016, COLT.

[34] Suvrit Sra et al., Fast stochastic optimization on Riemannian manifolds, 2016, arXiv.

[35] Gabriel Peyré et al., Fast Dictionary Learning with a Smoothed Wasserstein Loss, 2016, AISTATS.

[36] Dinh Q. Phung et al., Multilevel Clustering via Wasserstein Means, 2017, ICML.

[37] Jason Altschuler et al., Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration, 2017, NIPS.

[38] Nicolas Courty et al., Optimal Transport for Domain Adaptation, 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39] Jefferson G. Melo et al., Iteration-Complexity of Gradient, Subgradient and Proximal Point Methods on Riemannian Manifolds, 2016, Journal of Optimization Theory and Applications.

[40] Léon Bottou et al., Wasserstein Generative Adversarial Networks, 2017, ICML.

[41] Marc G. Bellemare et al., A Distributional Perspective on Reinforcement Learning, 2017, ICML.

[42] Michael I. Jordan et al., Averaging Stochastic Gradient Descent on Riemannian Manifolds, 2018, COLT.

[43] Espen Bernton, Langevin Monte Carlo and JKO splitting, 2018, COLT.

[44] Michael I. Jordan et al., Underdamped Langevin MCMC: A non-asymptotic analysis, 2017, COLT.

[45] Alexander Gasnikov et al., Computational Optimal Transport: Complexity by Accelerated Gradient Descent Is Better Than by Sinkhorn's Algorithm, 2018, ICML.

[46] Hiroyuki Kasai et al., Inexact trust-region algorithms on Riemannian manifolds, 2018, NeurIPS.

[47] Bernhard Schölkopf et al., Wasserstein Auto-Encoders, 2017, ICLR.

[48] Han Zhang et al., Improving GANs Using Optimal Transport, 2018, ICLR.

[49] Marco Cuturi et al., Generalizing Point Embeddings using the Wasserstein Space of Elliptical Distributions, 2018, NeurIPS.

[50] Tomas Mikolov et al., Advances in Pre-Training Distributed Word Representations, 2017, LREC.

[51] Nicolas Papadakis et al., Regularized Optimal Transport and the Rot Mover's Distance, 2016, J. Mach. Learn. Res.

[52] Gabriel Peyré et al., Learning Generative Models with Sinkhorn Divergences, 2017, AISTATS.

[53] Ya-Xiang Yuan et al., Adaptive Quadratically Regularized Newton Method for Riemannian Optimization, 2018, SIAM J. Matrix Anal. Appl.

[54] Vivien Seguy et al., Smooth and Sparse Optimal Transport, 2017, AISTATS.

[55] Jean-Luc Starck et al., Wasserstein Dictionary Learning: Optimal Transport-based unsupervised non-linear dictionary learning, 2017, SIAM J. Imaging Sci.

[56] Xiaojun Chen et al., A New First-Order Algorithmic Framework for Optimization Problems with Orthogonality Constraints, 2018, SIAM J. Optim.

[57] Michael I. Jordan et al., On the Acceleration of the Sinkhorn and Greenkhorn Algorithms for Optimal Transport, 2019, arXiv.

[58] Marco Cuturi et al., Subspace Robust Wasserstein distances, 2019, ICML.

[59] Arnak S. Dalalyan et al., User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient, 2017, Stochastic Processes and their Applications.

[60] Hiroyuki Kasai et al., Riemannian adaptive stochastic gradient algorithms on matrix manifolds, 2019, ICML.

[61] Tryphon T. Georgiou et al., Optimal Transport for Gaussian Mixture Models, 2017, IEEE Access.

[62] Jonathan Niles-Weed et al., Estimation of Wasserstein distances in the Spiked Transport Model, 2019, Bernoulli.

[63] Bo Jiang et al., Structured Quasi-Newton Methods for Optimization with Orthogonality Constraints, 2018, SIAM J. Sci. Comput.

[64] Gary Bécigneul et al., Riemannian Adaptive Optimization Methods, 2018, ICLR.

[65] F. Bach et al., Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance, 2017, Bernoulli.

[66] R. Bhatia et al., On the Bures–Wasserstein distance between positive definite matrices, 2017, Expositiones Mathematicae.

[67] Anthony Man-Cho So et al., Quadratic optimization with orthogonality constraint: explicit Łojasiewicz exponent and linear convergence of retraction-based line-search and stochastic variance-reduced gradient methods, 2018, Math. Program.

[68] Prateek Jain et al., SGD without Replacement: Sharper Rates for General Smooth Convex Functions, 2019, ICML.

[69] P. Rigollet et al., Optimal-Transport Analysis of Single-Cell Gene Expression Identifies Developmental Trajectories in Reprogramming, 2019, Cell.

[70] Michael I. Jordan et al., On the Efficiency of the Sinkhorn and Greenkhorn Algorithms and Their Acceleration for Optimal Transport, 2019.

[71] Dmitriy Drusvyatskiy et al., Stochastic model-based minimization of weakly convex functions, 2018, SIAM J. Optim.

[72] S. Guminov et al., Accelerated Alternating Minimization, Accelerated Sinkhorn's Algorithm and Accelerated Iterative Bregman Projections, 2019.

[73] Xin Guo et al., Sparsemax and Relaxed Wasserstein for Topic Sparsity, 2018, WSDM.

[74] Pierre-Antoine Absil et al., A Collection of Nonsmooth Riemannian Optimization Problems, 2019, Nonsmooth Optimization and Its Applications.

[75] Gabriel Peyré et al., Computational Optimal Transport, 2018, Found. Trends Mach. Learn.

[76] Maryam Fazel et al., Escaping from saddle points on Riemannian manifolds, 2019, NeurIPS.

[77] Jonathan Weed et al., Statistical Optimal Transport via Factored Couplings, 2018, AISTATS.

[78]  P. Absil,et al.  Erratum to: ``Global rates of convergence for nonconvex optimization on manifolds'' , 2016, IMA Journal of Numerical Analysis.

[79]  Anthony Man-Cho So,et al.  Nonsmooth Optimization over Stiefel Manifold: Riemannian Subgradient Methods , 2019, ArXiv.

[80]  Roland Badeau,et al.  Generalized Sliced Wasserstein Distances , 2019, NeurIPS.

[81]  Gabriel Peyré,et al.  Sample Complexity of Sinkhorn Divergences , 2018, AISTATS.

[82]  Jonathan Weed,et al.  Statistical bounds for entropic optimal transport: sample complexity and the central limit theorem , 2019, NeurIPS.

[83]  Nicolas Boumal,et al.  Efficiently escaping saddle points on manifolds , 2019, NeurIPS.

[84]  Michael I. Jordan,et al.  On Efficient Optimal Transport: An Analysis of Greedy and Accelerated Mirror Descent Algorithms , 2019, ICML.

[85]  A. Jadbabaie,et al.  On Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions , 2020, arXiv.org.

[86]  Bertrand Thirion,et al.  Multi-subject MEG/EEG source imaging with sparse multi-task regression , 2019, NeuroImage.

[87]  Shiqian Ma,et al.  Proximal Gradient Method for Nonsmooth Optimization over the Stiefel Manifold , 2018, SIAM J. Optim..

[88]  Shiqian Ma,et al.  Primal-dual optimization algorithms over Riemannian manifolds: an iteration complexity analysis , 2017, Mathematical Programming.

[89]  Ohad Shamir,et al.  Can We Find Near-Approximately-Stationary Points of Nonsmooth Nonconvex Functions? , 2020, ArXiv.

[90]  Jing Lei Convergence and concentration of empirical measures under Wasserstein distance in unbounded functional spaces , 2018, Bernoulli.

[91]  Saradha Venkatachalapathy,et al.  Predicting cell lineages using autoencoders and optimal transport , 2020, PLoS Comput. Biol..

[92]  Khai Nguyen,et al.  Distributional Sliced-Wasserstein and Applications to Generative Modeling , 2020, ICLR.

[93]  Michael I. Jordan,et al.  On Projection Robust Optimal Transport: Sample Complexity and Model Misspecification , 2020, AISTATS.

[94]  Martin J. Wainwright,et al.  High-Order Langevin Diffusion Yields an Accelerated MCMC Algorithm , 2019, J. Mach. Learn. Res..