On Projection Robust Optimal Transport: Sample Complexity and Model Misspecification

Optimal transport (OT) distances are increasingly used as loss functions for statistical inference, notably in the learning of generative models and in supervised learning. Yet the behavior of minimum Wasserstein estimators is poorly understood, particularly in high-dimensional regimes or under model misspecification. In this work we adopt the viewpoint of projection robust (PR) OT, which seeks to maximize the OT cost between two measures by choosing a $k$-dimensional subspace onto which they can be projected. Our first contribution is to establish several fundamental statistical properties of PR Wasserstein distances, complementing and improving previous literature that has been restricted to one-dimensional and well-specified cases. Next, we propose the integral PR Wasserstein (IPRW) distance as an alternative to the PRW distance, obtained by averaging rather than optimizing over subspaces. Our sample complexity bounds help explain why both PRW and IPRW distances outperform Wasserstein distances empirically in high-dimensional inference tasks. Finally, we consider parametric inference using the PRW distance. We provide asymptotic guarantees for two types of minimum PRW estimators and formulate a central limit theorem for the max-sliced Wasserstein estimator under model misspecification. To enable our analysis of PRW with projection dimension larger than one, we devise a novel combination of variational analysis and statistical theory.
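
To make the two quantities concrete, a standard formalization (the notation below, including the choice of order $p$ and the uniform measure $\sigma$ on the Grassmannian $\mathcal{G}(k,d)$, is illustrative and may differ from the paper's) is

$$\mathcal{P}_k(\mu,\nu) \;=\; \sup_{E \in \mathcal{G}(k,d)} W_p\big(\pi^E_{\#}\mu,\, \pi^E_{\#}\nu\big), \qquad \overline{\mathcal{P}}_k(\mu,\nu) \;=\; \Big(\int_{\mathcal{G}(k,d)} W_p^p\big(\pi^E_{\#}\mu,\, \pi^E_{\#}\nu\big)\, d\sigma(E)\Big)^{1/p},$$

where $\pi^E$ is the orthogonal projection onto the $k$-dimensional subspace $E$ and $W_p$ denotes the order-$p$ Wasserstein distance. For $k = 1$, the supremum recovers the max-sliced Wasserstein distance and the average recovers the sliced Wasserstein distance.

The short numerical sketch below is illustrative rather than taken from the paper: the function name sliced_w1, the Monte-Carlo search over random directions, and the Gaussian toy data are all assumptions introduced only to make the $k = 1$ case concrete. In practice, the PRW literature typically optimizes the projection directly (e.g. via Riemannian methods on the Stiefel manifold) rather than by random search, so the maximum over sampled directions should be read as a lower bound on the max-sliced distance.

import numpy as np

def sliced_w1(x, y, n_dirs=500, seed=0):
    """Monte-Carlo approximation of the k = 1 projection robust (max over
    directions) and integral projection robust (average over directions)
    Wasserstein-1 distances between equal-size empirical measures x, y of shape (n, d)."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    dirs = rng.normal(size=(n_dirs, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # random unit directions on the sphere
    costs = np.empty(n_dirs)
    for i, u in enumerate(dirs):
        # 1D Wasserstein-1 between the projected samples: sort both projections
        # and average the absolute differences of the order statistics.
        px, py = np.sort(x @ u), np.sort(y @ u)
        costs[i] = np.mean(np.abs(px - py))
    # max over sampled directions lower-bounds the k = 1 PRW (max-sliced W1);
    # the average estimates the k = 1 IPRW (sliced W1).
    return costs.max(), costs.mean()

# Example: two samples in d = 30 that differ only along one coordinate direction.
rng = np.random.default_rng(1)
x = rng.normal(size=(1000, 30))
y = rng.normal(size=(1000, 30))
y[:, 0] += 2.0
prw1_hat, iprw1_hat = sliced_w1(x, y)
print(f"max-sliced W1 estimate: {prw1_hat:.2f}, sliced W1 estimate: {iprw1_hat:.2f}")

On such data, where the two measures differ only along a single direction, the max-sliced estimate stays close to the one-dimensional discrepancy while the averaged (sliced) estimate is diluted across the $d$ directions, which is the kind of high-dimensional behavior the sample complexity bounds above are meant to capture.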
