Convergence and concentration of empirical measures under Wasserstein distance in unbounded functional spaces

We provide upper bounds on the expected Wasserstein distance between a probability measure and its empirical version, generalizing recent results for finite-dimensional Euclidean spaces and bounded functional spaces. The generalization covers Euclidean spaces of large dimension, with optimal dependence on the dimension. Our method also covers the important case of Gaussian processes in separable Hilbert spaces, with rate-optimal upper bounds for functional data distributions whose coordinates decay geometrically or polynomially. Moreover, our bounds on the expected value can be combined with mean-concentration results to yield improved exponential tail probability bounds for the Wasserstein error of empirical measures under a Bernstein-type tail condition.
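
For orientation, the quantity being bounded can be written out using the standard definitions. This is a sketch in assumed notation, none of which is fixed by the abstract: the symbols $\mu$, $\hat\mu_n$, $X_i$, $d$, $\Pi(\cdot,\cdot)$ and the order $p \ge 1$ are illustrative choices.

```latex
% Assumed notation (not taken from the abstract):
%   (\mathcal{X}, d)  metric space, e.g. a separable Hilbert space with its norm distance
%   \mu               probability measure on \mathcal{X}
%   X_1, \dots, X_n   i.i.d. draws from \mu
%   \Pi(\mu, \nu)     set of couplings of \mu and \nu
\[
  \hat\mu_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i},
  \qquad
  W_p(\mu, \nu) = \left( \inf_{\pi \in \Pi(\mu, \nu)}
      \int_{\mathcal{X} \times \mathcal{X}} d(x, y)^p \, \mathrm{d}\pi(x, y) \right)^{1/p},
\]
\[
  \text{quantity of interest:} \qquad \mathbb{E}\, W_p(\hat\mu_n, \mu).
\]
```

In this notation, the expected-value bounds control $\mathbb{E}\, W_p(\hat\mu_n, \mu)$ as a function of $n$, while the tail bounds control how far $W_p(\hat\mu_n, \mu)$ deviates from that expectation.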
