Two-sample Test using Projected Wasserstein Distance

We develop a projected Wasserstein distance for the two-sample test, a fundamental problem in statistics and machine learning: given two sets of samples, determine whether they are drawn from the same distribution. In particular, we aim to circumvent the curse of dimensionality that afflicts the Wasserstein distance: in high dimensions its testing power diminishes, which is inherently due to the slow concentration of Wasserstein metrics in high-dimensional spaces. A key contribution is to couple the distance with an optimal projection: we seek the low-dimensional linear mapping that maximizes the Wasserstein distance between the projected probability distributions. We characterize the theoretical properties of the two-sample convergence rates for integral probability metrics (IPMs) and for this new distance. Numerical examples validate our theoretical results.
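To make the construction concrete, below is a minimal sketch (not the authors' implementation) of the one-dimensional case, where the projection is a single unit vector and the projected distance coincides with a max-sliced W1. The function name, step size, and iteration count are illustrative choices; the sketch assumes equal sample sizes, so that the 1-D Wasserstein distance between empirical measures reduces to a mean absolute difference of order statistics and admits a closed-form subgradient in the projection direction.

```python
import numpy as np

def max_sliced_w1(X, Y, n_iter=300, lr=0.1, seed=0):
    """Sketch: estimate max_{||theta||=1} W1(theta^T X, theta^T Y) for
    equal-size samples X, Y of shape (n, d) by projected subgradient
    ascent on the unit sphere. In 1-D the optimal coupling matches the
    sorted projections, so W1 is the mean absolute difference of the
    order statistics.
    """
    assert X.shape[0] == Y.shape[0], "sketch assumes equal sample sizes"
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(X.shape[1])
    theta /= np.linalg.norm(theta)
    for _ in range(n_iter):
        u, v = X @ theta, Y @ theta
        i, j = np.argsort(u), np.argsort(v)   # sorting gives the optimal 1-D coupling
        diff = u[i] - v[j]
        # subgradient of (1/n) * sum_k |u_(k) - v_(k)| with the permutation held fixed
        grad = (np.sign(diff)[:, None] * (X[i] - Y[j])).mean(axis=0)
        theta += lr * grad                    # ascent step: maximize the projected distance
        theta /= np.linalg.norm(theta)        # retract back to the unit sphere
    u, v = np.sort(X @ theta), np.sort(Y @ theta)
    return np.abs(u - v).mean(), theta

# Example: 50-dimensional Gaussians differing by a mean shift in one coordinate.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 50))
Y = rng.standard_normal((500, 50))
Y[:, 0] += 1.0
dist, theta = max_sliced_w1(X, Y)
```

Since the maximizing direction is chosen adversarially from the data, the rejection threshold for the resulting test statistic would in practice be calibrated by a permutation test rather than an asymptotic null distribution.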
