Two-sample Test using Projected Wasserstein Distance

We develop a projected Wasserstein distance for the two-sample test, a fundamental problem in statistics and machine learning: given two sets of samples, determine whether they are drawn from the same distribution. In particular, we aim to circumvent the curse of dimensionality that afflicts the Wasserstein distance: in high dimensions its testing power diminishes, which is inherently due to the slow concentration of Wasserstein metrics in high-dimensional spaces. A key contribution is to couple the distance with an optimal projection: we seek the low-dimensional linear mapping that maximizes the Wasserstein distance between the projected probability distributions. We characterize the theoretical properties of the two-sample convergence rates for integral probability metrics (IPMs) and for this new distance. Numerical examples validate our theoretical results.
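To make the construction concrete, below is a minimal sketch (not the authors' implementation) of the one-dimensional case, where the projection is a single unit vector and the projected distance coincides with a max-sliced W1. The function name, step size, and iteration count are illustrative choices; the sketch assumes equal sample sizes, so that the 1-D Wasserstein distance between empirical measures reduces to a mean absolute difference of order statistics and admits a closed-form subgradient in the projection direction.

```python
import numpy as np

def max_sliced_w1(X, Y, n_iter=300, lr=0.1, seed=0):
    """Sketch: estimate max_{||theta||=1} W1(theta^T X, theta^T Y) for
    equal-size samples X, Y of shape (n, d) by projected subgradient
    ascent on the unit sphere. In 1-D the optimal coupling matches the
    sorted projections, so W1 is the mean absolute difference of the
    order statistics.
    """
    assert X.shape[0] == Y.shape[0], "sketch assumes equal sample sizes"
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(X.shape[1])
    theta /= np.linalg.norm(theta)
    for _ in range(n_iter):
        u, v = X @ theta, Y @ theta
        i, j = np.argsort(u), np.argsort(v)   # sorting gives the optimal 1-D coupling
        diff = u[i] - v[j]
        # subgradient of (1/n) * sum_k |u_(k) - v_(k)| with the permutation held fixed
        grad = (np.sign(diff)[:, None] * (X[i] - Y[j])).mean(axis=0)
        theta += lr * grad                    # ascent step: maximize the projected distance
        theta /= np.linalg.norm(theta)        # retract back to the unit sphere
    u, v = np.sort(X @ theta), np.sort(Y @ theta)
    return np.abs(u - v).mean(), theta

# Example: 50-dimensional Gaussians differing by a mean shift in one coordinate.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 50))
Y = rng.standard_normal((500, 50))
Y[:, 0] += 1.0
dist, theta = max_sliced_w1(X, Y)
```

Since the maximizing direction is chosen adversarially from the data, the rejection threshold for the resulting test statistic would in practice be calibrated by a permutation test rather than an asymptotic null distribution.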
