Soft and subspace robust multivariate rank tests based on entropy regularized optimal transport

In this paper, we extend the recently proposed multivariate rank energy distance, which is based on the theory of optimal transport and is used for statistical testing of distributional similarity, to a soft rank energy distance. Because the soft rank energy distance is differentiable, it can in turn be extended to a subspace robust rank energy distance, dubbed the Projected soft-Rank Energy distance, which can be computed via optimization over the Stiefel manifold. We show via experiments that, using the projected soft rank energy, one can trade off detection power against the false alarm rate via projections onto an appropriately selected low-dimensional subspace. We also show the utility of the proposed tests for unsupervised change point detection in multivariate time series data. All code is publicly available at the link provided in the experiment section.
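To make the idea of a soft rank energy concrete, the following is a minimal NumPy sketch and not the authors' released code. It assumes that soft ranks are obtained as the barycentric projection of an entropy-regularized (Sinkhorn) transport plan between the pooled sample and a set of reference points in the unit cube, and that the statistic is the plain energy distance between the two groups of soft ranks. The function names, the choice of pseudo-random uniform reference points, the cost rescaling, and the values of `eps` and `n_iter` are all illustrative assumptions.

```python
import numpy as np

def _logsumexp(a, axis):
    # Numerically stable log-sum-exp along one axis.
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True)), axis=axis)

def soft_ranks(Z, U, eps=0.05, n_iter=300):
    """Soft ranks of the rows of Z with respect to reference points U.

    Assumption: squared Euclidean cost, rescaled to [0, 1] so that eps
    has a comparable meaning across data sets.
    """
    C = ((Z[:, None, :] - U[None, :, :]) ** 2).sum(-1)
    C = C / C.max()
    log_a = -np.log(Z.shape[0])  # uniform weights on the pooled sample
    log_b = -np.log(U.shape[0])  # uniform weights on the reference points
    f = np.zeros(Z.shape[0])
    g = np.zeros(U.shape[0])
    for _ in range(n_iter):  # log-domain Sinkhorn updates of the dual potentials
        f = eps * (log_a - _logsumexp((g[None, :] - C) / eps, axis=1))
        g = eps * (log_b - _logsumexp((f[:, None] - C) / eps, axis=0))
    P = np.exp((f[:, None] + g[None, :] - C) / eps)  # entropic transport plan
    # Barycentric projection; rows are renormalized so that each soft rank
    # is an exact convex combination of reference points.
    return (P @ U) / P.sum(axis=1, keepdims=True)

def _mean_dist(A, B):
    # Mean pairwise Euclidean distance between rows of A and rows of B.
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1).mean()

def soft_rank_energy(X, Y, eps=0.05, seed=0):
    """Energy distance between the soft ranks of X and Y (unscaled V-statistic)."""
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    # Assumption: pseudo-random uniform reference points; a low-discrepancy
    # sequence could be substituted here.
    U = rng.uniform(size=Z.shape)
    R = soft_ranks(Z, U, eps=eps)
    Rx, Ry = R[: len(X)], R[len(X):]
    return 2 * _mean_dist(Rx, Ry) - _mean_dist(Rx, Rx) - _mean_dist(Ry, Ry)

# Usage: the statistic should be larger when the two samples differ.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2))
Y_same = rng.normal(size=(60, 2))
Y_shifted = rng.normal(loc=3.0, size=(60, 2))
stat_same = soft_rank_energy(X, Y_same)
stat_shifted = soft_rank_energy(X, Y_shifted)
```

Every step in this sketch is a smooth function of the inputs, which is the property the abstract relies on when optimizing over projections. A projected variant would apply `soft_rank_energy` to `X @ W` and `Y @ W` for a matrix `W` with orthonormal columns; how `W` is optimized over the Stiefel manifold is not specified here and is left out of the sketch.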
