A Pseudo-Metric between Probability Distributions based on Depth-Trimmed Regions

The design of a metric between probability distributions is a longstanding problem motivated by numerous applications in Machine Learning. Focusing on continuous probability distributions on the Euclidean space R^d, we introduce a novel pseudo-metric between probability distributions by leveraging the extension of univariate quantiles to multivariate spaces. Data depth is a nonparametric statistical tool that measures the centrality of any element x ∈ R^d with respect to (w.r.t.) a probability distribution or a data set. It is a natural median-oriented extension of the cumulative distribution function (cdf) to the multivariate case. Its upper-level sets, the depth-trimmed regions, therefore give rise to a definition of multivariate quantiles. The new pseudo-metric relies on the average of the Hausdorff distance between the depth-based quantile regions w.r.t. each distribution. We establish its good behavior w.r.t. major transformation groups as well as its ability to factor out translations. Robustness, an appealing feature of this pseudo-metric, is studied through the finite-sample breakdown point. Moreover, we propose an efficient approximation method whose time complexity is linear in both the size of the data set and its dimension. The quality of this approximation and the performance of the proposed approach are illustrated in numerical experiments.
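To make the construction concrete, the following is a minimal sketch of the idea of averaging, over quantile levels, a Hausdorff-type distance between quantile regions. It is not the authors' algorithm. It rests on several assumptions that do not come from the text above: each region is summarized along random unit directions by the empirical (1 - alpha)-quantile of the projected sample (a surrogate for the support function of a halfspace-type region), the levels alpha are taken on a grid in [eps, 0.5], and the function name `quantile_region_distance` and all of its parameters are hypothetical.

```python
import numpy as np


def quantile_region_distance(X, Y, n_dirs=200, n_levels=20, eps=0.1, p=2, seed=0):
    """Simplified proxy for an averaged Hausdorff distance between quantile regions.

    Assumption: the region at level alpha is summarized in direction u by the
    empirical (1 - alpha)-quantile of the projections <u, x>. This is an
    illustrative surrogate, not the exact depth-trimmed region.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    d = X.shape[1]

    # Random directions on the unit sphere, shared by both samples.
    U = rng.standard_normal((n_dirs, d))
    U /= np.linalg.norm(U, axis=1, keepdims=True)

    # Project both samples onto every direction: shape (n_points, n_dirs).
    proj_x = X @ U.T
    proj_y = Y @ U.T

    # Grid of levels (assumed range [eps, 0.5]).
    alphas = np.linspace(eps, 0.5, n_levels)

    # Directional quantiles: shape (n_levels, n_dirs).
    qx = np.quantile(proj_x, 1.0 - alphas, axis=0)
    qy = np.quantile(proj_y, 1.0 - alphas, axis=0)

    # For convex compact sets, the Hausdorff distance is the supremum over unit
    # directions of the gap between support functions; here the supremum is
    # replaced by a maximum over the sampled directions.
    per_level = np.max(np.abs(qx - qy), axis=1)

    # L_p average over the levels.
    return float(np.mean(per_level ** p) ** (1.0 / p))


# Usage on synthetic data: a sample compared with itself and with a shifted copy.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 2))
shift = np.array([3.0, 0.0])
d_same = quantile_region_distance(X, X)
d_shift = quantile_region_distance(X, X + shift)
d_shift_rev = quantile_region_distance(X + shift, X)
```

In this sketch, a pure translation moves every directional quantile by the inner product of the direction with the shift, so the value for a shifted copy cannot exceed the norm of the shift (up to rounding) and approaches it as more directions are sampled. The cost is dominated by the projections and the per-direction quantile computations.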
