MONK - Outlier-Robust Mean Embedding Estimation by Median-of-Means

Mean embeddings provide an extremely flexible and powerful tool in machine learning and statistics to represent probability distributions and define a semi-metric (MMD, maximum mean discrepancy; also called N-distance or energy distance), with numerous successful applications. The representation is constructed as the expectation of the feature map defined by a kernel. Being a mean, however, its classical empirical estimator can be arbitrarily severely affected by even a single outlier when the features are unbounded. To the best of our knowledge, even the consistency of the few existing techniques that try to alleviate this serious sensitivity bottleneck is unknown. In this paper, we show how the recently emerged principle of median-of-means can be used to design estimators for kernel mean embedding and MMD with strong resistance to outliers and optimal sub-Gaussian deviation bounds under mild assumptions.
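
The median-of-means principle can be illustrated in the simplest scalar setting. The sketch below is not the estimator proposed in the paper, which works with kernel feature maps in a Hilbert space; it only shows the underlying idea of splitting the sample into blocks, averaging within each block, and taking the median of the block averages. The function name `median_of_means`, the block count, and the synthetic data are assumptions chosen for illustration.

```python
import random
import statistics


def median_of_means(values, num_blocks, seed=0):
    """Scalar median-of-means (illustrative sketch, not the paper's estimator).

    Shuffle the sample, split it into `num_blocks` disjoint blocks,
    average each block, and return the median of the block averages.
    """
    if not 1 <= num_blocks <= len(values):
        raise ValueError("num_blocks must be between 1 and len(values)")
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    # Strided slicing yields num_blocks disjoint, non-empty blocks.
    blocks = [shuffled[i::num_blocks] for i in range(num_blocks)]
    block_means = [statistics.fmean(block) for block in blocks]
    return statistics.median(block_means)


# Synthetic data (assumption): 999 standard normal draws plus one huge outlier.
data_rng = random.Random(1)
sample = [data_rng.gauss(0.0, 1.0) for _ in range(999)] + [1e9]

plain_mean = statistics.fmean(sample)          # dominated by the outlier
mom = median_of_means(sample, num_blocks=11)   # outlier spoils only one block

print(f"empirical mean:  {plain_mean:.3f}")
print(f"median-of-means: {mom:.3f}")
```

A single outlier can contaminate at most one block, so as long as more than half of the blocks are clean the median of the block averages stays near the true mean, whereas the plain average is pulled arbitrarily far.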
