Characteristic and Universal Tensor Product Kernels

Maximum mean discrepancy (MMD), also known in statistics as energy distance or N-distance, and the Hilbert-Schmidt independence criterion (HSIC), known in statistics specifically as distance covariance, are among the most popular and successful approaches to quantifying the difference and the independence of random variables, respectively. Thanks to their kernel-based foundations, MMD and HSIC are applicable to a wide variety of domains. Despite their tremendous success, relatively little is known about when HSIC characterizes independence and when MMD with a tensor product kernel can discriminate between probability distributions. In this paper, we answer these questions by studying various notions of the characteristic property of the tensor product kernel.
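As a rough illustration of the two quantities discussed above, the following sketch computes biased (V-statistic) estimates of squared MMD and of HSIC from samples. It is not taken from the paper: the Gaussian kernel, the bandwidth defaults, and the function names (gaussian_gram, mmd2_biased, hsic_biased) are assumptions chosen for the example. The HSIC estimate uses the standard trace form, which coincides with the squared MMD between the empirical joint distribution and the product of the empirical marginals under the tensor product of the two kernels.

```python
import numpy as np


def gaussian_gram(x, y, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel between the rows of x and the rows of y.

    The Gaussian kernel and the bandwidth value are illustrative assumptions.
    """
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd2_biased(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples x and y."""
    kxx = gaussian_gram(x, x, bandwidth)
    kyy = gaussian_gram(y, y, bandwidth)
    kxy = gaussian_gram(x, y, bandwidth)
    return kxx.mean() + kyy.mean() - 2.0 * kxy.mean()


def hsic_biased(x, y, bandwidth_x=1.0, bandwidth_y=1.0):
    """Biased HSIC estimate trace(K H L H) / n^2 for paired samples (x_i, y_i).

    K and L are Gram matrices on the two components; the implied kernel on
    pairs is their tensor product k(x, x') * l(y, y').
    """
    n = x.shape[0]
    k = gaussian_gram(x, x, bandwidth_x)
    l = gaussian_gram(y, y, bandwidth_y)
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(k @ h @ l @ h) / n**2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 1))
    y_dep = x + 0.1 * rng.normal(size=(200, 1))  # strongly dependent on x
    y_ind = rng.normal(size=(200, 1))            # independent of x
    print("HSIC (dependent):  ", hsic_biased(x, y_dep))
    print("HSIC (independent):", hsic_biased(x, y_ind))
    print("MMD^2 (x vs x + 1):", mmd2_biased(x, x + 1.0))
```

With a fixed sample, the dependent pair should give a visibly larger HSIC value than the independent pair. Whether a population value of zero implies independence is the question the paper addresses, and the answer depends on the properties of the component kernels.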
