A Measure-Theoretic Approach to Kernel Conditional Mean Embeddings

We present a new operator-free, measure-theoretic definition of the conditional mean embedding as a random variable taking values in a reproducing kernel Hilbert space. While the kernel mean embedding of marginal distributions has been defined rigorously, the existing operator-based approach to the conditional version lacks a rigorous definition and depends on strong assumptions that hinder its analysis. Our definition imposes none of the assumptions that the operator-based counterpart requires. We derive a natural regression interpretation to obtain empirical estimates, and provide a thorough analysis of its properties, including universal consistency. As natural by-products, we obtain the conditional analogues of the Maximum Mean Discrepancy and the Hilbert-Schmidt Independence Criterion, and demonstrate their behaviour via simulations.
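To make the regression interpretation concrete, the following is a minimal sketch of a regularised least-squares estimate of the conditional mean embedding. It assumes Gaussian kernels on both X and Y and the standard kernel ridge form of the weights; the function names, bandwidths, regularisation value and toy data are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth=1.0):
    """Gram matrix with entries exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def cme_weights(X_train, X_query, reg=1e-3, bandwidth=1.0):
    """Kernel ridge weights W of shape (n, m).

    The estimated embedding at query point x_j is sum_i W[i, j] * l(y_i, .),
    where l is the kernel on Y. The n * reg scaling is one common convention
    (an assumption here).
    """
    n = X_train.shape[0]
    K = gaussian_kernel(X_train, X_train, bandwidth)
    K_query = gaussian_kernel(X_train, X_query, bandwidth)
    return np.linalg.solve(K + n * reg * np.eye(n), K_query)


# Toy data (hypothetical): Y = sin(X) + small Gaussian noise.
rng = np.random.default_rng(0)
n = 300
X = rng.uniform(-3.0, 3.0, size=(n, 1))
Y = np.sin(X) + 0.1 * rng.standard_normal((n, 1))

X_query = np.array([[0.0], [1.5]])
W = cme_weights(X, X_query, reg=1e-3, bandwidth=0.5)

# Evaluate the estimated embedding at test locations y0:
# mu_hat(x)(y0) = sum_i W[i] * l(y_i, y0), an estimate of E[l(Y, y0) | X = x].
Y_grid = np.array([[0.0], [2.0]])
L_grid = gaussian_kernel(Y, Y_grid, bandwidth=0.5)  # shape (n, 2)
embedding_at_grid = W.T @ L_grid  # rows: query x, columns: y0

# Sanity check only: the same weights applied to y_i give the kernel ridge
# regression estimate of E[Y | X = x].
cond_mean = (Y.T @ W).ravel()
```

Since the estimated embedding is a weighted sum of kernel functions centred at the observed y_i, inner products with any function f in the RKHS on Y reduce to the weighted sum of f(y_i), which is why a single linear solve suffices for all query points.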
