Paper information - A Measure-Theoretic Approach to Kernel Conditional Mean Embeddings

We present a new operator-free, measure-theoretic definition of the conditional mean embedding as a random variable taking values in a reproducing kernel Hilbert space. While the kernel mean embedding of marginal distributions has been defined rigorously, the existing operator-based approach to the conditional version lacks a rigorous definition and depends on strong assumptions that hinder its analysis. Our definition does not impose any of the assumptions that the operator-based counterpart requires. We derive a natural regression interpretation to obtain empirical estimates, and provide a thorough analysis of their properties, including universal consistency. As natural by-products, we obtain the conditional analogues of the Maximum Mean Discrepancy and the Hilbert-Schmidt Independence Criterion, and demonstrate their behaviour via simulations.
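
The regression interpretation mentioned in the abstract can be illustrated with a minimal sketch. It assumes Gaussian kernels and a ridge-regularised least-squares fit, which is the standard form of an empirical conditional mean embedding estimate; it is not taken from the authors' code, and the function names, bandwidth, and regularisation value are hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_gram(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a (n, d) and b (m, d)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def cme_weights(x_train, x_query, reg=1e-3, bandwidth=1.0):
    """Return W of shape (n, m): the embedding estimate at query j is
    sum_i W[i, j] * l(y_i, .), where l is the kernel on the output space."""
    n = x_train.shape[0]
    k_xx = gaussian_gram(x_train, x_train, bandwidth)
    k_xq = gaussian_gram(x_train, x_query, bandwidth)
    # Ridge-regularised solve: (K_X + n * reg * I)^{-1} k_X(x_query).
    return np.linalg.solve(k_xx + n * reg * np.eye(n), k_xq)


def conditional_expectation(f_values, weights):
    """Estimate E[f(Y) | X = x] at each query point from f(y_1), ..., f(y_n)."""
    return weights.T @ f_values


# Toy data (an assumption for illustration): Y = sin(X) + small noise.
rng = np.random.default_rng(0)
x_train = rng.uniform(-3.0, 3.0, size=(200, 1))
y_train = np.sin(x_train[:, 0]) + 0.1 * rng.standard_normal(200)

x_query = np.array([[-1.0], [0.0], [1.0]])
weights = cme_weights(x_train, x_query, reg=1e-3, bandwidth=0.5)

# With f(y) = y, the estimate coincides with kernel ridge regression of Y on X.
estimate = conditional_expectation(y_train, weights)
```

The same weights can be reused for any function of Y evaluated at the training outputs, which is what makes the embedding view convenient: the linear solve depends only on the inputs, not on the function whose conditional expectation is being estimated.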

Krikamol Muandet | Junhyung Park
