Conditional mean embeddings as regressors - supplementary

We demonstrate an equivalence between reproducing kernel Hilbert space (RKHS) embeddings of conditional distributions and vector-valued regressors. This connection introduces a natural regularized loss function which the RKHS embeddings minimise, providing an intuitive understanding of the embeddings and a justification for their use. Furthermore, the equivalence allows the application of vector-valued regression methods and results to the problem of learning conditional distributions. Using this link, we derive a sparse version of the embedding by considering alternative formulations. Moreover, by applying convergence results for vector-valued regression to the embedding problem, we derive upper convergence rates of O(\log(n)/n), compared with the current state-of-the-art rate of O(n^{-1/4}), which hold under milder and more intuitive assumptions. These upper rates match the minimax lower rates up to a logarithmic factor, showing that the embedding method achieves nearly optimal rates. We study our sparse embedding algorithm on a reinforcement learning task, where it achieves significantly better sparsity than an incomplete Cholesky decomposition.
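
For concreteness, a minimal sketch of the regression view in standard conditional-mean-embedding notation (the symbols, the operator-valued kernel, and the exact form of the regularizer below are our own assumptions rather than quotations from the supplementary): given samples (x_1, y_1), ..., (x_n, y_n), input and output kernels k and l with feature maps \phi and \psi, and the vector-valued RKHS induced by \Gamma(x, x') = k(x, x')\,\mathrm{Id}, the embedding \mu(x) = \mathbb{E}[\psi(Y) \mid X = x] is estimated as the minimiser of a regularized least-squares surrogate, which admits the familiar kernel-ridge closed form:

\[
\hat{\mu} \in \arg\min_{\mu \in \mathcal{H}_\Gamma} \; \frac{1}{n} \sum_{i=1}^{n} \big\| \psi(y_i) - \mu(x_i) \big\|_{\mathcal{H}_l}^{2} + \lambda \| \mu \|_{\mathcal{H}_\Gamma}^{2},
\qquad
\hat{\mu}(x) = \sum_{i=1}^{n} \alpha_i(x)\, \psi(y_i),
\quad
\alpha(x) = (K + n\lambda I)^{-1} k_x,
\]

where K_{ij} = k(x_i, x_j) and (k_x)_i = k(x_i, x). Sparse variants of the kind studied here can then be obtained by replacing or augmenting the squared-norm regularizer in this loss.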
