Kernel Mean Estimation via Spectral Filtering

Estimating the kernel mean in a reproducing kernel Hilbert space (RKHS) is central to kernel methods: classical approaches use it (e.g., when centering a kernel PCA matrix), and it forms the core inference step of modern kernel methods (e.g., kernel-based non-parametric tests) that rely on embedding probability distributions in RKHSs. Previous work [1] has shown that shrinkage can help in constructing "better" estimators of the kernel mean than the empirical estimator. The present paper studies the consistency and admissibility of the estimators in [1] and proposes a wider class of shrinkage estimators that improve upon the empirical estimator by considering appropriate basis functions. Using the kernel PCA basis, we show that some of these estimators can be constructed using spectral filtering algorithms, which are shown to be consistent under some technical assumptions. Our theoretical analysis also reveals a fundamental connection to the kernel-based supervised learning framework. The proposed estimators are simple to implement and perform well in practice.
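To make the idea of a spectrally filtered kernel mean concrete, the following is a minimal sketch, not the paper's implementation. It assumes a Gaussian kernel and a Tikhonov-type filter, under which the eigenvalues of the Gram matrix K are damped by the factor gamma / (gamma + lambda) in the kernel PCA basis, giving weights (K + lambda I)^{-1} K 1_n with 1_n = (1/n, ..., 1/n). The exact scaling of lambda, the choice of filter, and all function names here are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel matrix between rows of A and rows of B."""
    sq_a = np.sum(A**2, axis=1)[:, None]
    sq_b = np.sum(B**2, axis=1)[None, :]
    d2 = np.maximum(sq_a + sq_b - 2.0 * A @ B.T, 0.0)  # guard against round-off
    return np.exp(-d2 / (2.0 * bandwidth**2))

def spectral_filter_weights(K, lam):
    """Weights of a Tikhonov-filtered kernel mean estimate (assumed form).

    The estimate is sum_i w_i k(x_i, .). In the eigenbasis of K, each
    component of the empirical weights 1/n is shrunk by gamma / (gamma + lam).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    n = K.shape[0]
    evals, evecs = np.linalg.eigh(K)
    evals = np.clip(evals, 0.0, None)       # K is PSD up to round-off
    factors = evals / (evals + lam)         # Tikhonov filter factors in [0, 1)
    ones_n = np.full(n, 1.0 / n)            # empirical estimator weights
    return evecs @ (factors * (evecs.T @ ones_n))

def evaluate_kernel_mean(weights, X_train, X_query, bandwidth=1.0):
    """Evaluate the weighted kernel mean estimate at the query points."""
    return gaussian_kernel(X_query, X_train, bandwidth) @ weights

# Illustrative usage on synthetic data (hypothetical setup).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
K = gaussian_kernel(X, X)
w_empirical = np.full(len(X), 1.0 / len(X))
w_filtered = spectral_filter_weights(K, lam=0.1)
print("squared RKHS norm, empirical:", w_empirical @ K @ w_empirical)
print("squared RKHS norm, filtered: ", w_filtered @ K @ w_filtered)
```

Because every filter factor lies in [0, 1), the filtered estimate never has a larger RKHS norm than the empirical one, which is the shrinkage effect in this sketch. Other filters (for example truncated eigenvalue decompositions or iterative schemes) would change only the `factors` line.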

[1] Guy Lever, et al. Conditional mean embeddings as regressors, 2012, ICML.

[2] Le Song, et al. A Hilbert Space Embedding for Distributions, 2007, Discovery Science.

[3] Anthony Widjaja, et al. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, 2003, IEEE Transactions on Neural Networks.

[4] Bernhard Schölkopf, et al. A Kernel Method for the Two-Sample-Problem, 2006, NIPS.

[5] H. Engl, et al. Regularization of Inverse Problems, 1996.

[6] Nello Cristianini, et al. Kernel Methods for Pattern Analysis, 2003, ICTAI.

[7] JooSeuk Kim, et al. Robust kernel density estimation, 2012.

[8] Bernhard Schölkopf, et al. Domain Generalization via Invariant Feature Representation, 2013, ICML.

[9] Bernhard Schölkopf, et al. One-Class Support Measure Machines for Group Anomaly Detection, 2013, UAI.

[10] Lorenzo Rosasco, et al. On regularization algorithms in learning theory, 2007, J. Complex.

[11] Alexander J. Smola, et al. Hilbert space embeddings of conditional distributions with applications to dynamical systems, 2009, ICML '09.

[12] Le Song, et al. Tailoring density estimation via reproducing kernel moment matching, 2008, ICML '08.

[13] Lorenzo Rosasco, et al. Vector Field Learning via Spectral Filtering, 2010, ECML/PKDD.

[14] Bernhard Schölkopf, et al. Hilbert Space Embeddings and Metrics on Probability Measures, 2009, J. Mach. Learn. Res.

[15] A. Verri, et al. Spectral Methods for Regularization in Learning Theory, 2006.

[16] Bernhard Schölkopf, et al. Kernel Mean Estimation and Stein Effect, 2013, ICML.

[17] Le Song, et al. Robust Low Rank Kernel Embeddings of Multivariate Distributions, 2013, NIPS.

[18] John Shawe-Taylor, et al. Smooth Operators, 2013, ICML.

[19] Mikhail Belkin, et al. On Learning with Integral Operators, 2010, J. Mach. Learn. Res.

[20] Bernhard Schölkopf, et al. Learning from Distributions via Support Measure Machines, 2012, NIPS.

[21] Bernhard Schölkopf, et al. Nonlinear Component Analysis as a Kernel Eigenvalue Problem, 1998, Neural Computation.

[22] Lorenzo Rosasco, et al. Spectral Algorithms for Supervised Learning, 2008, Neural Computation.

[23] Le Song, et al. Kernel Bayes' rule: Bayesian inference with positive definite kernels, 2013, J. Mach. Learn. Res.

[24] Gene H. Golub, et al. Generalized cross-validation as a method for choosing a good ridge parameter, 1979, Milestones in Matrix Computation.

[25] C. Stein. Estimation of the Mean of a Multivariate Normal Distribution, 1981.

[26] A. Berlinet, et al. Reproducing kernel Hilbert spaces in probability and statistics, 2004.

[27] Clayton D. Scott, et al. Robust kernel density estimation, 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[28] Andreas Christmann, et al. Support vector machines, 2008, Data Mining and Knowledge Discovery Handbook.

[29] Lorenzo Rosasco, et al. Learning from Examples as an Inverse Problem, 2005, J. Mach. Learn. Res.

[30] E. Mammen, et al. Optimal spatial adaptation to inhomogeneous smoothness: an approach based on kernel estimates with variable bandwidth selectors, 1997.

[31] E. D. Vito, et al. Learning Sets with Separating Kernels, 2012, 1204.3573.