Kernel Bayes' Rule

A nonparametric kernel-based method for realizing Bayes' rule is proposed, based on kernel representations of probabilities in reproducing kernel Hilbert spaces. The prior and conditional probabilities are expressed as empirical kernel mean and covariance operators, respectively, and the kernel mean of the posterior distribution is computed in the form of a weighted sample. The kernel Bayes' rule can be applied to a wide variety of Bayesian inference problems: we demonstrate Bayesian computation without likelihood, and filtering with a nonparametric state-space model. A consistency rate for the posterior estimate is established.
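
As a rough illustration of how the posterior kernel mean can take the form of a weighted sample, the following minimal sketch computes weights on the training points from Gram matrices. The abstract gives no formulas, so everything here is an assumption: the Gaussian kernel, the Tikhonov-style regularization with constants `eps` and `delta`, the function names `gaussian_gram` and `kbr_weights`, and the toy data. It follows one commonly stated form of the estimator and is not a definitive implementation.

```python
import numpy as np


def gaussian_gram(a, b, sigma):
    """Gaussian kernel matrix between rows of a (n, d) and rows of b (m, d)."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def kbr_weights(X, Y, U, gamma, y, sigma_x=1.0, sigma_y=1.0, eps=0.1, delta=0.1):
    """Weights w such that sum_i w[i] * k(., X[i]) estimates the posterior kernel mean.

    X, Y  : joint sample (n, dx), (n, dy) representing the conditional of Y given X
    U     : prior sample (m, dx) with weights gamma (m,), representing the prior kernel mean
    y     : observed value (dy,) to condition on
    eps, delta : regularization constants (assumed values, normally tuned)
    """
    n = X.shape[0]
    G_X = gaussian_gram(X, X, sigma_x)
    G_Y = gaussian_gram(Y, Y, sigma_y)

    # Prior kernel mean evaluated at the training points X_i.
    m_prior = gaussian_gram(X, U, sigma_x) @ gamma

    # mu = n (G_X + n eps I)^{-1} m_prior: reweights the joint sample by the prior.
    mu = n * np.linalg.solve(G_X + n * eps * np.eye(n), m_prior)

    # L = diag(mu) G_Y, then w = L (L^2 + delta I)^{-1} diag(mu) k_Y(y).
    L = mu[:, None] * G_Y
    k_y = gaussian_gram(Y, y[None, :], sigma_y)[:, 0]
    return L @ np.linalg.solve(L @ L + delta * np.eye(n), mu * k_y)


# Toy data (hypothetical): X ~ N(0, 1), Y = X + small noise, prior sample U ~ N(0, 1).
rng = np.random.default_rng(0)
n, m = 50, 40
X = rng.normal(size=(n, 1))
Y = X + 0.1 * rng.normal(size=(n, 1))
U = rng.normal(size=(m, 1))
gamma = np.full(m, 1.0 / m)

w = kbr_weights(X, Y, U, gamma, y=np.array([0.5]))

# Posterior expectation of f(x) = x, estimated as a weighted sum over the sample.
posterior_mean_estimate = float(w @ X[:, 0])
```

The weights `w` need not be positive or sum to one; they define an element of the reproducing kernel Hilbert space rather than a probability vector, and expectations of functions in that space are estimated as weighted sums over the sample.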
