Approximate Shannon Sampling in Importance Sampling: Nearly Consistent Finite Particle Estimates

In Bayesian inference, we seek to compute information about random variables, such as moments or quantiles, on the basis of data and prior information. When the distribution of these random variables is complicated, Monte Carlo (MC) sampling is usually required. Importance sampling is a standard MC tool for this problem: one generates a collection of samples from an importance distribution, computes each sample's importance weight as the ratio of the unnormalized target density to the importance density, and then normalizes the weights so that they sum to one. This procedure is asymptotically consistent as the number of MC samples, and hence the number of deltas (particles) that parameterize the density estimate, goes to infinity. However, retaining infinitely many particles is intractable. We therefore propose a scheme that keeps only a finite representative subset of particles and their augmented importance weights, and that is nearly consistent.
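The standard importance sampling procedure described above can be sketched as follows. This is a minimal illustration of self-normalized importance sampling only; it does not implement the particle-subset scheme proposed in the paper. The target density (an unnormalized Gaussian with mean 1 and standard deviation 0.5), the Gaussian importance distribution, and all function names are assumptions chosen for the example.

```python
import math
import random


def unnormalized_target(x):
    # Hypothetical target: Gaussian with mean 1 and std 0.5,
    # deliberately left without its normalizing constant.
    return math.exp(-0.5 * ((x - 1.0) / 0.5) ** 2)


def proposal_pdf(x, scale=2.0):
    # Importance distribution (assumed): zero-mean Gaussian with std `scale`.
    return math.exp(-0.5 * (x / scale) ** 2) / (scale * math.sqrt(2.0 * math.pi))


def self_normalized_is(n, seed=0, scale=2.0):
    """Draw n particles from the proposal and return them with normalized weights."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, scale) for _ in range(n)]
    # Raw importance weight: unnormalized target over proposal density.
    raw = [unnormalized_target(x) / proposal_pdf(x, scale) for x in xs]
    total = sum(raw)
    # Normalization makes the weights sum to one.
    weights = [w / total for w in raw]
    return xs, weights


def estimate_mean(xs, weights):
    # Weighted particle estimate of the first moment.
    return sum(w * x for x, w in zip(xs, weights))


xs, weights = self_normalized_is(20000)
posterior_mean = estimate_mean(xs, weights)
print(posterior_mean)
```

Every sample is stored as a particle here, so memory grows linearly with the number of draws; this is the cost that motivates keeping only a finite representative subset.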
