Convergence rates for optimised adaptive importance samplers

Adaptive importance samplers are adaptive Monte Carlo algorithms for estimating expectations with respect to a target distribution; they adapt themselves over a sequence of iterations in order to obtain better estimators. Although it is straightforward to show that they attain the same $\mathcal{O}(1/\sqrt{N})$ convergence rate as standard importance samplers, where $N$ is the number of Monte Carlo samples, their behaviour over the number of iterations has been left relatively unexplored. In this work, we investigate an adaptation strategy based on convex optimisation which leads to a class of adaptive importance samplers termed optimised adaptive importance samplers (OAIS). These samplers rely on the iterative minimisation of the $\chi^2$-divergence between an exponential-family proposal and the target. The analysed algorithms are closely related to the class of adaptive importance samplers which minimise the variance of the weight function. We first prove non-asymptotic error bounds for the mean squared errors (MSEs) of these algorithms, which depend explicitly on both the number of iterations and the number of samples. The non-asymptotic bounds derived in this paper imply that, when the target belongs to the exponential family, the $L_2$ errors of the optimised samplers converge to the optimal rate of $\mathcal{O}(1/\sqrt{N})$, and the rate of convergence in the number of iterations is provided explicitly. When the target does not belong to the exponential family, the rate of convergence is the same, but the asymptotic $L_2$ error increases by a factor $\sqrt{\rho^\star} > 1$, where $\rho^\star - 1$ is the minimum $\chi^2$-divergence between the target and an exponential-family proposal.
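
The scheme described above can be sketched in code: at each iteration, samples are drawn from the current exponential-family proposal, the $\chi^2$ objective (equivalently, the second moment of the importance weights) is estimated from those samples, and a stochastic gradient step updates the proposal parameter before the final importance-sampling estimate is formed. The snippet below is a minimal illustration under simple assumptions, not the authors' implementation: it uses a one-dimensional Gaussian target, a unit-variance Gaussian proposal whose mean is the adapted parameter, and a decreasing step size; names such as `oais` and the specific schedule are hypothetical choices.

```python
# Minimal sketch of an optimised adaptive importance sampler (OAIS) iteration.
# Assumptions (not from the paper): 1D Gaussian target, unit-variance Gaussian
# proposal with learnable mean theta, step size gamma_t = step0 / sqrt(t + 1).
import numpy as np

rng = np.random.default_rng(0)

target_mean = 1.0  # target pi = N(1, 1), kept normalised for simplicity
log_pi = lambda x: -0.5 * (x - target_mean) ** 2 - 0.5 * np.log(2 * np.pi)

def log_q(x, theta):
    # Exponential-family proposal q_theta = N(theta, 1).
    return -0.5 * (x - theta) ** 2 - 0.5 * np.log(2 * np.pi)

def grad_log_q(x, theta):
    # d/dtheta log q_theta(x) for a unit-variance Gaussian.
    return x - theta

def oais(phi, theta0=0.0, n_iters=200, n_samples=500, step0=0.1):
    """Minimise rho(theta) = E_q[w^2] (the chi^2 objective, up to a constant)
    by stochastic gradient descent, then return the importance-sampling
    estimate of E_pi[phi(X)] under the adapted proposal."""
    theta = theta0
    for t in range(n_iters):
        x = rng.normal(theta, 1.0, size=n_samples)        # sample from q_theta
        w = np.exp(log_pi(x) - log_q(x, theta))           # weights pi / q_theta
        # Monte Carlo gradient: grad rho(theta) = -E_q[w^2 * grad_theta log q_theta(x)]
        grad = -np.mean(w ** 2 * grad_log_q(x, theta))
        theta -= (step0 / np.sqrt(t + 1)) * grad          # gradient step on the proposal
    # Final estimate with the adapted proposal (self-normalised for robustness).
    x = rng.normal(theta, 1.0, size=n_samples)
    w = np.exp(log_pi(x) - log_q(x, theta))
    return np.sum(w * phi(x)) / np.sum(w), theta

est, theta_final = oais(phi=lambda x: x)
print(f"estimate of E_pi[X]: {est:.3f}, adapted proposal mean: {theta_final:.3f}")
```

In this toy setting the target itself lies in the proposal family, so the adapted mean approaches the target mean and the minimum $\chi^2$-divergence is zero ($\rho^\star = 1$); when the target lies outside the family, the same procedure converges to the best attainable proposal and the asymptotic error carries the extra factor $\sqrt{\rho^\star}$ described above.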
