Meta-Learning Divergences for Variational Inference

Variational inference (VI) plays an essential role in approximate Bayesian inference due to its computational efficiency and broad applicability. Crucial to the performance of VI is the choice of the associated divergence measure, since VI approximates the intractable distribution by minimizing this divergence. In this paper we propose a meta-learning algorithm to learn the divergence measure suited to the task of interest, automating the design of VI methods. In addition, when our method is deployed in few-shot learning scenarios, we learn the initialization of the variational parameters at no additional cost. We demonstrate that our approach outperforms standard VI on Gaussian mixture approximation, Bayesian neural network regression, image generation with variational autoencoders, and recommender systems with a partial variational autoencoder.
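To make the idea concrete, the sketch below illustrates one way such a scheme could work; it is an illustrative assumption, not the paper's implementation. It uses the Rényi α-divergence family, whose variational Rényi bound L_α = 1/(1−α) log E_q[(p(x,θ)/q(θ))^(1−α)] recovers the standard ELBO as α → 1, so α parameterizes a continuum of divergences. An inner loop fits a Gaussian q to a per-task target (here a 1-D Gaussian mixture, chosen only for illustration) by stochastic gradient ascent on L_α, and an outer loop takes meta-gradients through the unrolled inner optimization to adjust α, MAML-style. The targets, the proxy meta-objective, and all step sizes are assumptions made for the sake of a runnable example.

```python
# A minimal sketch of meta-learning a divergence hyperparameter for VI,
# assuming the Renyi alpha-divergence family and toy 1-D mixture targets.
import jax
import jax.numpy as jnp

def log_target(theta, task_mean):
    # Unnormalized log density of a symmetric two-component Gaussian
    # mixture; each task shifts the modes to +/- task_mean.
    return jnp.logaddexp(-0.5 * (theta - task_mean) ** 2,
                         -0.5 * (theta + task_mean) ** 2)

def vr_bound(params, alpha, task_mean, key, n_samples=64):
    # Monte Carlo estimate of the variational Renyi bound for
    # q = N(mu, softplus(rho)^2); assumes alpha != 1 (alpha -> 1
    # recovers the ELBO).
    mu, rho = params
    sigma = jax.nn.softplus(rho)
    eps = jax.random.normal(key, (n_samples,))
    theta = mu + sigma * eps                      # reparameterized samples
    log_q = -0.5 * ((theta - mu) / sigma) ** 2 - jnp.log(sigma)
    log_w = log_target(theta, task_mean) - log_q  # log importance ratios
    # (1/(1-alpha)) * log E_q[w^(1-alpha)], computed stably in log space.
    return (jax.scipy.special.logsumexp((1.0 - alpha) * log_w)
            - jnp.log(n_samples)) / (1.0 - alpha)

def inner_fit(alpha, task_mean, key, steps=20, lr=0.1):
    # Inner loop: unrolled SGD on the variational parameters; unrolling
    # keeps the fitted parameters differentiable w.r.t. alpha.
    params = (jnp.array(0.0), jnp.array(0.0))
    loss_grad = jax.grad(lambda p, k: -vr_bound(p, alpha, task_mean, k))
    for _ in range(steps):
        key, sub = jax.random.split(key)
        grads = loss_grad(params, sub)
        params = tuple(p - lr * g for p, g in zip(params, grads))
    return params

def meta_loss(alpha, task_mean, key):
    # Outer objective: negative log q at one of the target's modes, a
    # crude stand-in for held-out predictive performance.
    mu, rho = inner_fit(alpha, task_mean, key)
    sigma = jax.nn.softplus(rho)
    return 0.5 * ((task_mean - mu) / sigma) ** 2 + jnp.log(sigma)

key = jax.random.PRNGKey(0)
alpha = jnp.array(0.5)
for _ in range(50):
    key, k_task, k_fit = jax.random.split(key, 3)
    task_mean = 1.0 + jax.random.uniform(k_task)  # sample a task
    g = jax.grad(meta_loss)(alpha, task_mean, k_fit)
    # Clip to stay away from the alpha = 1 singularity of the bound.
    alpha = jnp.clip(alpha - 0.05 * g, -2.0, 0.95)
print("meta-learned alpha:", float(alpha))
```

The design choice being exercised is that α controls mode-seeking versus mass-covering behaviour of the fit, so different task distributions can prefer different α; meta-learning selects it from data across tasks rather than by hand.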
