Variational Inference with Tail-adaptive f-Divergence

Variational inference with α-divergences has been widely used in modern probabilistic machine learning. Compared to the Kullback-Leibler (KL) divergence, a major advantage of α-divergences (with positive α values) is their mass-covering property. However, estimating and optimizing α-divergences requires importance sampling, whose estimates can have extremely large or even infinite variance because the importance weights are heavy-tailed. In this paper, we propose a new class of tail-adaptive f-divergences that adaptively change the convex function f with the tail of the importance weights, in a way that theoretically guarantees finite moments while simultaneously achieving mass-covering properties. We test our method on Bayesian neural networks and on deep reinforcement learning, where it is used to improve the recent soft actor-critic (SAC) algorithm (Haarnoja et al., 2018). Our results show that our approach yields significant advantages over existing methods based on the classical KL and α-divergences.
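
The abstract does not specify how the adaptive f is constructed, so the following is only a minimal illustrative sketch of one ingredient such a scheme could rely on: estimating the tail index of the importance weights with the Hill estimator [1] and capping the power applied to the weights so that the corresponding moment stays finite. The function names, the safety factor, and the cap below are hypothetical choices for illustration, not the paper's algorithm.

```python
import numpy as np

def hill_tail_index(weights, k=100):
    """Hill (1975) estimator of the tail index of positive importance weights.

    Larger values indicate lighter tails; moments of order strictly below the
    tail index are finite for a Pareto-type tail.
    """
    w = np.sort(np.asarray(weights, dtype=float))[::-1]  # descending order statistics
    k = min(k, len(w) - 1)
    return 1.0 / np.mean(np.log(w[:k] / w[k]))           # 1 / mean excess log-ratio

def adaptive_power(weights, safety=0.5, max_power=2.0):
    """Hypothetical rule: pick an exponent beta for w**beta so that the
    beta-th moment of the weights is (estimated to be) finite."""
    return float(np.clip(safety * hill_tail_index(weights), 0.0, max_power))

# Toy usage: heavy-tailed importance weights w_i = p(x_i) / q(x_i), x_i ~ q
rng = np.random.default_rng(0)
w = rng.pareto(1.5, size=10_000) + 1.0   # Pareto weights with tail index ~1.5
print(f"tail index ~ {hill_tail_index(w):.2f}, adapted power ~ {adaptive_power(w):.2f}")
```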

[1] B. M. Hill. A Simple General Approach to Inference About the Tail of a Distribution, 1975, The Annals of Statistics.

[2] Tom Minka. Expectation Propagation for approximate Bayesian inference, 2001, UAI.

[3] F. Österreicher. f-Divergences: Representation Theorem and Metrizability, 2003.

[4] Imre Csiszár, et al. Information Theory and Statistics: A Tutorial, 2004, Found. Trends Commun. Inf. Theory.

[5] Michael I. Jordan, et al. An Introduction to Variational Methods for Graphical Models, 1999, Machine Learning.

[6] Thomas P. Minka. Divergence measures and message passing, 2005.

[7] Ole Winther, et al. Expectation Consistent Approximate Inference, 2005, J. Mach. Learn. Res.

[8] Shie Mannor, et al. A Tutorial on the Cross-Entropy Method, 2005, Ann. Oper. Res.

[9] Igor Vajda, et al. On Divergences and Informations in Statistics and Information Theory, 2006, IEEE Transactions on Information Theory.

[10] S. Resnick. Heavy-Tail Phenomena: Probabilistic and Statistical Modeling, 2006, Springer.

[11] Christopher M. Bishop. Pattern Recognition and Machine Learning, 2006, Springer.

[12] Jean-Michel Marin, et al. Adaptive importance sampling in general mixture classes, 2007, Stat. Comput.

[13] Michael I. Jordan, et al. Graphical Models, Exponential Families, and Variational Inference, 2008, Found. Trends Mach. Learn.

[14] Mark D. Reid, et al. Information, Divergence and Risk for Binary Experiments, 2009, J. Mach. Learn. Res.

[15] Chong Wang, et al. Stochastic variational inference, 2012, J. Mach. Learn. Res.

[16] Adaptive Importance Sampling via Stochastic Convex Programming, 2014, arXiv:1412.4845.

[17] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.

[18] Sean Gerrish, et al. Black Box Variational Inference, 2013, AISTATS.

[19] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[20] A. Gelman, et al. Pareto Smoothed Importance Sampling, 2015, arXiv:1507.02646.

[21] Parallel Adaptive Importance Sampling, 2015.

[22] Ruslan Salakhutdinov, et al. Importance Weighted Autoencoders, 2015, ICLR.

[23] David M. Blei, et al. Variational Inference: A Review for Statisticians, 2016, arXiv.

[24] Richard E. Turner, et al. Rényi Divergence Variational Inference, 2016, NIPS.

[25] Richard E. Turner, et al. Black-box α-divergence minimization, 2016, ICML.

[26] Zoubin Ghahramani, et al. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, 2015, ICML.

[27] Adji B. Dieng, Dustin Tran, et al. Variational Inference via χ Upper Bound Minimization, 2017, NIPS.

[28] Ben Poole, et al. Categorical Reparameterization with Gumbel-Softmax, 2016, ICLR.

[29] Yee Whye Teh, et al. The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, 2016, ICLR.

[30] Jan Peters, et al. Mean squared advantage minimization as a consequence of entropic policy improvement regularization, 2018.

[31] Sergey Levine. Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review, 2018, arXiv.

[32] Tuomas Haarnoja, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.

[33] Igal Sason. On f-Divergences: Integral Representations, Local Behavior, and Inequalities, 2018, Entropy.

[34] Jan Peters, et al. f-Divergence constrained policy improvement, 2017, arXiv.