Partitioned Variational Inference: A unified framework encompassing federated and continual learning

Variational inference (VI) has become the method of choice for fitting many modern probabilistic models. However, practitioners are faced with a fragmented literature that offers a bewildering array of algorithmic options. First, the choice of variational family. Second, the granularity of the updates, e.g. whether they are local to each data point (employing message passing) or global. Third, the method of optimization (bespoke or black-box, closed-form or stochastic updates, etc.). This paper presents a new framework, termed Partitioned Variational Inference (PVI), that explicitly acknowledges these algorithmic dimensions of VI, unifies disparate literature, and provides guidance on usage. Crucially, the proposed PVI framework allows us to identify new ways of performing VI that are ideally suited to challenging learning scenarios, including federated learning (where distributed computing is leveraged to process non-centralized data) and continual learning (where new data and tasks arrive over time and must be accommodated quickly). We showcase these new capabilities by developing communication-efficient federated training of Bayesian neural networks and continual learning for Gaussian process models with private pseudo-points. The new methods significantly outperform the state of the art, whilst being almost as straightforward to implement as standard VI.
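To make the "granularity of updates" dimension concrete, the sketch below gives one hedged reading of a partitioned update of the kind the abstract alludes to: the approximate posterior is written as the prior times per-partition approximate likelihood factors, and a step on partition m refines only that partition's factor via a local free energy. The notation (factors t_m, partition data y_m, variational family Q) is ours and not quoted from the paper; it is a minimal reconstruction, not the definitive algorithm.

\begin{align*}
q(\theta) &\propto p(\theta) \prod_{m=1}^{M} t_m(\theta)
  && \text{(approximate posterior over } M \text{ data partitions)} \\
q^{\mathrm{new}}(\theta) &= \operatorname*{arg\,max}_{q \in \mathcal{Q}}
  \int q(\theta)\, \log \frac{p(\mathbf{y}_m \mid \theta)\, q^{\mathrm{old}}(\theta) / t_m^{\mathrm{old}}(\theta)}{q(\theta)}\, \mathrm{d}\theta
  && \text{(local free-energy step for partition } m\text{)} \\
t_m^{\mathrm{new}}(\theta) &\propto \frac{q^{\mathrm{new}}(\theta)}{q^{\mathrm{old}}(\theta)}\, t_m^{\mathrm{old}}(\theta)
  && \text{(update only partition } m\text{'s factor)}
\end{align*}

Under this reading, choosing one partition per data point recovers message-passing-style local updates, while a single partition containing all the data recovers global (batch) VI.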
