Bayesian Coresets: An Optimization Perspective

Bayesian coresets have emerged as a promising approach to scalable Bayesian inference. The Bayesian coreset problem involves selecting a (weighted) subset of the data samples such that posterior inference using the selected subset closely approximates posterior inference using the full dataset. This manuscript revisits Bayesian coresets through the lens of sparsity-constrained optimization. Leveraging recent advances in accelerated optimization methods, we propose and analyze a novel algorithm for coreset selection. We provide explicit convergence-rate guarantees and present an empirical evaluation on a variety of benchmark datasets to highlight the proposed algorithm's superior speed and accuracy compared to the state of the art.
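To make the sparsity-constrained view concrete, the sketch below applies a momentum-accelerated iterative hard thresholding step to a quadratic coreset objective of the form 0.5 * ||L w - L 1||^2 subject to ||w||_0 <= K and w >= 0. This is an illustrative sketch, not the manuscript's exact algorithm: the matrix `L` (e.g., a finite-dimensional projection of per-datapoint log-likelihoods, as in Hilbert coresets), the step-size rule, and the momentum parameter `beta` are assumptions introduced here for exposition.

```python
import numpy as np

def accelerated_iht_coreset(L, K, n_iters=100, step=None, beta=0.9):
    """Sketch: momentum-accelerated IHT for a quadratic coreset objective.

    Minimizes 0.5 * ||L @ w - L @ 1||^2 over w with ||w||_0 <= K, w >= 0.

    L : (d, n) array whose i-th column represents data point i
        (e.g., a projected log-likelihood vector); illustrative assumption.
    K : target coreset size (sparsity level).
    """
    d, n = L.shape
    target = L.sum(axis=1)                 # full-data vector L @ 1
    if step is None:
        # conservative step size: 1 / (largest singular value of L)^2,
        # the Lipschitz constant of the quadratic's gradient
        step = 1.0 / (np.linalg.norm(L, 2) ** 2)

    w = np.zeros(n)
    w_prev = np.zeros(n)
    for _ in range(n_iters):
        # heavy-ball style momentum on the two most recent iterates
        y = w + beta * (w - w_prev)
        grad = L.T @ (L @ y - target)      # gradient of the quadratic objective
        z = np.maximum(y - step * grad, 0.0)   # gradient step + nonnegativity
        # hard threshold: keep the K largest weights, zero out the rest
        idx = np.argpartition(z, -K)[-K:]
        w_new = np.zeros(n)
        w_new[idx] = z[idx]
        w_prev, w = w, w_new
    return w
```

The nonzero entries of the returned weight vector identify the selected coreset points and their weights; swapping the momentum step for a plain gradient step recovers standard (non-accelerated) iterative hard thresholding.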
