Bayesian Pseudocoresets

Standard Bayesian inference algorithms are prohibitively expensive in the regime of modern large-scale data. Recent work has found that a small, weighted subset of data (a coreset) may be used in place of the full dataset during inference, exploiting data redundancy to reduce computational cost. However, this approach has limitations in the increasingly common setting of sensitive, high-dimensional data. Indeed, we prove that there are situations in which the Kullback-Leibler (KL) divergence between the optimal coreset posterior and the true posterior grows with data dimension; and because coresets consist of a subset of the original data, they cannot be constructed in a manner that preserves individual privacy. We address both of these issues with a single unified solution, Bayesian pseudocoresets: a small, weighted collection of synthetic “pseudodata”, together with a variational optimization method to select both the pseudodata and the weights. The use of pseudodata (as opposed to the original datapoints) enables both the summarization of high-dimensional data and the differentially private summarization of sensitive data. Experiments on real and synthetic high-dimensional data demonstrate that Bayesian pseudocoresets achieve significant improvements in posterior approximation error over traditional coresets, and that pseudocoresets provide privacy without a significant loss in approximation quality.
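To make the variational construction concrete, the sketch below fits a pseudocoreset in a setting simple enough that the objective is exact: a conjugate Gaussian location model (prior θ ~ N(0, I), unit-variance Gaussian likelihood), where both the true posterior and the pseudocoreset posterior are Gaussian and the KL divergence between them has a closed form. This is a minimal illustration under assumed simplifications; the model, dimensions, and optimizer (Adam via optax) are illustrative choices, not the paper's general method, which does not require conjugacy.

```python
# Minimal pseudocoreset sketch, assuming a conjugate Gaussian location
# model: prior theta ~ N(0, I), likelihood x_i ~ N(theta, I). Both the
# true and the pseudocoreset posteriors are then Gaussian, so the KL
# objective has a closed form. All settings here are illustrative
# assumptions, not the paper's experimental setup.
import jax
import jax.numpy as jnp
import optax

d, N, M = 50, 10_000, 20   # data dimension, dataset size, pseudocoreset size
X = 2.0 + jax.random.normal(jax.random.PRNGKey(0), (N, d))  # data x_i ~ N(2, I)

# True posterior: N(mu_true, I / (1 + N)), with mu_true = sum_i x_i / (1 + N).
mu_true = X.sum(axis=0) / (1.0 + N)

def kl_to_true(params):
    """KL(pseudocoreset posterior || true posterior) for the isotropic
    Gaussians N(mu0, I / (1 + W)) and N(mu_true, I / (1 + N))."""
    u, log_w = params
    w = jnp.exp(log_w)             # parameterize weights as exp(log_w) > 0
    W = w.sum()                    # total pseudocoreset weight
    mu0 = (w[:, None] * u).sum(axis=0) / (1.0 + W)
    ratio = (1.0 + N) / (1.0 + W)  # per-coordinate posterior variance ratio
    quad = (1.0 + N) * jnp.sum((mu_true - mu0) ** 2)
    return 0.5 * (d * ratio + quad - d - d * jnp.log(ratio))

# Initialize pseudodata at the first M datapoints, all weights at N / M.
params = (X[:M], jnp.full(M, jnp.log(N / M)))
opt = optax.adam(1e-2)
opt_state = opt.init(params)
loss_and_grad = jax.jit(jax.value_and_grad(kl_to_true))

# Jointly optimize pseudodata locations and log-weights by gradient descent.
for _ in range(1000):
    loss, grads = loss_and_grad(params)
    updates, opt_state = opt.update(grads, opt_state)
    params = optax.apply_updates(params, updates)

print(f"KL after optimization: {float(kl_to_true(params)):.6f}")
```

In this conjugate setting, M = 20 weighted pseudopoints can match the true posterior exactly (it suffices to match the posterior mean and the total weight), so the optimized KL approaches zero; the regime the paper targets is models where no such closed form exists and the same objective must be optimized with stochastic gradient estimates.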
