Scalable and Robust Bayesian Inference via the Median Posterior

Many Bayesian learning methods for massive data benefit from working with small subsets of observations. In particular, significant progress has been made in scalable Bayesian learning via stochastic approximation. However, Bayesian learning methods in distributed computing environments are often problem- or distribution-specific and rely on ad hoc techniques. We propose a novel general approach to Bayesian inference that is scalable and robust to corruption in the data. Our technique splits the data into several non-overlapping subgroups, evaluates the posterior distribution given each independent subgroup, and then combines the results. Our main contribution is the proposed aggregation step, which is based on finding the geometric median of the subset posterior distributions. The theoretical and numerical results presented confirm the advantages of our approach.
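The split, evaluate, and combine steps can be illustrated with a deliberately simplified sketch. It assumes a Gaussian mean model with known noise variance and a zero-mean Gaussian prior, and it summarizes each subset posterior by its mean, so the geometric median is taken over points in Euclidean space rather than over the posterior distributions themselves as the abstract describes. The function names, the prior and noise variances, and the Weiszfeld-type iteration used to compute the median are all choices made for this illustration, not details taken from the paper.

```python
import numpy as np

def geometric_median(points, n_iter=200, tol=1e-9):
    """Weiszfeld-type iterations for the geometric median of the rows of `points`."""
    points = np.asarray(points, dtype=float)
    estimate = points.mean(axis=0)  # start from the plain average
    for _ in range(n_iter):
        dists = np.linalg.norm(points - estimate, axis=1)
        dists = np.maximum(dists, 1e-12)  # guard against division by zero
        weights = 1.0 / dists
        new_estimate = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_estimate - estimate) < tol:
            return new_estimate
        estimate = new_estimate
    return estimate

def subset_posterior_means(data, n_subsets, prior_var=100.0, noise_var=1.0):
    """Posterior mean of a Gaussian mean for each disjoint, contiguous subset.

    Assumes observations ~ N(theta, noise_var * I) and prior theta ~ N(0, prior_var * I).
    """
    means = []
    for subset in np.array_split(data, n_subsets):
        precision = 1.0 / prior_var + len(subset) / noise_var
        means.append(subset.sum(axis=0) / noise_var / precision)
    return np.array(means)

# Synthetic data with true mean (1, -2); the first 50 observations are corrupted
# and all fall into the first of the 10 subsets.
rng = np.random.default_rng(0)
data = rng.normal(loc=[1.0, -2.0], scale=1.0, size=(1000, 2))
data[:50] = 1e3

means = subset_posterior_means(data, n_subsets=10)
median_estimate = geometric_median(means)   # robust combination
average_estimate = means.mean(axis=0)       # naive combination, pulled by the outliers

print("median of subset posterior means: ", median_estimate)
print("average of subset posterior means:", average_estimate)
```

In this toy setting the corrupted subset drags the plain average far from the true mean, while the geometric median stays close to the cluster formed by the nine uncorrupted subsets. This is only meant to convey the intuition behind a median-based aggregation step.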
