β-Cores: Robust Large-Scale Bayesian Data Summarization in the Presence of Outliers

Modern machine learning applications should be able to address the intrinsic challenges arising in inference on massive real-world datasets, including scalability and robustness to outliers. Despite the multiple benefits of Bayesian methods (such as uncertainty-aware predictions, incorporation of expert knowledge, and hierarchical modeling), the quality of classic Bayesian inference depends critically on whether observations conform to the assumed data-generating model, which is impossible to guarantee in practice. In this work, we propose a variational inference method that, in a principled way, can simultaneously scale to large datasets and robustify the inferred posterior with respect to outliers in the observed data. Reformulating Bayes' theorem via the $\beta$-divergence, we posit a robustified pseudo-Bayesian posterior as the target of inference. Moreover, relying on recent formulations of Riemannian coresets for scalable Bayesian inference, we propose a sparse variational approximation of the robustified posterior and an efficient stochastic black-box algorithm to construct it. Overall, our method allows releasing cleansed data summaries that can be applied broadly in scenarios including structured data corruption. We illustrate the applicability of our approach on diverse simulated and real datasets and various statistical models, including Gaussian mean inference, logistic and neural linear regression, demonstrating its superiority to existing Bayesian summarization methods in the presence of outliers.
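To illustrate why a $\beta$-divergence-based likelihood can resist outliers, the sketch below uses the simplest of the listed models, Gaussian mean inference. The abstract does not state the exact form of the robustified likelihood, so the code assumes the density power divergence form commonly used with $\beta$-divergences, $f_n(\mu) = \frac{1}{\beta} p(x_n \mid \mu)^{\beta} - \frac{1}{1+\beta} \int p(x \mid \mu)^{1+\beta} dx$. The function names, the grid search, the contamination level, and the choice $\beta = 0.5$ are all illustrative assumptions. The sketch does not implement the coreset construction or the variational algorithm.

```python
import math
import random


def gaussian_pdf(x, mu, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def beta_log_likelihood(data, mu, beta, sigma=1.0):
    """Sum of beta-likelihood terms (assumed density power divergence form)."""
    # Closed form of the integral of p(x | mu)^(1 + beta) for a Gaussian
    # with known sigma; it does not depend on mu.
    integral = (2.0 * math.pi * sigma ** 2) ** (-beta / 2.0) / math.sqrt(1.0 + beta)
    return sum(
        gaussian_pdf(x, mu, sigma) ** beta / beta - integral / (1.0 + beta)
        for x in data
    )


def grid_argmax(data, beta, lo=-2.0, hi=12.0, step=0.01):
    """Return the grid value of mu that maximizes the beta-likelihood."""
    n_points = int(round((hi - lo) / step)) + 1
    grid = [lo + i * step for i in range(n_points)]
    return max(grid, key=lambda mu: beta_log_likelihood(data, mu, beta))


def demo(seed=0, beta=0.5):
    """Compare the sample mean with the beta-likelihood maximizer."""
    rng = random.Random(seed)
    inliers = [rng.gauss(0.0, 1.0) for _ in range(90)]  # true mean is 0
    outliers = [10.0] * 10  # 10% of points corrupted (hypothetical setup)
    data = inliers + outliers
    sample_mean = sum(data) / len(data)
    beta_mean = grid_argmax(data, beta)
    return sample_mean, beta_mean


if __name__ == "__main__":
    sample_mean, beta_mean = demo()
    print(f"sample mean (standard likelihood): {sample_mean:.3f}")
    print(f"beta-likelihood maximizer:         {beta_mean:.3f}")
```

Under this assumed form, each observation contributes through $p(x_n \mid \mu)^{\beta}$, which is bounded and nearly zero for points far from $\mu$. The ten outliers therefore barely move the maximizer, whereas they shift the sample mean by roughly one unit.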
