Using bagged posteriors for robust inference and model criticism

Standard Bayesian inference is known to be sensitive to model misspecification, leading to unreliable uncertainty quantification and poor predictive performance. However, finding generally applicable and computationally feasible methods for robust Bayesian inference under misspecification has proven difficult. An intriguing, easy-to-use, and widely applicable approach is to use bagging on the Bayesian posterior ("BayesBag"); that is, to use the average of posterior distributions conditioned on bootstrapped datasets. In this paper, we comprehensively develop the asymptotic theory of BayesBag, propose a model–data mismatch index for model criticism using BayesBag, and empirically validate our theory and methodology on synthetic and real-world data in linear regression (both feature selection and parameter inference), sparse logistic regression, insurance loss prediction, and phylogenetic tree reconstruction. We find that in the presence of significant misspecification, BayesBag yields more reproducible inferences, has better predictive accuracy, and selects correct models more often than the standard Bayesian posterior; meanwhile, when the model is correctly specified, BayesBag produces superior or equally good results for parameter inference and prediction, while being slightly more conservative for model selection. Overall, our results demonstrate that BayesBag combines the attractive modeling features of standard Bayesian inference with the distributional robustness properties of frequentist methods, providing benefits over both Bayes alone and the bootstrap alone.
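
To make the definition of the bagged posterior concrete, the following is a minimal sketch, not the paper's implementation. It assumes a conjugate normal location model with known noise variance, so that each bootstrap posterior is available in closed form, and it assumes bootstrap datasets of the same size as the original data. The function names `normal_posterior` and `bayesbag_samples`, the prior settings, and the number of bootstrap replicates are all illustrative choices.

```python
import numpy as np


def normal_posterior(y, prior_mean=0.0, prior_var=100.0, noise_var=1.0):
    """Closed-form posterior N(mean, var) for the mean of a normal model
    with known noise variance (an assumed toy model)."""
    n = len(y)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + np.sum(y) / noise_var)
    return post_mean, post_var


def bayesbag_samples(y, num_bootstrap=200, draws_per_bootstrap=50, seed=0):
    """Draw from the bagged posterior: the equal-weight mixture of the
    posteriors conditioned on bootstrapped copies of the data."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = len(y)
    draws = []
    for _ in range(num_bootstrap):
        # Resample the data with replacement (bootstrap size n is an assumption).
        idx = rng.integers(0, n, size=n)
        mean_b, var_b = normal_posterior(y[idx])
        # Sample from the posterior conditioned on this bootstrapped dataset.
        draws.append(rng.normal(mean_b, np.sqrt(var_b), size=draws_per_bootstrap))
    # Pooling the draws approximates the average of the bootstrap posteriors.
    return np.concatenate(draws)


# Misspecified setting: the data have standard deviation 3, the model assumes 1.
data = np.random.default_rng(1).normal(2.0, 3.0, size=100)
standard_mean, standard_var = normal_posterior(data)
bagged = bayesbag_samples(data)
print("standard posterior variance:", standard_var)
print("bagged posterior variance:  ", bagged.var())
```

In this toy setting the standard posterior variance depends only on the assumed noise variance, whereas the bagged posterior also picks up the sampling variability of the data, so its spread is wider when the model understates the noise. For non-conjugate models, the closed-form step would be replaced by a posterior sampler run once per bootstrapped dataset.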
