Bayesian Model Averaging

We provide an overview of Bayesian model averaging (BMA), starting with a summary of the mathematics associated with classical BMA, including the calculation of posterior model probabilities and the choice of priors for both the models and the model parameters. We also consider prediction-based approaches to BMA and argue that these are preferable to the classical approach. Use of BMA is illustrated by two examples involving real data. We finish with a discussion of the advantages and disadvantages of BMA.
