Prediction can be safely used as a proxy for explanation in causally consistent Bayesian generalized linear models

Bayesian modeling provides a principled approach to quantifying uncertainty in model parameters and structure and has seen a surge of applications in recent years. Despite substantial existing work on an overarching Bayesian workflow, many individual steps still require more research to optimize the related decision processes. In this paper, we present results from a large simulation study of Bayesian generalized linear models for double- and lower-bounded data, in which we analyze metrics of convergence, parameter recoverability, and predictive performance. We specifically investigate the validity of using predictive performance as a proxy for parameter recoverability in Bayesian model selection. Results indicate that, for a given, causally consistent predictor term, better out-of-sample predictions imply lower parameter RMSE, a lower false positive rate, and a higher true positive rate. In terms of initial model choice, we make recommendations for default likelihoods and link functions. We also find that, despite their lack of structural faithfulness for bounded data, Gaussian linear models show error calibration that is on par with structurally faithful alternatives.
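The three parameter recoverability metrics named above can be illustrated with a minimal sketch. The abstract does not state how the study defines a detected effect, so the code assumes one common convention: an effect counts as detected when its posterior interval excludes zero. The function name `recovery_metrics` and the example numbers are hypothetical and are not taken from the paper.

```python
import math


def recovery_metrics(true_effects, estimates, intervals):
    """Summarize parameter recovery over simulated coefficients.

    true_effects: data-generating coefficient values (0 means "no effect").
    estimates:    posterior point estimates, e.g. posterior means.
    intervals:    (lower, upper) posterior interval bounds per coefficient.

    Assumption (not from the paper): an effect is "detected" when its
    interval excludes zero.
    """
    n = len(true_effects)
    # Root mean squared error between estimates and true values.
    rmse = math.sqrt(
        sum((est - true) ** 2 for est, true in zip(estimates, true_effects)) / n
    )

    true_pos = false_pos = n_nonzero = n_zero = 0
    for true, (lower, upper) in zip(true_effects, intervals):
        detected = lower > 0 or upper < 0
        if true != 0:
            n_nonzero += 1
            true_pos += detected
        else:
            n_zero += 1
            false_pos += detected

    # Rates are undefined when there are no coefficients of that kind.
    tpr = true_pos / n_nonzero if n_nonzero else float("nan")
    fpr = false_pos / n_zero if n_zero else float("nan")
    return {"rmse": rmse, "tpr": tpr, "fpr": fpr}


# Hypothetical example: two true effects of 0.5 and two null effects.
example = recovery_metrics(
    true_effects=[0.5, 0.0, 0.5, 0.0],
    estimates=[0.4, 0.1, 0.6, -0.2],
    intervals=[(0.1, 0.7), (-0.2, 0.4), (-0.1, 1.3), (-0.39, -0.01)],
)
print(example)
```

In this toy example one of the two true effects is detected and one of the two null effects is wrongly flagged, so both rates come out at 0.5. In a simulation study, such metrics would be computed per fitted model and then compared against an out-of-sample predictive score.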
