A weakly informative default prior distribution for logistic and other regression models

We propose a new prior distribution for classical (nonhierarchical) logistic regression models, constructed by first scaling all nonbinary variables to have mean 0 and standard deviation 0.5, and then placing independent Student-t prior distributions on the coefficients. As a default choice, we recommend the Cauchy distribution with center 0 and scale 2.5, which in the simplest setting is a longer-tailed version of the distribution attained by assuming one-half additional success and one-half additional failure in a logistic regression. Cross-validation on a corpus of datasets shows the Cauchy class of prior distributions to outperform existing implementations of Gaussian and Laplace priors. We recommend this prior distribution as a default choice for routine applied use. It has the advantage of always giving answers, even when there is complete separation in logistic regression (a common problem, even when the sample size is large and the number of predictors is small), and of automatically applying more shrinkage to higher-order interactions. This can be useful in routine data analysis as well as in automated procedures such as chained equations for missing-data imputation. We implement a procedure to fit generalized linear models in R with the Student-t prior distribution by incorporating an approximate EM algorithm into the usual iteratively weighted least squares. We illustrate with several applications, including a series of logistic regressions predicting voting preferences, a small bioassay experiment, and an imputation model for a public health dataset.
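
The recipe in the abstract (rescale the input, place a Cauchy prior with center 0 and scale 2.5 on the coefficient, and fold an approximate EM step into iteratively weighted least squares) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' R implementation: the function names `rescale`, `inv_logit`, and `fit_one_coefficient` are hypothetical, the model is reduced to a single coefficient with no intercept, the sample standard deviation is used for scaling, and the variance update treats the Student-t prior as a scale mixture of normals, which is one common way to write such a step.

```python
import math
import statistics


def rescale(values):
    """Shift and scale a nonbinary input to mean 0 and standard deviation 0.5."""
    center = statistics.mean(values)
    spread = statistics.stdev(values)  # sample SD: an assumption of this sketch
    return [(v - center) / (2.0 * spread) for v in values]


def inv_logit(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_one_coefficient(x, y, prior_scale=2.5, df=1.0, max_iter=500, tol=1e-10):
    """Approximate-EM / IWLS fit of y ~ inv_logit(beta * x), no intercept.

    The prior on beta is Student-t with center 0; df=1 gives the Cauchy.
    sigma2 is the working normal-prior variance that the EM step updates.
    """
    beta = 0.0
    sigma2 = prior_scale ** 2
    for _ in range(max_iter):
        p = [inv_logit(xi * beta) for xi in x]
        # Weighted least-squares pieces of the usual IWLS step.
        info = sum(pi * (1.0 - pi) * xi * xi for pi, xi in zip(p, x))
        score = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # The normal working prior enters as extra precision 1 / sigma2.
        precision = info + 1.0 / sigma2
        new_beta = (info * beta + score) / precision
        variance = 1.0 / precision
        # Approximate EM update of the working prior variance.
        sigma2 = (new_beta ** 2 + variance + df * prior_scale ** 2) / (1.0 + df)
        converged = abs(new_beta - beta) < tol
        beta = new_beta
        if converged:
            break
    return beta


# Completely separated toy data: every negative x has y = 0, every positive x has y = 1.
raw_x = [-3, -2, -1, 1, 2, 3]
y = [0, 0, 0, 1, 1, 1]
x = rescale(raw_x)
beta_hat = fit_one_coefficient(x, y)
print(beta_hat)
```

On these toy data the maximum likelihood estimate does not exist, because the likelihood keeps increasing as the coefficient grows, yet the prior keeps the estimate finite. A full implementation would handle an intercept and several predictors by solving the corresponding weighted least-squares system at each step.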
