Prior elicitation, variable selection and Bayesian computation for logistic regression models

Bayesian variable selection is often difficult to carry out because of three challenges: specifying prior distributions for the regression parameters under all possible models, specifying a prior distribution on the model space, and carrying out the required computations. We address these three issues for the logistic regression model. For the first, we propose an informative prior distribution for variable selection; several of its theoretical and computational properties are derived and illustrated with examples. For the second, we propose a method for specifying an informative prior on the model space. For the third, we propose novel methods for computing the marginal distribution of the data. The new computational algorithms require only Gibbs samples from the full model to compute the prior and posterior model probabilities for all possible models, and several properties of the algorithms are also derived. The prior specification for the first challenge focuses on the observables: the elicitation is based on a prior prediction y0 for the response vector and a quantity a0 quantifying the uncertainty in y0. The quantities y0 and a0 are then used to specify a prior for the regression coefficients semi-automatically. Examples using real data demonstrate the methodology.
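
The abstract does not state the functional form of the prior, so the following is only a minimal sketch of how a prior prediction y0 and a weight a0 could induce a prior on the regression coefficients. It assumes, purely for illustration, that the unnormalized prior is the logistic likelihood evaluated at y0 and raised to the power a0, and that a0 is a non-negative scalar. The function names, the design matrix X, and the numerical values are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def log_prior_kernel(beta, X, y0, a0):
    """Unnormalized log prior under the ASSUMED form
    a0 * sum_i [ y0_i * x_i'beta - log(1 + exp(x_i'beta)) ]."""
    total = 0.0
    for x_i, y0_i in zip(X, y0):
        eta = sum(b * x for b, x in zip(beta, x_i))
        # Numerically stable evaluation of log(1 + exp(eta)).
        log1pexp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        total += y0_i * eta - log1pexp
    return a0 * total


def candidate_models(p):
    """All non-empty subsets of the covariate indices 0..p-1."""
    return [s for k in range(1, p + 1) for s in combinations(range(p), k)]


# Hypothetical inputs: three observations, intercept plus one covariate.
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0]]
y0 = [0.6, 0.2, 0.9]   # prior prediction for each response (illustrative)
a0 = 0.5               # weight given to y0 (illustrative)

kernel_at_zero = log_prior_kernel([0.0, 0.0], X, y0, a0)
models = candidate_models(3)
```

Under this assumed form, a0 = 0 makes the kernel flat in the coefficients, and larger a0 concentrates the prior around coefficient values that reproduce y0. The `candidate_models` helper only enumerates the model space; it does not implement the paper's prior on that space or its Gibbs-based computation of model probabilities.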
