Bayesian variable selection for high dimensional generalized linear models: Convergence rates of the fitted densities

Bayesian variable selection has recently had much empirical success in a variety of applications where the number $K$ of explanatory variables $(x_1, \ldots, x_K)$ is possibly much larger than the sample size $n$. For generalized linear models, if most of the $x_j$'s have very small effects on the response $y$, we show that Bayesian variable selection can be used to reduce the overfitting caused by the curse of dimensionality $K \gg n$. In this approach, a suitable prior is used to choose a few out of the many $x_j$'s to model $y$, so that the posterior proposes probability densities $p$ that are "often close" to the true density $p^*$ in some sense. The closeness can be described by a Hellinger distance between $p$ and $p^*$ that scales at a power very close to $n^{-1/2}$, which is the "finite-dimensional rate" corresponding to a low-dimensional situation. These findings extend some recent work of Jiang [Technical Report 05-02 (2005) Dept. Statistics, Northwestern Univ.] on consistency of Bayesian variable selection for binary classification.
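To make the closeness measure concrete, here is a minimal sketch of a Hellinger distance between two discrete densities, illustrated on a binary response. It assumes the normalization $d_H(p, q) = \sqrt{\tfrac{1}{2} \sum_y (\sqrt{p(y)} - \sqrt{q(y)})^2}$, which takes values in $[0, 1]$; the abstract does not state which normalization the paper uses, and other conventions differ by a constant factor. The function names and the success probabilities 0.30 and 0.35 are hypothetical and chosen only for illustration.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete densities on the same support.

    Assumed normalization: sqrt(0.5 * sum((sqrt(p) - sqrt(q))**2)),
    so the result lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same support")
    squared_gap = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(0.5 * squared_gap)


def bernoulli_density(success_prob):
    """Density of a binary response y in {0, 1} as [P(y=0), P(y=1)]."""
    return [1.0 - success_prob, success_prob]


# Hypothetical "true" density p* and a hypothetical fitted density p.
p_true = bernoulli_density(0.30)
p_fit = bernoulli_density(0.35)

distance = hellinger(p_true, p_fit)
print(f"Hellinger distance between p and p*: {distance:.4f}")
```

Under this normalization the distance is 0 when the two densities coincide and 1 when they have disjoint support, so a fitted density that is "often close" to $p^*$ corresponds to a small value.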
