Variable selection in qualitative models via an entropic explanatory power

Abstract: The variable selection method proposed in this paper is based on evaluating the Kullback–Leibler distance between the full (or encompassing) model and its submodels. The Bayesian implementation of the method does not require separate prior modeling of the submodels, since the parameters of each submodel are defined as the Kullback–Leibler projections of the full model parameters. The selection procedure returns the submodel with the smallest number of covariates that lies within an acceptable distance of the full model. We introduce the notion of the explanatory power of a model and scale the maximal acceptable distance in terms of the explanatory power of the full model. Moreover, an additivity property between embedded submodels shows that our selection procedure is equivalent to selecting the submodel with the smallest number of covariates that has sufficient explanatory power. We illustrate the performance of this method on a breast cancer dataset.
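
The procedure described in the abstract can be sketched for a logistic regression model. The following is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single fixed parameter vector `beta` as a plug-in where the Bayesian method would average over the posterior, it computes the Kullback–Leibler projection by Newton iterations, and it takes the explanatory power of the full model to be its distance to the intercept-only model. The function names and the threshold parameter `rho` (the fraction of the full model's explanatory power a submodel must retain) are hypothetical.

```python
import itertools
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def kl_bernoulli(p, q):
    """Sum over observations of KL(Bernoulli(p_i) || Bernoulli(q_i))."""
    eps = 1e-12
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    return float(np.sum(p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))))

def kl_projection(X_sub, p_full, n_iter=50):
    """Submodel coefficients minimising the KL distance to the full model.

    This amounts to a logistic fit with the fractional responses p_full,
    solved here with plain Newton steps (an assumption of this sketch).
    """
    gamma = np.zeros(X_sub.shape[1])
    for _ in range(n_iter):
        q = sigmoid(X_sub @ gamma)
        grad = X_sub.T @ (p_full - q)
        hess = (X_sub * (q * (1 - q))[:, None]).T @ X_sub
        gamma = gamma + np.linalg.solve(hess + 1e-10 * np.eye(len(gamma)), grad)
    return gamma

def distance_to_submodel(X, p_full, cols):
    """KL distance between the full model and its projection on columns cols."""
    gamma = kl_projection(X[:, cols], p_full)
    return kl_bernoulli(p_full, sigmoid(X[:, cols] @ gamma))

def select_submodel(X, beta, rho=0.9):
    """Smallest submodel retaining a fraction rho of the explanatory power.

    X must carry the intercept in column 0. Returns the selected covariate
    indices, their distance to the full model, and the explanatory power
    of the full model.
    """
    p_full = sigmoid(X @ beta)
    k = X.shape[1]
    power_full = distance_to_submodel(X, p_full, [0])
    for size in range(k):  # number of covariates besides the intercept
        candidates = [
            (distance_to_submodel(X, p_full, [0, *subset]), subset)
            for subset in itertools.combinations(range(1, k), size)
        ]
        d, subset = min(candidates)
        if d <= (1 - rho) * power_full:
            return subset, d, power_full
    return tuple(range(1, k)), 0.0, power_full

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(500), rng.normal(size=(500, 3))])
    beta = np.array([0.3, 1.5, 0.0, -1.0])  # second covariate is inactive
    print(select_submodel(X, beta, rho=0.9))
```

By the additivity property mentioned in the abstract, requiring the distance to the full model to be at most `(1 - rho)` times the full model's explanatory power is the same as requiring the submodel itself to retain at least a fraction `rho` of that power. The exhaustive search over subsets is only practical for a small number of covariates.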
