A false‐discovery‐rate‐based loss framework for selection of interactions

Interaction effects have consistently been found important in explaining variation in outcomes across many scientific fields. In practice, however, variable selection that includes interactions is complicated by limited sample sizes, conflicting philosophies regarding model interpretability, and the accompanying amplified multiple-testing problem. The lack of statistically sound algorithms for automatic variable selection with interactions has discouraged the exploration of important interaction effects. In this article, we investigate the selection of interactions from three aspects: (1) What model space should be searched? (2) How should hypothesis testing be performed? (3) How should the multiple-testing issue be addressed? We propose loss functions and corresponding decision rules that control the false discovery rate (FDR) in a Bayesian context. Properties of the decision rules are discussed, and their performance in terms of power and FDR is compared through simulations. The methods are illustrated on data from a colorectal cancer study assessing chemotherapy treatments and on data from a diffuse large-B-cell lymphoma study assessing the prognostic effect of gene expression.
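To make the idea of an FDR-controlling decision rule in a Bayesian context concrete, the sketch below shows one generic rule of this kind: rank candidate terms by their posterior probability of being non-null and select the largest set whose posterior expected false discovery proportion stays at or below a target level. This is an illustration under stated assumptions and not necessarily the rule proposed in the article; the function name `select_by_posterior_fdr`, the input `post_probs`, and the level `alpha` are hypothetical, and the posterior probabilities are assumed to have been computed elsewhere (for example, from MCMC output).

```python
def select_by_posterior_fdr(post_probs, alpha=0.05):
    """Select terms so that the posterior expected FDR is at most alpha.

    post_probs: posterior probabilities that each candidate term
                (main effect or interaction) is non-null. Assumed input.
    alpha:      target bound on the posterior expected FDR. Assumed input.
    Returns the sorted indices of the selected terms.
    """
    # Rank terms from most to least likely to be non-null.
    order = sorted(range(len(post_probs)),
                   key=lambda i: post_probs[i], reverse=True)

    selected = []
    expected_false = 0.0  # running sum of (1 - p_i) over selected terms
    for i in order:
        candidate_false = expected_false + (1.0 - post_probs[i])
        # Posterior expected FDR if term i were added to the selection.
        if candidate_false / (len(selected) + 1) > alpha:
            # Terms are in decreasing order of probability, so the running
            # average can only grow from here; stop at the first violation.
            break
        expected_false = candidate_false
        selected.append(i)
    return sorted(selected)


if __name__ == "__main__":
    # Hypothetical posterior probabilities for five candidate terms.
    probs = [0.99, 0.20, 0.95, 0.60, 0.97]
    print(select_by_posterior_fdr(probs, alpha=0.05))
```

With the hypothetical probabilities above, the three terms with probabilities 0.99, 0.97, and 0.95 give a posterior expected FDR of about 0.03, while adding the term with probability 0.60 would push it above 0.05, so that term and the weaker one are left out.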
