A Penalized Robust Method for Identifying Gene–Environment Interactions

In high‐throughput studies, an important objective is to identify gene–environment interactions associated with disease outcomes and phenotypes. Many commonly adopted methods assume specific parametric or semiparametric models, which may be subject to model misspecification. In addition, they usually use significance levels as the criterion for selecting important interactions. In this study, we adopt rank‐based estimation, which is much less sensitive to model specification than some of the existing methods and includes several commonly encountered data types and models as special cases. Penalization is adopted for the identification of gene–environment interactions. It achieves simultaneous estimation and identification and does not rely on significance levels. For computational feasibility, a smoothed rank estimation is further proposed. Simulations show that under certain scenarios, for example, with contaminated or heavy‐tailed data, the proposed method can significantly outperform the existing alternatives by identifying interactions more accurately. We analyze a lung cancer prognosis study with gene expression measurements under the accelerated failure time (AFT) model. The proposed method identifies interactions different from those identified by the alternatives. Some of the identified genes have important implications.
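As a rough illustration of the idea of a smoothed, penalized rank objective, the sketch below replaces the indicator on the linear-predictor difference with a sigmoid and subtracts a penalty. It is a minimal sketch under stated assumptions, not the paper's estimator: the sigmoid smoother, the bandwidth `sigma`, the L1 penalty, and the function names are all hypothetical choices made here, since the abstract does not specify them, and censoring is ignored.

```python
import numpy as np


def smoothed_rank_objective(beta, X, y, sigma=0.1):
    """Smoothed rank correlation objective (illustrative assumption).

    Averages, over ordered pairs (i, j) with i != j, the product
    1[y_i > y_j] * sigmoid((x_i - x_j) @ beta / sigma).
    Only the ordering of y enters, so the value is unchanged by any
    strictly increasing transformation of the response.
    """
    score = X @ beta
    diff = (score[:, None] - score[None, :]) / sigma
    # Numerically stable sigmoid; approximates 1[x_i'beta > x_j'beta].
    smooth = 0.5 * (1.0 + np.tanh(0.5 * diff))
    # Diagonal entries are zero because y_i > y_i is never true.
    concordant = (y[:, None] > y[None, :]).astype(float)
    n = len(y)
    return float((concordant * smooth).sum() / (n * (n - 1)))


def l1_penalized_objective(beta, X, y, lam, sigma=0.1):
    """Objective to maximize: smoothed rank fit minus an L1 penalty.

    The L1 penalty is an assumption for illustration; the abstract does
    not say which penalty the authors use.
    """
    return smoothed_rank_objective(beta, X, y, sigma) - lam * np.abs(beta).sum()
```

In a gene–environment setting, the columns of `X` would hold the environmental factors, the gene measurements, and their products, so that nonzero penalized coefficients on the product columns flag candidate interactions.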
