Penalized integrative semiparametric interaction analysis for multiple genetic datasets

In this article, we consider a semiparametric additive partially linear interaction model for the integrative analysis of multiple genetic datasets. The goals are to identify important genetic predictors and gene-gene interactions and, simultaneously, to estimate the nonparametric functions that describe the environmental effects. To capture the similarities and differences in genetic effects across datasets, we impose a group structure on the regression coefficient matrix under the homogeneity assumption, i.e., the models for different datasets share the same sparsity structure, but the coefficients may differ across datasets. We develop an iterative approach to estimate the parameters of the main effects, the interactions, and the nonparametric functions, in which the interaction parameters are reparametrized to satisfy the strong hierarchy assumption. We demonstrate the advantages of the proposed method in identification, estimation, and prediction in a series of numerical studies. We also apply the proposed method to the Skin Cutaneous Melanoma data and the lung cancer data from The Cancer Genome Atlas.
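As a rough illustration of the two structural ideas above, the following sketch assumes one common form of reparametrization from the strong heredity literature, in which each interaction coefficient is written as a free parameter times the product of the two corresponding main effects. The abstract does not state the exact form used, so this product form, the function names `interaction_coefficients` and `shared_support`, and the array layout (rows index datasets, columns index predictors) are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def interaction_coefficients(beta, gamma):
    """Assumed strong-hierarchy reparametrization (illustrative only).

    beta  : array of shape (M, p), main effects for M datasets and p predictors.
    gamma : array of shape (M, p, p), free interaction parameters.

    Returns eta of shape (M, p, p) with
        eta[m, j, k] = gamma[m, j, k] * beta[m, j] * beta[m, k]  for j < k,
    and zero elsewhere, so an interaction can be nonzero only when both
    of its main effects are nonzero.
    """
    eta = gamma * beta[:, :, None] * beta[:, None, :]
    # Keep each unordered pair (j, k) once by masking to the strict upper triangle.
    upper = np.triu(np.ones((beta.shape[1], beta.shape[1]), dtype=bool), k=1)
    return eta * upper


def shared_support(coef, tol=1e-12):
    """Group-level support under the homogeneity assumption (illustrative only).

    coef : array of shape (M, p). Predictor j is treated as selected when the
    L2 norm of its coefficients across the M datasets exceeds tol, so all
    datasets share one sparsity pattern while the values may differ.
    """
    return np.linalg.norm(coef, axis=0) > tol
```

With main effects `[[1.0, 0.0, 2.0], [0.5, 0.0, -1.0]]` for two datasets, the second predictor is excluded from both models, and under this assumed form every interaction involving it is forced to zero, while the interaction between the first and third predictors remains free to differ between datasets.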
