In Response to Comment on "Network-constrained regularization and variable selection for analysis of genomic data"

MOTIVATION: Graphs or networks are common ways of depicting information. In biology in particular, many processes are represented by graphs, such as regulatory networks or metabolic pathways. This kind of a priori information, gathered over many years of biomedical research, is a useful supplement to standard numerical genomic data such as microarray gene-expression data. How to incorporate the information encoded by known biological networks or graphs into the analysis of numerical data raises interesting statistical challenges. In this article, we introduce a network-constrained regularization procedure for linear regression analysis that incorporates the information from these graphs into the analysis of the numerical data, where the network is represented as a graph and its corresponding Laplacian matrix. We define a network-constrained penalty function that penalizes the L1-norm of the coefficients but encourages smoothness of the coefficients on the network.

RESULTS: Simulation studies indicated that the method is effective in identifying genes and subnetworks that are related to disease and has higher sensitivity than commonly used procedures that do not use the pathway structure information. Application to a glioblastoma microarray gene-expression dataset identified several subnetworks on several of the Kyoto Encyclopedia of Genes and Genomes (KEGG) transcriptional pathways that are related to survival from glioblastoma, many of which were supported by published literature.

CONCLUSIONS: The proposed network-constrained regularization procedure efficiently utilizes known pathway structures in identifying the relevant genes and subnetworks that might be related to phenotype in a general regression framework. As more biological networks are identified and documented in databases, the proposed method should find more applications in identifying subnetworks that are related to diseases and other biological processes.
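A small numerical illustration may help make the penalty concrete. The sketch below is not the authors' implementation; it assumes the Laplacian is the normalized graph Laplacian and that the penalty has the form lam1 * ||beta||_1 + lam2 * beta' L beta. The function names and the tuning parameters `lam1` and `lam2` are hypothetical names chosen for this example.

```python
import numpy as np


def normalized_laplacian(adj):
    """Normalized Laplacian of an undirected graph (assumed form).

    Entries are 1 on the diagonal for nodes with nonzero degree and
    -w(u, v) / sqrt(d_u * d_v) for connected nodes u and v.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    has_edges = deg > 0
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[has_edges] = 1.0 / np.sqrt(deg[has_edges])
    # Isolated nodes get a zero row and column.
    return np.diag(has_edges.astype(float)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]


def network_penalty(beta, lap, lam1, lam2):
    """L1 sparsity term plus a Laplacian quadratic smoothness term."""
    beta = np.asarray(beta, dtype=float)
    sparsity = np.abs(beta).sum()
    smoothness = beta @ lap @ beta
    return lam1 * sparsity + lam2 * smoothness


# Two genes joined by a single edge.
adj = np.array([[0.0, 1.0],
                [1.0, 0.0]])
lap = normalized_laplacian(adj)

smooth = network_penalty([1.0, 1.0], lap, lam1=1.0, lam2=1.0)   # equal coefficients
rough = network_penalty([1.0, -1.0], lap, lam1=1.0, lam2=1.0)   # opposite signs
print(smooth, rough)
```

Both coefficient vectors have the same L1-norm, so the difference in penalty comes entirely from the quadratic term, which is zero when linked genes share the same (degree-scaled) coefficient and grows as they diverge.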
