Pathway Detection Based on Hierarchical LASSO Regression Model

Rapid and accurate identification of pathways of potential interest through the analysis of genome-wide expression profiles remains an important challenge in bioinformatics. Most existing methods, such as GSEA, are based on hypothesis testing. These methods focus mainly on individual pathways and rank them by their individual strengths. However, biological pathways often act in concert, so it is important to consider their correlations when detecting the pathways most closely related to the phenotypes. Treating this problem as one of variable selection, we propose a hierarchical LASSO regression (HLR) model to detect differentially expressed gene pathways; the regression automatically takes into account the correlation structure among the genes. This approach both selects important gene pathways and removes unimportant genes within the selected pathways. Both simulation studies and real data analysis show promising results.
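The abstract does not state the HLR penalty itself, so the following is only an illustrative sketch of two-level selection (whole pathways, then genes within the retained pathways). It swaps in a sparse-group-lasso-style penalty fitted by proximal gradient descent, which is not necessarily the authors' formulation. The function names, the penalty weights `lam_group` and `lam_gene`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_group_lasso(X, y, groups, lam_group, lam_gene, n_iter=500):
    """Minimise (1/2n)||y - X b||^2 + lam_group * sum_g sqrt(p_g) ||b_g||_2
    + lam_gene * ||b||_1 by proximal gradient descent (illustrative stand-in
    for a hierarchical pathway/gene penalty, not the paper's HLR estimator)."""
    n, p = X.shape
    beta = np.zeros(p)
    # Step size 1/L, where L = sigma_max(X)^2 / n bounds the gradient's Lipschitz constant.
    step = n / (np.linalg.norm(X, 2) ** 2)
    group_ids = np.unique(groups)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        # Gene-level shrinkage: removes individual genes.
        z = soft_threshold(beta - step * grad, step * lam_gene)
        # Pathway-level shrinkage: removes whole groups whose norm is small.
        for g in group_ids:
            idx = groups == g
            norm = np.linalg.norm(z[idx])
            w = step * lam_group * np.sqrt(idx.sum())
            z[idx] = 0.0 if norm <= w else z[idx] * (1.0 - w / norm)
        beta = z
    return beta

# Synthetic example (assumed data): 3 pathways of 4 genes each;
# only genes 0 and 1 of pathway 0 affect the phenotype.
rng = np.random.default_rng(0)
n_samples, n_genes = 500, 12
groups = np.repeat(np.arange(3), 4)
X = rng.standard_normal((n_samples, n_genes))
beta_true = np.zeros(n_genes)
beta_true[0], beta_true[1] = 2.0, -1.5
y = X @ beta_true + 0.1 * rng.standard_normal(n_samples)

beta_hat = sparse_group_lasso(X, y, groups, lam_group=0.1, lam_gene=0.1)
selected_pathways = [int(g) for g in np.unique(groups)
                     if np.any(beta_hat[groups == g] != 0)]
print("selected pathways:", selected_pathways)
print("nonzero genes:", np.flatnonzero(beta_hat).tolist())
```

In this toy setting the group-level term can discard entire pathways while the gene-level term zeroes individual genes inside a retained pathway, which mirrors the two kinds of selection described above. In practice both weights would be tuned, for example by cross-validation.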
