A novel network and sparsity constraint regression model for functional module identification in genomic data analysis

It is important to incorporate accumulated knowledge of biological pathways and interactions into genome-wide association studies to elucidate correlations between genetic variants and disease. Although a number of methods have been developed recently to identify disease-related genes using prior biological knowledge, most of them only encourage smoothness of the coefficients along the network, which does not address the case where two connected genes both have positive or negative effects on the response. To overcome this issue, we propose applying the Laplacian operation to the absolute values of the coefficients to account for both positive and negative effects, together with an L1-norm term to impose sparsity. Further, an efficient algorithm is developed to compute the whole solution path. Simulation studies show that the proposed method performs better than network-constrained regularisation without absolute values. Applying our method to a microarray data set of Alzheimer's disease (AD) identifies several subnetworks on Kyoto Encyclopedia of Genes and Genomes (KEGG) transcriptional pathways that are related to the progression of AD. Many of these findings are confirmed by published literature.
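To make the penalty described above concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the penalty takes the form lam1 * ||beta||_1 + lam2 * |beta|^T L |beta|, with L the normalized graph Laplacian of the gene network; the function names, the parameters `lam1` and `lam2`, and the `use_abs` switch are hypothetical and chosen only for illustration.

```python
import numpy as np


def normalized_laplacian(adj):
    """Normalized Laplacian I - D^{-1/2} A D^{-1/2} of an undirected graph.

    Isolated nodes (degree 0) get an all-zero row and column.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    connected = deg > 0
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[connected] = 1.0 / np.sqrt(deg[connected])
    identity_part = np.diag(connected.astype(float))
    return identity_part - inv_sqrt[:, None] * adj * inv_sqrt[None, :]


def network_penalty(beta, lap, lam1, lam2, use_abs=True):
    """Assumed penalty: lam1 * ||beta||_1 + lam2 * b^T L b.

    With use_abs=True, b = |beta| (Laplacian applied to absolute values);
    with use_abs=False, b = beta (plain smoothness along the network).
    """
    beta = np.asarray(beta, dtype=float)
    b = np.abs(beta) if use_abs else beta
    return lam1 * np.sum(np.abs(beta)) + lam2 * (b @ lap @ b)


# Toy network: two connected genes with equal-magnitude, opposite-sign effects.
adj = np.array([[0, 1],
                [1, 0]])
lap = normalized_laplacian(adj)
beta = np.array([1.0, -1.0])

pen_abs = network_penalty(beta, lap, lam1=1.0, lam2=1.0)
pen_plain = network_penalty(beta, lap, lam1=1.0, lam2=1.0, use_abs=False)
print(pen_abs, pen_plain)
```

In this toy case the quadratic term vanishes when absolute values are used, so only the L1 term contributes, whereas the plain smoothness penalty charges the pair for differing in sign. Fitting the model and tracing the solution path are not shown here.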
