Grouped graphical Granger modeling for gene expression regulatory networks discovery

We consider the problem of discovering gene regulatory networks from time-series microarray data. Recently, graphical Granger modeling has gained considerable attention as a promising direction for addressing this problem. These methods apply graphical modeling to time-series data and invoke the notion of ‘Granger causality’ to make assertions about causality through inference on time-lagged effects. Existing algorithms, however, have neglected an important aspect of the problem: the group structure that the time series naturally impose on the lagged temporal variables belonging to them. In particular, existing methods in computational biology share this shortcoming and have additional computational limitations that prevent their effective application to large datasets with many genes and many data points. In this article, we propose a novel methodology, which we term the ‘grouped graphical Granger modeling method’. It overcomes these limitations by applying a regression method suited to high-dimensional, large data and by leveraging the grouping of the lagged temporal variables according to the time series they belong to. We demonstrate the effectiveness of the proposed methodology on both simulated and real gene expression data, specifically the cell-cycle data from the human cancer cell line HeLa S3. The simulation results show that the proposed methodology generally recovers the underlying causal structure more accurately. The results on the gene expression data show that it predicts known links more accurately and also uncovers additional causal relationships not captured by earlier work.

Contact: aclozano@us.ibm.com
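The grouping idea can be illustrated with a small sketch. Each target gene is regressed on the lagged values of every series. The lags of one series form one group, and a series is declared a Granger cause of the target when its group receives nonzero coefficients. The sketch assumes a group-Lasso penalty (weighted by the square root of the group size) fitted by proximal gradient descent in plain Python. The function names, the lag order `max_lag`, the penalty weight `lam`, and the simulated series are illustrative assumptions, not the authors' exact estimator or data.

```python
import math
import random


def lagged_design(series, target, max_lag):
    """Regress series[target][t] on lags 1..max_lag of every series.

    series: dict mapping a (hypothetical) gene name to a list of floats,
    all of equal length. Returns the response y, the design rows X, and
    groups: gene name -> column indices holding that gene's lags.
    """
    names = sorted(series)
    T = len(series[target])
    y, X = [], []
    for t in range(max_lag, T):
        y.append(series[target][t])
        row = []
        for name in names:
            # The max_lag lagged values of one series form one group.
            row.extend(series[name][t - lag] for lag in range(1, max_lag + 1))
        X.append(row)
    groups = {name: list(range(i * max_lag, (i + 1) * max_lag))
              for i, name in enumerate(names)}
    return y, X, groups


def group_lasso(y, X, groups, lam, iters=300):
    """Proximal gradient (ISTA) for the group-Lasso objective
    (1/2n)||y - Xb||^2 + lam * sum_g sqrt(|g|) * ||b_g||_2."""
    n, p = len(X), len(X[0])
    # ||X||_F^2 / n bounds the gradient's Lipschitz constant, so this step is safe.
    step = n / sum(v * v for row in X for v in row)
    b = [0.0] * p
    for _ in range(iters):
        resid = [yi - sum(x * bj for x, bj in zip(row, b)) for row, yi in zip(X, y)]
        grad = [-sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        z = [bj - step * gj for bj, gj in zip(b, grad)]
        for idx in groups.values():
            # Group soft-thresholding: shrink the whole group, or zero it out.
            norm = math.sqrt(sum(z[j] ** 2 for j in idx))
            thresh = step * lam * math.sqrt(len(idx))
            scale = max(0.0, 1.0 - thresh / norm) if norm > 0 else 0.0
            for j in idx:
                b[j] = scale * z[j]
    return b


def granger_parents(series, target, max_lag=2, lam=0.1):
    """Series whose lag group gets any nonzero coefficient in the target's model."""
    y, X, groups = lagged_design(series, target, max_lag)
    b = group_lasso(y, X, groups, lam)
    return sorted(name for name, idx in groups.items()
                  if any(abs(b[j]) > 1e-8 for j in idx))


def simulate_example(T=300, seed=0):
    """Toy data: 'a' is driven by the previous value of 'b'; 'c' is unrelated."""
    rng = random.Random(seed)
    b = [rng.gauss(0, 1) for _ in range(T)]
    c = [rng.gauss(0, 1) for _ in range(T)]
    a = [0.0] + [0.8 * b[t - 1] + 0.3 * rng.gauss(0, 1) for t in range(1, T)]
    return {"a": a, "b": b, "c": c}


if __name__ == "__main__":
    print(granger_parents(simulate_example(), "a", max_lag=2, lam=0.1))
```

Penalizing whole groups, rather than individual lagged coefficients, means a series is either kept with all of its lags or dropped entirely, which matches the question Granger causality asks. In this sketch `lam` controls how many candidate regulators survive; in practice it would need to be tuned, for example by cross-validation.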