Discovering graphical Granger causality using the truncating lasso penalty

Motivation: Components of biological systems interact with each other to carry out vital cell functions. Information about these interactions can be used to improve estimation and inference, and to gain better insight into the underlying cellular mechanisms. Discovering regulatory interactions among genes is therefore an important problem in systems biology. Whole-genome expression data collected over time provide an opportunity to determine how the expression levels of genes are affected by changes in the transcription levels of other genes, and can therefore be used to discover regulatory interactions among genes.

Results: In this article, we propose a novel penalization method, called truncating lasso, for estimating causal relationships from time-course gene expression data. The proposed penalty can correctly determine the order of the underlying time series and improves the performance of lasso-type estimators. Moreover, the resulting estimate provides information on the time lag between the activation of transcription factors and their effects on regulated genes. We provide an efficient algorithm for estimating the model parameters, and show that the proposed method can consistently discover causal relationships in the large p, small n setting. The proposed model performs favorably on both simulated and real data examples.

Availability: The proposed truncating lasso method is implemented in the R-package ‘grangerTlasso’ and is freely available at http://www.stat.lsa.umich.edu/~shojaie/

Contact: shojaie@umich.edu
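To make the setting concrete, the following is a minimal sketch of the ordinary lasso version of graphical Granger modeling, the kind of lasso-type estimator that the truncating lasso is meant to improve on. It is not the truncating lasso penalty itself, whose form is not given in this abstract, and it is not the grangerTlasso implementation. The function names (`soft_threshold`, `lasso_cd`, `granger_lasso`), the single long time series layout, the fixed maximum lag `d` and the penalty level `lam` are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding operator used in lasso coordinate updates."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # per-column scale
    r = np.asarray(y, dtype=float).copy()  # residual y - X b (b starts at 0)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * b[j]            # remove column j's current contribution
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]            # add the updated contribution back
    return b

def granger_lasso(series, d, lam):
    """Fit a lasso-penalized vector autoregression of order d (hypothetical helper).

    series: array of shape (T, p), one row per time point, one column per gene.
    Returns A of shape (d, p, p); a nonzero A[k, i, j] is read as
    "gene j at lag k+1 is Granger-causal for gene i".
    """
    series = np.asarray(series, dtype=float)
    T, p = series.shape
    # Row for time t (t = d, ..., T-1) stacks series[t-1], ..., series[t-d].
    X = np.hstack([series[d - k - 1 : T - k - 1] for k in range(d)])
    A = np.zeros((d, p, p))
    for i in range(p):
        b = lasso_cd(X, series[d:, i], lam)
        A[:, i, :] = b.reshape(d, p)
    return A

# Simulated usage: gene 0 drives gene 1 with a one-step delay.
rng = np.random.default_rng(0)
T = 200
x = rng.normal(size=(T, 3))
x[1:, 1] = 0.8 * x[:-1, 0] + 0.1 * rng.normal(size=T - 1)
A_hat = granger_lasso(x, d=2, lam=0.1)
print(np.round(A_hat[0], 2))  # estimated lag-1 coefficients
```

In this plain lasso version the maximum lag `d` has to be fixed in advance and every lag is penalized equally. According to the abstract, the truncating lasso instead determines the order of the time series as part of the estimation.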
