Network Granger causality with inherent grouping structure

The problem of estimating high-dimensional network models arises naturally in the analysis of many biological and socio-economic systems. In this work, we aim to learn a network structure from temporal panel data, employing the framework of Granger causal models under the assumptions of sparsity of its edges and an inherent grouping structure among its nodes. To that end, we introduce a group lasso regularized regression framework, and also examine a thresholded variant to address the issue of group misspecification. Further, we establish the norm consistency and variable selection consistency of the estimates, the latter under the novel concept of direction consistency. The performance of the proposed methodology is assessed through an extensive set of simulation studies and comparisons with existing techniques. The methodology is illustrated on two motivating examples from functional genomics and financial econometrics.
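
To make the estimation idea concrete, the following is a minimal sketch of a group lasso regression for one node of a lag-1 Granger model, followed by a within-group thresholding step. It is an illustration under stated assumptions, not the estimator analyzed in the paper: the proximal gradient solver, the sqrt(group size) penalty weights, the relative thresholding rule, the single simulated series (rather than panel data), and all names (`group_lasso`, `threshold_within_groups`, `lam`, `delta`) are assumptions introduced here.

```python
import numpy as np


def group_soft_threshold(v, t):
    """Block soft-thresholding: shrink the Euclidean norm of v by t, or zero it out."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v


def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal gradient for (1/2n)||y - Xb||^2 + lam * sum_g sqrt(|g|) * ||b_g||_2."""
    n, p = X.shape
    # Step size = 1 / Lipschitz constant of the gradient of the squared loss.
    step = n / (np.linalg.norm(X, 2) ** 2)
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        z = b - step * grad
        for g in groups:
            b[g] = group_soft_threshold(z[g], step * lam * np.sqrt(len(g)))
    return b


def threshold_within_groups(b, groups, delta):
    """Hypothetical second step: zero entries that are small relative to their group norm.

    This lets a selected group contain zero coefficients, which is one way to
    guard against misspecified groups.
    """
    out = b.copy()
    for g in groups:
        norm = np.linalg.norm(b[g])
        if norm > 0:
            small = np.abs(b[g]) < delta * norm
            out[np.array(g)[small]] = 0.0
    return out


# Simulated lag-1 model (assumed setup): node 0 is driven by the first group only.
rng = np.random.default_rng(0)
p, T = 6, 200
groups = [[0, 1, 2], [3, 4, 5]]
A = np.zeros((p, p))
A[0, :3] = [0.5, 0.4, -0.3]
series = np.zeros((T, p))
for t in range(1, T):
    series[t] = A @ series[t - 1] + rng.standard_normal(p)

# Regress node 0 at time t on all nodes at time t-1.
X, y = series[:-1], series[1:, 0]
b_hat = group_lasso(X, y, groups, lam=0.1)
b_thr = threshold_within_groups(b_hat, groups, delta=0.2)
print("group lasso:", b_hat.round(2))
print("thresholded:", b_thr.round(2))
```

Repeating the regression for every node and collecting the nonzero coefficients gives an estimated edge set. The values of `lam` and `delta` here are arbitrary and would in practice be chosen by a data-driven criterion.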
