Learning directed acyclic graphical structures with genetical genomics data

MOTIVATION A large amount of research effort has been focused on estimating gene networks from gene expression data to understand the functional basis of a living organism. Such networks are often obtained by considering pairwise correlations between genes and thus may not reflect the true connectivity between genes. By treating gene expressions as quantitative traits while considering genetic markers, genetical genomics analysis has shown its power in enhancing the understanding of gene regulation. Previous work has shown that incorporating genetic markers as covariates improves estimation of the undirected network structure. Because gene expression is often the result of directed regulation, it is more meaningful to estimate the directed graphical network. RESULTS In this article, we introduce a covariate-adjusted Gaussian graphical model to estimate the Markov equivalence class of the directed acyclic graphs (DAGs) in a genetical genomics analysis framework. We develop a two-stage estimation procedure. In the first stage, the regression coefficient matrix is estimated by [Formula: see text] penalization. In the second stage, the estimated coefficient matrix is used to estimate the mean values in our multi-response Gaussian model, and the regulatory network of gene expressions is then estimated with the PC-algorithm. Estimation consistency for high-dimensional sparse DAGs is established. Simulations are conducted to demonstrate our theoretical results. The method is applied to a human Alzheimer's disease dataset in which differential DAGs are identified between cases and controls. AVAILABILITY AND IMPLEMENTATION R code for implementing the method is freely available at http://www.stt.msu.edu/∼cui/software.html.
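The two-stage idea can be illustrated with a minimal sketch. It is not the authors' R implementation, and it rests on several assumptions: the penalty (elided in the text above) is taken to be a lasso-type ℓ1 penalty fitted separately for each gene, markers and expressions are assumed to be centered, and only the skeleton phase of the PC-algorithm is shown (edge orientation into a Markov equivalence class is omitted). All function names (`lasso_cd`, `pc_skeleton`, `two_stage`, and so on) are hypothetical.

```python
import math
from itertools import combinations
from statistics import NormalDist


def soft_threshold(z, lam):
    """Soft-thresholding operator used by the lasso coordinate update."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso for one response by cyclic coordinate descent (assumed penalty).

    Minimizes (1/2n)*||y - X b||^2 + lam*||b||_1, where X is a list of rows.
    Returns the coefficients and the residuals y - X b.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # equals y - X beta because beta starts at zero
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            # Inner product of column j with the partial residual, scaled by n.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_ss[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta, resid


def corr_matrix(cols):
    """Sample correlation matrix of a list of equal-length columns."""
    centered = []
    for c in cols:
        m = sum(c) / len(c)
        centered.append([v - m for v in c])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]


def partial_corr(R, i, j, S):
    """Partial correlation of i and j given the index tuple S (recursive formula)."""
    if not S:
        return R[i][j]
    k, rest = S[0], S[1:]
    r_ij = partial_corr(R, i, j, rest)
    r_ik = partial_corr(R, i, k, rest)
    r_jk = partial_corr(R, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def pc_skeleton(R, n, alpha=0.01):
    """Skeleton phase of the PC-algorithm from a correlation matrix of n samples."""
    p = len(R)
    adj = {i: set(range(p)) - {i} for i in range(p)}
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    size = 0
    # Grow the conditioning-set size while some node has enough other neighbors.
    while any(len(adj[i]) - 1 >= size for i in range(p)):
        for i in range(p):
            for j in sorted(adj[i]):
                others = sorted(adj[i] - {j})
                for S in combinations(others, size):
                    r = partial_corr(R, i, j, S)
                    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
                    z = math.sqrt(n - size - 3) * abs(math.atanh(r))  # Fisher z
                    if z <= crit:  # independence not rejected: drop the edge
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
        size += 1
    return {(i, j) for i in range(p) for j in adj[i] if i < j}


def two_stage(X, Y_cols, lam, alpha=0.01):
    """Stage 1: lasso of each gene on the markers. Stage 2: PC skeleton on residuals."""
    resid_cols = [lasso_cd(X, y, lam)[1] for y in Y_cols]
    return pc_skeleton(corr_matrix(resid_cols), len(X), alpha)
```

Here `X` is a list of marker rows and `Y_cols` a list of expression columns, one per gene. As a quick check of the second stage, calling `pc_skeleton` on the correlation matrix of a three-gene chain (correlations 0.5, 0.5 and 0.25) with `n=200` keeps the edges (0, 1) and (1, 2) and drops (0, 2), because the partial correlation of genes 0 and 2 given gene 1 is zero.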
