An Equivalent Measure of Partial Correlation Coefficients for High-Dimensional Gaussian Graphical Models

Gaussian graphical models (GGMs) are frequently used to explore networks, such as gene regulatory networks, among a set of variables. Under the classical theory of GGMs, constructing a Gaussian graphical network amounts to finding the pairs of variables with nonzero partial correlation coefficients. However, this is infeasible for high-dimensional problems in which the number of variables exceeds the sample size. In this article, we propose a new measure of the partial correlation coefficient, which is evaluated with a reduced conditional set and is thus feasible for high-dimensional problems. Under the Markov property and adjacency faithfulness conditions, the new measure is equivalent to the true partial correlation coefficient for the purpose of constructing Gaussian graphical networks. Based on the new measure, we propose a multiple hypothesis test-based method for constructing Gaussian graphical networks, and we establish the consistency of the proposed method under mild conditions. The proposed method outperforms existing methods, such as the PC, graphical Lasso, nodewise regression, and qp-average methods, especially for problems in which a large number of indirect associations are present. The proposed method has a computational complexity of nearly O(p^2), where p is the number of variables, and is flexible in data integration, network comparison, and covariate adjustment.
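To make the central idea concrete, the following is a minimal sketch of a partial correlation coefficient evaluated on a small conditioning set, together with a Fisher z-transform p-value. It is not the procedure proposed in the article: how the reduced conditional set is chosen and how the multiple hypothesis test is carried out are not specified in the abstract, so here the conditioning set is simply passed in by hand. The function names `partial_corr` and `fisher_z_pvalue` and the three-variable chain used as toy data are assumptions made for illustration.

```python
import numpy as np
from math import erfc, log, sqrt


def partial_corr(X, i, j, cond):
    """Sample partial correlation of columns i and j of X given columns in cond.

    Only the (2 + len(cond)) x (2 + len(cond)) correlation submatrix is
    inverted, so this stays feasible when len(cond) is much smaller than n,
    even if the total number of columns exceeds n.
    """
    idx = [i, j] + list(cond)
    C = np.corrcoef(X[:, idx], rowvar=False)
    P = np.linalg.inv(C)  # precision matrix of the selected variables
    return -P[0, 1] / sqrt(P[0, 0] * P[1, 1])


def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for H0: partial correlation is zero.

    n is the sample size and k the size of the conditioning set; the
    statistic sqrt(n - k - 3) * |z| is approximately standard normal.
    """
    z = 0.5 * log((1 + r) / (1 - r))
    stat = sqrt(n - k - 3) * abs(z)
    return erfc(stat / sqrt(2))  # equals 2 * (1 - Phi(stat))


# Toy data (an assumption for illustration): a chain X0 - X1 - X2, so X0 and
# X2 are marginally correlated but conditionally independent given X1.
rng = np.random.default_rng(0)
n = 2000
x0 = rng.standard_normal(n)
x1 = x0 + rng.standard_normal(n)
x2 = x1 + rng.standard_normal(n)
X = np.column_stack([x0, x1, x2])

r_marginal = partial_corr(X, 0, 2, [])    # indirect association shows up here
r_given_x1 = partial_corr(X, 0, 2, [1])   # and vanishes once X1 is conditioned on
r_direct = partial_corr(X, 0, 1, [2])     # a true edge survives conditioning
p_direct = fisher_z_pvalue(r_direct, n, 1)
```

In this toy chain, the marginal correlation between X0 and X2 reflects an indirect association only, and conditioning on the single separating variable X1 is enough to remove it. This is the kind of situation in which a small, well-chosen conditioning set can stand in for the full one.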
