A Robust Procedure For Gaussian Graphical Model Search From Microarray Data With p Larger Than n

Learning large-scale networks of interactions from microarray data is an important and challenging problem in bioinformatics. A widely used approach is to assume that the available data constitute a random sample from a multivariate distribution belonging to a Gaussian graphical model. As a consequence, the prime objects of inference are full-order partial correlations, that is, partial correlations between two variables given all the remaining ones. In the context of microarray data, the number of variables exceeds the sample size, and this precludes the application of traditional structure learning procedures because no sample version of full-order partial correlations exists. In this paper we consider limited-order partial correlations, which are partial correlations computed on marginal distributions of manageable size, and provide a set of rules that allow one to assess the usefulness of these quantities for deriving the independence structure of the underlying Gaussian graphical model. Furthermore, we introduce a novel structure learning procedure based on a quantity, obtained from limited-order partial correlations, that we call the non-rejection rate. The applicability and usefulness of the procedure are demonstrated on both simulated and real data.
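To make the idea of a limited-order partial correlation concrete, the following is a minimal Python sketch, not the procedure of the paper itself. It assumes that the non-rejection rate of a pair (i, j) can be approximated as the fraction of randomly drawn conditioning sets Q of size q for which a test of zero partial correlation between i and j given Q is not rejected. The random sampling of Q, the Fisher z-test, the significance level `alpha`, and the function names `partial_corr` and `non_rejection_rate` are all assumptions introduced here for illustration. The sketch requires q < n - 3, so that each marginal covariance matrix of size q + 2 is invertible even when p exceeds n.

```python
import math
import numpy as np


def partial_corr(S, i, j, Q):
    """Sample partial correlation of variables i and j given the set Q,
    read off the inverse of the marginal covariance on {i, j} and Q."""
    idx = [i, j] + list(Q)
    P = np.linalg.inv(S[np.ix_(idx, idx)])
    return -P[0, 1] / math.sqrt(P[0, 0] * P[1, 1])


def non_rejection_rate(X, i, j, q, n_subsets=100, alpha=0.05, seed=None):
    """Fraction of random size-q conditioning sets for which the hypothesis
    of zero partial correlation between i and j is NOT rejected.

    Assumption: a two-sided Fisher z-test is used for each subset."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if n - q - 3 <= 0:
        raise ValueError("q must be smaller than n - 3")
    S = np.cov(X, rowvar=False)
    others = [k for k in range(p) if k not in (i, j)]
    not_rejected = 0
    for _ in range(n_subsets):
        Q = rng.choice(others, size=q, replace=False)
        r = partial_corr(S, i, j, Q)
        # Guard against |r| == 1 from rounding before applying atanh.
        r = min(max(r, -1.0 + 1e-12), 1.0 - 1e-12)
        z = math.atanh(r) * math.sqrt(n - q - 3)
        # Two-sided p-value under the standard normal reference.
        p_value = math.erfc(abs(z) / math.sqrt(2.0))
        if p_value > alpha:
            not_rejected += 1
    return not_rejected / n_subsets
```

Under these assumptions, a pair of variables joined by a strong direct dependence should yield a rate near zero, while a pair of unrelated variables should yield a rate well above zero. Each test involves only q + 2 variables, so the computation stays feasible when the full p by p sample covariance matrix is singular.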
