Nested q-Partial Graphs for Genetic Network Inference from "Small n, Large p" Microarray Data

Gaussian graphical models are widely used to tackle the important and challenging problem of inferring genetic regulatory networks from expression data. These models have gained much attention because they encode full conditional relationships between variables, i.e. genes. Unfortunately, microarray data contain far fewer samples than genes, so classical approaches that estimate the full joint distribution cannot be applied. Recently, limited-order partial correlation approaches have been proposed to circumvent this problem. It has been shown both theoretically and experimentally that, thanks to the sparsity of genetic networks, such graphs provide accurate approximations of the full conditional independence structure between the variables. However, computing limited-order partial correlation coefficients for large networks is computationally expensive, and often intractable, even for small order values. Moreover, because each edge is tested against many conditioning sets, multiple statistical testing becomes a problem, and one should expect most of the edges to be removed. We propose a procedure that tackles both problems by reducing the dimensionality of the inference tasks. Using a screening procedure, we iteratively build nested graphs by discarding the least relevant edges. Moreover, by conditioning only on relevant variables, we mitigate the problems related to multiple testing. This procedure lets us infer limited-order partial correlation graphs faster and consider higher order values, which increases the accuracy of the inferred graph. The effectiveness of the proposed procedure is demonstrated on simulated data.
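To make the idea of nested limited-order (q-partial) graphs concrete, the following is a minimal Python sketch under stated assumptions. It is not the authors' procedure. A q-order partial correlation between genes i and j is read off the inverse of the sample covariance restricted to {i, j} plus a conditioning set of size q, which stays invertible when q + 2 is small relative to n. The screening loop then prunes edges order by order, conditioning only on variables still adjacent to an edge's endpoints. The function names `partial_corr` and `nested_q_partial_graph`, the `orders` and `threshold` parameters, and the use of a fixed absolute threshold in place of a proper statistical test are all hypothetical simplifications.

```python
import itertools

import numpy as np


def partial_corr(cov, i, j, cond):
    """Partial correlation of variables i and j given the variables in `cond`,
    read off the inverse of the covariance submatrix over {i, j} plus cond."""
    idx = [i, j] + list(cond)
    prec = np.linalg.inv(cov[np.ix_(idx, idx)])
    return -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])


def nested_q_partial_graph(X, orders=(0, 1, 2), threshold=0.2):
    """Hypothetical nested screening: at order q, an edge survives only if its
    |partial correlation| stays >= threshold for every conditioning set of
    size q drawn from the current neighbours of its two endpoints."""
    p = X.shape[1]
    cov = np.cov(X, rowvar=False)
    # Start from the complete graph; each order prunes the previous graph.
    edges = {(i, j) for i in range(p) for j in range(i + 1, p)}
    for q in orders:
        # Neighbourhoods are frozen at the start of each order.
        nbrs = {v: set() for v in range(p)}
        for i, j in edges:
            nbrs[i].add(j)
            nbrs[j].add(i)
        kept = set()
        for i, j in edges:
            # Condition only on variables still linked to i or j.
            candidates = sorted((nbrs[i] | nbrs[j]) - {i, j})
            if len(candidates) < q:
                kept.add((i, j))  # too few relevant variables: nothing to test
            elif all(
                abs(partial_corr(cov, i, j, S)) >= threshold
                for S in itertools.combinations(candidates, q)
            ):
                kept.add((i, j))
        edges = kept
    return edges


# Toy chain X0 -> X1 -> X2 plus an isolated X3 (all with unit variance).
rng = np.random.default_rng(0)
n = 2000
e = rng.standard_normal((n, 4))
x0 = e[:, 0]
x1 = 0.8 * x0 + 0.6 * e[:, 1]
x2 = 0.8 * x1 + 0.6 * e[:, 2]
X = np.column_stack([x0, x1, x2, e[:, 3]])
print(sorted(nested_q_partial_graph(X)))  # expected: [(0, 1), (1, 2)]
```

In this toy example, order 0 drops every edge touching X3, and order 1 removes the indirect edge between X0 and X2, whose partial correlation given X1 is zero in the population. Restricting conditioning sets to current neighbours is what keeps the number of tests per edge small as q grows. A realistic version would replace the fixed threshold with a significance test (for instance Fisher's z-transform) combined with multiple-testing control such as the Benjamini-Hochberg procedure.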
