A comparative study of covariance selection models for the inference of gene regulatory networks

MOTIVATION: The inference, or 'reverse-engineering', of gene regulatory networks from expression data and the description of the complex dependency structures among genes are open issues in modern molecular biology.

RESULTS: In this paper we compared three regularized methods of covariance selection for the inference of gene regulatory networks, developed to circumvent the problems that arise when the number of observations n is smaller than the number of genes p. The examined approaches provide three alternative estimates of the inverse covariance matrix: (a) the 'PINV' method is based on the Moore-Penrose pseudoinverse, (b) the 'RCM' method computes correlations between regression residuals, and (c) the 'ℓ(2C)' method maximizes a properly regularized log-likelihood function. Our extensive simulation studies showed that ℓ(2C) outperformed the other two methods: it yielded the most predictive partial correlation estimates and the highest sensitivity in inferring conditional dependencies between genes, even when only a few observations were available. Applying this method to infer gene networks of the isoprenoid biosynthesis pathways in Arabidopsis thaliana highlighted a negative partial correlation coefficient between the two hubs of the two isoprenoid pathways and, more importantly, provided evidence of cross-talk between genes in the plastidial and the cytosolic pathways. When applied to gene expression data for a signature of the HRAS oncogene in human cell cultures, the method revealed 9 genes (p-value < 0.0005) directly interacting with HRAS and sharing the same Ras-responsive binding site for the transcription factor RREB1. This result suggests that the transcriptional activation of these genes is mediated by a common transcription factor downstream of Ras signaling.

AVAILABILITY: Software implementing the methods in the form of Matlab scripts is available at: http://users.ba.cnr.it/issia/iesina18/CovSelModelsCodes.zip.
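As an illustration of the kind of estimate discussed above, the following is a minimal Python sketch of a pseudoinverse-based partial correlation estimate in the n < p setting. It is not the authors' Matlab code: the function names, the simulated data, and the choice of the plain sample covariance as input are assumptions made for this example. It relies only on the standard identity that the partial correlation between genes i and j equals -ω_ij / sqrt(ω_ii ω_jj), where ω denotes the entries of the (estimated) inverse covariance matrix.

```python
import numpy as np

def partial_correlations(precision):
    """Convert an estimated inverse covariance matrix into partial correlations.

    Uses rho_ij = -omega_ij / sqrt(omega_ii * omega_jj); the diagonal is set to 1.
    """
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

def pinv_partial_correlations(X):
    """Pseudoinverse-based estimate (hypothetical helper, not the paper's code).

    X is an n x p matrix: rows are observations, columns are genes.
    When n < p the sample covariance is singular, so the Moore-Penrose
    pseudoinverse stands in for the ordinary inverse.
    """
    S = np.cov(X, rowvar=False)      # p x p sample covariance
    omega = np.linalg.pinv(S)        # Moore-Penrose pseudoinverse
    return partial_correlations(omega)

# Simulated data for illustration only: n = 20 observations, p = 50 genes.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 50))
P = pinv_partial_correlations(X)
```

Large absolute entries of `P` would be candidate edges of the inferred network; how to threshold or test them is a separate choice that this sketch does not address.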
