Condition-adaptive fused graphical lasso (CFGL): an adaptive procedure for inferring condition-specific gene co-expression network

Co-expression network analysis provides useful information for studying gene regulation in biological processes. Examining condition-specific patterns of co-expression can provide insights into the underlying cellular processes activated in a particular condition. One challenge in this type of analysis is that the sample sizes in each condition are usually small, making the statistical inference of co-expression patterns highly underpowered. A joint network construction that borrows information from related structures across conditions has the potential to improve the power of the analysis. One possible approach to constructing the co-expression network is to use the Gaussian graphical model. Though several methods are available for joint estimation of multiple graphical models, they do not fully account for the heterogeneity between samples and between co-expression patterns introduced by condition specificity. Here we develop the condition-adaptive fused graphical lasso (CFGL), a data-driven approach to incorporate condition specificity in the estimation of co-expression networks. We show that this method improves the accuracy with which networks are learned. Applying this method to a rat multi-tissue dataset and to The Cancer Genome Atlas (TCGA) breast cancer dataset provides interesting biological insights. In both analyses, we identify numerous modules enriched for Gene Ontology functions and observe that the modules that are upregulated in a particular condition are often involved in condition-specific activities. Interestingly, we observe that the genes strongly associated with survival time in the TCGA dataset are less likely to be network hubs, suggesting that genes associated with cancer progression are likely to govern specific functions, rather than regulating a large number of biological processes. Additionally, we observe that the tumor-specific hub genes tend to have few shared edges with normal tissue, revealing tumor-specific regulatory mechanisms.

Author summary

Gene co-expression networks provide insights into the mechanisms of cellular activity and gene regulation. Condition-specific mechanisms may be identified by constructing and comparing co-expression networks of multiple conditions. We propose a novel statistical method to jointly construct co-expression networks for gene expression profiles from multiple conditions. By using a data-driven approach to capture condition-specific co-expression patterns, this method is effective in identifying both co-expression patterns that are specific to a condition and patterns that are common across conditions. Applying this method to real datasets reveals interesting biological insights.
