Addressing confounding artifacts in reconstruction of gene co-expression networks

Gene co-expression networks can capture biological relationships between genes, and are important tools for predicting gene function and understanding disease mechanisms. We show that artifacts such as batch effects in gene expression data confound commonly used network reconstruction algorithms. We then demonstrate, both theoretically and empirically, that principal component correction of gene expression measurements prior to network inference can reduce false discoveries. Using expression data from the GTEx project, spanning multiple tissues and hundreds of individuals, we show that this approach improves precision and recall in the reconstructed networks.
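The idea of principal component correction before network inference can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `remove_top_pcs` and `correlation_network`, the number of removed components `k`, the correlation threshold, and the simulated data are all assumptions chosen for illustration. The sketch simulates one true co-expressed gene pair plus a batch factor that shifts every gene, then compares a thresholded correlation network built before and after regressing out the top principal component.

```python
import numpy as np


def remove_top_pcs(expr, k):
    """Return residual expression after removing the top k principal components.

    expr: array of shape (samples, genes). k is a hypothetical tuning
    parameter; the abstract does not say how many components to remove.
    """
    centered = expr - expr.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Rank-k approximation: the part of the data explained by the top k PCs.
    low_rank = (u[:, :k] * s[:k]) @ vt[:k, :]
    return centered - low_rank


def correlation_network(expr, threshold):
    """Boolean adjacency matrix: edge where |Pearson correlation| >= threshold."""
    corr = np.corrcoef(expr, rowvar=False)
    adj = np.abs(corr) >= threshold
    np.fill_diagonal(adj, False)
    return adj


def simulate(n_samples=300, n_genes=20, seed=0):
    """Toy data (an assumption): genes 0 and 1 are truly co-expressed,
    and a single batch factor affects all genes equally."""
    rng = np.random.default_rng(seed)
    expr = rng.normal(size=(n_samples, n_genes))
    shared = rng.normal(size=n_samples)
    expr[:, 0] += 2.0 * shared
    expr[:, 1] += 2.0 * shared
    batch = rng.normal(size=(n_samples, 1))
    return expr + 2.0 * batch  # batch effect added to every gene


expr = simulate()
raw_net = correlation_network(expr, threshold=0.5)
corrected_net = correlation_network(remove_top_pcs(expr, k=1), threshold=0.5)

print("edges before correction:", int(raw_net.sum()) // 2)
print("edges after correction: ", int(corrected_net.sum()) // 2)
print("true edge (0, 1) kept:  ", bool(corrected_net[0, 1]))
```

In this toy setting the batch factor induces correlation between essentially every pair of genes, so the uncorrected network is dense with false edges, while the corrected network is much sparser and retains the simulated true edge. On real data the number of components to remove would need to be chosen carefully, since removing too many could also strip out genuine broad co-expression signal.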
