A resource for analyzing C. elegans’ gene expression data using transcriptional gene modules and module-weighted annotations

Identification of gene co-expression patterns (gene modules) is widely used to group functionally related genes during transcriptomic data analysis. An organism-wide atlas of high-quality, fundamental gene modules would provide a powerful tool for unbiased detection of biological signals in gene expression data. Here, using an independent component analysis method we call DEXICA, we have defined and optimized 209 modules that broadly represent the transcriptional wiring of the key experimental organism C. elegans. Interrogation of these modules reveals processes that are activated in long-lived mutants in cases where traditional analyses of differentially expressed genes fail to do so. Using this resource, users can easily identify active modules in their gene expression data and access detailed descriptions of each module. Additionally, we show that modules can inform the strength of the association between a gene and an annotation (e.g., a GO term). Analysis of these “module-weighted annotations” improves on several aspects of traditional annotation-enrichment tests and can aid in the functional interpretation of poorly annotated genes. Interactive access to the resource is provided at http://genemodules.org/.
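As an illustration of what identifying active modules in an expression dataset could involve, the following minimal sketch scores each module by the cosine similarity between its gene weights and a vector of log fold changes. This is an assumption made for illustration only and is not necessarily the scoring used by DEXICA or by the online resource; the function name `module_activity`, the module names, the gene names, and all numeric values are hypothetical.

```python
from math import sqrt


def module_activity(module_weights, log_fold_changes):
    """Score each module by cosine similarity between its gene weights and
    the log fold changes, using only genes present in both inputs.

    Illustrative scoring only (an assumption, not the published method).
    """
    scores = {}
    for module, weights in module_weights.items():
        shared = [g for g in weights if g in log_fold_changes]
        dot = sum(weights[g] * log_fold_changes[g] for g in shared)
        norm_w = sqrt(sum(weights[g] ** 2 for g in shared))
        norm_x = sqrt(sum(log_fold_changes[g] ** 2 for g in shared))
        # Guard against empty overlap or all-zero vectors.
        scores[module] = dot / (norm_w * norm_x) if norm_w and norm_x else 0.0
    return scores


# Hypothetical toy data: gene weights per module and log fold changes.
modules = {
    "module_A": {"gene1": 2.0, "gene2": 1.5, "gene3": 0.0},
    "module_B": {"gene1": 0.0, "gene2": 0.1, "gene3": -2.0},
}
lfc = {"gene1": 1.0, "gene2": 0.8, "gene3": 0.1}

scores = module_activity(modules, lfc)
# Rank modules by absolute score, so strongly repressed modules also surface.
ranked = sorted(scores, key=lambda m: abs(scores[m]), reverse=True)
print(ranked)
```

In this toy example, module_A ranks first because its highly weighted genes are the ones that change most. Cosine similarity treats each module independently; because modules derived by independent component analysis are not guaranteed to be orthogonal, a least-squares projection onto all modules at once would be a natural alternative.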
