Module representatives for refining gene co-expression modules

This paper concerns the identification of gene co-expression modules in transcriptomics data, i.e. collections of genes that are highly co-expressed and potentially linked to a biological mechanism. Weighted gene co-expression network analysis (WGCNA) is a widely used method for module detection. It summarizes each module by its eigengene, the weights of the first principal component of the module gene expression matrix. The eigengene has previously been used as a centroid in a k-means algorithm to improve module membership. In this paper, we present four new module representatives: the eigengene subspace, the flag mean, the flag median and the module expression vector. The eigengene subspace, flag mean and flag median are subspace module representatives, which capture more of the variance of the gene expression within a module than a single eigengene does. The module expression vector is a weighted centroid of the module that leverages the structure of the module gene co-expression network. We use these module representatives in Linde–Buzo–Gray clustering algorithms to refine WGCNA module membership, and we evaluate the resulting methods on two transcriptomics data sets. We find that most of our module refinement techniques improve upon the WGCNA modules according to two statistics: (1) module-based classification of phenotypes and (2) module biological significance according to Gene Ontology terms.
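
To make the eigengene-as-centroid idea concrete, the following is a minimal sketch of a k-means-style refinement loop. It illustrates only the baseline the abstract describes, not the paper's subspace representatives or its Linde–Buzo–Gray implementation. The function names (`standardize`, `eigengene`, `refine_modules`), the samples-by-genes layout, the use of absolute Pearson correlation as the similarity, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit Euclidean norm."""
    Xc = X - X.mean(axis=0)
    return Xc / np.linalg.norm(Xc, axis=0)

def eigengene(X_module):
    """First left singular vector of a standardized samples-by-genes matrix."""
    Z = standardize(X_module)
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    e = U[:, 0]
    # Fix the sign so the eigengene correlates positively with mean expression.
    return e if e @ Z.mean(axis=1) >= 0 else -e

def refine_modules(X, labels, max_iter=20):
    """Reassign each gene to the module whose eigengene it correlates with most.

    Assumed similarity: absolute Pearson correlation (hypothetical choice).
    """
    labels = np.asarray(labels).copy()
    Z = standardize(X)  # unit-norm centered columns, so dot products are correlations
    for _ in range(max_iter):
        modules = np.unique(labels)
        E = np.column_stack([eigengene(X[:, labels == m]) for m in modules])
        corr = Z.T @ standardize(E)  # shape: genes x modules
        new_labels = modules[np.argmax(np.abs(corr), axis=1)]
        if np.array_equal(new_labels, labels):
            break  # memberships are stable
        labels = new_labels
    return labels

# Synthetic example: two latent signals, ten noisy genes each, one gene mislabeled.
rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 30))
X = np.column_stack([a] * 10 + [b] * 10) + 0.1 * rng.normal(size=(30, 20))
true_labels = np.repeat([0, 1], 10)
start = true_labels.copy()
start[0] = 1  # deliberately assign gene 0 to the wrong module
refined = refine_modules(X, start)
```

Swapping `eigengene` for a function that returns several leading singular vectors, together with a subspace-aware similarity, would be one way to move this sketch toward a subspace representative; the details of how the paper does this are not given in the abstract.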
