Universal concept signature analysis: genome-wide quantification of new biological and pathological functions of genes and pathways

Identifying new gene functions and pathways underlying diseases and biological processes is a major challenge in genomics research. In particular, most methods for interpreting the pathways characteristic of an experimental gene list defined by genomic data are limited by their dependence on assessing overlapping genes or interactome topology, which cannot account for the variety of functional relations. This is especially problematic for pathway discovery from single-cell genomics data with low gene coverage, or for interpreting complex pathway changes such as those that occur during changes of cell state. Here, we exploit comprehensive sets of molecular concepts that combine ontologies, pathways, interactions and domains to help inform functional relations. We first developed universal concept signature (uniConSig) analysis for genome-wide quantification of new gene functions underlying biological or pathological processes, based on the signature molecular concepts computed from known functional gene lists. We then developed a novel concept signature enrichment analysis (CSEA) for deep functional assessment of the pathways enriched in an experimental gene list. This method is grounded in the framework of shared concept signatures between gene sets at multiple functional levels, thus overcoming the limitations of current methods. Through meta-analysis of transcriptomic data sets of cancer cell line models and single hematopoietic stem cells, we demonstrate the broad applications of CSEA to pathway discovery from gene expression and single-cell transcriptomic data sets for genetic perturbations and changes of cell state, complementing current modalities. The R modules for uniConSig analysis and CSEA are available at https://github.com/wangxlab/uniConSig.
