Assembling cell context-specific gene sets: a case in cardiomyopathy

Growing evidence suggests that canonical pathways and standard molecular signature databases are incomplete and inadequate for modeling the complex behavior of cell physiology and pathology. Yet many Gene Set Analysis (GSA) studies still rely on these databases to identify disease biomarkers and molecular mechanisms within a specific cell context. While tremendous effort has been invested in developing GSA tools, only a limited number of studies have focused on de novo assembly of context-specific gene sets, as opposed to simply applying GSA with a standard gene set database. In this paper, we propose a pipeline to derive the entire collection of Cell context-Specific Gene Sets (CSGS) from a molecular interaction network, based on the hypothesis that molecular events linked to a specific phenotypic response should cluster within a subnet of interacting genes. Gene sets are assigned using both the physical properties of the network and the functional annotations of neighboring nodes. The identified gene sets could provide a precise starting point, so that downstream GSA covers all functional pathways in the particular cell context while avoiding the noise and excessive multiple-hypothesis testing caused by including irrelevant gene sets from a standard database. We applied the pipeline in the context of cardiomyopathy and demonstrated its superiority over the MSigDB gene set collection in terms of (i) reproducibility and robustness in GSA, (ii) effectiveness in uncovering molecular mechanisms associated with cardiomyopathy, and (iii) performance in distinguishing diseased from normal states.
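As a rough illustration of the idea that gene sets can be read off subnets of interacting genes and then labeled from the annotations of their members, consider the minimal sketch below. It is not the pipeline proposed in the paper: the abstract does not specify the partitioning method, so plain connected components stand in for it here, and the gene names (`G1` to `G5`), the annotation terms, and the helper functions `connected_subnets` and `label_subnet` are all hypothetical.

```python
from collections import Counter, defaultdict


def connected_subnets(edges):
    """Group genes into subnets, here simply the connected components
    of an undirected interaction graph (an assumption for illustration)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, subnets = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collecting every gene reachable from `start`.
        stack, component = [start], set()
        while stack:
            gene = stack.pop()
            if gene in component:
                continue
            component.add(gene)
            stack.extend(adjacency[gene] - component)
        seen |= component
        subnets.append(component)
    return subnets


def label_subnet(subnet, annotations):
    """Label a subnet with the most common annotation among its annotated
    members; return None when no member carries an annotation."""
    counts = Counter(annotations[g] for g in sorted(subnet) if g in annotations)
    return counts.most_common(1)[0][0] if counts else None


# Toy interaction network and annotations (placeholders, not real data).
edges = [("G1", "G2"), ("G2", "G3"), ("G4", "G5")]
annotations = {"G1": "term_A", "G2": "term_A", "G3": "term_B", "G4": "term_B"}

gene_sets = {
    label_subnet(subnet, annotations): sorted(subnet)
    for subnet in connected_subnets(edges)
}
print(gene_sets)
```

In this toy case the unannotated gene `G5` inherits a label from its annotated neighbor, which mirrors, in a very simplified way, the use of neighboring nodes' functional annotations described above. Because only subnets present in the network yield gene sets, the number of hypotheses tested downstream is bounded by the number of subnets rather than by the size of a general-purpose database.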
