MGSEA – a multivariate Gene set enrichment analysis

Background

Gene Set Enrichment Analysis (GSEA) is a powerful tool to identify enriched functional categories of informative biomarkers. Canonical GSEA takes one-dimensional feature scores derived from the data of one platform as inputs. Numerous extensions of GSEA handling multimodal OMIC data have been proposed, yet none of them explicitly captures combinatorial relations of feature scores from multiple platforms.

Results

We propose multivariate GSEA (MGSEA) to capture combinatorial relations of gene set enrichment among multiple platform features. MGSEA successfully captures designed feature relations from simulated data. We apply it to scores that delineate breast cancer and glioblastoma multiforme (GBM) subtypes, derived from The Cancer Genome Atlas (TCGA) datasets of CNV, DNA methylation and mRNA expression, and find that breast cancer and GBM data yield both similar and distinct outcomes. Among the enriched functional categories, subtype-specific biomarkers are dominated by mRNA expression in many functional categories in both cancer types, and also by CNV in many functional categories in breast cancer. The enriched functional categories belonging to distinct combinatorial patterns are involved in different oncogenic processes: cell proliferation (such as cell cycle control, estrogen responses, MYC and E2F targets) for mRNA expression in breast cancer, invasion and metastasis (such as cell adhesion and epithelial-mesenchymal transition (EMT)) for CNV in breast cancer, and diverse processes (such as immune and inflammatory responses, cell adhesion, angiogenesis, and EMT) for mRNA expression in GBM. These observations persist in two external datasets (Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) for breast cancer and Repository for Molecular Brain Neoplasia Data (REMBRANDT) for GBM) and are consistent with knowledge of cancer subtypes. We further compare the characteristics of MGSEA with several extensions of GSEA and point out the pros and cons of each method.

Conclusions

We demonstrate the utility of MGSEA by inferring the combinatorial relations of multiple platforms for cancer subtype delineation in three multi-OMIC datasets: TCGA, METABRIC and REMBRANDT. The inferred combinatorial patterns are consistent with current knowledge and also reveal novel insights about cancer subtypes. MGSEA can be further applied to any genotype-phenotype association problem with multimodal OMIC data.
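
For readers unfamiliar with the one-dimensional setting that the Background refers to, the following is a minimal sketch of the running-sum enrichment score used by canonical, single-platform GSEA. It is not the MGSEA method, whose details are not given in this abstract. The function name `enrichment_score`, the weighting exponent `p`, and the toy gene names are illustrative assumptions.

```python
def enrichment_score(scores, gene_set, p=1.0):
    """Running-sum enrichment score of canonical (univariate) GSEA.

    scores   -- dict mapping gene name to a single feature score
                (one platform only; this is NOT the multivariate MGSEA)
    gene_set -- set of gene names forming the functional category
    p        -- weighting exponent (assumed default of 1.0)
    """
    # Rank genes from the highest to the lowest score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = [gene in gene_set for gene in ranked]
    n_hits = sum(hits)
    n_miss = len(ranked) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    # Normalising constant: total weight carried by genes in the set.
    hit_weight = sum(abs(scores[g]) ** p for g, h in zip(ranked, hits) if h)
    if hit_weight == 0:
        raise ValueError("all genes in the set have a zero score")

    running = 0.0
    best = 0.0
    for gene, is_hit in zip(ranked, hits):
        if is_hit:
            running += abs(scores[gene]) ** p / hit_weight  # step up on a hit
        else:
            running -= 1.0 / n_miss                         # step down on a miss
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best


# Toy usage with hypothetical gene names and scores.
toy_scores = {"A": 3.0, "B": 2.0, "C": 1.0, "D": -1.0, "E": -2.0, "F": -3.0}
es_top = enrichment_score(toy_scores, {"A", "B"})
es_bottom = enrichment_score(toy_scores, {"E", "F"})
```

A set concentrated at the top of the ranking gives a positive score and a set concentrated at the bottom gives a negative one. Because the ranking is built from a single score per gene, this statistic alone cannot express combinatorial relations among scores from several platforms.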
