PAMOGK: a pathway graph kernel-based multiomics approach for patient clustering

MOTIVATION: Accurate classification of patients into molecular subgroups is critical for the development of effective therapeutics and for deciphering what drives these subgroups to cancer. The availability of multi-omics data catalogs for large cohorts of cancer patients provides multiple views into the molecular biology of the tumors with unprecedented resolution.

RESULTS: We develop PAMOGK (Pathway based Multi Omic Graph Kernel clustering), which integrates multi-omics patient data with existing biological knowledge on pathways. We develop a novel graph kernel that evaluates patient similarities based on a single molecular alteration type in the context of a pathway. To combine the multiple views of patients produced by hundreds of pathway and molecular alteration combinations, we use multi-view kernel clustering. Applying PAMOGK to kidney renal clear cell carcinoma (KIRC) patients results in four clusters with significantly different survival times (p-value = 1.24e-11). When we compare PAMOGK to eight other state-of-the-art multi-omics clustering methods, PAMOGK consistently outperforms them in its ability to partition KIRC patients into groups with different survival distributions. The discovered patient subgroups also differ with respect to other clinical parameters such as tumor stage and grade, and primary tumor and metastatic tumor spread. The pathways identified as important are highly relevant to KIRC.

AVAILABILITY: github.com/tastanlab/pamogk.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
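
The idea of clustering patients from several kernel "views" can be illustrated with a small sketch. This is not the PAMOGK implementation: it assumes that each view is already available as a patient-by-patient kernel matrix, uses linear kernels on synthetic features as a stand-in for the pathway graph kernel, combines views by an unweighted average, and clusters with plain kernel k-means rather than the multi-view kernel clustering method used in the paper. All function names and the toy data are hypothetical.

```python
import numpy as np

def normalize_kernel(K):
    """Cosine-normalize a kernel so every patient has unit self-similarity."""
    d = np.sqrt(np.clip(np.diag(K), 1e-12, None))
    return K / np.outer(d, d)

def combine_views(kernels):
    """Unweighted average of normalized kernels (assumption: equal view weights)."""
    return sum(normalize_kernel(K) for K in kernels) / len(kernels)

def kernel_kmeans(K, k, n_iter=50):
    """Plain kernel k-means on one (combined) kernel matrix."""
    n = K.shape[0]
    diag = np.diag(K)
    # Deterministic farthest-point seeding in the kernel-induced feature space.
    seeds = [0]
    for _ in range(1, k):
        d = np.min([diag + diag[s] - 2 * K[:, s] for s in seeds], axis=0)
        seeds.append(int(np.argmax(d)))
    labels = np.argmin([diag + diag[s] - 2 * K[:, s] for s in seeds], axis=0)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            members = labels == c
            m = members.sum()
            if m == 0:
                continue  # empty cluster stays at infinite distance
            # Squared distance of every patient to the centroid of cluster c.
            dist[:, c] = (diag
                          - 2 * K[:, members].sum(axis=1) / m
                          + K[np.ix_(members, members)].sum() / m ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Toy data: 10 synthetic "patients" in two groups, seen through two views.
rng = np.random.default_rng(0)
group_means = np.array([[3.0, 0.0]] * 5 + [[0.0, 3.0]] * 5)
views = []
for _ in range(2):
    X = group_means + 0.1 * rng.standard_normal(group_means.shape)
    views.append(X @ X.T)  # linear kernel standing in for a graph kernel

labels = kernel_kmeans(combine_views(views), k=2)
print(labels)
```

In a real analysis the resulting cluster labels would then be compared on survival, for example with a log-rank test, which this sketch does not cover.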
