An interpretable multiple kernel learning approach for the discovery of integrative cancer subtypes

Due to the complexity of cancer, clustering algorithms have been used to disentangle the observed heterogeneity and identify cancer subtypes that can be treated specifically. Kernel-based clustering approaches allow the use of more than one input matrix, which is an important factor when considering a multidimensional disease like cancer. However, the clustering results remain hard to evaluate and, in many cases, it is unclear which piece of information had which impact on the final result. In this paper, we propose an extension of multiple kernel learning clustering that enables the characterization of each identified patient cluster based on the features that had the highest impact on the result. To this end, we combine feature clustering with multiple kernel dimensionality reduction and introduce FIPPA, a score that measures the impact of a feature cluster on a patient cluster.

Results: We applied the approach to different cancer types described by four different data types, with the aim of identifying integrative patient subtypes and understanding which features were most important for their identification. Our results show that our method not only achieves state-of-the-art performance according to standard measures (e.g., survival analysis) but also, based on the high-impact features, produces meaningful explanations for the molecular bases of the subtypes. This could provide an important step in the validation of potential cancer subtypes and enable the formulation of new hypotheses concerning individual patient groups. Similar analyses are possible for other disease phenotypes.
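To make the idea of one kernel per feature cluster more concrete, the following minimal sketch builds a Gaussian kernel for each group of features and combines them into a single patient-by-patient similarity matrix. It is an illustration under stated assumptions, not the method of the paper: the toy data, the feature cluster names, the choice of a Gaussian kernel, the `gamma` value, and the fixed equal weights are all hypothetical. In a multiple kernel learning setting the weights would be learned rather than fixed, and the FIPPA score is not reproduced here because its definition is not given in this abstract.

```python
import math

def rbf_kernel(rows, gamma):
    """Gaussian (RBF) kernel matrix between all pairs of patient vectors."""
    n = len(rows)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel

def select_columns(matrix, cols):
    """Restrict a patients-by-features matrix to the given feature columns."""
    return [[row[c] for c in cols] for row in matrix]

def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices of identical shape."""
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for k, w in zip(kernels, weights)) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical toy data: 4 patients described by 4 features.
X = [
    [0.0, 0.1, 5.0, 5.1],
    [0.1, 0.0, 0.0, 0.2],
    [3.0, 3.1, 5.1, 5.0],
    [3.1, 3.0, 0.1, 0.0],
]

# Hypothetical feature clusters (column indices); in practice these would
# come from a feature clustering step.
feature_clusters = {"cluster_a": [0, 1], "cluster_b": [2, 3]}

# Assumption: fixed, equal weights. A multiple kernel learning method
# would optimize these instead.
weights = {"cluster_a": 0.5, "cluster_b": 0.5}

# One kernel matrix per feature cluster.
kernels = {
    name: rbf_kernel(select_columns(X, cols), gamma=0.5)
    for name, cols in feature_clusters.items()
}

names = sorted(kernels)
K_combined = combine_kernels(
    [kernels[name] for name in names],
    [weights[name] for name in names],
)
```

In this toy example, patients 0 and 1 are close under `cluster_a` but far apart under `cluster_b`, so the two kernels carry different similarity information. Keeping the kernels separate before combination is what makes it possible, in principle, to ask which feature cluster contributed to which patient grouping.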
