Localized Data Fusion for Kernel k-Means Clustering with Application to Cancer Biology

In many modern applications from, for example, bioinformatics and computer vision, samples have multiple feature representations coming from different data sources. Multiview learning algorithms try to exploit all these available information to obtain a better learner in such scenarios. In this paper, we propose a novel multiple kernel learning algorithm that extends kernel k-means clustering to the multiview setting, which combines kernels calculated on the views in a localized way to better capture sample-specific characteristics of the data. We demonstrate the better performance of our localized data fusion approach on a human colon and rectal cancer data set by clustering patients. Our method finds more relevant prognostic patient groups than global data fusion methods when we evaluate the results with respect to three commonly used clinical biomarkers.

[1]  C. Sander,et al.  Integrative Subtype Discovery in Glioblastoma Using iCluster , 2012, PloS one.

[2]  Jieping Ye,et al.  Nonlinear adaptive distance metric learning for clustering , 2007, KDD '07.

[3]  Mark A. Girolami,et al.  Mercer kernel-based clustering in feature space , 2002, IEEE Trans. Neural Networks.

[4]  Joachim M. Buhmann,et al.  Fusion of Similarity Data in Clustering , 2005, NIPS.

[5]  Rong Jin,et al.  Generalized Maximum Margin Clustering and Unsupervised Kernel Learning , 2006, NIPS.

[6]  Karl Pearson F.R.S. LIII. On lines and planes of closest fit to systems of points in space , 1901 .

[7]  Colin Fyfe,et al.  Kernel and Nonlinear Canonical Correlation Analysis , 2000, IJCNN.

[8]  Dale Schuurmans,et al.  Maximum Margin Clustering , 2004, NIPS.

[9]  Bernhard Schölkopf,et al.  Support Vector Machine Applications in Computational Biology , 2004 .

[10]  Chris H. Q. Ding,et al.  K-means clustering via principal component analysis , 2004, ICML.

[11]  Florian Markowetz,et al.  Patient-Specific Data Fusion Defines Prognostic Cancer Subtypes , 2011, PLoS Comput. Biol..

[12]  Sham M. Kakade,et al.  Multi-view clustering via canonical correlation analysis , 2009, ICML '09.

[13]  Ethem Alpaydin,et al.  Localized algorithms for multiple kernel learning , 2013, Pattern Recognit..

[14]  Wei Tang,et al.  Clustering with Multiple Graphs , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[15]  H. Hotelling Relations Between Two Sets of Variates , 1936 .

[16]  Steven J. M. Jones,et al.  Comprehensive molecular characterization of human colon and rectal cancer , 2012, Nature.

[17]  Chris H. Q. Ding,et al.  Spectral Relaxation for K-means Clustering , 2001, NIPS.

[18]  William Stafiord Noble,et al.  Support vector machine applications in computational biology , 2004 .

[19]  Hal Daumé,et al.  Co-regularized Multi-view Spectral Clustering , 2011, NIPS.

[20]  Christoph H. Lampert,et al.  Correlational spectral clustering , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Joydeep Ghosh,et al.  Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions , 2002, J. Mach. Learn. Res..

[22]  C. Sander,et al.  Pattern discovery and cancer gene identification in integrated cancer genomic data , 2013, Proceedings of the National Academy of Sciences.

[23]  Zhuowen Tu,et al.  Similarity network fusion for aggregating data types on a genomic scale , 2014, Nature Methods.

[24]  Vincent Kanade,et al.  Clustering Algorithms , 2021, Wireless RF Energy Transfer in the Massive IoT Era.

[25]  Ethem Alpaydin,et al.  Multiple Kernel Learning Algorithms , 2011, J. Mach. Learn. Res..

[26]  Bernhard Schölkopf,et al.  Nonlinear Component Analysis as a Kernel Eigenvalue Problem , 1998, Neural Computation.

[27]  Johan A. K. Suykens,et al.  Optimized Data Fusion for Kernel k-Means Clustering , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Yung-Yu Chuang,et al.  Affinity aggregation for spectral clustering , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.