Multiple Kernel Clustering

Maximum margin clustering (MMC) has recently attracted considerable interest in both the data mining and machine learning communities. It first projects data samples into a kernel-induced feature space and then performs clustering by finding the maximum margin hyperplane over all possible cluster labelings. As in other kernel methods, choosing a suitable kernel function is critical to the success of maximum margin clustering. In this paper, we propose a multiple kernel clustering (MKC) algorithm that simultaneously finds the maximum margin hyperplane, the best cluster labeling, and the optimal kernel. Moreover, we provide a detailed analysis of the time complexity of the MKC algorithm and extend multiple kernel clustering to the multi-class scenario. Experimental results on both toy and real-world data sets demonstrate the effectiveness and efficiency of the MKC algorithm.
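
For orientation, the two-class maximum margin clustering problem described above is commonly written as a soft-margin SVM in which the labels themselves are optimization variables. The sketch below follows that standard form from the MMC literature, not necessarily the exact formulation used in this paper. The symbols are assumptions introduced here for illustration: \phi is the kernel-induced feature map, C the regularization constant, \xi_i the slack variables, and \ell a class-balance bound that rules out the trivial solution of assigning every sample to one cluster. The last line shows one plausible way to parameterize the learned kernel as a nonnegative combination of M base kernels with weights \beta_k; the paper's precise constraint on the weights may differ.

```latex
\begin{aligned}
\min_{y \in \{-1,+1\}^n} \; \min_{w,\, b,\, \xi} \quad
  & \frac{1}{2}\,\|w\|^2 + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.} \quad
  & y_i \left( w^\top \phi(x_i) + b \right) \ge 1 - \xi_i, \quad
    \xi_i \ge 0, \quad i = 1, \dots, n, \\
  & -\ell \le \sum_{i=1}^{n} y_i \le \ell, \\[4pt]
% Assumed multiple-kernel parameterization (illustrative only):
  & K(x_i, x_j) = \sum_{k=1}^{M} \beta_k \, K_k(x_i, x_j), \quad \beta_k \ge 0 .
\end{aligned}
```

Under this reading, MKC would optimize jointly over the hyperplane (w, b), the labeling y, and the kernel weights \beta_k, which corresponds to the three quantities the abstract says are found simultaneously.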
