Stability and Generalization of Kernel Clustering: from Single Kernel to Multiple Kernel

Multiple kernel clustering (MKC) is an important research topic that has been widely studied for decades. However, current methods still face two problems: they are inefficient when handling out-of-sample data points, and they lack a theoretical analysis of the stability and generalization of clustering. In this paper, we propose a novel method that efficiently computes the embedding of out-of-sample data with a solid generalization guarantee. Specifically, we approximate the eigenfunctions of the integral operator associated with the linear combination of base kernel functions to construct low-dimensional embeddings of out-of-sample points for efficient multiple kernel clustering. In addition, we, for the first time, theoretically study the stability of clustering algorithms and prove that the single-view version of the proposed method has uniform stability of order O(Kn^{-3/2}), and we establish an upper bound on the excess risk of Õ(Kn^{-3/2} + n^{-1/2}), where K is the number of clusters and n is the number of samples. We then extend these theoretical results to the multiple kernel setting and find that the stability of MKC depends on the kernel weights. As an example, we apply our method to a recent MKC algorithm termed SimpleMKKM and derive an upper bound on its excess clustering risk that is tighter than existing results. Extensive experimental results validate the effectiveness and efficiency of the proposed method.
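The out-of-sample embedding described above can be sketched with a Nyström-style extension: eigenvectors of the combined kernel matrix on the training sample approximate the eigenfunctions of the integral operator, and a new point is embedded by evaluating those approximate eigenfunctions on it. This is a minimal illustration, not the authors' implementation; the RBF bandwidths, kernel weights, and embedding dimension below are hypothetical choices.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    # Pairwise RBF kernel k(x, y) = exp(-gamma * ||x - y||^2).
    d = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * d)

def fit_embedding(X_train, gammas, weights, k):
    # Combined kernel: a weighted sum of base kernels (here, RBF with different bandwidths).
    K = sum(w * rbf_kernel(X_train, X_train, g) for w, g in zip(weights, gammas))
    vals, vecs = np.linalg.eigh(K)
    # Keep the k leading eigenpairs of the combined kernel matrix.
    idx = np.argsort(vals)[::-1][:k]
    return vals[idx], vecs[:, idx]

def embed_out_of_sample(X_new, X_train, gammas, weights, vals, vecs):
    # Nyström extension: phi_i(x) ≈ (1 / lambda_i) * sum_j k(x, x_j) * u_{ji},
    # i.e. evaluate the approximate eigenfunctions at the new points.
    K_new = sum(w * rbf_kernel(X_new, X_train, g) for w, g in zip(weights, gammas))
    return K_new @ vecs / vals
```

Clustering then amounts to running k-means on the rows of the embedding; a new point never requires recomputing the full n-by-n kernel matrix, only its n kernel evaluations against the training sample. On the training points themselves the extension reproduces the eigenvectors exactly, which is a quick sanity check.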
