Robust Spectral Clustering via Matrix Aggregation

Spectral clustering has become one of the most popular clustering algorithms in recent years. In real-world clustering problems, the data points to be clustered may contain considerable noise, and, to the best of our knowledge, no single clustering algorithm is able to identify all types of cluster structure. Existing spectral clustering methods make little effort to explicitly handle potentially considerable noise in the data points or to make the clustering method itself robust, and this often degrades clustering performance. In this paper, motivated by resampling and matrix aggregation, we propose a method for robust spectral clustering. We first construct multiple transition probability matrices, each built from a randomly selected subset of features. These matrices are then used to recover a shared low-rank similarity matrix, which is the input to spectral clustering, and several sparse matrices, which represent the noise. The resulting optimization problem imposes a low-rank constraint on the transition probability matrix; to solve it, we design an optimization procedure based on the Augmented Lagrangian Method of Multipliers. Experimental results on several real-world datasets show that our method outperforms several state-of-the-art clustering methods.
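The resampling step described above can be illustrated with a minimal sketch. It assumes a Gaussian-kernel affinity that is row-normalized into a transition probability matrix; the abstract does not specify the kernel, the bandwidth, or the fraction of features sampled, so `sigma`, `feature_fraction`, and every function name here are hypothetical. The final averaging step is only a simple stand-in for aggregation and is not the low-rank plus sparse recovery that the paper solves with an augmented Lagrangian procedure.

```python
import numpy as np

def transition_matrix(X, sigma=1.0):
    """Row-normalized Gaussian affinity P = D^{-1} W (assumed kernel choice)."""
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped at zero for numerical safety.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    # Each row sums to one, so P is a transition probability matrix.
    return W / W.sum(axis=1, keepdims=True)

def resampled_transition_matrices(X, n_matrices=10, feature_fraction=0.5,
                                  sigma=1.0, seed=0):
    """Build one transition matrix per randomly selected feature subset."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    k = max(1, int(round(feature_fraction * n_features)))
    mats = []
    for _ in range(n_matrices):
        idx = rng.choice(n_features, size=k, replace=False)
        mats.append(transition_matrix(X[:, idx], sigma))
    return mats

def aggregate_by_mean(mats):
    """Stand-in aggregation: element-wise mean of the matrices.

    This is NOT the paper's low-rank + sparse decomposition; it only shows
    where a shared matrix would be produced from the resampled ones.
    """
    return np.mean(mats, axis=0)
```

In the proposed method, the aggregation step would instead separate each resampled matrix into the shared low-rank part and a matrix-specific sparse noise part; the mean is used here only to keep the sketch short and runnable.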
