Efficient Clustering Based On A Unified View Of $K$-means And Ratio-cut

Spectral clustering and k-means, two major traditional clustering methods, still attract considerable attention even though a variety of novel clustering algorithms have been proposed in recent years. We first revisit a unified framework of k-means and ratio-cut, and then propose a novel and efficient clustering algorithm based on this framework. The time and space complexity of our method are both linear in the number of samples and, more importantly, independent of the number of clusters to construct. These properties make the method easy to scale and apply to large practical problems. Extensive experiments on 12 real-world benchmark datasets and 8 facial datasets validate the advantages of the proposed algorithm over state-of-the-art clustering algorithms. In particular, speed-ups of over 15x and 7x relative to k-means are obtained on a synthetic dataset of 1 million samples and on the benchmark dataset CelebA (200k samples), respectively [GitHub].
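As a rough illustration of why k-means and ratio-cut can be viewed within one framework, the sketch below numerically checks the standard textbook trace identities: with a scaled cluster indicator matrix F (F[i, j] = 1/sqrt(n_j) when sample i belongs to cluster j), the k-means objective equals tr(K) - tr(F^T K F) for the linear kernel K = X X^T, and the ratio-cut value equals tr(F^T L F) for the graph Laplacian L = D - W. This is not the algorithm proposed in the paper; the Gaussian similarity graph, the toy data, and all function names are assumptions chosen only for illustration.

```python
import numpy as np

def scaled_indicator(labels, k):
    """Return F (n x k) with F[i, j] = 1/sqrt(n_j) if sample i is in cluster j, else 0."""
    n = labels.shape[0]
    F = np.zeros((n, k))
    F[np.arange(n), labels] = 1.0
    return F / np.sqrt(F.sum(axis=0))  # divide each column by sqrt(cluster size)

def kmeans_objective(X, labels, k):
    """Sum of squared distances from each sample to its cluster mean."""
    total = 0.0
    for j in range(k):
        pts = X[labels == j]
        total += ((pts - pts.mean(axis=0)) ** 2).sum()
    return total

def ratio_cut(W, labels, k):
    """Sum over clusters of cut(A_j, complement of A_j) / |A_j|."""
    total = 0.0
    for j in range(k):
        inside = labels == j
        total += W[inside][:, ~inside].sum() / inside.sum()
    return total

# Toy data (assumed for illustration): 12 random points, 3 arbitrary clusters.
rng = np.random.default_rng(0)
n, d, k = 12, 3, 3
X = rng.normal(size=(n, d))
labels = np.arange(n) % k  # arbitrary assignment, every cluster non-empty
F = scaled_indicator(labels, k)

# k-means as a trace problem on the linear kernel K = X X^T.
K = X @ X.T
kmeans_trace = np.trace(K) - np.trace(F.T @ K @ F)

# Ratio-cut as a trace problem on the Laplacian of an assumed Gaussian graph.
W = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W
ratio_cut_trace = np.trace(F.T @ L @ F)

print(np.isclose(kmeans_trace, kmeans_objective(X, labels, k)))
print(np.isclose(ratio_cut_trace, ratio_cut(W, labels, k)))
```

Both objectives are therefore trace optimizations over the same set of scaled indicator matrices with F^T F = I, differing only in the matrix inside the trace, which is the sense in which the two methods admit a common view.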
