Compressed constrained spectral clustering framework for large-scale data sets

Abstract Incorporating constraint information into spectral clustering, i.e., constrained spectral clustering (CSC), can greatly improve clustering accuracy and has therefore been widely employed in the machine learning literature. In this paper, we propose a compressed CSC framework that combines specific graph constructions with a recently introduced CSC model. In particular, our framework is able to avoid losing the main partition information during compression. Through theoretical analysis and empirical results, we demonstrate that the new framework achieves the same clustering solution as the original model with the specific graph structure. In addition, because our framework uses landmark-based graph construction and approximate matrix decomposition simultaneously, it can be applied to both feature data and graph data in a more general way. Moreover, parameter setting in our framework is simple, which makes it practical. Experimental results indicate that our framework has advantages in both efficiency and effectiveness.
