A Fast Spectral Clustering Method Based on Growing Vector Quantization for Large Data Sets

Spectral clustering is a flexible clustering algorithm that produces high-quality clusters on small data sets, but its applicability to large data sets is limited because it requires O(n^3) computational operations to process a data set of n data points [1]. We tackle this problem by developing a novel, efficient growing vector quantization method, based on minimizing the increment of distortion, that preprocesses a large data set by compressing it into a small set of representative data points in a single scan. We then apply the spectral clustering algorithm to this small set. Experiments on real data sets show that our method provides fast and accurate clustering results.
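The abstract does not spell out the growth rule, so the following is only a minimal sketch of what a one-scan growing quantizer could look like, not the authors' method. It assumes a hypothetical threshold `max_increment` and uses the standard fact that adding a point x to a cluster of n points with centroid c raises the squared-error distortion by n/(n+1) * ||x - c||^2. The function name `growing_vq` and its return values are illustrative choices.

```python
def growing_vq(points, max_increment):
    """One-pass growing quantizer (illustrative sketch, not the paper's algorithm).

    Returns (centroids, counts, assignment), where assignment[i] is the index
    of the representative that absorbed points[i].
    """
    centroids, counts, assignment = [], [], []
    for x in points:
        # Find the representative whose distortion would grow the least.
        best_j, best_inc = -1, float("inf")
        for j, (c, n) in enumerate(zip(centroids, counts)):
            d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
            inc = n / (n + 1) * d2  # increase in squared error if x joins j
            if inc < best_inc:
                best_j, best_inc = j, inc
        if best_j < 0 or best_inc > max_increment:
            # Assumed growth rule: too costly to merge, so add a representative.
            centroids.append(list(x))
            counts.append(1)
            assignment.append(len(centroids) - 1)
        else:
            # Merge: update the centroid as a running mean.
            n = counts[best_j]
            c = centroids[best_j]
            centroids[best_j] = [ci + (xi - ci) / (n + 1) for xi, ci in zip(x, c)]
            counts[best_j] = n + 1
            assignment.append(best_j)
    return centroids, counts, assignment
```

Under this sketch, spectral clustering would be run on `centroids` only, and each original point could then inherit the cluster label of its representative through `assignment`. That final labeling step is also an assumption here. The eigendecomposition then scales with the number of representatives rather than with n.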

[1] Edward Y. Chang, et al. Parallel Spectral Clustering in Distributed Systems, 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2] Sanjoy Dasgupta, et al. Random projection trees for vector quantization, 2008, 2008 46th Annual Allerton Conference on Communication, Control, and Computing.

[3] Michael I. Jordan, et al. On Spectral Clustering: Analysis and an algorithm, 2001, NIPS.

[4] Gilles Pagès, et al. Intrinsic Stationarity for Vector Quantization: Foundation of Dual Quantization, 2010, SIAM J. Numer. Anal.

[5] Yiming Yang, et al. RCV1: A New Benchmark Collection for Text Categorization Research, 2004, J. Mach. Learn. Res.

[6] Zhaohong Deng, et al. Robust fuzzy clustering neural network based on epsilon-insensitive loss function, 2007, Appl. Soft Comput.

[7] Nebojsa Jojic, et al. Active spectral clustering via iterative uncertainty reduction, 2012, KDD.

[8] Hui Xiong, et al. SAIL: summation-based incremental learning for information-theoretic clustering, 2008, KDD.

[9] William Equitz, et al. A new vector quantization clustering algorithm, 1989, IEEE Trans. Acoust. Speech Signal Process.

[10] Zhaohong Deng, et al. Robust maximum entropy clustering algorithm with its labeling for outliers, 2006, Soft Comput.

[11] Ulrike von Luxburg, et al. A tutorial on spectral clustering, 2007, Stat. Comput.

[12] Thomas Brox, et al. Higher order motion models and spectral clustering, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[13] Thomas Villmann, et al. Functional relevance learning in generalized learning vector quantization, 2012, Neurocomputing.

[14] Ling Huang, et al. Fast approximate spectral clustering, 2009, KDD.

[15] Pengjiang Qian, et al. Fast Graph-Based Relaxed Clustering for Large Data Sets Using Minimal Enclosing Ball, 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[16] Pietro Perona, et al. Self-Tuning Spectral Clustering, 2004, NIPS.

[17] Zhaohong Deng, et al. Enhanced soft subspace clustering integrating within-cluster and between-cluster information, 2010, Pattern Recognit.

[18] Chi-Hoon Lee, et al. Clustering high dimensional data: A graph-based relaxed optimization approach, 2008, Inf. Sci.

[19] A. Vasuki, et al. A review of vector quantization techniques, 2006, IEEE Potentials.

[20] Junjie Wu, et al. Towards information-theoretic K-means clustering for image indexing, 2013, Signal Process.