Sparse Kernel Clustering of Massive High-Dimensional Data Sets with Large Number of Clusters

In clustering applications involving documents and images, in addition to the large number of data points (N) and their high dimensionality (d), the number of clusters (C) into which the data must be partitioned is also large. Kernel-based clustering algorithms, which have been shown to outperform linear clustering algorithms, have high running time complexity in terms of N, d, and C. We propose an efficient sparse kernel k-means clustering algorithm that incrementally samples the most informative points from the data set via importance sampling and constructs a sparse kernel matrix from these sampled points. Each row of this matrix holds a data point's similarities to its p nearest neighbors among the sampled points (p ≪ N). This sparse kernel matrix is then used to cluster the data and obtain the cluster labels. The combination of sampling and sparsity reduces both the running time and the memory complexity of kernel clustering. To further enhance efficiency, the proposed algorithm projects the data onto the top C eigenvectors of the sparse kernel matrix and clusters the projected points with a modified k-means algorithm. The running time of the proposed sparse kernel k-means algorithm is linear in N and d, and logarithmic in C. We show analytically that only a small number of points needs to be sampled from the data set, and that the resulting approximation error is well bounded. Using several large high-dimensional text and image data sets, we demonstrate that the proposed algorithm is significantly faster than classical kernel-based clustering algorithms while maintaining comparable clustering quality.
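
To make the three stages concrete (sampling, sparse kernel construction, spectral embedding plus k-means), here is a minimal Python sketch. The function name sparse_kernel_kmeans and all parameter defaults are hypothetical; the RBF kernel, uniform sampling, and scikit-learn's standard KMeans are simple stand-ins for the paper's kernel choice, incremental importance sampling scheme, and modified k-means, respectively.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

def sparse_kernel_kmeans(X, C, m=1000, p=100, gamma=1.0, seed=0):
    """Cluster the rows of X (N x d) into C clusters.

    m -- number of sampled landmark points (m << N)
    p -- each point keeps only its p most similar landmarks (p << m)
    """
    rng = np.random.default_rng(seed)
    N = X.shape[0]

    # Step 1: sample m landmarks. The paper samples points incrementally
    # by informativeness; uniform sampling is used as a stand-in here.
    idx = rng.choice(N, size=m, replace=False)
    landmarks = X[idx]

    # Step 2: build the N x m sparse kernel matrix, keeping in each row
    # only the similarities to that point's p nearest landmarks.
    K = rbf_kernel(X, landmarks, gamma=gamma)     # dense here for brevity
    nn = np.argpartition(-K, p, axis=1)[:, :p]    # p largest entries per row
    rows = np.repeat(np.arange(N), p)
    vals = K[np.arange(N)[:, None], nn].ravel()
    K_sparse = csr_matrix((vals, (rows, nn.ravel())), shape=(N, m))

    # Step 3: embed each point via the top C left singular vectors of the
    # sparse kernel matrix, then cluster the embedding. The paper uses a
    # modified k-means whose per-iteration cost is logarithmic in C;
    # ordinary k-means is substituted here.
    U, _, _ = svds(K_sparse, k=C)
    return KMeans(n_clusters=C, n_init=10, random_state=seed).fit_predict(U)
```

A call such as labels = sparse_kernel_kmeans(X, C=100, m=2000, p=200) would then assign one of 100 cluster labels to each row of X; note that only the N x m sparsified matrix, never the full N x N kernel matrix, is ever formed.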
