t-Distributed stochastic neighbor embedding spectral clustering

This paper introduces a new topological clustering approach for high-dimensional datasets, based on the t-SNE (t-distributed Stochastic Neighbor Embedding) dimensionality reduction method and spectral clustering. Spectral clustering requires constructing an adjacency matrix and computing the eigen-decomposition of the corresponding Laplacian matrix [1]; both steps are computationally expensive, which makes the method difficult to apply to large-scale datasets. One way to address this problem is to reduce the dimensionality before clustering the dataset. The t-SNE method, which gives good results for visualization, projects the dataset into a low-dimensional space, which makes the approach easier to apply to very large datasets. Using t-SNE during the learning process reduces the dimensionality while preserving the topology of the dataset, which in turn increases clustering accuracy. We illustrate the power of this method on several real datasets. The results show good clustering quality and higher speed.
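The two-stage idea described above (embed with t-SNE, then run spectral clustering on the low-dimensional embedding) can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn is available, and the function name `tsne_spectral_clustering`, the synthetic blob data, the k-nearest-neighbor affinity, and every parameter value are assumptions chosen for the example. The adjusted Rand index is used here as a convenient stand-in for a Rand-style agreement measure.

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.metrics import adjusted_rand_score


def tsne_spectral_clustering(X, n_clusters, n_components=2,
                             perplexity=30.0, n_neighbors=10, random_state=0):
    """Hypothetical helper: t-SNE embedding followed by spectral clustering."""
    # Step 1: project the high-dimensional data into a low-dimensional space.
    embedding = TSNE(
        n_components=n_components,
        perplexity=perplexity,
        init="pca",
        random_state=random_state,
    ).fit_transform(X)

    # Step 2: build a k-nearest-neighbor affinity graph on the embedding and
    # cluster it spectrally (graph Laplacian eigenvectors, then k-means).
    labels = SpectralClustering(
        n_clusters=n_clusters,
        affinity="nearest_neighbors",  # assumed affinity, not from the paper
        n_neighbors=n_neighbors,
        assign_labels="kmeans",
        random_state=random_state,
    ).fit_predict(embedding)
    return embedding, labels


# Synthetic stand-in data: 300 points in 50 dimensions drawn from 3 blobs.
X, y_true = make_blobs(n_samples=300, n_features=50, centers=3,
                       cluster_std=1.0, random_state=0)

embedding, labels = tsne_spectral_clustering(X, n_clusters=3)

# Agreement between the recovered clusters and the generating labels.
ari = adjusted_rand_score(y_true, labels)
print("embedding shape:", embedding.shape)
print("adjusted Rand index:", ari)
```

In this sketch the affinity graph is built on the 2-dimensional embedding rather than on the original 50 features, which is where a speed-up of the kind the abstract reports would plausibly come from; the eigen-decomposition itself still scales with the number of samples.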

[1] Seiichi Ozawa, et al. t-Distributed Stochastic Neighbor Embedding with Inhomogeneous Degrees of Freedom, 2016, ICONIP.

[2] Chris H. Q. Ding, et al. A min-max cut algorithm for graph partitioning and data clustering, 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[3] Andrew B. Kahng, et al. New spectral methods for ratio cut partitioning and clustering, 1991, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst.

[4] Yunming Ye, et al. Fuzzy K-Means with Variable Weighting in High Dimensional Data Analysis, 2008, 2008 The Ninth International Conference on Web-Age Information Management.

[5] Xiaogang Wang, et al. Bi-level clustering of mixed categorical and numerical biomedical data, 2006, Int. J. Data Min. Bioinform.

[6] Geoffrey E. Hinton, et al. Stochastic Neighbor Embedding, 2002, NIPS.

[7] William M. Rand, et al. Objective Criteria for the Evaluation of Clustering Methods, 1971.

[8] Michael I. Jordan, et al. On Spectral Clustering: Analysis and an algorithm, 2001, NIPS.

[9] Anil K. Jain, et al. Data clustering: a review, 1999, CSUR.

[10] Jitendra Malik, et al. Normalized cuts and image segmentation, 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[11] Martine D. F. Schlag, et al. Spectral K-Way Ratio-Cut Partitioning and Clustering, 1993, 30th ACM/IEEE Design Automation Conference.

[12] Michael I. Jordan, et al. Learning Spectral Clustering, 2003, NIPS.

[13] Mustapha Lebbah, et al. From variable weighting to cluster characterization in topographic unsupervised learning, 2009, 2009 International Joint Conference on Neural Networks.

[14] Anil K. Jain, et al. Algorithms for Clustering Data, 1988.

[15] Shehroz S. Khan, et al. Computation of Initial Modes for K-modes Clustering Algorithm Using Evidence Accumulation, 2007, IJCAI.