Topology-Based Clustering Using Polar Self-Organizing Map

Cluster analysis of unlabeled data sets is a key research topic in a variety of fields. In many practical cases, no a priori knowledge is available; for example, the number of clusters is unknown. In this paper, a grid clustering method based on the polar self-organizing map (PolSOM) is developed to automatically identify the optimal number of partitions. The grid clustering exploits the data topology, which consists of both distance and density. The proposed method also provides a visual representation, because PolSOM allows the characteristics of clusters to be presented on a 2-D polar map in terms of data feature and value. Experimental studies on synthetic and real data sets demonstrate that the proposed algorithm provides higher clustering accuracy and lower computational cost than six conventional methods.
