Topology-Based Hierarchical Clustering of

Self-organizing maps (SOMs) are a powerful tool for analyzing datasets that contain many natural clusters with varying statistics, such as different sizes, shapes, density distributions, and overlaps. However, further processing, such as visualization and interactive clustering, is often necessary to extract the clusters from the knowledge learned by the SOM. A recent visualization scheme, CONNvis, and its interactive clustering exploit the data topology to represent SOM knowledge through a connectivity matrix, CONN, which is a weighted Delaunay graph. In this paper, we propose an automated clustering method for SOMs based on hierarchical agglomerative clustering of CONN. We determine the number of clusters either with cluster validity indices or from prior knowledge of the datasets. We show that, for the datasets used in this paper, data-topology-based hierarchical clustering can produce better partitions than hierarchical clustering based solely on distance information.
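The idea of building CONN from a trained SOM and then agglomerating prototypes over it can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes CONN(i, j) counts the data samples whose best-matching and second-best-matching prototypes are i and j, in both orders (the CONNvis-style construction). It also uses a hypothetical average-connectivity linkage between clusters, which may differ from the similarity measures proposed in the paper. The names `conn_matrix`, `linkage`, and `cluster_conn` are illustrative only. The number of clusters `k` is passed in directly, standing in for prior knowledge; choosing `k` with a cluster validity index is not shown.

```python
from itertools import combinations
from math import dist


def conn_matrix(data, prototypes):
    """CONN[i][j] = number of samples whose two nearest prototypes are i and j.

    Assumes at least two prototypes and Euclidean distance (an assumption).
    """
    n = len(prototypes)
    conn = [[0] * n for _ in range(n)]
    for x in data:
        # Rank prototypes by Euclidean distance to the sample x.
        order = sorted(range(n), key=lambda k: dist(x, prototypes[k]))
        bmu, second = order[0], order[1]
        # Count the BMU/second-BMU pair in both directions so CONN is symmetric.
        conn[bmu][second] += 1
        conn[second][bmu] += 1
    return conn


def linkage(conn, a, b):
    # Hypothetical cluster similarity: mean CONN weight over all cross-cluster
    # prototype pairs (an assumption, not necessarily the paper's measure).
    return sum(conn[i][j] for i in a for j in b) / (len(a) * len(b))


def cluster_conn(conn, k):
    """Merge prototypes until k clusters remain or no connected pair is left."""
    clusters = [[i] for i in range(len(conn))]
    while len(clusters) > k:
        best, pair = 0.0, None
        for p, q in combinations(range(len(clusters)), 2):
            s = linkage(conn, clusters[p], clusters[q])
            if s > best:
                best, pair = s, (p, q)
        if pair is None:
            break  # remaining clusters share no CONN edges; stop early
        p, q = pair  # p < q, so deleting q does not shift index p
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


if __name__ == "__main__":
    # Toy example: four 2-D prototypes forming two well-separated groups.
    protos = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    data = [(0.2, 0.0), (0.8, 0.0), (0.4, 0.0), (10.2, 0.0), (10.9, 0.0)]
    conn = conn_matrix(data, protos)
    print(cluster_conn(conn, 2))  # → [[0, 1], [2, 3]]
```

Because the linkage in this sketch depends only on CONN counts, two prototype groups that are never the best and second-best match for the same sample are never merged, however close they are in distance. This is the sense in which the clustering follows data topology rather than distance alone. If the remaining clusters share no CONN edges, the sketch stops early and may return more than `k` clusters.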
