Hybrid Minimal Spanning Tree and Mixture of Gaussians Based Clustering Algorithm

Clustering is an important tool for exploring the hidden structure of large databases. Many algorithms exist, based on different approaches (hierarchical, partitional, density-based, model-based, etc.). Most of them have shortcomings: for example, they cannot detect clusters with non-convex shapes, they require the number of clusters to be known a priori, or they suffer from numerical problems such as sensitivity to initialization. In this paper we introduce a new clustering algorithm based on the synergistic combination of hierarchical, graph-theoretic minimal spanning tree based clustering and partitional Gaussian mixture model-based clustering. The aim of this hybridization is to increase the robustness and consistency of the clustering results and to reduce the number of heuristically defined parameters, and thereby the influence of the user on the results. As the illustrative examples show, the proposed algorithm can detect clusters of arbitrary shape and does not suffer from the numerical problems of Gaussian mixture based clustering algorithms.
