Fast Minimum Spanning Tree Based Clustering Algorithms on Local Neighborhood Graph

Minimum spanning tree (MST) based clustering algorithms have been employed successfully to detect clusters of heterogeneous nature. Given a dataset of n random points, most MST-based clustering algorithms first generate a complete graph G of the dataset and then construct an MST from G. The first step is the major bottleneck, taking \(O(n^2)\) time. This paper proposes two algorithms, namely MST-based clustering on K-means Graph and MST-based clustering on Bi-means Graph, to reduce this computational overhead. The proposed algorithms use a centroid-based nearest neighbor rule to generate a partition-based Local Neighborhood Graph (LNG). We prove that both the size of the LNG and the time to construct it are \(O(n^{3/2})\), which is an \(O(\sqrt{n})\) factor improvement over the traditional algorithms. The approximate MST is constructed from the LNG in \(O(n^{3/2} \lg n)\) time, which is asymptotically faster than \(O(n^2)\). An advantage of the proposed algorithms is that they do not require any parameter setting, which is a major issue in many nearest neighbor finding algorithms. Experimental results demonstrate that the computational time is reduced significantly while maintaining the quality of the clusters obtained from the MST.
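The idea of building an approximate MST from a partition-based sparse graph can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not spell out the centroid-based nearest neighbor rule, so the sketch assumes a simple stand-in in which points are split into about \(\sqrt{n}\) partitions by plain k-means, all pairs inside a partition are joined, and each pair of partitions is linked by one edge between the members nearest to the opposite centroid. The function names `kmeans`, `local_neighborhood_edges`, and `kruskal` are hypothetical. If the partitions are roughly balanced (which k-means does not guarantee), this yields about \(n^{3/2}/2\) intra-partition edges plus \(O(n)\) linking edges.

```python
import math
import random


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means on a list of coordinate tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels


def local_neighborhood_edges(points, seed=0):
    """Sparse edge list (weight, u, v) built from ~sqrt(n) partitions.

    Assumed linking rule (not taken from the paper): one edge per pair of
    partitions, between the members nearest to the opposite centroid.
    """
    n = len(points)
    k = max(1, math.isqrt(n))
    centroids, labels = kmeans(points, k, seed=seed)
    parts = [[] for _ in range(k)]
    for i, lab in enumerate(labels):
        parts[lab].append(i)
    groups = [(centroids[c], m) for c, m in enumerate(parts) if m]

    edges = []
    # Complete graph inside each partition.
    for _, m in groups:
        for a in range(len(m)):
            for b in range(a + 1, len(m)):
                edges.append((math.dist(points[m[a]], points[m[b]]), m[a], m[b]))
    # One linking edge per pair of partitions keeps the graph connected.
    for gi in range(len(groups)):
        for gj in range(gi + 1, len(groups)):
            ci, mi = groups[gi]
            cj, mj = groups[gj]
            u = min(mi, key=lambda i: math.dist(points[i], cj))
            v = min(mj, key=lambda j: math.dist(points[j], ci))
            edges.append((math.dist(points[u], points[v]), u, v))
    return edges


def kruskal(n, edges):
    """Kruskal's algorithm with union-find; returns the tree edges."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((w, u, v))
    return tree
```

Calling `kruskal(len(points), local_neighborhood_edges(points))` returns a spanning tree whose total weight can never be smaller than that of the exact MST, since it is restricted to a subset of the complete graph's edges. Sorting the edge list dominates the MST step, which for \(O(n^{3/2})\) edges matches the \(O(n^{3/2} \lg n)\) bound quoted above.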
