Revealing Density-Based Clustering Structure from the Core-Connected Tree of a Network

Clustering is an important technique for mining the intrinsic community structures in networks. The density-based network clustering method is able to not only detect communities of arbitrary size and shape, but also identify hubs and outliers. However, it requires manual parameter specification to define clusters, and is sensitive to the parameter of density threshold which is difficult to determine. Furthermore, many real-world networks exhibit a hierarchical structure with communities embedded within other communities. Therefore, the clustering result of a global parameter setting cannot always describe the intrinsic clustering structure accurately. In this paper, we introduce a novel density-based network clustering method, called graph-skeleton-based clustering (gSkeletonClu). By projecting an undirected network to its core-connected maximal spanning tree, the clustering problem can be converted to detect core connectivity components on the tree. The density-based clustering of a specific parameter setting and the hierarchical clustering structure both can be efficiently extracted from the tree. Moreover, it provides a convenient way to automatically select the parameter and to achieve the meaningful cluster tree in a network. Extensive experiments on both real-world and synthetic networks demonstrate the superior performance of gSkeletonClu for effective and efficient density-based clustering.

[1]  Y Xu,et al.  Minimum spanning trees for gene expression data clustering. , 2001, Genome informatics. International Conference on Genome Informatics.

[2]  Ronald L. Graham,et al.  On the History of the Minimum Spanning Tree Problem , 1985, Annals of the History of Computing.

[3]  Michalis Vazirgiannis,et al.  Clustering validity checking methods: part II , 2002, SGMD.

[4]  Hawoong Jeong,et al.  Scale-free trees: the skeletons of complex networks. , 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.

[5]  Zhiyong Lu,et al.  Automatic Extraction of Clusters from Hierarchical Clustering Representations , 2003, PAKDD.

[6]  Ying Xu,et al.  Clustering gene expression data using a graph-theoretic approach: an application of minimum spanning trees , 2002, Bioinform..

[7]  Jiawei Han,et al.  CLARANS: A Method for Clustering Objects for Spatial Data Mining , 2002, IEEE Trans. Knowl. Data Eng..

[8]  Donald W. Bouldin,et al.  A Cluster Separation Measure , 1979, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Charles T. Zahn,et al.  Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters , 1971, IEEE Transactions on Computers.

[10]  Michael J. Laszlo,et al.  Minimum spanning tree partitioning algorithm for microaggregation , 2005, IEEE Transactions on Knowledge and Data Engineering.

[11]  Vipin Kumar,et al.  A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs , 1998, SIAM J. Sci. Comput..

[12]  Alex Arenas,et al.  Synchronization reveals topological scales in complex networks. , 2006, Physical review letters.

[13]  Jörg Sander,et al.  Semi-supervised Density-Based Clustering , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[14]  Xiaowei Xu,et al.  A Novel Similarity-Based Modularity Function for Graph Partitioning , 2007, DaWaK.

[15]  Jiawei Han,et al.  gSkeletonClu: Density-Based Network Clustering via Structure-Connected Tree Division or Agglomeration , 2010, 2010 IEEE International Conference on Data Mining.

[16]  M. Newman,et al.  Finding community structure in very large networks. , 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.

[17]  Yizhou Sun,et al.  SHRINK: a structural clustering algorithm for detecting hierarchical communities in networks , 2010, CIKM.

[18]  Padhraic Smyth,et al.  A Spectral Clustering Approach To Finding Communities in Graph , 2005, SDM.

[19]  M E J Newman,et al.  Community structure in social and biological networks , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[20]  Matthew Richardson,et al.  Mining the network value of customers , 2001, KDD '01.

[21]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[22]  J. Gower,et al.  Minimum Spanning Trees and Single Linkage Cluster Analysis , 1969 .

[23]  M E J Newman,et al.  Finding and evaluating community structure in networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[24]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[25]  Jiawei Han,et al.  Progressive clustering of networks using Structure-Connected Order of Traversal , 2010, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010).

[26]  M. Newman,et al.  Hierarchical structure and the prediction of missing links in networks , 2008, Nature.

[27]  Leon Danon,et al.  Comparing community structure identification , 2005, cond-mat/0505245.

[28]  S. Fortunato,et al.  Resolution limit in community detection , 2006, Proceedings of the National Academy of Sciences.

[29]  Hui Xiong,et al.  Understanding of Internal Clustering Validation Measures , 2010, 2010 IEEE International Conference on Data Mining.

[30]  P. Rousseeuw Silhouettes: a graphical aid to the interpretation and validation of cluster analysis , 1987 .

[31]  Hans-Peter Kriegel,et al.  OPTICS: ordering points to identify the clustering structure , 1999, SIGMOD '99.

[32]  F. Radicchi,et al.  Benchmark graphs for testing community detection algorithms. , 2008, Physical review. E, Statistical, nonlinear, and soft matter physics.

[33]  M. Newman,et al.  Vertex similarity in networks. , 2005, Physical review. E, Statistical, nonlinear, and soft matter physics.

[34]  Xiaowei Xu,et al.  SCAN: a structural clustering algorithm for networks , 2007, KDD '07.

[35]  Xiaofeng Wang,et al.  A Novel Density-Based Clustering Framework by Using Level Set Method , 2009, IEEE Transactions on Knowledge and Data Engineering.