gSkeletonClu: Density-Based Network Clustering via Structure-Connected Tree Division or Agglomeration

Community detection is an important task for mining the structure and function of complex networks. Many pervious approaches are difficult to detect communities with arbitrary size and shape, and are unable to identify hubs and outliers. A recently proposed network clustering algorithm, SCAN, is effective and can overcome this difficulty. However, it depends on a sensitive parameter: minimum similarity threshold $\varepsilon$, but provides no automated way to find it. In this paper, we propose a novel density-based network clustering algorithm, called gSkeletonClu (graph-skeleton based clustering). By projecting a network to its Core-Connected Maximal Spanning Tree (CCMST), the network clustering problem is converted to finding core-connected components in the CCMST. We discover that all possible values of the parameter $\varepsilon$ lie in the edge weights of the corresponding CCMST. By means of tree divisive or agglomerative clustering, our algorithm can find the optimal parameter $\varepsilon$ and detect communities, hubs and outliers in large-scale undirected networks automatically without any user interaction. Extensive experiments on both real-world and synthetic networks demonstrate the superior performance of gSkeletonClu over the baseline methods.

[1]  Hans-Peter Kriegel,et al.  OPTICS: ordering points to identify the clustering structure , 1999, SIGMOD '99.

[2]  Vipin Kumar,et al.  A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs , 1998, SIAM J. Sci. Comput..

[3]  Padhraic Smyth,et al.  A Spectral Clustering Approach To Finding Communities in Graph , 2005, SDM.

[4]  Michalis Vazirgiannis,et al.  Cluster validity methods: part I , 2002, SGMD.

[5]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[6]  R. Guimerà,et al.  Functional cartography of complex metabolic networks , 2005, Nature.

[7]  Y Xu,et al.  Minimum spanning trees for gene expression data clustering. , 2001, Genome informatics. International Conference on Genome Informatics.

[8]  F. Radicchi,et al.  Benchmark graphs for testing community detection algorithms. , 2008, Physical review. E, Statistical, nonlinear, and soft matter physics.

[9]  J. Gower,et al.  Minimum Spanning Trees and Single Linkage Cluster Analysis , 1969 .

[10]  M. Newman,et al.  Finding community structure in very large networks. , 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.

[11]  Charles T. Zahn,et al.  Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters , 1971, IEEE Transactions on Computers.

[12]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[13]  Jiawei Han,et al.  Progressive clustering of networks using Structure-Connected Order of Traversal , 2010, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010).

[14]  Leon Danon,et al.  Comparing community structure identification , 2005, cond-mat/0505245.

[15]  M E J Newman,et al.  Finding and evaluating community structure in networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[16]  Xiaowei Xu,et al.  A Novel Similarity-Based Modularity Function for Graph Partitioning , 2007, DaWaK.

[17]  S. Fortunato,et al.  Resolution limit in community detection , 2006, Proceedings of the National Academy of Sciences.

[18]  Michalis Vazirgiannis,et al.  Clustering validity checking methods: part II , 2002, SGMD.

[19]  Hawoong Jeong,et al.  Scale-free trees: the skeletons of complex networks. , 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.

[20]  S.,et al.  An Efficient Heuristic Procedure for Partitioning Graphs , 2022 .

[21]  Xiaowei Xu,et al.  SCAN: a structural clustering algorithm for networks , 2007, KDD '07.