DPSCAN: Structural Graph Clustering Based on Density Peaks

Structural graph clustering is a fundamental problem in managing and analyzing graph data. The structural clustering algorithm SCAN has been used successfully in many applications because it identifies not only clusters but also hubs and outliers. However, the results of SCAN depend heavily on two sensitive parameters, \(\epsilon \) and \(\mu \), which can reduce both accuracy and efficiency. In this paper, we propose a novel Density Peak-based Structural Clustering Algorithm for Networks (DPSCAN). Specifically, DPSCAN clusters vertices based on the structural similarity and the structural dependency between vertices and their neighbors, without requiring parameter tuning. Through theoretical analysis, we prove that DPSCAN can detect meaningful clusters, hubs, and outliers. In addition, to improve the efficiency of DPSCAN, we propose a new index structure named DP-Index, so that each vertex needs to be visited only once. Finally, we conduct comprehensive experimental studies on real and synthetic graphs, which demonstrate that our approach outperforms state-of-the-art approaches.
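To make the dependence on \(\epsilon \) and \(\mu \) concrete, the following is a minimal sketch of the structural similarity and core-vertex test as commonly defined for SCAN, not of DPSCAN itself, whose details are not given here. It assumes the usual definition \(\sigma(u,v) = |\Gamma(u) \cap \Gamma(v)| / \sqrt{|\Gamma(u)|\,|\Gamma(v)|}\), where \(\Gamma(v)\) is the closed neighborhood of \(v\). The function names and the toy graph are hypothetical and chosen only for illustration.

```python
import math

def closed_neighborhood(adj, v):
    """Gamma(v): the neighbors of v together with v itself."""
    return adj[v] | {v}

def structural_similarity(adj, u, v):
    """SCAN-style structural similarity between u and v (assumed definition)."""
    nu = closed_neighborhood(adj, u)
    nv = closed_neighborhood(adj, v)
    return len(nu & nv) / math.sqrt(len(nu) * len(nv))

def is_core(adj, v, eps, mu):
    """v is a core if at least mu vertices in Gamma(v) have similarity >= eps."""
    similar = [w for w in closed_neighborhood(adj, v)
               if structural_similarity(adj, v, w) >= eps]
    return len(similar) >= mu

# Hypothetical toy graph: two triangles {0,1,2} and {4,5,6} bridged by vertex 3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
adj = {v: set() for v in range(7)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

# The bridge vertex 3 changes role when eps moves from 0.7 to 0.5.
for eps in (0.7, 0.5):
    print(eps, is_core(adj, 0, eps, mu=3), is_core(adj, 3, eps, mu=3))
```

In this toy graph the bridging vertex is not a core at \(\epsilon = 0.7\) but becomes one at \(\epsilon = 0.5\) (with \(\mu = 3\)), which illustrates the kind of parameter sensitivity that motivates a parameter-free approach.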
