SCAN: a structural clustering algorithm for networks

Network clustering (or graph partitioning) is an important task for the discovery of underlying structures in networks. Many algorithms find clusters by maximizing the number of intra-cluster edges. While such algorithms find useful and interesting structures, they tend to fail to identify and isolate two kinds of vertices that play special roles - vertices that bridge clusters (hubs) and vertices that are marginally connected to clusters (outliers). Identifying hubs is useful for applications such as viral marketing and epidemiology since hubs are responsible for spreading ideas or disease. In contrast, outliers have little or no influence, and may be isolated as noise in the data. In this paper, we proposed a novel algorithm called SCAN (Structural Clustering Algorithm for Networks), which detects clusters, hubs and outliers in networks. It clusters vertices based on a structural similarity measure. The algorithm is fast and efficient, visiting each vertex only once. An empirical evaluation of the method using both synthetic and real datasets demonstrates superior performance over other methods such as the modularity-based algorithms.

[1]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[2]  William M. Rand,et al.  Objective Criteria for the Evaluation of Clustering Methods , 1971 .

[3]  Matthew Richardson,et al.  Mining the network value of customers , 2001, KDD '01.

[4]  R. Guimerà,et al.  Functional cartography of complex metabolic networks , 2005, Nature.

[5]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[6]  M. Newman,et al.  Finding community structure in very large networks. , 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.

[7]  Jon M. Kleinberg,et al.  The Web as a Graph: Measurements, Models, and Methods , 1999, COCOON.

[8]  M E J Newman,et al.  Finding and evaluating community structure in networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[9]  Chris H. Q. Ding,et al.  A min-max cut algorithm for graph partitioning and data clustering , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[10]  G. W. Milligan,et al.  A Study of the Comparability of External Criteria for Hierarchical Cluster Analysis. , 1986, Multivariate behavioral research.

[11]  Michalis Faloutsos,et al.  On power-law relationships of the Internet topology , 1999, SIGCOMM '99.

[12]  Christos Faloutsos,et al.  Epidemic spreading in real networks: an eigenvalue viewpoint , 2003, 22nd International Symposium on Reliable Distributed Systems, 2003. Proceedings..

[13]  Albert-László Barabási,et al.  Internet: Diameter of the World-Wide Web , 1999, Nature.

[14]  John Scott What is social network analysis , 2010 .

[15]  Duncan J. Watts,et al.  Collective dynamics of ‘small-world’ networks , 1998, Nature.