A Search Algorithm for Clusters in a Network or Graph

We propose a novel structural clustering method for graphs based on breadth-first search. Clustering is an important task in analyzing complex networks such as biological networks, the World Wide Web, and social networks. Clusters in these networks take various shapes, such as the cliques and stars found in protein-protein interaction (PPI) networks. Traditional algorithms may detect clique-shaped clusters, but they fail to identify the star-shaped clusters that are common in scale-free networks, including PPI networks. The proposed algorithm addresses this problem. Experimental results show that it outperforms other algorithms in one or more of the following respects: (1) it detects clusters of mixed shapes, including both cliques and stars; (2) it is faster: its running time on a network with n nodes and m links is O(n), compared with O(md log n) for the fastest modularity-based algorithm, where d is the depth of the dendrogram describing the hierarchical cluster structure; and (3) it is non-parametric, requiring no input parameters.
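To make the clique/star distinction concrete, the following minimal sketch compares the internal edge density of a clique and a star of the same size. It is an illustration only, not the proposed algorithm: the helper names (`edge_density`, `make_clique`, `make_star`) are hypothetical, and edge density is used here merely as a simple stand-in for the kind of cohesion measure that favors clique-shaped clusters.

```python
from itertools import combinations


def edge_density(nodes, edges):
    """Fraction of all possible node pairs that are actually linked."""
    n = len(nodes)
    possible = n * (n - 1) // 2
    return len(edges) / possible if possible else 0.0


def make_clique(k):
    """Clique on k nodes: every pair of nodes is linked."""
    nodes = list(range(k))
    return nodes, set(combinations(nodes, 2))


def make_star(k):
    """Star on k nodes: node 0 is the hub, all other nodes are leaves."""
    nodes = list(range(k))
    return nodes, {(0, leaf) for leaf in range(1, k)}


# Illustrative comparison on 6-node clusters (assumed sizes, not from the paper).
clique_density = edge_density(*make_clique(6))
star_density = edge_density(*make_star(6))
print(f"clique density: {clique_density:.3f}")
print(f"star density:   {star_density:.3f}")
```

A star on k nodes has only k - 1 links, so its density is 2/k and shrinks as the star grows, whereas a clique always has density 1. A method that looks only for densely linked groups would therefore tend to overlook hub-and-leaf clusters.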
