Finding community structure in mega-scale social networks: [extended abstract]

The community analysis algorithm proposed by Clauset, Newman, and Moore (the CNM algorithm) finds community structure in social networks. Unfortunately, the CNM algorithm does not scale well, and its use is practically limited to networks of up to 500,000 nodes. We show that this inefficiency is caused by merging communities in an unbalanced manner, and that a simple heuristic that attempts to merge community structures in a balanced manner can dramatically improve community structure analysis. The proposed techniques are tested using data sets obtained from an existing social networking service (SNS) that hosts 5.5 million users. We have tested three variations of the heuristic. The fastest method processes an SNS friendship network with 1 million users in 5 minutes (70 times faster than CNM) and another friendship network with 4 million users in 35 minutes. Another variation processes a network with 500,000 nodes in 50 minutes (7 times faster than CNM), finds community structures with improved modularity, and scales to a network with 5.5 million nodes.
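The idea of balanced merging can be illustrated with a small sketch. The abstract does not spell out the heuristics, so the following is only an assumption-laden illustration, not the authors' method: it performs greedy agglomerative modularity merging and, when `balanced=True`, weights each candidate merge by a hypothetical size ratio `min(|c_i|/|c_j|, |c_j|/|c_i|)` so that merges between similarly sized communities are preferred. The function name `greedy_communities`, the `balanced` flag, and the size-ratio weighting are all illustrative choices. For brevity the sketch rescans every candidate pair on each step instead of using the heap-based data structures that make CNM efficient, so it demonstrates the merge rule only, not the reported running times.

```python
from collections import defaultdict


def greedy_communities(edges, balanced=True):
    """Greedy modularity merging on an undirected, unweighted graph.

    edges: iterable of (u, v) pairs without self-loops or duplicates.
    balanced: if True, scale each merge gain by a size ratio
              (hypothetical heuristic, assumed for illustration).
    Returns (list of node sets, modularity of that partition).
    """
    edges = list(edges)
    two_m = 2.0 * len(edges)
    # e[i][j]: fraction of edge ends joining communities i and j (i != j)
    e = defaultdict(lambda: defaultdict(float))
    # a[i]: fraction of edge ends attached to community i
    a = defaultdict(float)
    members = {}
    for u, v in edges:
        e[u][v] += 1.0 / two_m
        e[v][u] += 1.0 / two_m
        a[u] += 1.0 / two_m
        a[v] += 1.0 / two_m
        members.setdefault(u, {u})
        members.setdefault(v, {v})
    # Every node starts alone, so there are no intra-community edges yet.
    q = -sum(x * x for x in a.values())

    while True:
        best = None
        for i in members:
            for j, e_ij in e[i].items():
                dq = 2.0 * (e_ij - a[i] * a[j])  # modularity gain of merging i, j
                if dq <= 0:
                    continue
                score = dq
                if balanced:
                    si, sj = len(members[i]), len(members[j])
                    score *= min(si / sj, sj / si)  # assumed balance weighting
                if best is None or score > best[0]:
                    best = (score, dq, i, j)
        if best is None:  # no merge increases modularity
            break
        _, dq, i, j = best
        # Fold community j into community i and rewire its neighbours.
        for k, w in list(e[j].items()):
            if k != i:
                e[i][k] += w
                e[k][i] += w
            del e[k][j]
        del e[j]
        a[i] += a.pop(j)
        members[i] |= members.pop(j)
        q += dq
    return list(members.values()), q


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge.
    demo = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    print(greedy_communities(demo, balanced=True))
```

Setting `balanced=False` makes the sketch pick the merge with the largest raw modularity gain, which is the unbalanced behaviour the abstract identifies as the source of inefficiency; the weighted variant trades some per-step gain for more evenly sized communities.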
