Distributed Generation of Billion-node Social Graphs with Overlapping Community Structure

In social community detection, graphs with reference community structure are commonly used to evaluate accuracy. This paper introduces a method for generating large random social graphs with realistic community structure. The resulting graphs exhibit several recently discovered properties of social community structure that run counter to conventional wisdom: dense community overlaps, superlinear growth of the number of edges inside a community with its size, and a power-law distribution of user-community memberships. Further, the method is distributable by design, and its Apache Spark implementation showed near-linear scalability on the Amazon EC2 cloud.
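The abstract does not spell out the generator itself, so the following is only a toy, single-machine sketch of how the three listed properties could be produced together; it is not the paper's algorithm and does not use Spark. All names and parameters (`generate_graph`, `alpha`, `gamma`, `max_memberships`) are assumptions made for illustration. Each node draws its number of community memberships from a truncated power law, so communities overlap, and each community is wired with an edge probability chosen so that its expected internal edge count scales as `size ** gamma` with `gamma > 1`.

```python
import random
from itertools import combinations


def sample_power_law(rng, alpha, k_min, k_max):
    """Draw an integer k in [k_min, k_max] with P(k) proportional to k**(-alpha)."""
    values = list(range(k_min, k_max + 1))
    weights = [k ** (-alpha) for k in values]
    return rng.choices(values, weights=weights, k=1)[0]


def generate_graph(n_nodes, n_communities, alpha=2.5, max_memberships=5,
                   gamma=1.5, seed=0):
    """Toy generator (illustrative assumption, not the paper's method).

    Returns (communities, edges): a list of node sets and a set of
    undirected edges stored as (u, v) tuples with u < v.
    """
    rng = random.Random(seed)

    # Step 1: power-law distributed memberships produce overlapping communities.
    communities = [set() for _ in range(n_communities)]
    k_max = min(max_memberships, n_communities)
    for node in range(n_nodes):
        k = sample_power_law(rng, alpha, 1, k_max)
        for c in rng.sample(range(n_communities), k):
            communities[c].add(node)

    # Step 2: wire each community so that the expected number of internal
    # edges is about size**gamma (superlinear for gamma > 1), capped at the
    # number of available node pairs.
    edges = set()
    for members in communities:
        size = len(members)
        if size < 2:
            continue
        n_pairs = size * (size - 1) // 2
        p = min(1.0, size ** gamma / n_pairs)
        for u, v in combinations(sorted(members), 2):
            if rng.random() < p:
                edges.add((u, v))
    return communities, edges


if __name__ == "__main__":
    comms, edges = generate_graph(n_nodes=1000, n_communities=50, seed=42)
    print(len(comms), "communities,", len(edges), "edges")
```

Because every community is wired independently of the others, this step is the natural place to parallelize in a distributed setting. Whether the paper partitions the work this way is not stated in the abstract.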
