Fast Generation of Large Scale Social Networks with Clustering

A key challenge within the social network literature is the problem of network generation - that is, how can we create synthetic networks that match characteristics traditionally found in most real world networks? Important characteristics that are present in social networks include a power law degree distribution, small diameter and large amounts of clustering; however, most current network generators, such as the Chung Lu and Kronecker models, largely ignore the clustering present in a graph and choose to focus on preserving other network statistics, such as the power law distribution. Models such as the exponential random graph model have a transitivity parameter, but are computationally difficult to learn, making scaling to large real world networks intractable. In this work, we propose an extension to the Chung Lu ran- dom graph model, the Transitive Chung Lu (TCL) model, which incorporates the notion of a random transitive edge. That is, with some probability it will choose to connect to a node exactly two hops away, having been introduced to a 'friend of a friend'. In all other cases it will follow the standard Chung Lu model, selecting a 'random surfer' from anywhere in the graph according to the given invariant distribution. We prove TCL's expected degree distribution is equal to the degree distribution of the original graph, while being able to capture the clustering present in the network. The single parameter required by our model can be learned in seconds on graphs with millions of edges, while networks can be generated in time that is linear in the number of edges. We demonstrate the performance TCL on four real- world social networks, including an email dataset with hundreds of thousands of nodes and millions of edges, showing TCL generates graphs that match the degree distribution, clustering coefficients and hop plots of the original networks.

[1]  Christos Faloutsos,et al.  Kronecker Graphs: An Approach to Modeling Networks , 2008, J. Mach. Learn. Res..

[2]  Rajeev Motwani,et al.  The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.

[3]  Eli Upfal,et al.  Stochastic models for the Web graph , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[4]  F. Chung,et al.  The average distances in random graphs with given expected degrees , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[5]  László Lovász,et al.  Random Walks on Graphs: A Survey , 1993 .

[6]  S. Wasserman,et al.  Logit models and logistic regressions for social networks: I. An introduction to Markov graphs andp , 1996 .

[7]  Peng Wang,et al.  Recent developments in exponential random graph (p*) models for social networks , 2007, Soc. Networks.

[8]  Duncan J. Watts,et al.  Collective dynamics of ‘small-world’ networks , 1998, Nature.

[9]  Ian T. Foster,et al.  Mapping the Gnutella Network , 2002, IEEE Internet Comput..

[10]  Albert,et al.  Emergence of scaling in random networks , 1999, Science.

[11]  F. Göbel,et al.  Random walks on graphs , 1974 .

[12]  Azadeh Iranmehr,et al.  Trust Management for Semantic Web , 2009, 2009 Second International Conference on Computer and Electrical Engineering.

[13]  Ramana Rao Kompella,et al.  Network Sampling via Edge-based Node Selection with Graph Induction , 2011 .

[14]  Tamara G. Kolda,et al.  Community structure and scale-free collections of Erdös-Rényi graphs , 2011, Physical review. E, Statistical, nonlinear, and soft matter physics.

[15]  Tamara G. Kolda,et al.  The Similarity Between Stochastic Kronecker and Chung-Lu Graph Models , 2011, SDM.

[16]  L. Asz Random Walks on Graphs: a Survey , 2022 .

[17]  Ove Frank,et al.  http://www.jstor.org/about/terms.html. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained , 2007 .

[18]  P. Erdos,et al.  On the evolution of random graphs , 1984 .

[19]  B. Bollobás The evolution of random graphs , 1984 .