A Scalable Generative Graph Model with Community Structure

Network data is ubiquitous and growing, yet we lack realistic generative network models that can be calibrated to match real-world data. The recently proposed block two-level Erdős–Rényi (BTER) model can be tuned to capture two fundamental properties: degree distribution and clustering coefficients. The latter is particularly important for reproducing graphs with community structure, such as social networks. In this paper, we compare BTER to other scalable models and show that it gives a better fit to real data. We provide a scalable implementation that requires only $O(d_{\rm max})$ storage, where $d_{\rm max}$ is the maximum number of neighbors for a single node. The generator is trivially parallelizable, and we show results for a Hadoop MapReduce implementation for modeling a real-world Web graph with over 4.6 billion edges. We propose that the BTER model can be used as a graph generator for benchmarking purposes and provide idealized degree distributions and clustering coefficient profiles that can b...
