Recent Advances in Scalable Network Generation

Random graph models are frequently used as a controllable and versatile data source for experimental campaigns in various research fields. Generating such datasets at scale is a non-trivial task, as it requires design decisions that typically span multiple areas of expertise. Challenges begin with identifying relevant domain-specific network features, continue with the question of how to compile such features into a tractable model, and culminate in the algorithmic details that arise when implementing the corresponding model. In the present survey, we explore crucial aspects of random graph models with known scalable generators. We begin by briefly introducing the network features considered by such models, and then discuss random graph models alongside their generation algorithms. Our focus lies on modelling techniques and algorithmic primitives that have proven successful in obtaining massive graphs. We consider concepts and graph models for various domains (such as social networks, infrastructure, ecology, and numerical simulations), and discuss generators for different models of computation (including shared-memory parallelism, massively parallel GPUs, and distributed systems).
