Communication Efficient Algorithms for Generating Massive Networks

Massive complex systems are prevalent throughout our lives, from biological systems such as the human genome to technological networks such as Facebook or Twitter. Rapid advances in technology allow us to gather ever more data about these systems. Analyzing this huge amount of data and extracting information from it is a crucial task for a variety of scientific disciplines. A common abstraction for handling complex systems is a network (graph) made up of entities and their relationships. For example, we can represent a wireless ad-hoc network in terms of its nodes and their connections with each other: the nodes become vertices and the connections become edges between the vertices. This abstraction allows us to develop algorithms that are independent of the underlying domain. Designing algorithms for massive networks is a challenging task that requires thorough analysis and experimental evaluation. A major hurdle for this task is the scarcity of publicly available large-scale datasets. To address this issue, we can use network generators [21]. These generators allow us to produce synthetic instances that exhibit properties found in many real-world networks. In this thesis we develop a set of novel graph generators that focus on scalability. In particular, we cover the classic Erdős-Rényi model, random geometric graphs and random hyperbolic graphs. These models represent different real-world systems, from the aforementioned wireless ad-hoc networks [40] to social networks [44]. We ensure scalability by using pseudorandomization via hash functions together with redundant computations. The resulting network generators are communication agnostic, i.e., they require no communication between processors. This allows us to generate massive instances of up to 2 vertices and 2 edges in less than 22 minutes on 32,768 processors. In addition to proving theoretical bounds for each generator, we perform an extensive experimental evaluation that covers both their sequential performance and their scaling behavior. We show that our algorithms are competitive with state-of-the-art implementations found in network analysis libraries. Additionally, our generators exhibit near-optimal scaling behavior for large instances. Finally, we show that pseudorandomization has little to no measurable impact on the quality of the generated instances.
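The combination of hash-based pseudorandomization and redundant computation can be illustrated with a deliberately simplified sketch for the undirected Erdős-Rényi model G(n, p). This is not the generator developed in the thesis: the chunking scheme, the use of BLAKE2b to derive seeds, the per-pair Bernoulli trials (a practical generator would use far more efficient sampling), and the names `block_seed`, `chunk_range`, `sample_block` and `generate_local` are all hypothetical choices made for illustration. Vertices are split into contiguous chunks, one per processor, and the adjacency matrix is split into the matching blocks. Each block is sampled with a PRNG seeded by a hash of its normalized block coordinates. A processor regenerates every block that touches its own chunk, so a block shared by two processors is computed twice, but identically, and both endpoints of an edge agree on it without exchanging any messages.

```python
from __future__ import annotations

import hashlib
import random


def block_seed(global_seed: int, a: int, b: int) -> int:
    """Hypothetical seed derivation: hash the global seed and block coordinates.

    The pair is normalized so blocks (a, b) and (b, a) get the same seed,
    which lets both owning processors regenerate identical edges.
    """
    lo, hi = min(a, b), max(a, b)
    data = f"{global_seed}:{lo}:{hi}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def chunk_range(chunk: int, n: int, num_chunks: int) -> range:
    """Vertices owned by a chunk: a contiguous, near-equal slice of 0..n-1."""
    return range(chunk * n // num_chunks, (chunk + 1) * n // num_chunks)


def sample_block(global_seed: int, a: int, b: int, n: int,
                 num_chunks: int, p: float) -> list[tuple[int, int]]:
    """Sample the undirected edges (u, v), u < v, between chunks a and b."""
    a, b = min(a, b), max(a, b)  # fixed orientation => fixed sampling order
    rng = random.Random(block_seed(global_seed, a, b))
    edges = []
    for u in chunk_range(a, n, num_chunks):
        for v in chunk_range(b, n, num_chunks):
            if a == b and v <= u:
                continue  # diagonal block: only the upper triangle u < v
            if rng.random() < p:  # independent Bernoulli(p) trial per pair
                edges.append((u, v))
    return edges


def generate_local(rank: int, global_seed: int, n: int,
                   num_chunks: int, p: float) -> list[tuple[int, int]]:
    """All edges incident to the chunk owned by `rank`, with no communication.

    Every block in this rank's row of the block matrix is generated locally;
    off-diagonal blocks are also generated, redundantly, by the other owner.
    """
    local = []
    for other in range(num_chunks):
        local.extend(sample_block(global_seed, rank, other, n, num_chunks, p))
    return local
```

Calling `generate_local(r, ...)` for each rank `r` in separate processes yields consistent local edge lists. The price is that every off-diagonal block is sampled twice, which is the kind of redundant work traded for zero communication.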

[1] C. Jacoboni et al. The Monte Carlo method for the solution of charge transport in semiconductors with applications to covalent materials. 1983.

[2] Takuji Nishimura et al. Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator. TOMC, 1998.

[3] Stéphane Bressan et al. Fast random graph generation. EDBT/ICDT '11, 2011.

[4] Albert et al. Emergence of scaling in random networks. Science, 1999.

[5] H. Feistel. Cryptography and Computer Privacy. 1973.

[6] J. Wishart. Statistical tables. Global Education Monitoring Report, 2018.

[7] Tao Zhou et al. Traffic dynamics based on local routing protocol on a scale-free network. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 2006.

[8] Duncan J. Watts et al. The Structure and Dynamics of Networks (Princeton Studies in Complexity). 2006.

[9] Xingde Jia et al. Wireless networks and random geometric graphs. Proceedings of the 7th International Symposium on Parallel Architectures, Algorithms and Networks, 2004.

[10] Christos Faloutsos et al. Graph mining: Laws, generators, and algorithms. CSUR, 2006.

[11] M. Newman. Random Graphs as Models of Networks. cond-mat/0202208, 2002.

[12] K. Choromanski et al. Scale-Free Graph with Preferential Attachment and Evolving Internal Vertex Structure. 2013.

[13] Ibrahim Matta et al. BRITE: an approach to universal topology generation. MASCOTS 2001, Proceedings Ninth International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 2001.

[14] Jan Palczewski et al. Monte Carlo Simulation. Encyclopedia of GIS, 2008.

[15] Hoon Kim et al. Monte Carlo Statistical Methods. Technometrics, 2000.

[16] E. Jeffrey Goldstein et al. Emergence as a Construct: History and Issues. 2000.

[17] Joachim H. Ahrens et al. Sequential random sampling. TOMS, 1985.

[18] Mutsuo Saito et al. A PRNG Specialized in Double Precision Floating Point Numbers Using an Affine Transition. 2009.

[19] Udi Manber et al. DIB—a distributed implementation of backtracking. TOPL, 1987.

[20] Shailesh A. Shirali et al. Triangular numbers. 2012.

[21] Jon M. Kleinberg et al. The small-world phenomenon: an algorithmic perspective. STOC '00, 2000.

[22] Ulrich Meyer et al. Generating Massive Scale-Free Networks under Resource Constraints. ALENEX, 2016.

[23] E. Stadlober et al. Ratio of uniforms as a convenient method for sampling from classical discrete distributions. WSC '89, 1989.

[24] G. M. et al. The Thirteen Books of Euclid's Elements. Nature, 1909.

[25] Aric Hagberg et al. Exploring Network Structure, Dynamics, and Function using NetworkX. Proceedings of the Python in Science Conference, 2008.

[26] Jeffrey Scott Vitter et al. An efficient algorithm for sequential random sampling. TOMS, 1987.

[27] Franziska Abend et al. Sync: The Emerging Science of Spontaneous Order. 2016.

[28] Jurgen Kurths et al. Synchronization in complex networks. arXiv:0805.2976, 2008.

[29] M. F. et al. Bibliography. Experimental Gerontology, 1985.

[30] Luca Gugelmann et al. Random Hyperbolic Graphs: Degree Sequence and Clustering. ArXiv, 2012.

[31] Sören Laue et al. Generating massive complex networks with hyperbolic geometry faster in practice. 2016 IEEE High Performance Extreme Computing Conference (HPEC), 2016.

[32] Donald E. Knuth et al. The Art of Computer Programming: Volume 3: Sorting and Searching. 1998.

[33] Peter Sanders. Lastverteilungsalgorithmen für parallele Tiefensuche [Load balancing algorithms for parallel depth-first search]. 1997.

[34] Torben Hagerup et al. A Guided Tour of Chernoff Bounds. Inf. Process. Lett., 1990.

[35] Ernst Stadlober et al. The patchwork rejection technique for sampling from unimodal distributions. TOMC, 1999.

[36] A-L Barabási et al. Structure and tie strengths in mobile communication networks. Proceedings of the National Academy of Sciences, 2006.

[37] Peter Sanders et al. Engineering a scalable high quality graph partitioner. 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2009.

[38] C. Ma et al. BEAM: a Monte Carlo code to simulate radiotherapy treatment units. Medical Physics, 1995.

[39] Alessandro Vespignani et al. Epidemic spreading in scale-free networks. Physical Review Letters, 2000.

[40] Tamara G. Kolda et al. Community structure and scale-free collections of Erdős-Rényi graphs. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 2011.

[41] Christian Staudt et al. NetworKit: An Interactive Tool Suite for High-Performance Network Analysis. ArXiv, 2014.

[42] Christos Faloutsos et al. Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication. PKDD, 2005.

[43] Hsinchun Chen et al. COPLINK Center: Information and Knowledge Management for Law Enforcement. DG.O, 2004.

[44] Sherif Sakr et al. Large scale graph processing systems: survey and an experimental evaluation. Cluster Computing, 2015.

[45] Ralph Keusch et al. Geometric Inhomogeneous Random Graphs. Theor. Comput. Sci., 2015.

[46] Frédéric Amblard. Linked: The New Science of Networks by Albert-László Barabási. J. Artif. Soc. Soc. Simul., 2003.

[47] Peter Sanders et al. Scalable generation of scale-free graphs. Inf. Process. Lett., 2016.

[48] Fan Chung Graham et al. A random graph model for massive graphs. STOC '00, 2000.

[49] Alan M. Frieze et al. Random graphs. SODA '06, 2006.

[50] Martin Raab et al. "Balls into Bins": A Simple and Tight Analysis. RANDOM, 1998.

[51] Amin Vahdat et al. Hyperbolic Geometry of Complex Networks. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 2010.

[52] Henning Meyerhenke et al. Generating Random Hyperbolic Graphs in Subquadratic Time. ISAAC, 2015.

[53] J. Dall et al. Random geometric graphs. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 2002.

[54] Donald Ervin Knuth et al. The Art of Computer Programming, Volume II: Seminumerical Algorithms. 1970.

[55] Shilpa Chakravartula et al. Complex Networks: Structure and Dynamics. 2014.

[56] Tobias Friedrich et al. Efficient Embedding of Scale-Free Graphs in the Hyperbolic Plane. IEEE/ACM Transactions on Networking, 2018.

[57] Mervin E. Muller et al. Development of Sampling Plans by Using Sequential (Item by Item) Selection Techniques and Digital Computers. 1962.

[58] Lars Arge et al. The Buffer Tree: a New Technique for Optimal I/O-Algorithms. 1995.

[59] Ulrik Brandes et al. Efficient generation of large random networks. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 2005.

[60] Pierre L'Ecuyer et al. TestU01: A C library for empirical testing of random number generators. TOMS, 2006.