Coin-flipping, ball-dropping, and grass-hopping for generating random graphs from matrices of edge probabilities

Common models for random graphs, such as Erdős-Rényi and Kronecker graphs, correspond to generating random adjacency matrices where each entry is non-zero with a probability given by a large matrix of edge probabilities. Generating an instance of a random graph from these models is easy, although inefficient: flip a biased coin (i.e., sample a Bernoulli random variable) for each possible edge. This process is inefficient because most large graph models correspond to sparse graphs, where the vast majority of coin flips result in no edge. We describe some not-entirely-well-known, but not-entirely-unknown, techniques that enable us to sample a graph by finding only the coin flips that produce edges. Our analogies for these procedures are ball-dropping, which is easier to implement but may need extra work due to duplicate edges, and grass-hopping, which results in no duplicated work or extra edges. Grass-hopping achieves this by using geometric random variables to jump directly from one successful coin flip to the next. In order to use this idea on complex probability matrices such as those in Kronecker graphs, we decompose the problem into three steps, each of which is an independently useful computational primitive: (i) enumerating non-decreasing sequences, (ii) unranking multiset permutations, and (iii) decoding and encoding z-curve and Morton codes and permutations. The third step is the result of a new connection between repeated Kronecker product operations and Morton codes. Throughout, we draw connections to ideas underlying applied math and computer science, including coupon collector problems.
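
To make the grass-hopping idea concrete, here is a minimal Python sketch for the simplest case only: a directed Erdős-Rényi graph where every one of the n*n possible edges has the same probability p. The function name grass_hop_er, its parameters, and the row-major numbering of the possible edges are assumptions made for illustration, not the authors' code. The gap between consecutive successful coin flips is geometric, so the sketch samples each gap directly by inverting the geometric distribution rather than flipping every coin.

```python
import math
import random


def grass_hop_er(n, p, seed=None):
    """Sample a directed Erdos-Renyi graph on n nodes (self-loops allowed).

    Hypothetical helper for illustration. Possible edges are numbered
    0 .. n*n - 1 in row-major order (an assumption of this sketch), and
    each is present independently with probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    total = n * n
    if p == 0.0:
        return []
    if p == 1.0:
        return [(k // n, k % n) for k in range(total)]

    rng = random.Random(seed)
    log_q = math.log1p(-p)  # log(1 - p), strictly negative here
    edges = []
    idx = -1  # index of the most recent successful coin flip
    while True:
        # u lies in (0, 1], so log(u) is finite and <= 0.
        u = 1.0 - rng.random()
        # Geometric gap on {1, 2, ...}: number of flips up to and
        # including the next success.
        gap = int(math.floor(math.log(u) / log_q)) + 1
        idx += gap
        if idx >= total:
            break
        edges.append((idx // n, idx % n))
    return edges


if __name__ == "__main__":
    sample = grass_hop_er(1000, 0.001, seed=42)
    print(len(sample), "edges; expected about", 1000 * 1000 * 0.001)
```

The loop runs once per generated edge (plus one final overshoot), so the work scales with the number of edges rather than with the n*n coin flips. Because the index only moves forward, no edge can be produced twice, which is the property that distinguishes grass-hopping from ball-dropping. Handling a non-uniform probability matrix such as a Kronecker graph requires the additional steps listed above and is not covered by this sketch.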
