Parallel and I/O-efficient Randomisation of Massive Networks using Global Curveball Trades

Graph randomisation is a crucial task in the analysis and synthesis of networks. It is typically implemented as an edge switching Markov chain (ESMC) that repeatedly swaps the endpoints of random edge pairs while maintaining the degrees involved. Curveball is a novel approach that instead considers the whole neighbourhoods of randomly drawn node pairs. Its Markov chain converges to a uniform distribution, and experiments suggest that it requires fewer steps than the established ESMC. Since trades are more expensive, however, we study Curveball's practical runtime by introducing the first efficient Curveball algorithms: the I/O-efficient EM-CB for simple undirected graphs and its internal-memory counterpart IM-CB. Further, we investigate global trades, which process every node in a graph during a single super step, and show that undirected global trades converge to a uniform distribution and perform better in practice. We then discuss EM-GCB and EM-PGCB for global trades and give experimental evidence that EM-PGCB achieves the quality of the state-of-the-art ESMC algorithm EM-ES nearly one order of magnitude faster.
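
To make the notion of a trade concrete, the following is a minimal in-memory sketch of a single Curveball trade and of a global trade on a simple undirected graph. It is an illustration only, not the EM-CB, IM-CB, EM-GCB or EM-PGCB algorithms themselves. The function names `curveball_trade` and `global_trade`, the adjacency-set representation, and the use of integer node labels are assumptions made for this example.

```python
import random

def curveball_trade(adj, u, v, rng):
    """Trade neighbours between u and v in place.

    adj maps each node to the set of its neighbours (assumed representation).
    Common neighbours and a possible edge {u, v} are left untouched, so the
    graph stays simple and all degrees are preserved.
    """
    only_u = adj[u] - adj[v] - {v}   # neighbours exclusive to u
    only_v = adj[v] - adj[u] - {u}   # neighbours exclusive to v
    pool = sorted(only_u | only_v)   # sorted for reproducibility
    rng.shuffle(pool)
    new_u = set(pool[:len(only_u)])  # u keeps its number of exclusive neighbours
    new_v = set(pool[len(only_u):])
    for w in only_u - new_u:         # w moves from u to v
        adj[u].remove(w); adj[w].remove(u)
        adj[v].add(w); adj[w].add(v)
    for w in only_v - new_v:         # w moves from v to u
        adj[v].remove(w); adj[w].remove(v)
        adj[u].add(w); adj[w].add(u)

def global_trade(adj, rng):
    """One super step: pair up all nodes at random and trade each pair once.

    If the number of nodes is odd, one node sits out in this super step.
    """
    nodes = sorted(adj)
    rng.shuffle(nodes)
    for i in range(0, len(nodes) - 1, 2):
        curveball_trade(adj, nodes[i], nodes[i + 1], rng)

if __name__ == "__main__":
    rng = random.Random(42)
    adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 4}, 3: {0, 4}, 4: {2, 3}}
    for _ in range(10):
        global_trade(adj, rng)
    print({u: len(nbrs) for u, nbrs in adj.items()})  # degrees are unchanged
```

Each trade only redistributes the neighbours that are exclusive to one of the two nodes, which is why neither self-loops nor multi-edges can arise. This sketch keeps the whole graph in memory and processes the pairs sequentially.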
