Compact graph representations and parallel connectivity algorithms for massive dynamic network analysis

Graph-theoretic abstractions are extensively used to analyze massive data sets. Temporal data streams from socio-economic interactions, social networking web sites, communication traffic, and scientific computing can be intuitively modeled as graphs. We present the first study of novel high-performance combinatorial techniques for analyzing large-scale information networks, encapsulating dynamic interaction data on the order of billions of entities. We present new data structures to represent dynamic interaction networks, and discuss algorithms for processing parallel insertions and deletions of edges in small-world networks. With these new approaches, we achieve an average performance rate of 25 million structural updates per second and a parallel speed-up of nearly 28 on a 64-way Sun UltraSPARC T2 multicore processor, for insertions and deletions to a small-world network of 33.5 million vertices and 268 million edges. We also design parallel implementations of fundamental dynamic graph kernels related to connectivity and centrality queries. Our implementations are freely distributed as part of the open-source SNAP (Small-world Network Analysis and Partitioning) complex network analysis framework.
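To make the notion of structural updates and connectivity queries concrete, the following is a minimal serial sketch, not the data structures or parallel algorithms of the paper or of SNAP. It assumes an undirected graph stored as per-vertex adjacency sets; the class name `DynamicGraph` and its methods are hypothetical names chosen for illustration.

```python
from collections import deque


class DynamicGraph:
    """Undirected graph stored as adjacency sets (hypothetical illustration)."""

    def __init__(self):
        self.adj = {}  # vertex -> set of neighbouring vertices

    def insert_edge(self, u, v):
        # A structural update: record the edge in both endpoints' sets.
        self.adj.setdefault(u, set()).add(v)
        self.adj.setdefault(v, set()).add(u)

    def delete_edge(self, u, v):
        # Removing an absent edge is a no-op.
        self.adj.get(u, set()).discard(v)
        self.adj.get(v, set()).discard(u)

    def connected(self, s, t):
        # Connectivity query answered by a plain breadth-first search.
        if s == t:
            return True
        seen = {s}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in self.adj.get(u, ()):
                if w == t:
                    return True
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        return False


g = DynamicGraph()
g.insert_edge(0, 1)
g.insert_edge(1, 2)
before = g.connected(0, 2)   # path 0-1-2 exists
g.delete_edge(1, 2)
after = g.connected(0, 2)    # vertex 2 is now isolated
```

Each query here rescans the graph from scratch, which is simple but costly; this sketch makes no attempt at the compact representations or parallel update processing that the abstract describes.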
