IEEE Transactions on Parallel and Distributed Systems GraphCT: Multithreaded Algorithms for Massive Graph Analysis

The digital world has given rise to massive quantities of data that include rich semantics and complex networks. For example, a social graph containing hundreds of millions of actors and tens of billions of relationships is not uncommon. Analyzing these large data sets, even to answer simple analytic queries, often pushes the limits of algorithms and machine architectures. We present GraphCT, a scalable framework for graph analysis using parallel and multithreaded algorithms on shared-memory platforms. Utilizing the unique characteristics of the Cray XMT, GraphCT enables fast network analysis at unprecedented scales on a variety of input data sets. On a synthetic power-law graph with 2 billion vertices and 17 billion edges, we can find the connected components in 2 minutes. We can estimate the betweenness centrality of a similar graph with 537 million vertices and over 8 billion edges in under 1 hour. GraphCT is built for portability and performance.
