Introducing ScaleGraph: an X10 library for billion scale graph analytics

Highly Productive Computing Systems (HPCS) and PGAS languages are considered important means of achieving exascale computational capability. Most current large-scale graph processing applications are custom-developed using non-HPCS/PGAS techniques such as MPI and MapReduce. This paper introduces ScaleGraph, an X10 library targeting billion-scale graph analysis scenarios. Compared to non-PGAS alternatives, ScaleGraph defines concrete, simple abstractions for representing massive graphs. We have designed ScaleGraph from the ground up with graph structural property analysis, graph clustering, and community detection in mind. We describe the design of the library and report initial performance evaluation results obtained on a Twitter graph with 1.47 billion edges.
