A lightweight infrastructure for graph analytics

Several domain-specific languages (DSLs) for parallel graph analytics have been proposed recently. In this paper, we argue that existing DSLs can be implemented on top of a general-purpose infrastructure that (i) supports very fine-grain tasks, (ii) implements autonomous, speculative execution of these tasks, and (iii) allows application-specific control of task scheduling policies. To support this claim, we describe such an implementation called the Galois system. We demonstrate the capabilities of this infrastructure in three ways. First, we implement more sophisticated algorithms for some of the graph analytics problems tackled by previous DSLs and show that end-to-end performance can be improved by orders of magnitude even on power-law graphs, thanks to the better algorithms facilitated by a more general programming model. Second, we show that, even when an algorithm can be expressed in existing DSLs, the implementation of that algorithm in the more general system can be orders of magnitude faster when the input graphs are road networks and similar graphs with high diameter, thanks to more sophisticated scheduling. Third, we implement the APIs of three existing graph DSLs on top of the common infrastructure in a few hundred lines of code and show that even for power-law graphs, the performance of the resulting implementations often exceeds that of the original DSL systems, thanks to the lightweight infrastructure.
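To make point (iii) concrete, the following is a minimal sequential sketch of how a scheduling policy can be swapped underneath an unchanged task body. It is not the Galois API: the names `FifoWorklist`, `BucketedWorklist`, and `sssp` are hypothetical and chosen only for illustration, and the sketch omits parallelism and speculation entirely. Each task relaxes the outgoing edges of one node for single-source shortest paths; the bucketed policy groups tasks by distance in the spirit of delta-stepping.

```python
import heapq
from collections import deque


class FifoWorklist:
    """Hypothetical policy: run tasks in the order they were created."""

    def __init__(self):
        self._items = deque()

    def push(self, priority, item):
        self._items.append(item)  # priority is ignored by this policy

    def pop(self):
        return self._items.popleft()

    def __len__(self):
        return len(self._items)


class BucketedWorklist:
    """Hypothetical policy: group tasks into buckets of width `delta`
    and always run a task from the lowest non-empty bucket."""

    def __init__(self, delta):
        self._delta = delta
        self._heap = []
        self._counter = 0  # tie-breaker so items are never compared

    def push(self, priority, item):
        bucket = priority // self._delta
        heapq.heappush(self._heap, (bucket, self._counter, item))
        self._counter += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def sssp(graph, source, worklist):
    """Label-correcting shortest paths; one task = relax one node's edges.
    Returns the distance map and the number of tasks executed."""
    dist = {node: float("inf") for node in graph}
    dist[source] = 0
    worklist.push(0, (source, 0))
    tasks = 0
    while len(worklist):
        node, d = worklist.pop()
        tasks += 1
        if d > dist[node]:
            continue  # stale task: a shorter path was found meanwhile
        for neighbor, weight in graph[node]:
            candidate = d + weight
            if candidate < dist[neighbor]:
                dist[neighbor] = candidate
                worklist.push(candidate, (neighbor, candidate))
    return dist, tasks


# Small illustrative graph: adjacency lists of (neighbor, edge weight).
graph = {
    "a": [("b", 10), ("c", 1)],
    "b": [("d", 1)],
    "c": [("b", 1)],
    "d": [],
}

fifo_dist, fifo_tasks = sssp(graph, "a", FifoWorklist())
bucket_dist, bucket_tasks = sssp(graph, "a", BucketedWorklist(delta=1))
```

Both policies converge to the same distances because the task body is unchanged; only the order of execution, and therefore the amount of wasted work, differs. On high-diameter inputs such as road networks, this ordering effect is what the abstract attributes the large speedups to.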
