PriorityGraph: A Unified Programming Model for Optimizing Ordered Graph Algorithms

Many graph problems can be solved using ordered parallel graph algorithms, which achieve significant speedups over their unordered counterparts by reducing redundant work. This paper introduces a new priority-based extension to GraphIt, a domain-specific language for writing graph applications, to simplify writing high-performance parallel ordered graph algorithms. The extension enables vertices to be processed in a dynamic order while hiding low-level implementation details from the user. We extend the compiler with new program analyses, transformations, and code generation to produce fast implementations of ordered parallel graph algorithms. We also introduce bucket fusion, a new performance optimization that fuses multiple rounds of an ordered algorithm to reduce synchronization overhead, resulting in a 1.2×–3× speedup over the fastest existing ordered algorithm implementations on road networks with large diameters. With the extension, GraphIt achieves up to 3× speedup on six ordered graph algorithms over state-of-the-art frameworks and hand-optimized implementations (Julienne, Galois, and GAPBS) that support ordered algorithms.
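To make the idea of priority-ordered processing concrete, the following is a minimal sequential sketch of a bucketed shortest-path computation in the style of Δ-stepping, written in plain Python rather than GraphIt. It is not the paper's implementation or the extension's API: the function name `bucketed_sssp`, the adjacency-dict input format, and the two counters are assumptions made only for illustration. Vertices are grouped into buckets of width `delta` by tentative distance, and the lowest non-empty bucket is always processed first, which is what limits redundant relaxations.

```python
import math
from collections import defaultdict


def bucketed_sssp(graph, source, delta):
    """Sequential, illustrative bucketed shortest paths (hypothetical helper).

    graph: dict mapping vertex -> list of (neighbor, weight), weights >= 0.
    Returns (dist, passes, distinct_buckets).
    """
    dist = {v: math.inf for v in graph}
    dist[source] = 0
    buckets = defaultdict(set)       # bucket id -> vertices with that priority
    buckets[0].add(source)
    passes = 0                       # frontier passes (one sync each if parallel)
    distinct_buckets = 0             # number of distinct priorities visited
    last_bucket = None

    while buckets:
        b = min(buckets)             # always take the lowest-priority bucket
        frontier = buckets.pop(b)
        # Skip stale entries whose distance has since moved to a lower bucket.
        active = [u for u in frontier if int(dist[u] // delta) == b]
        if not active:
            continue
        passes += 1
        if b != last_bucket:
            distinct_buckets += 1
            last_bucket = b
        for u in active:
            for v, w in graph[u]:
                nd = dist[u] + w
                if nd < dist[v]:     # relax edge and re-bucket the neighbor
                    dist[v] = nd
                    buckets[int(nd // delta)].add(v)
    return dist, passes, distinct_buckets


if __name__ == "__main__":
    g = {0: [(1, 2), (2, 5)], 1: [(2, 1), (3, 7)], 2: [(3, 1)], 3: []}
    dist, passes, distinct = bucketed_sssp(g, source=0, delta=2)
    print(dist, passes, distinct)
```

In a parallel setting each frontier pass would typically end with a global synchronization. When the same bucket is refilled and processed several times in a row, `passes` exceeds `distinct_buckets`; as the abstract describes it, bucket fusion targets this kind of repeated-round overhead, although the sketch only counts the rounds and does not implement the optimization itself.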
