Custom code generation for a graph DSL

We present challenges faced in making a domain-specific language (DSL) for graph algorithms adapt to varying requirements of generating a spectrum of efficient parallel codes. Graph algorithms are at the heart of several applications, and achieving high performance on graph applications has become critical due to the tremendous growth of irregular data. However, irregular algorithms are quite challenging to auto-parallelize, due to access patterns influenced by the input graph - which is unavailable until execution. Former research has addressed this issue by designing DSLs for graph algorithms, which restrict generality but allow efficient codegeneration for various backends. Such DSLs are, however, too rigid, and do not adapt to changes. For instance, these DSLs are incapable of changing the way of processing if the underlying graph changes. As another instance, most of the DSLs do not support more than one backends. We narrate our experiences in making an existing DSL, named Falcon, adaptive. The biggest challenge in the process is to retain the DSL code for specifying the underlying algorithm, and still generate different backend codes. We illustrate the effectiveness of our proposal by auto-generating codes for vertex-based versus edge-based graph processing, synchronous versus asynchronous execution, and CPU versus GPU backends from the same specification.

[1]  Y. N. Srikant,et al.  Large Scale Graph Processing in a Distributed Environment , 2017, Euro-Par Workshops.

[2]  Y. N. Srikant,et al.  DH-Falcon: A Language for Large-Scale Graph Processing on Distributed Heterogeneous Systems , 2017, 2017 IEEE International Conference on Cluster Computing (CLUSTER).

[3]  Keshav Pingali,et al.  A GPU implementation of inclusion-based points-to analysis , 2012, PPoPP '12.

[4]  Y. N. Srikant,et al.  Falcon: A Graph Manipulation Language for Heterogeneous Systems , 2016, ACM Trans. Archit. Code Optim..

[5]  Sungpack Hong,et al.  PGX.D: a fast distributed graph processing engine , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.

[6]  Keshav Pingali,et al.  A compiler for throughput optimization of graph algorithms on GPUs , 2016, OOPSLA.

[7]  Michael D. Ernst,et al.  HaLoop , 2010, Proc. VLDB Endow..

[8]  Keshav Pingali,et al.  Abelian: A Compiler for Graph Analytics on Distributed, Heterogeneous Platforms , 2018, Euro-Par.

[9]  Carlos Guestrin,et al.  Distributed GraphLab : A Framework for Machine Learning and Data Mining in the Cloud , 2012 .

[10]  Keshav Pingali,et al.  Morph algorithms on GPUs , 2013, PPoPP '13.

[11]  Aart J. C. Bik,et al.  Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.

[12]  Jennifer Widom,et al.  GPS: a graph processing system , 2013, SSDBM.

[13]  Rupesh Nasre,et al.  LightHouse: An Automatic Code Generator for Graph Algorithms on GPUs , 2016, LCPC.

[14]  Joseph Gonzalez,et al.  PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs , 2012, OSDI.

[15]  Keshav Pingali,et al.  The tao of parallelism in algorithms , 2011, PLDI '11.

[16]  Leslie G. Valiant,et al.  A bridging model for parallel computation , 1990, CACM.

[17]  Keshav Pingali,et al.  Phoenix: A Substrate for Resilient Distributed Graph Analytics , 2019, ASPLOS.

[18]  Kunle Olukotun,et al.  Green-Marl: a DSL for easy and efficient graph analysis , 2012, ASPLOS XVII.

[19]  Keshav Pingali,et al.  Data-Driven Versus Topology-driven Irregular Computations on GPUs , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.

[20]  Ulrich Meyer,et al.  Delta-Stepping: A Parallel Single Source Shortest Path Algorithm , 1998, ESA.

[21]  Joseph M. Hellerstein,et al.  Distributed GraphLab: A Framework for Machine Learning in the Cloud , 2012, Proc. VLDB Endow..

[22]  Andrew S. Grimshaw,et al.  Scalable GPU graph traversal , 2012, PPoPP '12.

[23]  Avery Ching,et al.  One Trillion Edges: Graph Processing at Facebook-Scale , 2015, Proc. VLDB Endow..

[24]  Shoaib Kamil,et al.  GraphIt: a high-performance graph DSL , 2018, Proc. ACM Program. Lang..

[25]  Alex Brooks,et al.  Gluon: a communication-optimizing substrate for distributed heterogeneous graph analytics , 2018, PLDI.

[26]  Kunle Olukotun,et al.  Simplifying Scalable Graph Processing with a Domain-Specific Language , 2014, CGO '14.

[27]  Ümit V. Çatalyürek,et al.  Betweenness centrality on GPUs and heterogeneous architectures , 2013, GPGPU@ASPLOS.

[28]  Rajeev Gandhi,et al.  The next-generation in-stadium experience (keynote) , 2016 .

[29]  Michael Garland,et al.  Work-Efficient Parallel GPU Methods for Single-Source Shortest Paths , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.

[30]  John D. Owens,et al.  Gunrock: a high-performance graph processing library on the GPU , 2015, PPoPP.

[31]  Guy E. Blelloch,et al.  Ligra: a lightweight graph processing framework for shared memory , 2013, PPoPP '13.

[32]  Matei Ripeanu,et al.  A yoke of oxen and a thousand chickens for heavy lifting graph processing , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).