GraphIt: a high-performance graph DSL

The performance bottlenecks of graph applications depend not only on the algorithm and the underlying hardware, but also on the size and structure of the input graph. As a result, programmers must try different combinations of a large set of techniques, which make tradeoffs among locality, work-efficiency, and parallelism, to develop the best implementation for a specific algorithm and type of graph. Existing graph frameworks and domain specific languages (DSLs) lack flexibility, supporting only a limited set of optimizations. This paper introduces GraphIt, a new DSL for graph computations that generates fast implementations for algorithms with different performance characteristics running on graphs with different sizes and structures. GraphIt separates what is computed (algorithm) from how it is computed (schedule). Programmers specify the algorithm using an algorithm language, and performance optimizations are specified using a separate scheduling language. The algorithm language simplifies expressing the algorithms, while exposing opportunities for optimizations. We formulate graph optimizations, including edge traversal direction, data layout, parallelization, cache, NUMA, and kernel fusion optimizations, as tradeoffs among locality, parallelism, and work-efficiency. The scheduling language enables programmers to easily search through this complicated tradeoff space by composing together a large set of edge traversal, vertex data layout, and program structure optimizations. The separation of algorithm and schedule also enables us to build an autotuner on top of GraphIt to automatically find high-performance schedules. The compiler uses a new scheduling representation, the graph iteration space, to model, compose, and ensure the validity of the large number of optimizations. We evaluate GraphIt’s performance with seven algorithms on graphs with different structures and sizes. GraphIt outperforms the next fastest of six state-of-the-art shared-memory frameworks (Ligra, Green-Marl, GraphMat, Galois, Gemini, and Grazelle) on 24 out of 32 experiments by up to 4.8×, and is never more than 43% slower than the fastest framework on the other experiments. GraphIt also reduces the lines of code by up to an order of magnitude compared to the next fastest framework.

[1]  Wojciech Matusik,et al.  Simit , 2016, ACM Trans. Graph..

[2]  Alexander S. Szalay,et al.  FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs , 2014, FAST.

[3]  Keshav Pingali,et al.  Synthesizing parallel graph programs via automated planning , 2015, PLDI.

[4]  Matei Zaharia,et al.  Making caches work for graph analytics , 2016, 2017 IEEE International Conference on Big Data (Big Data).

[5]  Pradeep Dubey,et al.  GraphMat: High performance graph analytics made productive , 2015, Proc. VLDB Endow..

[6]  Wenguang Chen,et al.  Gemini: A Computation-Centric Distributed Graph Processing System , 2016, OSDI.

[7]  John D. Owens,et al.  Gunrock: a high-performance graph processing library on the GPU , 2016, PPoPP 2016.

[8]  Kunle Olukotun,et al.  EmptyHeaded: A Relational Engine for Graph Processing , 2015, ACM Trans. Database Syst..

[9]  Shoaib Kamil,et al.  OpenTuner: An extensible framework for program autotuning , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).

[10]  Rajeev Motwani,et al.  The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.

[11]  Kai Wang,et al.  Graspan: A Single-machine Disk-based Graph System for Interprocedural Static Analyses of Large-scale Systems Code , 2017, ASPLOS.

[12]  Christos Faloutsos,et al.  PEGASUS: mining peta-scale graphs , 2011, Knowledge and Information Systems.

[13]  Zhiyuan Li,et al.  Data dependence analysis on multi-dimensional array references , 1989, ICS '89.

[14]  Monica S. Lam,et al.  SociaLite: Datalog extensions for efficient social network analysis , 2013, 2013 IEEE 29th International Conference on Data Engineering (ICDE).

[15]  Willy Zwaenepoel,et al.  Chaos: scale-out graph processing from secondary storage , 2015, SOSP.

[16]  Sherif Sakr,et al.  Large-Scale Graph Processing Using Apache Giraph , 2017, Springer International Publishing.

[17]  Yunming Zhang,et al.  Optimizing indirect memory references with milk , 2016, 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT).

[18]  Monica S. Lam,et al.  A Loop Transformation Theory and an Algorithm to Maximize Parallelism , 1991, IEEE Trans. Parallel Distributed Syst..

[19]  John R. Gilbert,et al.  A Flexible Open-Source Toolbox for Scalable Complex Graph Analysis , 2012, SDM.

[20]  Guy E. Blelloch,et al.  GraphChi: Large-Scale Graph Computation on Just a PC , 2012, OSDI.

[21]  David A. Patterson,et al.  Reducing Pagerank Communication via Propagation Blocking , 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

[22]  Kunle Olukotun,et al.  Green-Marl: a DSL for easy and efficient graph analysis , 2012, ASPLOS XVII.

[23]  Joseph M. Hellerstein,et al.  GraphLab: A New Framework For Parallel Machine Learning , 2010, UAI.

[24]  Jure Leskovec,et al.  {SNAP Datasets}: {Stanford} Large Network Dataset Collection , 2014 .

[25]  Min Wu,et al.  GeaBase: A High-Performance Distributed Graph Database for Industry-Scale Applications , 2017, 2017 Fifth International Conference on Advanced Cloud and Big Data (CBD).

[26]  James Demmel,et al.  When cache blocking of sparse matrix vector multiply works and why , 2007, Applicable Algebra in Engineering, Communication and Computing.

[27]  Willy Zwaenepoel,et al.  Everything you always wanted to know about multicore graph processing but were afraid to ask , 2017, USENIX Annual Technical Conference.

[28]  Dimitrios S. Nikolopoulos,et al.  GraphGrind: addressing load imbalance of graph partitioning , 2017, ICS.

[29]  Rajiv Gupta,et al.  Load the Edges You Need: A Generic I/O Optimization for Disk-based Graph Processing , 2016, USENIX Annual Technical Conference.

[30]  David A. Patterson,et al.  Locality Exists in Graph Processing: Workload Characterization on an Ivy Bridge Server , 2015, 2015 IEEE International Symposium on Workload Characterization.

[31]  Pradeep Dubey,et al.  Navigating the maze of graph analytics frameworks using massive graph datasets , 2014, SIGMOD Conference.

[32]  Kai Wang,et al.  GraphQ: Graph Query Processing with Abstraction Refinement - Scalable and Programmable Analytics over Very Large Graphs on a Single PC , 2015, USENIX Annual Technical Conference.

[33]  Keshav Pingali,et al.  Abelian: A Compiler for Graph Analytics on Distributed, Heterogeneous Platforms , 2018, Euro-Par.

[34]  Haibo Chen,et al.  NUMA-aware graph-structured analytics , 2015, PPoPP.

[35]  Jimmy J. Lin,et al.  GraphJet: Real-Time Content Recommendations at Twitter , 2016, Proc. VLDB Endow..

[36]  Yuanyuan Tian,et al.  Big Graph Analytics Platforms , 2017, Found. Trends Databases.

[37]  Willy Zwaenepoel,et al.  X-Stream: edge-centric graph processing using streaming partitions , 2013, SOSP.

[38]  Keshav Pingali,et al.  Elixir: a system for synthesizing concurrent graph programs , 2012, OOPSLA '12.

[39]  David A. Patterson,et al.  Direction-optimizing breadth-first search , 2012, HiPC 2012.

[40]  Keshav Pingali,et al.  A lightweight infrastructure for graph analytics , 2013, SOSP.

[41]  Hosung Park,et al.  What is Twitter, a social network or a news media? , 2010, WWW '10.

[42]  James Bennett,et al.  The Netflix Prize , 2007 .

[43]  Torsten Hoefler,et al.  To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations , 2017, HPDC.

[44]  Christoforos E. Kozyrakis,et al.  Making pull-based graph processing performant , 2018, PPoPP.

[45]  Guy E. Blelloch,et al.  Ligra: a lightweight graph processing framework for shared memory , 2013, PPoPP '13.

[46]  Jure Leskovec,et al.  Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time , 2017, WWW.

[47]  Frédo Durand,et al.  Halide , 2017, Commun. ACM.

[48]  Alex Brooks,et al.  Gluon: a communication-optimizing substrate for distributed heterogeneous graph analytics , 2018, PLDI.

[49]  Tim Weninger,et al.  Thinking Like a Vertex , 2015, ACM Comput. Surv..

[50]  Timothy A. Davis,et al.  The university of Florida sparse matrix collection , 2011, TOMS.

[51]  Eric Darve,et al.  Liszt: A domain specific language for building portable mesh-based PDE solvers , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[52]  Ming Wu,et al.  Managing Large Graphs on Multi-Cores with Graph Awareness , 2012, USENIX Annual Technical Conference.

[53]  Sizhuo Zhang,et al.  GraFBoost: Using Accelerated Flash Storage for External Graph Analytics , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[54]  R. Dolbeau,et al.  HMPP TM : A Hybrid Multi-core Parallel Programming Environment , 2022 .

[55]  Binyu Zang,et al.  PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs , 2019, TOPC.

[56]  Mohan Kumar,et al.  Mosaic: Processing a Trillion-Edge Graph on a Single Machine , 2017, EuroSys.

[57]  David A. Padua,et al.  Advanced compiler optimizations for supercomputers , 1986, CACM.

[58]  Wenguang Chen,et al.  GridGraph: Large-Scale Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning , 2015, USENIX ATC.

[59]  Mary W. Hall,et al.  CHiLL : A Framework for Composing High-Level Loop Transformations , 2007 .

[60]  Monica S. Lam,et al.  Efficient and exact data dependence analysis , 1991, PLDI '91.

[61]  Joseph Gonzalez,et al.  PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs , 2012, OSDI.

[62]  Yannis Sismanis,et al.  Sparkler: supporting large-scale matrix factorization , 2013, EDBT '13.

[63]  -. Qiang,et al.  Graph Processing on GPUs , 2018, ACM Comput. Surv..