Graph programming interface (GPI): a linear algebra programming model for large scale graph computations

Graph processing is becoming a crucial component for analyzing big data arising in many application domains such as social and biological networks, fraud detection, and sentiment analysis. As a result, a number of computational models for graph analytics have been proposed in the literature to help users write efficient large scale graph algorithms. In this paper we present an alternative model for implementing graph algorithms using a linear algebra based specification. We first specify a set of linear algebra primitives that allows users to express graph algorithms by composition of linear algebra operations. We then describe a high performance implementation of these primitives and its integration with the Spark framework to achieve the scalability we need for large shared-memory systems. We provide an overview of our implementation and also compare and contrast the expressiveness and performance of various algorithms implemented with our approach with that of the current Spark GraphX implementation of those algorithms.

[1]  Keshav Pingali,et al.  A lightweight infrastructure for graph analytics , 2013, SOSP.

[2]  Kurt Mehlhorn,et al.  LEDA: a platform for combinatorial and geometric computing , 1997, CACM.

[3]  John R. Gilbert,et al.  Graph Algorithms in the Language of Linear Algebra: How Did We Get Here, and Where Do We Go Next? , 2011, 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).

[4]  Reynold Xin,et al.  GraphX: a resilient distributed graph system on Spark , 2013, GRADES.

[5]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[6]  Nancy M. Amato,et al.  A Hybrid Approach to Processing Big Data Graphs on Memory-Restricted Systems , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.

[7]  John R. Gilbert,et al.  Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks , 2009, SPAA '09.

[8]  Brian W. Barrett,et al.  Introducing the Graph 500 , 2010 .

[9]  Joseph E. Gonzalez,et al.  GraphLab: A New Parallel Framework for Machine Learning , 2010 .

[10]  L. Dagum,et al.  OpenMP: an industry standard API for shared-memory programming , 1998 .

[11]  John R. Gilbert,et al.  The Combinatorial BLAS: design, implementation, and applications , 2011, Int. J. High Perform. Comput. Appl..

[12]  U. Brandes A faster algorithm for betweenness centrality , 2001 .

[13]  Jeremy G. Siek,et al.  The Boost Graph Library - User Guide and Reference Manual , 2001, C++ in-depth series.

[14]  Avery Ching,et al.  One Trillion Edges: Graph Processing at Facebook-Scale , 2015, Proc. VLDB Endow..

[15]  Christos Faloutsos,et al.  PEGASUS: A Peta-Scale Graph Mining System Implementation and Observations , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[16]  Inderjit S. Dhillon,et al.  Scalable Data-Driven PageRank: Algorithms, System Issues, and Lessons Learned , 2015, Euro-Par.

[17]  Christos Faloutsos,et al.  R-MAT: A Recursive Model for Graph Mining , 2004, SDM.

[18]  Rok Sosic,et al.  SNAP , 2016, ACM Trans. Intell. Syst. Technol..

[19]  Fabio Checconi,et al.  Optimizing Sparse Linear Algebra for Large-Scale Graph Analytics , 2015, Computer.

[20]  Aart J. C. Bik,et al.  Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.

[21]  Joseph Gonzalez,et al.  PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs , 2012, OSDI.

[22]  Michael Stonebraker,et al.  Standards for graph algorithm primitives , 2014, 2013 IEEE High Performance Extreme Computing Conference (HPEC).

[23]  T. Nguyen,et al.  IBM Power Systems built with the POWER8 architecture and processors , 2015, IBM J. Res. Dev..

[24]  Michael J. Franklin,et al.  Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing , 2012, NSDI.

[25]  Robert A. van de Geijn,et al.  SUMMA: Scalable Universal Matrix Multiplication Algorithm , 1995 .