Efficient Distributed Graph Analytics using Triply Compressed Sparse Format

This paper presents Triply Compressed Sparse Column (TCSC), a novel compression technique designed specifically for matrix-vector operations in which the matrix and both the input and output vectors are sparse. We refer to these operations as SpMSpV2. TCSC compresses the nonzero columns and rows of a highly sparse matrix representing a large real-world graph. During this compression, it encodes the sparsity patterns of the input and output vectors within the compressed representation of the sparse matrix itself. As a result, the compressed indices of the input and output vectors align with those of the compressed matrix columns and rows, which eliminates the extra indirections otherwise needed when SpMSpV2 operations access the vectors. This results in fewer cache misses, greater space efficiency, and faster execution. We evaluate TCSC’s performance and show that it is more space- and time-efficient than CSC and DCSC, with up to 11× speedup. We integrate TCSC into GraphTap, our proposed linear algebra-based distributed graph analytics system. We compare GraphTap against GraphPad and LA3, two state-of-the-art linear algebra-based distributed graph analytics systems, across different dataset scales and numbers of processes. GraphTap is up to 7× faster than these systems, owing to TCSC and the communication efficiency it enables.
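
The index alignment described above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`build_compressed`, `spmspv`), the dictionary layout, and the sample matrix are all hypothetical, and the sketch assumes the vectors are nonzero only at the matrix's nonzero columns and rows. It keeps only nonzero columns and rows, renumbers them densely, and stores the input and output vectors under those same compressed ids, so the multiplication reads and writes the vectors directly.

```python
def build_compressed(entries):
    """Build a column-compressed layout from (row, col, value) triples.

    Illustrative only: empty columns and rows are dropped, and the surviving
    ones are renumbered 0..k-1 so that vectors can share the same ids.
    """
    nz_cols = sorted({c for _, c, _ in entries})   # original ids of nonzero columns
    nz_rows = sorted({r for r, _, _ in entries})   # original ids of nonzero rows
    col_id = {c: k for k, c in enumerate(nz_cols)}
    row_id = {r: k for k, r in enumerate(nz_rows)}

    # Group entries by compressed column, storing compressed row ids.
    buckets = [[] for _ in nz_cols]
    for r, c, v in entries:
        buckets[col_id[c]].append((row_id[r], v))

    col_ptr, row_idx, vals = [0], [], []
    for bucket in buckets:
        for ri, v in sorted(bucket):
            row_idx.append(ri)
            vals.append(v)
        col_ptr.append(len(row_idx))

    return {"nz_cols": nz_cols, "nz_rows": nz_rows,
            "col_ptr": col_ptr, "row_idx": row_idx, "vals": vals}


def spmspv(m, x_compact):
    """Multiply using vectors indexed by compressed column and row ids.

    x_compact[j] is the input value for the j-th nonzero column; the result
    holds one value per nonzero row. No id translation happens in the loop.
    """
    y_compact = [0.0] * len(m["nz_rows"])
    for j, xj in enumerate(x_compact):
        if xj == 0:
            continue  # skip inactive input entries
        for k in range(m["col_ptr"][j], m["col_ptr"][j + 1]):
            y_compact[m["row_idx"][k]] += m["vals"][k] * xj
    return y_compact


# Hypothetical 6x6 matrix with nonzeros only in columns 1, 4 and rows 0, 3, 5.
entries = [(0, 1, 2.0), (3, 1, 1.0), (3, 4, 5.0), (5, 4, 1.0)]
m = build_compressed(entries)
y = spmspv(m, [1.0, 2.0])  # input values for original columns 1 and 4
```

Here `y` is `[2.0, 11.0, 2.0]`, the output values for original rows 0, 3 and 5 (recoverable through `m["nz_rows"]`). A layout that compresses only the matrix would need a lookup from compressed ids back to full-length vector positions on every access; sharing the compressed ids between matrix and vectors is what removes that step in this sketch.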
