Hypergraph Partitioning for Sparse Matrix-Matrix Multiplication

We propose a fine-grained hypergraph model for sparse matrix-matrix multiplication (SpGEMM), a key computational kernel in scientific computing and data analysis whose performance is often communication bound. This model correctly describes both the interprocessor communication volume along a critical path in a parallel computation and the volume of data moving through the memory hierarchy in a sequential computation. We show that identifying a communication-optimal algorithm for particular input matrices is equivalent to solving a hypergraph partitioning problem. Our approach is nonzero-structure dependent: we seek the best algorithm for the given input matrices rather than for a general class of inputs. In addition to our three-dimensional fine-grained model, we propose coarse-grained one-dimensional and two-dimensional models that correspond to simpler SpGEMM algorithms. We explore the relations between our models theoretically, and we study their performance experimentally in the context of three applications that use SpGEMM as a key computation. For each application, we find that at least one coarse-grained model is as communication efficient as the fine-grained model. We also observe that different applications have affinities for different algorithms. Our results demonstrate that hypergraphs are an accurate model for reasoning about the communication costs of SpGEMM as well as a practical tool for exploring the SpGEMM algorithm design space.
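
To make the link between SpGEMM and hypergraph partitioning concrete, here is a minimal Python sketch. The abstract does not spell out the construction, so the details below are assumptions for illustration: each vertex is one scalar multiplication A[i,k]*B[k,j], each nonzero of A, B, and C is a net containing the multiplications that touch it, and communication is estimated with the common connectivity-minus-one metric. The function names `fine_grained_hypergraph` and `connectivity_cost` are hypothetical, and the sketch ignores load balance and the placement of the matrix entries themselves.

```python
from collections import defaultdict


def fine_grained_hypergraph(a_nonzeros, b_nonzeros):
    """Build the nets of an illustrative fine-grained hypergraph for C = A*B.

    a_nonzeros: iterable of (i, k) coordinates of nonzeros of A.
    b_nonzeros: iterable of (k, j) coordinates of nonzeros of B.
    Returns a dict mapping a net label such as ("A", i, k) to the set of
    multiplication vertices (i, k, j) that read or update that entry.
    """
    # Index the nonzeros of B by row so matching k values are found quickly.
    b_rows = defaultdict(list)
    for k, j in b_nonzeros:
        b_rows[k].append(j)

    nets = defaultdict(set)
    for i, k in a_nonzeros:
        for j in b_rows.get(k, []):
            v = (i, k, j)              # one scalar multiplication
            nets[("A", i, k)].add(v)   # reads A[i, k]
            nets[("B", k, j)].add(v)   # reads B[k, j]
            nets[("C", i, j)].add(v)   # contributes to C[i, j]
    return dict(nets)


def connectivity_cost(nets, part_of):
    """Sum over all nets of (number of distinct parts touched - 1)."""
    return sum(len({part_of[v] for v in pins}) - 1 for pins in nets.values())


# Small example: two 2x2 matrices given by their nonzero coordinates.
a_nonzeros = [(0, 0), (0, 1), (1, 1)]
b_nonzeros = [(0, 0), (1, 0), (1, 1)]
nets = fine_grained_hypergraph(a_nonzeros, b_nonzeros)
vertices = set().union(*nets.values())

# Two assumed coarse groupings of the same vertices:
# by row index i (a row-wise algorithm) and by inner index k (outer product).
row_wise = {v: v[0] for v in vertices}
outer_product = {v: v[1] for v in vertices}

row_cost = connectivity_cost(nets, row_wise)
outer_cost = connectivity_cost(nets, outer_product)
print(row_cost, outer_cost)
```

In this toy instance, grouping by the inner index cuts fewer nets than grouping by row, which illustrates in miniature why the best algorithm can depend on the nonzero structure of the inputs. A real study would hand the hypergraph to a partitioner instead of fixing the grouping by hand.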
