Sparse Matrix-Matrix Multiplication for Modern Architectures

Sparse matrix-matrix multiplication (SPMM) is an important kernel in high-performance computing, heavily used in graph analytics as well as in multigrid linear solvers. Because of the highly sparse structure of the matrices, it is usually difficult to exploit parallelism on modern shared-memory architectures. Although various works have studied shared-memory parallelization of SPMM, some aspects are usually overlooked, such as the memory usage of the SPMM kernels. Since SPMM is a service kernel, it is important to respect the memory budget of the calling application in order not to interfere with its execution. In this work, we study memory-efficient, scalable shared-memory parallel SPMM methods. We investigate graph compression techniques that reduce the size of the matrices and allow faster computation. Our preliminary results show speedups of up to 40% with respect to the SPMM implementation provided in the Intel Math Kernel Library, while using 65% less memory.
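To make the kernel under discussion concrete, the following is a minimal sketch of row-by-row sparse matrix-matrix multiplication on CSR inputs (Gustavson-style, using a per-row sparse accumulator). It is an illustration only; the function name, the dictionary-based accumulator, and the unoptimized serial structure are assumptions for clarity, not the parallel, memory-efficient implementation studied in this work.

```python
def spgemm_csr(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val, n_rows):
    """Multiply two sparse matrices A (n_rows x k) and B (k x m),
    both given in CSR form, returning C = A @ B in CSR form."""
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(n_rows):
        acc = {}  # sparse accumulator for row i of C: column -> value
        # For each nonzero A[i, k], scale row k of B and accumulate.
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[p], a_val[p]
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[q]
        # Emit row i of C with sorted column indices.
        for j in sorted(acc):
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val
```

In a shared-memory parallel setting, the outer loop over rows of C is the natural unit of parallelism, since each row is computed independently; the memory cost of the per-row accumulators is one of the overheads a memory-efficient SPMM must control.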