Sparse Matrix-Matrix Multiplication for Modern Architectures

Sparse matrix-matrix multiplication (SPMM) is an important kernel in high-performance computing, heavily used in graph analytics as well as in multigrid linear solvers. Because of the highly sparse structure of the matrices, it is usually difficult to exploit parallelism on modern shared-memory architectures. Although various works have studied shared-memory parallelization of SPMM, some aspects are usually overlooked, such as the memory usage of the SPMM kernels. Since SPMM is a service kernel, it is important to respect the memory budget of the calling application in order not to interfere with its execution. In this work, we study memory-efficient, scalable shared-memory parallel SPMM methods. We investigate graph compression techniques that reduce the size of the matrices and allow faster computation. Our preliminary results show speedups of up to 40% with respect to the SPMM implementation provided in the Intel Math Kernel Library, while using 65% less memory.
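To make the kernel under discussion concrete, the following is a minimal sketch of row-by-row sparse matrix-matrix multiplication on CSR inputs (Gustavson-style, using a per-row sparse accumulator). It is an illustration only; the function name, the dictionary-based accumulator, and the unoptimized serial structure are assumptions for clarity, not the parallel, memory-efficient implementation studied in this work.

```python
def spgemm_csr(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val, n_rows):
    """Multiply two sparse matrices A (n_rows x k) and B (k x m),
    both given in CSR form, returning C = A @ B in CSR form."""
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(n_rows):
        acc = {}  # sparse accumulator for row i of C: column -> value
        # For each nonzero A[i, k], scale row k of B and accumulate.
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[p], a_val[p]
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[q]
        # Emit row i of C with sorted column indices.
        for j in sorted(acc):
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val
```

In a shared-memory parallel setting, the outer loop over rows of C is the natural unit of parallelism, since each row is computed independently; the memory cost of the per-row accumulators is one of the overheads a memory-efficient SPMM must control.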