Performance evaluation of sparse matrix products in UPC

Unified Parallel C (UPC) is a Partitioned Global Address Space (PGAS) language that has grown in popularity in recent years owing to its high programmability and its reasonable performance, which it achieves by exploiting data locality efficiently, especially on hierarchical architectures such as multicore clusters. However, the performance issues that the irregular structure of sparse matrix operations raises in this language have not yet been studied. One such issue is the choice of storage format for the sparse matrices, which can significantly improve the efficiency of parallel codes. This paper presents an evaluation, using UPC, of the most common sparse storage formats with different implementations of the matrix-vector and matrix-matrix products, which are key kernels in many scientific applications.
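
As an illustration of the kind of kernel under evaluation, the following is a minimal sketch of a sparse matrix-vector product in plain C using compressed sparse row (CSR) storage, one widely used sparse format. The abstract does not list which formats the paper covers, so CSR is an assumption here, and the names `csr_matrix` and `csr_spmv_rows` are hypothetical. The sketch shows only the local computation over a range of rows; how rows and vector elements would be distributed among UPC threads is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CSR container, for illustration only.
 * row_ptr has rows + 1 entries; the nonzeros of row i occupy
 * positions row_ptr[i] .. row_ptr[i + 1] - 1 of col_idx and val. */
typedef struct {
    size_t rows;
    const size_t *row_ptr;
    const size_t *col_idx;
    const double *val;
} csr_matrix;

/* Compute y[i] = sum_j A[i][j] * x[j] for rows first <= i < last.
 * In a parallel setting each thread could be given its own row range
 * (an assumption of this sketch, not a description of the paper's code). */
void csr_spmv_rows(const csr_matrix *a, const double *x, double *y,
                   size_t first, size_t last)
{
    assert(first <= last && last <= a->rows);
    for (size_t i = first; i < last; i++) {
        double sum = 0.0;
        for (size_t k = a->row_ptr[i]; k < a->row_ptr[i + 1]; k++) {
            /* Indirect access to x through col_idx is the source of
             * the irregular memory access pattern. */
            sum += a->val[k] * x[a->col_idx[k]];
        }
        y[i] = sum;
    }
}
```

The indirect access `x[a->col_idx[k]]` is what makes the access pattern irregular: which elements of `x` a row touches depends on the matrix data rather than on the loop index, so the locality of each access cannot be determined from the loop structure alone.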
