Prospects for truly asynchronous communication with pure MPI and hybrid MPI/OpenMP on current supercomputing platforms

We investigate the ability of MPI implementations to perform truly asynchronous communication with nonblocking point-to-point calls on current highly parallel systems, including the Cray XT and XE series. For cases where the MPI library provides no automatic overlap of communication with computation, we demonstrate several ways of establishing explicitly asynchronous communication through variants of functional decomposition using OpenMP threads or tasks. We implement these methods in a parallel sparse matrix-vector multiplication code and show the resulting performance benefits. The impact of node topology and the possible use of simultaneous multithreading (SMT) are studied in detail.
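The idea of functional decomposition can be illustrated with a minimal sketch. It is not the code used in this work: it is written in Python with a standard-library thread instead of MPI and OpenMP, and the names `spmv_csr`, `fetch_halo`, and `overlapped_spmv` are hypothetical. It assumes that the matrix rows owned by a process have been split into a local part, which needs only locally stored vector entries, and a nonlocal part, which needs entries received from other processes. One thread is dedicated to communication while the main thread computes the local part; the nonlocal part follows once the transfer has finished.

```python
import threading


def spmv_csr(row_ptr, col_idx, vals, x, y):
    """Accumulate y += A @ x for a matrix A stored in CSR format."""
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] += acc


def fetch_halo(remote_x, halo):
    """Stand-in for the halo exchange (assumption: no real network here).

    In an MPI code this would post nonblocking receives and sends
    and then wait for their completion.
    """
    halo[:] = remote_x


def overlapped_spmv(local, nonlocal_part, x_local, remote_x):
    """Compute y = A_local @ x_local + A_nonlocal @ halo with overlap.

    `local` and `nonlocal_part` are (row_ptr, col_idx, vals) triples.
    """
    n_rows = len(local[0]) - 1
    y = [0.0] * n_rows
    halo = [0.0] * len(remote_x)

    # Dedicated communication thread: runs while the local part is computed.
    comm = threading.Thread(target=fetch_halo, args=(remote_x, halo))
    comm.start()

    # Local part: touches only x_local, so it does not depend on the transfer.
    spmv_csr(*local, x_local, y)

    # The nonlocal part must wait until the halo data has arrived.
    comm.join()
    spmv_csr(*nonlocal_part, halo, y)
    return y
```

For example, with a local block [[2, 0], [1, 3]], a nonlocal block [[0, 5], [4, 0]], `x_local = [1.0, 2.0]`, and `remote_x = [3.0, 4.0]`, the function returns the same result as the unsplit product. The sketch shows only the control structure; Python threads do not execute the numerical loops in parallel, so it makes no statement about performance.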
