Improving Mapping for Sparse Direct Solvers - A Trade-Off Between Data Locality and Load Balancing

To express parallelism, parallel sparse direct solvers use the elimination tree to build tree-shaped task graphs, where nodes represent computational tasks and edges represent data dependencies. One of the pre-processing stages of a sparse direct solver consists of mapping computational resources (processors) to these tasks. The objective is to minimize the factorization time by achieving both good data locality and good load balancing. Proportional mapping is a widely used approach to this resource-allocation problem. It achieves good data locality by assigning the same processors to large parts of the elimination tree. However, it may limit load balancing in some cases. In this paper, we propose a dynamic mapping algorithm based on proportional mapping. This new approach, named Steal, relaxes the data locality criterion to improve load balancing. To validate the new method, we perform extensive experiments with the PaStiX sparse direct solver. The results demonstrate that our algorithm enables better static scheduling of the numerical factorization while keeping good data locality.
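
As an illustration of the baseline only, the following minimal sketch shows the classical idea behind proportional mapping: each node of the tree receives a range of processors, and that range is split among its children in proportion to their subtree costs. It is not the Steal algorithm and not the PaStiX implementation. The `Node` class, the cost values, and the use of fractional processor ranges are assumptions made for this example; a real solver has to round these shares to whole processors, which this sketch skips.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical elimination-tree node with an abstract task cost."""
    name: str
    cost: float
    children: list = field(default_factory=list)


def subtree_cost(node):
    """Cost of the node plus the cost of all of its descendants."""
    return node.cost + sum(subtree_cost(child) for child in node.children)


def proportional_mapping(node, lo, hi, mapping=None):
    """Assign the processor range [lo, hi) to `node`, then split that range
    among its children in proportion to their subtree costs.

    Assumes every subtree has a strictly positive cost.
    """
    if mapping is None:
        mapping = {}
    mapping[node.name] = (lo, hi)
    total = sum(subtree_cost(child) for child in node.children)
    start = lo
    for child in node.children:
        share = (hi - lo) * subtree_cost(child) / total
        proportional_mapping(child, start, start + share, mapping)
        start += share
    return mapping


# Small illustrative tree mapped onto 4 processors (costs are made up).
tree = Node("root", 4.0, [
    Node("left", 2.0, [Node("l1", 3.0), Node("l2", 1.0)]),
    Node("right", 2.0),
])
mapping = proportional_mapping(tree, 0.0, 4.0)
for name, (lo, hi) in mapping.items():
    print(f"{name}: processors [{lo}, {hi})")
```

In this toy tree, the heavier `left` subtree keeps three of the four processors for all of its descendants, which is the locality property described above; the fractional boundary between `l1` and `l2` hints at why mapping to whole processors can leave the load uneven.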
