Parallel hierarchical hybrid linear solvers for emerging computing platforms

The design of the extreme-scale platforms expected to become available in the coming decade will represent a convergence of technological trends and the boundary conditions imposed by over half a century of algorithm and application software development. These platforms will be hierarchical, providing coarse-grain parallelism between nodes and fine-grain parallelism within each node. They are also expected to be highly heterogeneous, since multi-core chips and accelerators have completely different architectures and performance potential. Such a degree of complexity will bring radical changes that render the current software infrastructure for large-scale scientific applications obsolete. In this paper, we illustrate a hierarchical algorithmic approach for implementing an efficient parallel sparse linear solver that combines direct and iterative methods. Such a hybrid approach exploits the advantages of both numerical techniques, expresses several levels and grains of parallelism, and permits an optimal trade-off between numerical and parallel efficiency. Consequently, this numerical technique appears to be a promising candidate for intensive simulations on future many-core parallel platforms.
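
To make the combination of direct and iterative methods concrete, here is a minimal serial sketch. It assumes a non-overlapping domain decomposition in which the subdomain interiors are factored directly and the interface (Schur complement) system is solved iteratively by conjugate gradients. The abstract does not specify this particular scheme, so the decomposition, the 1D Poisson test matrix, and the names `laplacian_1d`, `conjugate_gradient`, and `hybrid_solve` are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def laplacian_1d(n):
    """Dense 1D Poisson matrix (2 on the diagonal, -1 off it); SPD test problem."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


def conjugate_gradient(apply_op, rhs, tol=1e-12, max_iter=500):
    """Unpreconditioned CG for an SPD operator given only as a function."""
    x = np.zeros_like(rhs)
    r = rhs.copy()
    p = r.copy()
    rs = r @ r
    rhs_norm = np.linalg.norm(rhs)
    if rhs_norm == 0.0:
        return x
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol * rhs_norm:
            break
        q = apply_op(p)
        alpha = rs / (p @ q)
        x += alpha * p
        r -= alpha * q
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


def hybrid_solve(A, b, interface):
    """Solve A x = b (A SPD) with direct interior solves and CG on the interface."""
    n = A.shape[0]
    gamma = np.asarray(interface)
    interior = np.setdiff1d(np.arange(n), gamma)

    A_II = A[np.ix_(interior, interior)]
    A_IG = A[np.ix_(interior, gamma)]
    A_GI = A[np.ix_(gamma, interior)]
    A_GG = A[np.ix_(gamma, gamma)]

    # Direct part: factor the interior block once. In a parallel setting each
    # subdomain block would be factored independently; here A_II is treated
    # as a single block for brevity.
    L = np.linalg.cholesky(A_II)

    def solve_interior(v):
        # Two solves with the Cholesky factor (np.linalg.solve is a general
        # solver and does not exploit the triangular structure).
        return np.linalg.solve(L.T, np.linalg.solve(L, v))

    # Iterative part: the Schur complement S = A_GG - A_GI A_II^{-1} A_IG
    # is applied implicitly and never formed.
    def apply_schur(v):
        return A_GG @ v - A_GI @ solve_interior(A_IG @ v)

    g = b[gamma] - A_GI @ solve_interior(b[interior])
    x_gamma = conjugate_gradient(apply_schur, g)

    # Back-substitute to recover the interior unknowns.
    x_interior = solve_interior(b[interior] - A_IG @ x_gamma)

    x = np.empty(n)
    x[interior] = x_interior
    x[gamma] = x_gamma
    return x


# Usage: 15 unknowns split into four subdomains by three interface nodes.
A = laplacian_1d(15)
b = np.ones(15)
x = hybrid_solve(A, b, interface=[3, 7, 11])
print("residual norm:", np.linalg.norm(A @ x - b))
```

In this sketch the interior factorizations are the natural source of coarse-grain parallelism (one subdomain per node), while the dense kernels inside each factorization and each interface matrix-vector product are candidates for fine-grain parallelism within a node.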
