Parallel Communication-Avoiding Algorithm for Triangular Matrix Inversion on Homogeneous and Heterogeneous Platforms

In this paper, we address the parallelization of a recursive algorithm for large-scale triangular matrix inversion based on the ‘Divide and Conquer’ (D&C) paradigm. We first present several versions of an original sequential algorithm, and a theoretical performance study allows an accurate comparison between these algorithms. In the second part of the paper, we develop an optimal parallel communication-avoiding algorithm for a given number of available homogeneous and heterogeneous processors. To this end, we use a so-called ‘non-equitable and incomplete’ version of the D&C paradigm, which recursively decomposes the original problem into two sub-problems of unequal sizes and then further decomposes only one of them in the same manner. The theoretical study is validated by a series of experiments carried out on three target platforms: an 8-core shared-memory machine, a distributed-memory cluster and a heterogeneous CPU-GPU cluster. The obtained results illustrate the value of this contribution.
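
The D&C idea rests on a standard block identity: if a lower triangular matrix is split as L = [[A, 0], [C, B]] with A and B square and lower triangular, then L^-1 = [[A^-1, 0], [-B^-1 C A^-1, B^-1]]. A^-1 and B^-1 are independent sub-problems, which is what makes the recursion attractive for parallelization. The following is a minimal sequential sketch under stated assumptions, not the paper's algorithm or its parallel schedule. The names invert_dc and invert_direct, the split parameter ratio (a value other than 0.5 makes the two sub-problems unequal) and the incomplete flag (recurse only on the leading block A and invert the trailing block B directly) are hypothetical choices made for illustration.

```python
def matmul(X, Y):
    """Dense product of two non-empty list-of-lists matrices."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(X[i][t] * Y[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(len(X))]


def invert_direct(L):
    """Invert a lower triangular matrix column by column via forward substitution."""
    n = len(L)
    X = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Solve L x = e_j; entries above the diagonal stay 0 since L^-1 is lower triangular.
        X[j][j] = 1.0 / L[j][j]
        for i in range(j + 1, n):
            s = sum(L[i][t] * X[t][j] for t in range(j, i))
            X[i][j] = -s / L[i][i]
    return X


def invert_dc(L, ratio=0.5, threshold=2, incomplete=False):
    """Recursive D&C inversion of a lower triangular matrix (illustrative sketch).

    ratio      -- hypothetical fraction of rows placed in the leading block A
                  (ratio != 0.5 gives a non-equitable split).
    threshold  -- blocks of size <= threshold are inverted directly.
    incomplete -- if True, only A is decomposed further; B is inverted directly.
    """
    n = len(L)
    if n <= threshold:
        return invert_direct(L)
    k = min(n - 1, max(1, int(n * ratio)))   # size of the leading block A
    A = [row[:k] for row in L[:k]]
    B = [row[k:] for row in L[k:]]
    C = [row[:k] for row in L[k:]]
    # A^-1 and B^-1 do not depend on each other; a parallel version could
    # compute them concurrently. Here they are computed one after the other.
    A_inv = invert_dc(A, ratio, threshold, incomplete)
    B_inv = invert_direct(B) if incomplete else invert_dc(B, ratio, threshold, incomplete)
    # Off-diagonal block of the inverse: -B^-1 C A^-1.
    D = [[-v for v in row] for row in matmul(B_inv, matmul(C, A_inv))]
    top = [A_inv[i] + [0.0] * (n - k) for i in range(k)]
    bottom = [D[i] + B_inv[i] for i in range(n - k)]
    return top + bottom


if __name__ == "__main__":
    L = [[2.0, 0.0, 0.0, 0.0],
         [1.0, 3.0, 0.0, 0.0],
         [4.0, -1.0, 1.0, 0.0],
         [0.5, 2.0, 1.0, 4.0]]
    X = invert_dc(L, ratio=0.25, threshold=1, incomplete=True)
    for row in matmul(L, X):          # should be close to the identity
        print([round(v, 12) for v in row])
```

In this sketch, setting ratio=0.25 with incomplete=True peels off a small leading block at each level and keeps recursing on it, while the large trailing block is handled in one direct step. How the paper actually sizes the blocks and maps them to homogeneous or heterogeneous processors is not reproduced here.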
