Towards a Distributed GPU-Accelerated Matrix Inversion

We present an extension of a GPU-based matrix inversion algorithm to distributed-memory platforms. Specifically, we implement and evaluate a message-passing variant of the Gauss-Jordan elimination method (GJE) for matrix inversion on a cluster of nodes equipped with GPU hardware accelerators. The experimental evaluation shows a significant reduction in runtime compared with both the distributed CPU-only implementation of GJE and a conventional method based on the LU factorization (as implemented in ScaLAPACK). In addition, our proposal leverages the aggregate capacity of the GPU memories in the cluster to overcome the constraints imposed by the limited memory space of these devices.
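
For readers unfamiliar with GJE, the following is a minimal sequential sketch of matrix inversion by Gauss-Jordan elimination with partial pivoting. It uses the textbook augmented-matrix formulation [A | I] in pure Python. It is not the blocked, in-place, message-passing GPU variant evaluated here. The function name `gje_inverse` is hypothetical. The key difference from Gaussian elimination is that each pivot column is eliminated from every other row, both above and below the pivot. After n steps, the right half of the augmented matrix holds the inverse, with no separate triangular solves as in LU-based inversion.

```python
from typing import List

Matrix = List[List[float]]


def gje_inverse(a: Matrix) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting.

    Hypothetical helper for illustration: operates on the augmented matrix
    [A | I]; when elimination finishes, the right half holds A^{-1}.
    """
    n = len(a)
    if any(len(row) != n for row in a):
        raise ValueError("matrix must be square")
    # Build [A | I] as a fresh copy so the caller's matrix is not modified.
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for k in range(n):
        # Partial pivoting: pick the row (from k down) with the largest |entry| in column k.
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        if aug[p][k] == 0.0:
            raise ValueError("matrix is singular")
        aug[k], aug[p] = aug[p], aug[k]
        # Scale the pivot row so that the pivot entry becomes 1.
        piv = aug[k][k]
        aug[k] = [x / piv for x in aug[k]]
        # Eliminate column k from every other row, above and below the pivot.
        for i in range(n):
            if i != k:
                f = aug[i][k]
                if f != 0.0:
                    aug[i] = [x - f * y for x, y in zip(aug[i], aug[k])]
    # The right half of the augmented matrix is now the inverse.
    return [row[n:] for row in aug]


if __name__ == "__main__":
    print(gje_inverse([[4.0, 7.0], [2.0, 6.0]]))  # approximately [[0.6, -0.7], [-0.2, 0.4]]
```

In the distributed GPU setting described above, the work of the inner elimination loop is what gets spread across nodes and offloaded to the accelerators. A practical implementation would typically group these row updates into blocked matrix-matrix operations rather than updating one row at a time as this sketch does.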
