Multifrontal Computations on GPUs and Their Multi-core Hosts

Using GPUs to accelerate the factorization of large sparse symmetric matrices has the potential to yield important benefits for a large group of widely used applications. This paper examines how a multifrontal sparse solver performs when it exploits both the GPU and its multi-core host. It demonstrates that the GPU can dramatically accelerate the solver relative to a single host CPU. Furthermore, the solver can profitably use the GPU to factor its larger frontal matrices while multiple threads on the host handle the smaller ones.
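The abstract does not state how the solver decides which frontal matrices go to the GPU. The following is a minimal sketch of one plausible policy, a simple size threshold, and it should not be read as the paper's method. `FRONT_SIZE_THRESHOLD`, `gpu_factor`, and `factor_fronts` are hypothetical names. `gpu_factor` is a stand-in that runs the same host code, and the sketch handles only independent fronts (for example, leaves of the elimination tree). In a full multifrontal solver, each parent front is assembled from its children's Schur complements (the extend-add step), so tree dependencies must be respected. Each front is partially factored with a right-looking dense Cholesky update that eliminates its fully summed pivots and returns the update matrix for the parent.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cutoff (assumption): fronts with at least this many rows use the GPU path.
FRONT_SIZE_THRESHOLD = 4


def partial_cholesky(front, npiv):
    """Partially factor a dense symmetric front (list of lists, lower triangle used).

    Eliminates the first `npiv` pivots with right-looking Cholesky updates and
    returns (L, S): the n x npiv factor columns and the (n - npiv) x (n - npiv)
    Schur complement, i.e. the update matrix passed to the parent front.
    """
    n = len(front)
    a = [row[:] for row in front]  # work on a copy; the caller's front is untouched
    for k in range(npiv):
        a[k][k] = math.sqrt(a[k][k])
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]
        # Rank-1 update of the trailing lower triangle.
        for j in range(k + 1, n):
            for i in range(j, n):
                a[i][j] -= a[i][k] * a[j][k]
    L = [[a[i][k] if i >= k else 0.0 for k in range(npiv)] for i in range(n)]
    S = [[a[max(i, j)][min(i, j)] for j in range(npiv, n)] for i in range(npiv, n)]
    return L, S


def gpu_factor(front, npiv):
    # Hypothetical stand-in for a device kernel; it reuses the host routine
    # so the sketch stays self-contained.
    return partial_cholesky(front, npiv)


def factor_fronts(fronts, threads=4):
    """Factor independent (front, npiv) pairs: large ones on the 'GPU', small ones on host threads."""
    results = [None] * len(fronts)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {}
        for idx, (front, npiv) in enumerate(fronts):
            if len(front) < FRONT_SIZE_THRESHOLD:
                futures[idx] = pool.submit(partial_cholesky, front, npiv)
        # Large fronts run on the stand-in GPU path on this thread while
        # host workers process the small fronts.
        for idx, (front, npiv) in enumerate(fronts):
            if len(front) >= FRONT_SIZE_THRESHOLD:
                results[idx] = ("gpu", gpu_factor(front, npiv))
        for idx, fut in futures.items():
            results[idx] = ("cpu", fut.result())
    return results


if __name__ == "__main__":
    small = [[4.0, 2.0], [2.0, 5.0]]
    large = [[4.0, 2.0, 0.0, 2.0], [2.0, 5.0, 2.0, 1.0],
             [0.0, 2.0, 2.0, 1.0], [2.0, 1.0, 1.0, 3.0]]
    res = factor_fronts([(small, 1), (large, 2)])
    print([tag for tag, _ in res])  # prints ['cpu', 'gpu']
```

In CPython, threads do not speed up pure-Python arithmetic because of the global interpreter lock, so this sketch illustrates only the scheduling structure. A real implementation would call multithreaded BLAS on the host and device BLAS on the GPU for the dense updates.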
