A Parallel Preconditioned Conjugate Gradient Solver for the Poisson Problem on a Multi-GPU Platform

We present a parallel conjugate gradient solver for the Poisson problem optimized for multi-GPU platforms. Our approach includes a novel heuristic Poisson preconditioner well suited for massively parallel SIMD processing. Furthermore, we address the problem of limited transfer rates over typical data channels, such as the PCI Express bus, relative to the bandwidth requirements of powerful GPUs. Specifically, naive communication schemes can severely reduce the achievable speedup in such communication-intensive algorithms. For this reason, we employ overlapping memory transfers to establish a high level of concurrency and to improve scalability. We have implemented our model on a high-performance workstation with multiple hardware accelerators. We discuss the mathematical principles, give implementation details, and present the performance and the scalability of the system.
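For orientation, the following is a minimal single-threaded sketch of the preconditioned conjugate gradient iteration that the abstract refers to, applied to a 1D Poisson system. It is not the paper's implementation: the function names (`apply_poisson_1d`, `jacobi_preconditioner`, `pcg`) are hypothetical, the 1D stencil with zero Dirichlet boundaries is an assumed test problem, and a plain Jacobi (diagonal) preconditioner stands in for the heuristic preconditioner, whose details are not given on this page. Multi-GPU partitioning and overlapped transfers are not shown.

```python
def apply_poisson_1d(x):
    """Matrix-free product A @ x for the stencil [-1, 2, -1] (assumed zero Dirichlet boundaries)."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)
    return y


def dot(u, v):
    """Euclidean inner product of two equally sized lists."""
    return sum(a * b for a, b in zip(u, v))


def jacobi_preconditioner(r):
    """Stand-in preconditioner: divide by the diagonal of A, which is 2 everywhere."""
    return [ri / 2.0 for ri in r]


def pcg(apply_a, apply_m_inv, b, tol=1e-10, max_iter=1000):
    """Preconditioned CG for a symmetric positive definite operator.

    Returns the approximate solution and the number of iterations performed.
    """
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x with x = 0
    z = apply_m_inv(r)               # preconditioned residual
    p = list(z)                      # first search direction
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0
    for k in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, k
        ap = apply_a(p)
        alpha = rz / dot(p, ap)      # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = apply_m_inv(r)
        rz_new = dot(r, z)
        beta = rz_new / rz           # makes the next direction A-conjugate
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

Each iteration needs one operator application, one preconditioner application, and a few inner products and vector updates. In a multi-GPU setting the operator application requires exchanging boundary values between devices, and the inner products require a global reduction, which is where transfer cost over the bus becomes relevant.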

Wolfgang Straßer | Daniel Weiskopf | Marco Ament | Günter Knittel
