A Hybrid Approach for Mapping Conjugate Gradient onto an FPGA-Augmented Reconfigurable Supercomputer

Supercomputer companies such as Cray, Silicon Graphics, and SRC Computers now offer reconfigurable computer (RC) systems that combine general-purpose processors (GPPs) with field-programmable gate arrays (FPGAs). The FPGAs can be programmed to become, in effect, application-specific processors. These supercomputers allow end users to create custom computing architectures aimed at the computationally intensive parts of each problem. This report describes a parameterized, parallelized, deeply pipelined, dual-FPGA, IEEE-754 64-bit floating-point design for accelerating the conjugate gradient (CG) iterative method on an FPGA-augmented RC. The FPGA-based elements are developed via a hybrid approach that uses a high-level language (HLL)-to-hardware description language (HDL) compiler in conjunction with custom-built, VHDL-based, floating-point components. A reference version of the design is implemented on a contemporary RC. Measured run-time performance data show that the FPGA-augmented CG runs 1.3 times faster than the software-only version, and estimates show that the design can achieve a fourfold speedup on a next-generation RC.
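For readers unfamiliar with the kernel being accelerated, the following is a minimal C sketch of the unpreconditioned CG iteration applied to a small dense symmetric positive-definite system. The problem size, dense matrix storage, and convergence tolerance are illustrative assumptions for this sketch only; they do not reflect the matrix formats or the HLL/VHDL partitioning of the actual design.

/*
 * Minimal sketch of unpreconditioned conjugate gradient (CG) for A*x = b,
 * where A is a small dense symmetric positive-definite matrix.
 * Problem size, matrix values, and tolerance are illustrative assumptions.
 */
#include <stdio.h>
#include <math.h>

#define N 4            /* assumed problem size for this example */
#define MAX_ITERS 100
#define TOL 1e-10

/* y = A*x for a dense N x N matrix */
static void matvec(const double A[N][N], const double x[N], double y[N]) {
    for (int i = 0; i < N; i++) {
        y[i] = 0.0;
        for (int j = 0; j < N; j++)
            y[i] += A[i][j] * x[j];
    }
}

/* inner product of two length-N vectors */
static double dot(const double u[N], const double v[N]) {
    double s = 0.0;
    for (int i = 0; i < N; i++) s += u[i] * v[i];
    return s;
}

int main(void) {
    /* Small SPD test matrix and right-hand side (illustrative values). */
    double A[N][N] = {
        {4, 1, 0, 0},
        {1, 4, 1, 0},
        {0, 1, 4, 1},
        {0, 0, 1, 4}
    };
    double b[N] = {1, 2, 3, 4};
    double x[N] = {0}, r[N], p[N], Ap[N];

    /* r = b - A*x; with the zero initial guess this is just b */
    matvec(A, x, Ap);
    for (int i = 0; i < N; i++) { r[i] = b[i] - Ap[i]; p[i] = r[i]; }
    double rs_old = dot(r, r);

    for (int k = 0; k < MAX_ITERS; k++) {
        matvec(A, p, Ap);                      /* dominant floating-point kernel */
        double alpha = rs_old / dot(p, Ap);
        for (int i = 0; i < N; i++) {
            x[i] += alpha * p[i];              /* update solution estimate */
            r[i] -= alpha * Ap[i];             /* update residual */
        }
        double rs_new = dot(r, r);
        if (sqrt(rs_new) < TOL) break;         /* converged */
        double beta = rs_new / rs_old;
        for (int i = 0; i < N; i++)
            p[i] = r[i] + beta * p[i];         /* new search direction */
        rs_old = rs_new;
    }

    for (int i = 0; i < N; i++)
        printf("x[%d] = %.6f\n", i, x[i]);
    return 0;
}

In an FPGA-augmented version of this loop, the matrix-vector product and the dot products are the natural candidates for the deeply pipelined, IEEE-754 64-bit floating-point hardware described above, with the GPP retaining control of the outer iteration; this particular hardware/software partitioning is stated here as an assumption, not as a detail taken from the paper.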
