Paper information - Portable and scalable FPGA-based acceleration of a direct linear system solver


FPGAs are becoming an attractive platform for accelerating many computations, including scientific applications. However, their adoption has been limited by the high development cost and short life span of FPGA designs. We believe that FPGA-based scientific computation would become far more practical if there were hardware libraries that were portable to any FPGA, with performance that scales with the FPGA's resources. To illustrate this idea, we have implemented one common supercomputing library function: the LU factorization method for solving linear systems. This paper discusses the issues involved in making the design both portable and scalable. The design is generated automatically from parameters so that it matches the FPGA's capabilities and external memory. We compared the performance of the design on the FPGA to that of a single processor core and found that it runs 2.2 times faster and dissipates 5 times less energy per computation.
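For readers unfamiliar with the library function named in the abstract, the following is a minimal software sketch of solving a dense linear system by LU factorization. It is not the paper's FPGA design: the function name `lu_solve` is hypothetical, and the use of partial (row) pivoting is an assumption, since the abstract does not state which pivoting strategy the hardware uses.

```python
def lu_solve(a, b):
    """Solve a @ x = b for a dense square matrix `a` (list of rows).

    Illustrative sketch only: in-place LU factorization with partial
    pivoting (an assumption), followed by forward and back substitution.
    """
    n = len(a)
    lu = [list(row) for row in a]   # working copy; will hold L and U
    x = list(b)                     # right-hand side, permuted with the rows

    for k in range(n):
        # Pick the row with the largest magnitude entry in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            x[k], x[p] = x[p], x[k]
        # Eliminate below the pivot; store the multipliers (L) in place.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]

    # Forward substitution with the unit lower-triangular factor L.
    for i in range(n):
        for j in range(i):
            x[i] -= lu[i][j] * x[j]

    # Back substitution with the upper-triangular factor U.
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]

    return x
```

Calling `lu_solve([[2.0, 1.0, 1.0], [4.0, -6.0, 0.0], [-2.0, 7.0, 2.0]], [5.0, -2.0, 9.0])` returns a solution close to `[1.0, 1.0, 2.0]`, up to floating-point rounding. The triple loop in the elimination step accounts for the O(n³) work of the factorization and is the part of the computation a hardware implementation would typically target.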

Wei Zhang | Vaughn Betz | Jonathan Rose
