Accelerating a Sparse Matrix Iterative Solver Using a High Performance Reconfigurable Computer

High performance reconfigurable computers (HPRCs), which combine general-purpose processors (GPPs) and field programmable gate arrays (FPGAs), are now commercially available. These architectures allow for the creation of reconfigurable processors. HPRCs have already been used to accelerate integer and fixed-point applications. However, extensive parallelism and deeply pipelined floating-point cores are necessary to make MHz-scale FPGAs competitive with GHz-scale GPPs, which makes it difficult to accelerate certain kinds of floating-point kernels. Kernels with variable-length nested loops, e.g., sparse matrix-vector multiply, have been problematic because of the loop-carried dependence associated with the pipelined floating-point units. While hardware description language (HDL)-based kernels have shown moderate success in addressing this problem, a high-level language (HLL)-based approach to accelerating such applications has remained elusive. If HPRCs are to become a part of mainstream military and scientific computing, we should emphasize HLL-based programming, whenever possible, over HDL-based hardware design. The primary reason is the increased programmer productivity associated with HLLs when compared with HDLs. For example, the floating-point addition statement z = x + y, a single line in an HLL, corresponds to hundreds of lines of HDL. In this paper, we describe the design and implementation of a sparse matrix Jacobi processor to solve systems of linear equations, Ax = b. The parallelized, deeply pipelined, IEEE-754-compliant 32-bit floating-point sparse matrix Jacobi iterative solver runs on a contemporary HPRC. The FPGA-based components are implemented using only an HLL (the C programming language) and the Carte HLL-to-HDL compiler. An HLL-based streaming accumulator allows for the implementation of fully pipelined loops and results in a 2.5-fold wall clock runtime speedup when compared with an equivalent software-only implementation.
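
To make the computational pattern concrete, the following is a minimal software-only sketch of a sparse Jacobi iteration in plain C with 32-bit floats. It is not the paper's Carte implementation and contains no FPGA-specific constructs. The compressed sparse row (CSR) layout, the `csr_matrix` type, and the names `jacobi_sweep` and `jacobi_solve` are assumptions chosen for illustration. The inner loop shows the variable-length accumulation into `sum` whose loop-carried dependence the abstract identifies as the obstacle to full pipelining.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical CSR container; field names are illustrative, not from the paper. */
typedef struct {
    size_t n;              /* number of rows (square matrix) */
    const size_t *row_ptr; /* n + 1 offsets into col_idx and val */
    const size_t *col_idx; /* column index of each stored nonzero */
    const float *val;      /* value of each stored nonzero */
} csr_matrix;

/* One Jacobi sweep: x_new[i] = (b[i] - sum_{j != i} a_ij * x_old[j]) / a_ii.
 * Returns the largest absolute change in any component of x. */
static float jacobi_sweep(const csr_matrix *a, const float *b,
                          const float *x_old, float *x_new)
{
    float max_delta = 0.0f;
    for (size_t i = 0; i < a->n; ++i) {
        float sum = 0.0f;
        float diag = 0.0f;
        /* Variable-length inner loop: the trip count is the number of
         * stored nonzeros in row i. */
        for (size_t k = a->row_ptr[i]; k < a->row_ptr[i + 1]; ++k) {
            size_t j = a->col_idx[k];
            if (j == i) {
                diag = a->val[k];
            } else {
                sum += a->val[k] * x_old[j]; /* loop-carried dependence on sum */
            }
        }
        assert(diag != 0.0f); /* Jacobi needs a nonzero diagonal entry */
        x_new[i] = (b[i] - sum) / diag;
        float delta = fabsf(x_new[i] - x_old[i]);
        if (delta > max_delta) {
            max_delta = delta;
        }
    }
    return max_delta;
}

/* Repeats sweeps until the largest change drops below tol.
 * x holds the initial guess on entry and the result on exit;
 * scratch is caller-provided workspace of length n.
 * Returns the number of sweeps used, or -1 if max_iters was reached. */
static int jacobi_solve(const csr_matrix *a, const float *b, float *x,
                        float *scratch, float tol, int max_iters)
{
    for (int it = 1; it <= max_iters; ++it) {
        float delta = jacobi_sweep(a, b, x, scratch);
        for (size_t i = 0; i < a->n; ++i) {
            x[i] = scratch[i];
        }
        if (delta < tol) {
            return it;
        }
    }
    return -1;
}
```

The sketch assumes every row stores its diagonal entry and that the matrix satisfies a convergence condition for the Jacobi method, such as strict diagonal dominance. In a deeply pipelined floating-point adder, each `sum +=` would have to wait for the previous addition to leave the pipeline, which is the dependence a streaming accumulator is meant to hide.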
