High-Performance Architecture for the Conjugate Gradient Solver on FPGAs

The conjugate gradient (CG) solver is an important algorithm for solving symmetric positive definite linear systems. However, existing CG architectures on field-programmable gate arrays (FPGAs) either require aggressive zero padding or apply only to small matrices and particular sparsity patterns. This brief proposes a high-performance architecture for the CG solver on FPGAs that handles sparse linear systems of arbitrary size and sparsity pattern without aggressive zero padding. The architecture consists mainly of a high-throughput sparse matrix-vector multiplication design, which includes a multi-output adder tree, a reduction circuit, and a sum sequencer. Experimental results show that the CG architecture achieves speedups of 4.62x to 9.24x on a Virtex5-330 FPGA relative to a software implementation.
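For orientation, the following is a minimal software sketch of the textbook CG iteration built around a sparse matrix-vector multiplication (SpMV), the kernel that the abstract identifies as the core of the architecture. It is not the FPGA design described in the brief. The compressed sparse row (CSR) storage format, the function names, and the tolerance and iteration defaults are all assumptions chosen for illustration.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form (assumed format)."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        # Accumulate the products of this row's nonzeros; no zero padding needed.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Textbook CG for a symmetric positive definite system A x = b."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with the initial guess x = 0
    p = list(r)          # initial search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:   # stop once the residual norm is small
            break
        ap = spmv_csr(values, col_idx, row_ptr, p)   # one SpMV per iteration
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Hypothetical 2x2 example: A = [[4, 1], [1, 3]], b = [1, 2].
# The exact solution is x = [1/11, 7/11].
values, col_idx, row_ptr = [4.0, 1.0, 1.0, 3.0], [0, 1, 0, 1], [0, 2, 4]
x = conjugate_gradient(values, col_idx, row_ptr, [1.0, 2.0])
```

Each iteration performs one SpMV, two inner products, and three vector updates. The SpMV dominates the work for large sparse systems, which is consistent with the abstract's focus on a high-throughput SpMV design.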
