Exploiting Matrix Symmetry to Improve FPGA-Accelerated Conjugate Gradient

In this paper we describe a new approach for accelerating the Conjugate Gradient (CG) method using an FPGA co-processor. As in previous approaches, our co-processor performs double-precision sparse matrix-vector multiplication (SpMV). However, our implementation doubles the amount of computation per unit of input data by exploiting the symmetry of the input matrix and computing the contributions of its upper and lower triangles in parallel. Using a Virtex-II Pro 100 FPGA, we have achieved an observed computational throughput of 1155 MFLOPS.
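The symmetry trick can be illustrated with a small sequential software model. The sketch below assumes the matrix is stored as only its upper triangle (diagonal included) in CSR form. Each stored off-diagonal value a_ij is read once but used twice, once as a_ij for row i and once as its mirror a_ji for row j. That is 4 flops per element fetched instead of the 2 flops of an unsymmetric SpMV. The helper names `upper_csr`, `sym_spmv`, and `conjugate_gradient` are hypothetical. The code does not model the paper's FPGA pipeline: it processes both triangles in the same loop rather than in parallel hardware.

```python
# Minimal sketch (assumption: upper-triangle CSR storage of a symmetric matrix).
# Sequential software model only; NOT the paper's FPGA co-processor design.
from typing import List, Tuple

def upper_csr(dense: List[List[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Hypothetical helper: pack the upper triangle (j >= i) of a symmetric matrix into CSR."""
    vals: List[float] = []
    cols: List[int] = []
    row_ptr: List[int] = [0]
    n = len(dense)
    for i in range(n):
        for j in range(i, n):
            if dense[i][j] != 0.0:
                vals.append(float(dense[i][j]))
                cols.append(j)
        row_ptr.append(len(vals))
    return vals, cols, row_ptr

def sym_spmv(vals: List[float], cols: List[int], row_ptr: List[int],
             x: List[float]) -> List[float]:
    """Compute y = A x using only the stored upper triangle of symmetric A."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j, a = cols[k], vals[k]
            y[i] += a * x[j]          # upper-triangle contribution (a_ij * x_j)
            if j != i:
                y[j] += a * x[i]      # mirrored lower-triangle contribution (a_ji = a_ij)
    return y

def conjugate_gradient(vals: List[float], cols: List[int], row_ptr: List[int],
                       b: List[float], tol: float = 1e-10,
                       max_iter: int = 1000) -> List[float]:
    """Unpreconditioned CG for a symmetric positive-definite A; SpMV dominates each iteration."""
    n = len(b)
    x = [0.0] * n
    r = [float(bi) for bi in b]       # residual r = b - A x, with initial guess x = 0
    p = list(r)
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:       # stop once the residual norm is small enough
            break
        ap = sym_spmv(vals, cols, row_ptr, p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(ri * ri for ri in r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0],
         [1.0, 3.0, 1.0],
         [0.0, 1.0, 2.0]]             # small symmetric positive-definite example
    vals, cols, row_ptr = upper_csr(A)
    print(sym_spmv(vals, cols, row_ptr, [1.0, 2.0, 3.0]))
    print(conjugate_gradient(vals, cols, row_ptr, [6.0, 10.0, 8.0]))
```

For the 3×3 example, only 5 of the 7 nonzeros are stored, and `sym_spmv` reproduces the dense product [6.0, 10.0, 8.0]. CG then recovers x ≈ [1, 2, 3]. In an FPGA design, the two updates inside the inner loop are the natural candidates to run concurrently, which is the kind of parallelism the abstract describes.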