An FPGA based implementation of the Conjugate Gradient Kernels

The Conjugate Gradient (CG) method is a frequently used iterative method for solving Systems of Linear Equations (SLEs). CG offers a fast convergence rate and high accuracy. It is widely used in scientific applications such as meteorology, groundwater flow problems, satellite data analysis, ocean circulation modeling, molecular dynamics simulations, real-time power quality assessment, and neural robot controllers. It can be implemented on CPUs, GPUs, and Field Programmable Gate Arrays (FPGAs). FPGAs have been shown to provide order-of-magnitude speedups for various computation-intensive applications. However, a Hardware Description Language (HDL) based FPGA implementation of all the arithmetic modules requires considerable development time, and the designer needs to be knowledgeable in both hardware design and HDL programming. Using IP cores can reduce the development time and design complexity. CG has three basic computational kernels, of which Matrix-Vector Multiplication (MVM) is the most computationally intensive. Optimizing the MVM kernel for higher throughput can reduce the computation time required for each iteration of CG. In this research, the three basic kernels of CG are implemented on FPGAs using floating-point IP cores. The results show that an FPGA-based implementation of CG on an Arria 10 GX 1150 achieves an order-of-magnitude speedup over a software implementation of CG running on an Intel Xeon® CPU E5-2650 v2 at 2.60 GHz.
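To make the kernel structure concrete, the following is a minimal software sketch of CG in plain Python, not the FPGA design described here. The abstract names only MVM explicitly; this sketch assumes the other two kernels are a dot product and a scaled vector update (AXPY), which is the usual decomposition of the textbook algorithm. The function names `mvm`, `dot`, `axpy`, and `conjugate_gradient` are hypothetical and chosen for illustration.

```python
def mvm(A, x):
    """Matrix-vector multiplication kernel: returns A @ x for a dense matrix A."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def dot(u, v):
    """Dot-product kernel (assumed to be the second CG kernel)."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def axpy(alpha, x, y):
    """Vector-update kernel (assumed third CG kernel): returns alpha * x + y."""
    return [alpha * x_i + y_i for x_i, y_i in zip(x, y)]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A."""
    x = [0.0] * len(b)      # initial guess x0 = 0
    r = list(b)             # residual r0 = b - A x0 = b
    p = list(r)             # initial search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:             # stop once the residual norm is small
            break
        Ap = mvm(A, p)                      # one MVM per iteration (dominant cost)
        alpha = rs_old / dot(p, Ap)         # step length
        x = axpy(alpha, p, x)               # x <- x + alpha * p
        r = axpy(-alpha, Ap, r)             # r <- r - alpha * A p
        rs_new = dot(r, r)
        p = axpy(rs_new / rs_old, p, r)     # p <- r + beta * p
        rs_old = rs_new
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(conjugate_gradient(A, b))
```

Each iteration performs one MVM, two dot products, and three vector updates. For a dense n-by-n matrix the MVM costs on the order of n² operations while the other kernels cost on the order of n, which is why the MVM kernel is the natural target for optimization.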
