Energy Efficient Scientific Computing on FPGAs using OpenCL

Scientific computing is an indispensable part of modern life, used in large-scale high-performance systems as well as in low-power smart cyber-physical systems. Accelerators for scientific computing therefore need to be both fast and energy efficient. Partial differential equations (PDEs) are an integral component of many scientific computing tasks, so PDE solvers in particular require efficient implementation. FPGAs are well suited to the data-parallel computations that occur in PDE solvers. However, including FPGAs in the programming flow is not trivial: designs have to be written in hardware description languages (HDLs), which requires detailed knowledge of the underlying hardware. OpenCL tackles this issue by allowing developers to write standardized code in a C-like fashion, rendering experience with HDLs unnecessary. Yet hiding the underlying hardware from the developer makes it challenging to implement solvers that exploit the full potential of the FPGA. In this work, we therefore propose a comprehensive set of generic and specific optimization techniques for OpenCL-based PDE solvers that improve FPGA performance and energy efficiency by orders of magnitude. Based on these optimizations, our study shows that, despite the high abstraction level of OpenCL, very energy-efficient PDE accelerators can be designed on the FPGA fabric, making the FPGA an ideal solution for power-constrained applications.
