OpenMP extensions for FPGA accelerators

Reconfigurable computing is one of the paths being explored towards low-power supercomputing. However, programming reconfigurable devices is not easy, and significant research and development effort is still required to make it productive. In addition, using these devices as accelerators in multicore, SMP and ccNUMA architectures adds a further level of programming complexity: the programmer must specify the offloading of tasks to the reconfigurable devices and ensure interoperability with current shared-memory programming paradigms such as OpenMP. This paper presents extensions to OpenMP 3.0 that aim to address this second challenge, together with an implementation in a prototype runtime system. With these extensions the programmer can easily express the offloading of an existing reconfigurable binary code (bitstream), hiding the complexities of device configuration, bitstream loading, and data arrangement and movement to the device memory. Our current prototype implementation targets SGI Altix systems with RASC blades (based on the Virtex-4 FPGA). We analyze the overheads introduced by this implementation and propose a hybrid host/device operational mode that hides some of them, significantly improving application performance. The system is evaluated with a matrix multiplication kernel, including an estimate that considers different FPGA frequencies.
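To make the hybrid host/device idea concrete, the following is a minimal sketch in C that uses only standard OpenMP 3.0 tasking. It does not reproduce the extensions proposed in the paper, whose syntax is not given in the abstract. The abstract also does not say how the hybrid mode divides work, so the static row split shown here is an assumption. The names `matmul_rows_host`, `matmul_rows_device`, `matmul_hybrid`, `check_hybrid` and the matrix order `N` are all hypothetical, and the "device" function is a host-side stand-in for an offloaded bitstream.

```c
#include <assert.h>
#include <stddef.h>

#define N 8 /* illustrative matrix order; not taken from the paper */

/* Host kernel: computes rows [row_begin, row_end) of C = A * B. */
void matmul_rows_host(const double *a, const double *b, double *c,
                      size_t row_begin, size_t row_end)
{
    for (size_t i = row_begin; i < row_end; ++i) {
        for (size_t j = 0; j < N; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < N; ++k)
                sum += a[i * N + k] * b[k * N + j];
            c[i * N + j] = sum;
        }
    }
}

/* Hypothetical stand-in for the offloaded bitstream. A real runtime would
 * configure the device, load the bitstream and move data here; this sketch
 * simply reuses the host kernel so that it runs anywhere. */
void matmul_rows_device(const double *a, const double *b, double *c,
                        size_t row_begin, size_t row_end)
{
    matmul_rows_host(a, b, c, row_begin, row_end);
}

/* Assumed hybrid mode: rows [0, split) go to the "device" task and rows
 * [split, N) to a host task, so both can proceed concurrently. */
void matmul_hybrid(const double *a, const double *b, double *c, size_t split)
{
    assert(split <= N);
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task
        matmul_rows_device(a, b, c, 0, split);

        #pragma omp task
        matmul_rows_host(a, b, c, split, N);

        #pragma omp taskwait
    }
}

/* Returns 1 if the hybrid result equals the all-host result, else 0. */
int check_hybrid(size_t split)
{
    double a[N * N], b[N * N], ref[N * N], out[N * N];
    for (size_t i = 0; i < N * N; ++i) {
        a[i] = (double)(i % 7);
        b[i] = (double)(i % 5) - 2.0;
        out[i] = -1.0;
    }
    matmul_rows_host(a, b, ref, 0, N);
    matmul_hybrid(a, b, out, split);
    for (size_t i = 0; i < N * N; ++i)
        if (ref[i] != out[i])
            return 0;
    return 1;
}
```

Calling `check_hybrid(N / 2)` splits the rows evenly between the two tasks. Without OpenMP enabled, the pragmas are ignored and the two halves run one after the other, giving the same result.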
