Convey Vector Personalities - FPGA Acceleration with an OpenMP-like Programming Effort?

Although the benefits of FPGAs for accelerating scientific codes are widely acknowledged, FPGA accelerators are not widely used in scientific computing, because reaping these benefits requires knowledge of hardware design methods and tools that domain scientists typically lack. A promising but little-investigated approach is to develop tool flows that keep the common languages for scientific code (C, C++, and Fortran) and allow the developer to augment the source code with OpenMP-like directives that tell the compiler which parts of the application to offload to the FPGA accelerator. In this work we study whether the promise of effective FPGA acceleration with an OpenMP-like programming effort can actually be kept. Our target system is the Convey HC-1 reconfigurable computer, for which an OpenMP-like programming environment exists. As a case study we use an application from computational nanophotonics. Our results show that a developer without previous FPGA experience could create an FPGA-accelerated application that is competitive with an optimized OpenMP-parallelized CPU version running on a two-socket quad-core server. Finally, we discuss our experiences with this tool flow and the Convey HC-1 from a productivity and economic point of view.
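
To illustrate what a directive-based programming effort looks like, the following minimal C sketch annotates a loop with a standard OpenMP directive. It is an illustration under stated assumptions, not code from this work: the function name `stencil_sweep` and the three-point averaging kernel are hypothetical, and the directive shown is plain OpenMP for CPU threads. The vendor-specific directives of the Convey tool flow are not given in this abstract and are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example kernel: one sweep of a three-point averaging
 * stencil over the interior points of `in`, written to `out`.
 * Boundary elements out[0] and out[n-1] are left untouched. */
void stencil_sweep(const double *in, double *out, size_t n)
{
    /* The sweep reads neighbours, so input and output must not alias. */
    assert(in != out);

    if (n < 3) {
        return; /* no interior points to update */
    }

    long last = (long)n - 1;

    /* Standard OpenMP directive: iterations are independent, so the
     * compiler may distribute them across threads. If OpenMP is not
     * enabled at compile time, the pragma is ignored and the loop
     * runs serially with the same result. */
    #pragma omp parallel for
    for (long i = 1; i < last; ++i) {
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0;
    }
}
```

The point of the sketch is that the sequential loop body stays unchanged and the only addition is a single directive line. This is the kind of programming effort that a directive-based FPGA tool flow aims to match.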