Generation of Multigrid-based Numerical Solvers for FPGA Accelerators

Field Programmable Gate Arrays (FPGAs) are an increasingly popular accelerator technology, and not only in the field of High-Performance Computing (HPC). However, they increase the heterogeneity of clusters, which may already be equipped with other accelerators such as GPUs. As a result, expertise from different fields has to be combined: mathematical, algorithmic, and technical experts are all needed to create numerical solvers for such systems. To bridge this programmability gap, Domain-Specific Languages (DSLs) are a popular choice for generating low-level implementations from an abstract algorithm description. In this work, we demonstrate the generation of FPGA implementations of numerical solvers based on the multigrid method from the same codebase that is also used to generate code for CPUs using a hybrid parallelization of MPI and OpenMP. Our approach yields a hardware design that can compute up to 12 V-cycles per second with an input grid size of 4096 × 4096 on a mid-range FPGA, beating vectorized, single-threaded execution on an Intel i7 by a factor of almost three.
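For readers unfamiliar with the term, the following is a minimal sketch of what a single multigrid V-cycle does. It is not the generated FPGA or CPU code described above, and it is not written in the DSL; it is a plain NumPy illustration under assumptions of our own choosing: a 1D Poisson problem with homogeneous Dirichlet boundaries, a weighted Jacobi smoother, full-weighting restriction, and linear interpolation. All function names are hypothetical.

```python
import numpy as np

def residual(u, f, h):
    # r = f - A u, where A is the 1D stencil [-1, 2, -1] / h^2
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2.0 * u[1:-1] - u[:-2] - u[2:]) / h**2
    return r

def smooth(u, f, h, sweeps=2, omega=2.0 / 3.0):
    # Weighted Jacobi sweeps (assumed smoother); boundary values stay fixed
    for _ in range(sweeps):
        jac = 0.5 * (u[:-2] + u[2:] + h**2 * f[1:-1])
        u_new = u.copy()
        u_new[1:-1] = (1.0 - omega) * u[1:-1] + omega * jac
        u = u_new
    return u

def restrict(r):
    # Full weighting: coarse point j sits on fine point 2j
    rc = np.zeros((r.size - 1) // 2 + 1)
    rc[1:-1] = 0.25 * r[1:-3:2] + 0.5 * r[2:-2:2] + 0.25 * r[3:-1:2]
    return rc

def prolong(ec):
    # Linear interpolation from the coarse grid to the fine grid
    e = np.zeros(2 * (ec.size - 1) + 1)
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    return e

def v_cycle(u, f, h):
    # Coarsest grid (one interior point): solve 2 u / h^2 = f directly
    if u.size == 3:
        u = u.copy()
        u[1] = 0.5 * h**2 * f[1]
        return u
    u = smooth(u, f, h)                               # pre-smoothing
    rc = restrict(residual(u, f, h))                  # restrict the residual
    ec = v_cycle(np.zeros_like(rc), rc, 2.0 * h)      # coarse-grid correction
    u = u + prolong(ec)                               # apply the correction
    return smooth(u, f, h)                            # post-smoothing

def solve(n=64, cycles=10):
    # Grid with n intervals (n a power of two), boundaries included
    x = np.linspace(0.0, 1.0, n + 1)
    h = 1.0 / n
    f = np.pi**2 * np.sin(np.pi * x)  # exact solution is sin(pi x)
    u = np.zeros_like(x)
    for _ in range(cycles):
        u = v_cycle(u, f, h)
    return x, u, f, h
```

Calling `solve()` runs ten V-cycles on a grid with 64 intervals. The throughput figure quoted above refers to cycles of this general shape applied to a two-dimensional 4096 × 4096 grid, which this small 1D sketch does not attempt to reproduce.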
