HLS-Based FPGA Acceleration of Building-Cube Stencil Computation

This paper presents the design and implementation of a high-level synthesis (HLS) framework that allows stencil computation with the building-cube method (BCM) to be described easily and accelerated on FPGAs. The BCM is an adaptive mesh refinement method that reduces computational cost by using cubes of varying granularity, chosen according to the computational precision the target model requires. By restricting the size ratio between adjacent cubes, the BCM lends itself to parallel processing. However, the non-contiguous memory access imposed by the irregular cubes does not map straightforwardly onto stream processing on FPGA accelerators. To fill this gap, we design and implement a BCM framework as a class library for a high-level synthesis environment. The framework automatically generates the mechanisms the BCM requires, such as modules that reorder data streams and hardware that interpolates data between cubes of different sizes. The proposed framework is evaluated in terms of computing performance, memory performance, and required hardware resources on a Maxeler Technologies FPGA accelerator. The results show that the performance overhead of exchanging data between cubes of different sizes is reasonably small.
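To make the data exchange between cubes of different sizes concrete, the following is a minimal software sketch, not the framework's hardware or API. It assumes a 1-D mesh, a 2:1 size ratio between a coarse cube on the left and a fine cube on the right, cell-centered values, and simple linear interpolation and averaging; the abstract specifies none of these details, and all function names are hypothetical.

```python
# Illustrative 1-D sketch of ghost-cell exchange between two adjacent cubes.
# Assumptions (not from the paper): 2:1 size ratio, cell-centered data,
# coarse cube on the left with cell width 2h, fine cube on the right with width h.

def fine_ghost(coarse_edge, fine_edge):
    """Ghost value for the fine cube, located one fine cell left of the interface.

    The coarse edge cell center is 1.5h away from the first fine cell center,
    and the ghost center lies 0.5h from the coarse one, so the linear
    interpolation weight is (0.5h) / (1.5h) = 1/3.
    """
    return coarse_edge + (fine_edge - coarse_edge) / 3.0


def coarse_ghost(fine_first, fine_second):
    """Ghost value for the coarse cube: average of the two fine cells it covers."""
    return 0.5 * (fine_first + fine_second)


def interface_laplacians(coarse, fine):
    """Apply an unscaled 3-point stencil (u[i-1] - 2u[i] + u[i+1]) to the two
    cells that touch the interface, using interpolated ghost values.
    The results are not divided by the squared cell width."""
    g_c = coarse_ghost(fine[0], fine[1])
    g_f = fine_ghost(coarse[-1], fine[0])
    lap_coarse = coarse[-2] - 2.0 * coarse[-1] + g_c
    lap_fine = g_f - 2.0 * fine[0] + fine[1]
    return lap_coarse, lap_fine


# Usage: sample the linear field u(x) = x with h = 1 at the cell centers.
coarse = [-7.0, -5.0, -3.0, -1.0]   # coarse cell centers, width 2
fine = [0.5, 1.5, 2.5, 3.5]         # fine cell centers, width 1
lap_coarse, lap_fine = interface_laplacians(coarse, fine)
```

Because each ghost cell sits at the spacing of the cube that reads it, the stencil inside every cube stays uniform, which is what lets a regular stream-processing pipeline be reused once the ghost values have been produced.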
