Paper Information - FPGA-based Custom Computing Architecture for Large-Scale Fluid Simulation with Building Cube Method


We are designing a custom computing machine for large-scale fluid simulation with the building-cube method (BCM). In BCM, parallel computation is performed on cubes, each of which is an orthogonal grid with a fixed resolution of cells. Although BCM makes it easy to balance loads across cubes, computing it on general-purpose supercomputers suffers from limited efficiency and scalability because of insufficient memory bandwidth and the communication overhead of the interconnection network. In this paper, we present a custom computing architecture for scalable FPGA-based BCM computation with a dedicated network, called the accelerator domain network (ADN). We design a cube engine that enables bandwidth-efficient computation of cubes based on streamed stencil computation of the fractional-step method. Through a prototype implementation, we evaluate the potential performance of the architecture. For an Altera Stratix V 28 nm FPGA, we estimate that a single FPGA achieves a peak performance of 107 GFlop/s in single precision.
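The abstract combines three ideas: fixed-resolution cubes, stencil computation streamed through a cube engine, and neighbour exchange between cubes. The sketch below illustrates them in software only. It is not the paper's cube engine. Several choices are assumptions made for illustration: a 2D cube instead of 3D, a one-cell halo, a hypothetical 8 × 8 cube resolution, and a 5-point Jacobi relaxation of the pressure Poisson equation as the stencil kernel (one step a fractional-step solver typically iterates). The functions `jacobi_reference`, `jacobi_streamed` and `exchange_halo_x` are hypothetical names. The streamed version reads each row exactly once through a 3-row window, mimicking the line buffers of a streamed FPGA pipeline, and produces the same result as the random-access sweep.

```python
from collections import deque

# Hypothetical sketch (not the paper's design): a 2D "cube" is an n x n list of
# rows, i.e. N x N interior cells plus a one-cell halo (n = N + 2). The stencil is
# an assumed 5-point Jacobi relaxation of the pressure Poisson equation lap(p) = rhs.


def jacobi_reference(p, rhs, h):
    """Random-access Jacobi sweep over the interior cells; halo cells are copied."""
    n = len(p)
    out = [row[:] for row in p]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            out[i][j] = 0.25 * (p[i - 1][j] + p[i + 1][j] + p[i][j - 1] + p[i][j + 1]
                                - h * h * rhs[i][j])
    return out


def jacobi_streamed(p, rhs, h):
    """Same sweep, but rows of p are consumed once, in order, through a 3-row
    window: a software analogue of on-chip line buffers in a streamed pipeline."""
    n = len(p)
    out = [p[0][:]]                      # top halo row passes through unchanged
    window = deque(maxlen=3)
    for i, row in enumerate(p):          # stream rows in raster order
        window.append(row)
        if len(window) < 3:
            continue                     # window still filling
        north, center, south = window
        c = i - 1                        # index of the row being updated
        new_row = center[:]              # keeps the left/right halo cells
        for j in range(1, n - 1):
            new_row[j] = 0.25 * (north[j] + south[j] + center[j - 1] + center[j + 1]
                                 - h * h * rhs[c][j])
        out.append(new_row)
    out.append(p[-1][:])                 # bottom halo row passes through unchanged
    return out


def exchange_halo_x(left, right):
    """Copy boundary columns between two horizontally adjacent cubes: the kind
    of inter-cube traffic a dedicated network would carry between FPGAs."""
    n = len(left)
    for i in range(1, n - 1):
        left[i][n - 1] = right[i][1]     # right cube's first interior column
        right[i][0] = left[i][n - 2]     # left cube's last interior column


# Usage: two 8 x 8 cubes side by side, outer halo held at p = 1, interior starts at 0.
N = 8                                    # hypothetical cube resolution
n, h = N + 2, 1.0 / N
cubes = [[[1.0 if i in (0, n - 1) or j in (0, n - 1) else 0.0 for j in range(n)]
          for i in range(n)] for _ in range(2)]
rhs = [[0.0] * n for _ in range(n)]
for _ in range(200):
    exchange_halo_x(cubes[0], cubes[1])  # refresh shared halo before each sweep
    assert jacobi_streamed(cubes[0], rhs, h) == jacobi_reference(cubes[0], rhs, h)
    cubes = [jacobi_streamed(c, rhs, h) for c in cubes]
print(min(min(row[1:-1]) for row in cubes[0][1:-1]))  # interior approaches 1.0
```

Because the halo is refreshed before every sweep, the two cubes together behave like one 16 × 8 grid. The streamed sweep needs only three rows of storage per cube at a time, which is the property that lets a hardware pipeline keep memory traffic to one read per cell per sweep.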
