FPGA-based Custom Computing Architecture for Large-Scale Fluid Simulation with Building Cube Method

We are designing a custom computing machine for large-scale fluid simulation with the building-cube method (BCM). In BCM, parallel computation is performed with cubes, each of which is an orthogonal grid with a fixed resolution of cells. Although BCM is advantageous for balancing loads across cubes, it suffers from limited efficiency and scalability on general-purpose supercomputers due to insufficient memory bandwidth and the communication overhead of the interconnection network. In this paper, we present a custom computing architecture for FPGA-based scalable BCM computation with a dedicated network, called an accelerator domain network (ADN). We design a cube engine that allows bandwidth-efficient computation of cubes based on streamed stencil computation of the fractional-step method. Through a prototype implementation, we evaluate the potential performance of the architecture. For an Altera Stratix V 28 nm FPGA, we estimate that a single FPGA has a peak performance of 107 GFlop/s in single precision.
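The general idea behind streamed stencil computation can be illustrated with a small software sketch. The code below is not the cube engine described in the paper. It is a minimal illustration under stated assumptions: a 2D grid, a 5-point Laplacian stencil, and the hypothetical names `streamed_laplacian`, `cells`, `width`, and `height`. It shows how a shift buffer of 2*width + 1 cells lets each cell be read from memory exactly once while every neighbor the stencil needs is still available.

```python
from collections import deque


def streamed_laplacian(cells, width, height):
    """Apply a 5-point Laplacian to cells streamed in raster order.

    Illustrative sketch only (assumed 2D grid and stencil). Each input
    cell is read once; a shift buffer of 2*width + 1 cells holds every
    neighbor needed by the stencil. Boundary cells are left at 0.0.
    """
    buf = deque(maxlen=2 * width + 1)  # models an on-chip shift register
    out = [[0.0] * width for _ in range(height)]
    for idx, value in enumerate(cells):
        buf.append(value)
        # The newest cell is the south neighbor of the cell one row back.
        center = idx - width
        if center < 0:
            continue
        y, x = divmod(center, width)
        if 0 < y < height - 1 and 0 < x < width - 1:
            south = buf[-1]              # cell at center + width
            east = buf[-width]           # cell at center + 1
            mid = buf[-1 - width]        # cell at center
            west = buf[-2 - width]       # cell at center - 1
            north = buf[-1 - 2 * width]  # cell at center - width
            out[y][x] = north + south + east + west - 4.0 * mid
    return out


# Usage: f(x, y) = x^2 + y^2 has a discrete Laplacian of 4 at interior cells.
width, height = 5, 4
cells = [float(x * x + y * y) for y in range(height) for x in range(width)]
result = streamed_laplacian(cells, width, height)
```

In this sketch the memory traffic per output is one cell read and one cell write regardless of stencil size, which is the property that makes streaming attractive when external memory bandwidth is the bottleneck.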
