Systolic Architecture for Computational Fluid Dynamics on FPGAs

This paper presents an FPGA-based flow solver built on the systolic architecture. We show that the fractional-step method employing central difference schemes can be expressed as a systolic algorithm, and that the systolic architecture is therefore suitable for a processor dedicated to the flow solver. We have designed a 2D systolic array of cells, each of which has a micro-programmable data-path containing a MAC (multiplication and accumulation) unit and a local memory that stores the data needed for computational fluid dynamics. On an ALTERA Stratix II FPGA, we implemented 96 (= 12 × 8) cells running at 60 MHz. Since each MAC unit has both an adder and a multiplier for single-precision floating-point numbers, the total peak performance is 11.5 (= 96 × 60 MHz × 2) GFlops. As a benchmark, we chose a 2D square driven cavity flow computed with the fractional-step method. For this computation, the FPGA-based processor, running at only 60 MHz, was 7.14 and 6.41 times faster than a Pentium4 processor at 3.2 GHz and an Itanium2 at 1.4 GHz, respectively.
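The following Python sketch illustrates two points from the abstract: the peak-performance arithmetic, and why a central-difference update suits a 2D array of cells that exchange data only with their nearest neighbours. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which iterative solver is used, so a Jacobi sweep for a pressure Poisson equation is assumed here, and the names `peak_gflops` and `jacobi_sweep` are hypothetical.

```python
def peak_gflops(cells, clock_mhz, flops_per_cycle=2):
    """Peak rate in GFlops, assuming every MAC unit issues one multiply
    and one add (2 floating-point operations) per clock cycle."""
    return cells * clock_mhz * flops_per_cycle / 1000.0


def jacobi_sweep(p, rhs, h):
    """One Jacobi sweep of the 5-point central-difference Poisson stencil.

    Assumed solver, chosen only for illustration. Each interior point reads
    just its four nearest neighbours, which is the locality that a 2D
    systolic array of cells depends on. Boundary values are left unchanged.
    """
    ny, nx = len(p), len(p[0])
    new = [row[:] for row in p]  # copy so the sweep uses only old values
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            neighbours = p[j][i - 1] + p[j][i + 1] + p[j - 1][i] + p[j + 1][i]
            new[j][i] = 0.25 * (neighbours - h * h * rhs[j][i])
    return new
```

With the figures quoted in the abstract, `round(peak_gflops(96, 60), 1)` gives 11.5. On the hardware described, each grid point (or block of points) would be held in a cell's local memory and updated in parallel rather than by the nested loops used here.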
