FPGAs (Field-Programmable Gate Arrays) are becoming increasingly attractive for high-performance scientific computing. FPGAs are high-volume, off-the-shelf semiconductor devices containing programmable logic components, embedded arithmetic units, embedded memories, and a programmable network that interconnects them. Their potential for high-performance computing has grown remarkably as they integrate far more programmable hardware resources and operate at higher frequencies, so that recent leading-edge FPGAs can reach peak floating-point performance surpassing that of typical microprocessors [1]. By designing a custom computing machine (CCM) on an FPGA, the properties of a specific application, e.g., parallelism, regularity, and homogeneity, can be exploited efficiently through customized data-paths, customized arithmetic units, and customized memory systems.
Although FPGAs are programmable, programming them requires designing hardware. It is therefore very difficult for software programmers without hardware-design knowledge to implement CCMs for specific applications on FPGAs. A Stream Compiler (ASC) [2] has recently addressed this design problem. By automating the production of CCMs that process streamed data, ASC allows users to write code with statements similar to those of the C language [2][3]. ASC also supports floating-point computations with flexible precision, which helps use FPGA resources efficiently.
In this paper, we consider which applications are suitable for FPGAs and how to make them work. We show that the lattice Boltzmann method (LBM) is suitable for stream processing: an FPGA-based stream accelerator running at only 67 MHz, implemented with the x1 transfer rate of PCI-Express, computes LBM 1.15 times faster than a 2.2 GHz Opteron processor. We estimate that an FPGA-based stream accelerator with the x8 transfer rate would achieve a speedup of 7.68.
LBM computes fluid flow by tracking fictive particles on a grid.
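As a concrete illustration of what tracking fictive particles on a grid involves, the following is a minimal sketch of 2D LBM time steps. It assumes the common D2Q9 lattice (9 particle distribution functions per grid point), single-relaxation-time BGK collision, and periodic boundaries; the text does not state which model or boundary treatment the accelerator uses, and the function names are hypothetical. Each step is a purely local collision at every grid point followed by a streaming step that moves each distribution to one neighbouring point.

```python
"""Minimal 2D lattice Boltzmann sketch (assumed D2Q9 lattice, BGK collision,
periodic boundaries). Illustrative only, not the accelerator's data-path."""

# D2Q9 discrete velocities (ex, ey) and their lattice weights (assumed model).
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Return the 9 equilibrium distributions for density rho and velocity (ux, uy)."""
    usq = ux * ux + uy * uy
    feq = []
    for (ex, ey), w in zip(E, W):
        eu = ex * ux + ey * uy
        feq.append(w * rho * (1 + 3 * eu + 4.5 * eu * eu - 1.5 * usq))
    return feq


def macroscopic(cell):
    """Density and velocity at one grid point from its 9 distributions."""
    rho = sum(cell)
    ux = sum(fi * ex for fi, (ex, _) in zip(cell, E)) / rho
    uy = sum(fi * ey for fi, (_, ey) in zip(cell, E)) / rho
    return rho, ux, uy


def step(f, tau):
    """One time step on f[y][x] (a list of 9 distributions per grid point)."""
    ny, nx = len(f), len(f[0])
    # Collision: each point uses only its own 9 values, so all points
    # are independent and could be processed in parallel.
    post = []
    for y in range(ny):
        row = []
        for x in range(nx):
            cell = f[y][x]
            feq = equilibrium(*macroscopic(cell))
            row.append([fi - (fi - fe) / tau for fi, fe in zip(cell, feq)])
        post.append(row)
    # Streaming: distribution i moves one point along E[i], wrapping around.
    out = [[[0.0] * 9 for _ in range(nx)] for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            for i, (ex, ey) in enumerate(E):
                out[(y + ey) % ny][(x + ex) % nx][i] = post[y][x][i]
    return out


if __name__ == "__main__":
    nx, ny = 8, 4
    # Fluid at rest with a small density bump at one grid point.
    f = [[equilibrium(1.0, 0.0, 0.0) for _ in range(nx)] for _ in range(ny)]
    f[1][2] = equilibrium(1.1, 0.0, 0.0)
    for _ in range(10):
        f = step(f, tau=0.8)
    total = sum(sum(cell) for row in f for cell in row)
    print(f"total mass after 10 steps: {total:.6f}")
```

Because collision touches only one grid point and streaming only moves values to fixed neighbours, the update maps naturally onto a pipeline that consumes and produces a regular stream of grid points.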
Although relatively large data sets are needed to define multiple particle distribution functions at each grid point, the LBM algorithm is simple and exhibits parallelism among grid points. These properties make it well suited to direct hardware implementation. Using a state-of-the-art FPGA, we design and implement a stream accelerator for 2D LBM computation. In the following sections, we describe the stream-based LBM computation and its efficient implementation on an FPGA.
Related work has shown that FPGAs have significant potential for computational fluid dynamics. For instance, a single-FPGA implementation of a 3D lattice gas model [4] can run 200 times faster than a software version on a 1.8 GHz Athlon processor. It has also been reported that FPGA-based accelerators for computational fluid dynamics [5] promise large improvements in sustained performance, at better price-performance ratios and with lower overall power consumption than conventional processors.
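To put the "relatively large data sets" of LBM into numbers, here is a back-of-the-envelope sketch of the data that one time step must stream. It assumes a D2Q9 model (9 distribution functions per grid point), that each distribution is read once and written once per step with no on-chip reuse, and a hypothetical 1024 x 1024 grid; none of these values is taken from the accelerator described in this paper.

```python
def bytes_per_step(nx, ny, q=9, bytes_per_value=4):
    """Bytes read plus written in one time step, assuming every grid point
    streams its q distributions in and out exactly once (hypothetical model)."""
    per_point = 2 * q * bytes_per_value  # read q values + write q values
    return nx * ny * per_point


if __name__ == "__main__":
    # Hypothetical 1024 x 1024 grid, single vs double precision.
    for precision, nbytes in [("single", 4), ("double", 8)]:
        b = bytes_per_step(1024, 1024, bytes_per_value=nbytes)
        print(f"{precision}: {b / 2**20:.0f} MiB per time step")
```

Under these assumptions each grid point moves 72 bytes per step in single precision, so a 1024 x 1024 grid moves 72 MiB per step (144 MiB in double precision). Halving the bytes per value halves this traffic, which is one way to see why flexible-precision floating point can pay off when LBM throughput is limited by data movement.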
[1] Takaji Inamuro et al., "A Non-Slip Boundary Condition for Lattice Boltzmann Simulations," comp-gas/9508002, 1995.
[2] Yitzhak Birk et al., "ASC-Based Acceleration in an FPGA with a Processor Core Using Software-Only Skills," 2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2006.
[3] Oskar Mencer et al., "ASC: A Stream Compiler for Computing with FPGAs," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2006.
[4] William D. Smith et al., "Towards an RCC-Based Accelerator for Computational Fluid Dynamics Applications," The Journal of Supercomputing, 2004.
[5] Keith D. Underwood et al., "FPGAs vs. CPUs: Trends in Peak Floating-Point Performance," FPGA '04, 2004.
[6] Skordos et al., "Initial and Boundary Conditions for the Lattice Boltzmann Method," Physical Review E, Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics, 1993.