Accelerating unstructured finite volume computations on field‐programmable gate arrays

This paper describes a field‐programmable gate array (FPGA)‐based framework that efficiently accelerates unstructured finite volume computations in which the same mathematical expression has to be evaluated at every point of the mesh. The irregular memory access patterns caused by the unstructured spatial discretization are eliminated by a novel mesh node reordering technique, and a special architecture is designed to fully exploit the resulting predictable memory access patterns. In the proposed architecture, a fixed‐size moving window over the input stream of reordered state variables is cached in on‐chip memory, and a pipelined chain of processing elements, which takes its input only from the fast on‐chip memory, carries out the computations. The arithmetic unit (AU) of each processing element is generated from the data flow graph extracted from the given mathematical expression. The data flow graph is partitioned with a novel graph partitioning algorithm that breaks the AU into smaller, locally controlled parts, which can be implemented more efficiently on an FPGA than a globally controlled AU. The proposed architecture and algorithms are presented via a case study solving the Euler equations on an unstructured mesh. On the largest currently available FPGA, the generated architecture contains three processing elements working in a pipelined fashion and provides a speedup of one order of magnitude over a high performance microprocessor and a threefold speedup over a high performance graphics processing unit. Copyright © 2013 John Wiley & Sons, Ltd.
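The link between node reordering and a fixed‐size on‐chip window can be illustrated with a small sketch. It is not the reordering technique proposed in the paper; it swaps in a plain Cuthill–McKee breadth‐first ordering, a classical bandwidth‐reduction heuristic, purely for illustration. The toy mesh and the names `bandwidth`, `cuthill_mckee`, and `adj` are assumptions made for this example. If the largest index distance between neighbouring nodes is b, a window of 2b + 1 consecutive entries of the stream contains every neighbour of the node at its centre, so a smaller b means a smaller cache.

```python
from collections import deque


def bandwidth(adj, order):
    """Largest index distance between neighbouring nodes under `order`."""
    pos = {node: i for i, node in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u in adj for v in adj[u]), default=0)


def cuthill_mckee(adj):
    """Breadth-first reordering that visits low-degree neighbours first.

    Illustrative stand-in only; the paper uses its own reordering method.
    """
    def by_degree(node):
        return (len(adj[node]), node)

    order, seen = [], set()
    for start in sorted(adj, key=by_degree):  # also covers disconnected parts
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nb in sorted(adj[node], key=by_degree):
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
    return order


# Hypothetical toy mesh: a chain of 8 nodes stored under scrambled labels.
edges = [(0, 5), (5, 2), (2, 7), (7, 1), (1, 6), (6, 3), (3, 4)]
adj = {n: set() for n in range(8)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

natural = list(range(8))
order = cuthill_mckee(adj)

for name, o in (("natural", natural), ("reordered", order)):
    b = bandwidth(adj, o)
    print(f"{name}: bandwidth={b}, window entries needed={2 * b + 1}")
```

On this toy chain the reordering places every node next to its neighbours, so the window shrinks to three entries. Real unstructured meshes will not reduce this far, and the achievable bandwidth depends on the mesh and on the heuristic used.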
