Mapping Hierarchical Multiple File VHDL Kernels onto an SRC-7 High Performance Reconfigurable Computer

The increasing computational requirements of today's software systems have led researchers to investigate ways of accelerating military and scientific computing applications. Contemporary field programmable gate arrays (FPGAs) are now equipped with multimillion-gate logic fabrics, faster clock rates, reasonably large on-chip memory, and fast I/O resources for off-chip communication. The use of FPGAs as reconfigurable computational units complementing a fixed computational device such as a general-purpose processor (GPP) is the basic idea behind what are known as high performance reconfigurable computers (HPRCs). These architectures allow development of reconfigurable processors that target the computationally intensive parts of a given application. Ideally, one should use a high-level language (HLL) rather than a hardware description language (HDL) to implement HPRC-based applications. However, in order to accelerate some applications, an HDL must be used to design computational kernels. The HPRC used in the joint research project between the U.S. Army Engineer Research and Development Center DoD Supercomputing Resource Center (ERDC DSRC) and Jackson State University (JSU) employs the SRC Computers Carte development environment. Carte allows application development using a conventional HLL, an HLL-to-HDL compiler, and custom-built VHDL-based kernels ("user macros" in SRC parlance). Currently, the off-the-shelf Carte mechanism for incorporating user macros does not directly support the common case of a multiple-file VHDL hierarchy. This research explores a novel approach that allows multiple-file VHDL kernels to be mapped onto the SRC-7 HPRC. The approach facilitates the development of FPGA-based elements via a hybrid technique that uses the Carte HLL-to-HDL compiler in conjunction with multiple-file VHDL-based user macros.
This paper describes the use of this approach to map a parameterized, parallelized, and pipelined FPGA-based sparse matrix-vector multiply kernel onto an SRC-7 HPRC. The HPRC-based version runs nearly four times faster than the software-only version.
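
For readers unfamiliar with the operation being accelerated, the following is a minimal software-only sketch of sparse matrix-vector multiplication. It assumes the compressed sparse row (CSR) storage format, which the abstract does not specify; the function name `csr_spmv` and its parameter names are illustrative and are not taken from the paper or from Carte.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A * x for a sparse matrix A stored in CSR form.

    values  : nonzero entries of A, listed row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[i] .. row_ptr[i + 1] delimit row i inside `values`
    x       : dense input vector

    CSR is an assumed storage format chosen for illustration only.
    """
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):
        acc = 0.0
        # Multiply-accumulate over the nonzeros of this row only.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Example: A = [[4, 0, 1],
#               [0, 3, 0],
#               [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]
print(csr_spmv(values, col_idx, row_ptr, x))  # [7.0, 6.0, 17.0]
```

The inner multiply-accumulate loop is the kind of computation a hardware kernel would typically pipeline, and independent rows are a natural source of parallelism; how the paper's kernel actually partitions the work is not stated in the abstract.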
