FPGA acceleration of Sparse Matrix-Vector Multiplication based on Network-on-Chip

A new design concept for accelerating Sparse Matrix-Vector Multiplication (SMVM) on FPGAs using a Network-on-Chip (NoC) is presented. In traditional circuit design, on-chip communication has been implemented with dedicated point-to-point interconnections or shared buses. As a result, many parallel implementations are built around regular data transfers. However, the SMVM operation, which is the main step of most iterative algorithms for solving systems of linear equations, requires data transfers that usually depend on the sparsity structure of the matrix and can be extremely irregular. A NoC architecture makes it possible to handle arbitrarily structured data transfers, i.e. arbitrarily structured sparse matrices. In this paper, a configurable interface is presented which can generate a pipelined SMVM calculator based on a NoC architecture of size 2×2, 4×4, ..., p×p (p ∈ ℕ). The implementation uses IEEE-754 single-precision floating-point arithmetic on a Xilinx Virtex-6 FPGA.
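
To illustrate why the data transfers in SMVM follow the sparsity structure, the following minimal Python sketch computes y = A·x for a matrix stored in compressed sparse row (CSR) form and lists which processing elements (PEs) would have to exchange vector entries. The CSR storage format, the cyclic assignment of row i and vector entry x[i] to PE i mod p², and the names `csr_spmv` and `noc_transfers` are assumptions made for this illustration; the abstract does not specify the storage format or data mapping used in the paper.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # Nonzeros of row i occupy values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # The index into x is data-dependent: this is the irregular access.
            y[i] += values[k] * x[col_idx[k]]
    return y


def noc_transfers(col_idx, row_ptr, p):
    """List the distinct (source PE, destination PE) pairs on a p x p array.

    Assumed mapping (hypothetical): row i and x[i] both live on PE i mod p*p.
    """
    n_pe = p * p
    pairs = set()
    for i in range(len(row_ptr) - 1):
        dst = i % n_pe
        for k in range(row_ptr[i], row_ptr[i + 1]):
            src = col_idx[k] % n_pe
            if src != dst:
                pairs.add((src, dst))
    return sorted(pairs)


# Example matrix:
# [[4, 0, 0, 1],
#  [0, 3, 0, 0],
#  [2, 0, 5, 0],
#  [0, 0, 0, 6]]
values = [4.0, 1.0, 3.0, 2.0, 5.0, 6.0]
col_idx = [0, 3, 1, 0, 2, 3]
row_ptr = [0, 2, 3, 5, 6]
x = [1.0, 2.0, 3.0, 4.0]

y = csr_spmv(values, col_idx, row_ptr, x)
pairs = noc_transfers(col_idx, row_ptr, p=2)
```

For this matrix, `y` is `[8.0, 6.0, 17.0, 24.0]` and `pairs` is `[(0, 2), (3, 0)]`: only the two off-diagonal nonzeros cause traffic between PEs, and a different sparsity pattern would produce a different set of source and destination pairs.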
