FPGA-based coprocessor for matrix algorithms implementation

Matrix algorithms are important in many applications, including image and signal processing, which demand enormous computing power. A close examination of the algorithms used in these and related applications reveals that many of the fundamental operations are matrix operations such as matrix multiplication, which has complexity O(N^3) on a sequential computer and O(N^3/p) on a parallel system with p processors. This paper investigates the design and implementation of several classes of matrix algorithms (matrix operations, matrix transforms and matrix decompositions) in an FPGA-based environment. Solutions to the problem of processing large matrices are proposed. The proposed system architectures are scalable and modular, and they require less area and lower time complexity, with reduced latency, than existing structures.
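
The complexity figures quoted above can be illustrated with a minimal software sketch. It is not the paper's FPGA architecture: the function names `matmul_rows` and `partitioned_matmul`, the interleaved row partitioning, and the sequential simulation of the p workers are all assumptions made for illustration. The sketch counts multiply-accumulate (MAC) operations to show that an N x N product needs N^3 of them in total, and that splitting the rows across p workers leaves each with about N^3/p.

```python
def matmul_rows(a, b, rows):
    """Compute the selected rows of a*b; return (rows, MAC count)."""
    inner = len(b)       # shared inner dimension
    cols = len(b[0])     # number of result columns
    out = {}
    macs = 0
    for i in rows:
        row = [0] * cols
        for j in range(cols):
            for k in range(inner):
                row[j] += a[i][k] * b[k][j]  # one multiply-accumulate
                macs += 1
        out[i] = row
    return out, macs


def partitioned_matmul(a, b, p):
    """Assign rows of a to p hypothetical workers (simulated one after another).

    Returns the full product and the MAC count performed by each worker.
    """
    n = len(a)
    result = [None] * n
    per_worker = []
    for w in range(p):
        # Worker w takes rows w, w+p, w+2p, ... (an illustrative choice).
        partial, macs = matmul_rows(a, b, range(w, n, p))
        for i, row in partial.items():
            result[i] = row
        per_worker.append(macs)
    return result, per_worker


if __name__ == "__main__":
    n, p = 4, 2
    a = [[i * n + j for j in range(n)] for i in range(n)]
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    product, work = partitioned_matmul(a, identity, p)
    print(product == a)      # multiplying by the identity returns a
    print(work, sum(work))   # MACs per worker and in total
```

With N = 4 and p = 2, each simulated worker performs 32 MACs, for a total of 64 = 4^3. On real parallel hardware the workers would run concurrently, so the elapsed time scales with the per-worker count, not the total.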
