Paper information - Linear Array For Efficient Execution Of Partitioned Matrix Algorithms

We propose a class-specific linear array suited to the partitioned execution of matrix algorithms. The array achieves high efficiency, exploits pipelining within cells in a simple manner, has an off-cell communication rate lower than its computation rate, and needs only a small amount of storage per cell, whose size is independent of the problem size. The array is well suited to the MMG method, a mapping technique based on data-dependency graphs. The MMG method can realize fixed-size data and partitioned problems as algorithm-specific arrays, and can map algorithms onto class-specific arrays. The array proposed here relies on the method's mapping capabilities, which combine coalescing and cut-and-pile as partitioning strategies. Mapping is illustrated with the LU-decomposition algorithm; results obtained from mapping other algorithms are also indicated. Performance estimates of the mappings show that, for example, LU-decomposition of a 2000 by 2000 matrix on a linear array with 100 cells, two operation units per cell in a 4-stage pipeline, and a 50 ns clock period (i.e., a peak of 4000 Mflops) achieves 87% efficiency (3480 Mflops). This performance requires a communication rate among cells of only 5 Mwords/s, and a peak external I/O bandwidth for the entire array that is also 5 Mwords/s. Moreover, for a problem of this size, the use of cut-and-pile leads to storage requirements of only 8000 words per memory module.
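The performance figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes each operation unit delivers one floating-point result per clock cycle once its pipeline is full, which is what the quoted 4000 Mflops peak implies.

```python
def peak_mflops(cells, units_per_cell, clock_ns):
    """Peak rate in Mflops, assuming (not stated in the abstract) that every
    operation unit completes one floating-point operation per clock cycle."""
    clock_mhz = 1000 / clock_ns  # a 50 ns period corresponds to 20 MHz
    return cells * units_per_cell * clock_mhz


def sustained_mflops(peak, efficiency_percent):
    """Sustained rate in Mflops for a given efficiency, expressed in percent."""
    return peak * efficiency_percent / 100


def ops_per_word(per_cell_mflops, link_mwords_per_s):
    """Floating-point operations a cell can perform per word it communicates."""
    return per_cell_mflops / link_mwords_per_s


# Figures taken from the abstract: 100 cells, 2 units per cell, 50 ns clock.
peak = peak_mflops(cells=100, units_per_cell=2, clock_ns=50)
sustained = sustained_mflops(peak, efficiency_percent=87)
per_cell = peak_mflops(cells=1, units_per_cell=2, clock_ns=50)
ratio = ops_per_word(per_cell, link_mwords_per_s=5)

print(peak, sustained, per_cell, ratio)
```

Under these assumptions each cell peaks at 40 Mflops while exchanging at most 5 Mwords/s with its neighbors, so it can perform about eight operations per communicated word, consistent with the claim that the communication rate is lower than the computation rate.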

Tomás Lang | Jaime H. Moreno
