Three-dimensional computational wavefronts for matrix product

Abstract: A three-dimensional wavefront array for matrix product with a minimal block pipelining period of 1 is introduced and compared with existing systolic array architectures for matrix product. An optimal processor-time product of n^3, with a cycle defined computationally as two operations, is obtained when successive problem instances are considered. The 3-D architecture is extensible and scalable, is cycle invariant (in all respects), is node invariant (in all respects), has a minimal node complexity of one multiplication and one addition per cycle, has unidirectional and local data forwarding in three dimensions, has 100% utilization of processing elements, and has a cycle-invariant one-to-one correspondence between input/output ports and input/output matrix elements.
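The idea of a three-dimensional wavefront can be illustrated with a small software simulation. The sketch below is not the architecture from the paper: it assumes an n x n x n grid in which cell (i, j, k) performs the single multiply-add a[i][k] * b[k][j] on wavefront t = i + j + k, and that successive problem instances enter one cycle apart. The function names `wavefront_matmul` and `busy_cells` and the skew i + j + k are hypothetical choices made for illustration only.

```python
def wavefront_matmul(a, b):
    """Multiply two n x n matrices by sweeping wavefronts t = i + j + k.

    Assumption (not taken from the paper): cell (i, j, k) of an n^3 grid
    fires exactly once, on wavefront t = i + j + k, and does one
    multiply and one add.
    """
    n = len(a)
    c = [[0] * n for _ in range(n)]
    ops_per_cycle = []
    for t in range(3 * n - 2):              # wavefronts 0 .. 3n-3
        ops = 0
        for i in range(n):
            for j in range(n):
                k = t - i - j               # the cell on this wavefront, if any
                if 0 <= k < n:
                    c[i][j] += a[i][k] * b[k][j]   # partial sum moves along k
                    ops += 1
        ops_per_cycle.append(ops)
    return c, ops_per_cycle


def busy_cells(n, t, num_instances):
    """Count cells doing useful work at cycle t when instance p enters at cycle p.

    Under the assumed skew, cell (i, j, k) works on instance t - (i + j + k).
    """
    busy = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                p = t - (i + j + k)
                if 0 <= p < num_instances:
                    busy += 1
    return busy


if __name__ == "__main__":
    product, ops = wavefront_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(product)                  # the 2 x 2 product
    print(sum(ops))                 # total multiply-adds for one instance
    print(busy_cells(2, 3, 10))     # busy cells once the pipeline is full
```

For a single instance the cells are only partly busy on the first and last wavefronts. Under the stated assumptions, once instances arrive every cycle and the pipeline has filled, every one of the n^3 cells does one multiply-add per cycle, which is the sense in which a block pipelining period of 1 gives full utilization and a processor-time product of n^3 per instance.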
