HARP: An Open Architecture for Parallel Matrix and Signal Processing

This paper describes and analyzes the Hybrid Array Ring Processor (HARP) architecture. HARP is an application-specific architecture built around a host processor, shared memory, and a set of memory-mapped processing cells that are connected to both an open backplane and a bidirectional systolic ring. The architecture is analyzed through detailed simulation of a system implementation based on the Texas Instruments TMS34082 floating-point RISC processor. A bus controller is designed to provide a tightly coupled DMA function that accelerates systolic communication and supports new interleaved transparent communication and reduced-overhead message passing. The architecture is benchmarked with matrix multiplication, FFT, QR decomposition (QRD), and singular value decomposition (SVD) algorithms.
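As a rough illustration of how a ring of processing cells can carry out one of the benchmarked kernels, the sketch below simulates a generic ring matrix multiplication in plain Python. It is not the HARP mapping, which the abstract does not specify. The names `split_indices` and `ring_matmul` and the block layout are assumptions made for this example, only one direction of the ring is modeled, and the host, shared memory, and DMA bus controller are ignored.

```python
def split_indices(n, parts):
    """Split range(n) into `parts` contiguous, nearly equal index lists."""
    base, extra = divmod(n, parts)
    blocks, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


def ring_matmul(A, B, num_cells):
    """Compute C = A * B on a simulated one-directional ring (hypothetical layout).

    Cell p permanently holds row block p of A. Column blocks of B circulate
    around the ring, so after num_cells steps every cell has seen every block.
    """
    n, k, m = len(A), len(B), len(B[0])
    row_blocks = split_indices(n, num_cells)
    col_blocks = split_indices(m, num_cells)
    C = [[0.0] * m for _ in range(n)]

    # held[p] is the index of the column block of B currently stored in cell p.
    held = list(range(num_cells))
    for _ in range(num_cells):
        # Compute phase: each cell multiplies its rows of A by the block it holds.
        for p in range(num_cells):
            for i in row_blocks[p]:
                for j in col_blocks[held[p]]:
                    C[i][j] = sum(A[i][t] * B[t][j] for t in range(k))
        # Communication phase: each cell receives the block of its right neighbor.
        held = held[1:] + held[:1]
    return C


A = [[1, 2], [3, 4], [5, 6]]
B = [[1, 0, 2], [0, 1, 3]]
C = ring_matmul(A, B, num_cells=3)
```

In this sketch the compute and communication phases run one after the other. The interleaved transparent communication described in the abstract would presumably overlap the two, which a sequential simulation like this one does not capture.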
