Simulation and performance evaluation of a modularly configurable attached processor

This paper studies a new architecture for high-performance parallel attached processors, the modularly configurable attached processor (MCAP). Its distinguishing features are that the attached processor can be configured to match a set of algorithms, and that its memory controllers can be programmed to fit the access patterns those algorithms require. As a result, high utilization of the processing logic can be obtained for a given set of algorithms. A simulator with an interactive graphical interface was designed to study the performance of the proposed architecture, and matrix multiplication is used as an illustrative example. The simulation results show that the proposed architecture can sustain an execution rate as high as 95% of peak speed for matrices of size 128 × 128. If the MCAP architecture is implemented in CMOS technology, a sustained speed of 190 MFLOPS can be obtained for matrix multiplication with four multipliers and four adders.
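
The figures quoted above can be related with some back-of-the-envelope arithmetic. The sketch below is not taken from the paper. It assumes that the 95% utilization and the 190 MFLOPS figure describe the same configuration, that every multiplier and adder delivers one result per cycle at peak, and that an n × n matrix product is counted as 2n³ floating-point operations. All variable names are illustrative.

```python
# Rough consistency check of the figures quoted in the abstract.
# Values taken from the abstract:
SUSTAINED_MFLOPS = 190.0    # sustained speed for matrix multiplication
UTILIZATION = 0.95          # sustained rate as a fraction of peak
N = 128                     # matrix dimension (128 x 128)
FUNCTIONAL_UNITS = 4 + 4    # four multipliers plus four adders

# ASSUMPTION: the 190 MFLOPS and 95% figures refer to the same configuration.
implied_peak_mflops = SUSTAINED_MFLOPS / UTILIZATION

# ASSUMPTION: at peak, every functional unit produces one result per cycle,
# so the peak rate divides evenly among the units.
implied_per_unit_mflops = implied_peak_mflops / FUNCTIONAL_UNITS

# ASSUMPTION: conventional operation count for a dense n x n product,
# n^3 multiplications plus n^3 additions.
total_flops = 2 * N**3

# Time to multiply two 128 x 128 matrices at the sustained rate.
runtime_ms = total_flops / (SUSTAINED_MFLOPS * 1e6) * 1e3

print(f"implied peak:       {implied_peak_mflops:.1f} MFLOPS")
print(f"implied per unit:   {implied_per_unit_mflops:.1f} MFLOPS")
print(f"operations:         {total_flops}")
print(f"estimated runtime:  {runtime_ms:.2f} ms")
```

Under these assumptions the implied peak is about 200 MFLOPS, or about 25 MFLOPS per functional unit, and one 128 × 128 product would take roughly 22 ms at the sustained rate.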
