Architectural support for block transfers in a shared-memory multiprocessor

This paper examines how the performance of a shared-memory multiprocessor can be improved by including hardware support for block transfers. A system similar to the Hector multiprocessor developed at the University of Toronto is used as a base architecture. It is shown that such hardware support can improve the performance of initialization code by as much as 50%, but that the amount of improvement depends on the memory access behavior of the program and the way in which the operating system issues block transfer requests.