The effects of thread placement on the KSR1

This paper describes a measurement study of the effects of thread placement on memory access times on the Kendall Square KSR1 multiprocessor. The KSR1 uses a conventional shared-memory programming model on a distributed-memory architecture based on a ring of rings of 64-bit superscalar microprocessors. Memory consists of local cache memories attached to each processor and is managed in a cache-only memory architecture (COMA) fashion. Experiments run on the KSR1 across a variety of thread configurations show that shared memory access is accelerated through strategic placement of threads that share data. The experiments "stress test" the automatic prefetching feature of the hardware. Strategies are proposed to keep KSR1 memory access times nearly constant even as the number of participating threads increases.