Measuring the Effects of Thread Placement on the Kendall Square KSR1

This paper describes a measurement study of the effects of thread placement on memory access times on the Kendall Square multiprocessor, the KSR1. The KSR1 presents a conventional shared-memory programming model on a distributed-memory architecture. The architecture is based on a ring of rings of 64-bit superscalar microprocessors. The KSR1 has a Cache-Only Memory Architecture (COMA): memory consists of the local cache memories attached to each processor. Whenever a processor accesses an address, the data item is automatically copied to that processor's local cache memory module, so that access times for subsequent references are minimal. Experiments run on the KSR1 across a wide variety of thread configurations show that shared-memory access is accelerated through strategic placement of threads that share data. The results indicate strategies for improving the performance of application programs, and illustrate that KSR1 memory access times can remain nearly constant even when the number of participating threads increases.