Load Balancing vs. Locality Management in Shared-Memory Multiprocessors

A parallel application executes most efficiently when its workload is evenly distributed among the available processors and its processes are located close to their data. The goals of load balancing and locality management often conflict, however, and many existing systems resolve the conflict in favor of load balancing. In this paper we use both experimentation and simulation to investigate the relationship between load balancing and locality management in shared-memory multiprocessors. Our experiments with applications on the BBN Butterfly multiprocessor show that, although both techniques can improve application performance, locality management should in many cases be done before load balancing. Our simulations show that even when process completion times vary significantly, or when the remote-to-local access time ratio is small, maximizing locality first and balancing load second usually yields the best performance. We conclude that the scheduler should not be oblivious to the location of data, as it is in the popular central work queue model; instead, the location of data should be the primary factor influencing the initial assignment of processes to processors.
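
The contrast between a location-oblivious central work queue and a locality-first assignment can be illustrated with a toy cost model. The sketch below is not the simulator used in this paper; the cost model (a task costs its work when run on its data's home processor and work times a remote-to-local ratio elsewhere), the function names, the task list, and the ratios are all assumptions chosen for illustration.

```python
def makespan(assignment, tasks, remote_ratio):
    """Finish time of the slowest processor under the assumed cost model.

    tasks is a list of (home_processor, work) pairs; assignment[i] is the
    processor that runs task i.
    """
    finish = {}
    for (home, work), proc in zip(tasks, assignment):
        cost = work if proc == home else work * remote_ratio
        finish[proc] = finish.get(proc, 0) + cost
    return max(finish.values())


def central_queue(tasks, n_procs, remote_ratio):
    """Location-oblivious policy: each task goes to the processor that is free first."""
    free_at = [0.0] * n_procs
    assignment = []
    for home, work in tasks:
        proc = min(range(n_procs), key=lambda p: free_at[p])
        free_at[proc] += work if proc == home else work * remote_ratio
        assignment.append(proc)
    return assignment


def locality_first(tasks, n_procs, remote_ratio):
    """Place each task at its data's home, then balance load second:
    move a task only when the move strictly shortens the makespan."""
    assignment = [home for home, _ in tasks]
    best = makespan(assignment, tasks, remote_ratio)
    improved = True
    while improved:
        improved = False
        for i in range(len(tasks)):
            for proc in range(n_procs):
                trial = assignment[:i] + [proc] + assignment[i + 1:]
                cost = makespan(trial, tasks, remote_ratio)
                if cost < best:
                    assignment, best, improved = trial, cost, True
    return assignment


# Hypothetical workload: 8 equal tasks on 4 processors, with processor 0
# holding the data for three tasks and processor 3 for only one.
TASKS = [(1, 10), (2, 10), (3, 10), (0, 10), (1, 10), (2, 10), (0, 10), (0, 10)]

if __name__ == "__main__":
    for ratio in (4, 1.5):  # assumed remote-to-local access time ratios
        queue = makespan(central_queue(TASKS, 4, ratio), TASKS, ratio)
        local = makespan(locality_first(TASKS, 4, ratio), TASKS, ratio)
        print(f"ratio={ratio}: central queue={queue}, locality first={local}")
```

On this deliberately unfavorable workload, with a ratio of 4 the central queue runs every task remotely and finishes at 80 time units, while the locality-first assignment finishes at 30 and declines to move any task. With a ratio of 1.5 the balancing step moves one task off the overloaded processor and finishes at 25. Other workloads can favor the central queue, so the numbers only illustrate the mechanism, not the paper's results.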