Locality Optimizations for Parallel Machines

This paper focuses on the problem of locality optimizations for high-performance uniprocessor and multiprocessor systems. It shows that the problems of minimizing interprocessor communication and optimizing cache locality can be formulated in a similar manner. It outlines the algorithms to optimize for the various levels of the memory hierarchy simultaneously.