Memory hierarchy management for iterative graph structures

The increasing gap in processor and memory speeds has forced microprocessors to rely on deep cache hierarchies to keep the processors from starving for data. For many applications, this results in a wide disparity between sustained and peak achievable speed. Applications need to be tuned to processor and memory system architectures for cache locality, memory layout and data prefetch and reuse. In this paper we investigate optimizations for unstructured iterative applications in which the computational structure remains static or changes only slightly through iterations. Our methods reorganize the data elements to obtain better memory system performance without modifying code fragments. Our experimental results show that the overall time can be reduced significantly using our optimizations. Further, the overhead of our methods is small enough that they are applicable even if the computational structure does nor substantially change for tens of iterations.