The widening gap between processor and memory speeds has forced microprocessors to rely on deep cache hierarchies to keep processors from starving for data. For many applications, this results in a wide disparity between sustained and peak achievable speed. Applications must therefore be tuned to the processor and memory system architecture for cache locality, memory layout, and data prefetch and reuse. In this paper we investigate optimizations for unstructured iterative applications in which the computational structure remains static or changes only slightly across iterations. Our methods reorganize the data elements to obtain better memory system performance without modifying code fragments. Our experimental results show that the overall execution time can be reduced significantly using our optimizations. Further, the overhead of our methods is small enough that they are applicable even when the computational structure remains substantially unchanged for only tens of iterations.
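To make the idea of reorganizing data while leaving the computation untouched concrete, the following is a minimal sketch and not the method evaluated in this paper. The abstract does not name a specific reordering, so the sketch assumes a breadth-first renumbering of graph nodes, which is one common way to place neighboring elements close together in memory. The names `bfs_order`, `remap`, and `edge_sweep` are hypothetical and chosen only for illustration.

```python
from collections import deque


def bfs_order(num_nodes, edges):
    """Return `order`, where order[k] is the old index of the node
    placed at new position k (assumed breadth-first renumbering)."""
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = [False] * num_nodes
    order = []
    for start in range(num_nodes):  # also covers disconnected components
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
    return order


def remap(order, values, edges):
    """Permute node data and rewrite edge endpoints to the new numbering."""
    new_index = [0] * len(order)
    for new, old in enumerate(order):
        new_index[old] = new
    new_values = [values[old] for old in order]
    # Sorting the edges makes the sweep walk memory roughly in order.
    new_edges = sorted((new_index[u], new_index[v]) for u, v in edges)
    return new_values, new_edges


def edge_sweep(values, edges):
    """Illustrative kernel, identical before and after reordering:
    accumulate neighbor values over every edge."""
    acc = [0.0] * len(values)
    for u, v in edges:
        acc[u] += values[v]
        acc[v] += values[u]
    return acc
```

In this sketch only the inputs to `edge_sweep` change; the kernel itself is not edited. The reordering cost is paid once and amortized over every iteration in which the edge list stays the same, which is why a structure that is stable for a modest number of iterations can still benefit.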