Improving cache performance in dynamic applications through data and computation reorganization at run time

With the rapid improvement of processor speed, performance of the memory hierarchy has become the principal bottleneck for most applications. A number of compiler transformations have been developed to improve data reuse in cache and registers, thus reducing the total number of direct memory accesses in a program. Until now, however, most data reuse transformations have been static---applied only at compile time. As a result, these transformations cannot be used to optimize irregular and dynamic applications, in which the data layout and data access patterns remain unknown until run time and may even change during the computation.In this paper, we explore ways to achieve better data reuse in irregular and dynamic applications by building on the inspector-executor method used by Saltz for run-time parallelization. In particular, we present and evaluate a dynamic approach for improving both computation and data locality in irregular programs. Our results demonstrate that run-time program transformations can substantially improve computation and data locality and, despite the complexity and cost involved, a compiler can automate such transformations, eliminating much of the associated run-time overhead.

[1]  Chandra Krintz,et al.  Cache-conscious data placement , 1998, ASPLOS VIII.

[2]  Wei Li,et al.  Unifying data and control transformations for distributed shared-memory machines , 1995, PLDI '95.

[3]  Ken Kennedy,et al.  Improving register allocation for subscripted variables , 1990, PLDI '90.

[4]  Ken Kennedy,et al.  An Implementation of Interprocedural Bounded Regular Section Analysis , 1991, IEEE Trans. Parallel Distributed Syst..

[5]  Santosh G. Abraham,et al.  Data and program restructuring of irregular applications for cache-coherent multiprocessor , 1994, ICS '94.

[6]  Ken Kennedy,et al.  Improving memory hierarchy performance for irregular applications , 1999, ICS '99.

[7]  Chau-Wen Tseng,et al.  Improving compiler and run-time support for adaptive irregular codes , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).

[8]  Nenad Nedeljkovic,et al.  Data distribution support on distributed shared memory multiprocessors , 1997, PLDI '97.

[9]  David G. Kirkpatrick,et al.  On the completeness of a generalized matching problem , 1978, STOC.

[10]  Joel H. Saltz,et al.  ICASE Report No . 92-12 / iVG / / ff 3 J / ICASE THE DESIGN AND IMPLEMENTATION OF A PARALLEL UNSTRUCTURED EULER SOLVER USING SOFTWARE PRIMITIVES , 2022 .

[11]  Joel H. Saltz,et al.  Communication Optimizations for Irregular Scientific Computations on Distributed Memory Architectures , 1994, J. Parallel Distributed Comput..

[12]  Susan J. Eggers,et al.  Reducing false sharing on shared memory multiprocessors through compile time data transformations , 1995, PPOPP '95.

[13]  Erik Brunvand,et al.  Memory System Support for Irregular Applications , 1998, LCR.

[14]  Sanjay Ranka,et al.  Memory hierarchy management for iterative graph structures , 1998, Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing.

[15]  Khalid Omar Thabit,et al.  Cache management by the compiler , 1982 .

[16]  Matthew L. Seidl,et al.  Segregating heap objects by reference behavior and lifetime , 1998, ASPLOS VIII.

[17]  Duncan H. Lawrie,et al.  On the Performance Enhancement of Paging Systems Through Program Analysis and Transformations , 1981, IEEE Transactions on Computers.

[18]  Chau-Wen Tseng,et al.  Improving data locality with loop transformations , 1996, TOPL.

[19]  Monica S. Lam,et al.  Data and computation transformations for multiprocessors , 1995, PPOPP '95.

[20]  Monica S. Lam,et al.  A data locality optimizing algorithm , 1991, PLDI '91.