Compiler-based prefetching for recursive data structures

Software-controlled data prefetching offers the potential for bridging the ever-increasing speed gap between the memory subsystem and today's high-performance processors. While prefetching has enjoyed considerable success in array-based numeric codes, its potential in pointer-based applications has remained largely unexplored. This paper investigates compiler-based prefetching for pointer-based applications, in particular those containing recursive data structures. We identify the fundamental problem in prefetching pointer-based data structures and propose a guideline for devising successful prefetching schemes. Based on this guideline, we design three prefetching schemes, automate the most widely applicable one (greedy prefetching) in an optimizing research compiler, and evaluate the performance of all three on a modern superscalar processor similar to the MIPS R10000. Our results demonstrate that compiler-inserted prefetching can significantly improve the execution speed of pointer-based codes, by as much as 45% for the applications we study. In addition, the more sophisticated algorithms (which we currently apply by hand, but which might be implemented in future compilers) can improve performance by as much as twofold. Compared with the only other compiler-based pointer prefetching scheme in the literature, our algorithms offer substantially better performance by avoiding unnecessary overhead and hiding more latency.
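The abstract names greedy prefetching as the scheme automated in the compiler. As a rough, hand-written illustration only (not the paper's compiler output), the C sketch below shows what a greedily prefetched recursive tree traversal might look like. On reaching a node, it prefetches every non-null child pointer before doing that node's work. As a result, the fetch of the right child can overlap with the traversal of the left subtree. Because a child's address becomes known only after its parent has been loaded, this scheme can look ahead by at most one level of the structure. The node type and helper names (`node_t`, `build_tree`, `sum_tree_greedy`, `free_tree`) are hypothetical. `__builtin_prefetch` is the GCC/Clang builtin, used here as an assumed stand-in for whatever prefetch instruction the target provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical binary tree node: a simple recursive data structure. */
typedef struct node {
    long value;
    struct node *left;
    struct node *right;
} node_t;

/* Build a complete binary tree of the given depth, numbering nodes in
   pre-order starting from *next_value. Depth 0 yields an empty tree. */
static node_t *build_tree(int depth, long *next_value)
{
    assert(depth >= 0);
    if (depth == 0)
        return NULL;
    node_t *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->value = (*next_value)++;
    n->left = build_tree(depth - 1, next_value);
    n->right = build_tree(depth - 1, next_value);
    return n;
}

/* Greedy prefetching (sketch): on reaching a node, issue prefetches for all
   of its children first, then do the node's work and recurse. The right
   child's prefetch is in flight while the left subtree is traversed. */
static long sum_tree_greedy(const node_t *n)
{
    if (n == NULL)
        return 0;
    /* Prefetch for reading (rw = 0) with high temporal locality (3);
       null children are skipped so only real nodes are requested. */
    if (n->left != NULL)
        __builtin_prefetch(n->left, 0, 3);
    if (n->right != NULL)
        __builtin_prefetch(n->right, 0, 3);

    long sum = n->value;                 /* work on the current node */
    sum += sum_tree_greedy(n->left);     /* left subtree first ... */
    sum += sum_tree_greedy(n->right);    /* ... right child likely cached */
    return sum;
}

/* Release every node of the tree (post-order). */
static void free_tree(node_t *n)
{
    if (n == NULL)
        return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}
```

Calling `build_tree(4, &next)` with `next` initialized to 1 produces a 15-node tree numbered 1 to 15 in pre-order, so `sum_tree_greedy` returns 120. The prefetches affect only timing, never the result.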
