A self-repairing prefetcher in an event-driven dynamic optimization framework

Software prefetching has been demonstrated as a powerful technique to tolerate long load latencies. However, to be effective, prefetching must target the most critical (frequently missing) loads, and prefetch them sufficiently far in advance. This is difficult to do correctly with a static optimizer, because locality characteristics and cache latencies vary across data inputs and across different machines. This paper presents a mechanism that dynamically inserts prefetch instructions into frequently executed hot traces. Hot traces are dynamically analyzed to identify delinquent loads and the appropriate prefetch distance for those loads. Those prefetches are then inserted into the hot trace. The low overhead of the event-driven dynamic optimization system allows the optimizer to continuously monitor the performance of the software prefetches. This is done to find an accurate and stable prefetch distance and to adapt to changes in program behavior using what we call self-repairing prefetching. Relative to the baseline hardware stride prefetching, we find a total 23% improvement when we use the self-repairing mechanism to adoptively discover the best prefetch distance for each load, which is 12% better performance than dynamic prefetching techniques without adaptive repairing.

[1]  Donald Yeung,et al.  Design and evaluation of compiler algorithms for pre-execution , 2002, ASPLOS X.

[2]  Norman P. Jouppi,et al.  Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[3]  MoshovosAndreas,et al.  Dependence based prefetching for linked data structures , 1998 .

[4]  T. Sherwood,et al.  Predictor-directed stream buffers , 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.

[5]  John Paul Shen,et al.  Speculative precomputation: long-range prefetching of delinquent loads , 2001, Proceedings 28th Annual International Symposium on Computer Architecture.

[6]  Martin Hirzel,et al.  Dynamic hot data stream prefetching for general-purpose programs , 2002, PLDI '02.

[7]  Toshio Nakatani,et al.  Stride prefetching by dynamically inspecting objects , 2003, PLDI '03.

[8]  Anoop Gupta,et al.  Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.

[9]  G. Sohi,et al.  Effective jump-pointer prefetching for linked data structures , 1999, Proceedings of the 26th International Symposium on Computer Architecture (Cat. No.99CB36367).

[10]  Youfeng Wu,et al.  Efficient discovery of regular stride patterns in irregular programs and its use in compiler prefetching , 2002, PLDI '02.

[11]  Weifeng Zhang,et al.  An event-driven multithreaded dynamic optimization framework , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).

[12]  Martin Burtscher,et al.  Future execution: a hardware prefetching technique for chip multiprocessors , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).

[13]  Chi-Keung Luk,et al.  Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors , 2001, Proceedings 28th Annual International Symposium on Computer Architecture.

[14]  Norman P. Jouppi,et al.  Memory-System Design Considerations for Dynamically-Scheduled Processors , 1997, ISCA.

[15]  John Paul Shen,et al.  Dynamic speculative precomputation , 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.

[16]  Brad Calder,et al.  Discovering and Exploiting Program Phases , 2003, IEEE Micro.

[17]  Josep Torrellas,et al.  Speeding up irregular applications in shared-memory multiprocessors: memory binding and group prefetching , 1995, ISCA.

[18]  Wei-Chung Hsu,et al.  Design and Implementation of a Lightweight Dynamic Optimization System , 2004, J. Instr. Level Parallelism.

[19]  Yoji Yamada,et al.  Data relocation and prefetching for programs with large data sets , 1994, Proceedings of MICRO-27. The 27th Annual IEEE/ACM International Symposium on Microarchitecture.

[20]  James E. Smith,et al.  Prefetching in supercomputer instruction caches , 1992, Proceedings Supercomputing '92.

[21]  Dean M. Tullsen,et al.  Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[22]  Brad Calder,et al.  Basic block distribution analysis to find periodic behavior and simulation points in applications , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.

[23]  Daeyeon Park,et al.  Improving the effectiveness of software prefetching with adaptive executions , 1996, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Technique.

[24]  Kathryn S. McKinley,et al.  Data flow analysis for software prefetching linked data structures in Java , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.

[25]  Todd C. Mowry,et al.  Compiler-based prefetching for recursive data structures , 1996, ASPLOS VII.

[26]  Dionisios N. Pnevmatikatos,et al.  Slice-processors: an implementation of operation-based prediction , 2001, ICS '01.

[27]  Gurindar S. Sohi,et al.  Speculative data-driven multithreading , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[28]  Dean M. Tullsen,et al.  Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[29]  Andreas Moshovos,et al.  Dependence based prefetching for linked data structures , 1998, ASPLOS VIII.

[30]  Wei-Chung Hsu,et al.  Continuous Adaptive Object-Code Re-optimization Framework , 2004, Asia-Pacific Computer Systems Architecture Conference.

[31]  Ken Kennedy,et al.  Software prefetching , 1991, ASPLOS IV.

[32]  Jignesh M. Patel,et al.  Data prefetching by dependence graph precomputation , 2001, Proceedings 28th Annual International Symposium on Computer Architecture.

[33]  Weng-Fai Wong,et al.  Compiler orchestrated prefetching via speculation and predication , 2004, ASPLOS XI.