Exploring Speculative Techniques to Improve the Memory System Performance

As processor clock speeds have increased along with microarchitectural innovations, the gap between processor and memory performance has become a greater bottleneck, and improvements in memory system design have become more important. This dissertation focuses on improving memory performance by adding novel functionality to the memory system. Specifically, we propose two techniques to hide memory access latency: Incorrect Speculation and Address Correlation. Both techniques, while based on different ideas, try to reduce or eliminate data cache misses, either by prefetching or by data forwarding. Prefetching and data forwarding by correlated addresses are complementary: the former tries to bring data into the cache before the processor requests it, while the latter, on a miss to a requested address, tries to forward data already residing in the cache at one or more correlated addresses. Speculative execution of threads in a multithreaded architecture, together with the branch prediction used in each thread execution unit, allows many instructions to be executed speculatively, that is, before it is known whether the program will actually need them. We have examined how load instructions executed on what turn out to be incorrectly executed program paths affect memory system performance. We have
