Understanding and Designing for Dependent Store/Load Pairs in High Performance Microprocessors

[1]  David A. Patterson,et al.  Computer Architecture: A Quantitative Approach , 1990 .

[2]  Mike Johnson,et al.  Superscalar microprocessor design , 1991, Prentice Hall series in innovative technology.

[3]  Brian N. Bershad,et al.  Consistency management for virtually indexed caches , 1992, ASPLOS V.

[4]  Brian K. Bray.  Specialized Caches To Improve Data Access Performance , 1993 .

[5]  Norman P. Jouppi.  Cache write policies and performance , 1993, ISCA '93.

[6]  S. McFarling.  Combining Branch Predictors , 1993 .

[7]  Dirk Grunwald,et al.  Quantifying Behavioral Differences Between C and C++ Programs , 1994 .

[8]  David M. Gallagher,et al.  Dynamic Memory Disambiguation Using the Memory Conflict Buffer , 1994 .

[9]  Sally A. McKee,et al.  Experimental implementation of dynamic access ordering , 1994, 1994 Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences.

[10]  Sally A. McKee,et al.  Dynamic Access Ordering: Bounds on Memory Bandwidth , 1994 .

[11]  David L Weaver,et al.  The SPARC architecture manual : version 9 , 1994 .

[12]  David Keppel,et al.  Shade: a fast instruction-set simulator for execution profiling , 1994, SIGMETRICS.

[13]  Exploiting short-lived variables in superscalar processors , 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.

[14]  Zero-cycle loads: microarchitecture support for reducing load latency , 1995, MICRO.

[15]  Sally A. McKee,et al.  Access ordering and memory-conscious cache utilization , 1995, Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture.

[16]  Gary S. Tyson,et al.  A modified approach to data cache management , 1995, MICRO 1995.

[17]  Gurindar S. Sohi,et al.  ARB: A Hardware Mechanism for Dynamic Reordering of Memory References , 1996, IEEE Trans. Computers.

[18]  James R. Goodman,et al.  Memory Bandwidth Limitations of Future Microprocessors , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[19]  Eric Rotenberg,et al.  Trace cache: a low latency approach to high bandwidth instruction fetching , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[20]  Edward S. Davidson,et al.  Reducing conflicts in direct-mapped caches with a temporality-based design , 1996, Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing.

[21]  Karel Driesen,et al.  The direct cost of virtual function calls in C++ , 1996, OOPSLA '96.

[22]  Harvey G. Cragon,et al.  Memory systems and pipelined processors , 1996 .

[23]  Mikko H. Lipasti,et al.  Exceeding the dataflow limit via value prediction , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[24]  Shlomit S. Pinter,et al.  Tango: a hardware-based data prefetching technique for superscalar processors , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[25]  Mikko H. Lipasti,et al.  Value locality and load value prediction , 1996, ASPLOS VII.

[26]  Yale N. Patt,et al.  Alternative fetch and issue policies for the trace cache fetch mechanism , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[27]  Nigel P. Topham,et al.  A comparison of data prefetching on an access decoupled and superscalar machine , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[28]  Gary S. Tyson,et al.  Improving the accuracy and performance of memory communication through renaming , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[29]  José González,et al.  Speculative execution via address prediction and data prefetching , 1997, ICS '97.

[30]  Mark J. Charney,et al.  Prefetching and memory system behavior of the SPEC95 benchmark suite , 1997, IBM J. Res. Dev.

[31]  Sanjay J. Patel,et al.  Critical Issues Regarding the Trace Cache Fetch Mechanism , 1997 .

[32]  Avi Mendelson,et al.  Can program profiling support value prediction? , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[33]  James E. Smith,et al.  Path-based next trace prediction , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[34]  Todd M. Austin,et al.  The SimpleScalar tool set, version 2.0 , 1997, CARN.

[35]  Wen-mei W. Hwu,et al.  Run-Time Adaptive Cache Hierarchy Management via Reference Analysis , 1997, ISCA.

[36]  Christoforos E. Kozyrakis,et al.  A case for intelligent RAM , 1997, IEEE Micro.

[37]  Andreas Moshovos,et al.  Streamlining inter-operation memory communication via data dependence prediction , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[38]  Kevin Skadron,et al.  Design issues and tradeoffs for write buffers , 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.

[39]  James E. Smith,et al.  The predictability of data values , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[40]  Richard E. Kessler,et al.  The Alpha 21264 microprocessor architecture , 1998, Proceedings International Conference on Computer Design. VLSI in Computers and Processors (Cat. No.98CB36273).

[41]  Al Davis,et al.  Improving I/O performance with a conditional store buffer , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[42]  Alvin R. Lebeck,et al.  Exploiting Load Latency Tolerance in Dynamically Scheduled Processors , 1998 .

[43]  F. Gabbay,et al.  The effect of instruction fetch bandwidth on value prediction , 1998, Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235).

[44]  Sanjay J. Patel,et al.  Improving trace cache effectiveness with branch promotion and trace packing , 1998, Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235).

[45]  Stéphan Jourdan,et al.  A novel renaming scheme to exploit value temporal locality through physical register reuse and unification , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[46]  Yale N. Patt,et al.  Putting the fill unit to work: dynamic optimizations for trace cache microprocessors , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[47]  José González,et al.  The potential of data value speculation to boost ILP , 1998, ICS '98.

[48]  Bruce D. Shriver,et al.  The anatomy of a high-performance microprocessor - a systems perspective , 1998 .

[49]  Improving Memory Access Performance Using a Code Coalescing Unit , 1998 .

[50]  Gary S. Tyson,et al.  Classifying load and store instructions for memory renaming , 1999, ICS '99.

[51]  Lizy Kurian John,et al.  Accurately modeling speculative instruction fetching in trace-driven simulation , 1999, 1999 IEEE International Performance, Computing and Communications Conference (Cat. No.99CH36305).

[52]  Vikas Agarwal,et al.  Clock rate versus IPC: the end of the road for conventional microarchitectures , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[53]  Lizy Kurian John,et al.  Issues in the design of store buffers in dynamically scheduled processors , 2000, 2000 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS (Cat. No.00EX422).