Paper information - Understanding and Designing for Dependent Store/Load Pairs in High Performance Microprocessors - 字舞流文

Understanding and Designing for Dependent Store/Load Pairs in High Performance Microprocessors

L. John | Ravi Bhargava

[1] David A. Patterson, et al. Computer Architecture: A Quantitative Approach, 1969.

[2] Mike Johnson, et al. Superscalar microprocessor design, 1991, Prentice Hall series in innovative technology.

[3] Brian N. Bershad, et al. Consistency management for virtually indexed caches, 1992, ASPLOS V.

[4] Brian K. Bray. Specialized Caches To Improve Data Access Performance, 1993.

[5] Norman P. Jouppi. Cache write policies and performance, 1993, ISCA '93.

[6] S. McFarling. Combining Branch Predictors, 1993.

[7] Dirk Grunwald, et al. Quantifying Behavioral Differences Between C and C++ Programs, 1994.

[8] David M. Gallagher, et al. Dynamic Memory Disambiguation Using the Memory Conflict Buffer, 1994.

[9] Sally A. McKee, et al. Experimental implementation of dynamic access ordering, 1994, 1994 Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences.

[10] Sally A. McKee, et al. Dynamic Access Ordering: Bounds on Memory Bandwidth, 1994.

[11] David L Weaver, et al. The SPARC architecture manual: version 9, 1994.

[12] David Keppel, et al. Shade: a fast instruction-set simulator for execution profiling, 1994, SIGMETRICS.

[13] Exploiting short-lived variables in superscalar processors, 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.

[14] Zero-cycle loads: microarchitecture support for reducing load latency, 1995, MICRO.

[15] Sally A. McKee, et al. Access ordering and memory-conscious cache utilization, 1995, Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture.

[16] Gary S. Tyson, et al. A modified approach to data cache management, 1995, MICRO 1995.

[17] Gurindar S. Sohi, et al. ARB: A Hardware Mechanism for Dynamic Reordering of Memory References, 1996, IEEE Trans. Computers.

[18] James R. Goodman, et al. Memory Bandwidth Limitations of Future Microprocessors, 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[19] Eric Rotenberg, et al. Trace cache: a low latency approach to high bandwidth instruction fetching, 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[20] Edward S. Davidson, et al. Reducing conflicts in direct-mapped caches with a temporality-based design, 1996, Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing.

[21] Karel Driesen, et al. The direct cost of virtual function calls in C++, 1996, OOPSLA '96.

[22] Harvey G. Cragon, et al. Memory systems and pipelined processors, 1996.

[23] Mikko H. Lipasti, et al. Exceeding the dataflow limit via value prediction, 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[24] Shlomit S. Pinter, et al. Tango: a hardware-based data prefetching technique for superscalar processors, 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[25] Mikko H. Lipasti, et al. Value locality and load value prediction, 1996, ASPLOS VII.

[26] Yale N. Patt, et al. Alternative fetch and issue policies for the trace cache fetch mechanism, 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[27] Nigel P. Topham, et al. A comparison of data prefetching on an access decoupled and superscalar machine, 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[28] Gary S. Tyson, et al. Improving the accuracy and performance of memory communication through renaming, 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[29] José González, et al. Speculative execution via address prediction and data prefetching, 1997, ICS '97.

[30] Mark J. Charney, et al. Prefetching and memory system behavior of the SPEC95 benchmark suite, 1997, IBM J. Res. Dev.

[31] Sanjay J. Patel, et al. Critical Issues Regarding the Trace Cache Fetch Mechanism, 1997.

[32] Avi Mendelson, et al. Can program profiling support value prediction?, 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[33] James E. Smith, et al. Path-based next trace prediction, 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[34] Todd M. Austin, et al. The SimpleScalar tool set, version 2.0, 1997, CARN.

[35] Wen-mei W. Hwu, et al. Run-Time Adaptive Cache Hierarchy Management via Reference Analysis, 1997, ISCA.

[36] Christoforos E. Kozyrakis, et al. A case for intelligent RAM, 1997, IEEE Micro.

[37] Andreas Moshovos, et al. Streamlining inter-operation memory communication via data dependence prediction, 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[38] Kevin Skadron, et al. Design issues and tradeoffs for write buffers, 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.

[39] James E. Smith, et al. The predictability of data values, 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[40] Richard E. Kessler, et al. The Alpha 21264 microprocessor architecture, 1998, Proceedings International Conference on Computer Design. VLSI in Computers and Processors (Cat. No.98CB36273).

[41] Al Davis, et al. Improving I/O performance with a conditional store buffer, 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[42] Alvin R. Lebeck, et al. Exploiting Load Latency Tolerance in Dynamically Scheduled Processors, 1998.

[43] F. Gabbay, et al. The effect of instruction fetch bandwidth on value prediction, 1998, Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235).

[44] Sanjay J. Patel, et al. Improving trace cache effectiveness with branch promotion and trace packing, 1998, Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235).

[45] Stéphan Jourdan, et al. A novel renaming scheme to exploit value temporal locality through physical register reuse and unification, 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[46] Yale N. Patt, et al. Putting the fill unit to work: dynamic optimizations for trace cache microprocessors, 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[47] José González, et al. The potential of data value speculation to boost ILP, 1998, ICS '98.

[48] Bruce D. Shriver, et al. The anatomy of a high-performance microprocessor - a systems perspective, 1998.

[49] Improving Memory Access Performance Using a Code Coalescing Unit, 1998.

[50] Gary S. Tyson, et al. Classifying load and store instructions for memory renaming, 1999, ICS '99.

[51] Lizy Kurian John, et al. Accurately modeling speculative instruction fetching in trace-driven simulation, 1999, 1999 IEEE International Performance, Computing and Communications Conference (Cat. No.99CH36305).

[52] Vikas Agarwal, et al. Clock rate versus IPC: the end of the road for conventional microarchitectures, 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[53] Lizy Kurian John, et al. Issues in the design of store buffers in dynamically scheduled processors, 2000, 2000 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS (Cat. No.00EX422).