Runahead execution: an alternative to very large instruction windows for out-of-order processors
Onur Mutlu | Yale N. Patt | Chris Wilkerson | Jared Stark
[1] Jean-Loup Baer,et al. An effective on-chip preloading scheme to reduce data access penalty , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[2] John Paul Shen,et al. Dynamic speculative precomputation , 2001, MICRO.
[3] Anoop Gupta,et al. Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.
[4] David Kroft,et al. Lockup-free instruction fetch/prefetch cache organization , 1998, ISCA '81.
[5] Douglas J. Joseph,et al. Prefetching Using Markov Predictors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[6] Joel S. Emer,et al. Memory dependence prediction using store sets , 1998, Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235).
[7] José González,et al. Dual path instruction processing , 2002, ICS '02.
[8] Trevor N. Mudge,et al. Improving data cache performance by pre-executing instructions under a cache miss , 1997, International Conference on Supercomputing.
[9] Eric Sprangle,et al. Increasing processor performance by implementing deeper pipelines , 2002, ISCA.
[10] Trevor Mudge,et al. Improving processor performance by dynamically pre-processing the instruction stream , 1998 .
[11] Ken Kennedy,et al. Software prefetching , 1991, ASPLOS IV.
[12] Richard E. Kessler,et al. The Alpha 21264 microprocessor , 1999, IEEE Micro.
[13] Eric Rotenberg,et al. A large, fast instruction window for tolerating cache misses , 2002, ISCA.
[14] Andreas Moshovos,et al. Streamlining inter-operation memory communication via data dependence prediction , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[15] Dirk Grunwald,et al. Content-sensitive data prefetching , 2002 .
[16] Craig Zilles,et al. Execution-based prediction using speculative slices , 2001, ISCA 2001.
[17] Kenneth C. Yeager. The MIPS R10000 superscalar microprocessor , 1996, IEEE Micro.
[18] Todd C. Mowry,et al. Compiler-based prefetching for recursive data structures , 1996, ASPLOS VII.
[19] Chi-Keung Luk,et al. Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors , 2001, Proceedings 28th Annual International Symposium on Computer Architecture.
[20] Daniel A. Jiménez,et al. Dynamic branch prediction with perceptrons , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.
[21] Eric Sprangle,et al. Increasing processor performance by implementing deeper pipelines , 2002 .
[22] David J. Sager,et al. The microarchitecture of the Pentium 4 processor , 2001 .
[23] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[24] Maurice V. Wilkes,et al. Slave Memories and Dynamic Storage Allocation , 1965, IEEE Trans. Electron. Comput..
[25] Rajeev Balasubramonian,et al. Dynamically allocating processor resources between nearby and distant ILP , 2001, ISCA 2001.
[26] Yale N. Patt,et al. Simultaneous subordinate microthreading (SSMT) , 1999, ISCA.