Speculative Precomputation on Chip Multiprocessors
John Paul Shen | Hong Wang | Jeffery A. Brown | Perry Wang | George Z. Chrysos | I. Corp
[1] Yale N. Patt, et al. An effective programmable prefetch engine for on-chip caches, 1995, MICRO 1995.
[2] Harsh Sharangpani, et al. Itanium Processor Microarchitecture, 2000, IEEE Micro.
[3] Joel S. Emer, et al. Simultaneous multithreading: multiplying alpha performance, 1999.
[4] Eric Rotenberg, et al. Slipstream processors: improving both performance and fault tolerance, 2000, SIGP.
[5] Kunle Olukotun, et al. The case for a single-chip multiprocessor, 1996, ASPLOS VII.
[6] Dean M. Tullsen, et al. Fellowship - Simulation And Modeling Of A Simultaneous Multithreading Processor, 1996, Int. CMG Conference.
[7] S. Abraham, et al. Predicting Load Latencies Using Cache Profiling, 1996.
[8] Martin C. Carlisle, et al. Olden: parallelizing programs with dynamic data structures on distributed-memory machines, 1996.
[9] Dirk Grunwald, et al. Prefetching Using Markov Predictors, 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[10] Rakesh Krishnaiyer, et al. An Advanced Optimizer for the IA-64 Architecture, 2000, IEEE Micro.
[11] Dean M. Tullsen, et al. Simultaneous multithreading: Maximizing on-chip parallelism, 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[12] John Paul Shen, et al. Memory latency-tolerance approaches for Itanium processors: out-of-order execution vs. speculative precomputation, 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.
[13] John Paul Shen, et al. Post-pass binary adaptation for software-based speculative precomputation, 2002, PLDI '02.
[14] Gurindar S. Sohi, et al. Speculative data-driven multithreading, 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.
[15] William J. Dally, et al. Exploiting fine-grain thread level parallelism on the MIT multi-ALU processor, 1998, Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235).
[16] Luiz André Barroso, et al. Piranha: a scalable architecture based on single-chip multiprocessing, 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[17] D. Scott Wills, et al. Architecture of the Atlas chip-multiprocessor: dynamically parallelizing irregular applications, 1999, Proceedings 1999 IEEE International Conference on Computer Design: VLSI in Computers and Processors (Cat. No.99CB37040).
[18] John Paul Shen, et al. Speculative precomputation: long-range prefetching of delinquent loads, 2001, Proceedings 28th Annual International Symposium on Computer Architecture.
[19] Kunle Olukotun, et al. The Stanford Hydra CMP, 2000, IEEE Micro.
[20] Todd C. Mowry, et al. The potential for using thread-level data speculation to facilitate automatic parallelization, 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.
[21] Norman P. Jouppi, et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers, 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[22] Hans Mulder, et al. Introducing the IA-64 Architecture, 2000, IEEE Micro.
[23] Tien-Fu Chen, et al. Alternative implementations of hybrid branch predictors, 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.
[24] Craig Zilles, et al. Execution-based prediction using speculative slices, 2001, ISCA 2001.
[25] Anoop Gupta, et al. Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors, 1991, J. Parallel Distributed Comput.
[26] Weihaw Chuang, et al. The Intel IA-64 Compiler Code Generator, 2000, IEEE Micro.
[27] John Paul Shen, et al. Dynamic speculative precomputation, 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.
[28] Chi-Keung Luk, et al. Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors, 2001, Proceedings 28th Annual International Symposium on Computer Architecture.