Hiding the misprediction penalty of a resource-efficient high-performance processor