Microarchitectural innovations: boosting microprocessor performance beyond semiconductor technology scaling
[1] Yale N. Patt, et al. A comprehensive instruction fetch mechanism for a processor supporting speculative execution, 1992, MICRO 25.
[2] David Kroft, et al. Lockup-free instruction fetch/prefetch cache organization, 1981, ISCA '81.
[3] Edward McLellan. The Alpha AXP architecture and 21064 processor, 1993, IEEE Micro.
[4] James E. Smith, et al. A study of branch prediction strategies, 1981, ISCA '81.
[5] Gurindar S. Sohi, et al. A programmable co-processor for profiling, 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.
[6] M. Dubois, et al. Assisted Execution, 1998.
[7] Joel S. Emer, et al. Memory dependence prediction using store sets, 1998, Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235).
[8] Farid N. Najm, et al. A gate-level leakage power reduction method for ultra-low-power CMOS circuits, 1997, Proceedings of CICC 97 - Custom Integrated Circuits Conference.
[9] Yale N. Patt, et al. Target prediction for indirect jumps, 1997, ISCA '97.
[10] Donald B. Alpert, et al. Architecture of the Pentium microprocessor, 1993, IEEE Micro.
[11] Andreas Moshovos, et al. Dependence based prefetching for linked data structures, 1998, ASPLOS VIII.
[12] James E. Smith, et al. The predictability of data values, 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[13] K. Kavi. Cache Memories in Uniprocessors. Reading versus Writing. Improving Performance, 2022.
[14] Todd M. Austin, et al. DIVA: a reliable substrate for deep submicron microarchitecture design, 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.
[15] Gurindar S. Sohi, et al. Speculative data-driven multithreading, 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.
[16] Dean M. Tullsen, et al. Simultaneous multithreading: Maximizing on-chip parallelism, 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[17] David W. Anderson, et al. The IBM System/360 model 91: machine philosophy and instruction-handling, 1967.
[18] J. E. Thornton, et al. Parallel operation in the Control Data 6600, 1964, AFIPS '64 (Fall, part II).
[19] Mikko H. Lipasti. Value locality and speculative execution, 1998.
[20] Mario Nemirovsky, et al. Increasing superscalar performance through multistreaming, 1995, PACT.
[21] Yale N. Patt, et al. Simultaneous subordinate microthreading (SSMT), 1999, ISCA.
[22] M. V. Wilkes. Abstracts of Current Computer Literature, 1965.
[23] Craig Zilles, et al. Execution-based prediction using speculative slices, 2001, ISCA 2001.
[24] Trevor N. Mudge, et al. The YAGS branch prediction scheme, 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[25] Yale N. Patt, et al. The agree predictor: a mechanism for reducing negative branch history interference, 1997, ISCA '97.
[26] Yale N. Patt, et al. Checkpoint Repair for High-Performance Out-of-Order Execution Machines, 1987, IEEE Transactions on Computers.
[27] Doug Matzke, et al. Will Physical Scalability Sabotage Performance Gains?, 1997, Computer.
[28] Kaushik Roy, et al. An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches, 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.
[29] Gurindar S. Sohi, et al. Multiscalar processors, 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[30] David R. Kaeli, et al. Predicting indirect branches via data compression, 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[31] Alan Jay Smith, et al. Branch Prediction Strategies and Branch Target Buffer Design, 1984, Computer.
[32] Gregory F. Grohoski, et al. Machine Organization of the IBM RISC System/6000 Processor, 1990, IBM J. Res. Dev.
[33] Sanjay J. Patel, et al. Improving trace cache effectiveness with branch promotion and trace packing, 1998, Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235).
[34] S. McFarling. Combining Branch Predictors, 1993.
[35] Vivek De, et al. Technology and design challenges for low power and high performance [microprocessors], 1999, Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477).
[36] Richard E. Kessler, et al. Evaluating stream buffers as a secondary cache replacement, 1994, Proceedings of the 21st International Symposium on Computer Architecture.
[37] James D. Meindl, et al. Interconnect performance limits on gigascale integration (GSI), 1995.
[38] Norman P. Jouppi, et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers, 1990, Proceedings of the 17th Annual International Symposium on Computer Architecture.
[39] Michael D. Smith, et al. A comparative analysis of schemes for correlated branch prediction, 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[40] Yale N. Patt, et al. The agree predictor, 1997.
[41] Andreas Moshovos, et al. Dynamic Speculation and Synchronization of Data Dependences, 1997, ISCA.
[42] Yale N. Patt, et al. Improving trace cache effectiveness with branch promotion and trace packing, 1998, ISCA.
[43] Karel Driesen, et al. Accurate indirect branch prediction, 1998, ISCA.
[44] R. M. Tomasulo. An efficient algorithm for exploiting multiple arithmetic units, 1967, IBM J. Res. Dev.
[45] Eric Rotenberg, et al. Trace cache: a low latency approach to high bandwidth instruction fetching, 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[46] Yale N. Patt, et al. HPS, a new microarchitecture: rationale and introduction, 1985, MICRO 18.
[47] Ruben W. Castelino, et al. Internal Organization of the Alpha 21164, a 300-MHz 64-bit Quad-issue CMOS RISC Microprocessor, 1995, Digit. Tech. J.
[48] Kenneth C. Yeager. The MIPS R10000 superscalar microprocessor, 1996, IEEE Micro.
[49] Marc Tremblay, et al. The MAJC Architecture: A Synthesis of Parallelism and Scalability, 2000, IEEE Micro.
[50] Andreas Moshovos, et al. Memory dependence prediction, 1998.
[51] Doug Hunt, et al. Advanced performance features of the 64-bit PA-8000, 1995, Digest of Papers. COMPCON '95. Technologies for the Information Superhighway.
[52] Andreas Moshovos, et al. Improving virtual function call target prediction via dependence-based pre-computation, 1999, ICS '99.
[53] Yale N. Patt, et al. HPSm, a high performance restricted data flow architecture having minimal functionality, 1986, ISCA '86.
[54] David A. Patterson, et al. Computer Architecture: A Quantitative Approach, 1969.
[55] Eric Rotenberg, et al. AR-SMT: a microarchitectural approach to fault tolerance in microprocessors, 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).
[56] F. Gabbay. Speculative Execution based on Value Prediction. Research Proposal towards the Degree of Doctor of Sciences, 1996.
[57] Jean-Loup Baer, et al. A performance study of software and hardware data prefetching schemes, 1994, ISCA '94.
[58] Gurindar S. Sohi, et al. A static power model for architects, 2000, MICRO 33.
[59] Gurindar S. Sohi, et al. Speculative versioning cache, 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.
[60] André Seznec, et al. A case for two-way skewed-associative caches, 1993, ISCA '93.
[61] Gurindar S. Sohi, et al. ARB: A Hardware Mechanism for Dynamic Reordering of Memory References, 1996, IEEE Trans. Computers.
[62] David A. Patterson, et al. Computer Architecture: A Quantitative Approach, 5th Edition, 1996.
[63] Richard E. Kessler, et al. The Alpha 21264 microprocessor architecture, 1998, Proceedings International Conference on Computer Design. VLSI in Computers and Processors (Cat. No.98CB36273).
[64] Gurindar S. Sohi, et al. High-bandwidth data memory systems for superscalar processors, 1991, ASPLOS IV.
[65] Mikko H. Lipasti, et al. Exceeding the dataflow limit via value prediction, 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[66] Andrew R. Pleszkun, et al. Implementing Precise Interrupts in Pipelined Processors, 1988, IEEE Trans. Computers.
[67] Gurindar S. Sohi, et al. An empirical analysis of instruction repetition, 1998, ASPLOS VIII.
[68] Masato Edahiro, et al. A Single-Chip Multiprocessor for Smart Terminals, 2000, IEEE Micro.
[69] G. S. Sohi, et al. Dynamic instruction reuse, 1997, ISCA '97.
[70] Maurice V. Wilkes, et al. Slave Memories and Dynamic Storage Allocation, 1965, IEEE Trans. Electron. Comput.
[71] Peter M. Kogge, et al. The Architecture of Pipelined Computers, 1981.
[72] Gurindar S. Sohi. Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers, 1990.
[73] Joseph T. Rahmeh, et al. Improving the accuracy of dynamic branch prediction using branch correlation, 1992, ASPLOS V.
[74] D. R. Kaeli, et al. Branch history table prediction of moving target branches due to subroutine returns, 1991, Proceedings of the 18th Annual International Symposium on Computer Architecture.
[75] Dirk Grunwald, et al. Pipeline gating: speculation control for energy reduction, 1998, ISCA.