Branch Prediction, Instruction-Window Size, and Cache Size: Performance Trade-Offs and Simulation Techniques
Margaret Martonosi | Kevin Skadron | Douglas W. Clark | Pritpal S. Ahuja
[1] Richard Johnson,et al. Analysis techniques for predicated code , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[2] David A. Wood,et al. A model for estimating trace-sample miss ratios , 1991, SIGMETRICS '91.
[3] Yale N. Patt,et al. Alternative implementations of hybrid branch predictors , 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.
[4] Norman P. Jouppi,et al. Available instruction-level parallelism for superscalar and superpipelined machines , 1989, ASPLOS III.
[5] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[6] Doug Burger,et al. Evaluating Future Microprocessors: the SimpleScalar Tool Set , 1996 .
[7] Michael D. Smith,et al. Improving the accuracy of static branch prediction using branch correlation , 1994, ASPLOS VI.
[8] Yale N. Patt,et al. The agree predictor: a mechanism for reducing negative branch history interference , 1997, ISCA '97.
[9] Yale N. Patt,et al. A comparison of dynamic branch predictors that use two levels of branch history , 1993, ISCA '93.
[10] Wei-Chung Hsu,et al. Data Prefetching On The HP PA-8000 , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[11] David I. August,et al. Architectural support for compiler-synthesized dynamic branch prediction strategies: Rationale and initial results , 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.
[12] Yale N. Patt,et al. The effect of speculatively updating branch history on branch prediction accuracy, revisited , 1994, MICRO 27.
[13] Anoop Gupta,et al. Working sets, cache sizes, and node granularity issues for large-scale multiprocessors , 1993, ISCA '93.
[14] Joseph A. Fisher,et al. Predicting conditional branch directions from previous runs of a program , 1992, ASPLOS V.
[15] Norman P. Jouppi,et al. Available instruction-level parallelism for superscalar and superpipelined machines , 1989, ASPLOS 1989.
[16] Richard E. Kessler,et al. The Alpha 21264 microprocessor architecture , 1998, Proceedings International Conference on Computer Design. VLSI in Computers and Processors (Cat. No.98CB36273).
[17] Margaret Martonosi,et al. Effectiveness of trace sampling for performance debugging tools , 1993, SIGMETRICS '93.
[18] Scott A. Mahlke,et al. Compiler synthesized dynamic branch prediction , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[19] Ran Ginosar,et al. Kin: a high performance asynchronous processor architecture , 1998, ICS '98.
[20] Margaret Martonosi,et al. Selecting a Single, Representative Sample for Accurate Simulation of SPECint Benchmarks , 1999 .
[21] Yale N. Patt,et al. A Comparison Of Dynamic Branch Predictors That Use Two Levels Of Branch History , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.
[22] Margaret Martonosi,et al. Multipath execution: opportunities and limits , 1998, ICS '98.
[23] S. McFarling. Combining Branch Predictors , 1993 .
[24] Ken Kennedy,et al. Software methods for improvement of cache performance on supercomputer applications , 1989 .
[25] Gurindar S. Sohi,et al. Instruction issue logic for high-performance, interruptable pipelined processors , 1987, ISCA '87.
[26] Joseph T. Rahmeh,et al. Improving the accuracy of dynamic branch prediction using branch correlation , 1992, ASPLOS V.
[27] Janak H. Patel,et al. Accurate Low-Cost Methods for Performance Evaluation of Cache Memory Systems , 1988, IEEE Trans. Computers.
[28] Wen-mei W. Hwu,et al. Run-time Adaptive Cache Hierarchy Via Reference Analysis , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[29] Yale N. Patt,et al. An analysis of correlation and predictability: what makes two-level branch predictors work , 1998, ISCA.
[30] Margaret Martonosi,et al. Improving prediction for procedure returns with return-address-stack repair mechanisms , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[31] Margaret Martonosi,et al. Speculative Updates of Local and Global Branch History: A Quantitative Analysis , 2000, J. Instr. Level Parallelism.
[32] Ann Marie Grizzaffi Maynard,et al. Contrasting characteristics and cache performance of technical and multi-user commercial workloads , 1994, ASPLOS VI.
[33] Wen-mei W. Hwu,et al. Run-Time Adaptive Cache Hierarchy Management via Reference Analysis , 1997, ISCA.
[34] N. Jouppi,et al. The Relative Importance of Memory Latency, Bandwidth, and Branch Limits to Performance , 1997 .
[35] Monica S. Lam,et al. Limits of control flow on parallelism , 1992, ISCA '92.
[36] Trevor Mudge,et al. The role of adaptivity in two-level adaptive branch prediction , 1995, MICRO 1995.
[37] Margaret Martonosi,et al. Alloying Global and Local Branch History: Taxonomy, Performance, and Analysis , 1999 .
[38] B. Ramakrishna Rau,et al. The Cydra 5 departmental supercomputer: design philosophies, decisions, and trade-offs , 1989, Computer.
[39] Alvin R. Lebeck,et al. Load latency tolerance in dynamically scheduled processors , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[40] Trevor N. Mudge,et al. Correlation and Aliasing in Dynamic Branch Predictors , 1996, ISCA.
[41] Trevor N. Mudge,et al. The bi-mode branch predictor , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[42] Gurindar S. Sohi,et al. Instruction issue logic for high-performance, interruptable pipelined processors , 1987, ISCA '87.
[43] Nicholas C. Gloy,et al. A Language For Describing Predictors And Its Application To Automatic Synthesis , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[44] A. Seznec,et al. Trading Conflict And Capacity Aliasing In Conditional Branch Predictors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[45] Michael D. Smith,et al. A comparative analysis of schemes for correlated branch prediction , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[46] Karel Driesen,et al. Accurate indirect branch prediction , 1998, ISCA.
[47] Eric Rotenberg,et al. Trace cache: a low latency approach to high bandwidth instruction fetching , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[48] David Kroft,et al. Lockup-free instruction fetch/prefetch cache organization , 1981, ISCA '81.
[49] P. Chow,et al. Memory-system Design Considerations For Dynamically-scheduled Processors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[50] Quinn Jacobson,et al. Trace processors , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[51] Yale N. Patt,et al. Target prediction for indirect jumps , 1997, ISCA '97.
[52] Trevor N. Mudge,et al. The YAGS branch prediction scheme , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[53] Kevin Skadron,et al. Design issues and tradeoffs for write buffers , 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.
[54] Michael D. Smith,et al. Limits on multiple instruction issue , 1989, ASPLOS III.
[55] Kenneth M. Wilson,et al. Designing High Bandwidth On-chip Caches , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[56] Brad Calder,et al. Threaded multiple path execution , 1998, Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235).
[57] D. Grunwald,et al. Fast & Accurate Instruction Fetch and Branch Prediction , 1994 .
[58] Trevor N. Mudge,et al. Wrong-path instruction prefetching , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[59] Yale N. Patt,et al. An effective programmable prefetch engine for on-chip caches , 1995, MICRO 1995.
[60] James E. Smith,et al. Path-based next trace prediction , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[61] Dirk Grunwald,et al. Fast and accurate instruction fetch and branch prediction , 1994, ISCA '94.
[62] Anoop Gupta,et al. Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.
[63] David A. Wood,et al. A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches , 1994, IEEE Trans. Computers.
[64] Dirk Grunwald,et al. Selective eager execution on the PolyPath architecture , 1998, ISCA.
[65] Dirk Grunwald,et al. Reducing indirect function call overhead in C++ programs , 1994, POPL '94.
[66] David W. Wall,et al. Limits of instruction-level parallelism , 1991, ASPLOS IV.
[67] Chih-Chieh Lee,et al. Correlation and Aliasing in Dynamic Branch Predictors , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).