Using value prediction to increase the power of speculative execution hardware
[1] Mikko H. Lipasti,et al. Exceeding the dataflow limit via value prediction , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[2] Shlomit S. Pinter,et al. Tango: a hardware-based data prefetching technique for superscalar processors , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[3] John R. Ellis,et al. Bulldog: A Compiler for VLIW Architectures , 1986 .
[4] Todd M. Austin,et al. Zero-cycle loads: microarchitecture support for reducing load latency , 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.
[5] Janak H. Patel,et al. Stride directed prefetching in scalar processors , 1992, MICRO.
[6] Anoop Gupta,et al. Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.
[7] Andrew R. Pleszkun,et al. Implementing Precise Interrupts in Pipelined Processors , 1988, IEEE Trans. Computers.
[8] Stamatis Vassiliadis,et al. A load-instruction unit for pipelined processors , 1993, IBM J. Res. Dev..
[9] A. Krishnamoorthy,et al. Implementation trade-offs in using a restricted data flow architecture in a high performance RISC microprocessor , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[10] Yale N. Patt,et al. Improving branch prediction accuracy by reducing pattern history table interference , 1996, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques.
[11] S. McFarling,et al. Reducing the cost of branches , 1986, ISCA '86.
[12] Doug Hunt,et al. Advanced performance features of the 64-bit PA-8000 , 1995, Digest of Papers. COMPCON'95. Technologies for the Information Superhighway.
[13] Yale N. Patt,et al. A Comparison Of Dynamic Branch Predictors That Use Two Levels Of Branch History , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.
[14] Norman P. Jouppi,et al. Complexity/performance tradeoffs with non-blocking loads , 1994, ISCA '94.
[15] Avi Mendelson,et al. Can program profiling support value prediction? , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[16] Alfred V. Aho,et al. Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.
[17] Mikko H. Lipasti,et al. Value locality and load value prediction , 1996, ASPLOS VII.
[18] M. Bergman,et al. "Introduction to nMOS and cMOS VLSI Systems Design" by Amar Mukherjee, from: Prentice-Hall, Englewood Cliffs, NJ 07632, U.S.A , 1986, Integr..
[19] Dionisios N. Pnevmatikatos,et al. Streamlining data cache access with fast address calculation , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[20] Mike Johnson,et al. Superscalar microprocessor design , 1991, Prentice Hall series in innovative technology.
[21] Avi Mendelson,et al. The effect of instruction fetch bandwidth on value prediction , 1998, ISCA.
[22] Yvon Jégou,et al. Speculative prefetching , 1993, ICS '93.
[23] Robert M. Keller,et al. Look-Ahead Processors , 1975, CSUR.
[24] Jean-Loup Baer,et al. An effective on-chip preloading scheme to reduce data access penalty , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[25] Ken Kennedy,et al. Software prefetching , 1991, ASPLOS IV.
[26] John H. Edmondson,et al. Superscalar instruction execution in the 21164 Alpha microprocessor , 1995, IEEE Micro.
[27] Alan E. Charlesworth,et al. An Approach to Scientific Array Processing: The Architectural Design of the AP-120B/FPS-164 Family , 1981, Computer.
[28] David Bernstein,et al. Compiler techniques for data prefetching on the PowerPC , 1995, PACT.
[29] Jean-Loup Baer,et al. A performance study of software and hardware data prefetching schemes , 1994, ISCA '94.
[30] Monica S. Lam,et al. Retrospective: Software Pipelining: An Effective Scheduling Technique for VLIW Machines , 1998 .
[31] Trevor Mudge,et al. Hardware support for hiding cache latency , 1993 .
[32] David R. Ditzel,et al. Branch folding in the CRISP microprocessor: reducing branch delay to zero , 1987, ISCA '87.
[33] Andreas Moshovos,et al. A Dynamic Approach to Improve the Accuracy of Data Speculation , 1996 .
[34] David W. Wall,et al. Limits of instruction-level parallelism , 1991, ASPLOS IV.
[35] F. Gabbay. Speculative Execution based on Value Prediction Research Proposal towards the Degree of Doctor of Sciences , 1996 .
[36] B. Ramakrishna Rau,et al. Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing , 1981, MICRO 14.
[37] James E. Smith,et al. A study of scalar compilation techniques for pipelined supercomputers , 1987, ASPLOS.
[38] Joseph Allen Fisher,et al. The Optimization of Horizontal Microcode within and Beyond Basic Blocks: an Application of Processor Scheduling with Resources , 1979 .
[39] Alan Jay Smith,et al. Branch Prediction Strategies and Branch Target Buffer Design , 1984, Computer.
[40] John R. Ellis,et al. Bulldog: a compiler for VLIW architectures (parallel computing, reduced-instruction-set, trace scheduling, scientific) , 1985 .
[41] K. Mani Chandy,et al. A comparison of list schedules for parallel processing systems , 1974, Commun. ACM.
[42] José González,et al. Memory Address Prediction for Data Speculation , 1997, Euro-Par.
[43] Jean-Loup Baer,et al. Effective Hardware Based Data Prefetching for High-Performance Processors , 1995, IEEE Trans. Computers.
[44] Joseph A. Fisher,et al. Predicting conditional branch directions from previous runs of a program , 1992, ASPLOS V.
[45] S. Vassiliadis,et al. SCISM: A scalable compound instruction set machine , 1994, IBM J. Res. Dev..
[46] James E. Smith,et al. The performance potential of data dependence speculation and collapsing , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[47] Janak H. Patel,et al. Stride directed prefetching in scalar processors , 1992, MICRO 1992.
[48] Paul L. Hazan. Computing and the Handicapped , 1981, Computer.
[49] Bruce D. Shriver,et al. Some Experiments in Local Microcode Compaction for Horizontal Machines , 1981, IEEE Transactions on Computers.
[50] Alexander Aiken,et al. Perfect Pipelining: A New Loop Parallelization Technique , 1988, ESOP.
[51] James E. Smith,et al. A study of branch prediction strategies , 1981, ISCA '81.
[52] Gurindar S. Sohi,et al. ARB: A Hardware Mechanism for Dynamic Reordering of Memory References , 1996, IEEE Trans. Computers.
[53] Trung A. Diep,et al. Performance evaluation of the PowerPC 620 microarchitecture , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[54] Bob Blainey,et al. Instruction scheduling in the TOBEY compiler , 1994, IBM J. Res. Dev..
[55] R. M. Tomasulo,et al. An efficient algorithm for exploiting multiple arithmetic units , 1967 .
[56] José González,et al. Speculative execution via address prediction and data prefetching , 1997, ICS '97.
[57] Gurindar S. Sohi,et al. Multiscalar processors , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[58] Alexandru Nicolau,et al. Run-Time Disambiguation: Coping with Statically Unpredictable Dependencies , 1989, IEEE Trans. Computers.
[59] Yale N. Patt,et al. Alternative Implementations of Two-Level Adaptive Branch Prediction , 1992, [1992] Proceedings the 19th Annual International Symposium on Computer Architecture.
[60] Monica Sin-Ling Lam,et al. A Systolic Array Optimizing Compiler , 1989 .
[61] Yale N. Patt,et al. A comparison of dynamic branch predictors that use two levels of branch history , 1993, ISCA '93.