Partial resolution in branch target buffers
[91] B. Fagin. Partial Resolution in Branch Target Buffers , 1997, IEEE Trans. Computers.