A co-designed virtual machine for instruction-level distributed processing
[1] Mikko H. Lipasti,et al. Modern Processor Design: Fundamentals of Superscalar Processors , 2002 .
[2] Manoj Franklin,et al. PEWs: a decentralized dynamic scheduler for ILP processing , 1996, Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing.
[3] Erik R. Altman,et al. DAISY: Dynamic Compilation for 100% Architectural Compatibility , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[4] Jaume Abella,et al. Power- and Complexity-Aware Issue Queue Designs , 2003, IEEE Micro.
[5] James E. Smith,et al. Complexity-Effective Superscalar Processors , 1997, ISCA.
[6] J. E. Thornton. Design of a Computer: The Control Data 6600 , 1970 .
[7] Gurindar S. Sohi,et al. Register traffic analysis for streamlining inter-operation communication in fine-grain parallel processors , 1992, MICRO 1992.
[8] James E. Smith,et al. Using dynamic binary translation to fuse dependent instructions , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..
[9] Andreas Moshovos,et al. Streamlining inter-operation memory communication via data dependence prediction , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[10] B. Miller,et al. Dynamic Kernel I-Cache Optimization , 1998 .
[11] Rastislav Bodík,et al. Focusing processor policies via critical-path prediction , 2001, Proceedings 28th Annual International Symposium on Computer Architecture.
[12] Kenneth C. Yeager. The MIPS R10000 superscalar microprocessor , 1996, IEEE Micro.
[13] Ramon Canal,et al. Dynamic cluster assignment mechanisms , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).
[14] B. Calder,et al. A scalable front-end architecture for fast instruction delivery , 1999, Proceedings of the 26th International Symposium on Computer Architecture (Cat. No.99CB36367).
[15] David Gregg,et al. The Behavior of Efficient Virtual Machine Interpreters on Modern Architectures , 2001, Euro-Par.
[16] Gurindar S. Sohi,et al. ARB: A Hardware Mechanism for Dynamic Reordering of Memory References , 1996, IEEE Trans. Computers.
[17] Dirk Grunwald,et al. Reducing indirect function call overhead in C++ programs , 1994, POPL '94.
[18] Trevor N. Mudge,et al. Virtual memory in contemporary microprocessors , 1998, IEEE Micro.
[19] Peter G. Sassone,et al. Dynamic Strands: Collapsing Speculative Dependence Chains for Reducing Pipeline Communication , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[20] Jeffrey Dean,et al. ProfileMe: hardware support for instruction-level profiling on out-of-order processors , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[21] Enric Morancho,et al. Recovery mechanism for latency misprediction , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.
[22] Norman P. Jouppi,et al. The multicluster architecture: reducing cycle time through partitioning , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[23] Anne Rogers,et al. The performance impact of incomplete bypassing in processor pipelines , 1995, MICRO 1995.
[24] Andreas Moshovos,et al. Dynamic Speculation and Synchronization of Data Dependences , 1997, ISCA.
[25] Ramon Canal,et al. A low-complexity issue logic , 2000, ICS '00.
[26] Tulika Mitra,et al. Improving Superscalar Instruction Dispatch and Issue by Exploiting Dynamic Code Sequences , 1997, ISCA.
[27] Pierre Michaud,et al. Data-flow prescheduling for large instruction windows in out-of-order processors , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.
[28] M. Gschwind,et al. On Achieving Precise Exceptions Semantics in Dynamic Optimization , 2000 .
[29] Gurindar S. Sohi,et al. Use-based register caching with decoupled indexing , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..
[30] Cristina Cifuentes,et al. Dynamic Binary Translation , 2000 .
[31] Yun Wang,et al. IA-32 Execution Layer: a two-phase dynamic translator designed to support IA-32 applications on Itanium-based systems , 2003, MICRO.
[32] Daniel A. Jiménez,et al. The impact of delay on the design of branch predictors , 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.
[33] James E. Smith,et al. Characterizing computer performance with a single number , 1988, CACM.
[34] James E. Smith,et al. Dynamic instruction scheduling and the Astronautics ZS-1 , 1989, Computer.
[35] John Yates,et al. FX!32 a profile-directed binary translator , 1998, IEEE Micro.
[36] Trevor N. Mudge,et al. Power: A First-Class Architectural Design Constraint , 2001, Computer.
[37] Chris Wilkerson,et al. Hierarchical scheduling windows , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings..
[38] Ruben W. Castelino,et al. Internal Organization of the Alpha 21164, a 300-MHz 64-bit Quad-issue CMOS RISC Microprocessor , 1995, Digit. Tech. J..
[39] Richard Phelan. Improving ARM Code Density and Performance , 2003 .
[40] Mateo Valero,et al. Trace cache redundancy: red and blue traces , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).
[41] Kim Hazelwood,et al. Generational cache management of code traces in dynamic optimization systems , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..
[42] David Keppel,et al. Shade: a fast instruction-set simulator for execution profiling , 1994, SIGMETRICS.
[43] Scott A. Mahlke,et al. The superblock: An effective technique for VLIW and superscalar compilation , 1993, The Journal of Supercomputing.
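[44] Hiroyuki Iida,et al. On the cleanliness requirements of the International Technology Roadmap for Semiconductors 2003: cleanliness required of silicon wafer surfaces and the ambient environment, and the current state of analysis methods , 2004 .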
[45] Bich C. Le,et al. An out-of-order execution technique for runtime binary translators , 1998, ASPLOS VIII.
[46] Manoj Franklin,et al. A fill-unit approach to multiple instruction issue , 1994, Proceedings of MICRO-27. The 27th Annual IEEE/ACM International Symposium on Microarchitecture.
[47] Ronak Singhal,et al. Performance Analysis and Validation of the Intel Pentium 4 Processor on 90nm Technology , 2004 .
[48] André Seznec,et al. Effective ahead pipelining of instruction block address generation , 2003, ISCA '03.
[49] Doug Matzke,et al. Will Physical Scalability Sabotage Performance Gains? , 1997, Computer.
[50] James E. Smith,et al. Dynamic binary translation for accumulator-oriented architectures , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003..
[51] S. Tomita,et al. A high-speed dynamic instruction scheduling scheme for superscalar processors , 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.
[52] Robert C. Bedichek. Talisman: fast and accurate multicomputer simulation , 1995, SIGMETRICS '95/PERFORMANCE '95.
[53] Mendel Rosenblum,et al. Embra: fast and flexible machine simulation , 1996, SIGMETRICS '96.
[54] Narayanan Vijaykrishnan,et al. Exploring Wakeup-Free Instruction Scheduling , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[55] Gurindar S. Sohi,et al. A programmable co-processor for profiling , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.
[56] Norman P. Jouppi,et al. CACTI: an enhanced cache access and cycle time model , 1996, IEEE J. Solid State Circuits.
[57] James E. Smith,et al. Relational profiling: enabling thread-level parallelism in virtual machines , 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.
[58] Paolo Faraboschi,et al. DELI: a new run-time control point , 2002, MICRO.
[59] Dirk Grunwald,et al. Fast and accurate instruction fetch and branch prediction , 1994, ISCA '94.
[60] Michael Franz,et al. Continuous Program Optimization: Design and Evaluation , 2001, IEEE Trans. Computers.
[61] D. Marr,et al. Hyper-Threading Technology Architecture and Microarchitecture , 2002 .
[62] William J. Dally,et al. Register organization for media processing , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).
[63] John Whaley. Partial method compilation using dynamic profile information , 2001, OOPSLA '01.
[64] John Paul Shen,et al. Instruction path coprocessors , 2000, ISCA '00.
[65] Sanjay J. Patel,et al. Performance characterization of a hardware mechanism for dynamic optimization , 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.
[66] Norman P. Jouppi,et al. Register file design considerations in dynamically scheduled processors , 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.
[67] Mayan Moudgill,et al. Environment for PowerPC microarchitecture exploration , 1999, IEEE Micro.
[68] Vasanth Bala,et al. Transparent Dynamic Optimization: The Design and Implementation of Dynamo , 1999 .
[69] Kemal Ebcioglu,et al. An architectural framework for supporting heterogeneous instruction-set architectures , 1993, Computer.
[70] Haitham Akkary,et al. A dynamic multithreading processor , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[71] Mateo Valero,et al. The effect of code reordering on branch prediction , 2000, Proceedings 2000 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00622).
[72] Tong Li,et al. A large, fast instruction window for tolerating cache misses , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.
[73] Wen-mei W. Hwu,et al. A hardware mechanism for dynamic extraction and relayout of program hot spots , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[74] D.R. Kaeli,et al. Branch history table prediction of moving target branches due to subroutine returns , 1991, [1991] Proceedings. The 18th Annual International Symposium on Computer Architecture.
[75] Krste Asanovic,et al. Banked multiported register files for high-frequency superscalar microprocessors , 2003, ISCA '03.
[76] Henry Hoffmann,et al. Evaluation of the Raw microprocessor: an exposed-wire-delay architecture for ILP and streams , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..
[77] Scott Devine,et al. Using the SimOS machine simulator to study complex computer systems , 1997, TOMC.
[78] Wen-mei W. Hwu,et al. Code reordering and speculation support for dynamic optimization systems , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.
[79] Rajeev Balasubramonian,et al. Reducing the complexity of the register file in dynamic superscalar processors , 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.
[80] Mikko H. Lipasti,et al. Precise and Accurate Processor Simulation , 2002 .
[81] James R. Bell,et al. Threaded code , 1973, CACM.
[82] Ramon Canal,et al. A cost-effective clustered architecture , 1999, 1999 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00425).
[83] Cindy Zheng,et al. PA-RISC to IA-64: Transparent Execution, No Recompilation , 2000, Computer.
[84] Mikko H. Lipasti,et al. Understanding scheduling replay schemes , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[85] Raymond J. Hookway,et al. DIGITAL FX!32: Combining Emulation and Binary Translation , 1997, Digit. Tech. J..
[86] Avi Mendelson,et al. Filtering techniques to improve trace-cache efficiency , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.
[87] Paul Klint,et al. Interpretation Techniques , 1981, Softw. Pract. Exp..
[88] Pradip Bose,et al. Microarchitecture-Level Power-Performance Simulators: Modeling, Validation, and Impact on Design , 2003 .
[89] Jack W. Davidson,et al. Strata: A Software Dynamic Translation Infrastructure , 2001 .
[90] Vivek Sarkar,et al. Space-time scheduling of instruction-level parallelism on a raw machine , 1998, ASPLOS VIII.
[91] Yale N. Patt,et al. Putting the fill unit to work: dynamic optimizations for trace cache microprocessors , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[92] Michael Gschwind,et al. Dynamic and Transparent Binary Translation , 2000, Computer.
[93] Richard E. Kessler,et al. The Alpha 21264 microprocessor , 1999, IEEE Micro.
[94] R. D. Barnes,et al. An Architectural Framework for Run-Time Optimization , 2001 .
[95] Gurindar S. Sohi,et al. Speculative Multithreaded Processors , 2001, Computer.
[96] T. Austin,et al. Cyclone: a broadcast-free dynamic instruction scheduler with selective replay , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..
[97] Balaram Sinharoy,et al. POWER4 system microarchitecture , 2002, IBM J. Res. Dev..
[98] Vivek Sarkar,et al. Baring It All to Software: Raw Machines , 1997, Computer.
[99] G.E. Moore,et al. Cramming More Components Onto Integrated Circuits , 1998, Proceedings of the IEEE.
[100] Mikko H. Lipasti,et al. Macro-op Scheduling: Relaxing Scheduling Loop Constraints , 2003, MICRO.
[101] Keith Diefendorff. K7 Challenges Intel: 10/26/98 , 1998 .
[102] Gurindar S. Sohi,et al. Characterizing and predicting value degree of use , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings..
[103] Michael Gschwind,et al. Dynamic Binary Translation and Optimization , 2001, IEEE Trans. Computers.
[104] Mikko H. Lipasti,et al. Exceeding the dataflow limit via value prediction , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[105] Robert S. Cohn,et al. Hot cold optimization of large Windows/NT applications , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[106] James E. Smith,et al. Rapid profiling via stratified sampling , 2001, ISCA 2001.
[107] Bradley C. Kuszmaul,et al. Circuits for wide-window superscalar processors , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[108] Woody Lichtenstein,et al. The multiflow trace scheduling compiler , 1993, The Journal of Supercomputing.
[109] M.J. Flynn,et al. Deep submicron microprocessor design issues , 1999, IEEE Micro.
[110] James E. Smith,et al. Optimal Pipelining in Supercomputers , 1986, ISCA.
[111] Mikko H. Lipasti,et al. Half-price architecture , 2003, ISCA '03.
[112] Rastislav Bodík,et al. Slack: maximizing performance under technological constraints , 2002, ISCA.
[113] Andrew R. Pleszkun,et al. Implementing Precise Interrupts in Pipelined Processors , 1988, IEEE Trans. Computers.
[114] Evelyn Duesterwald,et al. Design and implementation of a dynamic optimization framework for windows , 2000 .
[115] Luiz André Barroso,et al. Piranha: a scalable architecture based on single-chip multiprocessing , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[116] Olivier Temam,et al. MicroLib: A Case for the Quantitative Comparison of Micro-Architecture Mechanisms , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[117] James E. Smith,et al. The microarchitecture of superscalar processors , 1995, Proc. IEEE.
[118] Quinn Jacobson,et al. Trace processors , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[119] M. Merten,et al. A hardware-driven profiling scheme for identifying program hot spots to support runtime optimization , 1999, Proceedings of the 26th International Symposium on Computer Architecture (Cat. No.99CB36367).
[120] John Paul Shen,et al. Scalable Register Renaming via the Quack Register File , 2000 .
[121] Eric Sprangle,et al. Increasing processor performance by implementing deeper pipelines , 2002, ISCA.
[122] Ken Mai,et al. The future of wires , 2001, Proc. IEEE.
[123] M. K. Gschwind,et al. Method and apparatus for determining branch addresses in programs generated by binary translation , 1998 .
[124] Peter S. Magnusson,et al. A Compact Intermediate Format for SimICS , 1994 .
[125] Pascal Sainrat,et al. Multiple-block ahead branch predictors , 1996, ASPLOS VII.
[126] Takeo Asakawa,et al. Microarchitecture and performance analysis of a SPARC-V9 microprocessor for enterprise server systems , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..
[127] Vinod K. Agarwal,et al. The Effect of Technology Scaling on Microarchitectural Structures , 2000 .
[128] Stéphan Jourdan,et al. Speculation techniques for improving load related instruction scheduling , 1999, ISCA.
[129] Mateo Valero,et al. Multiple-banked register file architectures , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[130] Yale N. Patt,et al. On pipelining dynamic instruction scheduling logic , 2000, MICRO 33.
[131] Michael J. Flynn,et al. Optimal Pipelining , 1990, J. Parallel Distributed Comput..
[132] H. B. Bakoglu,et al. The IBM RISC System/6000 Processor: Hardware Overview , 1990, IBM J. Res. Dev..
[133] Derek Bruening,et al. An infrastructure for adaptive dynamic optimization , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003..
[134] Todd M. Austin,et al. Efficient dynamic scheduling through tag elimination , 2002, ISCA.
[135] Steven K. Reinhardt,et al. A scalable instruction queue design using dependence chains , 2002, ISCA.
[136] Venkatesh Akella,et al. Synchroscalar: a multiple clock domain, power-aware, tile-based embedded processor , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..
[137] Laurie J. Hendren,et al. Dynamic profiling and trace cache generation , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003..
[138] Carlo H. Séquin,et al. Design Considerations for Single-Chip Computers of the Future , 1980, IEEE Transactions on Computers.
[139] James E. Smith,et al. Instruction Level Distributed Processing , 2000, HiPC.
[140] Sheldon B. Levenstein,et al. Architecture, design, and performance of Application System/400 (AS/400) multiprocessors , 1992, IBM J. Res. Dev..
[141] Jack W. Davidson,et al. Profile guided code positioning , 1990, SIGP.
[142] G.E. Moore,et al. No exponential is forever: but "Forever" can be delayed! [semiconductor industry] , 2003, 2003 IEEE International Solid-State Circuits Conference, 2003. Digest of Technical Papers. ISSCC..
[143] Michael D. Smith,et al. Code cache management schemes for dynamic optimizers , 2002, Proceedings Sixth Annual Workshop on Interaction between Compilers and Computer Architectures.
[144] Guang R. Gao,et al. An investigation of the performance of various instruction-issue buffer topologies , 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.
[145] James E. Smith,et al. PowerPC 601 and Alpha 21064: a tale of two RISCs , 1994, Computer.
[146] James E. Smith,et al. The predictability of data values , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[147] D. Grunwald,et al. Fast & Accurate Instruction Fetch and Branch Prediction , 1994 .
[148] Michael Gschwind,et al. Optimizations and oracle parallelism with dynamic translation , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.
[149] Mark D. Hill,et al. Multiprocessors Should Support Simple Memory-Consistency Models , 1998, Computer.
[150] J.E. Smith,et al. Achieving high performance via co-designed virtual machines , 1998, Innovative Architecture for Future Generation High-Performance Processors and Systems.
[151] A. Klaiber. The Technology Behind Crusoe Processors: Low-power x86-Compatible Processors Implemented with Code Morphing Software , 2000 .
[152] David J. Sager,et al. The microarchitecture of the Pentium 4 processor , 2001 .
[153] John L. Henning. SPEC CPU2000: Measuring CPU Performance in the New Millennium , 2000, Computer.
[154] James E. Smith,et al. Hardware support for control transfers in code caches , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..
[155] Thomas R. Puzak,et al. Optimum power/performance pipeline depth , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..
[156] R. M. Tomasulo,et al. An efficient algorithm for exploiting multiple arithmetic units , 1995 .
[157] Doug Burger,et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.
[158] Stéphan Jourdan,et al. A novel renaming scheme to exploit value temporal locality through physical register reuse and unification , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[159] M.A. Horowitz,et al. Speed and power scaling of SRAM's , 2000, IEEE Journal of Solid-State Circuits.
[160] Eric Rotenberg,et al. Trace cache: a low latency approach to high bandwidth instruction fetching , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[161] Steven S. Muchnick,et al. Advanced Compiler Design and Implementation , 1997 .
[162] Trent Jaeger,et al. An unconventional proposal: using the x86 architecture as the ubiquitous virtual standard architecture , 1998, EW 8.
[163] Kunle Olukotun,et al. Designing High Bandwidth On-Chip Caches , 1997, ISCA.
[164] R. Balasubramonian,et al. Dynamically managing the communication-parallelism trade-off in future clustered processors , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..
[165] Guang R. Gao,et al. Minimum Register Instruction Sequencing to Reduce Register Spills in Out-of-Order Issue Superscalar Architectures , 2003, IEEE Trans. Computers.
[166] Doug Burger,et al. Measuring Experimental Error in Microprocessor Simulation , 2001, ISCA 2001.
[167] Burzin A. Patel,et al. Optimization of instruction fetch mechanisms for high issue rates , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[168] R. D. Valentine,et al. The Intel Pentium M processor: Microarchitecture and performance , 2003 .
[169] Vasanth Bala,et al. Software Profiling for Hot Path Prediction: Less is More , 2000, ASPLOS.
[170] Joel S. Emer,et al. Loose loops sink chips , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.
[171] Bruce Jacob,et al. Concurrency, latency, or system overhead: Which has the largest impact on uniprocessor DRAM-system performance? , 2001, Proceedings 28th Annual International Symposium on Computer Architecture.
[172] Gurindar S. Sohi,et al. Dynamic dead-instruction detection and elimination , 2002, ASPLOS X.
[173] James E. Smith,et al. A first-order superscalar processor model , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..
[174] Pat Conway,et al. The AMD Opteron Processor for Multiprocessor Servers , 2003, IEEE Micro.
[175] Matthew Arnold,et al. Adaptive optimization in the Jalapeño JVM , 2000, OOPSLA '00.
[176] Mateo Valero,et al. Delaying physical register allocation through virtual-physical registers , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.
[177] Brian N. Bershad,et al. Execution characteristics of desktop applications on Windows NT , 1998, ISCA.
[178] Gurindar S. Sohi,et al. An empirical analysis of instruction repetition , 1998, ASPLOS VIII.
[179] Sumedh W. Sathaye,et al. Dynamic rescheduling: a technique for object code compatibility in VLIW architectures , 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.
[180] Neil C. Wilhelm,et al. Caching processor general registers , 1995, Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors.
[181] Milo M. K. Martin,et al. Exploiting dead value information , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[182] R. Bedichek. Some efficient architecture simulation techniques , 1990 .
[183] T. Puzak,et al. The optimum pipeline depth for a microprocessor , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.
[184] Mateo Valero,et al. Fetching instruction streams , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings..
[185] Erik R. Altman,et al. BOA: The Architecture of a Binary Translation Processor , 1999 .
[186] Vikram S. Adve,et al. LLVA: a low-level virtual instruction set architecture , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..
[187] Richard Johnson,et al. The Transmeta Code Morphing™ Software: using speculation, recovery, and adaptive retranslation to address real-life challenges , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003..
[188] Fischer. Issue Logic For A 600 MHz Out-of-order Execution , 1997, Symposium 1997 on VLSI Circuits.
[189] Daniel H. Friendly,et al. Evaluation of Design Options for the Trace Cache Fetch Mechanism , 1999, IEEE Trans. Computers.
[190] J. M. Codina,et al. Instruction replication for clustered microarchitectures , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..
[191] Quinn Jacobson,et al. Instruction pre-processing in trace processors , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.
[192] Vikas Agarwal,et al. Clock rate versus IPC: the end of the road for conventional microarchitectures , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[193] John Paul Shen,et al. Parallel cachelets , 2001, Proceedings 2001 IEEE International Conference on Computer Design: VLSI in Computers and Processors. ICCD 2001.
[194] R. Nagarajan,et al. A design space evaluation of grid processor architectures , 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.
[195] Nader Bagherzadeh,et al. A scalable register file architecture for dynamically scheduled processors , 1996, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Technique.
[196] T. N. Vijaykumar,et al. Reducing register ports for higher speed and lower energy , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings..
[197] Jan M. Van Campenhout,et al. Interpretation and instruction path coprocessing , 1990, Computer systems.
[198] Todd M. Austin,et al. The SimpleScalar tool set, version 2.0 , 1997, CARN.
[199] Yale N. Patt,et al. Select-free instruction scheduling logic , 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.
[200] Sorin Lerner,et al. Mojo: A Dynamic Optimization System , 2000 .
[201] K. Ebcioglu,et al. DAISY: Dynamic Compilation for 100% Architectural Compatibility , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[202] Andreas Moshovos,et al. Memory dependence speculation tradeoffs in centralized, continuous-window superscalar processors , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).
[203] Yun Wang,et al. IA-32 execution layer: a two-phase dynamic translator designed to support IA-32 applications on Itanium®-based systems , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..
[204] Rolf Ernst,et al. Codesign of Embedded Systems: Status and Trends , 1998, IEEE Des. Test Comput..
[205] Haitham Akkary,et al. Checkpoint processing and recovery: towards scalable large instruction window processors , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..
[206] Mateo Valero,et al. Software Trace Cache , 2014, IEEE Transactions on Computers.
[207] Mike Johnson,et al. Superscalar microprocessor design , 1991, Prentice Hall series in innovative technology.
[208] Gurindar S. Sohi. Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers , 1990 .
[209] Stephen H. Gunther,et al. Managing the Impact of Increasing Microprocessor Power Consumption , 2001 .
[210] James E. Smith,et al. Instruction Issue Logic in Pipelined Supercomputers , 1984, IEEE Trans. Computers.
[211] Philippe Roussel,et al. The microarchitecture of the Intel Pentium 4 processor on 90nm technology , 2004 .
[212] Shekhar Y. Borkar,et al. Design challenges of technology scaling , 1999, IEEE Micro.
[213] Ho-Seop Kim,et al. An instruction set and microarchitecture for instruction level distributed processing , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.
[214] Trevor N. Mudge,et al. Integrating superscalar processor components to implement register caching , 2001, ICS '01.
[215] Ravi Nair,et al. Exploiting Instruction Level Parallelism in Processors by Caching Scheduled Groups , 1997, ISCA.
[216] Yale N. Patt,et al. Partitioned first-level cache design for clustered microarchitectures , 2003, ICS '03.
[217] Gurindar S. Sohi,et al. Multiscalar processors , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.