Discerning the dominant out-of-order performance advantage: is it speculation or dynamism?
[1] Olivier Temam et al. VHC: quickly building an optimizer for complex embedded architectures, 2004, International Symposium on Code Generation and Optimization (CGO 2004).
[2] Martin Hopkins et al. Synergistic Processing in Cell's Multicore Architecture, 2006, IEEE Micro.
[3] Sanjay J. Patel et al. Improving quasi-dynamic schedules through region slip, 2003, International Symposium on Code Generation and Optimization (CGO 2003).
[4] Kathryn S. McKinley et al. Static placement, dynamic issue (SPDI) scheduling for EDGE architectures, 2004, Proceedings of the 13th International Conference on Parallel Architecture and Compilation Techniques (PACT 2004).
[5] Youngmoon Choi et al. The next-generation 64b SPARC core in a T4 SoC processor, 2012, IEEE International Solid-State Circuits Conference.
[6] Erik R. Altman et al. Daisy: Dynamic Compilation for 100% Architectural Compatibility, 1997, Proceedings of the 24th Annual International Symposium on Computer Architecture.
[7] Bantwal R. Rau. Dynamically scheduled VLIW processors, 1993, MICRO 1993.
[8] Kanad Ghose et al. Incremental commit groups for non-atomic trace processing, 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).
[9] Jack Doweck et al. Inside Intel® Core microarchitecture, 2006, IEEE Hot Chips 18 Symposium (HCS).
[10] Óscar Palomar Pérez. Reusing cached schedules in an out-of-order processor with in-order issue logic, 2011.
[11] Michael D. Smith et al. Efficient superscalar performance through boosting, 1992, ASPLOS V.
[12] Yale N. Patt et al. An investigation of the performance of various dynamic scheduling techniques, 1992, MICRO.
[13] Craig B. Zilles et al. Hardware atomicity for reliable software speculation, 2007, ISCA '07.
[14] Michael J. Flynn et al. Instruction-level parallel processors: dynamic and static scheduling tradeoffs, 1997, Proceedings of the IEEE International Symposium on Parallel Algorithms Architecture Synthesis.
[15] Haitham Akkary et al. Continual flow pipelines, 2004, ASPLOS XI.
[16] K. Ebcioglu et al. Daisy: Dynamic Compilation for 100% Architectural Compatibility, 1997, Proceedings of the 24th Annual International Symposium on Computer Architecture.
[17] D. J. Lilja et al. Reducing the branch penalty in pipelined processors, 1988, Computer.
[18] J. M. Codina et al. SoftHV: a HW/SW co-designed processor with horizontal and vertical fusion, 2011, CF '11.
[19] Rastislav Bodík et al. Focusing processor policies via critical-path prediction, 2001, Proceedings of the 28th Annual International Symposium on Computer Architecture.
[20] Sanjay J. Patel et al. Increasing the size of atomic instruction blocks using control flow assertions, 2000, Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-33 2000).
[21] Jeffrey R. Diamond et al. An evaluation of the TRIPS computer system, 2009, ASPLOS.
[22] Eric M. Schwarz et al. IBM POWER6 microarchitecture, 2007, IBM J. Res. Dev.
[23] Harsh Sharangpani et al. Itanium Processor Microarchitecture, 2000, IEEE Micro.
[24] Steve Undy. Poulson: An 8-core 32 nm next generation Intel® Itanium® processor, 2011, IEEE Hot Chips 23 Symposium (HCS).
[25] Emil Talpes et al. Execution cache-based microarchitecture for power-efficient superscalar processors, 2005, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[26] Ghassan Shobaki et al. Optimal global instruction scheduling using enumeration, 2006.
[27] Lizy Kurian John et al. Low-power, low-complexity instruction issue using compiler assistance, 2005, ICS '05.
[28] Sanjay J. Patel et al. rePLay: A Hardware Framework for Dynamic Optimization, 2001, IEEE Trans. Computers.
[29] Donald Yeung et al. Design and evaluation of compiler algorithms for pre-execution, 2002, ASPLOS X.
[30] J. P. Grossman. Cheap out-of-order execution using delayed issue, 2000, Proceedings of the 2000 International Conference on Computer Design.
[31] Margaret Martonosi et al. Informing Memory Operations: Providing Memory Performance Feedback in Modern Processors, 1996, ISCA.
[32] Wayne Wolf et al. Evaluation of Static and Dynamic Scheduling for Media Processors, 2000.
[33] Harry F. Jordan et al. An investigation of static versus dynamic scheduling, 1990, ISCA '90.
[34] Sanjay J. Patel et al. Performance characterization of a hardware mechanism for dynamic optimization, 2001, Proceedings of the 34th ACM/IEEE International Symposium on Microarchitecture (MICRO-34).
[35] Carl E. Love et al. An investigation of static versus dynamic scheduling, 1990.
[36] Calvin Lin et al. Combining Hyperblocks and Exit Prediction to Increase Front-End Bandwidth and Performance, 2002.
[37] Brad Calder et al. An EPIC Processor with Pending Functional Units, 2002, ISHPC.
[38] Sanjay J. Patel et al. Beating in-order stalls with "flea-flicker" two-pass pipelining, 2006, IEEE Transactions on Computers.
[39] Cameron McNairy et al. Itanium 2 Processor Microarchitecture, 2003, IEEE Micro.
[40] Mark Heffernan et al. Data-Dependency Graph Transformations for Instruction Scheduling, 2005, J. Sched.
[41] David W. Wall et al. Limits of instruction-level parallelism, 1991, ASPLOS IV.
[42] Scott A. Mahlke et al. Comparing static and dynamic code scheduling for multiple-instruction-issue processors, 1991, MICRO 24.
[43] Harold W. Cain et al. Runahead execution vs. conventional data prefetching in the IBM POWER6 microprocessor, 2010, IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).
[44] Richard Johnson et al. The Transmeta Code Morphing™ Software: using speculation, recovery, and adaptive retranslation to address real-life challenges, 2003, International Symposium on Code Generation and Optimization (CGO 2003).
[45] A. Klaiber. The Technology Behind Crusoe™ Processors: Low-Power x86-Compatible Processors Implemented with Code Morphing, 2000.
[46] Daniel S. McFarlin et al. Discerning the dominant out-of-order performance advantage, 2013.
[47] Abhishek Tiwari et al. Enhancing MLP: Runahead Execution and Related Techniques, 2006.
[48] Santosh Nagarakatte et al. iCFP: Tolerating all-level cache misses in in-order processors, 2009, IEEE 15th International Symposium on High Performance Computer Architecture.
[49] Ghassan Shobaki et al. Optimal trace scheduling using enumeration, 2009, TACO.