Long term parking (LTP): Criticality-aware resource allocation in OOO processors
David Black-Schaffer | Erik Hagersten | Andreas Sembrant | Trevor E. Carlson | Pierre Michaud | André Seznec | Arthur Perais