Orinoco: Ordered Issue and Unordered Commit with Non-Collapsible Queues
Yang Liu | Yi Huang | Leibo Liu | Shaojun Wei | Jianfeng Zhu | Bing Li | Tairan Zhang | Pengfei Gou | Dibei Chen | Chunyang Feng
[1] Myung Kuk Yoon,et al. Reconstructing Out-of-Order Issue Queue , 2022, 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO).
[2] S. Kaxiras,et al. Free atomics: hardware atomic operations without fences , 2022, ISCA.
[3] M. C. Jeffrey,et al. A scalable architecture for reprioritizing ordered parallelism , 2022, ISCA.
[4] Mike Clark,et al. The AMD Next-Generation “Zen 3” Core , 2022, IEEE Micro.
[5] A. Yoaz,et al. Intel Alder Lake CPU Architectures , 2022, IEEE Micro.
[6] Heiner Litz,et al. CRISP: critical slice prefetching , 2022, ASPLOS.
[7] Narayanan Vijaykrishnan,et al. Microprocessor at 50: Industry Leaders Speak , 2021, IEEE Micro.
[8] L. Rizzo,et al. ghOSt: Fast & Flexible User-Space Delegation of Linux Scheduling , 2021, SOSP.
[9] Lieven Eeckhout,et al. TIP: Time-Proportional Instruction Profiling , 2021, MICRO.
[10] Yale N. Patt,et al. Criticality Driven Fetch , 2021, MICRO.
[11] Tony Nowatzki,et al. PolyGraph: Exposing the Value of Flexibility for Graph Processing Accelerators , 2021, 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA).
[12] Lieven Eeckhout,et al. Vector Runahead , 2021, 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA).
[13] Trevor E. Carlson,et al. NOREBA: a compiler-informed non-speculative out-of-order commit processor , 2021, ASPLOS.
[14] Junning Chen,et al. Two-Direction In-Memory Computing Based on 10T SRAM With Horizontal and Vertical Decoupled Read Ports , 2021, IEEE Journal of Solid-State Circuits.
[15] Zhiwei Liu,et al. CATCAM: Constant-time Alteration Ternary CAM with Scalable In-Memory Architecture , 2020, 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[16] Lieven Eeckhout,et al. The Forward Slice Core Microarchitecture , 2020, PACT.
[17] David Black-Schaffer,et al. Delay and Bypass: Ready and Criticality Aware Instruction Scheduling in Out-of-Order Processors , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[18] Won Woo Ro,et al. CASINO Core Microarchitecture: Generating Out-of-Order Schedules Using Cascaded In-Order Scheduling Windows , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[19] Josep Torrellas,et al. Understanding priority-based scheduling of graph algorithms on a shared-memory platform , 2019, SC.
[20] Hideki Ando,et al. SWQUE: A Mode Switching Issue Queue with Priority-Correcting Circular Queue , 2019, MICRO.
[21] Thomas F. Wenisch,et al. SoftSKU: Optimizing Server Architectures for Microservice Diversity @Scale , 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).
[22] David Black-Schaffer,et al. FIFOrder MicroArchitecture: Ready-Aware Instruction Scheduling for OoO Processors , 2019, 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[23] Christoforos E. Kozyrakis,et al. Shinjuku: Preemptive Scheduling for μsecond-scale Tail Latency , 2019, NSDI.
[24] David Black-Schaffer,et al. Freeway: Maximizing MLP for Slice-Out-of-Order Execution , 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[25] Lieven Eeckhout,et al. Precise Runahead Execution , 2019, IEEE Computer Architecture Letters.
[26] Qian Li,et al. Arachne: Core-Aware Thread Management , 2018, OSDI.
[27] Stefanos Kaxiras,et al. The Superfluous Load Queue , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[28] Stefanos Kaxiras,et al. SWOOP: software-hardware co-design for non-speculative, execute-ahead, in-order cores , 2018, PLDI.
[29] Jóakim von Kistowski,et al. SPEC CPU2017: Next-Generation Compute Benchmark , 2018, ICPE Companion.
[30] Sujan Kumar Gonugondla,et al. A Multi-Functional In-Memory Inference Processor Using a Standard 6T SRAM Array , 2018, IEEE Journal of Solid-State Circuits.
[31] Kaushik Roy,et al. X-SRAM: Enabling In-Memory Boolean Computations in CMOS Static Random Access Memories , 2017, IEEE Transactions on Circuits and Systems I: Regular Papers.
[32] David Blaauw,et al. Cache Automaton , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[33] Stefanos Kaxiras,et al. Non-speculative load-load reordering in TSO , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[34] Stefanos Kaxiras,et al. Exploring the Performance Limits of Out-of-order Commit , 2017, Conf. Computing Frontiers.
[35] Efraim Rotem,et al. Inside 6th-Generation Intel Core: New Microarchitecture Code-Named Skylake , 2017, IEEE Micro.
[36] Onur Mutlu,et al. Continuous runahead: Transparent hardware acceleration for memory intensive workloads , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[37] Naveen Verma,et al. A machine-learning classifier implemented in a standard 6T SRAM array , 2016, 2016 IEEE Symposium on VLSI Circuits (VLSI-Circuits).
[38] Miao Hu,et al. ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[39] David Blaauw,et al. A 28 nm Configurable Memory (TCAM/BCAM/SRAM) Using Push-Rule 6T Bit Cell Enabling Logic-in-Memory , 2016, IEEE Journal of Solid-State Circuits.
[40] Margaret Martonosi,et al. DeSC: Decoupled supply-compute communication management for heterogeneous architectures , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[41] David Black-Schaffer,et al. Long term parking (LTP): Criticality-aware resource allocation in OOO processors , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[42] Cong Yan,et al. A scalable architecture for ordered parallelism , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[43] Keshav Pingali,et al. Priority Queues Are Not Good Concurrent Priority Schedulers , 2015, Euro-Par.
[44] Lieven Eeckhout,et al. The Load Slice Core microarchitecture , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[45] Gu-Yeon Wei,et al. Profiling a warehouse-scale computer , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[46] Michael Gschwind,et al. IBM POWER8 processor core microarchitecture , 2015, IBM J. Res. Dev.
[47] Min Huang,et al. An Energy Efficient 32-nm 20-MB Shared On-Die L3 Cache for Intel® Xeon® Processor E5 Family , 2013, IEEE Journal of Solid-State Circuits.
[48] Craig B. Zilles,et al. Discerning the dominant out-of-order performance advantage: is it speculation or dynamism? , 2013, ASPLOS '13.
[49] Daniel S. McFarlin,et al. Discerning the dominant out-of-order performance advantage , 2013 .
[50] Keshav Pingali,et al. The tao of parallelism in algorithms , 2011, PLDI '11.
[51] Somayeh Sardashti,et al. The gem5 simulator , 2011, CARN.
[52] Michael Golden,et al. 40-Entry unified out-of-order scheduler and integer execution unit for the AMD Bulldozer x86–64 core , 2011, 2011 IEEE International Solid-State Circuits Conference.
[53] Grigorios Magklis,et al. Processor Microarchitecture: An Implementation Perspective , 2010, Processor Microarchitecture.
[54] Jung Ho Ahn,et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[55] S.P. Marti,et al. A Complexity-Effective Out-of-Order Retirement Microarchitecture , 2009, IEEE Transactions on Computers.
[56] Chandandeep Singh Pabla. Completely fair scheduler , 2009 .
[57] Gabriel H. Loh,et al. Matrix scheduler reloaded , 2007, ISCA '07.
[58] Amir Roth,et al. Store vulnerability window (SVW): re-execution filtering for enhanced load optimization , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[59] Mikko H. Lipasti,et al. Deconstructing commit , 2004, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2004.
[60] Jaume Abella,et al. Power- and Complexity-Aware Issue Queue Designs , 2003, IEEE Micro.
[61] Onur Mutlu,et al. Runahead execution: an alternative to very large instruction windows for out-of-order processors , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.
[62] J.F. Martinez,et al. Cherry: Checkpointed early resource recycling in out-of-order microprocessors , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings.
[63] Chris Wilkerson,et al. Hierarchical scheduling windows , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings.
[64] S. Tomita,et al. A high-speed dynamic instruction scheduling scheme for superscalar processors , 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.
[65] Richard E. Kessler,et al. The Alpha 21264 microprocessor , 1999, IEEE Micro.
[66] T. Fischer,et al. Issue Logic For A 600 MHz Out-of-order Execution , 1997, Symposium 1997 on VLSI Circuits.
[67] Kenneth C. Yeager. The MIPS R10000 superscalar microprocessor , 1996, IEEE Micro.
[68] R. M. Tomasulo,et al. An efficient algorithm for exploiting multiple arithmetic units , 1995 .
[69] Izidor Gertner,et al. On the Complexity of Scheduling Problems for Parallel/Pipelined Machines , 1989, IEEE Trans. Computers.
[70] Andrew R. Pleszkun,et al. Implementing Precise Interrupts in Pipelined Processors , 1988, IEEE Trans. Computers.
[71] Scott Owens,et al. x86-TSO: A Rigorous and Usable Programmer’s Model for x86 Multiprocessors , 2022 .
[72] Nick McKeown,et al. The nanoPU: A Nanosecond Network Stack for Datacenters , 2021, OSDI.
[73] Hari Balakrishnan,et al. Shenango: Achieving High CPU Efficiency for Latency-sensitive Datacenter Workloads , 2019, NSDI.
[74] Yunsup Lee,et al. The RISC-V Instruction Set Manual , 2014 .
[75] Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 3A: System Programming Guide, Part 1 , 2006 .
[76] Brad Calder,et al. SimPoint 3.0: Faster and More Flexible Program Phase Analysis , 2005, J. Instr. Level Parallelism.
[77] Michael L. Overton,et al. Numerical Computing with IEEE Floating Point Arithmetic , 2001 .