论文信息 - Jumanji: The Case for Dynamic NUCA in the Datacenter

Jumanji: The Case for Dynamic NUCA in the Datacenter

The datacenter introduces new challenges for computer systems around tail latency and security. This paper argues that dynamic NUCA techniques are a better solution to these challenges than prior cache designs. We show that dynamic NUCA designs can meet tail-latency deadlines with much less cache space than prior work, and that they also provide a natural defense against cache attacks. Unfortunately, prior dynamic NUCAs have missed these opportunities because they focus exclusively on reducing data movement.We present Jumanji, a dynamic NUCA technique designed for tail latency and security. We show that prior last-level cache designs are vulnerable to new attacks and offer imperfect performance isolation. Jumanji solves these problems while significantly improving performance of co-running batch applications. Moreover, Jumanji only requires lightweight hardware and a few simple changes to system software, similar to prior D-NUCAs. At 20 cores, Jumanji improves batch weighted speedup by 14% on average, vs. just 2% for a non-NUCA design with weaker security, and is within 2% of an idealized design.

Nathan Beckmann | Brian C. Schwedock | Nathan Beckmann

[1] Ronak Singhal,et al. Inside Intel® Core microarchitecture (Nehalem) , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).

[2] Christoforos E. Kozyrakis,et al. Towards energy proportionality for large-scale latency-critical workloads , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[3] William J. Dally,et al. GPUs and the Future of Parallel Computing , 2011, IEEE Micro.

[4] David A. Wood,et al. ASR: Adaptive Selective Replication for CMP Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[5] Dan Page,et al. Partitioned Cache Architecture as a Side-Channel Defence Mechanism , 2005, IACR Cryptology ePrint Archive.

[6] Sangyeun Cho,et al. SOS: A Software-Oriented Distributed Shared Cache Management Approach for Chip Multiprocessors , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.

[7] Christoforos E. Kozyrakis,et al. Heracles: Improving resource efficiency at scale , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[8] Eric Rotenberg,et al. Jigsaw: Scalable software-defined caches , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.

[9] Daniel Sánchez,et al. Scaling distributed cache hierarchies through computation and data co-scheduling , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[10] Yan Solihin,et al. A Framework for Providing Quality of Service in Chip Multi-Processors , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[11] Daniel Sánchez,et al. Tailbench: a benchmark suite and evaluation methodology for latency-critical applications , 2016, 2016 IEEE International Symposium on Workload Characterization (IISWC).

[12] Yingwei Luo,et al. DCAPS: dynamic cache allocation with partial sharing , 2018, EuroSys.

[13] Christina Delimitrou,et al. Paragon: QoS-aware scheduling for heterogeneous datacenters , 2013, ASPLOS '13.

[14] Luiz André Barroso,et al. The tail at scale , 2013, CACM.

[15] John Shalf,et al. Exascale Computing Technology Challenges , 2010, VECPAR.

[16] Cesar Pereida García,et al. Port Contention for Fun and Profit , 2019, 2019 IEEE Symposium on Security and Privacy (SP).

[17] Luiz André Barroso,et al. The Case for Energy-Proportional Computing , 2007, Computer.

[18] Christoforos E. Kozyrakis,et al. Vantage: Scalable and efficient fine-grain cache partitioning , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[19] Adi Shamir,et al. Cache Attacks and Countermeasures: The Case of AES , 2006, CT-RSA.

[20] Yang Li,et al. dCat: dynamic cache management for efficient, performance-sensitive infrastructure-as-a-service , 2018, EuroSys.

[21] Daniel Sánchez,et al. Rubik: Fast analytical power management for latency-critical systems , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[22] Christoforos E. Kozyrakis,et al. ZSim: fast and accurate microarchitectural simulation of thousand-core systems , 2013, ISCA.

[23] Daniel Sánchez,et al. Whirlpool: Improving Dynamic Cache Management with Static Data Classification , 2016, ASPLOS.

[24] Amir Roth,et al. FIESTA: A Sample-Balanced Multi-Program Workload Methodology , 2009 .

[25] Babak Falsafi,et al. SMoTherSpectre: Exploiting Speculative Execution through Port Contention , 2019, CCS.

[26] Michael Hamburg,et al. Meltdown: Reading Kernel Memory from User Space , 2018, USENIX Security Symposium.

[27] Michael Hamburg,et al. Spectre Attacks: Exploiting Speculative Execution , 2018, 2019 IEEE Symposium on Security and Privacy (SP).

[28] Ruby B. Lee,et al. New cache designs for thwarting software cache-based side channel attacks , 2007, ISCA '07.

[29] Rami G. Melhem,et al. Practical PACE for embedded systems , 2004, EMSOFT '04.

[30] Daniel Mossé,et al. Octopus-Man: QoS-driven task management for heterogeneous multicores in warehouse-scale computers , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[31] Valentin Puente,et al. ESP-NUCA: A low-cost adaptive Non-Uniform Cache Architecture , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[32] Omer Khan,et al. IRONHIDE: A Secure Multicore that Efficiently Mitigates Microarchitecture State Attacks for Interactive Applications , 2019, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[33] Josep Torrellas,et al. Speculative Taint Tracking (STT): A Comprehensive Protection for Speculatively Accessed Data , 2019, IEEE Micro.

[34] Josep Torrellas,et al. InvisiSpec: Making Speculative Execution Invisible in the Cache Hierarchy , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[35] Daniel Sánchez,et al. Jenga: Software-defined cache hierarchies , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[36] Daniel Sánchez,et al. Ubik: efficient cache sharing with strict qos for latency-critical workloads , 2014, ASPLOS.

[37] Alan Jay Smith,et al. Improving dynamic voltage scaling algorithms with PACE , 2001, SIGMETRICS '01.

[38] Nick Knupffer. Intel Corporation , 2018, The Grants Register 2019.

[39] José González,et al. Elastic cooperative caching: an autonomous dynamically adaptive memory hierarchy for chip multiprocessors , 2010, ISCA.

[40] Boris Grot,et al. Stretch: Balancing QoS and Throughput for Colocated Server Workloads on SMT Cores , 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[41] Zeshan Chishti,et al. Optimizing replication, communication, and capacity allocation in CMPs , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[42] David Meisner,et al. Stochastic Queuing Simulation for Data Center Workloads , 2010 .

[43] Ruby B. Lee,et al. Random Fill Cache Architecture , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[44] Thomas F. Wenisch,et al. PowerNap: eliminating server idle power , 2009, ASPLOS.

[45] Thu D. Nguyen,et al. Exploiting Heterogeneity for Tail Latency and Energy Efficiency , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[46] Thomas F. Wenisch,et al. Power management of online data-intensive services , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[47] Mainak Chaudhuri. PageNUCA: Selected policies for page-grain locality management in large shared chip-multiprocessor caches , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[48] NahrstedtKlara,et al. Energy-efficient soft real-time CPU scheduling for mobile multimedia systems , 2003 .

[49] Yale N. Patt,et al. Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[50] Moinuddin K. Qureshi. Adaptive Spill-Receive for robust high-performance caching in CMPs , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[51] Tao Li,et al. Optimizing virtual machine consolidation performance on NUMA server architecture for cloud workloads , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[52] Hao Wu,et al. Newcache: Secure Cache Architecture Thwarting Cache Side-Channel Attacks , 2016, IEEE Micro.

[53] Lingjia Tang,et al. Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers , 2013, ISCA.

[54] Krste Asanovic,et al. Victim replication: maximizing capacity while hiding wire delay in tiled chip multiprocessors , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[55] Taesoo Kim,et al. STEALTHMEM: System-Level Protection Against Cache-Based Side Channel Attacks in the Cloud , 2012, USENIX Security Symposium.

[56] R. Govindarajan,et al. Probabilistic Shared Cache Management (PriSM) , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[57] David Kaeli,et al. Exploiting Bank Conflict-based Side-channel Timing Leakage of GPUs , 2019, ACM Trans. Archit. Code Optim..

[58] Christina Delimitrou,et al. Quasar: resource-efficient and QoS-aware cluster management , 2014, ASPLOS.

[59] Christina Delimitrou,et al. PARTIES: QoS-Aware Resource Partitioning for Multiple Interactive Services , 2019, ASPLOS.

[60] Gernot Heiser,et al. Last-Level Cache Side-Channel Attacks are Practical , 2015, 2015 IEEE Symposium on Security and Privacy.

[61] Babak Falsafi,et al. Reactive NUCA: near-optimal block placement and replication in distributed caches , 2009, ISCA '09.

[62] Mehmet Kayaalp,et al. A high-resolution side-channel attack on last-level cache , 2016, 2016 53nd ACM/EDAC/IEEE Design Automation Conference (DAC).

[63] Moinuddin K. Qureshi. CEASER: Mitigating Conflict-Based Cache Attacks via Encrypted-Address and Remapping , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[64] Daniel Sánchez,et al. Nexus: A New Approach to Replication in Distributed Shared Caches , 2017, 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[65] Lizhong Chen,et al. Futility Scaling: High-Associativity Cache Partitioning , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[66] Gernot Heiser,et al. CATalyst: Defeating last-level cache side channel attacks in cloud computing , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[67] Stephan Krenn,et al. Cache Games -- Bringing Access-Based Cache Attacks on AES to Practice , 2011, 2011 IEEE Symposium on Security and Privacy.

[68] David A. Wood,et al. Managing Wire Delay in Large Chip-Multiprocessor Caches , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[69] Xiaosong Ma,et al. KPart: A Hybrid Cache Partitioning-Sharing Technique for Commodity Multicores , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[70] Moinuddin K. Qureshi. New Attacks and Defense for Encrypted-Address Cache , 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).

[71] Josep Torrellas,et al. MicroScope: Enabling Microarchitectural Replay Attacks , 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).

[72] Aamer Jaleel,et al. Adaptive insertion policies for high performance caching , 2007, ISCA '07.

[73] Doug Burger,et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.

[74] Jichuan Chang,et al. Cooperative Caching for Chip Multiprocessors , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[75] Klara Nahrstedt,et al. Energy-efficient soft real-time CPU scheduling for mobile multimedia systems , 2003, SOSP '03.

[76] Mehmet Kayaalp,et al. RIC: Relaxed Inclusion Caches for mitigating LLC side-channel attacks , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).

[77] Aamer Jaleel,et al. High performance cache replacement using re-reference interval prediction (RRIP) , 2010, ISCA.

[78] Francisco J. Cazorla,et al. FlexDCP: a QoS framework for CMP architectures , 2009, OPSR.

[79] Daniel Sánchez,et al. Talus: A simple way to remove cliffs in cache performance , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[80] Rajeev Balasubramonian,et al. Dynamic hardware-assisted software-controlled page placement to manage capacity allocation and sharing within large caches , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[81] Yunsi Fei,et al. A novel cache bank timing attack , 2017, 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[82] Paul M. Carpenter,et al. Hipster: Hybrid Task Manager for Latency-Critical Cloud Workloads , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[83] Irfan Ahmad,et al. Cache Modeling and Optimization using Miniature Simulations , 2017, USENIX Annual Technical Conference.

[84] Sangyeun Cho,et al. Managing Distributed, Shared L2 Caches through OS-Level Page Allocation , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[85] Ronald G. Dreslinski,et al. Adrenaline: Pinpointing and reining in tail queries with quick voltage boosting , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[86] Ying Ye,et al. COLORIS: A dynamic cache partitioning system using page coloring , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).

[87] Zhao Zhang,et al. Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[88] Kevin Skadron,et al. Bubble-up: Increasing utilization in modern warehouse scale computers via sensible co-locations , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[89] Vijay S. Pai,et al. Imbalanced cache partitioning for balanced data-parallel programs , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[90] Josep Torrellas,et al. Secure hierarchy-aware cache replacement policy (SHARP): Defending against cache-based side channel attacks , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).