Page Reusability-Based Cache Partitioning for Multi-Core Systems

Most modern multi-core processors provide a shared last level cache (LLC) where data from all cores are placed to improve performance. However, this opens a new challenge for cache management, owing to cache pollution. With cache pollution, data with weak temporal locality can evict other data with strong temporal locality when both are mapped into the same cache set. In this article, we propose page reusability-based cache partitioning (PRCP) for multi-core systems to maximize cache utilization by minimizing cache pollution. To achieve this, PRCP divides pages into two groups: (1) highly-reused pages and (2) lowly-reused pages. The reusability of each page is collected online via periodic page table scans. PRCP then dynamically partitions the shared cache into two corresponding areas using page coloring technique. We have implemented PRCP in Linux kernel and evaluated it using SPEC CPU2006 benchmarks. The results show that our scheme can achieve comparable performance to the optimal offline MRC-guided process-based cache partitioning scheme without a priori knowledge of workloads.

[1]  Hai Jin,et al.  NightWatch: Integrating Lightweight and Transparent Cache Pollution Control into Dynamic Memory Allocation Systems , 2015, USENIX Annual Technical Conference.

[2]  Vicent Selfa,et al.  Improving System Turnaround Time with Intel CAT by Identifying LLC Critical Applications , 2018, Euro-Par.

[3]  Xiaosong Ma,et al.  KPart: A Hybrid Cache Partitioning-Sharing Technique for Commodity Multicores , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[4]  Andrew Wolfe,et al.  Software-based cache partitioning for real-time applications , 1994 .

[5]  Eric Rotenberg,et al.  Jigsaw: Scalable software-defined caches , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.

[6]  Xiaodong Wang,et al.  SWAP: Effective Fine-Grain Management of Shared Last-Level Caches with Minimum Hardware Support , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[7]  Yale N. Patt,et al.  Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[8]  Sparsh Mittal,et al.  A Survey of Techniques for Cache Partitioning in Multicore Processors , 2017, ACM Comput. Surv..

[9]  Marco Cesati,et al.  Understanding the Linux Kernel - from I / O ports to process management: covers Linux Kernel version 2.4 (2. ed.) , 2005 .

[10]  David A. Patterson,et al.  A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness , 2013, ISCA.

[11]  Christoforos E. Kozyrakis,et al.  Vantage: Scalable and efficient fine-grain cache partitioning , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[12]  Daniel Sánchez,et al.  Jenga: Software-defined cache hierarchies , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[13]  Gorka Irazoqui Apecechea,et al.  Systematic Reverse Engineering of Cache Slice Selection in Intel Processors , 2015, 2015 Euromicro Conference on Digital System Design.

[14]  Marco D. Santambrogio,et al.  A Software Cache Partitioning System for Hash-Based Caches , 2016, ACM Trans. Archit. Code Optim..

[15]  Michael Stumm,et al.  RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations , 2009, ASPLOS.

[16]  Michael Stumm,et al.  Reducing the harmful effects of last-level cache polluters with an OS-level, software-only pollute buffer , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[17]  Hai Jin,et al.  FractalMRC: Online Cache Miss Rate Curve Prediction on Commodity Systems , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.

[18]  Zhao Zhang,et al.  Soft-OLP: Improving Hardware Cache Performance through Software-Controlled Object-Level Partitioning , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.

[19]  Lizhong Chen,et al.  Futility Scaling: High-Associativity Cache Partitioning , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[20]  David K. Tam,et al.  Managing Shared L2 Caches on Multicore Systems in Software , 2007 .

[21]  Xiao Zhang,et al.  Towards practical page coloring-based multicore cache management , 2009, EuroSys '09.

[22]  Aamer Jaleel,et al.  Adaptive insertion policies for high performance caching , 2007, ISCA '07.

[23]  Ying Ye,et al.  COLORIS: A dynamic cache partitioning system using page coloring , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).

[24]  Zhao Zhang,et al.  Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[25]  Sam H. Noh,et al.  Cache-aware block allocation for memory-technology storage targeted file systems , 2019, SAC.

[26]  Nicolas Le Scouarnec,et al.  Reverse Engineering Intel Last-Level Cache Complex Addressing Using Performance Counters , 2015, RAID.