Premier: A Concurrency-Aware Pseudo-Partitioning Framework for Shared Last-Level Cache

As the number of on-chip cores and the demands of applications increase, efficient management of shared cache resources becomes imperative. Cache partitioning has been studied for decades to reduce interference between applications sharing a cache and to provide performance and fairness guarantees. However, few studies examine how concurrent memory accesses affect the effectiveness of partitioning: when memory requests are issued concurrently, the raw miss count does not reflect how much miss latency is overlapped by other outstanding accesses. In this work, we first introduce pure misses per kilo instructions (PMPKI), a metric that quantifies cache efficiency while accounting for concurrent access activity. We then propose Premier, a dynamically adaptive, concurrency-aware cache pseudo-partitioning framework. Premier derives insertion and promotion policies from PMPKI curves to achieve the benefits of cache partitioning. Finally, our evaluation across a variety of workloads shows that Premier outperforms state-of-the-art cache partitioning schemes in both performance and fairness: on an 8-core system, Premier achieves 15.45% higher system performance and 10.91% better fairness than the UCP scheme.
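To make the PMPKI definition and its use in pseudo-partitioning concrete, below is a minimal C++ sketch, assuming a 16-way last-level cache and a UCP-style shadow-tag sampler that supplies a per-core PMPKI curve. The counter names, the `insertion_position` helper, and the linear mapping from curve steepness to insertion depth are illustrative assumptions, not Premier's actual policy.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Illustrative per-core counters (names are hypothetical, not from the paper).
struct CoreStats {
    uint64_t pure_misses;   // misses whose latency is not hidden by other outstanding accesses
    uint64_t instructions;  // retired instructions
};

// PMPKI: pure misses per kilo instructions, as defined in the abstract.
double pmpki(const CoreStats& s) {
    return s.instructions ? 1000.0 * static_cast<double>(s.pure_misses) / s.instructions
                          : 0.0;
}

// A PMPKI curve samples the metric at each hypothetical way allocation
// (index = number of ways), e.g. gathered by UCP-style shadow-tag set sampling.
constexpr std::size_t kWays = 16;
using PmpkiCurve = std::array<double, kWays + 1>;

// Pseudo-partitioning steers occupancy through insertion depth rather than
// hard way limits: a core whose PMPKI curve falls steeply (cache-sensitive)
// inserts near MRU (position 0); an insensitive core inserts near LRU.
// `granted_ways` is the allocation chosen for this core, in [1, kWays].
std::size_t insertion_position(const PmpkiCurve& curve, std::size_t granted_ways) {
    const double max_gain = curve[1] - curve[kWays];
    if (max_gain <= 0.0) return kWays - 1;                      // insensitive: LRU-most
    const double gain = curve[1] - curve[granted_ways];
    const double frac = std::clamp(gain / max_gain, 0.0, 1.0);  // 0 = insensitive, 1 = sensitive
    return static_cast<std::size_t>((1.0 - frac) * (kWays - 1));
}
```

The only point of the sketch is that, as in PIPP-style pseudo-partitioning, the partitioning decision is expressed as an insertion (and, symmetrically, promotion) position rather than a hard per-core way allocation.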

[1] Xian-He Sun, et al. A Study on Modeling and Optimization of Memory Systems, 2021, J. Comput. Sci. Technol.

[2] Yale N. Patt, et al. Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches, 2006, 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[3] Onur Mutlu, et al. The Application Slowdown Model: Quantifying and Controlling the Impact of Inter-Application Interference at Shared Caches and Main Memory, 2015, 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[4] Manoj Franklin, et al. Balancing Throughput and Fairness in SMT Processors, 2001, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[5] Lizy Kurian John, et al. Cache Friendliness-Aware Management of Shared Last-Level Caches for High-Performance Multi-Core Systems, 2014, IEEE Transactions on Computers.

[6] Dawei Wang, et al. Concurrent Average Memory Access Time, 2014, Computer.

[7] Gabriel H. Loh, et al. PIPP: Promotion/Insertion Pseudo-Partitioning of Multi-Core Shared Caches, 2009, ISCA '09.

[8] Sparsh Mittal, et al. A Survey of Techniques for Cache Partitioning in Multicore Processors, 2017, ACM Comput. Surv.

[9] Xian-He Sun, et al. APAC: An Accurate and Adaptive Prefetch Framework with Concurrent Memory Access Analysis, 2020, IEEE 38th International Conference on Computer Design (ICCD).