Improving System Turnaround Time with Intel CAT by Identifying LLC Critical Applications

Resource sharing is a major concern in current multicore processors. Among the shared system resources, the Last Level Cache (LLC) is one of the most critical: destructive interference between applications that access it causes additional off-chip accesses to main memory, whose long latencies can severely degrade overall system performance. To alleviate this issue, current processors implement large LLCs, but even so, inter-application interference can harm the performance of a subset of the running applications in multiprogram workloads. For this reason, recent Intel processors feature Cache Allocation Technology (CAT), which partitions the cache by assigning subsets of cache ways to groups of applications. This paper proposes the Critical-Aware (CA) LLC partitioning approach, which leverages CAT to improve the performance of multiprogram workloads by identifying and protecting the applications whose performance suffers most from LLC sharing. Experimental results show that, compared to a baseline system without partitioning, CA improves turnaround time by 15% on average and by up to 40%.
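As a rough illustration of what assigning subsets of cache ways means in practice, the following Python sketch builds two way bitmasks, one for a group of critical applications and one for all other applications. It is not the CA policy itself: the abstract does not state how many ways each group receives or whether the partitions overlap, so the 20-way LLC, the 12/8 split, the disjoint layout, and the helper names `contiguous_mask` and `split_ways` are all assumptions made for this example. The one property taken from CAT is that each mask must be a contiguous run of set bits.

```python
from typing import Tuple


def contiguous_mask(first_way: int, n_ways: int) -> int:
    """Return a bitmask with n_ways consecutive bits set, starting at bit first_way."""
    if first_way < 0 or n_ways < 1:
        raise ValueError("first_way must be >= 0 and n_ways must be >= 1")
    return ((1 << n_ways) - 1) << first_way


def split_ways(total_ways: int, critical_ways: int) -> Tuple[int, int]:
    """Split the LLC ways into two disjoint, contiguous partitions.

    Assumption for this sketch: critical applications get the upper
    critical_ways ways and every other application shares the rest.
    """
    if not 0 < critical_ways < total_ways:
        raise ValueError("critical_ways must be between 1 and total_ways - 1")
    critical = contiguous_mask(total_ways - critical_ways, critical_ways)
    others = contiguous_mask(0, total_ways - critical_ways)
    return critical, others


# Hypothetical 20-way LLC with 12 ways reserved for critical applications.
critical_mask, others_mask = split_ways(total_ways=20, critical_ways=12)
print(f"{critical_mask:05x} {others_mask:05x}")  # prints "fff00 000ff"
```

Masks like these would then be written to the hardware through whatever interface the platform exposes; that step is omitted here because it depends on the operating system and tooling.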
