Reducing last level cache pollution through OS-level software-controlled region-based partitioning

Cache pollution in the last-level cache (LLC) can severely degrade performance. In this paper, we propose a software-controlled mechanism that partitions the LLC at the region level in order to reduce intra-application LLC misses caused by cache pollution. A profiling feedback mechanism is used to analyze inter-region cache interference. Guided by the profiling information, we enhance operating system support to map poor-locality regions to a small slice of the LLC, which limits the harmful effect of non-reusable data. Our approach requires no hardware support or new instructions, and it is transparent to applications. Compared with default Linux, our approach, called Soft-RP, reduces LLC MPKI (LLC misses per 1000 instructions) by up to 30.88%, and by 19.31% on average. Execution time measurements show that Soft-RP improves performance by up to 15.51%, and by 8.14% on average.
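The mapping step described above can be illustrated with a small page-coloring sketch. This is not the Soft-RP implementation. It assumes a physically indexed cache whose set index is taken from the address bits directly above the line offset (no index hashing), and every name in it (`CacheGeometry`, `ColoredAllocator`, `page_color`, `mpki`) is hypothetical. Pages flagged by profiling as poor-locality are served only from a small reserved set of colors, so they can evict each other but not the reusable data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    """Hypothetical LLC description; the values used below are illustrative."""
    size_bytes: int
    ways: int
    line_bytes: int
    page_bytes: int = 4096

    @property
    def num_sets(self) -> int:
        return self.size_bytes // (self.ways * self.line_bytes)

    @property
    def num_colors(self) -> int:
        # One way spans num_sets * line_bytes bytes; each page-sized
        # chunk of that span is one "color" (a disjoint group of sets).
        return (self.num_sets * self.line_bytes) // self.page_bytes


def page_color(geom: CacheGeometry, pfn: int) -> int:
    """Color of a physical page frame, assuming simple modulo set indexing."""
    return pfn % geom.num_colors


def mpki(misses: int, instructions: int) -> float:
    """Misses per 1000 instructions, the metric quoted in the abstract."""
    return misses * 1000 / instructions


class ColoredAllocator:
    """Toy frame allocator that confines poor-locality pages to a few colors."""

    def __init__(self, geom, free_pfns, polluting_colors):
        self.geom = geom
        self.polluting = frozenset(polluting_colors)
        self.free = {c: [] for c in range(geom.num_colors)}
        for pfn in free_pfns:
            self.free[page_color(geom, pfn)].append(pfn)
        self._cursor = {True: 0, False: 0}

    def alloc(self, poor_locality: bool) -> int:
        # Candidate colors: the reserved slice for poor-locality pages,
        # every other color for pages with good reuse.
        colors = [c for c in range(self.geom.num_colors)
                  if (c in self.polluting) == poor_locality]
        # Round-robin over the candidates to spread pages evenly.
        for i in range(len(colors)):
            idx = (self._cursor[poor_locality] + i) % len(colors)
            if self.free[colors[idx]]:
                self._cursor[poor_locality] = (idx + 1) % len(colors)
                return self.free[colors[idx]].pop()
        raise MemoryError("no free frame with a suitable color")


if __name__ == "__main__":
    # Assumed geometry: 2 MiB, 16-way, 64-byte lines, 4 KiB pages.
    geom = CacheGeometry(2 * 1024 * 1024, 16, 64)
    allocator = ColoredAllocator(geom, range(1024), polluting_colors={0, 1})
    streaming = [allocator.alloc(poor_locality=True) for _ in range(4)]
    print(geom.num_colors, [page_color(geom, p) for p in streaming])
```

In a real kernel the color constraint would be applied inside the physical page allocator at page-fault time, with the poor-locality flag derived from the profiling feedback. The number of colors reserved for polluting regions is a tuning choice that this sketch leaves open.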
