Paper information - Towards a Better Cache Utilization Using Controlled Cache Partitioning

Many multi-core processors today employ a shared Last Level Cache (LLC). Because the LLC is shared among the cores, partitioning it becomes more important. Past research has demonstrated that the traditional least recently used (LRU) based partitioning-cum-replacement policy adversely affects metrics such as instructions per cycle (IPC), miss rate, and speedup. This leads to poor performance in an environment where multiple cores compete for one global LLC. Applications with good locality of reference benefit from LRU; however, LRU fails for applications whose working set size (WSS) is larger than the LLC size. In this work, we propose a scheme that allows cores to steal or donate cache lines up to a threshold, giving them a chance to adjust their partition when a miss occurs. Instead of maintaining a strict target partitioning, we introduce a flexible threshold window. Our evaluation with multiprogrammed workloads shows significant performance improvement.
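The abstract gives only the outline of the scheme, so the following is a minimal sketch of one way a threshold window around a target partition could behave on a miss, not the authors' actual mechanism. The class `PartitionedSet`, its parameters (`ways`, `targets`, `threshold`), and the donor-selection rule are all assumptions made for illustration: a core that misses may steal a line from another core as long as it stays below `target + threshold`, and a core only donates while it holds more than `target - threshold` lines.

```python
from collections import OrderedDict


class PartitionedSet:
    """One cache set shared by several cores (hypothetical model).

    Each core has a target number of ways; `threshold` is the assumed
    flexible window: occupancy may drift within target +/- threshold.
    """

    def __init__(self, ways, targets, threshold):
        assert sum(targets.values()) == ways
        self.ways = ways
        self.targets = dict(targets)
        self.threshold = threshold
        # tag -> owning core; iteration order runs from LRU to MRU.
        self.lines = OrderedDict()

    def occupancy(self, core):
        return sum(1 for owner in self.lines.values() if owner == core)

    def _lru_line_of(self, core):
        # First matching entry in LRU-to-MRU order is that core's LRU line.
        for tag, owner in self.lines.items():
            if owner == core:
                return tag
        return None

    def _pick_victim(self, core):
        t = self.threshold
        # The missing core may steal only while below target + threshold.
        if self.occupancy(core) < self.targets[core] + t:
            donors = [
                d for d in self.targets
                if d != core
                and self.occupancy(d) > max(self.targets[d] - t, 0)
            ]
            if donors:
                # Assumed rule: take from the core furthest above its target.
                donor = max(
                    donors,
                    key=lambda d: self.occupancy(d) - self.targets[d],
                )
                return self._lru_line_of(donor)
        # Otherwise replace within the core's own partition.
        own = self._lru_line_of(core)
        # Fallback to the global LRU line if the core owns nothing.
        return own if own is not None else next(iter(self.lines))

    def access(self, core, tag):
        """Return True on a hit, False on a miss (after filling the line)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)  # promote to MRU
            return True
        if len(self.lines) >= self.ways:
            del self.lines[self._pick_victim(core)]
        self.lines[tag] = core
        return False


if __name__ == "__main__":
    s = PartitionedSet(ways=4, targets={0: 2, 1: 2}, threshold=1)
    for a in range(4):       # core 0 fills the empty set
        s.access(0, ("c0", a))
    for a in range(10):      # core 1 streams through distinct lines
        s.access(1, ("c1", a))
    print(s.occupancy(0), s.occupancy(1))
```

In this sketch the streaming core ends up with three lines (its target of two plus the threshold of one) and the other core keeps one, so a core with a large working set can grow somewhat but cannot evict its neighbour entirely, which is the intuition behind a flexible window rather than a strict partition.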

Shirshendu Das | Hemangee K. Kapoor | Prateek D. Halwe
