A Survey of Techniques for Cache Partitioning in Multicore Processors

As the number of on-chip cores and the memory demands of applications increase, judicious management of cache resources has become not merely attractive but imperative. Cache partitioning, that is, dividing cache space among applications according to their memory demands, is a promising approach for combining the capacity benefits of a shared cache with the performance isolation of private caches. However, naive partitioning can cause performance loss and unfairness, and may fail to provide quality-of-service guarantees. Intelligent techniques are therefore required to realize the full potential of cache partitioning. In this article, we present a survey of techniques for partitioning shared caches in multicore processors. We categorize the techniques by their key characteristics and provide a bird’s-eye view of the field of cache partitioning.
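To make the contrast between naive and demand-aware partitioning concrete, the sketch below divides the ways of a hypothetical 8-way shared cache between two applications. It assumes each application's miss count as a function of the number of ways it owns (its miss curve) is already known. The miss-curve numbers and the helper names `greedy_partition` and `total_misses` are illustrative assumptions, and the greedy marginal-gain rule is one simple heuristic chosen for illustration, not a specific technique from the surveyed literature.

```python
from typing import List


def total_misses(miss_curves: List[List[int]], alloc: List[int]) -> int:
    """Sum each application's misses at its allocated number of ways."""
    return sum(curve[w] for curve, w in zip(miss_curves, alloc))


def greedy_partition(miss_curves: List[List[int]], total_ways: int) -> List[int]:
    """Hand out ways one at a time to the application whose misses drop the most.

    miss_curves[i][w] is the (hypothetical) miss count of application i when it
    owns w ways; each curve needs at least total_ways + 1 entries.
    Every application is guaranteed at least one way.
    """
    n = len(miss_curves)
    if total_ways < n:
        raise ValueError("need at least one way per application")
    alloc = [1] * n
    for _ in range(total_ways - n):
        # Marginal utility: misses saved by giving application i one more way.
        gains = [miss_curves[i][alloc[i]] - miss_curves[i][alloc[i] + 1]
                 for i in range(n)]
        best = max(range(n), key=lambda i: gains[i])
        alloc[best] += 1
    return alloc


if __name__ == "__main__":
    # Hypothetical miss curves for 0..8 ways.
    cache_friendly = [1000, 700, 450, 300, 200, 150, 120, 100, 90]
    streaming = [800, 790, 785, 782, 780, 779, 778, 777, 776]
    curves = [cache_friendly, streaming]

    alloc = greedy_partition(curves, total_ways=8)
    print(alloc, total_misses(curves, alloc))  # [7, 1] 890
    print([4, 4], total_misses(curves, [4, 4]))  # [4, 4] 980 (naive equal split)
```

With these numbers the demand-aware split gives the cache-friendly application 7 ways and the streaming application 1 way, cutting total misses from 980 under an equal 4/4 split to 890. A greedy rule like this finds the miss-minimizing split only when every miss curve shows diminishing returns (is convex); with plateaus in the curves, or with goals such as fairness or QoS instead of total misses, more careful policies are needed.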
