Partitioning mechanism based on dynamic Allocation of Data entries for chip multiprocessors

The LRU replacement strategy exploits the locality of blocks within the same set, but it manages L2 cache resources poorly because temporal locality has already been filtered by the L1 caches. In contrast, the reuse replacement strategy [1] exploits the reuse characteristics of blocks across the entire cache and therefore has more potential to improve cache resource utilization. We use reuse replacement to manage L2 cache resources in chip multiprocessors (CMPs) and propose a new partitioning mechanism named PAD (Partitioning based on dynamic Allocation of Data entries). PAD divides the tag array into sub-arrays and the data array into private and shared data regions, and partitions cache resources among cores according to their memory access demand. Because the reuse replacement strategy allocates data entries to tag entries dynamically, a core that has obtained more data entries during a time interval is taken to have a higher demand for cache resources. Based on the number of occupied data entries collected for each core, a PAD algorithm with initial, partitioning, and rollback stages decides the amount of cache resources assigned to each core. Capacity is adjusted by allocating data entries from either the private data region or the shared data region. Using programs from the PARSEC benchmark suite to build multi-threaded and multi-programmed workloads, our experiments show that the new scheme achieves an average IPC improvement of 22.33% over both traditional private and shared cache organizations.
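To make the demand-driven idea concrete, the following is a minimal sketch of one way a partitioning step could divide a shared data region among cores. It is an illustration under stated assumptions, not the PAD algorithm itself: the proportional rule, the largest-remainder rounding, and the names `partition_shared_entries`, `acquired`, and `shared_entries` are all hypothetical, and the initial and rollback stages are omitted.

```python
def partition_shared_entries(acquired, shared_entries):
    """Split a shared data region among cores (hypothetical rule).

    acquired[i] is the number of data entries core i obtained during the
    last time interval; it is used here as a proxy for that core's demand.
    Returns one quota per core; the quotas sum to shared_entries.
    """
    n = len(acquired)
    total = sum(acquired)

    # Assumption: with no observed demand, fall back to an even split.
    if total == 0:
        base, extra = divmod(shared_entries, n)
        return [base + (1 if i < extra else 0) for i in range(n)]

    # Proportional share, rounded down, using integer arithmetic only.
    quotas = [a * shared_entries // total for a in acquired]
    remainders = [a * shared_entries % total for a in acquired]

    # Hand the entries lost to rounding to the cores with the largest
    # remainders, so that every shared entry is assigned.
    leftover = shared_entries - sum(quotas)
    by_remainder = sorted(range(n), key=lambda i: remainders[i], reverse=True)
    for i in by_remainder[:leftover]:
        quotas[i] += 1
    return quotas


if __name__ == "__main__":
    # Four cores; core 0 obtained the most data entries in the interval.
    print(partition_shared_entries([300, 100, 50, 50], 1000))
```

In this sketch a core that obtained more data entries in the interval receives a larger share of the shared region, which mirrors the demand signal described above; the actual decision rule used by PAD may differ.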

[1] Srinivas Devadas, et al. Dynamic Cache Partitioning via Columnization, 2000, DAC 2000.

[2] Brian N. Bershad, et al. Avoiding conflict misses dynamically in large direct-mapped caches, 1994, ASPLOS VI.

[3] Won-Taek Lim, et al. Architectural support for operating system-driven CMP cache management, 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[4] R. Govindarajan, et al. Emulating Optimal Replacement with a Shepherd Cache, 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[5] Per Stenström, et al. An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors, 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[6] Yang Xue. A Weighted Dynamic Shared Cache Partitioning Mechanism for Multi-Threaded Multi-Programmed Workloads, 2008.

[7] Yale N. Patt, et al. Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches, 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[8] N. Muralimanohar, et al. CACTI 6.0: A Tool to Understand Large Caches, 2007.

[9] Ravi R. Iyer, et al. CQoS: a framework for enabling QoS in shared caches of CMP platforms, 2004, ICS '04.

[10] Yale N. Patt, et al. The V-Way cache: demand-based associativity via global replacement, 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[11] Glenn Reinman, et al. Fast and fair: data-stream quality of service, 2005, CASES '05.

[12] Fredrik Larsson, et al. Simics: A Full System Simulation Platform, 2002, Computer.

[13] Erik Hagersten, et al. STATSHARE: A Statistical Model for Managing Cache Sharing via Decay, 2006.

[14] Guang Suo, et al. A Weighted Dynamic Shared Cache Partitioning Mechanism for Multi-Threaded Multi-Programmed Workloads, 2009.

[15] Yan Solihin, et al. Fair cache sharing and partitioning in a chip multiprocessor architecture, 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004.

[16] Steven K. Reinhardt, et al. A fully associative software-managed cache design, 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[17] G. Edward Suh, et al. Dynamic Partitioning of Shared Cache Memory, 2004, The Journal of Supercomputing.

[18] Jichuan Chang, et al. Cooperative cache partitioning for chip multiprocessors, 2007, ICS '07.

[19] Kai Li, et al. The PARSEC benchmark suite: Characterization and architectural implications, 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).