Partially shared cache and adaptive replacement algorithm for NoC-based many-core systems

Abstract The Network-on-Chip (NoC) is a promising alternative to traditional bus-based architectures and has been widely applied to interconnect multi/many-core systems because of its scalable and modular design. The memory wall remains one of the most important challenges for such systems, although it can be partly alleviated by cache subsystems. In this paper, to overcome the high resource consumption and low data-sharing rate of the private cache scheme, we propose a partially shared cache structure and a corresponding replacement algorithm based on a mesh NoC. In this scheme, each L2 cache is shared by a group of four cores that are connected as a cluster to a given node by the local bus. To maximize the performance of this partially shared cache structure, we propose a core-aware re-reference interval prediction (CA-RRIP) replacement algorithm. The algorithm performs dynamic virtual partitioning on the partially shared cache; the core that initiated the cache access request is given top priority when a cache area needs to be replaced or inserted. This approach guarantees cache exclusivity and can mitigate interactions among cores that use different access patterns. We implement the traditional private, the proposed partially shared, and the row-shared cache subsystems in our experiments. The comparisons indicate that, with the same number of cores, the overall system resource occupation can be reduced by 20% and the instructions per cycle (IPC) of the system can increase by up to 49.2%. Moreover, the system throughput (STP) increases by an average of 5.89%. Our experimental results also show that the proposed CA-RRIP algorithm reduces the average cache miss rate of the system under various cache access patterns.
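
The abstract does not spell out the CA-RRIP policy, so the following is only a minimal sketch of one possible reading of "core-aware" victim selection on top of a standard 2-bit RRIP set. It assumes that, among the lines predicted to be re-referenced in the distant future, a line owned by the requesting core is evicted first, so that one core's misses are less likely to displace another core's data. The names `CoreAwareRRIPSet`, `Line`, and `MAX_RRPV`, and the tie-breaking rule itself, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional

MAX_RRPV = 3  # 2-bit re-reference prediction value (assumed width)


@dataclass
class Line:
    tag: Optional[int] = None    # None marks an invalid (empty) line
    owner: Optional[int] = None  # core that inserted the line
    rrpv: int = MAX_RRPV


class CoreAwareRRIPSet:
    """One set of a cache shared by a cluster of cores (illustrative only)."""

    def __init__(self, ways: int):
        self.lines = [Line() for _ in range(ways)]

    def access(self, core: int, tag: int) -> bool:
        """Return True on a hit, False on a miss (the block is then inserted)."""
        for line in self.lines:
            if line.tag == tag:
                line.rrpv = 0  # hit promotion: predict near re-reference
                return True
        victim = self._pick_victim(core)
        # Insert with a "long" re-reference prediction, as in static RRIP.
        victim.tag, victim.owner, victim.rrpv = tag, core, MAX_RRPV - 1
        return False

    def _pick_victim(self, core: int) -> Line:
        # Fill invalid lines before evicting anything.
        for line in self.lines:
            if line.tag is None:
                return line
        while True:
            candidates = [l for l in self.lines if l.rrpv == MAX_RRPV]
            if candidates:
                # Assumption: prefer a candidate owned by the requesting core.
                own = [l for l in candidates if l.owner == core]
                return own[0] if own else candidates[0]
            # No distant-future line yet: age every line and retry.
            for l in self.lines:
                l.rrpv += 1


if __name__ == "__main__":
    s = CoreAwareRRIPSet(ways=4)
    for core, tag in [(0, 1), (0, 2), (1, 3), (1, 4), (1, 5)]:
        s.access(core, tag)
    print(sorted(l.tag for l in s.lines))
```

In this sketch the partition is "virtual" in the sense that no ways are statically reserved: ownership only influences which candidate is chosen when several lines are equally eligible for eviction.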
