Efficient Cache Resizing Policy for DRAM-Based LLCs in Chip Multiprocessors

Abstract

In today’s Chip Multiprocessors (CMPs), multiple cores share a common Last Level Cache (LLC) that is divided into multiple banks. As the data requirements of applications grow, so does the demand for larger LLCs. Traditional SRAM technology is not area-efficient enough to build LLCs as large as modern CMPs demand. In recent years, DRAM technology has been used to design LLCs: DRAM offers almost 8 times the density of SRAM, so a larger cache can be built. Although DRAM is already considered an alternative for designing low-cost, area-efficient, large LLCs, it must be used efficiently to realize these benefits. Because of overheads such as access latency and refresh operations, efficient techniques are needed to obtain good performance from a DRAM LLC. Existing work has observed that although a large LLC is required by current and future data-intensive applications, other applications may not need the entire LLC. In such situations, some banks can remain almost idle during a particular period of execution. These idle banks can be powered off and restarted later when required. This mechanism is called cache resizing, because it resizes the cache (LLC) according to current requirements. Cache resizing techniques have already been proposed for SRAM-based LLCs, but because of the much larger size of DRAM LLCs, the same mechanisms cannot be applied to them. In this paper, we propose an efficient cache resizing policy for large LLCs, especially DRAM-based LLCs. We call the proposed technique Efficient Cache Resizing (ECR) and implement it on top of a 3D tiled CMP. Experimental analysis shows that ECR achieves up to 44% greater energy reduction than the existing technique.
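To make the general idea of bank-level cache resizing concrete, the following is a minimal sketch of one end-of-epoch resizing decision. It is not the ECR policy proposed in this paper. The `Bank` class, the `resize_banks` function, the per-epoch access counters, and the thresholds `off_threshold`, `on_threshold`, and `min_active` are all hypothetical and chosen only for illustration. The sketch also omits the cost of actually resizing (for example, writing back dirty lines and remapping a powered-off bank's sets to the remaining banks), which grows with cache size.

```python
from dataclasses import dataclass


@dataclass
class Bank:
    bank_id: int
    powered_on: bool = True
    accesses: int = 0  # LLC accesses observed by this bank in the current epoch


def resize_banks(banks, off_threshold=100, on_threshold=1000, min_active=1):
    """Hypothetical utilization-threshold resizing decision (NOT the ECR policy).

    Grow: if the average load on the active banks exceeds `on_threshold`,
    restart one powered-off bank.
    Shrink: otherwise, power off active banks with fewer than `off_threshold`
    accesses, least-used first, never dropping below `min_active` banks.
    Returns (ids_turned_off, ids_turned_on) and resets all epoch counters.
    """
    active = [b for b in banks if b.powered_on]
    inactive = [b for b in banks if not b.powered_on]
    turned_off, turned_on = [], []

    avg_load = sum(b.accesses for b in active) / len(active) if active else 0.0
    if avg_load > on_threshold and inactive:
        # Demand is high: restart one powered-off bank.
        inactive[0].powered_on = True
        turned_on.append(inactive[0].bank_id)
    else:
        # Demand is low: power off idle banks. A real design would first
        # write back dirty lines and remap this bank's sets (assumption).
        remaining = len(active)
        for bank in sorted(active, key=lambda b: b.accesses):
            # Sorted ascending, so the first non-idle bank ends the scan.
            if remaining <= min_active or bank.accesses >= off_threshold:
                break
            bank.powered_on = False
            turned_off.append(bank.bank_id)
            remaining -= 1

    for bank in banks:
        bank.accesses = 0  # start a fresh monitoring epoch
    return turned_off, turned_on


if __name__ == "__main__":
    banks = [Bank(i) for i in range(4)]
    for bank, n in zip(banks, [5, 2000, 40, 900]):
        bank.accesses = n
    print(resize_banks(banks))  # ([0, 2], [])
```

In this example, banks 0 and 2 see only 5 and 40 accesses in the epoch, so they are powered off, while banks 1 and 3 stay on. Growing by one bank per epoch while shrinking several at once is an arbitrary choice here; a practical policy would also weigh the static energy saved against the extra misses caused by a smaller cache.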
