CMP CACHE ARCHITECTURES - A SURVEY

As the number of cores on a Chip Multi-Processor (CMP) increases, so does the need for effective management of the cache. Cache management plays an important role in improving performance by reducing both the number of misses and the miss latency. In most cases, a CMP with a shared Last Level Cache (LLC) outperforms one with private LLCs. CMP and Non-Uniform Cache Access (NUCA) represent two emerging trends in computer architecture. In NUCA, the LLC is divided into multiple banks, so different banks are accessed with different latencies. Heavily used blocks can therefore be mapped or migrated to banks closer to the requesting core. Although NUCA is the best architecture for single-core systems, implementing it in a CMP poses many challenges. Researchers have proposed many innovative ideas for implementing NUCA in CMPs, but many complexities remain, and CMP cache architecture is still a wide-open research area. In this paper we survey different CMP cache architectures based on NUCA. We give only a basic overview; many more advanced innovations are not covered. Evaluating the performance of a CMP architecture is a challenging task, but it is needed to demonstrate the correctness of any proposed architecture. We therefore also discuss how the performance of CMP cache architectures can be evaluated.
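
The banked-LLC idea described above can be illustrated with a small sketch. It is not taken from any of the surveyed designs: it assumes a hypothetical 4x4 tiled CMP with one core and one LLC bank per tile, a mesh interconnect, and made-up cycle counts. The function names (`access_latency`, `home_bank`, `migrate_one_step`) are illustrative only.

```python
# Illustrative NUCA model. All constants are assumptions chosen for the
# example, not values reported in the survey.
MESH_WIDTH = 4          # 4x4 grid of tiles, one core and one LLC bank per tile
BANK_ACCESS_CYCLES = 6  # assumed fixed cost of accessing any bank
HOP_CYCLES = 2          # assumed cost of one hop on the mesh


def tile_xy(tile):
    """Return the (x, y) mesh coordinates of a tile index."""
    return tile % MESH_WIDTH, tile // MESH_WIDTH


def hops(a, b):
    """Manhattan distance between two tiles on the mesh."""
    ax, ay = tile_xy(a)
    bx, by = tile_xy(b)
    return abs(ax - bx) + abs(ay - by)


def access_latency(core, bank):
    """Latency seen by `core` when it accesses `bank`: non-uniform by distance."""
    return BANK_ACCESS_CYCLES + HOP_CYCLES * hops(core, bank)


def home_bank(block_addr):
    """Static mapping: interleave blocks across banks by block address."""
    return block_addr % (MESH_WIDTH * MESH_WIDTH)


def migrate_one_step(core, bank):
    """Gradual migration: move a block one hop closer to the requesting core
    (x direction first, then y). Returns the new bank index."""
    cx, cy = tile_xy(core)
    bx, by = tile_xy(bank)
    if bx != cx:
        bx += 1 if cx > bx else -1
    elif by != cy:
        by += 1 if cy > by else -1
    return by * MESH_WIDTH + bx
```

Under these assumptions, core 0 sees a latency of 6 cycles for its local bank and 18 cycles for bank 15 in the opposite corner. A block that core 0 keeps hitting in bank 15 would reach bank 0 after six calls to `migrate_one_step`. Real designs must also handle block lookup, replacement, and sharing between cores, which this sketch ignores.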
