An optimized multicore cache coherence design for exploiting communication locality

Supporting cache coherence in current multicore processors still faces scalability and performance problems. This paper presents an optimized cache coherence design targeting NoC-based multicore processors. It aims to combine the best characteristics of snooping and directory-based protocols. Based on the observation that network traffic exhibits locality, we design a two-level cache coherence protocol that handles local and remote accesses separately. At the first level, snooping is performed within a cache group; at the second level, coarse directories tell the caches which processors must be involved in first-level snooping. To support efficient coherence broadcasting, we also propose a low-latency, broadcast-enabled underlying NoC design. It incorporates lightweight buses into the NoC, over which the snooping protocol can be performed in a broadcast fashion. Extensive experimental results demonstrate that the proposed coherence design achieves both its low-complexity and its high-performance goals.
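The two-level lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `CacheGroup`, `CoarseDirectory`, and `read` are hypothetical, the directory is assumed to track sharers at cache-group granularity, and coherence states, invalidations, write handling, and memory fetches are all omitted.

```python
from dataclasses import dataclass, field


@dataclass
class CacheGroup:
    """Hypothetical cluster of caches that snoop each other over a local bus."""
    group_id: int
    # One dict per cache in the group: block address -> cached value.
    caches: list = field(default_factory=list)

    def snoop(self, addr):
        """First level: broadcast within the group; return a copy or None."""
        for cache in self.caches:
            if addr in cache:
                return cache[addr]
        return None


class CoarseDirectory:
    """Second level (assumed layout): sharers recorded per group, not per cache."""

    def __init__(self):
        self.sharers = {}  # block address -> set of group ids

    def add_sharer(self, addr, group_id):
        self.sharers.setdefault(addr, set()).add(group_id)

    def groups_for(self, addr):
        return set(self.sharers.get(addr, set()))


def read(addr, local, groups, directory):
    """Return (value, trace); trace records which level handled the read."""
    value = local.snoop(addr)
    if value is not None:
        return value, ["local-snoop-hit"]
    trace = ["local-snoop-miss"]
    # Only groups named by the coarse directory are snooped remotely.
    for gid in sorted(directory.groups_for(addr) - {local.group_id}):
        trace.append(f"remote-snoop:{gid}")
        value = groups[gid].snoop(addr)
        if value is not None:
            break
    return value, trace


g0 = CacheGroup(0, [{0x40: "A"}, {}])
g1 = CacheGroup(1, [{}, {0x80: "B"}])
groups = {0: g0, 1: g1}
directory = CoarseDirectory()
directory.add_sharer(0x40, 0)
directory.add_sharer(0x80, 1)

print(read(0x40, g0, groups, directory))  # served by local snooping
print(read(0x80, g0, groups, directory))  # directory points to group 1
```

In this sketch a local hit never touches the directory, which is the point of separating local from remote accesses: only a local miss pays for the directory lookup and the remote snoop.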
