Technique for reducing network traffic of token protocol

Token protocol provides a flexible framework for designing new coherence protocols. It decouples performance from correctness, which makes it possible to adapt the protocol to different systems. Unfortunately, messages in token protocol are always broadcast, which creates heavy network traffic and limits scalability. In this paper, we propose an efficient technique that employs a filter table to reduce the network traffic of token protocol in CMP systems. The filter table records recently used blocks together with each block's sharer and owner information. When a local cache miss occurs, the core checks whether a matching block exists in the filter table before broadcasting. If a matching block is found, the recorded sharer and owner information is used to avoid a broadcast, thus reducing network traffic. Simulation results show that our technique reduces network traffic by 27% on average.
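
The lookup described above can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the names `FilterTable`, `FilterEntry`, `record`, and `destinations`, the LRU replacement policy, and the integer core IDs are all assumptions made for illustration. The abstract says only that the table holds recently used blocks with their sharers and owner.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class FilterEntry:
    """Sharer and owner information recorded for one block (hypothetical layout)."""
    owner: int
    sharers: set = field(default_factory=set)


class FilterTable:
    """Small table of recently used blocks; LRU eviction is an assumption."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # block address -> FilterEntry

    def record(self, block, owner, sharers):
        """Remember who owns and shares a block, evicting the oldest entry if full."""
        self.entries[block] = FilterEntry(owner, set(sharers))
        self.entries.move_to_end(block)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)

    def destinations(self, block, requester, num_cores):
        """Return the set of cores a miss request should be sent to."""
        entry = self.entries.get(block)
        if entry is None:
            # No matching block: fall back to broadcasting to every other core.
            return set(range(num_cores)) - {requester}
        self.entries.move_to_end(block)
        # Matching block: send only to the recorded owner and sharers.
        return (entry.sharers | {entry.owner}) - {requester}


table = FilterTable(capacity=2)
table.record(0x40, owner=1, sharers={2})
hit = table.destinations(0x40, requester=0, num_cores=8)
miss = table.destinations(0x80, requester=0, num_cores=8)
print(sorted(hit), len(miss))
```

In this sketch a hit sends the request to two cores instead of seven. A stale entry would send the request to the wrong cores; because token protocol decouples performance from correctness, such a misdirected request would be expected to cost a retry rather than break coherence, although the abstract does not describe how the proposed technique handles that case.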
