论文信息 - Library Cache Coherence

Library Cache Coherence

Directory-based cache coherence is a popular mechanism for chip multiprocessors and multicores. The directory protocol, however, requires multicast for invalidation messages and the collection of acknowledgement messages, which can be expensive in terms of latency and network traffic. Furthermore, the size of the directory increases with the number of cores. We present Library Cache Coherence (LCC), which requires neither broadcast/multicast for invalidations nor waiting for invalidation acknowledgements. A library is a set of timestamps that are used to auto-invalidate shared cache lines, and delay writes on the lines until all shared copies expire. The size of library is independent of the number of cores. By removing the complex invalidation process of directorybased cache coherence protocols, LCC generates fewer network messages. At the same time, LCC also allows reads on a cache block to take place while a write to the block is being delayed, without breaking sequential consistency. As a result, LCC has 1.85X less average memory latency than a MESI directory-based protocol on our set of benchmarks, even with a simple timestamp choosing algorithm; moreover, our experimental results on LCC with an ideal timestamp scheme (though not implementable) show the potential of further improvement for LCC with more sophisticated timestamp schemes.

Srinivas Devadas | Omer Khan | Mieszko Lis | Myong Hyon Cho | Keun Sup Shim

[1] S. Borkar,et al. An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS , 2008, IEEE Journal of Solid-State Circuits.

[2] Li Shang,et al. In-Network Cache Coherence , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[3] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[4] Shekhar Y. Borkar,et al. Thousand Core ChipsA Technology Perspective , 2007, 2007 44th ACM/IEEE Design Automation Conference.

[5] David A. Wood,et al. Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[6] Anant Agarwal,et al. Directory-based cache coherence in large-scale multiprocessors , 1990, Computer.

[7] Mark Horowitz,et al. An evaluation of directory schemes for cache coherence , 1998, ISCA '98.

[8] George Kurian,et al. Graphite: A distributed parallel simulator for multicores , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[9] Srinivas Devadas,et al. Scalable, accurate multicore simulation in the 1000-core era , 2011, (IEEE ISPASS) IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE.

[10] Dean M. Tullsen,et al. Proximity-aware directory-based coherence for multi-core processor architectures , 2007, SPAA '07.

[11] Babak Falsafi,et al. Selective, accurate, and timely self-invalidation using last-touch prediction , 2000, ISCA '00.

[12] Aamer Jaleel,et al. Analyzing Parallel Programs with PIN , 2010, Computer.

[13] Henry Hoffmann,et al. On-Chip Interconnection Architecture of the Tile Processor , 2007, IEEE Micro.

[14] Vijayalakshmi Srinivasan,et al. A Tagless Coherence Directory , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[15] Sang Lyul Min,et al. Design and Analysis of a Scalable Cache Coherence Scheme Based on Clocks and Timestamps , 1992, IEEE Trans. Parallel Distributed Syst..

[16] Rami G. Melhem,et al. A timestamp-based selective invalidation scheme for multiprocessor cache coherence , 1996, Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing.

[17] Sandhya Dwarkadas,et al. SPACE: Sharing pattern-based directory coherence for multicore scalability , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[18] Anoop Gupta,et al. Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes , 1990, ICPP.

[19] Jaehyuk Huh,et al. Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture , 2003, ISCA '03.

[20] Vivek Sarkar,et al. Baring It All to Software: Raw Machines , 1997, Computer.