BFDir: A Space-Efficient Coherence Directory Based on Bloom Filter

Directory-based coherence is widely used in modern chip multiprocessor (CMP) systems. As the number of cores increases, it is increasingly regarded as the only viable candidate for maintaining on-chip cache coherence. However, the limitations of traditional coherence directories pose serious challenges as system size continues to grow: hardware overhead and redundant message broadcasting dramatically degrade scalability and performance. This paper proposes BFDir, a space-efficient coherence directory. BFDir dramatically reduces directory size by shortening the share list with a Bloom filter, and, unlike limited directories, it does not incur message broadcasting. Evaluation results for 32-core CMP systems show that, compared to a full-map directory, BFDir avoids 59% of the share-list overhead at the expense of a 2.77% performance loss on average; compared to a 16-bit coarse directory, it avoids 22% of the share-list overhead at the expense of a 0.16% performance loss on average; and compared to an 8-bit coarse directory, it saves 48% of invalidation messages and improves performance by 2.31%.
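To make the idea of a Bloom-filter share list concrete, the following is a minimal Python sketch, not the BFDir design itself. The filter width (`FILTER_BITS`), the number of hash functions (`NUM_HASHES`), and the hash functions in `_bit_positions` are all illustrative assumptions that do not come from the paper. The sketch shows the general property such a scheme relies on: decoding the filter returns a superset of the true sharers, so invalidations go to a bounded candidate set rather than to every core.

```python
from typing import Iterable, List

NUM_CORES = 32    # system size used in the evaluation described above
FILTER_BITS = 16  # hypothetical share-list width (assumption, not from the paper)
NUM_HASHES = 2    # hypothetical number of hash functions (assumption)


def _bit_positions(core_id: int) -> List[int]:
    """Map a core ID to NUM_HASHES filter bit positions.

    These simple modular hashes are placeholders chosen for illustration.
    """
    return [(core_id * (2 * i + 3) + 7 * i) % FILTER_BITS
            for i in range(NUM_HASHES)]


def add_sharer(share_list: int, core_id: int) -> int:
    """Return the share list with core_id recorded as a sharer."""
    for pos in _bit_positions(core_id):
        share_list |= 1 << pos
    return share_list


def encode(sharers: Iterable[int]) -> int:
    """Build a share list (as an integer bit vector) from a set of core IDs."""
    share_list = 0
    for core_id in sharers:
        share_list = add_sharer(share_list, core_id)
    return share_list


def possible_sharers(share_list: int) -> List[int]:
    """Return every core whose bits are all set in the share list.

    The result always contains the true sharers and may also contain
    false positives, which would receive unnecessary invalidations.
    """
    return [core for core in range(NUM_CORES)
            if all((share_list >> pos) & 1 for pos in _bit_positions(core))]


if __name__ == "__main__":
    true_sharers = {1, 5, 20}
    candidates = possible_sharers(encode(true_sharers))
    print("true sharers:      ", sorted(true_sharers))
    print("invalidation targets:", candidates)
```

In this sketch the share list occupies `FILTER_BITS` bits per entry instead of one bit per core. A plain Bloom filter cannot delete an element, so a real directory would need some additional mechanism for removing sharers; the sketch leaves that out.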
