Reconfigurable MPB combined with cache coherence protocol in many-core

Data sharing is an important problem for manycore processors. Cache-miss penalties grow sharply as the network-on-chip (NoC) scales. To address the coherence wall, message passing has been introduced into manycore processors. Unlike a massively parallel processing (MPP) system, a manycore processor is much more sensitive to on-chip storage. In this paper, we propose a reconfigurable cache system that can reconfigure cache lines into message passing buffers (MPBs) as needed, which improves the utilization of on-chip storage. The hardware design penalty is low because most functions are reused from the structure and state machine of the original cache coherence protocol. The mechanism can be applied to any cache protocol with a MOESI state machine. Compared with a separate MPB, which incurs a 5.26% overhead in hardware cost, simulation shows that the proposed design, RMCC, improves overall performance by 11.4%. Meanwhile, RMCC without the 5.26% SRAM overhead matches the performance of the separate MPB mechanism.
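The idea of turning cache lines into MPBs on demand can be illustrated with a toy model. The sketch below is not the design evaluated in the paper: the extra `MPB` state, the `ReconfigurableCache` class, and the policy of claiming Invalid lines first, then clean Shared and Exclusive lines, are all assumptions made for illustration. Dirty lines (Modified, Owned) are left alone here only because a real design would need a write-back before reusing them.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    O = "Owned"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"
    MPB = "MessageBuffer"  # hypothetical extra state, assumed for this sketch


class ReconfigurableCache:
    """Toy model: each cache line is either a MOESI line or an MPB slot."""

    def __init__(self, num_lines):
        self.lines = [State.I] * num_lines

    def fill(self, index, state):
        # A line reserved as an MPB is not available to the coherence protocol.
        if self.lines[index] is State.MPB:
            raise ValueError("line is reserved as an MPB")
        self.lines[index] = state

    def reserve_mpb(self, count):
        """Convert up to `count` lines to MPB use; return their indices.

        Assumed policy: take Invalid lines first, then clean lines (S, E).
        Dirty lines (M, O) are skipped in this simplified model.
        """
        reserved = []
        for wanted in (State.I, State.S, State.E):
            for i, st in enumerate(self.lines):
                if len(reserved) == count:
                    return reserved
                if st is wanted:
                    self.lines[i] = State.MPB
                    reserved.append(i)
        return reserved

    def release_mpb(self, indices):
        # Released slots return to the cache as Invalid lines.
        for i in indices:
            if self.lines[i] is State.MPB:
                self.lines[i] = State.I


cache = ReconfigurableCache(4)
cache.fill(0, State.M)
cache.fill(1, State.S)
first = cache.reserve_mpb(2)   # claims the two Invalid lines
second = cache.reserve_mpb(2)  # only the clean Shared line is left to claim
cache.release_mpb([2])
```

In this model no storage is set aside permanently for messages: a line serves as cache when no buffer is needed and as a buffer when one is requested, which is the utilization argument the abstract makes.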
