Design and Implementation of Dynamically Reconfigurable Token Coherence Protocol for Many-Core Processor

Efficiently maintaining cache coherence in a many-core processor remains a major challenge. Traditional protocols offer either low cache-miss latency (snoopy protocols) or independence from bus-like interconnects (directory protocols), but not both. Token Coherence was recently proposed to capture the main strengths of both kinds of protocol. However, because Token Coherence relies on broadcast-based transient requests and inefficient persistent requests, it is suitable only for small systems. To make Token Coherence scalable in many-core architectures, this paper introduces a dynamically reconfigurable mechanism for Token Coherence. Based on sub-nets, this mechanism significantly reduces the average execution time and communication cost in a 16-core processor. This dynamically reconfigurable mechanism therefore makes Token Coherence applicable to many-core architectures.
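
The sketch below illustrates two ideas the abstract relies on. The first is the token-counting rule of the original Token Coherence protocol (Martin et al., 2003): every block has a fixed number of tokens, a reader needs at least one, and a writer needs all of them. The second is why confining transient requests to a sub-net reduces traffic. It is a minimal Python model under loud assumptions. It tracks only token counts, leaving out the owner token, data, and persistent requests. The abstract also does not say how sub-nets are formed, so the contiguous grouping in the hypothetical helper `transient_targets` is illustrative only, not the paper's mechanism. The same goes for the names `BlockTokens` and `transfer` and the use of id `-1` for memory.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BlockTokens:
    """Simplified token state for one cache block (owner token and data omitted)."""
    total: int                                          # fixed token count T per block
    held: Dict[int, int] = field(default_factory=dict)  # holder id -> tokens held

    def can_read(self, core: int) -> bool:
        # Token Coherence rule: reading needs at least one token (plus valid data).
        return self.held.get(core, 0) >= 1

    def can_write(self, core: int) -> bool:
        # Token Coherence rule: writing needs all T tokens of the block.
        return self.held.get(core, 0) == self.total

    def transfer(self, src: int, dst: int, n: int) -> None:
        # Tokens move between holders but are never created or destroyed.
        if n <= 0 or self.held.get(src, 0) < n:
            raise ValueError("invalid transfer")
        self.held[src] -= n
        self.held[dst] = self.held.get(dst, 0) + n
        assert sum(self.held.values()) == self.total


def transient_targets(requester: int, num_cores: int,
                      subnet_size: Optional[int] = None) -> List[int]:
    """Cores that receive a transient request from `requester`.

    Without sub-nets the request is broadcast to every other core. With the
    HYPOTHETICAL contiguous grouping assumed here, it reaches only the other
    cores in the requester's sub-net.
    """
    if subnet_size is None:
        return [c for c in range(num_cores) if c != requester]
    start = (requester // subnet_size) * subnet_size
    end = min(start + subnet_size, num_cores)
    return [c for c in range(start, end) if c != requester]


# Usage: 16 cores; memory (assumed id -1) initially holds all 16 tokens.
blk = BlockTokens(total=16, held={-1: 16})
blk.transfer(-1, 5, 1)
print(blk.can_read(5), blk.can_write(5))         # True False
blk.transfer(-1, 5, 15)
print(blk.can_write(5))                          # True
print(len(transient_targets(5, 16)))             # 15 (full broadcast)
print(transient_targets(5, 16, subnet_size=4))   # [4, 6, 7]
```

In this toy model, a transient request from core 5 in a 16-core system reaches 15 other cores when broadcast but only 3 when confined to a 4-core sub-net. That is the kind of traffic reduction the abstract attributes to sub-nets, though actual savings would depend on the paper's reconfiguration policy and on how often requests must escalate beyond the sub-net.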
