A Cache Coherence Protocol Using Distributed Data Dependence Violation Checking in TLS

Current hardware implementations of TLS (thread-level speculation) in both Hydra and Renau's SESC simulator use a global component to check data dependence violations, e.g. L2 Cache or hardware list. Frequent memory accesses cause global component bottlenecks. Implementation and verification of the global component dramatically slows the processor's frequency. In this paper, we propose a cache coherence protocol using a distributed data dependence violation checking mechanism for TLS. The proposed protocol extends the current MESI cache coherence protocol by including several methods to exceed the present limits of centralized violation checking methods. In order not to broadcast every exposed write to the snooping bus, the protocol adds an invalidation vector to each private L1 cache to record threads that violate RAW data dependence. It also adds a versioning priority register that compares data versions. Added to each private L1 cache block is a snooping bit which indicates whether the thread possesses a bus snooping right for the block. The L1 Cache gets a bus snooping right when setting snooping bit. The L1 Cache catches exposed read miss whose address matching cache block address field. If a read miss from a remote core with a lower versioning priority, the L1 Cache updates the invalidation vector according to the core ID on the bus. If TLS runtime is going to commit or invalidate a thread, then L1 Cache invalidates threads whose bits have been set in the invalidation vector and changes any cache blocks to a corresponding non-speculative state. In order to implement the proposed protocol, we modified the SESC simulator, which is an open-source cycle-accurate simulator, to confirm its correctness and analyze its performance.

[1]  Wei Liu,et al.  Tasking with out-of-order spawn in TLS chip multiprocessors: microarchitecture and compilation , 2005, ICS '05.

[2]  Todd C. Mowry,et al.  The Potential for Thread-level Data Speculation in Tightly-coupled Multiprocessors , 1997 .

[3]  Kunle Olukotun,et al.  The Stanford Hydra CMP , 2000, IEEE Micro.

[4]  Kunle Olukotun,et al.  Data speculation support for a chip multiprocessor , 1998, ASPLOS VIII.

[5]  John A. Gregory,et al.  Architectural Support for Thread-Level Data Speculation , 1997 .

[6]  Kunle Olukotun,et al.  Improving the performance of speculatively parallel applications on the Hydra CMP , 1999 .

[7]  Gurindar S. Sohi,et al.  Speculative versioning cache , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.

[8]  Antonia Zhai,et al.  The STAMPede approach to thread-level speculation , 2005, TOCS.

[9]  G.E. Moore,et al.  Cramming More Components Onto Integrated Circuits , 1998, Proceedings of the IEEE.

[10]  Antonia Zhai,et al.  Improving value communication for thread-level speculation , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[11]  Ian F. Akyildiz,et al.  Wireless sensor networks: a survey , 2002, Comput. Networks.

[12]  Wang Yan Multi-function partitioned imitation of curve data compression , 2002 .

[13]  Clark Verbrugge,et al.  Software Thread Level Speculation for the Java Language and Virtual Machine Environment , 2005, LCPC.

[14]  Antonia Zhai,et al.  A scalable approach to thread-level speculation , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[15]  Per Stenström,et al.  An All-Software Thread-Level Data Dependence Speculation System for Multiprocessors , 2001, J. Instr. Level Parallelism.