A tampering protocol for reducing the coherence transactions in regular computation

This paper proposes a tampering protocol that reduces coherence transactions in computations with regular communication patterns. The protocol is subsidiary to the conventional cache-coherence protocol and is activated on a per-memory-block basis. When it is activated for a block, the exclusive copy of that block is frozen in the cache and is accessed (i.e., tampered) without any coherence transactions; otherwise, coherence is maintained by the conventional protocol. Activating the tampering protocol for data shared between processes therefore reduces the latency of communication between those processes. As a by-product, stream data can be implemented effectively with the tampering protocol. The effects of the tampering protocol on regular computations are evaluated with an RTL simulator of our multiprocessor. The results show that adding the tampering protocol greatly improves performance relative to the conventional protocol alone, and that streams are effective for process synchronization.
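The per-block dispatch described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the paper's implementation: the class `ToyCache`, its method names, and the simplification that every access under the conventional protocol costs exactly one coherence transaction are all hypothetical and chosen for illustration.

```python
class ToyCache:
    """Toy model of one cache that counts coherence transactions.

    Assumption (not from the paper): each access handled by the
    conventional protocol costs exactly one coherence transaction.
    """

    def __init__(self):
        self.tampering_blocks = set()    # blocks with the tampering protocol active
        self.coherence_transactions = 0  # transactions issued so far

    def activate_tampering(self, block):
        # Activation is per memory block; the exclusive copy is treated as frozen.
        self.tampering_blocks.add(block)

    def access(self, block):
        if block in self.tampering_blocks:
            # Frozen exclusive copy: accessed with no coherence transaction.
            return "tampered"
        # Otherwise coherence is maintained by the conventional protocol.
        self.coherence_transactions += 1
        return "conventional"


cache = ToyCache()
cache.activate_tampering(0x40)  # hypothetical block address
for _ in range(4):
    cache.access(0x40)  # tampering protocol active for this block
    cache.access(0x80)  # handled by the conventional protocol
print(cache.coherence_transactions)
```

In this model only the accesses to the block without the tampering protocol contribute to the transaction count, which is the effect the abstract attributes to activating the protocol for shared data.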
