Paper information - A mechanism for speculative memory accesses following synchronizing operations

To reduce the overhead of synchronizing operations in shared-memory multiprocessors, this paper proposes a mechanism named specMEM that speculatively executes the memory accesses following a synchronizing operation before completion of the synchronization is confirmed. A distinctive feature of the mechanism is that both the detection of speculation failure and the restoration of computational state on failure are implemented by a small extension of a coherent cache. Moreover, the operations performed on speculation success and on failure each take constant time, independent of the number of speculative accesses. This is achieved by implementing the part of the cache tag that holds the cache line state with a simple functional memory. The paper also reports an evaluation of specMEM applied to barrier synchronization, with performance data obtained by simulating benchmark programs from SPLASH-2. The execution time of LU decomposition, in which the interval between successive barriers varies significantly because of fluctuating computational load, improves by 13%.
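The constant-time claim can be illustrated with a toy model. The sketch below is not the paper's hardware design: the class `SpecCacheSketch`, its method names, and the policy of invalidating speculative lines on failure are all assumptions made for illustration. It packs one "speculative" bit per cache line into a single integer, so that committing or aborting touches the whole column of bits in one expression, standing in for what a functional memory might do in one step.

```python
class SpecCacheSketch:
    """Toy model (hypothetical): one state bit per cache line, packed into ints."""

    def __init__(self):
        self.valid = 0       # bit i set: line i holds valid data
        self.spec = 0        # bit i set: line i was accessed speculatively
        self.failed = False  # set when a remote access conflicts with speculation

    def access(self, line, speculative):
        """Bring a line into the cache, marking it if the access is speculative."""
        bit = 1 << line
        self.valid |= bit
        if speculative:
            self.spec |= bit

    def remote_invalidate(self, line):
        """Coherence invalidation from another processor (assumed failure trigger)."""
        bit = 1 << line
        if self.spec & bit:
            self.failed = True   # a speculatively accessed line was disturbed
        self.valid &= ~bit

    def sync_completed(self):
        """Resolve speculation once the synchronization is confirmed."""
        if self.failed:
            self.valid &= ~self.spec  # abort: drop every speculative line at once
        self.spec = 0                 # commit or abort: clear the whole column
        ok = not self.failed
        self.failed = False
        return ok


cache = SpecCacheSketch()
cache.access(0, speculative=False)
cache.access(1, speculative=True)
cache.access(2, speculative=True)
committed = cache.sync_completed()   # no conflict seen, so speculation commits
```

Note that neither the commit path nor the abort path loops over individual lines. Python's big-integer operations are not truly constant time, so the single bitwise expression only models a column-wide hardware operation.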

Hiroshi Nakashima | Takayuki Sato | Kazuhiko Ohno
