A mechanism for speculative memory accesses following synchronizing operations

To reduce the overhead of synchronizing operations in shared-memory multiprocessors, this paper proposes a mechanism, named specMEM, that speculatively executes the memory accesses following a synchronizing operation before completion of the synchronization is confirmed. A distinctive feature of the mechanism is that both the detection of speculation failure and the restoration of computational state on failure are implemented by a small extension to the coherent cache. Moreover, the operations performed when speculation succeeds and when it fails each take constant time, independent of the number of speculative accesses. This is achieved by implementing the part of the cache tag that holds the cache line state with a simple functional memory. The paper also evaluates specMEM applied to barrier synchronization, using performance data obtained by simulating benchmark programs from SPLASH-2. For LU decomposition, in which the interval between successive barriers varies significantly because of fluctuations in computational load, execution time improves by 13%.
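The constant-time property can be illustrated with a toy software model. The sketch below is not the specMEM hardware design: it swaps the functional memory for an epoch tag per line, so that committing or aborting a speculation touches a single counter or set instead of every speculative line. All names (`SpecCache`, `begin_speculation`, `remote_write`, and so on) are hypothetical, and the model ignores eviction, line granularity, and the real coherence protocol.

```python
class SpecCache:
    """Toy write-back cache with O(1) commit/abort of speculative lines.

    Assumption: epoch tags stand in for the per-line speculative state;
    this is an illustration, not the mechanism described in the paper.
    """

    def __init__(self, memory):
        self.memory = memory      # backing store: dict addr -> value
        self.lines = {}           # addr -> (value, epoch tag or None)
        self.epoch = 0            # identifier of the current speculation
        self.speculating = False
        self.aborted = set()      # epochs whose tagged lines are dead
        self.failed = False       # set when a conflict is detected

    def _live(self, addr):
        # A line tagged with an aborted epoch is treated as invalid (lazy squash).
        entry = self.lines.get(addr)
        if entry is None or entry[1] in self.aborted:
            return None
        return entry

    def _protect(self, addr):
        # Before the first speculative touch, write committed data back so
        # that an abort can fall back to memory.
        entry = self._live(addr)
        if entry is not None and entry[1] != self.epoch:
            self.memory[addr] = entry[0]

    def begin_speculation(self):
        self.epoch += 1
        self.speculating = True
        self.failed = False

    def read(self, addr):
        entry = self._live(addr)
        value = entry[0] if entry else self.memory[addr]
        if self.speculating:
            self._protect(addr)
            self.lines[addr] = (value, self.epoch)  # mark as speculatively read
        return value

    def write(self, addr, value):
        if self.speculating:
            self._protect(addr)
            self.lines[addr] = (value, self.epoch)  # speculative, not in memory
        else:
            self.lines[addr] = (value, None)        # ordinary dirty line

    def remote_write(self, addr, value):
        # Models a coherence invalidation caused by another processor's write.
        entry = self._live(addr)
        if self.speculating and entry is not None and entry[1] == self.epoch:
            self.failed = True                      # speculation failure detected
        self.memory[addr] = value
        self.lines.pop(addr, None)

    def commit(self):
        # O(1): tagged lines simply remain valid as ordinary lines.
        self.speculating = False

    def abort(self):
        # O(1): every line tagged with this epoch becomes invalid at once.
        self.aborted.add(self.epoch)
        self.speculating = False
```

In this sketch a processor would call `begin_speculation()` on reaching a barrier, continue issuing reads and writes, and then call `commit()` once the barrier completes or `abort()` if `failed` was set in the meantime. The `aborted` set grows without bound here; a real design would need to recycle tags.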
