Memory access buffering in multiprocessors

In highly pipelined machines, instructions and data are prefetched and buffered in both the processor and the cache. This reduces the average memory access latency and takes advantage of memory interleaving. Lockup-free caches are designed so that the processor does not block on a cache miss, and write buffers are often included in a pipelined machine so that the processor does not wait on writes. In a shared-memory multiprocessor, buffering memory requests brings further advantages, since each memory access has to traverse the memory-processor interconnection and compete with memory requests issued by other processors. Buffering, however, can cause logical problems in multiprocessors. These problems are aggravated if each processor has a private memory in which shared writable data may be present, as in a cache-based system or a system with a distributed global memory. In this paper, we analyze the benefits and problems associated with the buffering of memory requests in shared-memory multiprocessors. We show that the logical problem of buffering is directly related to the problem of synchronization. A simple model is presented to evaluate the performance improvement resulting from buffering.
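The logical problem mentioned above can be illustrated with a small simulation. The sketch below is not the paper's model. It is a hypothetical toy in which two processors each own a FIFO write buffer, and the names `Processor`, `run_litmus`, and the `fence` flag are assumptions made for illustration. Each processor writes one shared variable and then reads the other. With buffered writes, both reads can return the old value, an outcome that no interleaving of unbuffered accesses could produce. Draining the buffer before the read, as a synchronization operation would, restores an outcome consistent with some interleaving.

```python
from collections import deque


class Processor:
    """Toy processor with an optional FIFO write buffer (illustrative only)."""

    def __init__(self, memory, buffered):
        self.memory = memory          # shared memory, a plain dict
        self.buffered = buffered      # True: writes wait in the buffer
        self.write_buffer = deque()   # pending (address, value) pairs

    def write(self, addr, value):
        if self.buffered:
            self.write_buffer.append((addr, value))  # not yet globally visible
        else:
            self.memory[addr] = value                # visible immediately

    def read(self, addr):
        # A processor sees its own pending writes first (newest wins).
        for pending_addr, pending_value in reversed(self.write_buffer):
            if pending_addr == addr:
                return pending_value
        return self.memory[addr]

    def drain(self):
        # Make all pending writes visible, in program order.
        while self.write_buffer:
            addr, value = self.write_buffer.popleft()
            self.memory[addr] = value


def run_litmus(buffered, fence=False):
    """P0: x = 1; r0 = y.   P1: y = 1; r1 = x.   One fixed interleaving."""
    memory = {"x": 0, "y": 0}
    p0 = Processor(memory, buffered)
    p1 = Processor(memory, buffered)

    p0.write("x", 1)
    p1.write("y", 1)

    if fence:
        p0.drain()        # assumed synchronization point before the read
    r0 = p0.read("y")

    if fence:
        p1.drain()
    r1 = p1.read("x")

    p0.drain()
    p1.drain()
    return r0, r1


if __name__ == "__main__":
    print("unbuffered:        ", run_litmus(buffered=False))
    print("buffered:          ", run_litmus(buffered=True))
    print("buffered + fence:  ", run_litmus(buffered=True, fence=True))
```

Only one interleaving is simulated, so the sketch demonstrates that the anomalous outcome is possible with buffering, not how often it occurs.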
