The Implementation of a Coherent Memory Abstraction on a NUMA Multiprocessor: Experiences with PLATINUM

PLATINUM is an operating system kernel with a novel memory management system for Non-Uniform Memory Access (NUMA) multiprocessor architectures. This memory management system implements a coherent memory abstraction: memory that is uniformly accessible from all processors in the system. When used by applications written in an appropriate programming style, coherent memory appears to be nearly as fast as local physical memory, and it reduces memory contention. Coherent memory makes NUMA multiprocessors easier to program while attaining performance comparable to that of hand-tuned programs. This paper describes the design and implementation of the PLATINUM memory management system, with emphasis on coherent memory. We measure the cost of the basic operations that implement coherent memory, and we measure the performance of a set of application programs running on PLATINUM. Finally, we comment on the interaction between the architecture and the coherent memory system. PLATINUM currently runs on the BBN Butterfly Plus Multiprocessor.
