NUMA policies and their relation to memory architecture

Multiprocessor memory reference traces provide a wealth of information on the behavior of parallel programs. We have used this information to explore the relationship between kernel-based NUMA management policies and multiprocessor memory architecture. Our trace analysis techniques employ an off-line, optimal cost policy as a baseline against which to compare on-line policies, and as a policyinsensitive tool for evaluating architectural design alternatives. We compare the performance of our optimal policy with that of three implementable policies (two of which appear in previous work), on a variety of applications, with varying relative speeds for page moves and local, global, and remote memory references. Our results indicate that a good NUMA policy must be chosen to match its machine, and confirm that such policies can be both simple and effective. They also indicate that programs for NUMA machines must be written with care to obtain the best performance.

[1]  Mark A. Holliday,et al.  Reference history, page size, and migration daemons in local/remote architectures , 1989, ASPLOS III.

[2]  James R. Larus,et al.  Abstract execution: A technique for efficiently tracing programs , 1990, Softw. Pract. Exp..

[3]  Robert J. Fowler,et al.  The implementation of a coherent memory abstraction on a NUMA multiprocessor: experiences with platinum , 1989, SOSP '89.

[4]  Willy Zwaenepoel,et al.  Adaptive software cache management for distributed shared memory architectures , 1990, ISCA '90.

[5]  W. Kent Fuchs,et al.  TRAPEDS: producing traces for multicomputers via execution driven simulation , 1989, SIGMETRICS '89.

[6]  Abhinav Gupta,et al.  Analysis of cache invalidation patterns in multiprocessors , 1989, ASPLOS 1989.

[7]  A M Holliday On the Effectiveness of Dynamic Page Placement , 1989 .

[8]  J. Ortega,et al.  Solution of Partial Differential Equations on Vector and Parallel Computers , 1987 .

[9]  Paul Feautrier,et al.  A New Solution to Coherence Problems in Multicache Systems , 1978, IEEE Transactions on Computers.

[10]  David L. Black,et al.  The mach exception handling facility , 1988, PADD '88.

[11]  Carla Schlatter Ellis,et al.  Experimental comparison of memory management policies for NUMA multiprocessors , 1991, TOCS.

[12]  Anoop Gupta,et al.  Competitive management of distributed shared memory , 1989, Digest of Papers. COMPCON Spring 89. Thirty-Fourth IEEE Computer Society International Conference: Intellectual Leverage.

[13]  William J. Bolosky,et al.  Mach: A New Kernel Foundation for UNIX Development , 1986, USENIX Summer.

[14]  Brian N. Bershad,et al.  PRESTO: A system for object‐oriented parallel programming , 1988, Softw. Pract. Exp..

[15]  David L. Black,et al.  Competitive algorithms for replication and migration problems , 1989 .

[16]  Michael L. Scott,et al.  Simple but effective techniques for NUMA memory management , 1989, SOSP '89.

[17]  Sandra Johnson Baylor,et al.  An Evaluation of the Memory Reference Behavior of Engineering/Scientific Applications in Parallel Systems , 1989, Int. J. High Speed Comput..

[18]  Kevin P. McAuliffe,et al.  The IBM Research Parallel Processor Prototype (RP3): Introduction and Architecture , 1985, ICPP.

[19]  Mark A. Holliday Reference history, page size, and migration daemons in local/remote architectures , 1989, ASPLOS 1989.