Lazy home migration for distributed shared memory systems

In a distributed shared memory system, each memory page is associated with a home node that maintains the directory state for the cache lines within that page. Memory access patterns and home node locations strongly influence performance, especially when remote communication is costly. Because access patterns are difficult to predict and may change dynamically, it is useful to migrate home nodes at run time to reduce the amount of remote communication. This paper presents a new and efficient algorithm for migrating home nodes in distributed shared memory systems. Unlike previous page migration algorithms, our algorithm avoids global coordination, allowing the system to be more responsive to changing workloads. We verify the algorithm's correctness with the Murφ protocol verification tool. We explore several policies for deciding when and where to migrate home nodes. Trace-driven simulations of several SPLASH-2 benchmarks show that our home migration algorithm and policies can reduce the amount of remote communication by 50%. The results also emphasize the importance of minimizing the cost of migration.
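
To make the idea of migrating a home node without global coordination concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a single page, a hypothetical counter-based policy (migrate when a remote node leads the current home by `threshold` requests), and a hypothetical forwarding pointer left at the old home so that other nodes learn the new home only when they next access the page. The names `PageHome`, `Cluster`, `count_remote`, and `threshold`, and the choice to charge one message per forward, per remote request, and per migration, are all illustrative assumptions.

```python
from collections import Counter


class PageHome:
    """Home-node state for one page, with a hypothetical threshold policy."""

    def __init__(self, home, threshold):
        self.home = home
        self.threshold = threshold   # assumed tuning knob, not from the paper
        self.requests = Counter()    # directory requests seen per node

    def record_access(self, node):
        """Count a request from `node`; return the (possibly new) home."""
        self.requests[node] += 1
        lead = self.requests[node] - self.requests[self.home]
        if node != self.home and lead >= self.threshold:
            self.home = node         # migrate the home to the heavy user
            self.requests.clear()    # start a fresh observation window
        return self.home


class Cluster:
    """Nodes keep a possibly stale hint of the home; old homes forward."""

    def __init__(self, num_nodes, initial_home, threshold):
        self.page = PageHome(initial_home, threshold)
        self.hint = [initial_home] * num_nodes  # per-node belief about the home
        self.forward = {}                       # old home -> its successor
        self.remote = 0                         # remote messages charged

    def access(self, node):
        target = self.hint[node]
        # A stale hint is repaired lazily by following forwarding pointers.
        while target != self.page.home:
            self.remote += 1
            target = self.forward[target]
        if target != node:
            self.remote += 1                    # request to a remote home
        self.hint[node] = target                # only this node learns the home
        old = self.page.home
        new = self.page.record_access(node)
        if new != old:
            self.forward[old] = new             # no broadcast to other nodes
            self.hint[new] = new
            self.remote += 1                    # assumed cost of moving state


def count_remote(trace, num_nodes, initial_home, threshold):
    """Replay a list of accessing node ids and return the message count."""
    cluster = Cluster(num_nodes, initial_home, threshold)
    for node in trace:
        cluster.access(node)
    return cluster.remote
```

Under these assumptions, a trace in which node 1 touches a page homed at node 0 ten times costs 10 remote messages with a fixed home (`threshold=float("inf")`) and 5 with `threshold=4`: four remote requests plus one migration. The migration charge also illustrates why the cost of migration itself matters.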
