Scheduling and page migration for multiprocessor compute servers

Several cache-coherent shared-memory multiprocessors have been developed that are scalable and offer a very tight coupling between the processing resources. They are therefore attractive as compute servers for multiprogramming and parallel application workloads. Process scheduling and memory management, however, remain challenging because main memory on such machines is distributed. This paper examines the effects of OS scheduling and page migration policies on the performance of such compute servers. Our experiments are done on the Stanford DASH, a distributed-memory cache-coherent multiprocessor. We show that for our multiprogramming workloads consisting of sequential jobs, the traditional Unix scheduling policy does very poorly. In contrast, a policy incorporating cluster and cache affinity along with a simple page-migration algorithm offers up to a two-fold performance improvement. For our workloads consisting of multiple parallel applications, we compare space-sharing policies, which divide the processors among the applications, with time-slicing policies such as standard Unix or gang scheduling. We show that space-sharing policies can achieve better processor utilization due to the operating point effect, but time-slicing policies benefit strongly from user-level data distribution. Our initial experience with automatic page migration suggests that policies based only on TLB miss information can be quite effective, and are useful for addressing the data distribution problems of space-sharing schedulers.
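To make the last point concrete, the following is a minimal sketch of a migration policy that uses only TLB miss information. It is not the algorithm evaluated in the paper, which the abstract does not specify. The class name `TlbMissMigrator`, the threshold value, the first-touch placement, and the rule of resetting counters after a migration are all assumptions chosen for illustration.

```python
from collections import defaultdict


class TlbMissMigrator:
    """Hypothetical page-migration policy driven only by TLB miss counts."""

    def __init__(self, threshold=4):
        # Assumed tuning knob: how many more misses a remote cluster must
        # take than the home cluster before the page is moved.
        self.threshold = threshold
        self.home = {}  # page -> cluster currently holding the page
        self.misses = defaultdict(lambda: defaultdict(int))  # page -> cluster -> count

    def place(self, page, cluster):
        """Set the initial home cluster of a page."""
        self.home[page] = cluster

    def on_tlb_miss(self, page, cluster):
        """Record a TLB miss; return the new home cluster if the page migrates, else None."""
        # Assumption: an unplaced page is homed on the first cluster to touch it.
        home = self.home.setdefault(page, cluster)
        counts = self.misses[page]
        counts[cluster] += 1
        if cluster != home and counts[cluster] - counts[home] >= self.threshold:
            self.home[page] = cluster
            counts.clear()  # start counting afresh at the new home
            return cluster
        return None
```

In this sketch, a page placed on cluster 0 that then takes repeated TLB misses from cluster 1 is moved to cluster 1 once the miss-count gap reaches the threshold. A real kernel would also have to weigh the cost of copying the page and of invalidating stale TLB entries, which the sketch ignores.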
