An Experimental Evaluation of Processor Pool-Based Scheduling for Shared-Memory NUMA Multiprocessors

In this paper we describe the design, implementation, and experimental evaluation of an operating system scheduling technique called processor pool-based scheduling [51]. The technique is designed to assign the processes (or kernel threads) of parallel applications to processors in multiprogrammed, shared-memory NUMA multiprocessors. Our experimental results demonstrate that: 1) Pool-based scheduling is an effective method for localizing application execution and reducing mean response times. 2) Although application parallelism should be considered, the optimal pool size is a function of the system architecture. 3) Placing new applications in the pool with the largest potential for in-pool growth (i.e., the pool containing the fewest jobs) and isolating applications from each other are desirable properties of scheduling algorithms for operating systems executing on NUMA architectures. The “Worst-Fit” policy we examine incorporates both of these properties.
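The placement rule described in the abstract (put a new application in the pool containing the fewest jobs) can be illustrated with a minimal sketch. This is not the paper's implementation: the names `Pool`, `worst_fit`, `place`, and `demo`, the pool size of four processors, and the tie-breaking rule (lowest pool id) are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Hypothetical model of a processor pool: processors that are
    close to each other in the NUMA topology."""
    pool_id: int
    processors: int                             # processors in this pool
    jobs: list = field(default_factory=list)    # applications placed here


def worst_fit(pools):
    """Return the pool currently holding the fewest jobs.

    Ties are broken by the lowest pool_id, an arbitrary choice
    made for this sketch.
    """
    return min(pools, key=lambda p: (len(p.jobs), p.pool_id))


def place(pools, job_name):
    """Place a newly arrived application in the Worst-Fit pool
    and return the id of the pool that was chosen."""
    pool = worst_fit(pools)
    pool.jobs.append(job_name)
    return pool.pool_id


def demo():
    # Four pools of four processors each (assumed sizes).
    pools = [Pool(i, processors=4) for i in range(4)]
    # Five applications arrive one after another.
    return [place(pools, name) for name in ["A", "B", "C", "D", "E"]]
```

In this sketch the first four arrivals each land in an empty pool, which keeps applications isolated from one another, and only the fifth has to share a pool. Choosing the least-populated pool is what leaves each application the most room to grow within its own pool.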

[1]  David L. Black Scheduling support for concurrency and parallelism in the Mach operating system , 1990, Computer.

[2]  T. Lovett,et al.  STiNG: A CC-NUMA Computer System for the Commercial Marketplace , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[3]  Larry Rudolph,et al.  Evaluation of Design Choices for Gang Scheduling Using Distributed Hierarchical Control , 1996, J. Parallel Distributed Comput..

[4]  Kenneth C. Sevcik,et al.  Coordinated allocation of memory and processors in multiprocessors , 1996, SIGMETRICS '96.

[5]  Carla Schlatter Ellis,et al.  The robustness of NUMA memory management , 1991, SOSP '91.

[6]  Raj Vaswani,et al.  A dynamic processor allocation policy for multiprogrammed shared-memory multiprocessors , 1993, TOCS.

[7]  Mark S. Squillante,et al.  Using Processor-Cache Affinity Information in Shared-Memory Multiprocessor Scheduling , 1993, IEEE Trans. Parallel Distributed Syst..

[8]  Benjamin Gamsa,et al.  Region-Oriented Main Memory Management in Shared-Memory NUMA Multiprocessors , 1992 .

[9]  Tim Brecht,et al.  On the importance of parallel application placement in NUMA multiprocessors , 1993 .

[10]  Tim Brecht,et al.  Multiprogrammed parallel application scheduling in NUMA multiprocessors , 1994 .

[11]  John K. Ousterhout Scheduling Techniques for Concurrent Systems , 1982, ICDCS 1982.

[12]  Anoop Gupta,et al.  Operating system support for improving data locality on CC-NUMA compute servers , 1996, ASPLOS VII.

[13]  John Zahorjan,et al.  Scheduling memory constrained jobs on distributed memory parallel computers , 1995, SIGMETRICS '95/PERFORMANCE '95.

[14]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[15]  Anoop Gupta,et al.  Memory system performance of UNIX on CC-NUMA multiprocessors , 1995, SIGMETRICS '95/PERFORMANCE '95.

[16]  Evangelos Markatos Scheduling for locality in shared-memory multiprocessors , 1993 .

[17]  John Zahorjan,et al.  A dynamic processor allocation policy for multiprogrammed shared-memory multiprocessors , 1993 .

[18]  Anoop Gupta,et al.  Process control and scheduling issues for multiprogrammed shared-memory multiprocessors , 1989, SOSP '89.

[19]  J. L. Hennessy,et al.  An empirical comparison of the Kendall Square Research KSR-1 and Stanford DASH multiprocessors , 1993, Supercomputing '93.

[20]  Evangelos P. Markatos,et al.  Using processor affinity in loop scheduling on shared-memory multiprocessors , 1992, Supercomputing '92.

[21]  Michel Dubois,et al.  Scalable Shared Memory Multiprocessors , 1992, Springer US.

[22]  Larry Rudolph,et al.  Mapping and Scheduling in a Shared Parallel Environment Using Distributed Hierarchical Control , 1990, ICPP.

[23]  Tim Brecht,et al.  Processor-pool-based scheduling for large-scale NUMA multiprocessors , 1991, SIGMETRICS '91.

[24]  Michael C. Browne,et al.  The S3.mp scalable shared memory multiprocessor , 1994, 1994 Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences.

[25]  Anoop Gupta,et al.  Scheduling and page migration for multiprocessor compute servers , 1994, ASPLOS VI.

[26]  R. Sarnath,et al.  Proceedings of the International Conference on Parallel Processing , 1992 .

[27]  Tim Brecht,et al.  Using Parallel Program Characteristics in Dynamic Processor Allocation Policies , 1996, Perform. Evaluation.

[28]  Robert J. Fowler,et al.  The implementation of a coherent memory abstraction on a NUMA multiprocessor: experiences with platinum , 1989, SOSP '89.

[29]  Anoop Gupta,et al.  Hive: fault containment for shared-memory multiprocessors , 1995, SOSP.

[30]  Evangelos P. Markatos Scheduling for locality in shared-memory multiprocessors , 1993 .

[31]  Anoop Gupta,et al.  The Stanford FLASH multiprocessor , 1994, ISCA '94.

[32]  John K. Ousterhout,et al.  Scheduling Techniques for Concurrent Systems , 1982, ICDCS.

[33]  Arif Ghafoor,et al.  Semi-Distributed Load Balancing For Massively Parallel Multicomputer Systems , 1991, IEEE Trans. Software Eng..

[34]  Michael Stumm,et al.  Hector: a hierarchically structured shared-memory multiprocessor , 1991, Computer.

[35]  Larry Rudolph,et al.  Distributed hierarchical control for parallel processing , 1990, Computer.

[36]  Mark S. Squillante Issues in Shared-Memory Multiprocessor Scheduling: A Performance Evaluation , 1990 .

[37]  Ronald C. Unrau Scalable memory management through hierarchical symmetric multiprocessing , 1993 .

[38]  Robert J. Fowler,et al.  NUMA policies and their relation to memory architecture , 1991, ASPLOS IV.

[39]  Donald Yeung,et al.  The MIT Alewife Machine: A Large-Scale Distributed-Memory Multiprocessor , 1991 .

[40]  Mark A. Holliday Reference history, page size, and migration daemons in local/remote architectures , 1989, ASPLOS 1989.

[41]  Abraham Silberschatz,et al.  Operating System Concepts , 1983 .

[42]  Mark S. Squillante,et al.  Analysis of the Impact of Memory in Distributed Parallel Processing Systems , 1994, SIGMETRICS.

[43]  Mary K. Vernon,et al.  The performance of multiprogrammed multiprocessor scheduling algorithms , 1990, SIGMETRICS '90.

[44]  Kai Li,et al.  Thread scheduling for cache locality , 1996, ASPLOS VII.

[45]  Harold S. Stone,et al.  Footprints in the cache , 1987, TOCS.

[46]  Mark A. Holliday,et al.  Reference history, page size, and migration daemons in local/remote architectures , 1989, ASPLOS III.

[47]  Sanjeev Setia,et al.  The Interaction between Memory Allocation and Adaptive Partitioning in Message-Passing Multicomputers , 1995, JSSPP.

[48]  Evangelos P. Markatos,et al.  Load Balancing vs. Locality Management in Shared-Memory Multiprocessors , 1992, ICPP.

[49]  Anoop Gupta,et al.  The DASH prototype: implementation and performance , 1992, ISCA '92.

[50]  Raj Vaswani,et al.  The implications of cache affinity on processor scheduling for multiprogrammed, shared memory multiprocessors , 1991, SOSP '91.

[51]  Anoop Gupta,et al.  The impact of operating system scheduling policies and synchronization methods on performance of parallel applications , 1991, SIGMETRICS '91.

[52]  Frank Bellosa,et al.  Locality Information Based Scheduling in Shared Memory Multiprocessors , 1996, JSSPP.

[53]  John Zahorjan,et al.  Processor scheduling in shared memory multiprocessors , 1990, SIGMETRICS '90.