Processor-pool-based scheduling for large-scale NUMA multiprocessors

Large-scale Non-Uniform Memory Access (NUMA) multiprocessors are gaining increased attention due to their potential for achieving high performance through the replication of relatively simple components. Because of the complexity of such systems, scheduling algorithms for parallel applications are crucial in realizing the performance potential of these systems. In particular, scheduling methods must consider the scale of the system, with the increased likelihood of creating bottlenecks, along with the NUMA characteristics of the system, and the benefits to be gained by placing threads close to their code and data.We propose a class of scheduling algorithms based on processor pools. A processor pool is a software construct for organizing and managing a large number of processors by dividing them into groups called pools. The parallel threads of a job are run in a single processor pool, unless there are performance advantages for a job to span multiple pools. Several jobs may share one pool. Our simulation experiments show that processor pool-based scheduling may effectively reduce the average job response time. The performance improvements attained by using processor pools increase with the average parallelism of the jobs, the load level of the system, the differentials in memory access costs, and the likelihood of having system bottlenecks. As the system size increasesr, while maintaining the workload composition and intensity, we observed that processor pools can be used to provide significant performance improvements. We therefore conclude that processor pool-based scheduling may be an effective and efficient technique for scalable systems.

[1]  Edward D. Lazowska,et al.  Speedup Versus Efficiency in Parallel Systems , 1989, IEEE Trans. Computers.

[2]  Larry Rudolph,et al.  Mapping and Scheduling in a Shared Parallel Environment Using Distributed Hierarchical Control , 1990, ICPP.

[3]  John K. Ousterhout Scheduling Techniques for Concurrebt Systems. , 1982, ICDCS 1982.

[4]  Philip J. Woest,et al.  The Wisconsin multicube: a new large-scale cache-coherent multiprocessor , 1988, ISCA '88.

[5]  John K. Ousterhout,et al.  Scheduling Techniques for Concurrent Systems , 1982, ICDCS.

[6]  Mark S. Squillante,et al.  Analysis of Contention in Multiprocessor Scheduling , 1990, Performance.

[7]  Kenneth C. Sevcik Characterizations of parallelism in applications and their use in scheduling , 1989, SIGMETRICS '89.

[8]  John Zahorjan,et al.  Processor scheduling in shared memory multiprocessors , 1990, SIGMETRICS '90.

[9]  Shikharesh Majumdar,et al.  Scheduling in multiprogrammed parallel systems , 1988, SIGMETRICS '88.

[10]  Carla Schlatter Ellis,et al.  Experimental comparison of memory management policies for NUMA multiprocessors , 1991, TOCS.

[11]  Mosur Ravishankar,et al.  PLUS: a distributed shared-memory system , 1990, ISCA '90.

[12]  M NiLionel,et al.  Design Tradeoffs for Process Scheduling in Shared Memory Multiprocessor Systems , 1989 .

[13]  David L. Black Scheduling support for concurrency and parallelism in the Mach operating system , 1990, Computer.

[14]  Daniel Gajski,et al.  CEDAR: a large scale multiprocessor , 1983, CARN.

[15]  Hendrik A. Goosen,et al.  Multi-level Shared Caching Techniques For Scalability In VMP-MC , 1989, The 16th Annual International Symposium on Computer Architecture.

[16]  Mary K. Vernon,et al.  The performance of multiprogrammed multiprocessor scheduling algorithms , 1990, SIGMETRICS '90.

[17]  D. R. Cheriton,et al.  Multi-level shared caching techniques for scalability in VMP-M/C , 1989, ISCA '89.

[18]  Lionel M. Ni,et al.  Design Tradeoffs for Process Scheduling in Shared Memory Multiprocessor Systems , 1989, IEEE Trans. Software Eng..

[19]  Anoop Gupta,et al.  Process control and scheduling issues for multiprogrammed shared-memory multiprocessors , 1989, SOSP '89.

[20]  Michael Stumm,et al.  Hector: a hierarchically structured shared-memory multiprocessor , 1991, Computer.

[21]  Kevin P. McAuliffe,et al.  The IBM Research Parallel Processor Prototype (RP3): Introduction and Architecture , 1985, ICPP.

[22]  Alexander V. Veidenbaum,et al.  Compiler-directed cache management in multiprocessors , 1990, Computer.