Performance impact of run queue organization and synchronization on large-scale NUMA multiprocessor systems

Abstract

This paper studies how run queue organization affects the performance of synchronization methods in multiprocessor systems. Two run queue organizations are considered, distributed and hierarchical, and the impact of spinning and blocking synchronization on each is evaluated. We use two canonical workload types that require task synchronization: lock accessing and barrier synchronization. The results show that the distributed organization performs better when fine-grain synchronization is required. For coarse-grain (large granularity) tasks, however, the performance of the distributed organization is unacceptable and the hierarchical organization should be used. Because the distributed organization is embedded in the hierarchical organization, both cases can be served by one design: coarse-grain parallel applications can use the hierarchy with its load sharing feature, while fine-grain parallel applications can circumvent the hierarchy of queues and assign tasks round robin to the processors' local queues, as in the distributed organization. The hierarchical organization is therefore useful in general-purpose, large-scale shared-memory multiprocessors.
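As a rough illustration of the two organizations named in the abstract, the following Python sketch models a distributed organization (one local queue per processor, round robin assignment) and a simplified hierarchical organization that embeds it. The class names, the single shared parent queue, the `batch` size, and the `bypass` flag are all assumptions made for illustration; the abstract does not specify the depth of the hierarchy or how tasks move between levels.

```python
from collections import deque
from itertools import cycle


class DistributedRunQueues:
    """One local run queue per processor; tasks are assigned round robin."""

    def __init__(self, num_processors):
        self.local = [deque() for _ in range(num_processors)]
        self._next = cycle(range(num_processors))

    def submit(self, task):
        # Round robin assignment: each new task goes to the next local queue.
        cpu = next(self._next)
        self.local[cpu].append(task)
        return cpu

    def next_task(self, cpu):
        # A processor only looks at its own queue (no load sharing).
        return self.local[cpu].popleft() if self.local[cpu] else None


class HierarchicalRunQueues(DistributedRunQueues):
    """Local queues plus one shared parent queue (assumed two-level hierarchy)."""

    def __init__(self, num_processors, batch=2):
        super().__init__(num_processors)
        self.root = deque()   # hypothetical shared parent queue
        self.batch = batch    # hypothetical number of tasks moved per fetch

    def submit(self, task, bypass=False):
        if bypass:
            # Fine-grain case: circumvent the hierarchy, assign round robin.
            return super().submit(task)
        self.root.append(task)
        return None

    def next_task(self, cpu):
        # Load sharing: an idle processor pulls a small batch from the parent.
        if not self.local[cpu]:
            for _ in range(min(self.batch, len(self.root))):
                self.local[cpu].append(self.root.popleft())
        return super().next_task(cpu)
```

In this sketch, a coarse-grain application would call `submit(task)` and let idle processors pull work from the parent queue, whereas a fine-grain application would call `submit(task, bypass=True)` so that the structure behaves like the distributed organization. Queue locking, which is where spinning and blocking synchronization would come into play, is deliberately left out.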
