Scheduler-conscious synchronization

Efficient synchronization is important for achieving good performance in parallel programs, especially on large-scale multiprocessors. Most synchronization algorithms have been designed to run on a dedicated machine, with one application process per processor, and can suffer serious performance degradation in the presence of multiprogramming. Problems arise when running processes block or, worse, busy-wait for action on the part of a process that the scheduler has chosen not to run. We show that these problems are particularly severe for scalable synchronization algorithms based on distributed data structures. We then describe and evaluate a set of algorithms that perform well in the presence of multiprogramming while maintaining good performance on dedicated machines. We consider both large and small machines, with a particular focus on scalability, and examine mutual-exclusion locks, reader-writer locks, and barriers. Our algorithms vary in the degree of support required from the kernel scheduler. We find that while it is possible to avoid pathological performance problems using previously proposed kernel mechanisms, a modest additional widening of the kernel/user interface can make scheduler-conscious synchronization algorithms significantly simpler and faster, with performance on dedicated machines comparable to that of scheduler-oblivious algorithms.
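The busy-wait problem described above can be illustrated with a minimal sketch. This is not one of the paper's algorithms: it is a generic spin-then-yield test-and-set lock, shown only to make concrete how a waiter might stop burning processor time when the lock holder may have been descheduled. The names `SpinThenYieldLock`, `spin_limit`, and `run_demo` are hypothetical, and a non-blocking `threading.Lock.acquire` is assumed here as a stand-in for a hardware test-and-set instruction.

```python
import threading
import time


class SpinThenYieldLock:
    """Hypothetical test-and-set lock that spins briefly, then yields."""

    def __init__(self, spin_limit=100):
        # Assumption: a non-blocking acquire stands in for an atomic test-and-set.
        self._flag = threading.Lock()
        self._spin_limit = spin_limit

    def acquire(self):
        spins = 0
        while not self._flag.acquire(blocking=False):
            spins += 1
            if spins >= self._spin_limit:
                # The holder may not be running right now; yield the
                # processor instead of continuing to busy-wait.
                time.sleep(0)
                spins = 0

    def release(self):
        self._flag.release()


def run_demo(num_threads=4, iterations=1000):
    """Increment a shared counter under the lock and return its final value."""
    lock = SpinThenYieldLock()
    counter = [0]

    def worker():
        for _ in range(iterations):
            lock.acquire()
            try:
                counter[0] += 1
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

A lock like this decides when to yield from a fixed spin count alone. The abstract's point is that richer information from the kernel scheduler, such as whether the process being waited for is currently running, can let a lock or barrier make that decision more precisely.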
