The Performance Implications of Thread Management Alternatives for Shared-Memory Multiprocessors

An examination is made of the performance implications of several data structure and algorithm alternatives for thread management in shared-memory multiprocessors. Both experimental measurements and analytical model projections are presented. For applications with fine-grained parallelism, small differences in thread management are shown to have a significant performance impact, often posing a tradeoff between throughput and latency. Per-processor data structures can be used to improve throughput and, in some circumstances, to avoid locking, which improves latency as well. The method processors use to queue for locks is also shown to affect performance significantly. Conventional methods of waiting for a critical resource can substantially degrade performance when even a moderate number of processors are waiting. The authors present an Ethernet-style backoff algorithm that largely eliminates this effect.
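The general idea behind Ethernet-style backoff can be sketched as follows. A waiting processor does not retry the lock continuously. After each failed attempt it pauses for a random interval drawn from a window that doubles, up to a cap, so contending waiters spread out their retries. This minimal Python sketch is not the authors' implementation. It assumes that a non-blocking `threading.Lock.acquire(blocking=False)` can stand in for an atomic hardware test-and-set. The names `BackoffSpinLock` and `run_demo` and the delay constants are hypothetical and chosen only for illustration.

```python
import random
import threading
import time


class BackoffSpinLock:
    """Spin lock with randomized, truncated exponential (Ethernet-style) backoff.

    Assumption: a non-blocking threading.Lock.acquire() stands in for an
    atomic test-and-set; the delay constants are hypothetical, not from the paper.
    """

    def __init__(self, base_delay=1e-6, max_delay=1e-3):
        self._flag = threading.Lock()  # plays the role of the shared lock word
        self._base = base_delay        # initial backoff window in seconds (hypothetical)
        self._max = max_delay          # cap on the backoff window in seconds (hypothetical)

    def acquire(self):
        """Acquire the lock; return how many attempts failed before success."""
        window = self._base
        failures = 0
        # Attempt the atomic test-and-set; on failure, back off before retrying.
        while not self._flag.acquire(blocking=False):
            failures += 1
            # Wait a random time in [0, window) so waiters do not retry in lockstep.
            time.sleep(random.uniform(0, window))
            # Double the window after each failure, truncated at the cap.
            window = min(window * 2, self._max)
        return failures

    def release(self):
        self._flag.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc):
        self.release()
        return False


def run_demo(n_threads=4, increments=2000):
    """Have several threads increment a shared counter under the backoff lock."""
    lock = BackoffSpinLock()
    counter = 0

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:
                counter += 1  # critical section protected by the lock

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run_demo())  # 8000 with the default arguments
```

The cap on the window bounds how long a waiter can sleep after the lock is freed. This reflects the same throughput-versus-latency tension described above: longer delays reduce contention for the lock but can leave it idle while processors are still backing off.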

[1] Andrew P. Black, et al. Fine-grained mobility in the Emerald system, 1987, TOCS.

[2] Richard C. Holt. A short introduction to Concurrent Euclid, 1982, SIGPLAN.

[3] Michael L. Scott, et al. Design Rationale for Psyche, a General-Purpose Multiprocessor Operating System, 1988, ICPP.

[4] Edward D. Lazowska, et al. Quantitative system performance: computer system analysis using queueing network models, 1983, Int. CMG Conference.

[5] Lionel M. Ni, et al. Design Tradeoffs for Process Scheduling in Shared Memory Multiprocessor Systems, 1989, IEEE Trans. Software Eng.

[6] David A. Fisher, et al. Parallel Processing in Ada, 1986, Computer.

[7] Thomas E. Anderson, et al. The Performance Implications of Spin-Waiting Alternatives for Shared-Memory Multiprocessors, 1989, ICPP.

[8] William J. Bolosky, et al. Mach: A New Kernel Foundation for UNIX Development, 1986, USENIX Summer.

[9] Brian N. Bershad, et al. An Open Environment for Building Parallel Programming Systems, 1988, PPOPP/PPEALS.

[10] Brian N. Bershad, et al. PRESTO: A system for object-oriented parallel programming, 1988, Softw. Pract. Exp.

[11] Butler W. Lampson, et al. Experience with processes and monitors in Mesa, 1980, CACM.

[12] Lionel M. Ni, et al. Design Trade-offs for Process Scheduling in Tightly Coupled Multiprocessor Systems, 1985, International Conference on Parallel Processing.

[13] Thomas E. Anderson, et al. The performance implications of thread management alternatives for shared-memory multiprocessors, 1989, SIGMETRICS '89.

[14] Edward D. Lazowska, et al. Adaptive load sharing in homogeneous distributed systems, 1986, IEEE Transactions on Software Engineering.

[15] A. Agarwal, et al. Adaptive backoff synchronization techniques, 1989, ISCA '89.

[16] James K. Archibald, et al. Cache coherence protocols: evaluation using a multiprocessor simulation model, 1986, TOCS.

[17] Lawrence C. Stewart, et al. Firefly: a multiprocessor workstation, 1987, ASPLOS 1987.

[18] James M. Boyle, et al. Beyond "Speedup": Performance Analysis of Parallel Programs, 1987.

[19] M. J. Bach, et al. The UNIX system: Multiprocessor UNIX operating systems, 1984, AT&T Bell Laboratories Technical Journal.

[20] Edward D. Lazowska, et al. Quantitative System Performance, 1985, Int. CMG Conference.

[21] C. A. R. Hoare, et al. Communicating Sequential Processes (Reprint), 1983, Commun. ACM.

[22] Robert Metcalfe, et al. Ethernet: distributed packet switching for local computer networks, 1976, CACM.

[23] Timothy A. Gonsalves, et al. Modelling and analysis of distributed software systems, 1979, SOSP '79.

[24] Shreekant S. Thakkar, et al. The Symmetry Multiprocessor System, 1988, ICPP.