Waiting algorithms for synchronization in large-scale multiprocessors

Through analysis and experiments, this paper investigates two-phase waiting algorithms to minimize the cost of waiting for synchronization in large-scale multiprocessors. In a two-phase algorithm, a thread first waits by polling a synchronization variable. If the cost of polling reaches a limit $L_{poll}$ and further waiting is necessary, the thread blocks, incurring an additional fixed cost $B$. The choice of $L_{poll}$ is a critical determinant of the performance of two-phase algorithms. We focus on methods for statically determining $L_{poll}$ because the run-time overhead of dynamically determining it can be comparable to the cost of blocking in large-scale multiprocessor systems with lightweight threads. Our experiments show that always-block ($L_{poll} = 0$) is a good waiting algorithm, with performance usually close to the best of the algorithms compared. We show that even better performance can be achieved with a static choice of $L_{poll}$ based on knowledge of likely wait-time distributions. Motivated by the observation that different synchronization types exhibit different wait-time distributions, we prove that a static choice of $L_{poll}$ can yield close to optimal on-line performance against an adversary that is restricted to choosing wait times from a fixed family of probability distributions. This result allows us to make an optimal static choice of $L_{poll}$ based on synchronization type. For exponentially distributed wait times, we prove that setting $L_{poll} = \ln(e-1)\,B$ results in a waiting cost no more than $e/(e-1)$ times the cost of an optimal off-line algorithm. For uniformly distributed wait times, we prove that setting $L_{poll} = \frac{1}{2}(\sqrt{5}-1)\,B$ results in a waiting cost no more than $(\sqrt{5}+1)/2$ (the golden ratio) times the cost of an optimal off-line algorithm. Experimental measurements of several parallel applications on the Alewife multiprocessor simulator corroborate our theoretical findings.
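As a concrete illustration of the scheme the abstract describes, the following is a minimal C sketch of a two-phase wait, assuming unit polling cost per probe. The constant B, the block_on() primitive, and the function names are hypothetical conveniences for this sketch; they are not taken from the paper or from the Alewife runtime.

```c
/* A minimal sketch of a two-phase waiting algorithm, assuming unit
 * polling cost per probe. B, block_on(), and the poll-limit helpers
 * are illustrative assumptions, not the paper's implementation. */
#include <stdatomic.h>
#include <math.h>

#define B 1000.0  /* fixed cost of blocking, in polling-cost units */

/* Static poll limits suggested by the paper's bounds:
 * exponential wait times: L_poll = ln(e - 1) * B,
 *   waiting cost within e/(e - 1) of an optimal off-line algorithm;
 * uniform wait times: L_poll = (sqrt(5) - 1)/2 * B,
 *   waiting cost within the golden ratio of optimal. */
static double lpoll_exponential(void) { return log(exp(1.0) - 1.0) * B; }
static double lpoll_uniform(void)     { return (sqrt(5.0) - 1.0) / 2.0 * B; }

/* Hypothetical blocking primitive: deschedule the thread until the
 * synchronization variable is set. Stubbed as a spin here so the
 * sketch stays self-contained. */
static void block_on(atomic_bool *ready)
{
    while (!atomic_load_explicit(ready, memory_order_acquire))
        ;
}

/* Phase one: poll until the variable is set or the accumulated
 * polling cost reaches lpoll. Phase two: block, paying the fixed
 * additional cost B. */
void two_phase_wait(atomic_bool *ready, double lpoll)
{
    for (double cost = 0.0; cost < lpoll; cost += 1.0) {
        if (atomic_load_explicit(ready, memory_order_acquire))
            return;  /* synchronization arrived during the polling phase */
    }
    block_on(ready);  /* further waiting needed: give up the processor */
}
```

Note that passing lpoll = 0 degenerates to the always-block algorithm the abstract evaluates, while an unbounded lpoll degenerates to pure spinning; the static limits above sit between the two extremes according to the assumed wait-time distribution.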
