Paper information - The performance of work stealing in multiprogrammed environments (extended abstract)

We study the performance of user-level thread schedulers in multiprogrammed environments. Our goal is a user-level thread scheduler that delivers efficient performance under multiprogramming without any need for kernel-level resource management, such as coscheduling or process control. We show that a non-blocking implementation of the work-stealing algorithm achieves this goal. With this implementation, the execution time of a computation running with arbitrarily many processes on arbitrarily many processors can be modeled as a simple function of work and critical-path length. This model holds even when the processes run on a set of processors that arbitrarily grows and shrinks over time. We observe linear speedup whenever the number of processes is small relative to the average parallelism.
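The abstract does not state the "simple function of work and critical-path length" explicitly. As a rough sketch only, the snippet below assumes a form commonly associated with the analysis of non-blocking work stealing (see reference [24]): time ≈ c1 * T1 / P_A + c_inf * T_inf * P / P_A, where T1 is the work, T_inf the critical-path length, P the number of processes, and P_A the average number of processors available to the computation. The function names and the constants `c1` and `c_inf` are hypothetical and are not taken from this page.

```python
def average_parallelism(work: float, critical_path: float) -> float:
    """Average parallelism T1 / T_inf of a computation."""
    if critical_path <= 0:
        raise ValueError("critical_path must be positive")
    return work / critical_path


def modeled_time(work: float, critical_path: float, processes: int,
                 avg_processors: float, c1: float = 1.0,
                 c_inf: float = 1.0) -> float:
    """Assumed model: c1 * T1 / P_A + c_inf * T_inf * P / P_A.

    c1 and c_inf are placeholder constants (assumptions); in practice
    they would have to be fitted to measurements.
    """
    if processes <= 0 or avg_processors <= 0:
        raise ValueError("processes and avg_processors must be positive")
    work_term = c1 * work / avg_processors
    path_term = c_inf * critical_path * processes / avg_processors
    return work_term + path_term


def modeled_speedup(work: float, critical_path: float, processes: int,
                    avg_processors: float) -> float:
    """Speedup relative to the work T1 under the assumed model."""
    return work / modeled_time(work, critical_path, processes, avg_processors)


if __name__ == "__main__":
    T1, T_inf = 1_000_000.0, 100.0  # illustrative values, not measured data
    print("average parallelism:", average_parallelism(T1, T_inf))
    for P in (8, 32, 1024, 10_000):
        # Processes may outnumber processors under multiprogramming.
        print(P, modeled_speedup(T1, T_inf, processes=P, avg_processors=8))
```

Under this assumed model, the critical-path term stays negligible while P is much smaller than the average parallelism T1 / T_inf, so the speedup stays close to P_A. That matches the linear-speedup observation in the abstract. Once P approaches T1 / T_inf the second term becomes comparable to the first and the modeled speedup drops.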

Robert D. Blumofe | Dionisios Papadopoulos

[1] Thomas E. Anderson,et al. The performance implications of thread management alternatives for shared-memory multiprocessors , 1989, SIGMETRICS '89.

[2] Raj Vaswani,et al. The implications of cache affinity on processor scheduling for multiprogrammed, shared memory multiprocessors , 1991, SOSP '91.

[3] John D. Valois. Lock-free linked lists using compare-and-swap , 1995, PODC '95.

[4] Anoop Gupta,et al. The impact of operating system scheduling policies and synchronization methods of performance of parallel applications , 1991, SIGMETRICS '91.

[5] Calton Pu,et al. A Lock-Free Multiprocessor OS Kernel , 1992, OPSR.

[6] Andrea C. Arpaci-Dusseau,et al. The interaction of parallel and sequential workloads on a network of workstations , 1995, SIGMETRICS '95/PERFORMANCE '95.

[7] Robert H. Halstead,et al. Lazy task creation: a technique for increasing the granularity of parallel programs , 1990, LISP and Functional Programming.

[8] Mark Moir,et al. Universal constructions for multi-object operations , 1995, PODC '95.

[9] Brian N. Bershad,et al. The interaction of architecture and operating system design , 1991, ASPLOS IV.

[10] Shreekant S. Thakkar,et al. Synchronization algorithms for shared-memory multiprocessors , 1990, Computer.

[11] James R. Goodman,et al. Efficient Synchronization: Let Them Eat QOLB , 1997, International Symposium on Computer Architecture.

[12] John K. Ousterhout. Scheduling Techniques for Concurrent Systems , 1982, ICDCS.

[13] Robert H. Halstead,et al. Implementation of multilisp: Lisp on a multiprocessor , 1984, LFP '84.

[14] Seth Copen Goldstein,et al. Lazy Threads: Implementing a Fast Parallel Call , 1996, J. Parallel Distributed Comput.

[15] Matteo Frigo,et al. The implementation of the Cilk-5 multithreaded language , 1998, PLDI.

[16] Ken Arnold,et al. The Java Programming Language , 1996.

[17] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[18] David R. Cheriton,et al. The synergy between non-blocking synchronization and operating system structure , 1996, OSDI '96.

[19] Mark Moir. Practical implementations of non-blocking synchronization primitives , 1997, PODC '97.

[20] Maurice Herlihy,et al. A methodology for implementing highly concurrent data structures , 1990, PPOPP '90.

[21] James H. Anderson,et al. Implementing wait-free objects on priority-based systems , 1997, PODC '97.

[22] Thomas E. Anderson,et al. The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors , 1990, IEEE Trans. Parallel Distributed Syst.

[23] Raj Vaswani,et al. A dynamic processor allocation policy for multiprogrammed shared-memory multiprocessors , 1993, TOCS.

[24] C. Greg Plaxton,et al. Thread Scheduling for Multiprogrammed Multiprocessors , 1998, SPAA '98.

[25] Maged M. Michael,et al. Relative performance of preemption-safe locking and non-blocking synchronization on multiprogrammed shared memory multiprocessors , 1997, Proceedings 11th International Parallel Processing Symposium.

[26] Piet Hut,et al. A hierarchical O(N log N) force-calculation algorithm , 1986, Nature.

[27] Marc Levoy,et al. Parallel visualization algorithms: performance and architectural implications , 1994, Computer.

[28] Evangelos P. Markatos,et al. Multiprogramming on multiprocessors , 1991, Proceedings of the Third IEEE Symposium on Parallel and Distributed Processing.

[29] Maurice Herlihy,et al. Wait-free synchronization , 1991, TOPL.

[30] Brian N. Bershad,et al. Practical considerations for non-blocking concurrent objects , 1993, [1993] Proceedings. The 13th International Conference on Distributed Computing Systems.

[31] Brian N. Bershad,et al. Scheduler activations: effective kernel support for the user-level management of parallelism , 1991, TOCS.

[32] Gregory R. Andrews,et al. Distributed filaments: efficient fine-grain parallelism on a cluster of workstations , 1994, OSDI '94.

[33] Patrick Sobalvarro,et al. Demand-Based Coscheduling of Parallel Jobs on Multiprogrammed Multiprocessors , 1995, JSSPP.

[34] F. Warren Burton,et al. Executing functional programs on a virtual tree of processors , 1981, FPCA '81.

[35] Kenneth C. Sevcik,et al. Application Scheduling and Processor Allocation in Multiprogrammed Parallel Processing Systems , 1994, Perform. Evaluation.

[36] Mary K. Vernon,et al. The performance of multiprogrammed multiprocessor scheduling algorithms , 1990, SIGMETRICS '90.

[37] Brian N. Bershad,et al. The interaction of architecture and operating system design , 1991, ASPLOS IV.

[38] John K. Ousterhout,et al. Scheduling Techniques for Concurrent Systems , 1982, ICDCS.

[39] Evangelos P. Markatos,et al. First-class user-level threads , 1991, SOSP '91.

[40] Bradley C. Kuszmaul,et al. Cilk: an efficient multithreaded runtime system , 1995, PPOPP '95.

[41] Greg Barnes,et al. A method for implementing lock-free shared-data structures , 1993, SPAA '93.

[42] Edward W. Felten,et al. Performance issues in non-blocking synchronization on shared-memory multiprocessors , 1992, PODC '92.

[43] Anoop Gupta,et al. Process control and scheduling issues for multiprogrammed shared-memory multiprocessors , 1989, SOSP '89.

[44] Andrea C. Arpaci-Dusseau,et al. Effective distributed scheduling of parallel workloads , 1996, SIGMETRICS '96.

[45] Devang Shah,et al. Programming with threads , 1996.

[46] Udi Manber,et al. DIB—a distributed implementation of backtracking , 1987, TOPL.

[47] Maged M. Michael,et al. Simple, fast, and practical non-blocking and blocking concurrent queue algorithms , 1996, PODC '96.