A comparison of task pools for dynamic load balancing of irregular algorithms

Since a static work distribution does not allow for satisfactory speed-ups of parallel irregular algorithms, there is a need for a dynamic distribution of work and data that can be adapted to the runtime behavior of the algorithm. Task pools are data structures that distribute tasks dynamically to different processors, where each task specifies computations to be performed and provides the data for these computations. This paper discusses the characteristics of task-based algorithms and describes the implementation of selected types of task pools for shared-memory multiprocessors. Several task pools have been implemented in C with POSIX threads and in Java. The task pools differ in the data structures used to store the tasks, the mechanism used to achieve load balance, and the memory manager used to store the tasks. Runtime experiments have been performed on three different shared-memory systems using a synthetic algorithm, the hierarchical radiosity method, and a volume rendering algorithm. Copyright © 2004 John Wiley & Sons, Ltd.
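
To make the idea of a task pool concrete, the following is a minimal sketch in Java of one possible design: a single central pool protected by one lock, from which worker threads fetch tasks and into which running tasks may insert new ones. It is not one of the implementations evaluated in the paper. The class name `CentralTaskPool`, the `Task` interface, the LIFO access order, and the Fibonacci-shaped synthetic workload in `fibTask` and `demo` are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical central task pool: one shared queue guarded by one monitor lock.
class CentralTaskPool {
    /** A task carries its own data and may put new tasks into the pool while it runs. */
    interface Task {
        void run(CentralTaskPool pool);
    }

    private final ArrayDeque<Task> tasks = new ArrayDeque<>();
    private int running = 0; // number of tasks currently being executed

    /** Inserts a task and wakes up any workers waiting for work. */
    public synchronized void put(Task t) {
        tasks.addLast(t);
        notifyAll();
    }

    /** Returns the next task, or null once the pool is empty and no task is running. */
    private synchronized Task get() throws InterruptedException {
        while (tasks.isEmpty()) {
            if (running == 0) {
                return null; // no task left that could still create new work
            }
            wait();
        }
        running++;
        return tasks.removeLast(); // LIFO order (an assumption of this sketch)
    }

    /** Marks a task as finished; wakes waiting workers if this was the last one. */
    private synchronized void done() {
        running--;
        if (running == 0 && tasks.isEmpty()) {
            notifyAll();
        }
    }

    /** Starts numThreads workers and returns when all tasks have been processed. */
    public void execute(int numThreads) throws InterruptedException {
        Thread[] workers = new Thread[numThreads];
        for (int i = 0; i < numThreads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    Task t;
                    while ((t = get()) != null) {
                        try {
                            t.run(this);
                        } finally {
                            done();
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            w.join();
        }
    }

    /** Synthetic irregular workload: task n creates tasks n-1 and n-2 (unbalanced tree). */
    static Task fibTask(int n, AtomicLong executed) {
        return pool -> {
            executed.incrementAndGet();
            if (n >= 2) {
                pool.put(fibTask(n - 1, executed));
                pool.put(fibTask(n - 2, executed));
            }
        };
    }

    /** Runs the synthetic workload and returns the number of tasks executed. */
    public static long demo(int n, int numThreads) {
        CentralTaskPool pool = new CentralTaskPool();
        AtomicLong executed = new AtomicLong();
        pool.put(fibTask(n, executed));
        try {
            pool.execute(numThreads);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return executed.get();
    }
}
```

Calling `CentralTaskPool.demo(10, 4)` executes 177 tasks, and the count is the same for any number of threads because it depends only on the shape of the task tree. A single lock keeps the sketch short but serializes all pool accesses; distributed pools with per-thread queues and task stealing are the usual way to reduce that contention.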
