Work-First and Help-First Scheduling Policies for Terminally Strict Parallel Programs

Multiple programming models are emerging to address an increased need for dynamic task parallelism in applications for multicore processors and shared-address-space parallel computing. Examples include OpenMP 3.0, Java Concurrency Utilities, Microsoft Task Parallel Library, Intel Thread Building Blocks, Cilk, X10, Chapel, and Fortress. Scheduling algorithms based on work stealing, as embodied in Cilk’s implementation of dynamic spawn-sync parallelism, are gaining in popularity but also have inherent limitations. In this paper, we address the problem of efficient and scalable implementation of X10’s terminally strict async-finish task parallelism, which is more general than Cilk’s fully strict spawn-sync parallelism. We introduce a new work-stealing scheduler with compiler support for async-finish task parallelism that can accommodate both work-first and help-first scheduling policies. Performance results on two different multicore SMP platforms show significant improvements due to our new work-stealing algorithm compared to the existing work-sharing scheduler for X10, and also provide insight into scenarios in which the help-first policy yields better results than the work-first policy and vice versa.
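
To make the distinction concrete, the sketch below (illustrative only, not taken from the paper; it assumes X10 2.x syntax, where Console.OUT.println is the standard print call) shows a terminally strict async-finish pattern: the innermost task escapes its immediate parent and is joined only at the enclosing finish, which is not directly expressible with Cilk's fully strict spawn-sync, where every spawned child must be synchronized by the task that spawned it.

    finish {
        async {                                  // child task spawned by the main task
            async {                              // grandchild task: escapes its parent,
                Console.OUT.println("grandchild"); // joined only at the enclosing finish
            }
            Console.OUT.println("child");
        }
        Console.OUT.println("parent continues"); // parent does not wait for its children here
    } // the finish blocks until all transitively spawned tasks terminate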