Synthesizing concurrent schedulers for irregular algorithms

Scheduling is the assignment of tasks or activities to processors for execution, and it is an important concern in parallel programming. Most prior work on scheduling has focused either on static scheduling of applications whose dependence graph is known at compile time, or on dynamic scheduling of independent loop iterations, as in OpenMP. In irregular algorithms, dependences between activities are complex functions of runtime values, so these algorithms are neither amenable to compile-time analysis nor composed of independent activities. Moreover, the amount of work can vary dramatically with the scheduling policy. To handle these complexities, implementations of irregular algorithms employ carefully handcrafted, algorithm-specific schedulers, but these schedulers are themselves parallel programs, which complicates the parallel programming problem further. In this paper, we present a flexible and efficient approach for specifying and synthesizing scheduling policies for irregular algorithms. We develop a simple compositional specification language and show how it can concisely encode scheduling policies from the literature. We then show how to synthesize efficient parallel schedulers from these specifications. We evaluate our approach for five irregular algorithms on three multicore architectures and show that (1) the performance of some algorithms can improve by orders of magnitude with the right scheduling policy, and (2) for the same policy, the overheads of our synthesized schedulers are comparable to those of fixed-function schedulers.
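The abstract does not show the specification language itself. As a rough, hypothetical illustration of what composing scheduling rules means (not the paper's actual syntax or its synthesized scheduler), the Java sketch below models a worklist whose policy is "order by an integer metric, then FIFO within each bucket", the kind of composed policy a delta-stepping-style shortest-path algorithm might use. All names here (ComposedWorklist, Task, push, poll) are invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: a sequential model of a scheduling policy
// composed as "order by integer metric, then FIFO within each bucket".
// All names are hypothetical; this is not the paper's specification
// language or its synthesized scheduler.
public class ComposedWorklist {
    // A task carries an integer priority, e.g. tentative distance / delta
    // in a delta-stepping-style shortest-path computation.
    static final class Task {
        final int node;
        final int priority;
        Task(int node, int priority) { this.node = node; this.priority = priority; }
    }

    // Outer rule: buckets ordered by the integer metric (smallest first).
    // Inner rule: FIFO order within each bucket.
    private final TreeMap<Integer, Deque<Task>> buckets = new TreeMap<>();

    void push(Task t) {
        buckets.computeIfAbsent(t.priority, k -> new ArrayDeque<>()).addLast(t);
    }

    Task poll() {
        Map.Entry<Integer, Deque<Task>> entry = buckets.firstEntry(); // lowest metric first
        if (entry == null) return null;
        Task t = entry.getValue().pollFirst();                        // FIFO inside the bucket
        if (entry.getValue().isEmpty()) buckets.remove(entry.getKey());
        return t;
    }

    public static void main(String[] args) {
        ComposedWorklist wl = new ComposedWorklist();
        wl.push(new Task(3, 2));
        wl.push(new Task(1, 0));
        wl.push(new Task(2, 0));
        for (Task t; (t = wl.poll()) != null; ) {
            System.out.println("node " + t.node + " (bucket " + t.priority + ")");
        }
        // Prints nodes 1 and 2 (bucket 0, FIFO order), then node 3 (bucket 2).
    }
}
```

A real synthesized scheduler would additionally need concurrent or per-thread versions of these structures to avoid contention; the sketch above only models the ordering semantics that a composed specification encodes.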
