Backtracking-based load balancing

High-productivity languages for parallel computing are becoming more important as parallel environments, including multicores, become more common. Cilk is such a language. It provides good load balancing for many applications, including irregular ones: it keeps all workers busy by creating plenty of "logical" threads and adopting an oldest-first work-stealing strategy. This paper proposes a "logical thread"-free framework called Tascell, which achieves higher performance and supports a wider range of parallel environments, including clusters, without loss of productivity. A Tascell worker spawns a "real" task only when another, idle worker requests one. The worker performs the spawn by temporarily "backtracking" to restore its oldest task-spawnable state. This approach eliminates the cost of spawning and managing logical threads. It also promotes the reuse of workspaces and improves locality of reference, since a workspace need not be prepared for each concurrently runnable logical thread. Furthermore, Tascell enables elegant and highly efficient backtrack search algorithms with delayed workspace copying. For instance, our 16-queens problem solver is 1.86 times faster than Cilk on a system with two dual-core processors. Our approach also enables a single program to run in both shared-memory and distributed-memory environments with reasonable efficiency and scalability.
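The on-demand spawning idea can be illustrated with a small single-threaded simulation. This is a minimal sketch in Python, not Tascell code: the names `Worker`, `split`, `count_queens`, and the `steal_every` parameter are hypothetical, and the "idle worker" is simulated by a polling callback that periodically asks for work. The worker searches N-queens in one reused workspace, and only when a request arrives does it go to its oldest choice point that still has untried candidates, give away half of them, and copy the workspace prefix for that point. Here the older state is simply a prefix of the array, so the "backtracking" step reduces to taking that prefix; a real implementation would undo and redo updates as needed.

```python
from collections import deque


def safe(cols, row, col):
    """True if a queen at (row, col) is not attacked by queens in rows < row."""
    return all(c != col and abs(c - col) != row - r
               for r, c in enumerate(cols[:row]))


class Worker:
    """Hypothetical worker: one reused workspace, tasks created only on request."""

    def __init__(self, n):
        self.n = n
        self.cols = [0] * n   # single workspace, updated in place
        self.frames = []      # choice points, oldest first: [row, next_col, end_col]
        self.count = 0

    def split(self):
        """Give away half of the untried range at the oldest splittable choice point."""
        for frame in self.frames:
            row, nxt, end = frame
            if end - nxt >= 2:
                mid = (nxt + end) // 2
                frame[2] = mid  # this worker keeps [nxt, mid)
                # Delayed workspace copying: the prefix is copied only now.
                return (tuple(self.cols[:row]), mid, end)
        return None

    def run(self, task, poll):
        prefix, lo, hi = task
        row0 = len(prefix)
        self.cols[:row0] = prefix
        self.frames = [[row0, lo, hi]]
        while self.frames:
            frame = self.frames[-1]
            row, col, end = frame
            if col >= end:
                self.frames.pop()  # candidates exhausted: backtrack
                continue
            frame[1] += 1
            if not safe(self.cols, row, col):
                continue
            self.cols[row] = col
            if row + 1 == self.n:
                self.count += 1
            else:
                self.frames.append([row + 1, 0, self.n])
            poll(self)  # check whether someone is asking for work


def count_queens(n, steal_every=50):
    """Count N-queens solutions; a simulated request arrives every `steal_every` steps."""
    pending = deque([((), 0, n)])
    total = 0
    steps = 0

    def poll(worker):
        nonlocal steps
        steps += 1
        if steps % steal_every == 0:
            task = worker.split()
            if task is not None:
                pending.append(task)

    while pending:
        worker = Worker(n)
        worker.run(pending.popleft(), poll)
        total += worker.count
    return total
```

Calling `count_queens(8)` counts the solutions of the 8-queens problem regardless of how often requests arrive, because each split hands over a disjoint range of candidates. When no request arrives, the worker never creates a task or copies its workspace, which is the cost the abstract says is avoided.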
