The use of a work-stealing scheduler has become a popular approach for providing task parallelism. It is used in many modern parallel programming languages, such as Cilk and X10, which have emerged to address the complexity of parallel programming on modern multicore architectures. There are various challenges in implementing work stealing efficiently, but in any implementation it must be possible for the thief to access the execution state required to perform the stolen task. The natural way to achieve this is to save the necessary state whenever a producer creates stealable work. While the ability to provide some degree of parallelism may dominate performance at scale, it is common for the vast majority of potentially stealable work never to be stolen, but instead to be processed by the producer itself. This indicates that, to further improve performance, we should minimize the overheads incurred in making work available for stealing. We are not the only ones to make this observation; for example, X10’s current C++ work-stealing implementation stack-allocates state objects and lazily copies them to the heap to avoid unnecessary heap allocation during normal execution. In our context of a Java virtual machine, it is possible to extend this idea further and avoid heap-allocating state objects altogether, instead allowing thieves to extract state directly from within stack frames of the producer. This is achieved by using state-map information provided by a cooperative runtime compiler, allowing us to drive down the cost of making state available for stealable work items. We discuss our design and preliminary findings for the implementation of our framework inside the X10 work-stealing runtime and the optimizing compiler of Jikes RVM, a high-performance Java research virtual machine.
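To make the baseline concrete, the following is a minimal sketch of the "natural" approach described above, in which the producer saves the state a thief would need every time it creates stealable work. It does not show the stack-frame extraction technique itself. All names (`WorkStealingSketch`, `Task`, `Worker`, `push`, `pop`, `steal`) are hypothetical and chosen for illustration; they are not taken from the X10 runtime or Jikes RVM.

```java
import java.util.concurrent.ConcurrentLinkedDeque;

// Illustrative only: a toy double-ended work queue, not the X10 or Jikes RVM design.
final class WorkStealingSketch {

    /** A stealable work item. Its fields are the saved state a thief needs. */
    static final class Task {
        final int lo;  // inclusive start of the range to sum
        final int hi;  // exclusive end of the range to sum

        Task(int lo, int hi) {
            this.lo = lo;
            this.hi = hi;
        }

        /** Runs the task using only the state captured at creation time. */
        int run() {
            int sum = 0;
            for (int i = lo; i < hi; i++) {
                sum += i;
            }
            return sum;
        }
    }

    /** A worker that owns a deque of stealable tasks. */
    static final class Worker {
        private final ConcurrentLinkedDeque<Task> deque = new ConcurrentLinkedDeque<>();

        /** Producer side: publishing work costs one heap-allocated Task per item. */
        void push(Task t) {
            deque.addLast(t);
        }

        /** Producer side: take back the most recently pushed task, or null if none. */
        Task pop() {
            return deque.pollLast();
        }

        /** Thief side: take the oldest task, or null if none. */
        Task steal() {
            return deque.pollFirst();
        }
    }

    public static void main(String[] args) {
        Worker producer = new Worker();
        for (int lo = 0; lo < 40; lo += 10) {
            producer.push(new Task(lo, lo + 10));  // state saved eagerly for every task
        }

        int stolen = 0;
        int local = 0;
        int total = 0;

        Task t = producer.steal();  // a single steal, simulated on the same thread
        if (t != null) {
            total += t.run();
            stolen++;
        }
        while ((t = producer.pop()) != null) {  // the producer runs the rest itself
            total += t.run();
            local++;
        }
        System.out.println("stolen=" + stolen + " local=" + local + " total=" + total);
    }
}
```

In this sketch every `push` allocates a `Task` on the heap, even though only one of the four tasks is ever stolen. That per-task cost on the common, unstolen path is the overhead the approach described above aims to reduce.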